How to Build an Agentic Decision-Tree RAG System with Intelligent Query Routing, Self-Checking, and Iterative Refinement
In this tutorial, we build an advanced agentic retrieval-augmented generation (RAG) system that goes beyond simple question answering. We design it to intelligently route queries to the correct knowledge source, perform self-checks to assess answer quality, and iteratively refine responses to improve accuracy. We use open-source tools such as FAISS, SentenceTransformers, and Flan-T5 to implement the entire system. As we progress, we explore how routing, retrieval, generation, and self-evaluation combine to form a decision-tree-style RAG pipeline that mimics real-world agent reasoning. Check out the full code here.
print("π§ Setting up dependencies...")
import subprocess
import sys
def install_packages():
packages = ['sentence-transformers', 'transformers', 'torch', 'faiss-cpu', 'numpy', 'accelerate']
for package in packages:
print(f"Installing {package}...")
subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', package])
try:
import faiss
except ImportError:
install_packages()
print("β All dependencies installed! Importing modules...n")
import torch
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline
import faiss
from typing import List, Dict, Tuple
import warnings
warnings.filterwarnings('ignore')
print("β All modules loaded successfully!n")
We first install all necessary dependencies, including Transformers, FAISS, and SentenceTransformers, to ensure smooth local execution. We verify the installation, then import the essential modules for embedding, retrieval, and generation, such as NumPy, PyTorch, and FAISS. Before continuing with the main pipeline, we confirm that all libraries have loaded successfully. Check out the full code here.
class VectorStore:
    def __init__(self, embedding_model="all-MiniLM-L6-v2"):
        print(f"Loading embedding model: {embedding_model}...")
        self.embedder = SentenceTransformer(embedding_model)
        self.documents = []
        self.index = None

    def add_documents(self, docs: List[str], sources: List[str]):
        self.documents = [{"text": doc, "source": src} for doc, src in zip(docs, sources)]
        embeddings = self.embedder.encode(docs, show_progress_bar=False)
        dimension = embeddings.shape[1]
        self.index = faiss.IndexFlatL2(dimension)
        self.index.add(embeddings.astype('float32'))
        print(f"✅ Indexed {len(docs)} documents\n")

    def search(self, query: str, k: int = 3) -> List[Dict]:
        # Never request more neighbors than there are indexed documents
        k = min(k, len(self.documents))
        query_vec = self.embedder.encode([query]).astype('float32')
        distances, indices = self.index.search(query_vec, k)
        return [self.documents[i] for i in indices[0]]
We design the VectorStore class to store and retrieve documents efficiently using FAISS-based similarity search. We use a transformer model to embed each document and build an index for fast retrieval. This allows us to quickly obtain the most relevant context for any incoming query. Check out the full code here.
class QueryRouter:
    def __init__(self):
        self.categories = {
            'technical': ['how', 'implement', 'code', 'function', 'algorithm', 'debug'],
            'factual': ['what', 'who', 'when', 'where', 'define', 'explain'],
            'comparative': ['compare', 'difference', 'versus', 'vs', 'better', 'which'],
            'procedural': ['steps', 'process', 'guide', 'tutorial', 'how to']
        }

    def route(self, query: str) -> str:
        query_lower = query.lower()
        scores = {}
        for category, keywords in self.categories.items():
            score = sum(1 for kw in keywords if kw in query_lower)
            scores[category] = score
        best_category = max(scores, key=scores.get)
        # Fall back to 'factual' when no keyword matches at all
        return best_category if scores[best_category] > 0 else 'factual'
We introduce the QueryRouter class to classify queries by intent: technical, factual, comparative, or procedural. We use keyword matching to determine which category best fits the input question. This routing step ensures that the retrieval strategy dynamically adapts to different query styles. Check out the full code here.
class AnswerGenerator:
    def __init__(self, model_name="google/flan-t5-base"):
        print(f"Loading generation model: {model_name}...")
        self.generator = pipeline('text2text-generation', model=model_name, device=0 if torch.cuda.is_available() else -1, max_length=256)
        device_type = "GPU" if torch.cuda.is_available() else "CPU"
        print(f"✅ Generator ready (using {device_type})\n")

    def generate(self, query: str, context: List[Dict], query_type: str) -> str:
        context_text = "\n\n".join([f"[{doc['source']}]: {doc['text']}" for doc in context])
        prompt = f"""Answer the question using only the context below.
Context:
{context_text}
Question: {query}
Answer:"""
        answer = self.generator(prompt, max_length=200, do_sample=False)[0]['generated_text']
        return answer.strip()

    def self_check(self, query: str, answer: str, context: List[Dict]) -> Tuple[bool, str]:
        if len(answer) < 10:  # reject answers too short to be informative
            return False, "Answer too short"
        context_words = set(" ".join(doc['text'].lower() for doc in context).split())
        overlap = set(answer.lower().split()) & context_words
        if len(overlap) < 2:  # reject answers with no grounding in the retrieved context
            return False, "Answer not grounded in context"
        return True, "Answer accepted"
We build the AnswerGenerator class to handle answer creation and self-evaluation. Using the Flan-T5 model, we generate textual responses based on retrieved documents. We then perform a self-check that evaluates the length of each answer and its overlap with the retrieved context, ensuring our output is meaningful and grounded. Check out the full code here.
class AgenticRAG:
    def __init__(self):
        self.vector_store = VectorStore()
        self.router = QueryRouter()
        self.generator = AnswerGenerator()
        self.max_iterations = 2

    def add_knowledge(self, documents: List[str], sources: List[str]):
        self.vector_store.add_documents(documents, sources)

    def query(self, question: str, verbose: bool = True) -> Dict:
        if verbose:
            print(f"\n{'='*60}")
            print(f"🤔 Query: {question}")
            print(f"{'='*60}")
        query_type = self.router.route(question)
        if verbose:
            print(f"🎯 Route: {query_type.upper()} query detected")
        # Each query type gets its own retrieval depth
        k_docs = {'technical': 2, 'comparative': 4, 'procedural': 3}.get(query_type, 3)
        iteration = 0
        answer_accepted = False
        answer = ""
        while iteration < self.max_iterations and not answer_accepted:
            iteration += 1
            context = self.vector_store.search(question, k=k_docs)
            answer = self.generator.generate(question, context, query_type)
            answer_accepted, reason = self.generator.self_check(question, answer, context)
            if verbose:
                print(f"   Iteration {iteration} self-check: {reason}")
            if not answer_accepted:
                # Widen the retrieval window before retrying
                k_docs = min(k_docs + 1, len(self.vector_store.documents))
        return {'answer': answer, 'query_type': query_type, 'iterations': iteration, 'accepted': answer_accepted}
We combine all components into the AgenticRAG system, which coordinates routing, retrieval, generation, and quality checking. The system iteratively refines its answers based on self-assessment feedback, expanding the retrieved context when necessary. This creates a feedback-driven, decision-tree-style RAG loop that automatically improves its output. Check out the full code here.
def main():
    print("\n" + "="*60)
    print("🚀 AGENTIC RAG WITH ROUTING & SELF-CHECK")
    print("="*60 + "\n")
    # Condensed knowledge base: one short entry per source (illustrative placeholder text)
    documents = [
        "Python is a high-level programming language known for its readable syntax. It is widely used in web development, data science, and machine learning.",
        "Machine learning is a branch of AI in which models learn patterns from data instead of being explicitly programmed, improving as they see more examples.",
        "Neural networks are computing systems of interconnected nodes organized in layers that learn to map inputs to outputs by adjusting connection weights.",
        "Deep learning uses neural networks with many layers to learn hierarchical representations of data, powering advances in vision, speech, and language.",
        "The Transformer architecture relies on self-attention to process sequences in parallel and underlies most modern large language models.",
        "RAG (Retrieval-Augmented Generation) combines information retrieval with text generation. It retrieves relevant documents and uses them as context for generating accurate answers."
    ]
    sources = ["Python Documentation", "ML Textbook", "Neural Networks Guide", "Deep Learning Paper", "Transformer Architecture", "RAG Research Paper"]
    rag = AgenticRAG()
    rag.add_knowledge(documents, sources)
    test_queries = ["What is Python?", "How does machine learning work?", "Compare neural networks and deep learning"]
    for query in test_queries:
        result = rag.query(query, verbose=True)
        print(f"\n{'='*60}")
        print(f"📊 FINAL RESULT:")
        print(f"   Answer: {result['answer']}")
        print(f"   Query Type: {result['query_type']}")
        print(f"   Iterations: {result['iterations']}")
        print(f"   Accepted: {result['accepted']}")
        print(f"{'='*60}\n")

if __name__ == "__main__":
    main()
We complete the demonstration by loading a small knowledge base and running test queries through the agentic RAG pipeline. We observe how the system routes, retrieves, and refines each answer, printing intermediate results for transparency. Finally, we confirm that it provides accurate, self-verifying answers using only local computation.
In summary, we created a fully functional agentic RAG framework that can autonomously retrieve, reason, and refine its answers. We saw how the system dynamically routes different query types, evaluates its own responses, and improves them through iterative feedback, all within a lightweight local environment. Through this exercise, we deepened our understanding of the RAG architecture and experienced how agentic components transform a static retrieval system into a self-improving intelligent agent.
The complete code is available on our GitHub page, along with related tutorials, code, and notebooks.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for the benefit of society. His most recent endeavor is the launch of Marktechpost, an AI media platform that stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and accessible to a broad audience. The platform draws more than 2 million monthly views, reflecting its popularity among readers.