
A Coding Guide to Building a Modular, Self-Correcting QA System with DSPy

In this tutorial, we explore how to build an intelligent, self-correcting question-answering system with the DSPy framework, integrated with Google’s Gemini 1.5 Flash model. We begin by defining structured signatures that clearly specify input and output behavior, which DSPy uses as the foundation for building reliable pipelines. With DSPy’s declarative programming approach, we construct composable modules, such as AdvancedQA and SimpleRAG, that answer questions using context and retrieval-augmented generation. By combining DSPy’s modularity with Gemini’s strong reasoning, we end up with an AI system that delivers accurate, step-by-step answers. As we progress, we also apply DSPy’s optimization tools, such as BootstrapFewShot, to automatically improve performance on training examples.

!pip install dspy-ai google-generativeai


import dspy
import google.generativeai as genai
import random
from typing import List, Optional


GOOGLE_API_KEY = "Use Your Own API Key"  
genai.configure(api_key=GOOGLE_API_KEY)


dspy.configure(lm=dspy.LM(model="gemini/gemini-1.5-flash", api_key=GOOGLE_API_KEY))

We first install the required libraries: dspy-ai for declarative AI pipelines and google-generativeai for access to Google’s Gemini models. After importing the necessary modules, we configure Gemini with our API key. Finally, we point DSPy at the Gemini 1.5 Flash model as its language model backend.
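As a quick, optional sanity check, the configured LM can be called directly. This is a hedged sketch: it assumes a recent DSPy version in which dspy.LM instances are callable, return a list of completions, and are exposed via dspy.settings.lm.

# Optional sanity check that Gemini is reachable (assumes a recent DSPy
# version where LM objects are callable and return a list of completions)
lm = dspy.settings.lm
print(lm("Reply with the single word OK.")[0])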

class QuestionAnswering(dspy.Signature):
    """Answer questions based on given context with reasoning."""
    context: str = dspy.InputField(desc="Relevant context information")
    question: str = dspy.InputField(desc="Question to answer")
    reasoning: str = dspy.OutputField(desc="Step-by-step reasoning")
    answer: str = dspy.OutputField(desc="Final answer")


class FactualityCheck(dspy.Signature):
    """Verify if an answer is factually correct given context."""
    context: str = dspy.InputField()
    question: str = dspy.InputField()
    answer: str = dspy.InputField()
    is_correct: bool = dspy.OutputField(desc="True if answer is factually correct")

We define two DSPy signatures that specify the system’s inputs and outputs. First, QuestionAnswering takes a context and a question and returns both step-by-step reasoning and a final answer, allowing the model to explain its thinking. Next, FactualityCheck returns a simple boolean verdict, which we later use to build a self-correcting QA loop.
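To see a signature in action on its own, dspy.Predict turns it into a callable module. The context and question below are illustrative and not part of the tutorial’s pipeline:

# Minimal sketch: wrap the signature in a Predict module and call it directly
qa = dspy.Predict(QuestionAnswering)
result = qa(
    context="The Nile is about 6,650 kilometers long.",
    question="How long is the Nile?"
)
print(result.reasoning)
print(result.answer)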

class AdvancedQA(dspy.Module):
    def __init__(self, max_retries: int = 2):
        super().__init__()
        self.max_retries = max_retries
        self.qa_predictor = dspy.ChainOfThought(QuestionAnswering)
        self.fact_checker = dspy.Predict(FactualityCheck)
       
    def forward(self, context: str, question: str) -> dspy.Prediction:
        prediction = self.qa_predictor(context=context, question=question)
       
        for attempt in range(self.max_retries):
            fact_check = self.fact_checker(
                context=context,
                question=question,
                answer=prediction.answer
            )
           
            if fact_check.is_correct:
                break
               
            refined_context = f"{context}\n\nPrevious incorrect answer: {prediction.answer}\nPlease provide a more accurate answer."
            prediction = self.qa_predictor(context=refined_context, question=question)
       
        return prediction

We create the AdvancedQA module to add self-correction capabilities to the QA system. It first uses a ChainOfThought predictor to generate an answer with reasoning, then uses the fact-checking predictor to verify factual accuracy. If the answer is judged incorrect, we refine the context with feedback about the wrong answer and retry, up to the specified number of attempts, to produce a more reliable output.
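A quick usage sketch of the module (the context and question here are illustrative):

# Instantiate the module and call it like a function; DSPy routes the
# call through forward(), running the self-correction loop internally
qa = AdvancedQA(max_retries=2)
pred = qa(
    context="Mount Everest stands 8,849 meters tall above sea level.",
    question="How tall is Mount Everest?"
)
print(pred.answer)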

class SimpleRAG(dspy.Module):
    def __init__(self, knowledge_base: List[str]):
        super().__init__()
        self.knowledge_base = knowledge_base
        self.qa_system = AdvancedQA()
       
    def retrieve(self, question: str, top_k: int = 2) -> str:
        # Simple keyword-based retrieval (in practice, use vector embeddings)
        scored_docs = []
        question_words = set(question.lower().split())
       
        for doc in self.knowledge_base:
            doc_words = set(doc.lower().split())
            score = len(question_words.intersection(doc_words))
            scored_docs.append((score, doc))
       
        # Return top-k most relevant documents
        scored_docs.sort(reverse=True)
        return "\n\n".join([doc for _, doc in scored_docs[:top_k]])
   
    def forward(self, question: str) -> dspy.Prediction:
        context = self.retrieve(question)
        return self.qa_system(context=context, question=question)

We build the SimpleRAG module to demonstrate retrieval-augmented generation with DSPy. We provide a knowledge base and implement a basic keyword-overlap retriever that fetches the most relevant documents for a given question. These documents become the context for the AdvancedQA module, which then performs reasoning and self-correction to produce an accurate answer.
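The keyword retriever is a stand-in; as the code comment notes, practical systems use embeddings. Below is a hedged sketch of a slightly stronger retriever based on TF-IDF cosine similarity (scikit-learn is an extra dependency that this tutorial does not otherwise install):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_retrieve(knowledge_base: List[str], question: str, top_k: int = 2) -> str:
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(knowledge_base)  # one row per document
    query_vec = vectorizer.transform([question])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    top_indices = scores.argsort()[::-1][:top_k]  # highest-scoring docs first
    return "\n\n".join(knowledge_base[i] for i in top_indices)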

knowledge_base = [
    "Use Your Context and Knowledge Base Here"
]


training_examples = [
    dspy.Example(
        question="What is the height of the Eiffel Tower?",
        context="The Eiffel Tower is located in Paris, France. It was constructed from 1887 to 1889 and stands 330 meters tall including antennas.",
        answer="330 meters"
    ).with_inputs("question", "context"),
   
    dspy.Example(
        question="Who created Python programming language?",
        context="Python is a high-level programming language created by Guido van Rossum. It was first released in 1991 and emphasizes code readability.",
        answer="Guido van Rossum"
    ).with_inputs("question", "context"),
   
    dspy.Example(
        question="What is machine learning?",
        context="ML focuses on algorithms that can learn from data without being explicitly programmed.",
        answer="Machine learning focuses on algorithms that learn from data without explicit programming."
    ).with_inputs("question", "context")
]

We define a small knowledge base containing facts across a variety of topics, including history, programming, and science; it serves as the context source for retrieval. Alongside it, we prepare a set of training examples to guide DSPy’s optimization process. Each example includes a question, its relevant context, and the correct answer, helping the system learn how to respond more accurately.
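The published snippet leaves the knowledge base as a placeholder. One hedged way to populate it, reusing the contexts from the training examples above, is:

# Illustrative knowledge base entries, taken from the training-example
# contexts above (replace with your own documents)
knowledge_base = [
    "The Eiffel Tower is located in Paris, France. It was constructed from 1887 to 1889 and stands 330 meters tall including antennas.",
    "Python is a high-level programming language created by Guido van Rossum. It was first released in 1991 and emphasizes code readability.",
    "ML focuses on algorithms that can learn from data without being explicitly programmed.",
]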

def accuracy_metric(example, prediction, trace=None):
    """Simple accuracy metric for evaluation"""
    return example.answer.lower() in prediction.answer.lower()


print("🚀 Initializing DSPy QA System with Gemini...")
print("📝 Note: Using Google's Gemini 1.5 Flash (free tier)")
rag_system = SimpleRAG(knowledge_base)


basic_qa = dspy.ChainOfThought(QuestionAnswering)


print("\n📊 Before Optimization:")
test_question = "What is the height of the Eiffel Tower?"
test_context = knowledge_base[0]
initial_prediction = basic_qa(context=test_context, question=test_question)
print(f"Q: {test_question}")
print(f"A: {initial_prediction.answer}")
print(f"Reasoning: {initial_prediction.reasoning}")


print("\n🔧 Optimizing with BootstrapFewShot...")
optimizer = dspy.BootstrapFewShot(metric=accuracy_metric, max_bootstrapped_demos=2)
optimized_qa = optimizer.compile(basic_qa, trainset=training_examples)


print("\n📈 After Optimization:")
optimized_prediction = optimized_qa(context=test_context, question=test_question)
print(f"Q: {test_question}")
print(f"A: {optimized_prediction.answer}")
print(f"Reasoning: {optimized_prediction.reasoning}")

We first define a simple accuracy metric that checks whether the correct response appears in the predicted answer. After initializing the SimpleRAG system and a baseline ChainOfThought QA module, we test the baseline on a sample question before any optimization. Then, using DSPy’s BootstrapFewShot optimizer, we compile the QA system against our training examples. This lets DSPy automatically generate more effective few-shot prompts, improving accuracy, which we verify by comparing the responses before and after optimization.
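To see what the optimizer actually changed, DSPy can print the most recent prompt sent to the model; after compilation it should contain the bootstrapped few-shot demonstrations:

# Inspect the last LM call to view the compiled prompt with its demos
dspy.inspect_history(n=1)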

def evaluate_system(qa_module, test_cases):
    """Evaluate QA system performance"""
    correct = 0
    total = len(test_cases)
   
    for example in test_cases:
        prediction = qa_module(context=example.context, question=example.question)
        if accuracy_metric(example, prediction):
            correct += 1
   
    return correct / total


print("\n📊 Evaluation Results:")
print(f"Basic QA Accuracy: {evaluate_system(basic_qa, training_examples):.2%}")
print(f"Optimized QA Accuracy: {evaluate_system(optimized_qa, training_examples):.2%}")


print("\n✅ Tutorial Complete! Key DSPy Concepts Demonstrated:")
print("1. 🔤 Signatures - Defined input/output schemas")
print("2. 🏗️  Modules - Built composable QA systems")
print("3. 🔄 Self-correction - Implemented iterative improvement")
print("4. 🔍 RAG - Created retrieval-augmented generation")
print("5. ⚡ Optimization - Used BootstrapFewShot to improve prompts")
print("6. 📊 Evaluation - Measured system performance")
print("7. 🆓 Free API - Powered by Google Gemini 1.5 Flash")

We can also run a broader RAG demo by asking questions across different domains. For each question, the SimpleRAG system retrieves the most relevant context and then uses the self-correcting AdvancedQA module to generate an answer. We print each answer along with a preview of the reasoning, showing how DSPy combines retrieval with thoughtful generation to deliver reliable responses.
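The demo loop itself is not shown above; here is a minimal sketch consistent with the description (the questions are illustrative and should match your knowledge base):

# Ask several questions through the RAG pipeline and preview the reasoning
demo_questions = [
    "What is the height of the Eiffel Tower?",
    "Who created Python programming language?",
]

for q in demo_questions:
    pred = rag_system(question=q)
    print(f"\nQ: {q}")
    print(f"A: {pred.answer}")
    print(f"Reasoning preview: {pred.reasoning[:150]}...")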

In short, we successfully demonstrated DSPy’s potential for building advanced QA pipelines. We saw how DSPy simplifies the design of intelligent modules with clear interfaces, supports self-correction loops, integrates basic retrieval, and enables few-shot prompt optimization with minimal code. With only a few lines, we configured and evaluated our model on real-world examples to measure performance improvements. This hands-on experience shows how DSPy, used with Google’s Gemini API, lets us quickly prototype, test, and scale complex language applications without boilerplate or convoluted logic.




Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. He is very interested in solving practical problems, and he brings a new perspective to the intersection of AI and real-life solutions.
