
Use the Upstage API and LangChain to Build a Groundedness Verification Tool

The Upstage Groundedness Check service provides a powerful API for verifying that AI-generated responses are firmly anchored in reliable source material. By sending a context–answer pair to the endpoint, we can immediately determine whether the provided context supports the given answer and obtain a confidence assessment for that judgment. In this tutorial, we demonstrate how to leverage Upstage's core capabilities, including single-pair verification, batch processing, and multi-domain testing, to ensure that our AI systems produce factual and trustworthy content across different topic areas.

!pip install -qU langchain-core langchain-upstage


import os
import json
from typing import List, Dict, Any
from langchain_upstage import UpstageGroundednessCheck


os.environ["UPSTAGE_API_KEY"] = "Use Your API Key Here"

We install the latest langchain-core and langchain-upstage integration packages, import the Python modules needed for data handling and typing, and set the Upstage API key in the environment so that all subsequent groundedness check requests are authenticated.
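Before wrapping the checker in a class, it helps to know the shape of the raw response: the groundedness endpoint returns a short verdict string, typically "grounded", "notGrounded", or "notSure". A minimal sketch of normalizing such a verdict into a boolean (the `is_grounded` helper is our own, not part of the library):

```python
from typing import Optional


def is_grounded(verdict: str) -> Optional[bool]:
    """Map an Upstage-style verdict string to True, False, or None (unsure)."""
    v = verdict.strip().lower().replace(" ", "")
    if v == "grounded":
        return True
    if v == "notgrounded":
        return False
    return None  # "notSure" or any unexpected value
```

Normalizing early like this avoids subtle substring pitfalls, since "notGrounded" contains the word "grounded".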

class AdvancedGroundednessChecker:
    """Advanced wrapper for Upstage Groundedness Check with batch processing and analysis"""
   
    def __init__(self):
        self.checker = UpstageGroundednessCheck()
        self.results = []
   
    def check_single(self, context: str, answer: str) -> Dict[str, Any]:
        """Check groundedness for a single context-answer pair"""
        request = {"context": context, "answer": answer}
        response = self.checker.invoke(request)
       
        result = {
            "context": context,
            "answer": answer,
            "grounded": response,
            "confidence": self._extract_confidence(response)
        }
        self.results.append(result)
        return result
   
    def batch_check(self, test_cases: List[Dict[str, str]]) -> List[Dict[str, Any]]:
        """Process multiple test cases"""
        batch_results = []
        for case in test_cases:
            result = self.check_single(case["context"], case["answer"])
            batch_results.append(result)
        return batch_results
   
    def _extract_confidence(self, response) -> str:
        """Extract a confidence level from the verdict string"""
        text = str(response).lower().replace(" ", "")
        # Check the negative verdict first: "notgrounded" contains "grounded"
        if "notgrounded" in text:
            return "low"
        if "grounded" in text:
            return "high"
        return "medium"
   
    def analyze_results(self) -> Dict[str, Any]:
        """Analyze batch results"""
        total = len(self.results)
        grounded = sum(
            1 for r in self.results
            # Exact match avoids counting "notGrounded" as grounded
            if str(r["grounded"]).lower().replace(" ", "") == "grounded"
        )
       
        return {
            "total_checks": total,
            "grounded_count": grounded,
            "not_grounded_count": total - grounded,
            "accuracy_rate": grounded / total if total > 0 else 0
        }


checker = AdvancedGroundednessChecker()

The AdvancedGroundednessChecker class wraps Upstage's groundedness check in a simple, reusable interface that supports both single and batch context–answer checks while accumulating results. It also includes helper methods to extract a confidence tag from each response and to compute overall accuracy statistics across all checks.
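Because the verdict arrives as a plain string, the aggregation step can be exercised offline without spending any API calls. A minimal sketch (the `summarize_verdicts` helper is ours, mirroring the statistics that `analyze_results` reports):

```python
from typing import Dict, List, Union


def summarize_verdicts(verdicts: List[str]) -> Dict[str, Union[int, float]]:
    """Aggregate raw verdict strings into summary statistics."""
    normalized = [v.strip().lower().replace(" ", "") for v in verdicts]
    total = len(normalized)
    grounded = sum(1 for v in normalized if v == "grounded")
    return {
        "total_checks": total,
        "grounded_count": grounded,
        "not_grounded_count": total - grounded,
        "accuracy_rate": grounded / total if total > 0 else 0.0,
    }
```

Feeding this function a list of mock verdicts is a cheap way to unit-test reporting logic before wiring in live API responses.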

print("=== Test Case 1: Height Discrepancy ===")
result1 = checker.check_single(
    context="Mauna Kea is an inactive volcano on the island of Hawai'i.",
    answer="Mauna Kea is 5,207.3 meters tall."
)
print(f"Result: {result1['grounded']}")


print("n=== Test Case 2: Correct Information ===")
result2 = checker.check_single(
    context="Python is a high-level programming language created by Guido van Rossum in 1991. It emphasizes code readability and simplicity.",
    answer="Python was made by Guido van Rossum & focuses on code readability."
)
print(f"Result: {result2['grounded']}")


print("n=== Test Case 3: Partial Information ===")
result3 = checker.check_single(
    context="The Great Wall of China is approximately 13,000 miles long and took over 2,000 years to build.",
    answer="The Great Wall of China is very long."
)
print(f"Result: {result3['grounded']}")


print("n=== Test Case 4: Contradictory Information ===")
result4 = checker.check_single(
    context="Water boils at 100 degrees Celsius at sea level atmospheric pressure.",
    answer="Water boils at 90 degrees Celsius at sea level."
)
print(f"Result: {result4['grounded']}")

We run four independent groundedness checks with AdvancedGroundednessChecker, covering a clear factual error, a correct statement, a vague partial match, and a contradictory claim. Printing each verdict illustrates how the service labels grounded versus ungrounded answers across these different situations.

print("n=== Batch Processing Example ===")
test_cases = [
    {
        "context": "Shakespeare wrote Romeo and Juliet in the late 16th century.",
        "answer": "Romeo and Juliet was written by Shakespeare."
    },
    {
        "context": "The speed of light is approximately 299,792,458 meters per second.",
        "answer": "Light travels at about 300,000 kilometers per second."
    },
    {
        "context": "Earth has one natural satellite called the Moon.",
        "answer": "Earth has two moons."
    }
]


batch_results = checker.batch_check(test_cases)
for i, result in enumerate(batch_results, 1):
    print(f"Batch Test {i}: {result['grounded']}")


print("n=== Results Analysis ===")
analysis = checker.analyze_results()
print(f"Total checks performed: {analysis['total_checks']}")
print(f"Grounded responses: {analysis['grounded_count']}")
print(f"Not grounded responses: {analysis['not_grounded_count']}")
print(f"Groundedness rate: {analysis['accuracy_rate']:.2%}")


print("n=== Multi-domain Testing ===")
domains = {
    "Science": {
        "context": "Photosynthesis is the process by which plants convert sunlight, carbon dioxide, & water into glucose and oxygen.",
        "answer": "Plants use photosynthesis to make food from sunlight and CO2."
    },
    "History": {
        "context": "World War II ended in 1945 after the surrender of Japan following the atomic bombings.",
        "answer": "WWII ended in 1944 with Germany's surrender."
    },
    "Geography": {
        "context": "Mount Everest is the highest mountain on Earth, located in the Himalayas at 8,848.86 meters.",
        "answer": "Mount Everest is the tallest mountain and is located in the Himalayas."
    }
}


for domain, test_case in domains.items():
    result = checker.check_single(test_case["context"], test_case["answer"])
    print(f"{domain}: {result['grounded']}")

We run batch groundedness checks on the predefined test cases, print each individual verdict, and then compute and display overall accuracy metrics. We also validate examples from science, history, and geography to illustrate how Upstage handles groundedness across different subject areas.
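For multi-domain runs, it can be handy to keep verdicts bucketed by domain rather than in one flat list. A sketch under the assumption that each result carries a domain label and a verdict string (the `by_domain` helper is our own addition):

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def by_domain(results: List[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Tally grounded vs. other verdicts per domain.

    `results` is a list of (domain, verdict) pairs, e.g. ("Science", "grounded").
    """
    tally: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"grounded": 0, "other": 0}
    )
    for domain, verdict in results:
        normalized = verdict.strip().lower().replace(" ", "")
        key = "grounded" if normalized == "grounded" else "other"
        tally[domain][key] += 1
    return dict(tally)
```

A per-domain breakdown like this makes it easy to spot whether ungrounded answers cluster in one subject area.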

def create_test_report(checker_instance):
    """Generate a detailed test report"""
    report = {
        "summary": checker_instance.analyze_results(),
        "detailed_results": checker_instance.results,
        "recommendations": []
    }
   
    accuracy = report["summary"]["accuracy_rate"]
    if accuracy > 0.9:
        report["recommendations"].append("High accuracy - system performing well")
   
    return report


print("n=== Final Test Report ===")
report = create_test_report(checker)
print(f"Overall Performance: {report['summary']['accuracy_rate']:.2%}")
print("Recommendations:", report["recommendations"])


print("n=== Tutorial Complete ===")
print("This tutorial demonstrated:")
print("• Basic groundedness checking")
print("• Batch processing capabilities")
print("• Multi-domain testing")
print("• Results analysis and reporting")
print("• Advanced wrapper implementation")

Finally, we define a create_test_report helper that compiles all accumulated groundedness checks into a summary report with overall accuracy and tailored recommendations, and then print the final performance metrics along with a recap of the tutorial's main demonstrations.
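The `json` module imported at the top of the script is a natural fit for persisting this report. A short sketch that writes the report dictionary to disk (the `save_report` helper and the default filename are our own choices):

```python
import json


def save_report(report: dict, path: str = "groundedness_report.json") -> None:
    """Serialize the test report to pretty-printed JSON on disk."""
    with open(path, "w", encoding="utf-8") as f:
        # default=str guards against non-serializable response objects
        json.dump(report, f, indent=2, ensure_ascii=False, default=str)
```

Saved reports can then be diffed across runs to track how groundedness rates change as prompts or contexts evolve.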

All in all, Upstage's groundedness check gives us a scalable, domain-agnostic solution for real-time fact verification with confidence scoring. Whether we are validating isolated claims or processing large batches of responses, Upstage delivers clear groundedness verdicts and confidence indicators that let us monitor accuracy rates and produce actionable quality reports. By integrating this service into our workflows, we can improve the reliability of AI-generated outputs and maintain strict standards of factual integrity across applications.



Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. He is very interested in solving practical problems, and he brings a new perspective to the intersection of AI and real-life solutions.
