A Coding Implementation of an Advanced Web Intelligence Agent with Tavily and Gemini AI

In this tutorial, we introduce an advanced, interactive web intelligence agent powered by Tavily and Google’s Gemini AI. We will learn how to configure and use this smart agent to seamlessly extract structured content from web pages, perform sophisticated AI-driven analysis, and present insightful results. With user-friendly, interactive prompts, robust error handling, and a visually appealing terminal interface, this tool offers an intuitive and powerful environment for exploring web content extraction and AI-based content analysis.
import os
import json
import asyncio
from typing import List, Dict, Any
from dataclasses import dataclass
from rich.console import Console
from rich.progress import track
from rich.panel import Panel
from rich.markdown import Markdown
We import and set up the essential libraries for handling data structures, asynchronous programming, and type annotations, alongside the rich library, which enables visually appealing terminal output. Together, these modules support the efficient, structured, and interactive execution of web intelligence tasks within the notebook.
from langchain_tavily import TavilyExtract
from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent
We initialize the essential LangChain components: TavilyExtract enables advanced web content retrieval, init_chat_model sets up the Gemini AI-powered chat model, and create_react_agent builds a dynamic, reasoning-based agent capable of intelligent decision-making during web analysis tasks. Together, these tools form the core engine for sophisticated, AI-driven web intelligence workflows.
@dataclass
class WebIntelligence:
    """Web Intelligence Configuration"""
    tavily_key: str = os.getenv("TAVILY_API_KEY", "")
    google_key: str = os.getenv("GOOGLE_API_KEY", "")
    extract_depth: str = "advanced"
    max_urls: int = 10
Check out the notebook here
The WebIntelligence dataclass serves as a structured configuration container, holding API keys for Tavily and Google Gemini and setting extraction parameters such as extract_depth and the maximum number of URLs (max_urls). It simplifies the management and access of crucial settings, ensuring seamless integration and customization of web content extraction tasks within the intelligence agent.
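As a quick sanity check, the dataclass can be instantiated with environment-derived defaults or with explicit overrides. The snippet below is a minimal, standalone sketch that mirrors the tutorial's dataclass; the override values are illustrative, not recommendations:

```python
import os
from dataclasses import dataclass


@dataclass
class WebIntelligence:
    """Mirrors the tutorial's configuration container."""
    tavily_key: str = os.getenv("TAVILY_API_KEY", "")
    google_key: str = os.getenv("GOOGLE_API_KEY", "")
    extract_depth: str = "advanced"
    max_urls: int = 10


# Defaults are read from the environment at class-definition time;
# any field can be overridden per instance.
default_cfg = WebIntelligence()
custom_cfg = WebIntelligence(extract_depth="basic", max_urls=3)

print(default_cfg.extract_depth)  # advanced
print(custom_cfg.max_urls)        # 3
```

Because dataclass defaults are evaluated once at import, changing `TAVILY_API_KEY` afterward will not affect already-created instances, which is why the agent also prompts interactively when a key is empty.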
class SmartWebAgent:
    """Intelligent Web Content Extraction & Analysis Agent"""

    def __init__(self, config: WebIntelligence):
        self.config = config
        self.console = Console()
        self._setup_environment()
        self._initialize_tools()

    def _setup_environment(self):
        """Setup API keys with interactive prompts"""
        if not self.config.tavily_key:
            self.config.tavily_key = input("🔑 Enter Tavily API Key: ")
            os.environ["TAVILY_API_KEY"] = self.config.tavily_key
        if not self.config.google_key:
            self.config.google_key = input("🔑 Enter Google Gemini API Key: ")
            os.environ["GOOGLE_API_KEY"] = self.config.google_key

    def _initialize_tools(self):
        """Initialize AI tools and agents"""
        self.console.print("🛠️ Initializing AI Tools...", style="bold blue")
        try:
            self.extractor = TavilyExtract(
                extract_depth=self.config.extract_depth,
                include_images=False,
                include_raw_content=False,
                max_results=3
            )
            self.llm = init_chat_model(
                "gemini-2.0-flash",
                model_provider="google_genai",
                temperature=0.3,
                max_tokens=1024
            )
            test_response = self.llm.invoke("Say 'AI tools initialized successfully!'")
            self.console.print(f"✅ LLM Test: {test_response.content}", style="green")
            self.agent = create_react_agent(self.llm, [self.extractor])
            self.console.print("✅ AI Agent Ready!", style="bold green")
        except Exception as e:
            self.console.print(f"❌ Initialization Error: {e}", style="bold red")
            self.console.print("💡 Check your API keys and internet connection", style="yellow")
            raise

    def extract_content(self, urls: List[str]) -> Dict[str, Any]:
        """Extract and structure content from URLs"""
        results = {}
        for url in track(urls, description="🌐 Extracting content..."):
            try:
                response = self.extractor.invoke({"urls": [url]})
                content = json.loads(response.content) if isinstance(response.content, str) else response.content
                results[url] = {
                    "status": "success",
                    "data": content,
                    "summary": content.get("summary", "No summary available")[:200] + "..."
                }
            except Exception as e:
                results[url] = {"status": "error", "error": str(e)}
        return results

    def analyze_with_ai(self, query: str, urls: List[str] = None) -> str:
        """Intelligent analysis using AI agent"""
        try:
            if urls:
                message = f"Use the tavily_extract tool to analyze these URLs and answer: {query}\nURLs: {urls}"
            else:
                message = query
            self.console.print(f"🤖 AI Analysis: {query}", style="bold magenta")
            messages = [{"role": "user", "content": message}]
            all_content = []
            with self.console.status("🔄 AI thinking..."):
                try:
                    for step in self.agent.stream({"messages": messages}, stream_mode="values"):
                        if "messages" in step and step["messages"]:
                            for msg in step["messages"]:
                                if hasattr(msg, 'content') and msg.content and msg.content not in all_content:
                                    all_content.append(str(msg.content))
                except Exception as stream_error:
                    self.console.print(f"⚠️ Stream error: {stream_error}", style="yellow")
            if not all_content:
                self.console.print("🔄 Trying direct AI invocation...", style="yellow")
                try:
                    response = self.llm.invoke(message)
                    return str(response.content) if hasattr(response, 'content') else str(response)
                except Exception as direct_error:
                    self.console.print(f"⚠️ Direct error: {direct_error}", style="yellow")
                    if urls:
                        self.console.print("🔄 Extracting content first...", style="blue")
                        extracted = self.extract_content(urls)
                        content_summary = "\n".join([
                            f"URL: {url}\nContent: {result.get('summary', 'No content')}\n"
                            for url, result in extracted.items() if result.get('status') == 'success'
                        ])
                        fallback_query = f"Based on this content, {query}:\n\n{content_summary}"
                        response = self.llm.invoke(fallback_query)
                        return str(response.content) if hasattr(response, 'content') else str(response)
            return "\n".join(all_content) if all_content else "❌ Unable to generate response. Please check your API keys and try again."
        except Exception as e:
            return f"❌ Analysis failed: {str(e)}\n\nTip: Make sure your API keys are valid and you have internet connectivity."

    def display_results(self, results: Dict[str, Any]):
        """Beautiful result display"""
        for url, result in results.items():
            if result["status"] == "success":
                panel = Panel(
                    f"🔗 [bold blue]{url}[/bold blue]\n\n{result['summary']}",
                    title="✅ Extracted Content",
                    border_style="green"
                )
            else:
                panel = Panel(
                    f"🔗 [bold red]{url}[/bold red]\n\n❌ Error: {result['error']}",
                    title="❌ Extraction Failed",
                    border_style="red"
                )
            self.console.print(panel)
The SmartWebAgent class encapsulates an intelligent web content extraction and analysis system, leveraging Tavily’s API and Google’s Gemini AI. It interactively sets up the essential tools, securely handles API keys, extracts structured data from provided URLs, and performs in-depth, AI-driven content analysis. It also renders results with rich visual output, enhancing readability and the user experience during interactive sessions.
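The fallback path inside analyze_with_ai is worth seeing in isolation: when the agent and direct invocation both fail, it builds a plain-text prompt from whatever extract_content managed to retrieve. The sketch below reproduces just that string-assembly step with invented sample data (the URLs and summaries are placeholders, not real extraction output):

```python
# Hypothetical extraction results, shaped like extract_content's return value.
sample_results = {
    "https://example.com/ai": {"status": "success", "summary": "An overview of AI."},
    "https://example.com/down": {"status": "error", "error": "timeout"},
}

# Only successful extractions contribute context, mirroring the
# filtering done in analyze_with_ai's fallback branch.
content_summary = "\n".join(
    f"URL: {url}\nContent: {result.get('summary', 'No content')}\n"
    for url, result in sample_results.items()
    if result.get("status") == "success"
)

fallback_query = f"Based on this content, summarize the key points:\n\n{content_summary}"
print(fallback_query)
```

Failed URLs are silently dropped from the prompt, so the model only reasons over content that was actually retrieved; callers can inspect the results dict separately to report errors.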
def run_async_safely(coro):
    """Run async function safely in any environment"""
    try:
        loop = asyncio.get_running_loop()
        import nest_asyncio
        nest_asyncio.apply()
        return asyncio.run(coro)
    except RuntimeError:
        return asyncio.run(coro)
    except ImportError:
        print("⚠️ Running in sync mode. Install nest_asyncio for better performance.")
        return None
The run_async_safely function ensures that asynchronous functions execute reliably across different Python environments, such as standard scripts and interactive notebooks. It attempts to patch the already-running event loop with nest_asyncio; if that package is unavailable, it handles the situation gracefully, informing the user and falling back to synchronous execution.
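In a plain script there is no running event loop, so get_running_loop raises RuntimeError and the helper falls through to asyncio.run. The following standalone sketch demonstrates that path with a trivial, illustrative coroutine (fetch_answer is invented for the demo):

```python
import asyncio


def run_async_safely(coro):
    """Simplified copy of the tutorial's helper, for demonstration."""
    try:
        asyncio.get_running_loop()          # raises RuntimeError if no loop
        import nest_asyncio                 # needed only inside a live loop
        nest_asyncio.apply()
        return asyncio.run(coro)
    except RuntimeError:
        # No loop running (ordinary script): safe to start one.
        return asyncio.run(coro)
    except ImportError:
        print("⚠️ Running in sync mode. Install nest_asyncio for better performance.")
        return None


async def fetch_answer():
    await asyncio.sleep(0)  # stand-in for real async work
    return 42


result = run_async_safely(fetch_answer())
print(result)  # 42 when run as a plain script
```

In a Jupyter notebook the first branch is taken instead, and nest_asyncio allows asyncio.run to nest inside the notebook's existing loop.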
def main():
    """Interactive Web Intelligence Demo"""
    console = Console()
    console.print(Panel("🚀 Web Intelligence Agent", style="bold cyan", subtitle="Powered by Tavily & Gemini"))
    config = WebIntelligence()
    agent = SmartWebAgent(config)
    demo_urls = [
        "
        "
        "
    ]
    while True:
        console.print("\n" + "=" * 60)
        console.print("🎯 Choose an option:", style="bold yellow")
        console.print("1. Extract content from URLs")
        console.print("2. AI-powered analysis")
        console.print("3. Demo with sample URLs")
        console.print("4. Exit")
        choice = input("\nEnter choice (1-4): ").strip()
        if choice == "1":
            urls_input = input("Enter URLs (comma-separated): ")
            urls = [url.strip() for url in urls_input.split(",")]
            results = agent.extract_content(urls)
            agent.display_results(results)
        elif choice == "2":
            query = input("Enter your analysis query: ")
            urls_input = input("Enter URLs to analyze (optional, comma-separated): ")
            urls = [url.strip() for url in urls_input.split(",") if url.strip()] if urls_input.strip() else None
            try:
                response = agent.analyze_with_ai(query, urls)
                console.print(Panel(Markdown(response), title="🤖 AI Analysis", border_style="blue"))
            except Exception as e:
                console.print(f"❌ Analysis failed: {e}", style="bold red")
        elif choice == "3":
            console.print("🎬 Running demo with AI & Quantum Computing URLs...")
            results = agent.extract_content(demo_urls)
            agent.display_results(results)
            response = agent.analyze_with_ai(
                "Compare AI, ML, and Quantum Computing. What are the key relationships?",
                demo_urls
            )
            console.print(Panel(Markdown(response), title="🧠 Comparative Analysis", border_style="magenta"))
        elif choice == "4":
            console.print("👋 Goodbye!", style="bold green")
            break
        else:
            console.print("❌ Invalid choice!", style="bold red")


if __name__ == "__main__":
    main()
The main() function provides an interactive command-line demo of the Smart Web Intelligence Agent. It presents users with an intuitive menu that lets them extract web content from custom URLs, perform sophisticated AI-driven analysis on chosen topics, or run a predefined demo involving AI, machine learning, and quantum computing. Rich visual formatting enhances engagement, making complex web analysis tasks straightforward and user-friendly.
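The menu loop above routes choices through an if/elif chain; the same routing can be expressed with a dispatch dictionary, which stays compact as options grow. The handlers below are illustrative stubs standing in for the agent calls, not part of the tutorial's code:

```python
# Stub handlers standing in for agent.extract_content / analyze_with_ai calls.
def extract_option():
    return "extract"

def analyze_option():
    return "analyze"

def demo_option():
    return "demo"

# Map menu choices to handlers instead of an if/elif chain.
MENU = {
    "1": extract_option,
    "2": analyze_option,
    "3": demo_option,
}

def handle(choice: str) -> str:
    """Look up and invoke the handler; unknown choices fall through."""
    handler = MENU.get(choice)
    return handler() if handler else "invalid"

print(handle("2"))  # analyze
print(handle("9"))  # invalid
```

A dispatch table also makes each option independently testable, since handlers become plain functions rather than branches inside the loop.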
In conclusion, by following this comprehensive tutorial, we have built an enhanced Tavily-powered web intelligence agent that uses Google’s Gemini AI for sophisticated web content extraction and intelligent analysis. Through structured data extraction, dynamic AI querying, and visually appealing result presentation, this powerful agent streamlines research tasks, enriches data analysis workflows, and fosters deeper insights from web content. With this foundation, we are now equipped to extend the agent further, customize it for specific use cases, and harness the combined power of AI and web intelligence to enhance productivity and decision-making across projects.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.