
How to Build a Multi-Round Deep Research Agent Using Gemini, the DuckDuckGo API, and Automatic Reporting

In this tutorial, we design a modular deep research system that runs directly on Google Colab. We configure Gemini as the core reasoning engine, integrate DuckDuckGo's Instant Answer API for lightweight web searches, and coordinate multiple rounds of queries with deduplication and rate-limiting delays. We emphasize efficiency by limiting API calls, parsing concise snippets, and using structured prompts to extract key points, themes, and insights. From source collection to JSON-based analytics, each component lets us experiment quickly and adapt the workflow to deeper or broader research queries. Check out the full code here.

import os
import json
import time
import requests
from typing import List, Dict, Any
from dataclasses import dataclass
import google.generativeai as genai
from urllib.parse import quote_plus
import re

We first import the core Python libraries that handle system operations, JSON processing, web requests, and data structures. We also bring in Google's Generative AI SDK along with utilities such as URL encoding so that our research system runs smoothly.
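To illustrate why `quote_plus` matters here, the standalone sketch below (not part of the notebook itself) shows how a multi-word query is encoded into the DuckDuckGo Instant Answer API URL that `search_web` requests:

```python
from urllib.parse import quote_plus

# Encode a multi-word query so it is safe to embed in a URL
query = "deep research agents"
encoded_query = quote_plus(query)  # spaces become '+'

# The Instant Answer API endpoint used by search_web
url = f"https://api.duckduckgo.com/?q={encoded_query}&format=json&no_html=1"
print(url)
```

The `format=json` parameter asks the API for a JSON response, and `no_html=1` strips HTML from the returned snippets.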

@dataclass
class ResearchConfig:
   gemini_api_key: str
   max_sources: int = 10
   max_content_length: int = 5000
   search_delay: float = 1.0


class DeepResearchSystem:
   def __init__(self, config: ResearchConfig):
       self.config = config
       genai.configure(api_key=config.gemini_api_key)
       self.model = genai.GenerativeModel('gemini-1.5-flash')


   def search_web(self, query: str, num_results: int = 5) -> List[Dict[str, str]]:
       """Search web using DuckDuckGo Instant Answer API"""
       try:
           encoded_query = quote_plus(query)
           url = f"https://api.duckduckgo.com/?q={encoded_query}&format=json&no_html=1"


           response = requests.get(url, timeout=10)
           data = response.json()


           results = []


           if 'RelatedTopics' in data:
               for topic in data['RelatedTopics'][:num_results]:
                   if isinstance(topic, dict) and 'Text' in topic:
                       results.append({
                           'title': topic.get('Text', '')[:100] + '...',
                           'url': topic.get('FirstURL', ''),
                           'snippet': topic.get('Text', '')
                       })


           if not results:
               results = [{
                   'title': f"Research on: {query}",
                   'url': f"https://duckduckgo.com/?q={encoded_query}",
                   'snippet': f"General information and research about {query}"
               }]


           return results


       except Exception as e:
           print(f"Search error: {e}")
           return [{'title': f"Research: {query}", 'url': '', 'snippet': f"Topic: {query}"}]


   def extract_key_points(self, content: str) -> List[str]:
       """Extract key points using Gemini"""
       prompt = f"""
       Extract 5-7 key points from this content. Be concise and factual:


       {content[:2000]}


       Return as numbered list:
       """


       try:
           response = self.model.generate_content(prompt)
           return [line.strip() for line in response.text.split('\n') if line.strip()]
       except Exception:
           return ["Key information extracted from source"]


   def analyze_sources(self, sources: List[Dict[str, str]], query: str) -> Dict[str, Any]:
       """Analyze sources for relevance and extract insights"""
       analysis = {
           'total_sources': len(sources),
           'key_themes': [],
           'insights': [],
           'confidence_score': 0.7
       }


       all_content = " ".join([s.get('snippet', '') for s in sources])


       if len(all_content) > 100:
           prompt = f"""
           Analyze this research content for the query: "{query}"


           Content: {all_content[:1500]}


           Provide:
           1. 3-4 key themes (one line each)
           2. 3-4 main insights (one line each)
           3. Overall confidence (0.1-1.0)


           Format as JSON with keys: themes, insights, confidence
           """


           try:
               response = self.model.generate_content(prompt)
               # Pull the JSON object out of the model's reply, tolerating extra prose
               match = re.search(r'\{.*\}', response.text, re.DOTALL)
               if match:
                   parsed = json.loads(match.group())
                   analysis['key_themes'] = parsed.get('themes', analysis['key_themes'])
                   analysis['insights'] = parsed.get('insights', analysis['insights'])
                   analysis['confidence_score'] = float(parsed.get('confidence', 0.7))
           except Exception:
               pass


       return analysis


   def generate_comprehensive_report(self, query: str, sources: List[Dict[str, str]],
                                   analysis: Dict[str, Any]) -> str:
       """Generate final research report"""


       sources_text = "\n".join([f"- {s['title']}: {s['snippet'][:200]}"
                                for s in sources[:5]])


       prompt = f"""
       Create a comprehensive research report on: "{query}"


       Based on these sources:
       {sources_text}


       Analysis summary:
       - Total sources: {analysis['total_sources']}
       - Confidence: {analysis['confidence_score']}


       Structure the report with:
       1. Executive Summary (2-3 sentences)
       2. Key Findings (3-5 bullet points)
       3. Detailed Analysis (2-3 paragraphs)
       4. Conclusions & Implications (1-2 paragraphs)
       5. Research Limitations


       Be factual, well-structured, and insightful.
       """


       try:
           response = self.model.generate_content(prompt)
           return response.text
       except Exception as e:
           return f"""
# Research Report: {query}


## Executive Summary
Research conducted on "{query}" using {analysis['total_sources']} sources.


## Key Findings
- Multiple perspectives analyzed
- Comprehensive information gathered
- Research completed successfully


## Analysis
The research process involved systematic collection and analysis of information related to {query}. Various sources were consulted to provide a balanced perspective.


## Conclusions
The research provides a foundation for understanding {query} based on available information.


## Research Limitations
Limited by API constraints and source availability.
           """


   def conduct_research(self, query: str, depth: str = "standard") -> Dict[str, Any]:
       """Main research orchestration method"""
       print(f"🔍 Starting research on: {query}")


       search_rounds = {"basic": 1, "standard": 2, "deep": 3}.get(depth, 2)
       sources_per_round = {"basic": 3, "standard": 5, "deep": 7}.get(depth, 5)


       all_sources = []


       search_queries = [query]


       if depth in ["standard", "deep"]:
           try:
               related_prompt = f"Generate 2 related search queries for: {query}. One line each."
               response = self.model.generate_content(related_prompt)
               additional_queries = [q.strip() for q in response.text.split('\n') if q.strip()][:2]
               search_queries.extend(additional_queries)
           except Exception:
               pass


       for i, search_query in enumerate(search_queries[:search_rounds]):
           print(f"🔎 Search round {i+1}: {search_query}")
           sources = self.search_web(search_query, sources_per_round)
           all_sources.extend(sources)
           time.sleep(self.config.search_delay)


       unique_sources = []
       seen_urls = set()
       for source in all_sources:
           if source['url'] not in seen_urls:
               unique_sources.append(source)
               seen_urls.add(source['url'])


       print(f"📊 Analyzing {len(unique_sources)} unique sources...")


       analysis = self.analyze_sources(unique_sources[:self.config.max_sources], query)


       print("📝 Generating comprehensive report...")


       report = self.generate_comprehensive_report(query, unique_sources, analysis)


       return {
           'query': query,
           'sources_found': len(unique_sources),
           'analysis': analysis,
           'report': report,
           'sources': unique_sources[:10]
       }

We define a ResearchConfig dataclass to manage parameters such as the API key, source limits, and delays, and then build the DeepResearchSystem class that integrates Gemini with DuckDuckGo search. We implement methods for web search, key-point extraction, source analysis, and report generation, allowing us to coordinate multiple rounds of research and produce structured insights in a streamlined workflow.
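To make the coordination logic concrete, here is a standalone sketch of the URL-based deduplication that `conduct_research` applies after the search rounds (the function name `dedupe_sources` is ours, for illustration only):

```python
from typing import Dict, List

def dedupe_sources(sources: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the first source seen for each URL, preserving order."""
    seen_urls = set()
    unique_sources = []
    for source in sources:
        if source['url'] not in seen_urls:
            unique_sources.append(source)
            seen_urls.add(source['url'])
    return unique_sources

sources = [
    {'title': 'A', 'url': 'https://example.com/a', 'snippet': '...'},
    {'title': 'A again', 'url': 'https://example.com/a', 'snippet': '...'},
    {'title': 'B', 'url': 'https://example.com/b', 'snippet': '...'},
]
unique = dedupe_sources(sources)  # duplicates of example.com/a collapse to one
```

Preserving order matters here because earlier search rounds use the original query, so their results tend to be the most relevant.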

def setup_research_system(api_key: str) -> DeepResearchSystem:
   """Quick setup for Google Colab"""
   config = ResearchConfig(
       gemini_api_key=api_key,
       max_sources=15,
       max_content_length=6000,
       search_delay=0.5
   )
   return DeepResearchSystem(config)

We create a setup_research_system function that simplifies initialization on Google Colab by wrapping our settings in ResearchConfig and returning a ready-to-use DeepResearchSystem instance with custom limits and delays.

if __name__ == "__main__":
   API_KEY = "Use Your Own API Key Here"


   researcher = setup_research_system(API_KEY)


   query = "Deep Research Agent Architecture"
   results = researcher.conduct_research(query, depth="standard")


   print("="*50)
   print("RESEARCH RESULTS")
   print("="*50)
   print(f"Query: {results['query']}")
   print(f"Sources found: {results['sources_found']}")
   print(f"Confidence: {results['analysis']['confidence_score']}")
   print("\n" + "="*50)
   print("COMPREHENSIVE REPORT")
   print("="*50)
   print(results['report'])


   print("\n" + "="*50)
   print("SOURCES CONSULTED")
   print("="*50)
   for i, source in enumerate(results['sources'][:5], 1):
       print(f"{i}. {source['title']}")
       print(f"   URL: {source['url']}")
       print(f"   Preview: {source['snippet'][:150]}...")
       print()

We add a main execution block where we initialize the research system with our API key, run a query on "Deep Research Agent Architecture," and display the structured output. We print the research summary, the comprehensive report generated by Gemini, and a list of consulted sources with titles, URLs, and previews.

In short, we see how the entire pipeline turns unstructured snippets into well-structured reports. We combine search, language modeling, and analysis layers to simulate a complete research workflow inside Colab. By using Gemini for extraction, synthesis, and reporting, and DuckDuckGo for free search access, we create a reusable foundation for more advanced agentic research systems. This notebook provides a practical, technically detailed template that we can scale with other models, custom ranking, or domain-specific integrations while retaining a compact end-to-end architecture.
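As one possible extension (not part of the original notebook), the dictionary returned by `conduct_research` can be persisted for later analysis. The helper below is a minimal sketch, assuming we want the metadata as JSON and the report as a separate Markdown file; the function name and file paths are our own choices:

```python
import json
from pathlib import Path

def save_results(results: dict, json_path: str = "research_results.json",
                 report_path: str = "research_report.md") -> None:
    """Write the research metadata to JSON and the report text to Markdown."""
    metadata = {k: v for k, v in results.items() if k != 'report'}
    Path(json_path).write_text(json.dumps(metadata, indent=2))
    Path(report_path).write_text(results.get('report', ''))

# Hypothetical results dict in the shape conduct_research returns
results = {'query': 'Deep Research Agent Architecture', 'sources_found': 3,
           'analysis': {'confidence_score': 0.7}, 'report': '# Report\n...',
           'sources': []}
save_results(results)
```

Splitting the report out of the JSON keeps the analytics file compact and lets us open the report directly in any Markdown viewer.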




Asif Razzaq is the CEO of Marktechpost Media Inc. A visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent venture is Marktechpost, an artificial intelligence media platform known for in-depth coverage of machine learning and deep learning news that is technically sound yet accessible to a wide audience. The platform draws over 2 million monthly views, reflecting its popularity among readers.
