A Coding Implementation of an Agentic AI Framework That Performs Literature Analysis, Hypothesis Generation, Experiment Planning, Simulation, and Scientific Reporting

In this tutorial, we walk through building a complete scientific discovery agent and see how each component works together to form a coherent research workflow. We first load the literature corpus, build the retrieval and LLM modules, and then assemble agents for searching papers, generating hypotheses, designing experiments, and producing structured reports. Through the snippets below, we see how the agent pipeline takes shape, letting us carry a scientific question from initial curiosity to a full analysis in a single integrated system.

import sys, subprocess


def install_deps():
   pkgs = ["transformers", "scikit-learn", "numpy"]
   subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + pkgs)


try:
   from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
   from sklearn.feature_extraction.text import TfidfVectorizer
   from sklearn.metrics.pairwise import cosine_similarity
   import numpy as np
except ImportError:
   install_deps()
   from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
   from sklearn.feature_extraction.text import TfidfVectorizer
   from sklearn.metrics.pairwise import cosine_similarity
   import numpy as np


from dataclasses import dataclass
from typing import List, Dict, Any


np.random.seed(42)


LITERATURE = [
   {"id": "P1","title": "Self-Supervised Protein Language Models for Structure Prediction","field": "computational biology",
    "abstract": "We explore transformer-based protein language models trained on millions of sequences. The models learn residue-level embeddings that improve secondary structure prediction and stability estimation."},
   {"id": "P2","title": "CRISPR Off-Target Detection Using Deep Learning","field": "genome editing",
    "abstract": "We propose a convolutional neural network architecture for predicting CRISPR-Cas9 off-target effects directly from genomic sequences, achieving state-of-the-art accuracy on GUIDE-seq datasets."},
   {"id": "P3","title": "Foundation Models for Scientific Equation Discovery","field": "scientific ML",
    "abstract": "Large language models are combined with symbolic regression to recover governing equations from noisy experimental observations in physics and fluid dynamics."},
   {"id": "P4","title": "Active Learning for Materials Property Optimization","field": "materials science",
    "abstract": "We integrate Bayesian optimization with graph neural networks to actively select candidate materials that maximize target properties while reducing experimental cost."},
   {"id": "P5","title": "Graph-Based Retrieval for Cross-Domain Literature Review","field": "NLP for science",
    "abstract": "We construct a heterogeneous citation and concept graph over multi-domain scientific papers and show that graph-aware retrieval improves cross-domain literature exploration."},
]


corpus_texts = [p["abstract"] + " " + p["title"] for p in LITERATURE]
vectorizer = TfidfVectorizer(stop_words="english")
corpus_matrix = vectorizer.fit_transform(corpus_texts)


MODEL_NAME = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)


def generate_text(prompt: str, max_new_tokens: int = 256) -> str:
   inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
   outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4, early_stopping=True)
   return tokenizer.decode(outputs[0], skip_special_tokens=True)

We lay the foundation for our scientific agent by loading libraries, preparing a small literature corpus, and initializing the language model. We fit a TF-IDF vectorizer on each paper's abstract and title so that relevant papers can be retrieved later. With the model loaded and the data structured, we have the computational backbone for everything that follows.
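To make the TF-IDF step less of a black box, here is a minimal pure-Python sketch of the weighting it performs. It assumes the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, which matches scikit-learn's documented defaults, but it uses a simplified tokenizer and skips stop-word removal, so its numbers will not match `TfidfVectorizer` exactly. The names `tokenize` and `tfidf_vectors` and the three sample sentences are illustrative only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer: lowercase words of two or more characters.
    # Unlike the tutorial's vectorizer, no stop words are removed here.
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf_vectors(docs):
    """Return (idf table, list of L2-normalized TF-IDF dicts, one per document)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: rarer terms receive larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = {t: counts[t] * idf[t] for t in counts}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors.append({t: w / norm for t, w in raw.items()})
    return idf, vectors


# Illustrative sentences, not the tutorial's corpus.
docs = [
    "protein language models improve structure prediction",
    "deep learning predicts CRISPR off-target effects",
    "language models recover governing equations",
]
idf, vectors = tfidf_vectors(docs)
print(round(idf["protein"], 3), round(idf["language"], 3))
```

A term that appears in only one document ("protein") ends up with a higher IDF than one shared by two documents ("language"), which is what lets distinctive vocabulary dominate the similarity scores during retrieval.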

@dataclass
class PaperHit:
   paper: Dict[str, Any]
   score: float


class LiteratureAgent:
   def __init__(self, vectorizer, corpus_matrix, papers: List[Dict[str, Any]]):
       self.vectorizer = vectorizer
       self.corpus_matrix = corpus_matrix
       self.papers = papers


   def search(self, query: str, k: int = 3) -> List[PaperHit]:
       q_vec = self.vectorizer.transform([query])
       sims = cosine_similarity(q_vec, self.corpus_matrix)[0]
       idxs = np.argsort(-sims)[:k]
       hits = [PaperHit(self.papers[i], float(sims[i])) for i in idxs]
       return hits

We implement the literature search component of the agent. We project the user's query into the same TF-IDF space as the corpus and use cosine similarity to rank the papers, returning the top k as PaperHit objects. This lets the system ground its reasoning in the closest-matching prior work.
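One detail worth noting: taking the top k indices always returns k papers, even when some of them share no vocabulary with the query and score 0. The sketch below shows one possible way to drop such hits with a threshold. It is a standalone illustration on toy vectors, not a change to the tutorial's `LiteratureAgent`; `ScoredDoc`, `cosine`, `top_k`, and the `min_score` parameter are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoredDoc:
    doc_id: str
    score: float


def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity; defined as 0.0 when either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: List[float], doc_vecs: Dict[str, List[float]],
          k: int = 3, min_score: float = 0.0) -> List[ScoredDoc]:
    # Rank every document, keep the best k, then drop hits at or below min_score.
    scored = [ScoredDoc(d, cosine(query_vec, v)) for d, v in doc_vecs.items()]
    scored.sort(key=lambda s: s.score, reverse=True)
    return [s for s in scored[:k] if s.score > min_score]


# Toy 3-dimensional vectors standing in for TF-IDF rows.
doc_vecs = {"P1": [2.0, 0.0, 1.0], "P2": [0.0, 1.0, 0.0], "P3": [1.0, 1.0, 0.0]}
hits = top_k([1.0, 0.0, 0.0], doc_vecs, k=3)
print([(h.doc_id, round(h.score, 3)) for h in hits])
```

Here P2 is orthogonal to the query and is filtered out even though k=3 would otherwise include it. Whether such a filter is desirable depends on how the downstream prompt should behave when nothing relevant is found.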

@dataclass
class ExperimentPlan:
   system: str
   hypothesis: str
   variables: Dict[str, Any]
   protocol: List[str]


@dataclass
class ExperimentResult:
   plan: ExperimentPlan
   metrics: Dict[str, float]


class ExperimentAgent:
   def design_experiment(self, question: str, hypothesis: str, hits: List[PaperHit]) -> ExperimentPlan:
       top_field = hits[0].paper["field"] if hits else "computational science"
       protocol = [
           f"Construct dataset combining ideas from: {', '.join(h.paper['id'] for h in hits)}.",
           "Split data into train/validation/test.",
           "Compare baseline model vs. augmented model implementing the hypothesis.",
           "Evaluate using appropriate metrics and perform ablation analysis.",
       ]
       variables = {
           "baseline_model": "sequence CNN",
           "augmented_model": "protein language model + CNN",
           "n_train_samples": 5000,
           "n_validation_samples": 1000,
           "metric": "AUROC",
       }
       system = f"{top_field} system related to: {question}"
       return ExperimentPlan(system=system, hypothesis=hypothesis, variables=variables, protocol=protocol)


   def run_experiment(self, plan: ExperimentPlan) -> ExperimentResult:
       base = 0.78 + 0.02 * np.random.randn()
       gain = abs(0.05 + 0.01 * np.random.randn())
       metrics = {
           "baseline_AUROC": round(base, 3),
           "augmented_AUROC": round(base + gain, 3),
           "estimated_gain": round(gain, 3),
       }
       return ExperimentResult(plan=plan, metrics=metrics)

We design and simulate experiments based on the retrieved literature and the generated hypothesis. The design step fills in a fixed set of variables and a four-step protocol that cites the IDs of the retrieved papers, while run_experiment draws baseline and augmented AUROC values from random noise rather than training any model. This stand-in lets us exercise the full path from hypothesis to experimental plan to metrics.
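Because the simulated metrics depend on global NumPy random state, their values change with the order of calls. A possible alternative, sketched below under the assumption that per-run reproducibility is wanted, is to give each simulated run its own seeded generator. The function `simulate_auroc` is hypothetical, it uses Python's `random` module instead of NumPy, and its draws will therefore not match the tutorial's numbers.

```python
import random
from typing import Dict


def simulate_auroc(seed: int) -> Dict[str, float]:
    # A local generator keeps this run independent of global random state.
    rng = random.Random(seed)
    # Same shape as the tutorial's simulation: a noisy baseline around 0.78
    # plus a non-negative gain centered on 0.05.
    base = 0.78 + 0.02 * rng.gauss(0.0, 1.0)
    gain = abs(0.05 + 0.01 * rng.gauss(0.0, 1.0))
    augmented = min(base + gain, 1.0)  # AUROC cannot exceed 1.0
    return {
        "baseline_AUROC": round(base, 3),
        "augmented_AUROC": round(augmented, 3),
        "estimated_gain": round(augmented - base, 3),
    }


run_a = simulate_auroc(seed=42)
run_b = simulate_auroc(seed=42)
print(run_a == run_b)
```

Calling the function twice with the same seed yields identical metrics, which makes a generated report easier to reproduce and compare across runs.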

class ReportAgent:
   def write_report(self, question: str, hits: List[PaperHit], plan: ExperimentPlan, result: ExperimentResult) -> str:
       # Join list items with real newlines so each entry lands on its own line.
       related_work = "\n".join(f"- {h.paper['title']} ({h.paper['field']})" for h in hits)
       protocol_str = "\n".join(f"- {step}" for step in plan.protocol)
       prompt = f"""
You are an AI research assistant writing a concise research-style report.


Research question:
{question}


Hypothesis:
{plan.hypothesis}


Relevant prior work:
{related_work}


Planned experiment:
System: {plan.system}
Variables: {plan.variables}
Protocol:
{protocol_str}


Simulated results:
{result.metrics}


Write a clear report with the following sections:
1. Background
2. Proposed Approach
3. Experimental Setup
4. Results and Discussion
5. Limitations and Future Work
"""
       return generate_text(prompt.strip(), max_new_tokens=320)

We use the LLM to generate a complete research-style report. We assemble the hypothesis, protocol, simulated results, and related work into a single prompt that asks for a structured document with clearly defined sections. This turns the raw outputs of the pipeline into a readable scientific summary.
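Since generate_text tokenizes with truncation=True, a prompt longer than the tokenizer's maximum length is cut from the end, and in write_report the section instructions sit at the end of the prompt. The limit is commonly 512 tokens for T5-family checkpoints, but that is an assumption worth checking via `tokenizer.model_max_length`. The sketch below shows one way to guard against losing the instructions: put them first and trim the context to a budget. `build_prompt` and `max_words` are hypothetical, and word count is only a crude proxy for token count.

```python
from typing import List


def build_prompt(instruction: str, question: str,
                 context_items: List[str], max_words: int = 300) -> str:
    # Instructions and question go first so truncation can only affect context.
    header = f"{instruction}\n\nResearch question:\n{question}\n\nContext:\n"
    budget = max_words - len(header.split())
    kept = []
    for item in context_items:
        n_words = len(item.split())
        if n_words > budget:
            break  # stop adding context once the word budget is exhausted
        kept.append(f"- {item}")
        budget -= n_words
    return header + "\n".join(kept)


prompt = build_prompt(
    "Write a report.",
    "Does X help Y?",
    ["alpha beta gamma", "one two three four"],
    max_words=16,
)
print(prompt)
```

With a budget of 16 words, the header uses 10, the first context item fits, and the second is dropped. In a real setup, counting tokens with the tokenizer itself would be more accurate than counting words.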

class ScientificAgent:
   def __init__(self):
       self.lit_agent = LiteratureAgent(vectorizer, corpus_matrix, LITERATURE)
       self.exp_agent = ExperimentAgent()
       self.report_agent = ReportAgent()


   def propose_hypothesis(self, question: str, hits: List[PaperHit]) -> str:
       context = " ".join(h.paper["abstract"] for h in hits)
       prompt = f"""
You are an AI scientist. Given a research question and related abstracts,
propose a single, testable hypothesis in 2-3 sentences.


Research question:
{question}


Related abstracts:
{context}
"""
       return generate_text(prompt.strip(), max_new_tokens=96)


   def run_pipeline(self, question: str) -> str:
       hits = self.lit_agent.search(question, k=3)
       hypothesis = self.propose_hypothesis(question, hits)
       plan = self.exp_agent.design_experiment(question, hypothesis, hits)
       result = self.exp_agent.run_experiment(plan)
       report = self.report_agent.write_report(question, hits, plan, result)
       return report


if __name__ == "__main__":
   research_question = (
       "How can protein language model embeddings improve CRISPR off-target "
       "prediction compared to sequence-only CNN baselines?"
   )
   agent = ScientificAgent()
   final_report = agent.run_pipeline(research_question)
   print(final_report)

We orchestrate the entire process: searching the literature, generating a hypothesis, designing the experiment, running the simulation, and writing the report. We then run the system on a sample research question and observe the whole workflow in action. This step combines all modules into a unified scientific agent.
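The run_pipeline method returns only the final report, so intermediate outputs are not visible to the caller. As a rough sketch of an alternative, the stages can be expressed as named functions whose outputs are collected in a trace for inspection. Everything here is illustrative: `PipelineTrace` and `run_stages` are hypothetical names, and the lambdas are stubs with placeholder values standing in for the real agents, not actual results.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Dict[str, Any]], Any]]


@dataclass
class PipelineTrace:
    outputs: Dict[str, Any] = field(default_factory=dict)


def run_stages(question: str, stages: List[Stage]) -> PipelineTrace:
    # Each stage reads everything produced so far and stores its own output by name.
    trace = PipelineTrace(outputs={"question": question})
    for name, stage in stages:
        trace.outputs[name] = stage(trace.outputs)
    return trace


# Stub stages with placeholder values; the real agents would be called here instead.
stages: List[Stage] = [
    ("hits", lambda s: ["P2", "P1"]),
    ("hypothesis", lambda s: f"Hypothesis about: {s['question']}"),
    ("plan", lambda s: {"papers": s["hits"], "hypothesis": s["hypothesis"]}),
    ("result", lambda s: {"baseline_AUROC": 0.78, "augmented_AUROC": 0.83}),
    ("report", lambda s: f"{s['hypothesis']} | metrics: {s['result']}"),
]
trace = run_stages("Do PLM embeddings help?", stages)
print(list(trace.outputs))
```

Keeping the intermediate outputs around makes it easier to check, for example, which papers were retrieved or what hypothesis was generated before reading the final report.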

In summary, we saw how a compact codebase grows into a functional AI co-researcher that can search, reason, simulate, and summarize. We now understand how each piece contributes to the overall pipeline and how the agent components build on one another when combined. From here, we can extend the agent with richer literature sources, more realistic models, and more sophisticated experiment logic to push our scientific exploration further with each iteration.




Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for the benefit of society. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easy for a broad audience to understand. The platform has more than 2 million monthly views, reflecting its popularity with readers.
