Stanford researchers introduce Biomni: a biomedical AI agent for automation across different tasks and data types

by admin · May 30, 2025

Biomedical research is a rapidly evolving field that seeks to improve human health by uncovering the mechanisms behind disease, identifying new therapeutic targets, and developing effective treatments. It spans disciplines including genetics, molecular biology, pharmacology, and clinical research, each of which requires specialized tools and deep expertise. The growing complexity of biomedical data, experiments, and literature presents both opportunities and challenges. Researchers must integrate findings from genomics, proteomics, and other data sources to generate hypotheses, design experiments, and interpret results. Managing this complexity effectively is crucial for accelerating scientific discovery and translating it into clinical applications.

A core challenge in biomedical research is the sheer volume of data, methods, and tools that researchers must navigate to produce meaningful results. They often face fragmented workflows and rely on numerous specialized tools that do not integrate well with one another. This creates bottlenecks when designing experiments, processing large datasets, or interpreting multimodal biomedical information. The limited availability of expert researchers relative to the growing body of scientific knowledge further exacerbates the problem. As a result, significant portions of biomedical data remain underutilized, and connections between findings in different subfields are often missed. Addressing these issues requires a new approach that scales expertise, handles data complexity, and supports integrated workflows across a variety of biomedical fields.

Existing biomedical research tools often focus on narrow tasks such as analysis of specific genes, protein structure prediction, or drug-target interaction studies. These tools require careful setup, domain-specific knowledge, and manual integration into a wider workflow. Although large language models (LLMs) show promise in tasks such as biomedical question answering, they often cannot interact directly with specialized tools or databases. Past efforts to build AI agents for biomedical tasks relied on predefined workflows or templates, which limited their flexibility. Researchers have therefore been looking for AI systems that can adapt to a variety of biomedical tasks, compose new workflows dynamically, and perform complex analyses.

Researchers from Stanford University, Genentech, the Arc Institute, the University of Washington, Princeton University, and the University of California, San Francisco, introduced Biomni, a general-purpose biomedical AI agent. Biomni combines a foundational biomedical environment, Biomni-E1, with an advanced task-execution architecture, Biomni-A1. Biomni-E1 is a unified biomedical action space comprising 150 specialized tools, 105 software packages, and 59 databases, assembled by mining tens of thousands of biomedical publications across 25 subfields. Biomni-A1 lets the system adapt to a variety of biomedical problems by dynamically selecting tools, creating plans, and carrying out tasks by generating and running code. This integration of reasoning, code-based execution, and resource selection allows Biomni to perform a wide range of tasks autonomously, including bioinformatics analysis, hypothesis generation, and protocol design. Unlike static function-calling models, Biomni’s architecture lets it flexibly interleave code execution, data queries, and tool calls, creating a seamless pipeline for complex biomedical workflows.
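To make the contrast with static function calling more concrete, here is a minimal sketch of code-based execution against a shared action space. It is an illustration only and not Biomni’s implementation: `ActionSpace`, `run_generated_code`, the `gc_content` tool, and the toy sequence database are all hypothetical names and data invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ActionSpace:
    """Hypothetical registry standing in for a unified action space."""
    tools: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    databases: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def namespace(self) -> Dict[str, Any]:
        # Expose tools as callables and databases as plain dicts, so that
        # generated code can mix calls, lookups, and ordinary Python logic.
        return {**self.tools, "databases": self.databases}


def run_generated_code(code: str, space: ActionSpace) -> Dict[str, Any]:
    """Execute agent-written code with the action space in scope.

    exec() keeps the sketch short; a real system would need sandboxing,
    which is not shown here.
    """
    env = space.namespace()
    exec(code, env)
    return env


def gc_content(seq: str) -> float:
    """Toy tool: fraction of G/C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


space = ActionSpace(
    tools={"gc_content": gc_content},
    databases={"sequences": {"geneA": "GGCC", "geneB": "ATAT", "geneC": "GCAT"}},
)

# Code an agent might emit: a database query, a loop, a tool call, and a
# conditional, interleaved in a single step.
generated = """
result = {}
for name, seq in databases["sequences"].items():
    gc = gc_content(seq)
    if gc >= 0.5:
        result[name] = gc
"""

env = run_generated_code(generated, space)
print(env["result"])
```

With a fixed function-call schema, each of those operations would typically be a separate model turn; here they run as one piece of ordinary Python.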

Biomni-A1 uses an LLM-based tool selection mechanism to identify relevant resources based on the user’s goal. It uses code as a common interface for composing complex workflows with procedural logic, including loops, parallelization, and conditional steps. An adaptive planning strategy lets Biomni iteratively refine its plan while performing a task, keeping its behavior context-aware and responsive. Biomni’s performance was evaluated on multiple benchmarks. On laboratory-based benchmarks, Biomni reached an accuracy of 74.4% on DbQA and 81.9% on SeqQA, roughly matching human experts on the former (74.7%) and exceeding them on the latter (78.8%). On the HLE benchmark, which covers 14 subfields, Biomni scored 17.3%. Its reported relative gains are 402.3% over the base LLM, 43.0% over the coding agent, and 20.4% over its own ablated variant. In real-world case studies, Biomni autonomously generated a 10-step pipeline to analyze 458 wearable sensor files, finding an average post-meal temperature rise of 2.19°C. It also analyzed sleep data spanning 227 nights and identified a mid-week peak in sleep efficiency as well as the importance of circadian regularity relative to total sleep duration.
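The two ideas at the start of this paragraph, goal-driven tool selection and workflows written as code, can be sketched as follows. This is a simplified illustration under stated assumptions: the tool catalogue, the toy tools, and the `workflow` function are hypothetical, and a keyword-overlap score stands in for the LLM-based selector, which the article does not describe in detail.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical tool catalogue: name -> short description.
CATALOGUE: Dict[str, str] = {
    "align_reads": "align sequencing reads to a reference genome",
    "call_variants": "call genetic variants from aligned reads",
    "dock_ligand": "dock a small molecule ligand to a protein structure",
    "plot_heatmap": "plot a heatmap of gene expression values",
}


def keyword_score(goal: str, description: str) -> int:
    """Stand-in for an LLM relevance judgment: count of shared words."""
    return len(set(goal.lower().split()) & set(description.lower().split()))


def select_tools(goal: str,
                 scorer: Callable[[str, str], int] = keyword_score,
                 top_k: int = 2) -> List[str]:
    """Return up to top_k tool names with a positive relevance score."""
    scored = [(scorer(goal, desc), name) for name, desc in CATALOGUE.items()]
    ranked = sorted((s for s in scored if s[0] > 0),
                    key=lambda s: (-s[0], s[1]))
    return [name for _, name in ranked[:top_k]]


def align_reads(sample: str) -> Dict[str, object]:
    """Toy tool: pretend to align a sample and report a mapping rate."""
    rate = 0.4 if sample.endswith("_lowq") else 0.95
    return {"sample": sample, "mapping_rate": rate}


def call_variants(aligned: Dict[str, object]) -> str:
    """Toy tool: pretend to call variants on an aligned sample."""
    return f"variants({aligned['sample']})"


def workflow(samples: List[str], min_rate: float = 0.8) -> List[str]:
    # Parallelization: align all samples concurrently (order is preserved).
    with ThreadPoolExecutor(max_workers=4) as pool:
        aligned = list(pool.map(align_reads, samples))
    results = []
    for item in aligned:                      # loop over intermediate results
        if item["mapping_rate"] >= min_rate:  # conditional quality gate
            results.append(call_variants(item))
    return results


selected = select_tools("call variants from sequencing reads")
print(selected)
print(workflow(["s1", "s2_lowq", "s3"]))
```

The `scorer` parameter is pluggable so that a model-based relevance function could replace the keyword heuristic without changing the rest of the sketch.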

Biomni’s ability to handle real-world research problems extends to complex multi-omics analysis, processing more than 336,000 single-nucleus RNA-seq and ATAC-seq profiles from human embryonic skeletal data. Biomni constructed a 10-stage analysis pipeline that predicts transcription factor-target gene links, filters the results using chromatin accessibility data, and summarizes the findings in a structured report. The agent handled all aspects of the analysis, including code generation, error debugging, and result interpretation, and produced outputs such as trajectory plots, heatmaps, and PCA biplots. These capabilities demonstrate Biomni’s ability to manage large-scale, multimodal datasets, identify biological patterns, and accelerate the path from raw data to testable hypotheses. By performing 6 to 24 distinct steps per task and integrating up to 4 specialized tools, 8 software packages, and 3 unique data lake items, Biomni mirrors the workflow of human scientists while greatly reducing manual effort.
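As a rough illustration of the filtering stage described above, the sketch below keeps only predicted transcription factor-target links whose target gene lies in accessible chromatin. The article does not give Biomni’s actual criteria, so the `Link` type, the score cutoff, the gene and factor names, and all values are hypothetical placeholders rather than results from the study.

```python
from typing import List, NamedTuple, Set


class Link(NamedTuple):
    tf: str        # transcription factor
    target: str    # candidate target gene
    score: float   # association score (hypothetical scale, 0 to 1)


def filter_links(links: List[Link],
                 accessible_targets: Set[str],
                 min_score: float = 0.5) -> List[Link]:
    """Keep links above the score cutoff whose target is accessible.

    `accessible_targets` stands in for genes with an open regulatory
    region in ATAC-seq data; how that set is derived is not shown.
    """
    kept = [
        link for link in links
        if link.score >= min_score and link.target in accessible_targets
    ]
    # Report the strongest surviving links first.
    return sorted(kept, key=lambda link: -link.score)


# Toy inputs, invented for illustration only.
predicted = [
    Link("TF1", "geneA", 0.9),
    Link("TF1", "geneB", 0.8),   # dropped: geneB is not accessible
    Link("TF2", "geneA", 0.7),
    Link("TF2", "geneC", 0.3),   # dropped: score below cutoff
]
accessible = {"geneA", "geneC"}

for link in filter_links(predicted, accessible):
    print(f"{link.tf} -> {link.target} (score {link.score})")
```

The point of the second data modality is visible in the `geneB` case: a link with a strong score is still discarded when the accessibility data does not support it.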

Key takeaways from the Biomni research include:

Biomni-E1 comprises 150 specialized tools, 105 software packages, and 59 databases, all integrated for biomedical research.
Biomni’s average relative performance gains are 402.3% over the base LLM, 43.0% over the coding agent, and 20.4% over its ablated variant, Biomni-ReAct.
Biomni autonomously executed a 10-step pipeline that analyzed 458 wearable sensor files and found an average post-meal temperature rise of 2.19°C.
On laboratory benchmarks, Biomni reached an accuracy of 74.4% on DbQA and 81.9% on SeqQA, on par with or better than human experts.
Biomni processed 336,162 profiles from complex multi-omics datasets and generated interpretable outputs, including gene regulatory networks and motif enrichment analyses.
Task execution involves 6 to 24 steps, using up to 4 tools, 8 software packages, and 3 data lake items.
Biomni’s flexible architecture lets it automatically generate PCA plots, heatmaps, trajectory plots, and cluster maps, producing human-readable reports without manual intervention.
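The benchmark figures in the list above are easier to interpret with a little arithmetic. The short sketch below uses only numbers quoted in the article; the helper names `relative_gain` and `gain_to_multiplier` are introduced here for illustration.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def gain_to_multiplier(gain_percent: float) -> float:
    """Convert a relative gain in percent into a multiplicative factor."""
    return 1.0 + gain_percent / 100.0


# Accuracies (%) quoted in the article.
biomni = {"DbQA": 74.4, "SeqQA": 81.9}
human = {"DbQA": 74.7, "SeqQA": 78.8}

for bench in biomni:
    diff = biomni[bench] - human[bench]
    print(f"{bench}: {diff:+.1f} points vs. human experts")

# A 402.3% relative gain means roughly five times the baseline score.
print(f"multiplier: {gain_to_multiplier(402.3):.3f}x")
```

In absolute terms, Biomni sits 0.3 points below the human experts on DbQA and 3.1 points above them on SeqQA, which is why the comparison is best read as "on par or better" rather than a uniform win.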

In short, Biomni represents an important step in biomedical AI, integrating reasoning, code execution, and dynamic resource selection into a single system. The researchers show that it can generalize across tasks, perform complex workflows without manual templates, and produce results that rival or exceed human expertise in several areas. The system’s ability to process large datasets, compose complex pipelines, and generate human-readable reports suggests that it could significantly accelerate biomedical discovery, reduce the burden on researchers, and enable new insights.

Check out the paper and code. All credit for this research goes to the researchers on the project.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that offers in-depth coverage of machine learning and deep learning news that is both technically sound and easily understood by a wide audience. The platform has over 2 million views per month, demonstrating its popularity among its audience.