Meet Kosmos: The AI Scientist That Automates Data-Driven Discovery

Built by Edison Scientific, Kosmos is an autonomous discovery system that runs long research campaigns on a single objective. Given a dataset and an open-ended natural-language objective, it performs repeated cycles of data analysis, literature search, and hypothesis generation, then synthesizes the results into a fully cited scientific report. A typical run lasts up to 12 hours, involves approximately 200 agent rollouts, executes approximately 42,000 lines of code, and reads approximately 1,500 papers.

Architecture, World Model, and Agent Roles

A core design choice is a structured world model that serves as the system’s long-term memory. The world model is a database of entities, relationships, experimental results, and open questions that is updated after each task. Unlike a plain context window, it is structured and queryable, so information from early steps remains accessible tens of thousands of tokens later.
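To make the idea of a queryable memory concrete, here is a minimal sketch of such a store. It is not Kosmos's actual schema: the class names, fields, and the `findings_about` query are all assumptions chosen for illustration, and the example data is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One result written back by an agent, with a pointer to its evidence."""
    statement: str
    source: str                      # e.g. a notebook cell ID or a paper passage
    entities: tuple[str, ...] = ()


@dataclass
class WorldModel:
    """Hypothetical store for entities, relationships, results, and open questions."""
    entities: set[str] = field(default_factory=set)
    relations: set[tuple[str, str, str]] = field(default_factory=set)  # (subject, predicate, object)
    findings: list[Finding] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def add_finding(self, finding: Finding) -> None:
        """Record a finding and register the entities it mentions."""
        self.findings.append(finding)
        self.entities.update(finding.entities)

    def findings_about(self, entity: str) -> list[Finding]:
        """Look up findings by entity instead of rereading the full history."""
        return [f for f in self.findings if entity in f.entities]


# Placeholder data, purely illustrative.
wm = WorldModel()
wm.add_finding(Finding("gene A is upregulated in condition X",
                       "notebook-3/cell-12", ("gene A", "condition X")))
wm.open_questions.append("Does gene A act upstream of pathway P?")
print(wm.findings_about("gene A"))
```

The point of the structure is that a later task can ask a targeted question (everything known about one entity) rather than depending on what still fits in a context window.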

Kosmos uses two main agents: a data analysis agent and a literature search agent. In each cycle, the system proposes up to 10 specific tasks based on the research objective and the current world model. Examples include running a differential abundance analysis on a metabolomics dataset or searching for pathways that link a candidate gene to a disease phenotype. The agents either write code and run it in a notebook environment, or retrieve and read papers, and then write structured outputs and citations back to the world model.

This loop repeats for many cycles. At the end of the run, a separate synthesis component traverses the world model and emits a report in which each statement is linked to a Jupyter notebook cell or a specific passage in the original literature. This explicit provenance is important in scientific settings because it allows human collaborators to audit individual claims rather than treating the system as a black box.
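The control flow described above can be sketched as a simple loop. This is an illustrative outline only: `run_campaign`, `Task`, `Result`, and the stub agents are hypothetical names, a plain list stands in for the structured world model, and tasks run sequentially here for brevity rather than as parallel agent runs. The cap of 10 tasks per cycle is the only figure taken from the article.

```python
from dataclasses import dataclass
from typing import Callable

MAX_TASKS_PER_CYCLE = 10  # the article states up to 10 tasks per cycle


@dataclass
class Task:
    kind: str          # "data_analysis" or "literature_search"
    description: str


@dataclass
class Result:
    statement: str
    provenance: str    # notebook cell or paper passage backing the statement


def run_campaign(
    objective: str,
    propose_tasks: Callable[[str, list[Result]], list[Task]],
    agents: dict[str, Callable[[Task], Result]],
    n_cycles: int,
) -> list[str]:
    """Run propose -> execute -> write-back cycles, then emit a cited report."""
    world_model: list[Result] = []  # stand-in for the structured world model
    for _ in range(n_cycles):
        tasks = propose_tasks(objective, world_model)[:MAX_TASKS_PER_CYCLE]
        for task in tasks:
            world_model.append(agents[task.kind](task))
    # Synthesis step: every report line carries its provenance.
    return [f"{r.statement} [{r.provenance}]" for r in world_model]


# Stub planner and agents, purely illustrative.
def propose(objective: str, wm: list[Result]) -> list[Task]:
    return [Task("data_analysis", f"analysis {len(wm)}"),
            Task("literature_search", f"search {len(wm)}")]


stub_agents = {
    "data_analysis": lambda t: Result(f"result of {t.description}", "notebook/cell-1"),
    "literature_search": lambda t: Result(f"summary of {t.description}", "paper/section-2"),
}

for line in run_campaign("example objective", propose, stub_agents, n_cycles=2):
    print(line)
```

Because each `Result` carries its own provenance, the final report can be audited claim by claim, which is the property the article highlights.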

Measured accuracy and equivalent research time

The team assessed report quality by sampling 102 statements from 3 representative Kosmos reports and asking domain experts to classify each statement as supported or refuted. Overall, 79.4% of the statements were judged to be accurate. Data analysis statements were the most reliable at about 85.5%, literature statements were correct about 82.1% of the time, and synthesis statements, which combine multiple pieces of evidence, were correct about 57.9% of the time.
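For reference, 79.4% of 102 statements corresponds to 81 statements judged accurate. The per-category figures come from the same kind of tally, which can be sketched as follows. The function name and the example labels are hypothetical; the article does not give per-category counts, so none are reproduced here.

```python
from collections import defaultdict


def accuracy_by_type(labels):
    """Tally expert verdicts given as (statement_type, supported) pairs."""
    totals = defaultdict(int)
    supported = defaultdict(int)
    for statement_type, is_supported in labels:
        totals[statement_type] += 1
        supported[statement_type] += int(is_supported)
    if not totals:
        raise ValueError("no labelled statements provided")
    per_type = {t: supported[t] / totals[t] for t in totals}
    overall = sum(supported.values()) / sum(totals.values())
    return per_type, overall


# Hypothetical verdicts, not data from the evaluation.
example_labels = [
    ("data_analysis", True), ("data_analysis", True),
    ("literature", True), ("literature", False),
    ("synthesis", True), ("synthesis", False),
]
per_type, overall = accuracy_by_type(example_labels)
print(per_type, overall)
```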

To estimate equivalent human effort, the authors assumed that a typical data analysis trajectory would take a scientist 2 hours and that reading a paper would take 15 minutes, and then counted the trajectories and papers for each run. Assuming a 40-hour work week, a typical run corresponds to approximately 4.1 expert months. In a separate survey, seven collaborating scientists rated a 20-cycle Kosmos run as equivalent to roughly 6.14 months of their own work on the same objective, and this perceived effort scaled roughly linearly with the number of cycles, up to 20 cycles.
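The effort estimate is simple arithmetic, sketched below. The 2 hours per trajectory, 15 minutes per paper, and 40-hour week come from the article; the function name, the weeks-per-month conversion (52/12), and the trajectory count in the example are assumptions, since the article gives neither the conversion nor the number of data analysis trajectories per run.

```python
def expert_months(
    n_trajectories: int,
    n_papers: int,
    hours_per_trajectory: float = 2.0,   # from the article
    hours_per_paper: float = 0.25,       # 15 minutes, from the article
    hours_per_week: float = 40.0,        # from the article
    weeks_per_month: float = 52 / 12,    # assumption: average calendar month
) -> float:
    """Convert counts of analysis trajectories and papers read into expert months."""
    total_hours = n_trajectories * hours_per_trajectory + n_papers * hours_per_paper
    return total_hours / (hours_per_week * weeks_per_month)


# 1,500 papers is the article's figure; 100 trajectories is a hypothetical count.
print(round(expert_months(n_trajectories=100, n_papers=1500), 2))
```

Under these assumptions, reading alone accounts for 375 hours of a typical run, so the final figure is sensitive to both the per-paper time and the month conversion.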

Representative findings

Kosmos was tested on 7 case studies spanning metabolomics, materials science, neuroscience, statistical genetics, and neurodegenerative disease. In 3 cases, it independently reproduced previous human results without access to the original preprint at runtime. In 4 cases, it proposed mechanisms that the authors describe as novel contributions to the literature.

In the first discovery, Kosmos analyzed metabolomic data from hypothermia experiments in mice. It identified nucleotide metabolism as the major altered pathway in the hypothermic brain, with decreases in precursor bases and nucleosides and increases in monophosphate products. The system concluded that the nucleotide salvage pathway dominates over de novo synthesis during protective hypothermia, which matches an independent human analysis that was unpublished at the time of the run.

In the second discovery, Kosmos analyzed environmental logs from a perovskite solar cell manufacturing system. It recovered the human finding that absolute humidity during thermal annealing is a major determinant of device efficiency, and it identified a critical humidity threshold, described as a lethal filter, above which devices fail. This finding matches a materials science preprint that Kosmos could not access at runtime because of model training cutoffs and retrieval limitations.

In the third discovery, Kosmos analyzed neuron-level reconstructions across multiple species and fitted distributions of neurite length, degree, and synapse count. It concluded that the degree and synapse distributions are better modeled as lognormal than as scale-free, and it recovered power-law scaling between neurite length and synapse count in most datasets. These results are consistent with connectivity rules reported in an earlier neuroscience preprint.
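One common way to compare a lognormal model against a power-law (scale-free) model is to fit both by maximum likelihood and compare the resulting log-likelihoods. The sketch below does this with the standard library only. It is an illustration of the general technique, not the analysis Kosmos ran: the function names are hypothetical, the power law is a continuous Pareto with its lower cutoff fixed at the sample minimum, and the data is synthetic.

```python
import math
import random


def lognormal_loglik(xs):
    """Maximized log-likelihood of a lognormal fit (MLE for mu and sigma)."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    sigma = math.sqrt(var)
    return sum(
        -v - math.log(sigma) - 0.5 * math.log(2 * math.pi) - (v - mu) ** 2 / (2 * var)
        for v in logs
    )


def power_law_loglik(xs):
    """Maximized log-likelihood of a continuous Pareto fit with x_min = min(xs)."""
    x_min = min(xs)
    n = len(xs)
    alpha = n / sum(math.log(x / x_min) for x in xs)  # MLE of the shape parameter
    return (
        n * math.log(alpha)
        + n * alpha * math.log(x_min)
        - (alpha + 1) * sum(math.log(x) for x in xs)
    )


# Synthetic, strictly positive sample drawn from a lognormal distribution.
random.seed(0)
sample = [random.lognormvariate(0.0, 1.0) for _ in range(2000)]
ll_lognormal = lognormal_loglik(sample)
ll_power_law = power_law_loglik(sample)
print("lognormal preferred:", ll_lognormal > ll_power_law)
```

Both functions assume strictly positive, non-constant data. A careful analysis would also choose the power-law cutoff from the data and test whether the likelihood difference is statistically significant, rather than comparing raw values.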

The remaining four findings were described as novel. They include a Mendelian randomization analysis implicating circulating superoxide dismutase 2 as a protective factor against myocardial fibrosis; a mechanistic ranking score that integrates posterior inclusion probability and multi-omic evidence for type 2 diabetes loci; a proteomic analysis that orders molecular events along a pseudotime axis of Alzheimer’s disease; and a large-scale single-nucleus transcriptomic analysis linking age-related loss of flippase expression and exposure of phosphatidylserine signals to neuronal vulnerability in the entorhinal cortex.

Main points

  1. Kosmos is an autonomous AI scientist that runs for up to 12 hours per objective, executing approximately 42,000 lines of code and reading approximately 1,500 papers per run, all coordinated through a structured world model.
  2. The system uses parallel data analysis and literature search agents that share a central world model, which allows Kosmos to maintain coherent long-horizon reasoning across approximately 200 agent rollouts.
  3. Expert evaluators judged 79.4% of sampled report statements to be accurate. Data analysis and literature statements were each accurate more than 80% of the time, while synthesis statements were less reliable at about 57.9%.
  4. Collaborators rated a 20-cycle Kosmos run as equivalent to approximately 6 months of expert research work, with the number of valuable findings scaling approximately linearly with the number of cycles, up to 20 cycles.
  5. Across seven case studies in metabolomics, materials science, neuroscience, statistical genetics, and neurodegenerative disease, Kosmos reproduced unpublished or post-cutoff results and proposed novel mechanisms, while still requiring human scientists for dataset selection and validation.

Kosmos demonstrates what happens when structured world models and domain-agnostic Edison agents are pushed to the limits of current LLM tooling. It provides measurable gains in depth of reasoning, reproducibility, and traceability, but it still relies on scientists for data curation, goal setting, and interpretation of synthesis statements, which remain less reliable than data analysis and literature statements. Overall, Kosmos is a powerful template for using AI to accelerate science, not a replacement for human researchers.


Michal Sutter is a data science professional with a master’s degree in data science from the University of Padua. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex data sets into actionable insights.
