
Meet BioReason: The World's First Reasoning Model in Biology that Enables AI to Reason about Genomics like a Biology Expert

The main obstacle to using AI in genomics is the lack of interpretable, step-by-step reasoning over complex DNA data. Although DNA foundation models excel at learning rich sequence patterns for tasks such as variant prediction and gene regulation, they often operate as black boxes, offering limited insight into the underlying biological mechanisms. Meanwhile, large language models (LLMs) show impressive reasoning skills across many fields but are not designed to process raw genomic sequences. This gap between powerful DNA representations and deep biological reasoning keeps AI from reaching expert-level understanding and limits its potential to drive scientific discovery through meaningful, hypothesis-driven explanations.

DNA foundation models have made significant progress by learning rich representations directly from genomic sequences, showing strong performance across a range of biological tasks. Models like Evo2 have demonstrated impressive capabilities, but their lack of interpretability limits deeper biological insight. Meanwhile, large language models perform well at reasoning over biomedical text but usually do not interact directly with raw genomic data. Efforts such as GeneGPT and TxGemma represent early attempts to bridge this gap. Current genomic benchmarks evaluate task performance but fall short in assessing reasoning and hypothesis generation.

Researchers from the Vector Institute, University Health Network (UHN), Arc Institute, Cohere, the University of California, San Francisco, and Google DeepMind have introduced BioReason, a groundbreaking AI system that combines a DNA foundation model with an LLM. This integration lets BioReason analyze raw genomic sequences while applying LLM-based reasoning to generate clear, biologically grounded insights. Trained with supervised fine-tuning and reinforcement learning, it achieved performance gains of 15% or more over conventional models, with up to 97% accuracy on KEGG-based disease pathway prediction. The approach produces interpretable, step-by-step outputs that improve biological understanding and facilitate hypothesis generation.
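The article does not spell out the reinforcement-learning objective beyond naming Group Relative Policy Optimization (GRPO). As a rough illustration only, the sketch below computes GRPO-style group-relative advantages: several answers are sampled for the same prompt, each is scored, and each score is normalized against the group's mean and standard deviation. The reward function `correctness_reward` and the example answers are hypothetical, and the sketch omits the clipped policy-ratio loss and KL penalty of the full objective; it is not BioReason's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for one group of completions sampled from the same prompt.

    Each reward is centered on the group mean and scaled by the group's
    (population) standard deviation, so no learned value critic is needed.
    eps avoids division by zero when every completion gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def correctness_reward(answer: str, label: str) -> float:
    """Hypothetical reward: 1.0 if the answer mentions the true disease label.

    A crude case-insensitive substring check, for illustration only.
    """
    return 1.0 if label.lower() in answer.lower() else 0.0


if __name__ == "__main__":
    # Four hypothetical completions for one variant-interpretation prompt.
    answers = [
        "The variant disrupts actin dynamics, consistent with ALS.",
        "Most consistent with Parkinson's disease.",
        "ALS",
        "Likely Alzheimer's disease.",
    ]
    rewards = [correctness_reward(a, "ALS") for a in answers]
    print(rewards)  # [1.0, 0.0, 1.0, 0.0]
    print(group_relative_advantages(rewards))
```

Because advantages are centered within each group, answers that beat their siblings are pushed up and weaker ones are pushed down, without training a separate value network.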

The BioReason model is a multimodal framework designed to support deep, interpretable biological reasoning by combining genomic sequences with natural-language queries. It uses a DNA foundation model to extract rich contextual embeddings from raw DNA input and integrates them with tokenized text queries to form a unified input for the LLM, specifically Qwen3. The system is trained to generate step-by-step explanations of biological processes. A learnable projection layer maps the DNA embeddings into the LLM's embedding space, and the combined input is enriched with positional encoding. In addition, reinforcement learning with Group Relative Policy Optimization (GRPO) further improves its reasoning ability.
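A minimal sketch of the fusion step described above, in plain Python with toy dimensions. It assumes a single linear layer (`init_projection`, `project`) as the learnable projection and places the projected DNA embeddings before the text-token embeddings; both the ordering and all names are illustrative assumptions, not BioReason's code. Rather than adding a positional vector, it only assigns position indices over the combined sequence, on the assumption that the LLM applies its own positional scheme (Qwen3 uses rotary position embeddings) downstream.

```python
import random

Matrix = list[list[float]]


def init_projection(d_dna: int, d_llm: int, seed: int = 0) -> tuple[Matrix, list[float]]:
    """Randomly initialize a d_llm x d_dna weight matrix and a zero bias (the learnable layer)."""
    rng = random.Random(seed)
    scale = 1.0 / d_dna ** 0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(d_dna)] for _ in range(d_llm)]
    bias = [0.0] * d_llm
    return weight, bias


def project(dna_embeddings: Matrix, weight: Matrix, bias: list[float]) -> Matrix:
    """Apply y = W x + b to every DNA embedding, mapping it into the LLM's embedding space."""
    return [
        [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]
        for x in dna_embeddings
    ]


def build_llm_input(
    dna_embeddings: Matrix, text_embeddings: Matrix, weight: Matrix, bias: list[float]
) -> tuple[Matrix, list[int]]:
    """Concatenate projected DNA embeddings with text-token embeddings into one sequence.

    Returns the fused sequence plus position indices over the whole sequence;
    the LLM's own positional scheme (assumed to be applied downstream) consumes them.
    """
    projected = project(dna_embeddings, weight, bias)
    fused = projected + text_embeddings  # assumed order: DNA first, then text
    position_ids = list(range(len(fused)))
    return fused, position_ids


if __name__ == "__main__":
    d_dna, d_llm = 8, 4  # toy sizes; real encoders and LLMs use far larger dimensions
    rng = random.Random(1)
    # Stand-ins for encoder outputs (5 DNA positions) and embedded query tokens (3 tokens).
    dna = [[rng.gauss(0.0, 1.0) for _ in range(d_dna)] for _ in range(5)]
    text = [[rng.gauss(0.0, 1.0) for _ in range(d_llm)] for _ in range(3)]
    W, b = init_projection(d_dna, d_llm)
    fused, pos = build_llm_input(dna, text, W, b)
    print(len(fused), len(fused[0]), pos)  # 8 4 [0, 1, 2, 3, 4, 5, 6, 7]
```

The key design point the sketch illustrates is that only the projection needs to bridge the two models: once DNA embeddings share the LLM's width, the LLM can attend over DNA and text positions as a single sequence.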

The researchers evaluated BioReason on three datasets focused on DNA variant interpretation and biological reasoning. It outperformed both DNA-only and LLM-only models at predicting disease outcomes from genomic mutations. The best-performing version, which combines Evo2 with Qwen3-4B, achieved high accuracy and F1 scores across all tasks. One notable case study involved a PFN1 mutation associated with ALS: the model correctly predicted the disease and produced a 10-step explanation tracing the variant's effect on actin dynamics and motor neuron degeneration. This shows that BioReason's strength lies not only in accurate predictions but also in providing transparent, biologically grounded reasoning paths.
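The evaluation reports accuracy and F1. The sketch below shows how both can be computed for a multi-class disease-prediction task; it assumes macro-averaged F1 (the unweighted mean of per-class F1), which may differ from the averaging the researchers used. The labels in the usage example are toy values, not results from the paper.

```python
def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that exactly match the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # true c, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only; not results from the paper.
    y_true = ["ALS", "ALS", "Parkinson's", "Alzheimer's"]
    y_pred = ["ALS", "Parkinson's", "Parkinson's", "Alzheimer's"]
    print(accuracy(y_true, y_pred))  # 0.75
    print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Reporting F1 alongside accuracy matters when disease classes are imbalanced: a model can score high accuracy by favoring common classes while its per-class F1 on rare diseases stays low.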

In summary, BioReason combines a DNA encoder with a large language model to achieve detailed, interpretable reasoning over genomic data. Unlike traditional models, it not only makes accurate predictions but also uses step-by-step outputs to explain the biological logic behind them. This helps scientists better understand disease mechanisms and generate new research questions. Despite its strengths, BioReason faces challenges such as high computational cost and limited uncertainty quantification. Future work aims to address these issues by improving scalability, incorporating other biological data such as RNA and proteins, and applying the model to a wider range of tasks, including GWAS. Overall, BioReason shows promise for advancing precision medicine and genomic research.


Check out the paper, GitHub page, and project page. All credit for this research goes to the researchers of this project.


Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
