From EVO 1 to EVO 2: How NVIDIA redefines genomic research and AI-driven biological innovation

Imagine a world where we can predict the behavior of life simply by analyzing a sequence of letters. This is not science fiction or fantasy, but a real goal that scientists have been working toward for years. These sequences consist of four nucleotides (A, T, C and G) that encode the basic instructions for life on Earth, from the smallest microorganisms to the largest mammals. Decoding these sequences has the potential to unravel complex biological processes, transforming areas such as personalized medicine and environmental sustainability.
But despite this huge potential, decoding even the simplest microbial genome is a highly complex task. These genomes are composed of millions of DNA base pairs that regulate the interactions between DNA, RNA and proteins, the three key elements of the central dogma of molecular biology. This complexity exists at multiple levels, from a single molecule to an entire genome, creating a vast body of genetic information that has evolved over billions of years.
Traditional computing tools have struggled to cope with the complexity of biological sequences. However, with the rise of generative AI, it is now possible to scale to trillions of tokens and to capture complex relationships across long token sequences. Building on this advancement, researchers at the Arc Institute, Stanford University and NVIDIA have been working to build an AI system that understands biological sequences much as large language models understand human text. They have now made a breakthrough by creating a model that captures the multimodal nature and evolutionary complexity of the central dogma. From individual molecules to entire genomes, this innovation could enable the prediction and design of new biological sequences. In this article, we explore how the technology works, its potential applications, the challenges it faces, and the future of genome modeling.
EVO 1: A pioneering model for genome modeling
The work attracted attention in late 2024, when NVIDIA and its collaborators introduced EVO 1, a groundbreaking model for analyzing and generating biological sequences across DNA, RNA and proteins. The model was trained on 2.7 million prokaryotic and phage genomes, totaling 300 billion nucleotide tokens. It focuses on integrating the central dogma of molecular biology, modeling the flow of genetic information from DNA to RNA to proteins. Its StripedHyena architecture is a hybrid design that uses convolutional filters and gates to efficiently process long sequences of up to 131,072 tokens. This design allows EVO 1 to link small sequence changes to wider system-level and organism-level effects, thus bridging the gap between molecular biology and evolutionary genomics.
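To make the phrase "convolutional filters and gates" more concrete, the following toy sketch applies a causal convolution to one channel of a one-hot encoded DNA sequence and then multiplies the result by a gate. It is only an illustration of the general idea, not the actual EVO 1 layer: the function names, the filter weights and the gate values are all made up for this example, and a trained model would compute the gate from the input rather than hard-code it.

```python
# Toy illustration only: a gated causal convolution over one-hot DNA.
# This is NOT the actual EVO 1 layer; filter and gate weights are made up.

NUCLEOTIDES = "ACGT"


def one_hot(sequence):
    """Encode each nucleotide as a 4-element one-hot vector."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in sequence]


def causal_conv(signal, kernel):
    """y[t] = sum_k kernel[k] * signal[t - k], skipping positions before the start."""
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            if t - k >= 0:
                total += weight * signal[t - k]
        output.append(total)
    return output


def gated_conv(signal, kernel, gate):
    """Multiply the convolution output elementwise by a gate value per position."""
    return [g * y for g, y in zip(gate, causal_conv(signal, kernel))]


sequence = "ACGTGCA"
encoded = one_hot(sequence)

# One input channel: 1.0 where the base is G, 0.0 elsewhere.
g_channel = [row[NUCLEOTIDES.index("G")] for row in encoded]

# Hypothetical filter covering the current and the two previous positions.
kernel = [0.5, 0.3, 0.2]

# Hypothetical gate that suppresses position 3 and passes everything else.
gate = [1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]

result = gated_conv(g_channel, kernel, gate)
print(result)
```

The convolution mixes information from nearby positions, while the gate decides, position by position, how much of that mixed signal is passed on. Because the filter only looks backwards, each output depends solely on the current and earlier nucleotides.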
EVO 1 was a first step in the computational modeling of biological evolution. It successfully predicted molecular interactions and genetic variation by analyzing evolutionary patterns in genetic sequences. However, as scientists sought to apply it to more complex eukaryotic genomes, the model's limitations became apparent. EVO 1 struggles to maintain single-nucleotide resolution over long DNA sequences and is computationally expensive for larger genomes. These challenges created the need for a more advanced model that can integrate biological data across multiple scales.
EVO 2: A foundation model for genome modeling
Building on lessons learned from EVO 1, the researchers launched EVO 2 in February 2025, pushing the field of biological sequence modeling forward. Trained on a staggering 9.3 trillion DNA base pairs, the model has learned to understand and predict the functional consequences of genetic variation across all domains of life, including bacteria, archaea, plants, fungi and animals. EVO 2 has over 40 billion parameters and can handle unprecedented sequence lengths of up to 1 million base pairs, which previous models, including EVO 1, could not manage.
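A rough way to see why the longer context matters is to count how many separate windows a model would need to cover the same region. The sketch below assumes one token per base pair and a hypothetical region of 1 million base pairs; the helper `split_into_windows` is invented for this illustration and is not part of any EVO tooling.

```python
def split_into_windows(length, window):
    """Return (start, end) index pairs that together cover `length` positions."""
    return [(start, min(start + window, length)) for start in range(0, length, window)]


EVO1_CONTEXT = 131_072      # tokens, as stated in the text
EVO2_CONTEXT = 1_000_000    # base pairs, as stated in the text
REGION_LENGTH = 1_000_000   # hypothetical region of interest (assumption)

# Assumes one token per base pair, so token and base-pair counts are comparable.
evo1_windows = split_into_windows(REGION_LENGTH, EVO1_CONTEXT)
evo2_windows = split_into_windows(REGION_LENGTH, EVO2_CONTEXT)

print(f"windows at 131,072 tokens: {len(evo1_windows)}")
print(f"windows at 1,000,000 tokens: {len(evo2_windows)}")
```

With the shorter context the region has to be cut into several pieces, so relationships that span a window boundary are never seen together. With the longer context the whole region fits in a single pass.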
What distinguishes EVO 2 from its predecessor is that it models not only DNA sequences but also the interactions between DNA, RNA and proteins, that is, the entire central dogma of molecular biology. This allows EVO 2 to accurately predict the effects of genetic mutations, from the smallest nucleotide changes to larger structural changes, which was previously impossible.
A key feature of EVO 2 is its strong zero-shot prediction capability, which enables it to predict the functional effects of mutations without task-specific fine-tuning. For example, it accurately classifies clinically significant variants of BRCA1, a key gene in breast cancer research, by analyzing DNA sequence alone.
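One common way to score a mutation without fine-tuning is to ask a sequence model how likely the mutated sequence is compared with the reference sequence. The sketch below illustrates that idea with a deliberately tiny stand-in: a first-order Markov model fitted on a few made-up sequences. It does not call EVO 2, the sequences are not real genomic data, and whether EVO 2 is applied in exactly this way is an assumption; only the general "likelihood difference" principle is being shown.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def fit_markov(corpus):
    """Count nucleotide transitions (previous base -> next base) in the corpus."""
    counts = Counter()
    for seq in corpus:
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1
    return counts


def log_likelihood(seq, counts):
    """Sum of log P(next | previous), using add-one smoothing."""
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        numerator = counts[(prev, nxt)] + 1
        denominator = sum(counts[(prev, b)] for b in ALPHABET) + len(ALPHABET)
        total += math.log(numerator / denominator)
    return total


def variant_score(reference, position, alt_base, counts):
    """Log-likelihood of the variant minus that of the reference.

    A negative score means the stand-in model finds the variant less likely.
    """
    variant = reference[:position] + alt_base + reference[position + 1:]
    return log_likelihood(variant, counts) - log_likelihood(reference, counts)


# Made-up training sequences and reference; a stand-in for a real genome model.
corpus = ["ATGATGATGATG", "ATGCATGATG", "GATGATGA"]
counts = fit_markov(corpus)
reference = "ATGATGATG"

# Substitute the T at index 4 with a C and compare likelihoods.
score = variant_score(reference, position=4, alt_base="C", counts=counts)
print(f"delta log-likelihood: {score:.3f}")
```

No labeled examples of harmful or harmless variants are used anywhere in this procedure, which is what makes it "zero-shot": the score comes entirely from what the model learned about typical sequences. A real genome model would replace the Markov counts with its own per-nucleotide probabilities.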
Potential applications in biomolecular science
The capabilities of EVO 2 open up new frontiers in genomics, molecular biology and biotechnology. Some of the most promising applications include:
- Healthcare and Drug Discovery: EVO 2 can predict which genetic variants are associated with specific diseases, thus facilitating the development of targeted therapies. For example, in tests with BRCA1 variants associated with breast cancer, EVO 2 achieved more than 90% accuracy in predicting which mutations are benign and which are potentially pathogenic. This insight can accelerate the development of new drugs and personalized treatments.
- Synthetic Biology and Genetic Engineering: The ability of EVO 2 to generate entire genomes provides new avenues for designing synthetic organisms with desired characteristics. Researchers can use EVO 2 to engineer genes with specific functions, advancing the development of biofuels, environmentally friendly chemicals and novel therapeutics.
- Agricultural Biotechnology: EVO 2 can be used to design genetically modified crops with improved characteristics such as drought tolerance or pest resistance, contributing to global food security and agricultural sustainability.
- Environmental Science: EVO 2 can be applied to design biofuels or to engineer proteins that break down environmental pollutants such as oil or plastics, thus contributing to sustainability.
Challenges and future directions
Despite its impressive capabilities, EVO 2 faces challenges. A key obstacle is the computational cost of training and running the model. With a context window of 1 million base pairs and 40 billion parameters, EVO 2 requires substantial computing resources to operate effectively. This makes it difficult for smaller research teams to fully exploit its potential without access to high-performance computing infrastructure.
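A back-of-the-envelope calculation gives a feel for the scale involved. The sketch below estimates only the memory needed to hold the model weights; the choice of 2 bytes per parameter (16-bit weights) is an assumption, the helper `weight_memory_gb` is invented for this example, and the estimate deliberately ignores activations and any other runtime overhead, which grow with the length of the input sequence.

```python
def weight_memory_gb(parameters, bytes_per_param):
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return parameters * bytes_per_param / 1e9


PARAMETERS = 40_000_000_000   # 40 billion, as stated in the text
BYTES_PER_PARAM = 2           # assumption: 16-bit weights

weights_gb = weight_memory_gb(PARAMETERS, BYTES_PER_PARAM)
print(f"weights alone: {weights_gb:.0f} GB")
```

Under these assumptions the weights alone occupy about 80 GB before a single nucleotide has been processed, which is why running the full model typically calls for data-center class hardware rather than a desktop workstation.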
Furthermore, although EVO 2 performs well at predicting the effects of genetic mutations, there is still much to learn about how to design novel biological systems from scratch. Generating a realistic biological sequence is only the first step. The real challenge is understanding how this capability can be leveraged to create functionally viable biological systems.
Accessibility and democratization of AI genomics
One of the most exciting aspects of EVO 2 is its open-source availability. To democratize access to advanced genome modeling tools, NVIDIA has publicly released the model parameters, training code and datasets. This open-access approach allows researchers around the world to explore and extend EVO 2's capabilities, thereby accelerating innovation across the scientific community.
Bottom line
EVO 2 is a major advance in genome modeling, using AI to decode the complex genetic language of life. Its ability to model DNA sequences and their interactions with RNA and proteins opens up new possibilities for healthcare, drug discovery, synthetic biology and environmental science. EVO 2 can predict the effects of genetic mutations and design new biological sequences, offering transformative potential for personalized medicine and sustainable solutions. However, its computational cost presents challenges, especially for smaller research teams. By making EVO 2 open source, NVIDIA enables researchers around the world to explore and extend its capabilities, thereby driving innovation in genomics and biotechnology. As the technology continues to evolve, it has the potential to reshape the future of biological sciences and environmental sustainability.