NVIDIA and Arc Institute Unveil Evo 2, a Groundbreaking Foundation Model for Biomolecular Sciences

ODSC - Open Data Science
Feb 20, 2025

Scientists worldwide now have access to Evo 2, a cutting-edge foundation model designed to advance biomolecular research. Built using NVIDIA DGX Cloud on Amazon Web Services, Evo 2 is the largest publicly available AI model trained on genomic data. It was developed through a collaboration led by the nonprofit Arc Institute and Stanford University.

The Evo 2 model is available via the NVIDIA BioNeMo platform, including as an NVIDIA NIM microservice, enabling streamlined and secure AI deployment for developers.

Transforming Genetic Research with AI

Evo 2 was trained on an extensive dataset of nearly 9 trillion nucleotides, the fundamental units of DNA and RNA. This allows the model to analyze genetic information across species, with applications in healthcare, agriculture, and industrial biotechnology.

The model’s capabilities extend to predicting protein structures based on genetic sequences, identifying novel therapeutic molecules, and evaluating the impact of gene mutations on cellular functions. These insights can accelerate scientific breakthroughs that previously required time-intensive laboratory work.

“Evo 2 represents a major milestone for generative genomics,” said Patrick Hsu, Arc Institute cofounder and core investigator and assistant professor of bioengineering at the University of California, Berkeley. “By advancing our understanding of these fundamental building blocks of life, we can pursue solutions in healthcare and environmental science that are unimaginable today.”

Accelerated Research Through Advanced Computing

The Evo 2 project used 2,000 NVIDIA H100 GPUs through the DGX Cloud platform on AWS, giving researchers scalable computing resources. This infrastructure, together with support from NVIDIA’s research team, enabled rapid model development and performance optimization.

DGX Cloud offers short-term access to large-scale GPU clusters, empowering scientists to handle computationally intensive projects without long-term infrastructure investments. Researchers can further customize Evo 2 using their proprietary datasets through the open-source NVIDIA BioNeMo Framework.

Brian Hie, assistant professor of chemical engineering at Stanford University and an Arc Institute innovation investigator, emphasized the model’s impact on biological research: “With Evo 2, we make the biological design of complex systems more accessible to researchers, enabling the creation of new and beneficial advances in a fraction of the time it would previously have taken.”

Broad Applications in Biomolecular Sciences

Evo 2’s ability to process genetic sequences of up to 1 million tokens opens new avenues for studying gene expression, cell function, and disease mechanisms. In tests involving the BRCA1 gene, which is linked to breast cancer, Evo 2 predicted the functional impact of previously unknown mutations with 90% accuracy.

The model also holds promise for agricultural innovation, aiding in the development of climate-resilient crops and nutrient-rich plant varieties. Additionally, it could facilitate the design of biofuels and environmentally friendly enzymes capable of breaking down pollutants.

“Deploying a model like Evo 2 is like sending a powerful new telescope out to the farthest reaches of the universe,” said Dave Burke, Arc Institute’s chief technology officer. “We know there’s immense opportunity for exploration, but we don’t yet know what we’re going to discover.”

For further information on Evo 2, visit the NVIDIA Technical Blog or review Arc Institute’s technical report.
