Understanding how genes interact in complex biological systems has long been a cornerstone of molecular biology. One of the most powerful ways to study these interactions is through perturbation experiments, where scientists selectively disrupt genes to observe their effects on cellular functions. Techniques like Perturb-seq have revolutionized this field by enabling large-scale interventions and single-cell sequencing to map genetic influences. However, the sheer volume of data and the high costs of conducting these experiments present major barriers to their widespread use.
Thanks to machine learning (ML) and artificial intelligence (AI), it is possible to predict cellular responses and extract meaningful insights without the need for exhaustive laboratory experiments. But there’s a problem: many current AI models treat biological data as just numbers, failing to capture the semantic richness of genetic relationships. They focus on raw correlations rather than deeper biological reasoning, limiting their ability to support meaningful discoveries.
A recent study led by Menghua Wu (MIT), Russell Littman, Jacob Levine, David Richmond, Tommaso Biancalani, Jan-Christian Hütter (Genentech), and Lin Qiu (Meta AI) proposes a new approach. They introduce PERTURBQA, a benchmark designed to align AI-driven perturbation models with real biological decision-making. More importantly, they demonstrate how large language models (LLMs)—the same technology that powers AI chatbots—can be repurposed for biological research. Their method, called SUMMER (SUMMarize, retrievE, and answeR), shows that AI can interpret and reason over perturbation experiments using natural language, potentially outperforming existing models.
Why current AI approaches fall short

The biggest limitation of perturbation experiments is their cost. These experiments rely on single-cell RNA sequencing (scRNA-seq), a technique that allows scientists to measure how gene expression changes when specific genes are knocked down or overexpressed. While powerful, these experiments are expensive and time-consuming, requiring thousands of cells and intricate data analysis.
To address this, machine learning models attempt to predict how genes will behave under perturbation before actually conducting experiments. These models use knowledge graphs—databases of known biological interactions—to infer how a new gene disruption might affect a cell. However, this approach has several shortcomings.
A language-based alternative

To overcome these limitations, the research team proposes a language-based approach. Instead of treating genes as mere data points, they argue that biological relationships should be represented through natural language—the way scientists naturally describe genetic interactions.
This is where large language models (LLMs) come in.
PERTURBQA: A new benchmark for AI in biology

To test whether language models can reason about genetic perturbations, the researchers created PERTURBQA, a benchmark designed to evaluate AI models on three real-world biological tasks: predicting whether a perturbation changes a gene’s expression (differential expression), predicting the direction of that change, and describing the shared function of a cluster of genes (gene set enrichment).
Unlike previous benchmarks, which mostly assess whether AI can recall existing biological knowledge, PERTURBQA is designed to predict and reason about new, unseen perturbations. The dataset includes five large-scale Perturb-seq experiments that cover multiple cell types.
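To make the task framing concrete, a differential-expression query can be posed as a plain yes/no question about a specific perturbation. The sketch below is illustrative only: the gene names, cell type, and prompt wording are assumptions, not the benchmark’s actual template.

```python
# Hypothetical illustration of framing a perturbation prediction as a
# natural-language yes/no question (wording is assumed, not PERTURBQA's).
def make_question(perturbed: str, readout: str, cell_type: str) -> str:
    return (
        f"In {cell_type} cells, if {perturbed} is knocked down, "
        f"is {readout} differentially expressed? Answer yes or no."
    )

q = make_question("METTL3", "MYC", "K562")
print(q)
```

Framing predictions this way lets a language model draw on everything it has read about the genes involved, rather than on numeric embeddings alone.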
SUMMER: An AI model that thinks like a biologist

To solve the PERTURBQA tasks, the researchers introduced SUMMER, a language-based AI framework that outperforms traditional machine learning models in reasoning over perturbation data.
SUMMER works in three key steps, reflected in its name: it summarizes background knowledge about each gene into concise natural-language descriptions, retrieves the outcomes of related perturbations that have already been observed, and answers the question by reasoning over that combined evidence.
Unlike conventional models that blindly correlate genes, SUMMER explains why a perturbation might cause a certain effect, making its predictions more interpretable.
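The summarize/retrieve/answer flow can be sketched as a small pipeline. Everything below is a simplified stand-in, not the authors’ implementation: `call_llm` is a stub for a real chat-completion API, and the gene annotations and experiment outcomes are made-up examples.

```python
# Hypothetical sketch of a SUMMER-style summarize/retrieve/answer pipeline.
def call_llm(prompt: str) -> str:
    # Stub: a real system would query an LLM API here.
    return "yes" if "retrieved" in prompt else "no"

def summarize(gene: str, raw_annotations: list[str]) -> str:
    """Step 1: condense noisy database entries into a short gene summary."""
    return f"{gene}: " + "; ".join(raw_annotations[:2])

def retrieve(gene: str, past_experiments: dict[str, str]) -> list[str]:
    """Step 2: pull outcomes of similar, already-observed perturbations."""
    return [f"{g} -> {outcome}" for g, outcome in past_experiments.items() if g != gene]

def answer(gene: str, target: str, summary: str, examples: list[str]) -> str:
    """Step 3: ask the LLM to reason over the summary plus retrieved evidence."""
    prompt = (
        f"Gene summary: {summary}\n"
        f"retrieved examples: {', '.join(examples)}\n"
        f"Question: does knocking down {gene} change expression of {target}? "
        "Answer yes or no."
    )
    return call_llm(prompt)

summary = summarize("METTL3", ["m6A methyltransferase", "RNA modification", "mRNA splicing"])
examples = retrieve("METTL3", {"METTL14": "target up", "WTAP": "no change"})
print(answer("METTL3", "MYC", summary, examples))
```

Because each step produces human-readable text, a scientist can inspect the summary and retrieved examples to see what evidence drove the final answer.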
How well does SUMMER perform?

The researchers tested SUMMER against a range of state-of-the-art AI models for perturbation prediction.
The results showed that SUMMER outperformed all baseline models on both differential expression and gene set enrichment tasks. Notably, models without structured reasoning or experimental retrieval performed no better than random guessing, highlighting the importance of SUMMER’s approach.
Can AI describe biological patterns?

One of the most impressive achievements of SUMMER was in gene set enrichment. Traditionally, scientists use statistical tests to group genes into functional sets, but these methods struggle with poorly characterized genes. SUMMER, on the other hand, was able to generate accurate, interpretable descriptions of gene clusters, often matching or exceeding human annotations.
For example, when analyzing a gene cluster involved in RNA modification, traditional statistical methods failed to provide meaningful insights. SUMMER, however, generated the following description:
“M6A Methylation Complex-Associated Genes: This set includes genes regulating N6-methyladenosine (m6A) methylation of RNAs, influencing mRNA splicing and RNA processing.”
Such descriptions are not only more readable but also capture the broader biological significance of gene interactions.
While SUMMER represents a major step forward, biological reasoning with AI is far from a solved problem, and the study highlights several directions for future work.
Featured image credit: digitale.de/Unsplash