AI Is Learning Biology's Hidden Rules, Not Just Recognizing Patterns
Artificial intelligence in biology has crossed a critical threshold: it's no longer just sorting through data, but uncovering the fundamental principles that govern living systems. At the EMBL and EMBO Symposium on AI and Biology held in March 2026 at EMBL Heidelberg, researchers from around the world discussed how AI is being used to understand not just what happens in cells and proteins, but why it happens.
Why Can't We Just Trust AI's Answers in Biology?
One of the most pressing challenges facing AI in biology is the "black box" problem. Deep learning models can achieve impressive accuracy, but scientists often cannot see how they arrived at their conclusions. This opacity becomes dangerous in medicine, where understanding the reasoning behind a diagnosis or prediction can be as important as the prediction itself.
Researchers are now demanding that AI systems explain their logic. Oded Regev from New York University demonstrated how his team successfully converted highly complex deep-learning models for RNA splicing into transparent, interpretable frameworks. His work revealed a critical flaw: while deep learning achieves high accuracy, it can suffer from blind spots, sometimes mistaking genomic context like CpG islands for true regulatory elements.
Mohammed AlQuraishi from Columbia University probed how AlphaFold2, the celebrated AI tool that predicts the three-dimensional shape of proteins, learns to make its predictions. By deliberately feeding the model partial data, his team showed that the AI isn't just memorising structures; it internalises the physical and energetic rules of protein folding.
This shift from "what will the AI predict?" to "why is the AI predicting that?" represents a fundamental maturation of AI in biology. Understanding the machinery behind the output, not just the output itself, is now recognized as essential.
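To make the CpG-island blind spot mentioned above more concrete, here is a minimal sketch of how a sequence window can be flagged as CpG-island-like, so that a model's predictions can be checked against this kind of genomic context. It is not the method presented at the symposium. The function names are hypothetical, and the thresholds (at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) are the commonly cited rule-of-thumb criteria, used here as an assumption.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the rule-of-thumb thresholds (assumed defaults, not from the article)."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    print(looks_like_cpg_island("CG" * 100))  # CpG-dense toy window
    print(looks_like_cpg_island("AT" * 100))  # window with no C or G at all
```

If a model's high-confidence "regulatory element" calls fall overwhelmingly in windows that pass such a check, that is a hint worth investigating, not proof of a shortcut.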
How Are Scientists Building AI Systems That Reason About Biology?
The conference highlighted several emerging approaches to creating AI systems that don't just recognize patterns but reason about biological systems:
- Structural Protein Search: Martin Steinegger from Seoul National University introduced Foldseek-multimer, a tool capable of rapidly searching through hundreds of millions of protein structures by analyzing structural similarities rather than just genetic sequences, discovering previously unknown protein interactions and evolutionary links.
- Protein Language Models: Anne-Florence Bitbol from EPFL presented ProteomeLM, which can "read" entire proteomes and predict complex protein-protein interactions across species with unprecedented speed, bypassing the need for traditional, time-consuming sequence alignments.
- Spatial Biology Reconstruction: Feng Bao from Fudan University presented isoST, a deep learning model that bridges the gap between 2D and 3D by using stochastic differential equations to reconstruct smooth, isotropic 3D tissue volumes from sparse, 2D spatial transcriptomic slices, allowing scientists to finally see the spatial genome in three dimensions.
- Digital Organism Simulation: Eric Xing from MBZUAI and Carnegie Mellon University outlined a framework for an AI-Driven Digital Organism (AIDO), which acts as a "world model" for biology, allowing scientists to simulate how a living system would respond to perturbations across molecular, cellular, and tissue scales.
These tools share a common aim: moving beyond predicting what will happen toward explaining why it happens and what would change if the system were perturbed.
Is AI Ready for the Clinic?
The symposium made a compelling case that biological AI is transitioning from research labs to patient care. Jakob Nikolas Kather from the Technical University Dresden illustrated how specialized AI systems are already functioning as medical devices, predicting crucial biomarkers such as microsatellite instability directly from standard tumor histology slides.
Faisal Mahmood from Harvard Medical School expanded on this clinical horizon by presenting Apollo, a system-scale temporal foundation model that integrates multimodal data ranging from histology and genomics to patient visit notes and lab results. Models like Apollo act as clinical agents, predicting disease progression and treatment outcomes with significant accuracy.
Perhaps most ambitiously, researchers are building "virtual patients" and "virtual cells," computational models that can forecast how a tumor behaves, test a treatment before it is ever administered, or reproduce the complexity of a living cell. One group is integrating a patient's imaging, genomic, and treatment history to simulate cancer progression and test therapies in silico. Another presented a vision of AI models capable of simulating biology from the molecular level upwards, with a demonstration of using such a system to design a more effective vaccine sequence.
What's the Hidden Problem With AI Training Data in Genomics?
A critical realization emerged during the conference: the quality of data fed into AI systems matters just as much as the sophistication of the algorithms themselves. Widely used AI tools for predicting how genes are processed, a fundamental step in how DNA gives rise to functional proteins, have been quietly learning the wrong lessons from flawed training data.
This can lead to errors in clinically significant cases. An automated audit of the standard datasets used to benchmark genomic AI found widespread hidden biases baked into their construction. Like a student who memorizes the answer sheet rather than understanding the subject, these models ace the exam but can fail the real test. One example involved a key mutation responsible for cystic fibrosis that was misidentified due to biased training data.
This discovery has profound implications for AI deployment in medicine. Before an AI system can be trusted to guide clinical decisions, the data it was trained on must be rigorously audited for hidden biases and errors that could propagate through predictions affecting patient care.
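One simple way to picture such an audit is a "shortcut baseline": if a trivial feature alone separates the benchmark labels far better than always guessing the majority class, the dataset may be rewarding a shortcut rather than biology. The sketch below is a generic illustration, not the audit described at the symposium. The function names are hypothetical, and GC content is an assumed example of a confounding feature.

```python
def gc_fraction(seq):
    """Fraction of G/C bases in a DNA string (assumed example confounder)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def majority_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    positives = sum(1 for y in labels if y)
    return max(positives, len(labels) - positives) / len(labels)


def best_threshold_accuracy(values, labels):
    """Accuracy of the best single-threshold rule on one feature."""
    n = len(labels)
    best = 0.0
    for threshold in sorted(set(values)):
        correct = sum((v >= threshold) == bool(y) for v, y in zip(values, labels))
        # Also consider the flipped rule (predict positive below the threshold)
        best = max(best, correct / n, (n - correct) / n)
    return best


def shortcut_gap(examples, labels, feature):
    """How much a single trivial feature beats the majority-class baseline."""
    values = [feature(x) for x in examples]
    return best_threshold_accuracy(values, labels) - majority_accuracy(labels)


if __name__ == "__main__":
    # Toy data in which the label is fully explained by GC content
    seqs = ["GCGCGC", "GGCCAT", "ATATAT", "ATTTAC"]
    labels = [1, 1, 0, 0]
    print(shortcut_gap(seqs, labels, gc_fraction))  # 0.5: a large, suspicious gap
```

A large gap does not prove the benchmark is flawed, since the feature may be genuinely informative, but it marks the dataset for closer inspection before any model trained on it is trusted.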
Why Does Context Matter More Than We Thought?
A recurring realization across the symposium was that cells, proteins, and tumors do not exist in isolation. AI systems that ignore context consistently underperform those that embrace it.
Dana Pe'er from Sloan Kettering Institute elegantly tackled this by introducing the Wasserstein Wormhole, a computational method that creates biologically meaningful "latent spaces" to track how cellular niches reorganize during cancer emergence. This approach recognizes that a cell's function is determined not just by its internal state, but by its spatial and social context within tissue.
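The method's name refers to the Wasserstein (optimal transport) distance, a standard way to measure how different two distributions are, for example the expression levels of a marker gene in two cellular niches. The sketch below shows only that underlying distance in its simplest form: one dimension, equal-sized samples. It does not attempt to reproduce the Wasserstein Wormhole itself, and the function name and toy values are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-sized 1D samples.

    With equal sample sizes and uniform weights, the optimal transport plan
    matches the i-th smallest value of xs to the i-th smallest value of ys,
    so the distance is the mean absolute gap between the sorted samples.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


if __name__ == "__main__":
    # Toy marker-expression values for cells in two niches (made-up numbers)
    niche_a = [0.0, 1.0, 2.0]
    niche_b = [2.0, 3.0, 1.0]
    print(wasserstein_1d(niche_a, niche_b))  # 1.0: niche_b is niche_a shifted by 1
    print(wasserstein_1d(niche_a, niche_a))  # 0.0: identical distributions
```

Real niches are high-dimensional point clouds, where computing this distance for every pair becomes expensive, which is one reason a learned latent space that preserves such distances is attractive.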
In oncology, combining a patient's tissue images with their genomic data, treatment history, and clinical notes produces far more accurate predictions than any single data type alone. This contextual approach mirrors how human doctors think about disease, integrating multiple sources of information to form a complete picture.
The gap between current capability and genuine cellular simulation remains vast. But the conceptual shift from asking "what will happen?" to asking "why, and what if?" points toward a future where AI is not just reading biology, but reasoning about it. As researchers continue to pry open the black box of AI, demand causality from their models, and build systems that respect biological context, the role of AI in biology will deepen from a tool for pattern recognition into a partner in understanding life itself.