AI Is Learning to Read the 98% of Your DNA Scientists Still Don't Understand
Three leading institutions have launched a three-year collaboration to use artificial intelligence and large language models to decode the 98% of the human genome that remains poorly understood. While only 2% of human DNA has been thoroughly characterized, the remaining 98% contains critical regulatory and functional elements that could hold the key to preventing disease, improving diagnosis, and tailoring treatments to individual patients.
What Is the "Dark Matter" of the Human Genome?
For decades, scientists dismissed most of the human genome as "junk DNA" because they didn't understand its function. But modern research has revealed that this 98% of unexplored genetic material actually contains important regulatory mechanisms and functional elements that influence how our bodies work. These hidden regions control which genes turn on and off, how cells respond to disease, and why some people respond differently to the same medications. Decoding these patterns could revolutionize how doctors approach treatment, moving from one-size-fits-all medicine to truly personalized care based on an individual's unique genetic makeup.
The challenge has always been scale. The human genome contains roughly 3 billion base pairs, and analyzing how thousands of genomic regions interact with each other requires computational power and pattern-recognition abilities that traditional methods simply cannot handle efficiently. This is where artificial intelligence enters the picture.
How Are Researchers Using AI to Unlock Genomic Secrets?
The collaboration, announced in November 2025, brings together three powerhouses in healthcare, research, and technology. Sheba Medical Center in Israel and the Icahn School of Medicine at Mount Sinai in New York are contributing extensive genomic datasets, clinical insights, and AI research expertise. NVIDIA is providing advanced computational architecture, AI development platforms, and scientific expertise to power the effort. Together, they are building what researchers call a "Genomic Foundation Model," a specialized AI system trained to recognize patterns and relationships across vast amounts of genetic data.
The approach mirrors how large language models work in other fields. Just as AI can learn to predict the next word in a sentence by analyzing billions of examples, a genomic foundation model learns to predict genetic function and disease risk by analyzing millions of genomic sequences and their associated health outcomes. The initial focus will be on areas of medicine where genetic complexity has long hindered scientific progress, such as understanding how thousands of genomic regions work together to influence disease susceptibility and treatment response.
- Data Sources: Researchers from Sheba and Mount Sinai are contributing extensive genomic datasets collected from their patient populations, providing real-world clinical context for the AI to learn from.
- Computational Power: NVIDIA's full-stack AI platform provides the infrastructure needed to process and analyze genomic data at scale, something that would be prohibitively expensive or time-consuming with traditional computing methods.
- Clinical Translation: The partnership bridges the gap between AI researchers and clinicians, ensuring that discoveries translate into actionable insights for patient care rather than remaining theoretical.
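To make the next-word analogy concrete, here is a minimal Python sketch that treats a DNA string as a sequence of three-letter "words" and counts which word tends to follow which. It is a toy illustration only, not the collaboration's Genomic Foundation Model: the function names, the non-overlapping 3-mer tokenization, and the example sequences are all assumptions made for this sketch, and a real foundation model would use a neural network trained on far more data.

```python
from collections import Counter, defaultdict


def kmer_tokens(sequence, k=3):
    """Split a DNA string into non-overlapping k-letter 'words' (hypothetical tokenizer)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, k)]


def train_next_token_counts(sequences, k=3):
    """Count, for every token, which tokens follow it in the training sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        tokens = kmer_tokens(seq, k)
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequently observed follower of `token`, or None if unseen."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]


# Made-up sequences, used only to exercise the functions.
toy_sequences = ["ATGGCCATGGCCATGGCC", "ATGGCCTTTATGGCC"]
model = train_next_token_counts(toy_sequences)
print(predict_next(model, "ATG"))  # prints GCC
```

The counting table stands in for what a language model learns statistically: given what came before, which continuation is most likely.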
"While approximately two percent of the human genome has been thoroughly characterized, the remaining 98 percent, which was once labeled junk DNA, is increasingly recognized as containing critical regulatory and functional elements," stated Prof. Gidi Rechavi, Head of the Sheba Cancer Research Center and the Wohl Institute of Translational Medicine.
Why Does This Matter for Your Health?
The practical implications are profound. If AI can reliably decode how genetic variation influences disease risk and treatment response, doctors could eventually sequence a patient's entire genome and use that information to prevent disease before it starts, choose the most effective medications with fewer side effects, and design therapies tailored to an individual's unique biology. This is the promise of precision medicine, and it has remained largely out of reach because the science simply wasn't advanced enough to interpret most of the genetic code.
"This collaboration is an important step toward a future where every person can benefit from the power of whole genome sequencing. By bringing advanced AI into genomic research, we're moving closer to making personalized, precision medicine a reality for all," explained Dr. Alexander Charney, Director of the Charles Bronfman Institute for Personalized Medicine at Mount Sinai.
The partnership also signals a broader shift in how biotech research is being conducted. Rather than relying solely on traditional computational biology tools that require specialized programming knowledge, researchers are increasingly turning to AI systems that can work with natural language instructions and handle complex, multi-step analyses automatically. This democratizes genomic research, allowing biologists without deep coding expertise to ask sophisticated questions of their data.
How Can Researchers Access These AI Tools Today?
- Codex for Bioinformatics: OpenAI's Codex, an AI coding assistant integrated into ChatGPT, can help biologists write Python scripts for DNA sequence analysis, read counting, and pipeline development without requiring extensive programming experience. In educational settings, biology students have used Codex to generate functional code for DNA manipulation tasks with a reported 100% success rate.
- Cloud-Based Execution: Unlike earlier code-completion tools, Codex can actually run code in a secure sandbox environment, test it, debug errors, and propose corrections, making it a true collaborative partner rather than just a suggestion engine.
- Multi-File Project Management: Codex can navigate entire code repositories, make edits across multiple files, and manage complex bioinformatics pipelines, which is essential for handling the large-scale data analysis that genomics requires.
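For a sense of what "DNA manipulation tasks" and "read counting" look like in practice, here is a short Python sketch of the kind of script a biologist might ask an assistant to write. It is an illustrative example written for this article, not output from Codex; the function names and the sample reads are hypothetical, and it uses only the Python standard library.

```python
from collections import Counter

# Translation table mapping each base to its complement (N = unknown base).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def count_reads(reads):
    """Count how many times each distinct read sequence appears, ignoring case."""
    return Counter(read.upper() for read in reads)


# Made-up reads, used only to exercise the functions.
sample_reads = ["acgt", "ACGT", "TTTT"]
print(reverse_complement("ATGC"))           # prints GCAT
print(gc_content("ATGC"))                   # prints 0.5
print(count_reads(sample_reads)["ACGT"])    # prints 2
```

Real pipelines typically read sequences from FASTA or FASTQ files and rely on established libraries, but the underlying operations are the same.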
Codex, which launched as a research preview in May 2025, is now available through ChatGPT Plus, Pro, and Enterprise plans. The tool has demonstrated error rates around 30 to 50% per attempt, meaning it doesn't always get things right on the first try, but researchers can iteratively refine prompts and have the AI fix issues until the code works correctly. This is a significant improvement over static code-completion tools that simply suggest the next line of code without understanding whether it actually solves the problem.
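A quick back-of-the-envelope calculation shows why a 30 to 50% per-attempt error rate can still be workable. The Python sketch below assumes each attempt fails independently with the same probability, which is a simplification made for this example: in practice, iterative refinement feeds earlier errors back to the tool, so attempts are not truly independent. The function name is hypothetical.

```python
def chance_of_success_within(attempts, error_rate):
    """Probability that at least one of `attempts` tries succeeds,
    assuming every try fails independently with probability `error_rate`."""
    return 1 - error_rate ** attempts


for error_rate in (0.3, 0.5):
    for attempts in (1, 2, 3):
        chance = chance_of_success_within(attempts, error_rate)
        print(f"error rate {error_rate:.0%}, {attempts} attempt(s): {chance:.3f}")
```

Under this assumption, even at a 50% error rate the chance of getting working code within three attempts is 87.5%, and at a 30% error rate it is about 97%.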
The convergence of these two developments, AI-powered genomic analysis and AI-assisted coding, suggests that the next phase of biological discovery will be fundamentally different from the past. Scientists will spend less time wrestling with programming syntax and more time asking biological questions. The AI handles the computational grunt work, freeing researchers to focus on interpretation and validation. As these tools mature and more genomic data flows into AI systems trained to find patterns, the hidden 98% of the human genome may finally begin to reveal its secrets.