Basecamp Research has announced the Trillion Gene Atlas, a landmark initiative that aims to expand the genetic data available to AI systems 100-fold by collecting DNA from over 100 million new species across thousands of sites worldwide. This massive expansion targets a critical bottleneck in AI-driven drug discovery: current biological AI models are trained on an extremely narrow slice of Earth's genetic diversity, which limits their ability to design effective new medicines.
The project represents a fundamental shift in how AI learns to design therapeutics. Today, about 80 percent of sequence-based foundation models rely on the same public genetic databases, and most are trained on repositories containing fewer than 250 million sequences. By contrast, Basecamp says its proprietary genomic database, BaseData, is already more than 10 times larger than all public resources combined, and the Trillion Gene Atlas is meant to widen that lead even further.
What Makes This Different From Previous Genomics Projects?
The Trillion Gene Atlas isn't just about collecting more data; it's about collecting better, more diverse data from ecosystems that traditional laboratories have never reached. Over the past six years, Basecamp Research has built a network of scientific collaborators across 31 countries, establishing what the company calls a "scalable evolutionary genomics pipeline purpose-built for AI training." The company uses fully off-grid DNA sequencing technologies to collect high-quality genomic data from remote ecosystems, and it is now announcing new partnerships in Chile and Argentina as well as an expanded collaboration in Antarctica.
This approach is grounded in what Basecamp calls "equitable Access and Benefit-Sharing agreements aligned with emerging Digital Sequence Information regulations," meaning the company invests in scientific infrastructure and training within partner regions rather than simply extracting genetic data.
This framework addresses a longstanding ethical concern in genomics: wealthy institutions in developed countries have historically benefited from genetic resources collected in developing nations without adequate compensation or capacity building.
How Does This Enable AI to Design Better Medicines?
The connection between genetic diversity and AI drug design capability is surprisingly direct. Basecamp's EDEN foundation models, released in January 2026, demonstrated that when AI systems train on larger, richer biological datasets, their capabilities improve dramatically. The company describes this finding as "new scaling laws" for AI in biology: performance improves along much steeper trajectories as data quality and diversity increase.
To illustrate the practical impact: according to Basecamp, EDEN became the first AI model capable of designing diverse therapeutics directly from a disease prompt, without requiring human or clinical data. In wet-lab validation, the model demonstrated "zero-shot activity" in primary human T-cells, meaning its designs worked on the first attempt without any prior training on human cell data. The model has also designed targeted antimicrobial peptides with a 97 percent hit rate against priority pathogens, and it pioneered a technique called AI-Programmable Gene Insertion (aiPGI) for inserting healthy genes into cells.
"Bigger models alone aren't enough," explained Phil Lorenz, Chief Technology Officer of Basecamp Research. "EDEN showed that performance in biological AI follows much steeper scaling trajectories with higher quality and fully contextualized data. The Trillion Gene Atlas extends that principle 100-fold."
The Technical Infrastructure Behind the Atlas
- DNA Sequencing Technology: Basecamp partnered with Ultima Genomics and PacBio to deliver industrial-scale sequencing. Ultima's UG200 Series uses a wafer-based sequencing architecture to enable high-throughput whole-genome sequencing at low cost, while PacBio's HiFi sequencing delivers highly accurate long reads that preserve full genomic context.
- Computing Infrastructure: NVIDIA's accelerated computing infrastructure will process genetic data at petabase scale, using NVIDIA Parabricks to accelerate metagenomic assembly, the process of reconstructing genomes from complex environmental samples.
- AI Integration: Anthropic is contributing Claude, its large language model (LLM), to combine advanced reasoning with EDEN's therapeutic design abilities, creating an integrated workflow for interpreting clinical data and translating it into therapeutic designs.
The computational challenge is staggering. Processing quadrillions of DNA base pairs, a task that would previously have taken more than 20 years, is expected to take less than two years thanks to parallelized data processing, automated annotation, and large-scale model training. Compressing sequencing, assembly, annotation, and model training into this timeline is intended to expand the performance and scope of biological foundation models across therapeutic development.
Why Is Genetic Diversity the Missing Piece in AI Drug Discovery?
The fundamental problem that the Trillion Gene Atlas addresses is what researchers call the "data wall" in evolutionary biology. As Gilad Almogy, Founder and Chief Executive Officer of Ultima Genomics, explained: "Biology has been fundamentally data-starved when compared to other fields like language or computer vision as researchers have lacked the tools required to generate data at scale." This data scarcity has constrained AI's ability to learn from the full diversity of life on Earth.
By expanding known evolutionary genetic diversity 100-fold, the Trillion Gene Atlas aims to give AI systems a much richer understanding of how genes function across different species, environments, and evolutionary contexts. This expanded knowledge base should enable AI to design medicines that work across a wider range of diseases and treatment modalities, rather than being limited to the narrow genetic patterns present in existing public databases.
Glen Gowers, Co-founder and Chief Executive Officer of Basecamp Research, framed the initiative's significance at SXSW: "Today's biological AI models are trained on a narrow slice of life on Earth. The Trillion Gene Atlas expands the known genetic universe by orders of magnitude beyond what is in public databases. Training models at this scale establishes a new paradigm for programmable therapeutic design."
The scale of this initiative is comparable to that of the Human Genome Project, the landmark effort that produced the first reference sequence of the human genome in the early 2000s. However, while the Human Genome Project focused on a single species, the Trillion Gene Atlas is collecting data from over 100 million species, an increase of roughly eight orders of magnitude in the number of species covered.
The implications for medicine could be profound. If AI systems can learn from the genetic strategies that evolution has tested across millions of species, they may be able to design therapeutics that are more effective, more specific, and less likely to trigger resistance or side effects. The next few years will reveal whether this hypothesis holds true in clinical practice.