The $300 Million Plan to Turn America's Forests Into a Drug Discovery Database

America's Living Library Act would create a national genomic database from organisms on federal lands, providing AI systems with the biological data needed to discover new medicines. The bill proposes a $300 million investment over five years to sequence and catalog genomes across hundreds of millions of acres of preserved ecosystems, addressing a critical bottleneck in AI-enabled drug discovery.

Why Are Scientists Missing Out on Nature's Drug Discoveries?

Nature has always been a pharmacy. More than half of all small molecule drugs approved between 1981 and 2019 were derived from or inspired by natural products, according to research published in the Journal of Natural Products. Penicillin came from mold, the cancer drug paclitaxel from a yew tree, and cyclosporine, the immunosuppressant that made organ transplantation possible, from a soil fungus. Yet the vast majority of nature's biosynthetic potential remains untapped.

The problem is data scarcity. A 2024 study in BMC Microbiology analyzed 554 genomes from Paenibacillus, a relatively well-studied bacterial genus, and identified 848 biosynthetic gene clusters (BGCs), the genetic instructions that organisms use to produce biologically active compounds such as antibiotics and anticancer agents. The striking finding: 84 percent of these clusters were previously unknown. If similar patterns hold across less-studied organisms on U.S. public lands, the scale of undiscovered therapeutic potential is enormous.
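To put the study's percentages into rough counts, here is a minimal Python sketch. It uses only the two figures quoted above (848 clusters, 84 percent previously unknown). The function name and the rounding are illustrative assumptions, and the results are approximations, not numbers reported by the study.

```python
# Figures as quoted in the article for the Paenibacillus study.
CLUSTERS = 848          # biosynthetic gene clusters identified
UNKNOWN_SHARE = 0.84    # "84 percent"; the exact count is not given in the article


def split_clusters(clusters: int, unknown_share: float) -> dict:
    """Hypothetical helper: estimate unknown vs. known cluster counts.

    The outputs are rounded estimates derived from a rounded percentage,
    so they should be read as approximate.
    """
    unknown = round(clusters * unknown_share)
    return {"unknown": unknown, "known": clusters - unknown}


estimate = split_clusters(CLUSTERS, UNKNOWN_SHARE)
print(estimate)
```

By this estimate, roughly 710 of the 848 clusters had no previously known counterpart, leaving about 140 that matched something already characterized.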

The challenge is that many BGCs are not expressed under standard laboratory conditions and can only be identified through genome sequencing. Existing genomic databases are incomplete and skewed toward organisms that are easier to access and study, leaving vast swaths of biological diversity uncharacterized.

How Could AI Use Genomic Data to Accelerate Drug Discovery?

Modern drug development increasingly relies on AI systems trained on large, diverse, and high-quality datasets. Recent advances in protein structure prediction, generative chemistry, and AI-enabled drug design all depend on access to comprehensive biological information. The Living Library Act directly addresses this need by proposing a standardized, publicly accessible genomic database built from organisms found on federal lands.

Genome-mining tools such as antiSMASH and DeepBGC can already identify BGCs and predict their chemical outputs, but their performance is constrained by limited data diversity. Expanding sequencing across underexplored ecosystems would increase the range of biological patterns available for AI model training, improving the ability of these systems to identify novel compounds and biological mechanisms. U.S. public lands encompass hundreds of millions of acres of preserved ecological diversity, including extreme environments that remain largely unexplored at the genomic level. Organisms found in these environments may encode novel proteins, enzymes, or metabolic pathways with significant therapeutic potential.

The bill takes an interagency approach, bringing together the Department of the Interior, the Smithsonian Institution, HHS, USDA, NIH, NSF, and DOE to coordinate data collection, standardization, and access. This coordination matters because reproducibility and data integration are essential for downstream applications in biopharmaceutical research and development.

Steps to Build a Strategic National Biological Data Infrastructure

  • Standardization and Interoperability: Policymakers should ensure that genomic data infrastructure is coordinated and standardized to enable effective use across research domains, from drug discovery to synthetic biology and precision therapeutics.
  • Expanded Sequencing Programs: The bill proposes whole-genome sequencing of organisms on federal lands, targeting extreme environments and underexplored ecosystems whose organisms may encode novel therapeutic compounds.
  • Strategic Investment Recognition: Policymakers should treat biological data infrastructure as a strategic national asset essential to sustaining U.S. leadership in science, technology, and innovation, particularly as global competition in biotechnology intensifies.

Why Is This a Matter of National Competitiveness?

Biotechnology is now an area of intensifying global competition. Leadership in biological data and AI capabilities will shape the next generation of breakthroughs in medicine, agriculture, and advanced manufacturing. China, for example, has prioritized large-scale sequencing and data infrastructure development through institutions such as the Beijing Genomics Institute. Expanding U.S. capabilities in biological data collection would help sustain American leadership in these critical areas.

The total investment of $300 million over five years is modest relative to the potential returns. Even a single therapeutic breakthrough enabled by expanded genomic datasets could yield substantial health and economic benefits. The bill aligns with the White House AI Action Plan's call to "build world-class scientific datasets" by establishing infrastructure that would serve researchers across multiple disciplines and industries for decades to come.