Artificial intelligence has moved from a promising experiment in drug discovery to a permanent part of how pharmaceutical companies and research labs design new medicines. But beneath the headlines about AI-discovered antibiotics and generative models lies a harder truth: most AI initiatives in biotech are stalling not because the algorithms don't work, but because organizations haven't built the foundational systems to make them work reliably in the real world.

What Changed in 2025: AI Became Infrastructure, Not Just a Tool

For years, AI in drug discovery was treated as a specialized experiment. A team would train a model, test it on a dataset, publish results, and move on. In 2025, something shifted. As AI models became embedded in actual lab workflows, data pipelines, and regulatory submissions, they started behaving like infrastructure: they need monitoring, version control, clear ownership, and change management.

A 2025 MIT study found that nearly 95% of enterprise generative AI pilots failed to deliver measurable business impact, most often because systems remained disconnected from real workflows, data foundations, and organizational ownership. For biotech companies, this disconnect is particularly costly. A model that works in isolation doesn't help if it can't integrate with existing lab systems, if the data feeding it is inconsistent, or if nobody owns the responsibility for keeping it running.

The market itself is growing steadily. The U.S. AI in biotechnology market reached approximately $2.1 billion in 2025, with growth driven primarily by adoption in drug discovery, genomics, and precision medicine. But that growth masks a deeper challenge: scaling AI requires solving problems that have nothing to do with machine learning.

Why Data Quality Matters More Than Data Volume

One of the biggest lessons of 2025 was that more data doesn't automatically mean better AI.
Teams across the industry discovered that the real bottleneck is data maturity: whether datasets are consistently annotated, properly labeled with context, and aligned across different experiments.

Consider a real example from recent research. Studies combining gene expression data with high-content imaging showed that model performance improved not by simply adding more data types, but by explicitly aligning experimental conditions, cell states, and metadata across those data types. When gene expression profiles are interpreted alongside imaging features under the same experimental context, models can distinguish mechanism-driven effects from noise and batch artifacts. The value comes from integration quality, not volume.

This distinction is reshaping how biotech companies invest in AI. Instead of funding new models, teams are now prioritizing:

- Data governance: Standardized schemas, controlled vocabularies, and clear data lineage so models can be reused across programs
- Metadata consistency: Biological context plus technical consistency, ensuring that datasets collected for different purposes can still be combined reliably
- Integration quality: The ability to align multimodal data (transcriptomics, imaging, perturbation experiments) so AI can learn meaningful patterns rather than noise

From Discovery to Clinic: How AI-Designed Antibiotics Are Actually Moving Forward

While infrastructure challenges slow many AI projects, some are breaking through. The most concrete example comes from MIT researchers who used generative AI to design entirely new antibiotics from scratch. In 2025, they published results showing how genetic algorithms and variational autoencoders (a type of neural network) generated millions of candidate molecules. After computational filtering and medicinal chemistry review, they synthesized 24 compounds and tested them experimentally. Seven showed selective antibacterial activity.
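To make the generate-filter-validate loop concrete, here is a minimal, self-contained sketch of the genetic-algorithm half of such a pipeline. It is not the published method: candidates are toy strings rather than real molecular representations (production systems work with SMILES strings or VAE latent vectors), and the `fitness` function is a hypothetical stand-in for the trained activity and toxicity predictors a real pipeline would use.

```python
import random

random.seed(0)

ALPHABET = "CNOSH"  # toy atom vocabulary; real pipelines use SMILES or latent vectors


def random_candidate(length=12):
    """Generate a random toy 'molecule' as a character string."""
    return "".join(random.choice(ALPHABET) for _ in range(length))


def fitness(candidate):
    # Hypothetical score standing in for trained activity/toxicity
    # predictors -- here we just reward N and O characters.
    return candidate.count("N") + candidate.count("O")


def mutate(candidate, rate=0.1):
    """Randomly swap characters to explore nearby candidates."""
    return "".join(
        random.choice(ALPHABET) if random.random() < rate else ch
        for ch in candidate
    )


def crossover(a, b):
    """Recombine two parents at a random cut point."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(pop_size=50, generations=20, elite=10):
    population = [random_candidate() for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the top scorers, then refill the population with
        # recombined and mutated offspring of those parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]
        children = [
            mutate(crossover(random.choice(parents), random.choice(parents)))
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

In a real discovery setting, the survivors of many such generations would then pass through computational filters and medicinal-chemistry review before anything is synthesized, which is where the experimental validation described above takes over.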
Two were particularly promising: NG1, which eradicated multidrug-resistant Neisseria gonorrhoeae, including strains resistant to first-line therapies, and DN1, which targeted methicillin-resistant Staphylococcus aureus (MRSA) and cleared infections in mice through broad membrane disruption. Both were non-toxic and showed low rates of resistance.

"In 2025, our lab published a study in Cell demonstrating how generative AI can be used to design completely new antibiotics from scratch. We used genetic algorithms and variational autoencoders to generate millions of candidate molecules, exploring both fragment-based designs and entirely unconstrained chemical space," said James Collins, the Termeer Professor of Medical Engineering and Science at MIT.

What makes this work is the integration of computational design with experimental validation. Collins' lab didn't just rely on AI predictions; they synthesized candidates and tested them in biological systems. This combination of AI-generated hypotheses with wet-lab confirmation is becoming the standard for credible drug discovery.

How Regulatory Guidance Is Reshaping AI in Drug Development

In January 2025, the FDA published draft guidance outlining a risk-based credibility assessment framework for AI models used to generate information supporting regulatory decisions. The guidance emphasizes "context of use" and ongoing performance evaluation. This matters because it signals that regulators are moving beyond skepticism toward structured oversight.

The FDA also outlined plans to reduce, refine, or potentially replace certain animal testing requirements and to promote new approach methodologies (NAMs) in appropriate contexts, including organoids, organ-on-chip systems, and computational models.
This creates an opening for AI-supported digital representations of biological systems, sometimes called "digital cells," which integrate multimodal data to simulate cellular behavior under different conditions.

For discovery teams, the practical takeaway is clear: if AI outputs could later influence regulated claims or submissions, traceability and validation are easier to build early than to retrofit later. This is pushing more biotech companies to invest in documentation and governance infrastructure alongside their AI models.

Steps to Build AI Infrastructure That Actually Works in Biotech

- Start with data governance: Before deploying any AI model, establish standardized schemas, controlled vocabularies, and clear data lineage. This foundation makes it possible to reuse models across programs and compare experiments reliably
- Integrate AI into existing workflows: Don't treat AI as a separate experiment. Build it into lab automation, data pipelines, and decision-support systems so it becomes part of how work actually gets done
- Plan for ongoing maintenance: Once deployed, AI systems need monitoring, versioning, and change control. Budget for data drift detection, model retraining, and performance evaluation as recurring costs, not one-time expenses
- Combine computational predictions with experimental validation: AI-generated hypotheses are only credible when tested in biological systems. Plan for synthesis, testing, and iterative refinement rather than relying on predictions alone
- Document for regulatory readiness: If AI outputs could influence regulatory submissions, build traceability and validation into the system from the start. This is much cheaper than retrofitting documentation later

What's Next: Platforms Over Point Solutions

2025 reinforced a shift toward platform-oriented strategies in biotech.
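The "ongoing maintenance" step above, and data drift detection in particular, can be made concrete with a small dependency-free check that flags input features whose distribution has shifted since training. The feature names and the threshold below are illustrative assumptions, and the score is a crude standardized mean-shift proxy; production monitoring typically uses proper statistical tests such as Kolmogorov-Smirnov or the population stability index.

```python
import statistics


def drift_score(reference, incoming):
    # Standardized mean shift: how many reference standard deviations
    # the incoming batch mean has moved from the training-time mean.
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)
    return abs(statistics.mean(incoming) - ref_mean) / ref_std


def check_features(reference_batch, incoming_batch, threshold=2.0):
    # Flag every feature whose drift score exceeds the threshold,
    # so it can be routed to review or trigger retraining.
    drifted = {}
    for name in reference_batch:
        score = drift_score(reference_batch[name], incoming_batch[name])
        if score > threshold:
            drifted[name] = round(score, 2)
    return drifted


# Hypothetical assay features: cell counts are stable, but imaging
# intensity has drifted (e.g., a microscope recalibration).
reference = {
    "cell_count": [100, 102, 98, 101, 99],
    "intensity": [0.50, 0.52, 0.48, 0.51, 0.49],
}
incoming = {
    "cell_count": [99, 101, 100, 98, 102],
    "intensity": [0.90, 0.92, 0.88, 0.91, 0.89],
}

flagged = check_features(reference, incoming)
print(flagged)  # only "intensity" is flagged
```

The point of the sketch is the workflow, not the statistic: monitoring runs on every incoming batch, and a flag feeds change control rather than silently retraining the model.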
Large pharmaceutical companies are increasingly pursuing longer-term approaches that integrate data, models, and workflows rather than adopting isolated tools. This reflects a maturation in how the industry thinks about AI: not as a breakthrough algorithm, but as a system that needs to be built, maintained, and continuously improved.

The nonprofit Phare Bio, co-founded by Collins, exemplifies this approach. The organization received a grant from ARPA-H (the Advanced Research Projects Agency for Health) to use generative AI to design 15 new antibiotics and develop them as preclinical candidates. By integrating generative AI, biology, and translational partnerships, the goal is to create a pipeline that can respond more rapidly to antibiotic resistance.

This is the real story of AI in drug discovery in 2026 and beyond: not whether AI can design molecules, but whether organizations can build the infrastructure, governance, and partnerships to move those molecules from the computer to the clinic. The algorithms are ready. The infrastructure is still catching up.