The real barrier to faster drug discovery isn't smarter algorithms; it's the ability to access, standardize, and share data across competing organizations. While headlines celebrate AI breakthroughs in molecular design and protein folding, a quieter but equally important shift is happening behind the scenes. Two major announcements in March 2026 reveal that the pharmaceutical industry is finally investing in the foundational data infrastructure that could make AI drug discovery tools actually useful at scale.

Why Data Infrastructure Matters More Than You'd Think

Drug development generates enormous amounts of valuable data, but historically this information has stayed siloed within individual companies and research institutions. Christopher Lunt, who spent seven years building the technology infrastructure for the National Institutes of Health's All of Us Research Program, recently joined the Critical Path Institute as Chief Data and Technology Officer. His mission: to make shared data truly accessible and useful across the pharmaceutical ecosystem.

At the All of Us Research Program, Lunt led the development of a secure, cloud-based research platform that now supports more than 6,000 registered researchers and has generated more than 170 peer-reviewed publications. The program collects genetic, lifestyle, environmental, and clinical data from more than 1 million diverse volunteers across the United States. This experience directly informs his new role at C-Path, where he will oversee the Critical Path Data and Analytics Platform (CP-DAP) and the Rare Disease Cures Accelerator-Data and Analytics Platform (RDCA-DAP), which serve more than 1,600 scientists across a dozen active consortia.

"Drug development generates enormous amounts of valuable data, but we shouldn't continue to miss the opportunity to maximize its value through data sharing and integration.
When scientists can query a standardized dataset and get an answer in minutes instead of months, that changes the trajectory for patients and families who are waiting," said Christopher Lunt, Chief Data and Technology Officer at Critical Path Institute.

The problem Lunt is addressing is deceptively simple to state but extraordinarily difficult to solve. Researchers across different organizations use different data formats, different naming conventions, and different quality standards. A biomarker measured in one lab might be recorded differently in another. A clinical outcome in one study might be defined slightly differently in the next. These small inconsistencies compound, making it nearly impossible to combine datasets or run analyses across multiple studies without months of manual data cleaning.

How to Build Better Data Infrastructure for Drug Discovery

- Standardize Data Formats: Establish common definitions and formats across consortia so that datasets from different organizations can be combined without extensive manual cleaning and reformatting.
- Invest in Secure Cloud Platforms: Build centralized, cloud-based research platforms that allow thousands of researchers to access standardized datasets while maintaining strict security and privacy controls.
- Integrate AI and Machine Learning: Layer AI capabilities on top of standardized data infrastructure so that researchers can query datasets in natural language and receive answers in minutes rather than months.

C-Path's expansion into data infrastructure comes at a critical moment. The organization is entering a period of significant growth, with increasing demand for integrated analytics, standardized datasets, and AI-enabled tools across its programs.
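To make the standardization problem concrete, here is a minimal sketch of the first step above: mapping two labs' differently formatted exports onto one common data model. The column names, units, and biomarker values are entirely hypothetical illustrations, not C-Path's actual schemas.

```python
import pandas as pd

# Hypothetical raw exports from two labs: the same biomarker recorded
# under different column names, in different units, with different
# outcome coding.
lab_a = pd.DataFrame({
    "patient": ["A1", "A2"],
    "crp_mg_dl": [0.3, 1.2],           # C-reactive protein in mg/dL
    "outcome": ["responder", "non-responder"],
})
lab_b = pd.DataFrame({
    "subject_id": ["B1", "B2"],
    "CRP (mg/L)": [4.0, 11.0],         # same biomarker, but in mg/L
    "response": [1, 0],
})

# One small adapter per source maps everything onto a shared model:
# canonical column names, a single unit, a single outcome coding.
def harmonize_a(df):
    out = pd.DataFrame()
    out["subject_id"] = df["patient"]
    out["crp_mg_l"] = df["crp_mg_dl"] * 10.0   # convert mg/dL -> mg/L
    out["responder"] = df["outcome"].eq("responder").astype(int)
    out["source"] = "lab_a"
    return out

def harmonize_b(df):
    out = pd.DataFrame()
    out["subject_id"] = df["subject_id"]
    out["crp_mg_l"] = df["CRP (mg/L)"]
    out["responder"] = df["response"].astype(int)
    out["source"] = "lab_b"
    return out

# Once both exports speak the common model, pooling them is trivial.
combined = pd.concat([harmonize_a(lab_a), harmonize_b(lab_b)],
                     ignore_index=True)
print(combined)
```

The hard part at consortium scale isn't the conversion code itself but agreeing on the target schema; once a common model exists, each new dataset needs only one such adapter instead of pairwise cleaning against every other study.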
Lunt's priorities include advancing the CP-DAP platform, expanding the RDCA-DAP for rare disease research, improving data standardization and interoperability across consortia, and integrating AI and machine learning capabilities into regulatory-grade drug development tools.

How AI Agents Are Making Data Analysis Accessible to Biologists

While C-Path focuses on the infrastructure layer, Insilico Medicine is tackling a related problem from a different angle: making sophisticated AI-driven analysis accessible to biologists who aren't trained in computational methods. In March 2026 the company launched PandaClaw, a new feature of its PandaOmics platform that combines AI agents with biological and bioinformatics workflows.

Traditionally, AI-enabled drug discovery has required "bilingual" professionals fluent in both biomedicine and artificial intelligence, and training such talent often takes longer than developing the software itself. PandaClaw addresses this bottleneck by letting biologists conduct sophisticated analyses through a simple natural language interface, without specialized computational training.

The tool operates through three core components:

- an Agent Core modeled on the workflow-driven logic of experienced biologists;
- proprietary Data Warehouses curated by cross-disciplinary teams of data scientists and biologists;
- a Skills library built on the analytical reasoning of veteran bioinformaticians.

When presented with a research objective, the Agent Core autonomously formulates a multi-step analytical workflow, parsing natural language requests into distinct tasks.

"PandaClaw is far more than a sophisticated search engine; it is a comprehensive autonomous agent designed to mirror the logic and expertise of seasoned biologists and bioinformaticians.
While our foundational PandaOmics platform has long provided industry-leading quantitative results in target rankings and indication prioritization, PandaClaw transcends these capabilities by delivering qualitative real-time multi-omics analyses with in-depth data interpretations," explained Dr. Frank Pun, Head of Insilico Medicine Hong Kong.

The system integrates over 140 specialized scientific skills and more than 1,000 bioinformatics tools. It aggregates and cross-references multi-omics datasets by drawing on native access to the PandaOmics platform, internal data warehouses, external biological databases, and proprietary user data. To ensure scientific integrity, the agent autonomously diagnoses and self-corrects formatting issues or data anomalies within an isolated local sandbox before producing high-quality, figure-rich reports.

The practical impact is substantial. Insilico nominated 20 preclinical drug candidates from 2021 to 2024, with an average timeline from project initiation to preclinical candidate nomination of just 12 to 18 months per program; traditional early-stage drug discovery typically requires an average of 4.5 years. Each Insilico program synthesized and tested only 60 to 200 molecules, compared with the thousands typically required in conventional approaches.

What's the Real Bottleneck in AI Drug Discovery Today?

These two developments reveal a crucial insight: the bottleneck in AI drug discovery isn't computational power or algorithmic sophistication. It's the ability to access, standardize, and interpret data at scale. Faster computers and smarter models won't help if researchers can't query datasets efficiently, or if data from different sources can't be combined meaningfully.

C-Path's appointment of Lunt signals that the pharmaceutical industry recognizes this challenge.
The organization now serves more than 1,600 scientists across multiple consortia, generating demand for integrated analytics and standardized datasets that didn't exist five years ago. Similarly, Insilico's investment in PandaClaw suggests that companies building AI drug discovery tools recognize they need to democratize access to these capabilities, not just improve their underlying algorithms.

The convergence of these two trends points toward a future where drug discovery accelerates not because of a single breakthrough algorithm, but because of better infrastructure that allows researchers to collaborate more effectively, share data more easily, and leverage AI tools without requiring specialized computational expertise. For patients waiting for new treatments, that infrastructure investment might ultimately matter more than the next generation of AI models.
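As a closing illustration, the agent pattern described in the PandaClaw section (an Agent Core that parses a natural-language objective into an ordered workflow, a skills library that executes each step, and a self-correction pass that repairs data anomalies before reporting) can be sketched in miniature. Every function, keyword, and data value below is a hypothetical toy, not Insilico's actual API or scoring logic.

```python
# A tiny "skills library": each function performs one analysis step.
def load_expression_data(_):
    # Pretend one value came back malformed (a string instead of a number),
    # the kind of formatting anomaly a real agent would have to repair.
    return {"GENE1": [2.0, "3.1", 4.5], "GENE2": [0.1, 0.2, 0.15]}

def clean(table):
    # Self-correction step: coerce stray strings back to floats.
    return {g: [float(v) for v in vals] for g, vals in table.items()}

def rank_targets(table):
    # Rank genes by mean expression (a stand-in for a real scoring model).
    means = {g: sum(vals) / len(vals) for g, vals in table.items()}
    return sorted(means, key=means.get, reverse=True)

# The "Agent Core": map keywords in a natural-language request
# to an ordered workflow of skills.
def plan(request):
    steps = [load_expression_data, clean]
    if "rank" in request.lower() or "target" in request.lower():
        steps.append(rank_targets)
    return steps

def run(request):
    result = None
    for step in plan(request):
        result = step(result)
    return result

ranking = run("Rank candidate targets in this expression dataset")
print(ranking)  # prints ['GENE1', 'GENE2']
```

A production agent replaces the keyword matching with a language model and the toy skills with real bioinformatics tools, but the shape is the same: plan, execute, validate, report.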