The real barrier to faster drug discovery isn't smarter algorithms; it's the ability to access, standardize, and share data across competing organizations. While headlines celebrate AI breakthroughs in molecular design and protein folding, a quieter but equally important shift is happening behind the scenes. Two major announcements in March 2026 reveal that the pharmaceutical industry is finally investing in the foundational data infrastructure that could make AI drug discovery tools actually useful at scale.

Why Data Infrastructure Matters More Than You'd Think

Drug development generates enormous amounts of valuable data, but historically this information has stayed siloed within individual companies and research institutions. Christopher Lunt, who spent seven years building the technology infrastructure for the National Institutes of Health's All of Us Research Program, recently joined the Critical Path Institute as Chief Data and Technology Officer. His mission: to make shared data truly accessible and useful across the pharmaceutical ecosystem.

At the All of Us Research Program, Lunt led the development of a secure, cloud-based research platform that now supports more than 6,000 registered researchers and has generated more than 170 peer-reviewed publications. The program collects genetic, lifestyle, environmental, and clinical data from more than 1 million diverse volunteers across the United States. This experience directly informs his new role at C-Path, where he will oversee the Critical Path Data and Analytics Platform (CP-DAP) and the Rare Disease Cures Accelerator-Data and Analytics Platform (RDCA-DAP), which serve more than 1,600 scientists across a dozen active consortia.

"Drug development generates enormous amounts of valuable data, but we shouldn't continue to miss the opportunity to maximize its value through data sharing and integration.
When scientists can query a standardized dataset and get an answer in minutes instead of months, that changes the trajectory for patients and families who are waiting," said Christopher Lunt, Chief Data and Technology Officer at Critical Path Institute.

The problem Lunt is addressing is deceptively simple to state but extraordinarily difficult to solve. Researchers across different organizations use different data formats, different naming conventions, and different quality standards. A biomarker measured in one lab might be recorded differently in another. A clinical outcome in one study might be defined slightly differently in the next. These small inconsistencies compound, making it nearly impossible to combine datasets or run analyses across multiple studies without months of manual data cleaning.

How to Build Better Data Infrastructure for Drug Discovery

- Standardize Data Formats: Establish common definitions and formats across consortia so that datasets from different organizations can be combined without extensive manual cleaning and reformatting.
- Invest in Secure Cloud Platforms: Build centralized, cloud-based research platforms that allow thousands of researchers to access standardized datasets while maintaining strict security and privacy controls.
- Integrate AI and Machine Learning: Layer AI capabilities on top of standardized data infrastructure so that researchers can query datasets in natural language and receive answers in minutes rather than months.

C-Path's expansion into data infrastructure comes at a critical moment. The organization is entering a period of significant growth, with increasing demand for integrated analytics, standardized datasets, and AI-enabled tools across its programs.
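To make the standardization problem concrete, here is a minimal sketch of the first step above: mapping two labs' differently formatted exports onto one common data model. The column names, units, and biomarker values are entirely hypothetical illustrations, not C-Path's actual schemas.

```python
import pandas as pd

# Hypothetical raw exports from two labs: the same biomarker recorded
# under different column names, in different units, with different
# outcome coding.
lab_a = pd.DataFrame({
    "patient": ["A1", "A2"],
    "crp_mg_dl": [0.3, 1.2],           # C-reactive protein in mg/dL
    "outcome": ["responder", "non-responder"],
})
lab_b = pd.DataFrame({
    "subject_id": ["B1", "B2"],
    "CRP (mg/L)": [4.0, 11.0],         # same biomarker, but in mg/L
    "response": [1, 0],
})

# One small adapter per source maps everything onto a shared model:
# canonical column names, a single unit, a single outcome coding.
def harmonize_a(df):
    out = pd.DataFrame()
    out["subject_id"] = df["patient"]
    out["crp_mg_l"] = df["crp_mg_dl"] * 10.0   # convert mg/dL -> mg/L
    out["responder"] = df["outcome"].eq("responder").astype(int)
    out["source"] = "lab_a"
    return out

def harmonize_b(df):
    out = pd.DataFrame()
    out["subject_id"] = df["subject_id"]
    out["crp_mg_l"] = df["CRP (mg/L)"]
    out["responder"] = df["response"].astype(int)
    out["source"] = "lab_b"
    return out

# Once both exports speak the common model, pooling them is trivial.
combined = pd.concat([harmonize_a(lab_a), harmonize_b(lab_b)],
                     ignore_index=True)
print(combined)
```

The hard part at consortium scale isn't the conversion code itself but agreeing on the target schema; once a common model exists, each new dataset needs only one such adapter instead of pairwise cleaning against every other study.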
Lunt's priorities include advancing the CP-DAP platform, expanding the RDCA-DAP for rare disease research, improving data standardization and interoperability across consortia, and integrating AI and machine learning capabilities into regulatory-grade drug development tools.

How AI Agents Are Making Data Analysis Accessible to Biologists

While C-Path focuses on the infrastructure layer, Insilico Medicine is tackling a related problem from a different angle: making sophisticated AI-driven analysis accessible to biologists who aren't trained in computational methods. In March 2026 the company launched PandaClaw, a new feature of its PandaOmics platform that combines AI agents with biological and bioinformatics workflows.

Traditionally, AI-enabled drug discovery has required "bilingual" professionals fluent in both biomedicine and artificial intelligence, and training such talent often takes longer than developing the software itself. PandaClaw addresses this bottleneck by letting biologists conduct sophisticated analyses through a simple natural language interface, without specialized computational training.

The tool operates through three core components:

- an Agent Core modeled on the workflow-driven logic of experienced biologists;
- proprietary Data Warehouses curated by cross-disciplinary teams of data scientists and biologists;
- a Skills library built on the analytical reasoning of veteran bioinformaticians.

When presented with a research objective, the Agent Core autonomously formulates a multi-step analytical workflow, parsing natural language requests into distinct tasks.

"PandaClaw is far more than a sophisticated search engine; it is a comprehensive autonomous agent designed to mirror the logic and expertise of seasoned biologists and bioinformaticians.
While our foundational PandaOmics platform has long provided industry-leading quantitative results in target rankings and indication prioritization, PandaClaw transcends these capabilities by delivering qualitative real-time multi-omics analyses with in-depth data interpretations," explained Dr. Frank Pun, Head of Insilico Medicine Hong Kong.

The system integrates over 140 specialized scientific skills and more than 1,000 bioinformatics tools. It aggregates and cross-references multi-omics datasets by drawing on native access to the PandaOmics platform, internal data warehouses, external biological databases, and proprietary user data. To ensure scientific integrity, the agent autonomously diagnoses and self-corrects formatting issues or data anomalies within an isolated local sandbox before producing high-quality, figure-rich reports.

The practical impact is substantial. Insilico nominated 20 preclinical drug candidates from 2021 to 2024, with an average timeline from project initiation to preclinical candidate nomination of just 12 to 18 months per program; traditional early-stage drug discovery typically requires an average of 4.5 years. Each Insilico program synthesized and tested only 60 to 200 molecules, compared with the thousands typically required in conventional approaches.

What's the Real Bottleneck in AI Drug Discovery Today?

These two developments reveal a crucial insight: the bottleneck in AI drug discovery isn't computational power or algorithmic sophistication. It's the ability to access, standardize, and interpret data at scale. Faster computers and smarter models won't help if researchers can't query datasets efficiently, or if data from different sources can't be combined meaningfully.

C-Path's appointment of Lunt signals that the pharmaceutical industry recognizes this challenge.
The organization now serves more than 1,600 scientists across multiple consortia, generating demand for integrated analytics and standardized datasets that didn't exist five years ago. Similarly, Insilico's investment in PandaClaw suggests that companies building AI drug discovery tools recognize they need to democratize access to these capabilities, not just improve their underlying algorithms.

The convergence of these two trends points toward a future where drug discovery accelerates not because of a single breakthrough algorithm, but because of better infrastructure that allows researchers to collaborate more effectively, share data more easily, and leverage AI tools without requiring specialized computational expertise. For patients waiting for new treatments, that infrastructure investment might ultimately matter more than the next generation of AI models.
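As a closing illustration, the agent pattern described in the PandaClaw section (an Agent Core that parses a natural-language objective into an ordered workflow, a skills library that executes each step, and a self-correction pass that repairs data anomalies before reporting) can be sketched in miniature. Every function, keyword, and data value below is a hypothetical toy, not Insilico's actual API or scoring logic.

```python
# A tiny "skills library": each function performs one analysis step.
def load_expression_data(_):
    # Pretend one value came back malformed (a string instead of a number),
    # the kind of formatting anomaly a real agent would have to repair.
    return {"GENE1": [2.0, "3.1", 4.5], "GENE2": [0.1, 0.2, 0.15]}

def clean(table):
    # Self-correction step: coerce stray strings back to floats.
    return {g: [float(v) for v in vals] for g, vals in table.items()}

def rank_targets(table):
    # Rank genes by mean expression (a stand-in for a real scoring model).
    means = {g: sum(vals) / len(vals) for g, vals in table.items()}
    return sorted(means, key=means.get, reverse=True)

# The "Agent Core": map keywords in a natural-language request
# to an ordered workflow of skills.
def plan(request):
    steps = [load_expression_data, clean]
    if "rank" in request.lower() or "target" in request.lower():
        steps.append(rank_targets)
    return steps

def run(request):
    result = None
    for step in plan(request):
        result = step(result)
    return result

ranking = run("Rank candidate targets in this expression dataset")
print(ranking)  # prints ['GENE1', 'GENE2']
```

A production agent replaces the keyword matching with a language model and the toy skills with real bioinformatics tools, but the shape is the same: plan, execute, validate, report.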