Artificial intelligence is reshaping how doctors identify rare diseases by analyzing real-world patient data across entire health systems, catching conditions earlier and improving outcomes for patients who might otherwise wait years for a diagnosis. A major national initiative called PANDA is deploying AI-driven models across more than 20 health systems and tens of millions of patient records to identify rare diseases in their earliest stages using clinical data that already exists in electronic health records.

Why Has Rare Disease Diagnosis Been So Challenging?

Rare diseases present a unique diagnostic challenge. By definition, they affect small populations, which means most doctors encounter them infrequently, sometimes never in their entire careers. This rarity creates a diagnostic bottleneck: patients often visit multiple specialists over years before receiving a correct diagnosis, during which time their condition may worsen significantly. The shortage of specialists familiar with rare conditions compounds the problem, leaving many patients in diagnostic limbo.

Traditional diagnostic approaches rely on pattern recognition built from common diseases. When a patient presents with unusual symptoms, clinicians may miss rare conditions entirely because they're trained to think of common explanations first. This is where AI changes the equation. Machine learning algorithms can process vast amounts of clinical data (lab results, imaging reports, medication histories, and symptom patterns) to recognize rare disease signatures that human pattern recognition might overlook.

How Is AI Using Real-World Patient Data to Catch Rare Diseases Earlier?

The PANDA initiative represents a new approach to AI-driven diagnosis. Rather than relying on small research datasets, PANDA integrates data from real clinical environments across multiple health systems, capturing how diseases actually present in diverse patient populations.
This real-world evidence is crucial because rare diseases often manifest differently depending on a patient's age, genetics, and other health conditions.

The key innovation is federated learning, a privacy-preserving method where AI models are trained across decentralized data without moving sensitive patient information to a central location. Each health system keeps its data secure while contributing to a shared AI model that improves as more patient records are analyzed. This approach addresses one of healthcare's biggest barriers: the tension between needing large datasets for AI training and protecting patient privacy.

"Our work spans statistics, machine learning, and biomedical informatics, with an emphasis on causal reasoning, robustness, and deployment in real clinical environments," explains Dr. Yong Chen, who leads the Penn Computing, Inference, and Learning Lab and serves as founding director of the Center for Health AI and Synthesis of Evidence. "Biomedical data are inherently complex: heterogeneous, decentralized, and shaped by clinical workflows, institutional constraints, and societal expectations around privacy and trust. I view these realities not as limitations, but as the core scientific challenges that medical AI must confront to be credible, useful, and responsible in practice."

Steps to Implementing AI-Driven Rare Disease Diagnosis in Clinical Practice

- Establish Multi-Site Data Partnerships: Health systems must collaborate through federated learning networks, which allow AI models to learn from millions of records while each institution keeps its privacy protections and control over its sensitive data.
- Validate AI Models Against Real Clinical Outcomes: Before deployment, AI diagnostic tools must be tested across diverse patient populations to ensure they work accurately for different age groups, ethnicities, and comorbidities, reducing the risk of bias in rare disease detection.
- Integrate AI Into Clinical Workflows: Rather than replacing clinician judgment, AI tools should be embedded into existing diagnostic processes as decision-support systems that flag potential rare disease patterns for specialist review and confirmation.
- Monitor and Audit for Fairness: Ongoing evaluation of AI system performance across different demographic groups ensures that rare disease diagnosis remains equitable and that no patient population is systematically missed or misdiagnosed.

Real-World Impact: From Data to Faster Diagnoses

Dr. Chen's team has led national studies spanning tens of millions of patients, advancing privacy-preserving analytics to address high-impact clinical questions at scale. During the COVID-19 pandemic, his team delivered weekly, actionable evidence to the National Institutes of Health (NIH), Centers for Disease Control and Prevention (CDC), Food and Drug Administration (FDA), and the White House. That effort involved more than 200 researchers and stakeholders across 40 health systems and analyzed data covering the healthcare experiences of more than 12 million children, more than 10% of the U.S. pediatric population.

This same infrastructure is now being applied to rare disease diagnosis. The ability to rapidly identify disease patterns across millions of patient records means that a child presenting with unusual symptoms in one health system can be compared against similar presentations across the entire network, dramatically increasing the chances of early recognition. Dr. Chen currently serves as Contact Principal Investigator of PANDA, which brings together more than 20 health systems and tens of millions of patients to develop and deploy AI-driven models for early identification and diagnosis of rare diseases using real-world clinical data.
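
To make the federated learning idea concrete, here is a minimal Python sketch of federated averaging, one common way to train a shared model across sites without pooling their records. It is an illustration under stated assumptions, not PANDA's actual implementation: the function names (local_update, federated_average, train_federated, make_site), the simple logistic-regression model, and the synthetic records are all hypothetical.

```python
import math
import random

def local_update(weights, records, lr=0.1, epochs=5):
    """Refine the shared weights on one site's own records (logistic-regression SGD)."""
    w = list(weights)
    for _ in range(epochs):
        for features, label in records:
            z = sum(wi * xi for wi, xi in zip(w, features))
            pred = 1.0 / (1.0 + math.exp(-z))
            err = pred - label
            for i, xi in enumerate(features):
                w[i] -= lr * err * xi
    return w  # only weights leave the site, never the records

def federated_average(site_weights, site_sizes):
    """Combine site models, weighting each by its number of records."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]

def train_federated(sites, dim, rounds=20):
    """Alternate local training at every site with central averaging."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, records) for records in sites]
        global_w = federated_average(updates, [len(r) for r in sites])
    return global_w

def make_site(n, seed):
    """Hypothetical synthetic data: one feature, label is 1 when it is positive."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        records.append(([1.0, x], 1 if x > 0 else 0))  # [bias term, feature]
    return records

# Three made-up "health systems" of different sizes.
sites = [make_site(50, 1), make_site(200, 2), make_site(120, 3)]
model = train_federated(sites, dim=2)
```

In this sketch the coordinating server only ever sees each site's weight vector and record count. Real deployments typically add further protections, such as secure aggregation or differential privacy, which are outside the scope of this example.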
Challenges Remain: Privacy, Bias, and Trust

Despite AI's promise, significant challenges must be addressed before widespread adoption. Data privacy remains paramount: healthcare AI relies on sensitive personal information, and breaches could have serious consequences. Security must be engineered at the core of AI systems, not added afterward, through robust encryption, federated learning, and strict access controls.

Bias in training datasets also threatens equity. If AI systems learn from narrow demographic pools, diagnostic errors may disproportionately affect marginalized groups who are underrepresented in medical data. Researchers are now promoting diverse, globally representative datasets and ongoing algorithm audits to uphold fairness.

Building trust between clinicians and AI systems is equally important. Doctors need to understand how an AI system reached its conclusion, a concept called explainability, so they can evaluate whether the recommendation makes clinical sense for their specific patient.

The Future: AI as a Clinical Partner in Rare Disease Detection

As research speeds forward and regulatory frameworks mature, AI's role in rare disease diagnosis will deepen. The convergence of AI with emerging technologies promises to turn diagnosis from a reactive process into a proactive one. For patients with rare diseases, this transformation could mean the difference between years of diagnostic uncertainty and early identification when treatment is most effective. For clinicians, AI offers a powerful tool to extend their expertise across conditions they may never encounter in their own practice. The key to success lies in building AI systems that are not just technically sophisticated, but also trustworthy, fair, and integrated into the real-world complexity of clinical care.
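
For readers who want to see what an algorithm audit for fairness might involve, the Python sketch below compares how often a model catches confirmed cases in each demographic group. It is a simplified illustration, not a method described by the researchers: the function names (sensitivity_by_group, groups_below), the 0.1 gap threshold, and the example records are assumptions made for this example.

```python
from collections import defaultdict

def sensitivity_by_group(records):
    """records: iterable of (group, has_disease, flagged_by_model) tuples.

    Returns, per group, the share of confirmed cases the model flagged.
    """
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for group, has_disease, flagged in records:
        if has_disease:
            positives[group] += 1
            if flagged:
                true_pos[group] += 1
    return {g: true_pos[g] / positives[g] for g in positives}

def groups_below(sensitivities, max_gap=0.1):
    """List groups whose sensitivity trails the best group by more than max_gap.

    The 0.1 default is an arbitrary illustrative threshold.
    """
    best = max(sensitivities.values())
    return sorted(g for g, s in sensitivities.items() if best - s > max_gap)

# Made-up audit log: group label, confirmed diagnosis, model flag.
audit_log = [
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", True, False),
    ("B", True, True), ("B", True, True), ("B", True, False), ("B", True, False),
    ("A", False, False), ("B", False, True),
]

rates = sensitivity_by_group(audit_log)
needs_review = groups_below(rates)
```

With this made-up log, the model catches three of four confirmed cases in group A but only two of four in group B, so group B is listed for review. A real audit would also look at false alarms and at whether each group has enough confirmed cases for the comparison to be meaningful.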