How AI Just Cracked the Code on Finding New Drugs Faster
Researchers at Michigan State University have developed an artificial intelligence system that predicts how chemical compounds will influence gene expression based solely on their molecular structure, an approach that could substantially speed up drug discovery. The team trained a machine learning model called GPS (Gene expression profile Predictor on chemical Structures) on millions of experimental measurements, then used it to identify promising therapeutic candidates for two diseases: hepatocellular carcinoma, the third leading cause of cancer-related death worldwide, and idiopathic pulmonary fibrosis, a chronic lung disease with a median survival of three years after diagnosis.
How Does This AI Approach Actually Work?
The GPS system operates by learning patterns from enormous amounts of published biological data. Rather than requiring researchers to manually test millions of chemicals against hundreds or thousands of genes, the AI can predict outcomes based on chemical structure alone. The process mirrors how neural networks learn to classify images as cats or dogs, except the classification problem is biological rather than visual.
"In our approach, instead of looking at cats or dogs, we want to know whether the compound is either going to regulate up or down the expression of a specific gene. It's still a classification problem, but more biologically driven," said Bin Chen, associate professor at the College of Human Medicine in the departments of Pediatrics and Human Development and Pharmacology and Toxicology.
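The classification framing described above can be illustrated with a toy sketch. This is not the GPS model: the four-bit "fingerprints", the labels, and the function names below are all made up for illustration, and a plain logistic regression stands in for whatever architecture the team actually used. It only shows the shape of the problem, in which a structural encoding goes in and a probability that one gene is regulated up comes out.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_up_probability(weights, bias, fingerprint):
    """Probability that the compound regulates the gene up (label 1)."""
    score = bias + sum(w * bit for w, bit in zip(weights, fingerprint))
    return sigmoid(score)


def train_logistic(fingerprints, labels, epochs=500, learning_rate=0.5):
    """Fit a logistic regression with simple stochastic gradient descent."""
    weights = [0.0] * len(fingerprints[0])
    bias = 0.0
    for _ in range(epochs):
        for fingerprint, label in zip(fingerprints, labels):
            error = predict_up_probability(weights, bias, fingerprint) - label
            weights = [
                w - learning_rate * error * bit
                for w, bit in zip(weights, fingerprint)
            ]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical data: each row is an invented 4-bit structural fingerprint.
# Labels are 1 (gene regulated up) or 0 (gene regulated down).
TOY_FINGERPRINTS = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
TOY_LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    weights, bias = train_logistic(TOY_FINGERPRINTS, TOY_LABELS)
    for fingerprint, label in zip(TOY_FINGERPRINTS, TOY_LABELS):
        probability = predict_up_probability(weights, bias, fingerprint)
        print(fingerprint, "label:", label, "P(up):", round(probability, 3))
```

In a real setting the model would make one such prediction per gene, for hundreds or thousands of genes, and the inputs would be far richer structural encodings than four bits.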
One of the key challenges the team overcame was handling messy biological data. Unlike clean datasets, biological information often contains fuzzy signals and potentially misleading examples. Jiayu Zhou, formerly at MSU and now at the University of Michigan, helped develop an approach that allows the model to separate strong signals from weak ones, preventing noise from throwing off the learning process.
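The article does not say how the team separates strong signals from weak ones. One common way to keep noisy examples from dominating training is to weight each example by how strong its measured signal is. The sketch below assumes that idea purely for illustration; the z-score input, the cutoff of 2.0, and both function names are hypothetical and are not taken from the study.

```python
import math


def label_and_weight(z_score, strong_cutoff=2.0):
    """Turn one expression measurement into a training label and a weight.

    The label is 1 for up-regulation and 0 for down-regulation. The weight
    grows with the magnitude of the signal and is capped at 1.0, so weak,
    ambiguous measurements count for less. The cutoff is an assumed value.
    """
    label = 1 if z_score > 0 else 0
    weight = min(abs(z_score) / strong_cutoff, 1.0)
    return label, weight


def weighted_log_loss(probabilities, labels, weights):
    """Average cross-entropy in which each example counts by its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    total = 0.0
    for p, y, w in zip(probabilities, labels, weights):
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() away from zero
        total -= w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / total_weight


if __name__ == "__main__":
    z_scores = [3.0, -2.5, 0.2]  # the last measurement is a weak signal
    pairs = [label_and_weight(z) for z in z_scores]
    labels = [label for label, _ in pairs]
    weights = [weight for _, weight in pairs]
    predictions = [0.9, 0.1, 0.1]  # the model disagrees with the weak label
    print("weighted loss:  ", weighted_log_loss(predictions, labels, weights))
    print("unweighted loss:", weighted_log_loss(predictions, labels, [1, 1, 1]))
```

Under this scheme, a model that disagrees with a weak, possibly misleading measurement is penalized much less than one that disagrees with a strong one, which is one way noise can be kept from steering the learning process.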
What Results Did the Study Actually Produce?
The research, published in the journal Cell, involved over 20 researchers across multiple disciplines and institutes. When the team tested their AI-discovered compounds in real-world laboratory settings, the results were promising. For hepatocellular carcinoma, two new compounds reduced tumor size when tested on mice. For idiopathic pulmonary fibrosis, researchers identified one repurposed drug and two new compounds that showed promise in both mouse models and samples of human lung tissue.
The human lung tissue testing was particularly significant because the samples came from Corewell Health's lung transplant program in Grand Rapids, Michigan, the busiest such program in the state. Because pulmonary fibrosis is the leading indication for lung transplants, the program had access to explanted lung tissue that researchers could test as live cultures.
"I think this is the best way to advance medical knowledge, for clinicians to work side by side with biologists, and now, computational people. That is really key to advance research," said Reda Girgis, medical director of the transplant program and a pulmonologist.
Steps to Validate AI-Discovered Drugs for Clinical Use
- Computational Prediction: The GPS model predicts how chemical compounds will influence gene expression based on molecular structure, narrowing down millions of possibilities to the most promising candidates.
- Chemical Synthesis and Optimization: Once identified, compounds must be created in the laboratory and optimized into safe and effective drugs, a critical step that requires expertise in medicinal chemistry.
- Cell Line Testing: Compounds are tested on cell lines in the laboratory to confirm their influence on genes and identify leading candidates for further testing.
- Animal Model Validation: Promising compounds are tested in living organisms, typically mice, to assess their effectiveness and safety in a biological system.
- Human Tissue Testing: For certain diseases, compounds can be tested on human tissue samples, providing additional validation before moving toward clinical trials.
Edmund Ellsworth, director of the MSU Medicinal Chemistry Facility and a professor in the Department of Pharmacology and Toxicology, emphasized that this validation phase represents just the beginning of a complex process. He noted that drug discovery requires diverse expertise and persistence to overcome the many obstacles that arise during development.
"To move forward, it must be recognized that drug discovery is a team sport, and not for the faint of heart. It's complicated, all sorts of things happen, and you need the diversity of experts to overcome and be successful," said Ellsworth.
Why Does This Matter for Future Drug Development?
Traditional drug discovery has struggled with both of the diseases targeted in this study. For hepatocellular carcinoma, the incidence continues to increase in the United States, creating an urgent need for novel and more effective compounds that can address the molecular diversity of the disease. For idiopathic pulmonary fibrosis, there have been numerous failures to identify new drugs over the past 20 years, making any new approach potentially transformative.
Xiaopeng Li, associate professor in the Department of Pediatrics and Human Development at MSU, noted that the AI component helped researchers probe the problem differently and more systematically than previous approaches. The ability to examine how chemicals influence thousands of genes simultaneously represents a fundamental shift in how drug discovery can be conducted.
The research team has made their approach accessible to the broader scientific community by sharing their code and developing a web portal for researchers to use GPS for virtual compound screening. This democratization of the technology could accelerate drug discovery efforts across multiple disease areas beyond the two diseases studied in this research.
"I think this already has been proved that this platform can be applied to two totally different diseases. So this platform can be used for other diseases, to just unleash the potential," said Bin Chen.
The research was supported by the National Institutes of Health, the National Science Foundation, a Michigan State University Strategic Partnership Grant, Corewell Health-Michigan State University Alliance Corporation, and several foundations focused on liver cancer research. The interdisciplinary collaboration that produced these results demonstrates how combining computational methods, bench science, and clinical expertise can accelerate the path from molecular discovery to potential therapeutic benefit.