Researchers at the Department of Energy's SLAC National Accelerator Laboratory have developed an AI model that can reconstruct the original 3D structure of molecules after they're blasted apart by powerful X-rays, solving a computational puzzle that has limited molecular imaging for decades. The breakthrough, published in Nature Communications, demonstrates how machine learning can tackle what scientists call "inverse problems," where you work backward from fragmented data to discover hidden structures.

What Makes This Molecular Imaging Technique Different?

The research focuses on a technique called Coulomb explosion imaging, which works by hitting a single molecule in a vacuum chamber with an X-ray pulse. This rips away the molecule's electrons, leaving behind positive ions that repel one another explosively and slam into a detector. The detector records each ion's momentum, which in principle can be used to reconstruct the original molecular structure.

The challenge has always been computational. "It's kind of like breaking a glass and trying to put it back together from how the pieces flew apart," explained Phay Ho, a physicist with the Department of Energy's Argonne National Laboratory and co-author of the study. "Many problems in modern physics and chemistry involve reconstructing hidden structures from indirect measurements. This work demonstrates how AI can help tackle such inverse problems."

Unlike electron microscopy, which requires samples to be fixed in place, or diffraction-based techniques, which need dense samples to generate strong detector signals, Coulomb explosion imaging can isolate individual molecules and capture chemically relevant details that would otherwise remain hidden.

How Did Researchers Train the AI to Solve This Problem?
The team, led by Xiang Li, an associate scientist at SLAC's Linac Coherent Light Source (LCLS), developed a generative AI model they named MOLEXA, short for "molecular structure reconstruction from Coulomb explosion imaging." The key to making it work was a two-step training approach that combined datasets of different sizes and precision levels.

First, the researchers generated training data using a physics simulation built by Ho. This simulation analyzed molecular structures and calculated the momentum of their ions following a Coulomb explosion. Running for over a month, the computing-intensive simulation produced a dataset of 76,000 molecular samples using both quantum mechanics and classical physics equations.

When the team initially trained MOLEXA on this dataset alone, the model predicted inaccurate structures. So they added a second, much larger dataset derived using only classical physics. Though less precise, this second dataset was roughly 100 times larger than the first one. "We found that this two-step training process suppressed the prediction error by a factor of two," Li noted.

- First Training Dataset: 76,000 molecular samples generated from a month-long quantum and classical physics simulation, providing high precision but limited volume
- Second Training Dataset: A classical physics-only dataset roughly 100 times larger than the first, offering broader coverage despite lower individual precision
- Combined Approach: The two-step training reduced prediction errors by 50 percent, enabling accurate molecular structure reconstruction
- Model Testing: MOLEXA was validated on experimental data from the European X-ray Free-Electron Laser facility, testing molecules including water, tetrafluoromethane, and ethanol

What Were the Real-World Results?

The team tested MOLEXA with experimental datasets recorded at the Small Quantum Systems instrument of the European X-ray Free-Electron Laser facility in Germany.
They entered ion momentum data from real experiments into the model, reconstructed molecular structures, and compared the results to known structures listed by the National Institute of Standards and Technology. The predictions largely overlapped with established structures. Overall, the bonds were in the right spots, with only slight variations in their angles. The errors in position were generally less than half the length of a typical chemical bond, and the model often did better than that.

How Could This Technology Transform Chemistry and Medicine?

The immediate plans focus on scaling up the technique. The current model works well for molecules made of fewer than ten atoms, but researchers plan to expand MOLEXA's capabilities to handle larger molecular systems. The real game-changer would be applying the model to time-resolved experiments, creating what Li calls "flip-book-like molecular movies" that show how chemical reactions unfold in real time.

"We will be able to study systems that are more biologically or industrially relevant," Li explained. Proteins, for instance, can consist of thousands of atoms, making them far more complex than the current test molecules. If researchers can overcome the challenge of reconstructing molecules from incomplete detector data, the technique could become widely applicable in biology and chemistry research.

The team is already testing whether MOLEXA can reconstruct molecules even when the detector misses some ions produced in the Coulomb explosion. This happens frequently in real experiments, so solving it would make the technique much more practical for everyday research.
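The accuracy check reported in the results above (position errors measured against half the length of a typical bond) can be made concrete with a small sketch. This is not the team's evaluation code. It assumes the atoms of the predicted and reference structures are listed in the same order, uses a standard Kabsch rigid alignment before measuring errors, and takes an approximate textbook geometry for water as the reference. The "predicted" structure is synthetic: the reference with a small hand-picked distortion, rotated and shifted. All function names are hypothetical.

```python
import numpy as np

# Approximate textbook geometry for water, used only as illustrative data.
BOND_LENGTH = 0.9572           # O-H bond length in angstroms
HOH_ANGLE = np.deg2rad(104.5)  # H-O-H angle

def water_reference():
    """Reference coordinates (O, H, H) in angstroms, lying in the xy-plane."""
    half = HOH_ANGLE / 2.0
    x, y = BOND_LENGTH * np.sin(half), BOND_LENGTH * np.cos(half)
    return np.array([[0.0, 0.0, 0.0], [x, y, 0.0], [-x, y, 0.0]])

def kabsch_align(pred, ref):
    """Rigidly rotate and translate pred (N x 3) onto ref (N x 3)."""
    pred_centroid, ref_centroid = pred.mean(axis=0), ref.mean(axis=0)
    p, q = pred - pred_centroid, ref - ref_centroid
    u, _, vt = np.linalg.svd(p.T @ q)
    # Force a proper rotation (determinant +1), never a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rot.T + ref_centroid

def per_atom_errors(pred, ref):
    """Distance between each aligned predicted atom and its reference atom."""
    return np.linalg.norm(kabsch_align(pred, ref) - ref, axis=1)

def rotation_z(theta):
    """Rotation matrix about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

reference = water_reference()

# Synthetic stand-in for a model prediction (NOT real MOLEXA output):
# the reference, slightly distorted, then rotated and shifted.
distortion = np.array([[0.02, -0.01, 0.00],
                       [-0.03, 0.04, 0.01],
                       [0.01, 0.02, -0.02]])
predicted = (reference + distortion) @ rotation_z(0.7).T + np.array([1.0, -2.0, 0.5])

errors = per_atom_errors(predicted, reference)
within_half_bond = bool(np.all(errors < 0.5 * BOND_LENGTH))
print("per-atom error (angstrom):", np.round(errors, 3))
print("all errors below half a bond length:", within_half_bond)
```

The alignment step matters because a reconstructed molecule can come out in any orientation. Without removing the overall rotation and shift first, even a perfect prediction would appear to have large position errors.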
Steps to Advance Molecular Imaging With AI

- Expand Atomic Scale: Scale the machine learning model to reconstruct molecules with significantly more atoms, moving beyond the current ten-atom limit toward biologically relevant proteins and complex compounds
- Implement Time-Resolved Imaging: Apply MOLEXA to time-resolved experiments at SLAC's Linac Coherent Light Source and the European XFEL to capture snapshots of molecules during active chemical reactions
- Handle Incomplete Data: Develop the model's ability to reconstruct accurate molecular structures even when detector systems miss some ions, improving real-world applicability in laboratory settings
- Integrate With High-Pulse-Rate Lasers: Adapt the technique to work with SLAC's superconducting X-ray laser, which delivers X-ray pulses at rates that current reconstruction methods struggle to interpret

This breakthrough represents a significant step forward in molecular imaging, but it also highlights a broader trend in scientific research. Across the Department of Energy's national laboratories, researchers are increasingly merging artificial intelligence with advanced instrumentation to push the boundaries of what's possible. At Oak Ridge National Laboratory, for example, scientists are combining AI, automation, and leading-edge microscopes to create autonomous workflows that enhance researchers' abilities to answer some of the toughest questions in science, from advances in computer memory to quantum technology.

The MOLEXA research demonstrates that when AI is applied thoughtfully to well-defined scientific problems, it can overcome computational barriers that have limited human researchers for decades. As these techniques mature and scale to larger molecular systems, they could fundamentally change how scientists study chemical reactions, design new materials, and develop new medicines.
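For readers who want a concrete picture of the forward problem that MOLEXA inverts (structure in, ion momenta out), the following is a minimal toy sketch of a purely classical Coulomb explosion. It is not the simulation built by Ho, which the article says also used quantum mechanics. The sketch assumes point charges that start at rest and are ionized instantly, works in atomic units, and uses hypothetical charge states (O 2+, H 1+, H 1+) and an approximate textbook geometry for water. All function names are illustrative.

```python
import numpy as np

ANGSTROM = 1.889726  # bohr per angstrom
AMU = 1822.888       # electron masses per atomic mass unit

def pair_terms(pos, charges):
    """Pairwise separation vectors, distances, and charge products."""
    diff = pos[:, None, :] - pos[None, :, :]   # diff[i, j] = r_i - r_j
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)             # removes self-interaction
    qq = charges[:, None] * charges[None, :]
    return diff, dist, qq

def coulomb_forces(pos, charges):
    """Coulomb force on each ion (atomic units, Coulomb constant = 1)."""
    diff, dist, qq = pair_terms(pos, charges)
    return np.sum(qq[:, :, None] * diff / dist[:, :, None] ** 3, axis=1)

def potential_energy(pos, charges):
    """Total Coulomb potential energy, counting each pair once."""
    _, dist, qq = pair_terms(pos, charges)
    return 0.5 * np.sum(qq / dist)

def simulate_explosion(pos, charges, masses, dt=1.0, steps=5000):
    """Velocity-Verlet integration; returns final positions and momenta."""
    pos = pos.astype(float).copy()
    vel = np.zeros_like(pos)
    acc = coulomb_forces(pos, charges) / masses[:, None]
    for _ in range(steps):
        pos += vel * dt + 0.5 * acc * dt ** 2
        new_acc = coulomb_forces(pos, charges) / masses[:, None]
        vel += 0.5 * (acc + new_acc) * dt
        acc = new_acc
    return pos, masses[:, None] * vel

# Water-like starting geometry (approximate textbook values), in bohr.
r = 0.9572 * ANGSTROM
half = np.deg2rad(104.5) / 2.0
positions = np.array([[0.0, 0.0, 0.0],
                      [r * np.sin(half), r * np.cos(half), 0.0],
                      [-r * np.sin(half), r * np.cos(half), 0.0]])
charges = np.array([2.0, 1.0, 1.0])               # hypothetical charge states
masses = np.array([15.999, 1.008, 1.008]) * AMU   # O, H, H

final_positions, momenta = simulate_explosion(positions, charges, masses)
print("final ion momenta (atomic units):")
print(momenta)
```

The detector in a real experiment records something like the `momenta` array. The inverse problem is to recover `positions` from it, which is the task the article describes MOLEXA learning from many simulated examples.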