How Long-Read Sequencing Is Unlocking Rare Disease Targets Hidden in Complex DNA

For decades, rare disease researchers have been blocked by a fundamental technical limitation: they couldn't read the parts of the genome that matter most. Short-read DNA sequencing, the industry standard, fragments genetic code into tiny pieces and reassembles them like a puzzle. But when those pieces fall into repetitive or duplicated regions, the puzzle falls apart. Now, advances in long-read sequencing technology are changing that equation, opening doors to precision therapeutics for conditions once considered untreatable.

Why Can't Traditional Sequencing Read Complex Genomic Regions?

The challenge lies in how short-read sequencing works. The technology breaks DNA into small fragments, sequences each one individually, then attempts to align them back to a reference genome. While this approach is powerful and scalable, the short read length makes it nearly impossible to reconstruct long repetitive stretches, distinguish between highly similar gene copies, or fully characterize structural variations. For rare disease research, this creates a critical gap: genuine pathogenic mechanisms are sometimes only partially characterized or misclassified entirely.

Long-read sequencing captures extended DNA molecules in single contiguous reads, allowing entire repeat regions to be sequenced without assembly. This delivers precise sizing and structural characterization that short-read methods cannot achieve. While long-read platforms were once considered costly and limited in throughput, newer generations now deliver higher output at substantially lower cost, making genomic resolution more accessible to research and translational settings.
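The mapping ambiguity described above can be shown with a toy Python sketch. The reference string, the read lengths, and the helper name find_placements are all invented for illustration. Real aligners tolerate mismatches and use indexed search rather than exact string matching, so treat this as a simplified picture under those assumptions, not a description of any actual pipeline.

```python
def find_placements(reference: str, read: str) -> list[int]:
    """Return every 0-based position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Search again from the next base so overlapping placements are found too.
        start = reference.find(read, start + 1)
    return positions


# Hypothetical reference: a unique flank, a 10-copy CAG repeat tract, a unique flank.
LEFT_FLANK = "ATGGCGTTA"
RIGHT_FLANK = "TTGACCGTA"
reference = LEFT_FLANK + "CAG" * 10 + RIGHT_FLANK

# A short read that lies entirely inside the repeat carries no unique context.
short_read = "CAGCAGCAG"
# A long read that spans the whole repeat plus part of both flanks.
long_read = reference[4:44]

short_hits = find_placements(reference, short_read)
long_hits = find_placements(reference, long_read)

print(f"short read fits at {len(short_hits)} positions: {short_hits}")
print(f"long read fits at {len(long_hits)} position: {long_hits}")
```

In this toy case the short read fits equally well at eight positions inside the repeat, while the long read, anchored by flanking sequence on both sides, has exactly one placement.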

What Three Types of Genomic Complexity Are Now Becoming Tractable?

As resolution improves, three categories of genomic complexity that have long hindered rare disease diagnostics and drug development are becoming solvable:

  • Repeat Expansions: Short DNA sequences repeated multiple times in succession. In healthy individuals, these repeats fall within a stable range, but when the number exceeds a pathogenic threshold, gene function is disrupted. Huntington's disease, caused by expansion of a CAG trinucleotide repeat within the HTT gene, is one of the most well-known examples. When this repeat grows beyond a defined limit, toxic protein aggregates form and drive neurodegeneration. Short-read sequencing struggles because fragmented reads cannot reliably span long, repetitive stretches, obscuring the true length and full sequence of the repeat.
  • Paralogous Genes: Duplicated copies of genes that share very similar DNA sequences, often sitting within segmental duplications that are difficult to analyze. Spinal muscular atrophy illustrates the challenge: the condition is caused by changes in the SMN1 gene, but a nearly identical copy, SMN2, sits alongside it. The number and integrity of SMN2 copies influence disease severity and treatment response. Short-read sequencing often cannot reliably assign variants to the correct paralog because fragmented reads lack sufficient context, introducing uncertainty in variant calling and copy number estimation.
  • Epigenetic Signals: Regulatory modifications, such as DNA methylation, that switch genes on or off without changing the DNA sequence itself. When these regulatory signals go awry, the consequences can be as severe as a mutation. Traditional sequencing approaches focus primarily on identifying sequence variants and often require separate assays to detect epigenetic changes, leaving important regulatory mechanisms unexplored. Long-read sequencing enables the detection of methylation patterns directly from native DNA alongside sequence information, providing an integrated view that captures both genetic variants and their regulatory context in a single experiment.
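To make the repeat-expansion point concrete, here is a minimal Python sketch of repeat sizing. It assumes exact, error-free reads; the flanking sequences, the 60-copy allele, the 150-base read length, and the function name longest_repeat_run are illustrative inventions, not real HTT sequence or a clinical method.

```python
import re


def longest_repeat_run(read: str, motif: str = "CAG") -> int:
    """Return the number of motif copies in the longest uninterrupted run in the read."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    run_lengths = [m.end() - m.start() for m in pattern.finditer(read)]
    return max(run_lengths, default=0) // len(motif)


# Hypothetical allele: 60 CAG copies (180 bases) between two invented unique flanks.
LEFT_FLANK = "ATGGCGTTA"
RIGHT_FLANK = "TTGACCGTA"
repeat_tract = "CAG" * 60

# A long read covers both flanks, so the whole tract is observed in one piece.
long_read = LEFT_FLANK + repeat_tract + RIGHT_FLANK

# A 150-base read that falls entirely inside the tract sees only part of it.
short_read = repeat_tract[:150]

long_count = longest_repeat_run(long_read)
short_count = longest_repeat_run(short_read)

print(f"long read: {long_count} copies (tract fully spanned)")
print(f"short read: at least {short_count} copies (tract not spanned)")
```

The short read can only report a lower bound, because the repeat runs off both ends of the read. The long read reaches unique sequence on each side, so its count is the full length of the tract.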

How Is This Technology Reshaping Drug Development for Rare Diseases?

The real-world impact became visible in 2025 with a milestone therapeutic development for Huntington's disease. A new therapy demonstrated sustained slowing of disease progression, a breakthrough long considered impossible due to the underlying repeat expansion. This success illustrates what becomes possible when complex genomic architecture is defined with precision. Greater clarity at the sequence level enables new therapeutic strategies to be designed, strengthening the link between genetic insight and clinical intervention. Regions once regarded as technically out of reach are now entering the arena of viable target discovery, reshaping how rare disease biology is translated into precision therapeutics.

For conditions like spinal muscular atrophy, long-read sequencing provides extended DNA reads that span distinguishing regions, enabling accurate phasing and copy number assessment within duplicated loci. For drug discovery, that clarity turns paralogous regions from technical obstacles into defined and potentially viable therapeutic targets. Similarly, by capturing epigenetic signals together with sequence data, researchers move closer to defining targets based on function rather than sequence alone.
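The paralog-assignment point can be sketched in Python as follows. The two sequences are invented toy strings that differ at a single base, standing in for a pair of near-identical gene copies; they are not real SMN1 or SMN2 sequence, and the names copy_A, copy_B, and assign_paralog are hypothetical. Exact substring matching is a simplifying assumption.

```python
def assign_paralog(read: str, paralogs: dict[str, str]) -> list[str]:
    """Return the name of every paralog that contains the read as an exact substring."""
    return [name for name, sequence in paralogs.items() if read in sequence]


# Invented toy sequences: identical except for one distinguishing base in the middle.
SHARED_LEFT = "ACGTTGCAAGGCTTAACCGG"
SHARED_RIGHT = "TTAGGCATCGATCCGATAGC"
paralogs = {
    "copy_A": SHARED_LEFT + "C" + SHARED_RIGHT,
    "copy_B": SHARED_LEFT + "T" + SHARED_RIGHT,
}

# A short read that ends before the distinguishing base.
short_read = SHARED_LEFT[:12]
# A longer read that spans the distinguishing base with context on both sides.
long_read = SHARED_LEFT[5:] + "C" + SHARED_RIGHT[:10]

short_assignment = assign_paralog(short_read, paralogs)
long_assignment = assign_paralog(long_read, paralogs)

print(f"short read matches: {short_assignment}")
print(f"long read matches: {long_assignment}")
```

The short read is compatible with both copies, so any variant it carries cannot be assigned with confidence. The read that spans the distinguishing base matches only one copy, which is the property that makes phasing and copy number assessment possible in duplicated loci.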

What Role Does AI Play in Bridging Genomics and Clinical Medicine?

While advances in sequencing technology are critical, another bottleneck remains: translating genomic insights into clinical practice. Carlos Bustamante, a population geneticist who recently joined Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) as Adjunct Professor of Personalized Medicine, has spent his career at the intersection of genomics, statistics, and, increasingly, artificial intelligence. He notes that technological advances have outpaced the healthcare system that stands to benefit from them.

"Healthcare today is still in many ways what it has been for the past 100 years. It's reactive and based on people going to see their physicians after they become sick," said Carlos Bustamante.

The gap is substantial. Sequencing technologies can read more than 90% of the human genome, but the fraction of genes that practitioners know how to meaningfully interpret is much smaller. Of the approximately 20,000 genes in the human genome, clinical geneticists have ordered tests on perhaps 1,500 to 2,000 at most, or roughly 7.5 to 10 percent. For many of these genes, the ability to distinguish harmful mutations from benign ones remains incomplete.

Bustamante sees artificial intelligence as playing a critical role in medicine, particularly in integrating and interpreting disparate sources of data. Today's health records capture episodic data, such as annual checkups and lab work, and miss the continuous, accumulating picture of how a patient is doing over time. Smart watches and other wearables can capture much of this information, but the data is scattered across consumer platforms and remains largely unused in clinical settings. AI can help bridge that gap, and can also speed pre-clinical development of new therapies and help determine who might benefit most from certain medications.

Steps to Translate Genomic Discoveries Into Clinical Practice

  • Resolve Complex Genomic Architecture: Use long-read sequencing to precisely characterize repeat expansions, paralogous genes, and epigenetic signals that short-read methods cannot reliably interpret, creating a foundation for target discovery and therapeutic development.
  • Integrate Disparate Data Sources: Deploy artificial intelligence systems to combine episodic health records, continuous wearable data, and genomic information into a unified clinical picture that clinicians can act upon in real time.
  • Expand Interpretable Gene Panels: Work to increase the number of genes for which practitioners can confidently distinguish pathogenic variants from benign ones, moving beyond the 1,500 to 2,000 genes for which clinical tests are currently ordered.
  • Develop Precision Therapeutics: Use clarified genomic targets to design targeted interventions, such as antisense oligonucleotide therapies for repeat-driven disorders, that address the underlying biological mechanism rather than symptoms alone.

The convergence of long-read sequencing, artificial intelligence, and a renewed focus on translational medicine is creating a new era in rare disease therapeutics. As Bustamante notes, the future of precision medicine is here, but it is not yet evenly distributed. The challenge now is to accelerate the adoption of these technologies in healthcare systems worldwide, ensuring that genomic insights translate into meaningful clinical benefit for patients.