Long-read Sequencing Revolutionizes Structural Variant Detection
Long-read sequencing technologies have fundamentally changed how we approach structural variant (SV) detection in the human genome. Short-read platforms struggle with repetitive regions and complex rearrangements, whereas technologies like PacBio HiFi and Oxford Nanopore (ONT) can often span an entire structural variant, together with its flanking sequence, in a single read.
Why Long Reads Matter for SVs
Traditional short-read sequencing (Illumina) produces reads of 150-300bp, which are often insufficient to resolve:
- Large insertions and deletions (>50bp)
- Inversions within repetitive regions
- Complex rearrangements involving multiple breakpoints
- Transposon insertions in heterochromatic regions
Long-read platforms routinely generate reads of 10-20kb (PacBio HiFi) or even >100kb (ONT ultra-long), enabling direct observation of these events.
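The read-length argument above can be made concrete with a small sketch. It is a simplification under stated assumptions, not how any aligner or caller decides: the function name `read_can_resolve` and the `min_anchor` flank size are hypothetical, and real anchoring also depends on how unique the flanking sequence is.

```python
def read_can_resolve(read_len: int, sv_len: int, sv_type: str,
                     min_anchor: int = 100) -> bool:
    """Rough check: could one read contain the event plus anchoring flanks?

    min_anchor is a hypothetical number of bases required on each side
    of the event for a confident alignment (an assumption of this sketch).
    """
    if sv_type == "INS":
        # The inserted sequence itself must sit inside the read,
        # with an anchor on both sides.
        return read_len >= sv_len + 2 * min_anchor
    if sv_type == "DEL":
        # The deleted sequence is absent from the read, so only the two
        # flanks around the junction have to fit.
        return read_len >= 2 * min_anchor
    raise ValueError(f"unsupported sv_type: {sv_type}")


# A 150 bp read cannot contain a 300 bp insertion; a 15 kb read can hold 6 kb.
short_read_ins = read_can_resolve(150, 300, "INS", min_anchor=50)
long_read_ins = read_can_resolve(15_000, 6_000, "INS")
```

Under this toy model a short read can still bridge a deletion junction, while any insertion longer than the read cannot be observed directly. That asymmetry is one reason large insertions are a particular weakness of short-read data.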
Applications in Neurodevelopmental Disorders
Our lab has been applying HiFi sequencing to characterize SVs at loci associated with autism spectrum disorder (ASD). Several key findings have emerged:
- De novo SVs are more common than previously estimated by short-read studies
- Complex SVs involving multiple breakpoints are frequently missed by conventional pipelines
- Repeat expansions at known neurodevelopmental disorder (NDD) loci can now be sized accurately
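To illustrate what sizing a repeat expansion from a single read involves, here is a minimal sketch that counts the longest uninterrupted run of a motif in a read sequence. The function name, the CGG motif, and the toy sequence are illustrative assumptions; dedicated tandem-repeat genotypers additionally handle interruptions, sequencing errors, and alignment to the locus, none of which is modeled here.

```python
def longest_repeat_run(seq: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in seq."""
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    best = 0
    # Try every start position; simple rather than fast.
    for start in range(len(seq) - k + 1):
        copies = 0
        pos = start
        while seq.startswith(motif, pos):
            copies += 1
            pos += k
        best = max(best, copies)
    return best


# Toy read: 30 CGG copies between two short, made-up flanks.
toy_read = "ACGT" + "CGG" * 30 + "TTGA"
copies_in_toy_read = longest_repeat_run(toy_read, "CGG")
```

The count is only meaningful when the read covers the whole repeat plus both flanks. A 200-copy trinucleotide repeat is 600 bp long, which already exceeds a 150-300bp short read but fits comfortably inside a 10-20kb HiFi read.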
Pipeline Recommendations
For researchers getting started with long-read SV calling, we recommend:
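# Align PacBio HiFi reads; --sort writes the coordinate-sorted BAM that pbsv expects
pbmm2 align ref.fa reads.bam aligned.bam --sort --preset CCS
# Discover SV signatures in the aligned reads
pbsv discover aligned.bam ref.svsig.gz
# Call SVs and assign genotypes from the signatures
# (pbsv also has HiFi-tuned parameter presets; check `pbsv call --help` for your version)
pbsv call ref.fa ref.svsig.gz output.vcf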
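Once a call set exists, a quick tally by SV type is a useful sanity check. The sketch below parses VCF text directly and relies only on the standard `SVTYPE` and `SVLEN` INFO keys; the function names, the 50 bp cutoff default, and the toy records are assumptions for illustration, not pbsv output.

```python
from collections import Counter


def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a key -> value dict (flags map to '')."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def summarize_svs(vcf_lines, min_len: int = 50) -> Counter:
    """Count records per SVTYPE, skipping events shorter than min_len."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        info = parse_info(cols[7])
        sv_type = info.get("SVTYPE")
        if not sv_type:
            continue  # not annotated as an SV
        sv_len = info.get("SVLEN")
        # SVLEN is negative for deletions and may be absent (e.g. breakends).
        if sv_len and abs(int(sv_len.split(",")[0])) < min_len:
            continue
        counts[sv_type] += 1
    return counts


# Toy records, made up for this example.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "chr1\t1000\tsv1\tA\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=1120;SVLEN=-120",
    "chr1\t5000\tsv2\tA\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=75",
    "chr2\t300\tsv3\tA\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=30",
    "chr3\t900\tsv4\tN\tN]chr5:100]\t.\tPASS\tSVTYPE=BND",
]
toy_counts = summarize_svs(toy_vcf)

# On real data, pass an open file handle instead of a list:
#   with open("output.vcf") as fh:
#       print(summarize_svs(fh))
```

Plain-text parsing keeps the example free of dependencies. For compressed or indexed VCFs, a dedicated VCF library would be the more robust choice.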
The combination of HiFi accuracy (>99.9% per read in typical runs) and long read lengths makes HiFi sequencing, in our view, the strongest current option for clinical SV detection.
As sequencing costs continue to drop, we expect long-read approaches to become standard in both research and clinical genomics settings.