DATE: December 18, 2018
TIME: 10:00am PST, 1:00pm EST
Precision medicine anticipates the clinical application of whole-genome sequencing (WGS), as evidenced by the commitment of large-scale clinical programs such as All of Us to generating and reporting WGS data for millions of individuals. Executing the required data analysis, from sample accession to variant calling to interpretation and return of results, requires scientific innovation in clinically compliant environments at unprecedented scale. To fully leverage the value of WGS, variant detection methods must capture the full spectrum of genetic architecture across highly heterogeneous sample sets. Here we describe an informatics model for WGS at scale, drawing on methods hardened in existing large-scale programs that span many ethnicities, sequencing platforms, data quality levels, and experimental designs.
Experience with large-scale WGS has resulted in a collection of software methods, cloud-based protocols, and harmonized best practices for comprehensive variant data analysis across multiple experimental designs. The unified HGSC application includes comprehensive small-variant (xAtlas, GLnexus) and structural-variant (Parliament2, muCNV) calling via consensus methods optimized for the NIH-compliant GRCh38 protocol and container-based cloud deployment. Applying it to more than 50,000 whole genomes across dozens of experimental designs and ethnicities, we have characterized the impact of coverage depth, read quality, and sequencing platform on genomic variation assessment, harmonizing differences between HiSeq X and NovaSeq data. The result is a set of quality control “better” practices validated against high-confidence truth sets and genotype-phenotype association studies. Specifically, an association study of structural variants (SVs) and cardiovascular phenotypes in a multi-ethnic cohort of 35,000 suggests the need for comprehensive variant detection in the clinic.
These large-scale research efforts are joined with the HGSC Clinical Laboratory’s clinical infrastructure, including automated clinical reports (Neptune), HIPAA-compliant PHI intake and tracking (eDAP), and clinical-grade CNV calling (Atlas-CNV). This infrastructure has processed more than 35,000 clinical samples, returning automated reports for the ACMG 59 genes, pharmacogenomic regions, and cardiovascular risk score sites. For the All of Us program, the HGSC has merged these functionalities into a single clinical informatics environment that will support thousands of users and serve as a model for clinical population-scale WGS analysis.