mgborges/SequencingDepth


SequencingDepth

For a better interpretation of variants, evidence-based databases, such as ClinVar, compile data on the presumed relationships between variants and phenotypes. In this study, we analyzed the pattern of sequencing depth in variants from whole exome sequencing data in the 1000 Genomes Project phase 3, focusing on the variants present in the ClinVar database that were predicted to affect protein-coding regions. We demonstrate that the distribution of sequencing depth varies across sequencing centers (pair-wise comparison, p < 0.001). Most importantly, we found that the distribution pattern of sequencing depth is specific to each facility, making it possible to correctly assign 96.9% of the samples to their sequencing center. This indicates a systematic bias, related to the methods used in the different facilities, that generates significant variation in breadth and depth of coverage in whole exome sequencing data in clinically relevant regions. Our results show that methodological differences leading to significant heterogeneity in sequencing depth may influence the accuracy of genetic diagnosis. Furthermore, our findings highlight how challenging it still is to integrate results from different sequencing centers, which may also affect genomic research.
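To make the idea of "assigning samples to their sequencing center" concrete, here is a minimal sketch of one possible approach: a nearest-centroid classifier over per-sample depth profiles, scored by leave-one-out accuracy. This is a stand-in chosen for illustration; this README does not state which classification method produced the 96.9% figure, and the function names (`centroids`, `assign`, `leave_one_out_accuracy`) are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def centroids(profiles, labels):
    """Mean depth profile per sequencing center.

    profiles: list of equal-length lists (depth at each variant, one list per sample)
    labels:   list of center names, one per sample
    """
    grouped = defaultdict(list)
    for profile, label in zip(profiles, labels):
        grouped[label].append(profile)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def assign(profile, center_means):
    """Return the center whose mean depth profile is closest to this sample."""
    return min(center_means, key=lambda c: dist(profile, center_means[c]))


def leave_one_out_accuracy(profiles, labels):
    """Fraction of samples assigned to their true center when held out in turn."""
    correct = 0
    for i, (profile, label) in enumerate(zip(profiles, labels)):
        rest_profiles = profiles[:i] + profiles[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if assign(profile, centroids(rest_profiles, rest_labels)) == label:
            correct += 1
    return correct / len(profiles)
```

With real data, each profile would hold one sample's depth at each of the selected ClinVar variants, and each label would be that sample's sequencing center.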

S2 File

Variant call format (VCF) file containing 4,543 variants from ClinVar. We extracted 282,453 variants from ClinVar (build 20170801, GRCh37.p13) and annotated them with the Ensembl Variant Effect Predictor (VEP version 84) using the default parameters. Of these, 4,543 variants were classified as exonic and had a predicted impact on the transcripts (121 high, 2,166 moderate, 1,641 low, and 615 modifier).
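As an illustration of how such an impact breakdown can be tallied, the sketch below counts the most severe VEP impact per variant in a VCF. It assumes the file carries VEP annotations in a `CSQ` INFO field whose header declares an `IMPACT` sub-field; whether the S2 File is laid out this way is not stated here, and the function names are hypothetical.

```python
import re
from collections import Counter

# VEP impact levels, ordered from most to least severe.
SEVERITY = ["HIGH", "MODERATE", "LOW", "MODIFIER"]


def csq_fields(header_lines):
    """Return the CSQ sub-field names declared in the VCF header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ"):
            match = re.search(r'Format: ([^"]+)"', line)
            if match:
                return match.group(1).strip().split("|")
    raise ValueError("no CSQ header line found")


def tally_impacts(vcf_lines):
    """Count variants by their most severe annotated impact."""
    lines = list(vcf_lines)
    header = [line for line in lines if line.startswith("##")]
    impact_idx = csq_fields(header).index("IMPACT")

    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        csq = next((f[4:] for f in info.split(";") if f.startswith("CSQ=")), None)
        if csq is None:
            continue  # variant without VEP annotation
        # One comma-separated entry per transcript; collect their impacts.
        impacts = {entry.split("|")[impact_idx] for entry in csq.split(",")}
        for level in SEVERITY:
            if level in impacts:
                counts[level] += 1
                break
    return counts
```

Usage would look like `tally_impacts(open("annotated.vcf"))`, where `annotated.vcf` is a placeholder file name.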

S3 File

Distribution of sequencing depth and PCA for the different sequencing centers, as depicted in Fig 1 (HTML). The file contains the complete distribution of sequencing depth shown in Fig 1A and an interactive 3D version of Fig 1B. Best viewed in Google Chrome.

S4 File

Variation in depth across sequencing centers and coding impact data from Fig 2 (HTML). Heatmap showing the variation in depth across sequencing centers for the 450 variants with the highest variance. Each row represents a sample from one of the sequencing centers (BCM: Baylor College of Medicine; BI: Broad Institute; BGI; WUGC: Washington University Genome Center). Each column represents one variant, with its impact classified as high, moderate, low, or modifier. Best viewed in Google Chrome.
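Selecting the variants with the highest depth variance can be sketched as follows. This is an illustrative outline rather than the code used to build the heatmap: the input layout (a mapping from variant ID to per-sample depths) and the function name `top_variance_variants` are assumptions.

```python
from statistics import pvariance


def top_variance_variants(depth_by_variant, k=450):
    """Return the k variant IDs whose depth varies most across samples.

    depth_by_variant: dict mapping a variant ID to a list of sequencing
    depths, one value per sample (same sample order for every variant).
    """
    ranked = sorted(
        depth_by_variant,
        key=lambda variant: pvariance(depth_by_variant[variant]),
        reverse=True,
    )
    return ranked[:k]
```

The resulting variant IDs would then define the heatmap columns, with samples grouped by sequencing center along the rows.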
