Output files and formats
Brian Haas edited this page Jan 8, 2021 · 14 revisions
The primary output files generated by the pipeline include the following:
- variants.HC_init.wAnnot.vcf : the initially predicted variants
- variants.HC_hard_cutoffs_applied.vcf : variants after applying hard cutoffs to remove likely false positives. The hard cutoffs are applied via GATK VariantFiltration with the arguments: `-window 35 -cluster 3 -filter FS > 30 -filter QD < 2.0 -filter SPLICEADJ < 3`
- cancer.vcf : the subset of variants considered most relevant to cancer biology. They are selected from the variant annotations by requiring gnomAD AF < 0.01 and at least one of: a CHASM or VEST p-value < 0.05, a FATHMM prediction in ["CANCER", "PATHOGENIC"], or a ClinVar annotation matching /pathogenic/i.
- igvjs_viewer.html : a self-contained web application for interactively navigating the cancer variants.
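The hard cutoffs listed above are applied by GATK VariantFiltration, which marks failing records in the FILTER column. As a rough illustration of what those expressions test, here is a minimal Python sketch that evaluates the same thresholds on a parsed INFO field. It assumes FS, QD and SPLICEADJ are numeric INFO values, treats a missing value as passing (GATK's default handling of missing values), and only approximates the `-window 35 -cluster 3` rule as "3 variants spanning at most 35 bases on one chromosome". The function names are hypothetical; this is not the pipeline's own code.

```python
from typing import Dict, List, Optional, Set

# Thresholds copied from the VariantFiltration arguments above.
FS_MAX = 30.0        # "FS > 30" fails the record
QD_MIN = 2.0         # "QD < 2.0" fails the record
SPLICEADJ_MIN = 3.0  # "SPLICEADJ < 3" fails the record
CLUSTER_SIZE = 3     # -cluster 3
CLUSTER_WINDOW = 35  # -window 35


def parse_info(info: str) -> Dict[str, Optional[str]]:
    """Split a VCF INFO string such as 'FS=1.2;QD=10;RNAEDIT' into a dict."""
    fields: Dict[str, Optional[str]] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else None  # flag fields carry no value
    return fields


def failed_hard_filters(info: Dict[str, Optional[str]]) -> List[str]:
    """Return which of FS, QD, SPLICEADJ fail their cutoff for one record.

    Missing or non-numeric values are treated as passing (assumption).
    """
    def num(key: str) -> Optional[float]:
        try:
            return float(info[key])
        except (KeyError, TypeError, ValueError):
            return None

    failed = []
    fs, qd, sa = num("FS"), num("QD"), num("SPLICEADJ")
    if fs is not None and fs > FS_MAX:
        failed.append("FS")
    if qd is not None and qd < QD_MIN:
        failed.append("QD")
    if sa is not None and sa < SPLICEADJ_MIN:
        failed.append("SPLICEADJ")
    return failed


def clustered_positions(positions: List[int]) -> Set[int]:
    """Approximate the cluster filter for positions on one chromosome:
    flag every run of CLUSTER_SIZE consecutive variants spanning at most
    CLUSTER_WINDOW bases."""
    pos = sorted(positions)
    flagged: Set[int] = set()
    for i in range(len(pos) - CLUSTER_SIZE + 1):
        if pos[i + CLUSTER_SIZE - 1] - pos[i] <= CLUSTER_WINDOW:
            flagged.update(pos[i:i + CLUSTER_SIZE])
    return flagged
```

For example, `failed_hard_filters(parse_info("FS=45.1;QD=12.3;SPLICEADJ=1"))` returns `['FS', 'SPLICEADJ']`, and `clustered_positions([100, 110, 120, 500])` flags 100, 110 and 120 but not 500.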
If the RVBLR boosting method is applied, the final variants file should appear as:
- variants.HC_init.wAnnot.vcf.gz.RVBLR_min0.050.vcf.gz
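The RVBLR output is compressed. Files ending in `.vcf.gz` are normally bgzip-compressed, and bgzip output is valid multi-member gzip, so Python's standard `gzip` module can read it. A minimal sketch, using only the standard library, that streams the data lines and tallies the FILTER column; the helper names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterator, List


def iter_vcf_records(path: str) -> Iterator[List[str]]:
    """Yield each data line of a (b)gzipped VCF as a list of tab-separated columns."""
    with gzip.open(path, "rt") as handle:  # "rt": decompress and decode as text
        for line in handle:
            if line.startswith("#"):  # skip ## meta lines and the #CHROM header
                continue
            yield line.rstrip("\n").split("\t")


def filter_counts(path: str) -> Counter:
    """Count records per FILTER value (7th VCF column, index 6)."""
    return Counter(rec[6] for rec in iter_vcf_records(path))
```

For example, `filter_counts("variants.HC_init.wAnnot.vcf.gz.RVBLR_min0.050.vcf.gz")` tallies the FILTER values in the RVBLR output, assuming the file is in the working directory.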
The variant annotations and descriptions include:
Column | Description |
---|---|
CHROM | Chromosome |
POS | The 1-based position of the variation on the given sequence. |
REF | Base(s) at position in the reference genome (hg38) |
ALT | Alternate base(s) |
GENE | Gene name |
DP | Combined depth across samples |
QUAL | A quality score associated with the inference of the given alleles. |
MQ | Root mean square (RMS) mapping quality |
RNAEDIT | A known or predicted RNA-editing site |
RPT | Repeat family from the UCSC Genome Browser RepeatMasker annotations |
SPLICEADJ | Variant is within a specified distance of a reference exon splice boundary |
FATHMM | FATHMM (Functional Analysis through Hidden Markov Models) prediction. 'Pathogenic': cancer or damaging; 'Neutral': passenger or tolerated. |
CHASM_PVALUE | Empirical p-value (probability that passenger variant is misclassified as a driver). |
CHASM_FDR | Expected false discovery rate (Benjamini-Hochberg multiple testing correction). |
VEST_PVALUE | Empirical p-value (probability that benign variant is misclassified as pathogenic). |
VEST_FDR | Composite false discovery rate (Benjamini-Hochberg multiple testing correction) for non-silent variants in the gene, combined with Stouffer's Z-score method. |
MuPIT | MuPIT 3D structure variant link |
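The `cancer.vcf` selection rule described earlier combines several of these annotations. A minimal Python sketch of that rule for one record, assuming its INFO values are already parsed into a dict. CHASM_PVALUE, VEST_PVALUE and FATHMM are column names from the table above; the gnomAD and ClinVar keys used here (`GNOMAD_AF`, `CLINVAR_SIG`) are hypothetical placeholders, so check the `##INFO` header lines of your own output for the real IDs. Treating a missing gnomAD AF as rare and comparing FATHMM case-insensitively are also assumptions.

```python
from typing import Mapping, Optional

# Hypothetical INFO keys; confirm the real IDs in your VCF header.
GNOMAD_AF_KEY = "GNOMAD_AF"
CLINVAR_KEY = "CLINVAR_SIG"


def _as_float(value: Optional[str]) -> Optional[float]:
    """Convert an INFO value to float, or None if missing or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def is_cancer_relevant(info: Mapping[str, Optional[str]]) -> bool:
    """gnomAD AF < 0.01 AND (CHASM or VEST p < 0.05, OR FATHMM in
    CANCER/PATHOGENIC, OR ClinVar matching /pathogenic/i)."""
    af = _as_float(info.get(GNOMAD_AF_KEY))
    if af is not None and af >= 0.01:  # common in gnomAD: excluded
        return False                   # (a missing AF is assumed rare)

    pvals = [_as_float(info.get(k)) for k in ("CHASM_PVALUE", "VEST_PVALUE")]
    if any(p is not None and p < 0.05 for p in pvals):
        return True

    # Case-insensitive comparison is an assumption.
    if (info.get("FATHMM") or "").upper() in ("CANCER", "PATHOGENIC"):
        return True

    # Substring test mirrors the case-insensitive /pathogenic/i match.
    return "pathogenic" in (info.get(CLINVAR_KEY) or "").lower()
```

Because the p-value, FATHMM and ClinVar criteria are alternatives, the function returns as soon as any one of them is satisfied, after the gnomAD frequency check has passed.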
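CHASM_FDR and VEST_FDR are described as Benjamini-Hochberg corrected values. For readers unfamiliar with the procedure, here is a minimal Python sketch of standard BH adjusted p-values. It illustrates the general method only and is not the code that produced these annotations; the gene-level Stouffer's Z-score combination used for VEST_FDR is not shown.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the original input order.

    q_(j) = min over k >= j of (p_(k) * m / k), capped at 1, where p_(k) is
    the k-th smallest p-value (1-based rank) and m is the number of tests.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps every q-value at 1
    # Walk from the largest p-value down so each q is a running minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For instance, with p-values `[0.01, 0.04, 0.03, 0.2]` the adjusted values are approximately `[0.04, 0.0533, 0.0533, 0.2]`: the running minimum lets the 0.03 test inherit the smaller adjusted value of the 0.04 test ranked above it.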