Output files and formats

Brian Haas edited this page Jan 8, 2021 · 14 revisions

CTAT-mutations Output Files and Formats

The primary output files generated by the pipeline include the following:

  • variants.HC_init.wAnnot.vcf : the initially predicted variants
  • variants.HC_hard_cutoffs_applied.vcf : variants after applying hard cutoffs to remove likely false positives. The hard cutoffs applied via 'GATK VariantFiltration' are: "-window 35 -cluster 3 -filter FS > 30 -filter QD < 2.0 -filter SPLICEADJ < 3"
  • cancer.vcf : the subset of variants considered most relevant to cancer biology. These are selected based on the variant annotations, requiring gnomAD AF < 0.01 and at least one of the following: CHASM or VEST p-value < 0.05, FATHMM in ["CANCER", "PATHOGENIC"], or ClinVar =~ /pathogenic/i
  • igvjs_viewer.html : a self-contained web application for interactively navigating the cancer variants.
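
The selection rule for cancer.vcf can be sketched as a small predicate. This is a minimal illustration and not the pipeline's own code. It assumes the annotations for a variant have already been collected into a dictionary. The keys `gnomad_AF` and `clinvar` are hypothetical names chosen for this example. Treating a missing gnomAD allele frequency as "rare" is also an assumption.

```python
import re

GNOMAD_AF_MAX = 0.01          # gnomAD AF must be below this value
PVALUE_MAX = 0.05             # CHASM / VEST p-value cutoff
FATHMM_CALLS = {"CANCER", "PATHOGENIC"}
CLINVAR_PATTERN = re.compile(r"pathogenic", re.IGNORECASE)


def _as_float(value):
    """Convert an annotation value to float; return None if missing or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def is_cancer_relevant(annot):
    """Return True if a variant's annotations satisfy the cancer.vcf criteria.

    `annot` maps annotation names to values. The key names "gnomad_AF" and
    "clinvar" are assumptions made for this sketch.
    """
    af = _as_float(annot.get("gnomad_AF"))
    # Assumption: a variant with no gnomAD AF is treated as rare.
    if af is not None and af >= GNOMAD_AF_MAX:
        return False

    chasm = _as_float(annot.get("CHASM_PVALUE"))
    vest = _as_float(annot.get("VEST_PVALUE"))
    fathmm = str(annot.get("FATHMM", "")).upper()
    clinvar = str(annot.get("clinvar", ""))

    # At least one line of evidence must support cancer relevance.
    return (
        (chasm is not None and chasm < PVALUE_MAX)
        or (vest is not None and vest < PVALUE_MAX)
        or fathmm in FATHMM_CALLS
        or bool(CLINVAR_PATTERN.search(clinvar))
    )
```

For example, `is_cancer_relevant({"gnomad_AF": "0.001", "CHASM_PVALUE": "0.01"})` passes both the population-frequency and the evidence requirement. A common variant with `gnomad_AF` of 0.2 is rejected regardless of its other annotations.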

If the RVBLR boosting method is applied, the final variants file should appear as:

  • variants.HC_init.wAnnot.vcf.gz.RVBLR_min0.050.vcf.gz

The variant annotations and descriptions include:

| Column | Description |
|---|---|
| CHROM | Chromosome |
| POS | The 1-based position of the variation on the given sequence. |
| REF | Base(s) at position in the reference genome (hg38) |
| ALT | Alternate base(s) |
| GENE | Gene name |
| DP | Combined depth across samples |
| QUAL | A quality score associated with the inference of the given alleles. |
| MQ | RMS mapping quality |
| RNAEDIT | A known or predicted RNA-editing site |
| RPT | Repeat family from UCSC Genome Browser RepeatMasker annotations |
| SPLICEADJ | Variant is within specified distance of a reference exon splice boundary |
| FATHMM | FATHMM (Functional Analysis through Hidden Markov Models). 'Pathogenic': cancer or damaging; 'Neutral': passenger or tolerated. |
| CHASM_PVALUE | Empirical p-value (probability that passenger variant is misclassified as a driver). |
| CHASM_FDR | False discovery rate expected (Benjamini-Hochberg multiple testing correction). |
| VEST_PVALUE | Empirical p-value (probability that benign variant is misclassified as pathogenic). |
| VEST_FDR | Composite false discovery rate (Benjamini-Hochberg multiple testing correction) for non-silent variants in the gene combined with Stouffer’s Z-score method. |
| MuPIT | MuPIT 3D structure variant link |
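
To work with these columns programmatically, a VCF data line can be split into its fixed fields and its INFO entries. The sketch below uses only the Python standard library. It assumes the annotations listed above (GENE, DP, MQ, and so on) are stored as `key=value` entries in the INFO column, which should be checked against the header of the actual output file. The sample line in the usage note is invented for illustration.

```python
def parse_info(info_field):
    """Parse a VCF INFO string into a dict; flag entries (no '=') map to True."""
    info = {}
    if info_field in ("", "."):
        return info
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def parse_variant_line(line):
    """Parse one tab-delimited VCF data line into a flat dict.

    CHROM, POS, REF, ALT and QUAL come from the fixed VCF columns.
    Assumption: the remaining annotations are INFO keys, kept as strings.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]
    record = {
        "CHROM": chrom,
        "POS": int(pos),  # 1-based position
        "REF": ref,
        "ALT": alt,
        "QUAL": None if qual == "." else float(qual),
    }
    record.update(parse_info(info))
    return record
```

A usage example on a made-up line: `parse_variant_line("chr1\t12345\t.\tA\tG\t50.0\tPASS\tGENE=EXAMPLE;DP=20;MQ=60.0")` yields a dict with `POS` as an integer and `GENE`, `DP`, and `MQ` as strings. Header lines beginning with `#` should be skipped before calling it.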