
Output files and formats

Brian Haas edited this page Oct 18, 2019

CTAT-mutations Output Files and Formats

The primary output files generated by the pipeline include the following:

  • variants.HC_init.wAnnot.vcf : the initially predicted variants, with annotations.
  • variants.HC_hard_cutoffs_applied.vcf : the variants that remain after hard cutoffs are applied to remove likely false positives.
  • cancer.vcf : the subset of variants considered most relevant to cancer biology.
  • igvjs_viewer.html : a self-contained web application for interactively navigating the cancer variants.
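Because each file is a filtered subset of the previous one, a quick way to see how many calls survive each stage is to count the data records (non-header lines) in each VCF. The sketch below assumes the three VCFs sit together in a single output directory under the file names listed above. It also accepts a gzip-compressed `.vcf.gz` variant of each name, in case your run compresses them (an assumption, not something this page states). The function names are hypothetical helpers, not part of CTAT-mutations.

```python
import gzip
from pathlib import Path
from typing import Dict

# Stage files named in the list above, in pipeline order.
STAGES = [
    "variants.HC_init.wAnnot.vcf",
    "variants.HC_hard_cutoffs_applied.vcf",
    "cancer.vcf",
]


def count_vcf_records(path: Path) -> int:
    """Count data lines (records) in a VCF, skipping '#' header lines."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


def summarize_stages(outdir: str) -> Dict[str, int]:
    """Return {stage file name: record count} for each stage file found.

    Looks for the plain name first, then a '.gz' version (assumed layout).
    """
    counts: Dict[str, int] = {}
    for name in STAGES:
        for candidate in (Path(outdir) / name, Path(outdir) / (name + ".gz")):
            if candidate.exists():
                counts[name] = count_vcf_records(candidate)
                break
    return counts
```

For example, `summarize_stages("ctat_mutations_out")` (a hypothetical directory name) returns one count per stage file it finds. This shows how many calls the hard cutoffs and the cancer-relevance selection removed.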

The variant fields and annotations, with their descriptions, include:

Column        Description
CHROM         Chromosome
POS           1-based position of the variant on the chromosome
REF           Reference base(s) at this position in the reference genome (hg38)
ALT           Alternate base(s)
GENE          Gene name
DP            Combined depth across samples
QUAL          Quality score associated with the inference of the given alleles
MQ            Root-mean-square (RMS) mapping quality
RNAEDIT       Known or predicted RNA-editing site
RPT           Repeat family from the UCSC Genome Browser RepeatMasker annotations
SPLICEADJ     Variant lies within a specified distance of a reference exon splice boundary
FATHMM        FATHMM (Functional Analysis through Hidden Markov Models) prediction: 'Pathogenic' = cancer or damaging; 'Neutral' = passenger or tolerated
CHASM_PVALUE  CHASM empirical p-value (probability that a passenger variant is misclassified as a driver)
CHASM_FDR     CHASM expected false discovery rate (Benjamini-Hochberg multiple-testing correction)
VEST_PVALUE   VEST empirical p-value (probability that a benign variant is misclassified as pathogenic)
VEST_FDR      VEST composite false discovery rate (Benjamini-Hochberg multiple-testing correction) for the gene's non-silent variants, combined using Stouffer's Z-score method
MuPIT         Link to the variant in the MuPIT 3D protein-structure viewer
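CHROM, POS, REF, ALT and QUAL are standard fixed VCF columns. The sketch below also assumes that the remaining annotations in the table are stored as INFO keys spelled exactly as shown, such as `GENE=...;CHASM_FDR=...`, and that numeric values may be comma-separated when several apply. In that case it takes the smallest value. The sketch parses one record into a dictionary and applies an illustrative shortlist filter. The filter keeps a variant if CHASM_FDR or VEST_FDR is at or below a cutoff, or if the FATHMM value contains the 'Pathogenic' label. The function names and the 0.3 cutoff are hypothetical choices, not pipeline defaults.

```python
from typing import Dict, Optional

# The eight fixed leading columns of a VCF data line.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_record(line: str) -> Dict[str, object]:
    """Split one tab-delimited VCF data line into fixed columns plus INFO keys."""
    fields = line.rstrip("\n").split("\t")
    record: Dict[str, object] = dict(zip(FIXED_COLUMNS, fields[:8]))
    record["POS"] = int(fields[1])  # VCF positions are 1-based
    info: Dict[str, object] = {}
    if fields[7] != ".":
        for item in fields[7].split(";"):
            key, sep, value = item.partition("=")
            # Flag-style INFO keys (no '=') are stored as True.
            info[key] = value if sep else True
    record["INFO"] = info
    return record


def min_numeric(value: object) -> Optional[float]:
    """Smallest number in a possibly comma-separated INFO value, or None."""
    if not isinstance(value, str):
        return None
    numbers = []
    for part in value.split(","):
        try:
            numbers.append(float(part))
        except ValueError:
            continue  # skip '.' or other non-numeric placeholders
    return min(numbers) if numbers else None


def is_candidate_driver(record: Dict[str, object], fdr_cutoff: float = 0.3) -> bool:
    """Hypothetical shortlist filter on the CHASM/VEST FDRs and the FATHMM label."""
    info = record["INFO"]
    assert isinstance(info, dict)
    for key in ("CHASM_FDR", "VEST_FDR"):
        fdr = min_numeric(info.get(key))
        if fdr is not None and fdr <= fdr_cutoff:
            return True
    fathmm = info.get("FATHMM")
    # Case-insensitive match, since the exact spelling in the files is assumed.
    return isinstance(fathmm, str) and "PATHOGENIC" in fathmm.upper()
```

For instance, you could run `parse_vcf_record` over each non-'#' line of cancer.vcf and keep the records for which `is_candidate_driver` returns True. That produces a quick shortlist. The cutoff is a knob to tune to your tolerance for false positives, not a recommended value.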