
Output files and formats

M. Brown edited this page Feb 12, 2019 · 14 revisions

CTAT-mutations Output Files and Formats

The column headers and their descriptions are as follows:

Column Description
CHROM Chromosome
POS The 1-based position of the variation on the given sequence.
REF Base(s) at position in the reference genome (hg38)
ALT Alternate base(s)
GENE Gene name
DP Combined depth across samples
QUAL A quality score associated with the inference of the given alleles.
MQ RMS mapping quality
SAO An integer that indicates variant allele origin. The accepted values for this tag are: 0 - unspecified, 1 - Germline, 2 - Somatic, 3 - Both. Note: "SAO" is "SNP Allele Origin". We have changed "SNP" to the more inclusive term "variant"; the tag "SAO" remains in the vcf files.
NSF The consequence of the variation is a non-synonymous frameshift -- a coding region variation where one allele in the set changes all downstream amino acids.
NSM The consequence of the variation is a non-synonymous (missense) change -- it is a coding region variation where one allele in the set changes the amino acid, but translation continues.
NSN The consequence of the variation is a non-synonymous stop codon (nonsense) -- it is a coding region variation where one allele in the set changes to a STOP codon (TER or *).
TUMOR Tumor type
TISSUE Tissue name
COSMIC_ID COSMIC database ID
KGPROD Indicates the variation was submitted as part of the 1000 Genomes Project
RS dbSNP ID (rs number) of the variant.
PMC Indicates that links exist from the variant's rs record to a PubMed Central article.
CHASM_PVALUE Empirical p-value (probability that passenger variant is misclassified as a driver).
CHASM_FDR False discovery rate expected (Benjamini-Hochberg multiple testing correction).
VEST_PVALUE Empirical p-value (probability that benign variant is misclassified as pathogenic).
VEST_FDR Composite false discovery rate (Benjamini-Hochberg multiple testing correction) for non-silent variants in the gene combined with Stouffer’s Z-score method.
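
The columns above can be read with a few lines of Python. The following is a minimal sketch, assuming that cancer.tab starts with a header row using the column names listed here and that missing scores appear as an empty field or a placeholder such as "."; neither assumption is confirmed by this page. The function names, the 0.05 threshold, and the sample rows are hypothetical and serve only as illustration.

```python
import csv
import io

# SAO codes as listed in the column table above.
SAO_LABELS = {"0": "unspecified", "1": "Germline", "2": "Somatic", "3": "Both"}


def to_float(value):
    """Return value as a float, or None when the field is empty or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def select_candidates(handle, max_fdr=0.05):
    """Yield rows whose CHASM_FDR is at or below max_fdr, with SAO decoded."""
    for row in csv.DictReader(handle, delimiter="\t"):
        fdr = to_float(row.get("CHASM_FDR"))
        # Skip rows with a missing, non-numeric, or too-large FDR value.
        if fdr is None or not fdr <= max_fdr:
            continue
        row["SAO_LABEL"] = SAO_LABELS.get(row.get("SAO"), "unknown")
        yield row


# Hypothetical two-row example with only a subset of the columns.
sample = (
    "CHROM\tPOS\tREF\tALT\tGENE\tSAO\tCHASM_FDR\n"
    "chr1\t100\tA\tG\tGENE_A\t2\t0.01\n"
    "chr2\t200\tC\tT\tGENE_B\t1\t.\n"
)
hits = list(select_candidates(io.StringIO(sample)))
for hit in hits:
    print(hit["GENE"], hit["SAO_LABEL"], hit["CHASM_FDR"])
```

To run it on a real file, pass an open file handle instead of the in-memory sample, for example `select_candidates(open("cancer.tab", newline=""))`.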

Description of all outputs

File Description
cancer.tab CRAVAT-generated VCF file converted to a user-friendly tab-delimited format
cancer.vcf CRAVAT-generated VCF file
variants.vcf Output from GATK’s HaplotypeCaller in VCF format with raw, unfiltered SNP and indel calls
variants.vcf.idx Index for the variants.vcf file
refGene.sort.bed Reference sorted BED file
variants_initial_filtering.vcf Variants after initial filtering with VariantFiltration in GATK4
variants_initial_filtering.vcf.idx Index for the initially filtered VCF file
variants_initial_filtering_clean_snp_RNAedit.vcf.gz Variants after applying the RNA-editing filters (REDIportal and RADAR)
variants_initial_filtering_clean_snp_RNAedit.vcf.gz.csi Index for the RNA-editing-filtered file
variants_initial_filtering_clean_snp_RNAedit.vcf_dbsnp.vcf.gz Filtered variants file annotated with dbSNP using BCFtools
variants_initial_filtering_clean_snp_RNAedit.vcf_snpeff.vcf Filtered VCF file annotated with SnpEff
variants_initial_filtering_clean_snp_RNAedit.vcf_snpeff_updated.vcf.gz SnpEff-annotated filtered variants file
variants_initial_filtering_clean_snp_RNAedit.vcf_snpeff_updated.vcf.gz.csi Index for the SnpEff-annotated filtered variants file
annotated_min_filtered.vcf.gz BCFtools-annotated file obtained from the CRAVAT-generated VCF output
mutation_inspector.json JSON file including all mutations, required for visualization
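
Because the outputs mix plain (.vcf) and compressed (.vcf.gz) files, a quick sanity check is to count the variant records at each stage. The sketch below is one way to do that with the Python standard library only; it assumes the .gz files are gzip-compatible (as bgzip output is) and that header lines start with "#", as the VCF format specifies. The helper names and the miniature demo file are hypothetical.

```python
import gzip
import os
import tempfile


def open_vcf(path):
    """Open a plain or gzip-compressed VCF in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_records(path):
    """Count data lines, skipping '#' header lines and blank lines."""
    with open_vcf(path) as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


# Hypothetical miniature VCF written to temporary files for demonstration.
demo_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=10\n"
    "chr1\t200\t.\tC\tT\t40\tPASS\tDP=8\n"
)
tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "demo.vcf")
gz_path = os.path.join(tmp_dir, "demo.vcf.gz")
with open(plain_path, "w") as out:
    out.write(demo_text)
with gzip.open(gz_path, "wt") as out:
    out.write(demo_text)

print(count_records(plain_path), count_records(gz_path))
```

Calling `count_records` on variants.vcf and then on each filtered file shows how many calls each filtering step leaves in place.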