M. Brown edited this page Mar 20, 2019 · 174 revisions

Overview

CTAT-Mutations is a variant-calling pipeline focused on detecting mutations from RNA sequencing (RNA-seq) data. It integrates GATK Best Practices with downstream steps that annotate, filter, and prioritize cancer mutations. These steps use the RADAR and REDIportal databases to identify likely RNA-editing events, dbSNP to exclude common variants, and COSMIC to highlight known cancer mutations. Finally, CRAVAT annotates and prioritizes variants according to their likely biological impact and relevance to cancer.
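To make the roles of these resources concrete, here is a toy Python sketch of how a single variant might be labeled against them. It is not how CTAT-Mutations implements these steps: the `Variant` record, the `classify_variant` helper, the in-memory lookup sets, and the order of the checks are all hypothetical choices made for illustration.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a CTAT-Mutations type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def classify_variant(
    v: Variant,
    editing_sites: Set[Tuple[str, int]],
    common_variants: Set[Variant],
    cosmic_variants: Set[Variant],
) -> str:
    """Assign one coarse label to a variant; the check order is an assumption."""
    # RNA-editing resources (e.g. RADAR, REDIportal) modeled as (chrom, pos) sites.
    if (v.chrom, v.pos) in editing_sites:
        return "likely_rna_editing"
    # Known cancer mutations (e.g. from COSMIC) are checked before the common-variant
    # filter so that they are not discarded; this ordering is a hypothetical choice.
    if v in cosmic_variants:
        return "known_cancer_mutation"
    # Common variants (e.g. from dbSNP) would be excluded from the final report.
    if v in common_variants:
        return "common_variant"
    return "candidate"
```

For example, with `editing = {("chr1", 1000)}`, an A>G call at `chr1:1000` is labeled `likely_rna_editing`, while a call found in none of the sets stays a `candidate`.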

The CTAT-Mutations pipeline is one component of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT), which also leverages RNA-seq data to characterize cancer transcriptomes in other ways, such as identifying fusion transcripts and inferring copy number variations from tumor single-cell transcriptomes.

The CTAT-Mutations pipeline aims to make mutation discovery from RNA-seq data as easy as possible: it requires only the RNA-seq reads as input and generates summary reports and visualizations to guide you to the most meaningful findings.

The following flowchart is a simplified visualization of the steps performed by the CTAT-Mutations pipeline.

Installing CTAT-Mutations

The CTAT-Mutations pipeline requires the CTAT-Mutations software and companion genomic data resources. See our instructions for installing CTAT-Mutations for details.

Running the CTAT-Mutations Pipeline

Once the CTAT-Mutations pipeline has been installed along with the required CTAT Genome Library, it can be run with the following command; only the input reads are required.

   python /path/to/ctat_mutations \
      --left    <left_reads.fq> \
      --right   <right_reads.fq> \
      --out_dir <output_dir>

   --left    : path to the left (i.e., /1) paired-end RNA-Seq FASTQ file
   --right   : path to the right (i.e., /2) paired-end RNA-Seq FASTQ file
   --out_dir : name of the directory in which CTAT-Mutations outputs will be placed

As input, CTAT-Mutations requires left and right paired-end RNA-Seq FASTQ files, along with the name of an output directory where the pipeline products will be stored.
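If you want to script this invocation, for example to process several samples in a loop, a minimal Python sketch using the standard `subprocess` module could look like the following. It assumes a `python` executable on your `PATH` and only the three options shown in the command above; `build_ctat_command` and `run_ctat` are hypothetical helper names, and `/path/to/ctat_mutations` remains a placeholder for your installation.

```python
import subprocess
from typing import List


def build_ctat_command(ctat_script: str, left: str, right: str, out_dir: str) -> List[str]:
    """Assemble the argument list for the CTAT-Mutations command (hypothetical helper)."""
    return [
        "python", ctat_script,
        "--left", left,        # left (/1) paired-end FASTQ
        "--right", right,      # right (/2) paired-end FASTQ
        "--out_dir", out_dir,  # output directory name
    ]


def run_ctat(ctat_script: str, left: str, right: str, out_dir: str) -> None:
    """Run the pipeline; raises subprocess.CalledProcessError on a non-zero exit."""
    cmd = build_ctat_command(ctat_script, left, right, out_dir)
    subprocess.run(cmd, check=True)
```

For the sample data, this would be `run_ctat("/path/to/ctat_mutations", "reads_1.fq", "reads_2.fq", "varcalling.outdir")`. Passing a list rather than a single string avoids shell quoting problems with unusual file names.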

Example

A small sample data set is available for testing. The pipeline can be run on it with the following command:

   python /path/to/ctat_mutations \
   --left reads_1.fq \
   --right reads_2.fq \
   --out_dir varcalling.outdir

See our more detailed walkthrough tutorial, which uses these data.

Output

The output from the CTAT-Mutations pipeline includes summary tab-delimited reports and interactive visualizations.

Variant Reports

The primary outputs are cancer.vcf and a corresponding, simpler summary file, cancer.tab, which contain the final prioritized set of cancer variants detected in the sample. The cancer.vcf file (VCF version 4.0) records the genetic variants, their locations, and additional annotation information. The cancer.tab file is tab-delimited and presents the same variant information in a user-friendly format. The different stages of the CTAT-Mutations pipeline also generate additional outputs, some of which may be of interest for exploring RNA-editing events or common variants. Documentation is provided for all such output files and formats.
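For downstream analysis in Python, both files can be loaded with the standard library alone. The sketch below assumes an uncompressed cancer.vcf and a cancer.tab whose first line is a header row; it reads only the eight fixed VCF columns and makes no assumption about which columns cancer.tab contains. The helper names `read_vcf_records` and `read_tab` are hypothetical.

```python
import csv
from typing import Dict, Iterator, List

# The eight fixed, tab-separated columns defined by the VCF format.
VCF_FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def read_vcf_records(path: str) -> Iterator[Dict[str, str]]:
    """Yield the fixed VCF columns of each data line as a dict of strings."""
    with open(path) as handle:
        for line in handle:
            # Skip meta-information (##...), the #CHROM header line, and blank lines.
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            yield dict(zip(VCF_FIXED_COLUMNS, fields[:8]))


def read_tab(path: str) -> List[Dict[str, str]]:
    """Read a tab-delimited report, using its first line as column names."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `len(read_tab("varcalling.outdir/cancer.tab"))` would count the reported variants, assuming the file sits at that location in your output directory.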

Variant Visualization

You will also find an HTML page named "igvjs_viewer.html" (based on igv-reports), which allows dynamic navigation of the identified cancer variants and the read evidence supporting them. This file can simply be opened in your web browser. An example view is shown below.

[Figure: example igvjs_viewer.html view (mut_view2)]

More information on exploring the variant visualization framework is available here.

CTAT-Mutations Variant detection accuracy

We have assessed the performance of the CTAT-Mutations pipeline using a variety of methods, including the Genome in a Bottle reference data and applying the pipeline to cancer data sets with matched RNA-seq and exome data.
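One common way to summarize such a comparison is to treat the reference calls (e.g. Genome in a Bottle or exome-based calls) as a truth set and compute precision, recall, and F1 over matching variant keys. The sketch below illustrates that idea only; it is not necessarily the procedure used in the Performance Assessment Report. Keying variants by `(chrom, pos, ref, alt)` and the `compare_call_sets` name are hypothetical choices, and a real RNA-seq comparison would typically also restrict the truth set to regions with adequate RNA-seq coverage, which is omitted here.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, reference allele, alternate allele).
VariantKey = Tuple[str, int, str, str]


def compare_call_sets(calls: Set[VariantKey], truth: Set[VariantKey]) -> Dict[str, float]:
    """Compute counts, precision, recall, and F1 for a call set against a truth set."""
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

With three calls, two of which appear in a four-variant truth set, this gives precision 2/3, recall 1/2, and F1 4/7.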

To examine our performance assessment, please visit our Performance Assessment Report.

User support

Contact us on our Google group: https://groups.google.com/forum/#!forum/trinity_ctat_users

We aim to be responsive with user support and generally reply within hours (not days or weeks).

Funding

CTAT-Mutations is supported as part of the Trinity CTAT Project, funded by the National Cancer Institute's Informatics Technology for Cancer Research (ITCR) program.