M. Brown edited this page Dec 14, 2022 · 12 revisions

CTAT pVACseq

Intro

Tumor cells can arise from non-synonymous coding mutations. A non-synonymous coding mutation in the DNA alters the amino acid sequence of an endogenous protein. During antigen presentation, these proteins are degraded by the proteasome into peptides. The peptides are loaded onto antigen-presenting complexes, in particular the Major Histocompatibility Complex (MHC), which is then transported to the cell surface. Peptides that carry the mutation are referred to as neoantigens.

The T-cell receptor binds specifically to peptides displayed on the MHC and can trigger an immune response. Because these mutant peptides are aberrant, T-cells recognize them as “non-self”, or foreign, and eliminate the cell. Personalized vaccine development leverages this mechanism and has shown great promise in driving an immune response against tumor cells.

MHC genes have a high level of allelic diversity. As a result, different MHC alleles bind mutated peptides with different affinities. Identifying proper neoantigen targets is therefore essential for the development of personalized therapies.

Given the interest in personalized immunotherapy, pVACtools was developed by the Griffith Lab at Washington University in St. Louis to help identify and visualize these tumor neoantigens. pVACseq is a tool within the pVACtools toolkit that identifies and prioritizes neoantigens using tumor-normal DNA and RNA data.

The Trinity Cancer Transcriptome Analysis Toolkit (CTAT) aims to provide tools for leveraging RNA-seq to gain insights into the biology of cancer transcriptomes. CTAT-pVACseq uses the pVACseq framework to identify neoantigens from RNA-seq data. The recommended practice is to run the CTAT-Mutations pipeline first and use its outputs as inputs for CTAT-pVACseq.

Requirements

CTAT-pVACseq requires the following programs in order to run:

  • Java
  • Cromwell
  • Docker
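Before launching a workflow, it can help to confirm that these requirements are available. The following is a minimal sketch, not part of CTAT-pVACseq: `missing_tools` is a hypothetical helper name, and the check assumes the Cromwell jar sits in the current directory under the file name used in the commands on this page.

```shell
# Print each named command that cannot be found on PATH, one per line.
# (missing_tools is a hypothetical helper, not part of CTAT-pVACseq.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Java and Docker are executables; Cromwell ships as a jar, so check the file.
# Assumption: the jar is in the current directory.
missing_tools java docker
[ -f cromwell-71.jar ] || echo "cromwell-71.jar not found in $(pwd)"
```

No output means that both executables and the jar were found.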

Running

Running CTAT-pVACseq is a two-step process.

  1. Run the preprocessing workflow.
  2. Run the pVACseq workflow.

Preprocessing

Before running pVACseq, the RNA-seq data needs to go through a preprocessing step, which adds the proper annotations to each variant in the VCF. To run the preprocessing step, the user must have the proper reference files. The Trinity CTAT project provides a public reference library that holds all the needed reference files. The library can be found here (https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/).

The following command will run this preprocessing step:

java -jar cromwell-71.jar \
        run CTAT-pVACseq/WDL/preprocessing_Main_RNAseq.wdl \
        -i inputs.json

Users have to update the inputs.json file so that it points to the correct input and reference paths. The required input files include the following:

  • VCF
  • BAM
  • GTF
  • RNA-editing VCF
  • gnomAD VCF
  • reference genome
  • reference genome dictionary
  • VEP Reference
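A wrong path in inputs.json usually only surfaces after Cromwell has started, so a quick pre-flight check can save time. The sketch below is an illustration rather than part of the workflow: `missing_input_paths` is a hypothetical helper, and it assumes a flat inputs.json with one `"key": "value"` pair per line. Any string value that is not a file path is reported as well, so treat the output as a hint.

```shell
# List the string values in a flat inputs JSON that do not exist on disk.
# Assumption: one "key": "value" pair per line. Non-path string values are
# reported too, because the script cannot tell them apart from paths.
missing_input_paths() {
    sed -n 's/^[[:space:]]*"[^"]*"[[:space:]]*:[[:space:]]*"\([^"]*\)".*$/\1/p' "$1" |
    while IFS= read -r path; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}

# Only run the check when an inputs.json is present in the current directory.
if [ -f inputs.json ]; then
    missing_input_paths inputs.json
fi
```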

The preprocessing workflow outputs two VCFs: annotated_TXGX.vcf and <ID>_decomposed_output.vcf.
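When Cromwell runs with its default local backend, task outputs are written under a `cromwell-executions` directory in the working directory. Assuming that default has not been changed, a sketch like the following can locate the two VCFs. `find_preprocessing_vcfs` is a hypothetical helper name.

```shell
# Print the paths of the two preprocessing output VCFs found under a directory.
# Defaults to Cromwell's local "cromwell-executions" folder (an assumption
# about the backend configuration).
find_preprocessing_vcfs() {
    find "${1:-cromwell-executions}" -type f \
        \( -name 'annotated_TXGX.vcf' -o -name '*_decomposed_output.vcf' \) | sort
}

if [ -d cromwell-executions ]; then
    find_preprocessing_vcfs
fi
```

The listing may include files from several runs, or copies of the same file localized as inputs to later tasks, so pick the paths that belong to the run of interest.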

pVACseq

The output VCFs from the preprocessing step can then be fed into the pVACseq workflow. The following command runs the pVACseq workflow:

java -jar cromwell-71.jar \
        run CTAT-pVACseq/WDL/pVACseq.wdl \
        -i inputs.json

Users have to update the inputs.json file with the paths to the VCFs produced by the preprocessing step. Use annotated_TXGX.vcf as the input VCF and <ID>_decomposed_output.vcf as the input phased VCF.

"pVACseq.VCF":"annotated_TXGX.vcf",
"pVACseq.phased_VCF":"<ID>_decomposed_output.vcf"
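As an illustration, the two entries above can be written to a file from the shell. This is a sketch under assumptions: `write_pvacseq_vcf_inputs`, the output file name `pvacseq_inputs.json`, and the `SAMPLE` prefix standing in for `<ID>` are all hypothetical, and only the two keys shown on this page are written. Any other settings the workflow requires still have to be added by hand.

```shell
# Write a pVACseq inputs file containing only the two VCF entries named above.
# Arguments: <input VCF> <phased VCF> <output JSON path>.
# Other settings the workflow may require are not shown and must be added.
write_pvacseq_vcf_inputs() {
    cat > "$3" <<EOF
{
  "pVACseq.VCF": "$1",
  "pVACseq.phased_VCF": "$2"
}
EOF
}

# SAMPLE stands in for the <ID> prefix of your own run (placeholder).
write_pvacseq_vcf_inputs annotated_TXGX.vcf SAMPLE_decomposed_output.vcf pvacseq_inputs.json
```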

Within the inputs.json file, users can choose which HLA types and epitope lengths to use in the pVACseq analysis. Users can also choose which prediction algorithms to run.
