
HowToSetUpJAFFA

nadiadavidson edited this page Aug 10, 2016 · 27 revisions

In this wiki we describe how to install JAFFA and give some basic instructions to start running it. JAFFA is designed to be run on the bash command line in Linux. An understanding of bash (and R) is useful for following what the pipeline is doing, but isn't essential.

Installing

  1. Download the JAFFA tarball and the reference file.
  2. Unzip and untar both files in the same place:
tar -zxvf <filename>
  3. Download the human genome version [hg38](http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chromFa.tar.gz) from [UCSC](http://hgdownload.cse.ucsc.edu/downloads.html) if you don't already have it. JAFFA expects a single fasta file, so if you download the files above, you'll need to unzip and untar them and then combine all the chromosomal fasta files, e.g. cat chr*.fa > hg38.fa. By default JAFFA expects this file to be in the root of the JAFFA code directory. You can copy it there, create a symbolic link to it (ln -s <path_to_hg38> <path_to_JAFFA_directory>), provide the path to your hg38.fa file in the pipeline file JAFFA_stages.groovy, or pass it when you run the JAFFA command. Note that JAFFA expects the UCSC version of the genome. Other versions (e.g. Ensembl) aren't compatible with JAFFA's reference files. This is also true if you are using hg19 or mm10.
  4. Before running JAFFA, several other programs must be installed. To make your life easier we have provided a script (shown below) that automates this using wget. Run it in JAFFA's directory. When it has finished, check that every path in the file tools.groovy has been filled in.
./install_linux64.sh
  5. If you don't already have it, you will need to install R. Note that the R package IRanges must be installed.
  6. Configure the JAFFA pipeline options for your data. This can be done either by editing the JAFFA_stages.groovy file, or by passing the parameters to bpipe when you run JAFFA.
    * readLayout - change to "single" if you have single-end reads; otherwise paired-end is assumed.
    * genomeFasta - the path to the human genome. If you leave this unchanged it defaults to the directory of the JAFFA package.
    * fastqInputFormat - tells bpipe how to split the input files into samples and groups of read pairs. The default should work if your reads are named like SampleA_1.fastq.gz SampleA_2.fastq.gz SampleB_1.fastq.gz SampleB_2.fastq.gz etc. JAFFA will create one directory for each sample. If this does not happen in the way you expect, you might need to adjust this variable. See the end of this bpipe doc page for more information. You may also need to change this parameter if your reads have the fq extension instead of fastq.
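
The genome preparation described in the list above (combining the chromosome files and linking the result into the JAFFA directory) can be sketched as follows. This is a minimal illustration that runs on toy files in a temporary directory: the directory names and the two tiny chromosome files are stand-ins, not part of JAFFA. With the real download you would untar hg38.chromFa.tar.gz first and point genome_dir at wherever the chr*.fa files end up.

```shell
# Toy stand-ins for the untarred UCSC chromosome files and the JAFFA directory.
# Both paths are hypothetical; substitute your own.
work_dir=$(mktemp -d)
genome_dir="$work_dir/genome"
jaffa_dir="$work_dir/JAFFA"
mkdir -p "$genome_dir" "$jaffa_dir"
printf '>chr1\nACGT\n' > "$genome_dir/chr1.fa"
printf '>chr2\nTTGA\n' > "$genome_dir/chr2.fa"

# Combine all chromosomal fasta files into the single file JAFFA expects.
cat "$genome_dir"/chr*.fa > "$genome_dir/hg38.fa"

# Link the combined file into the root of the JAFFA directory.
ln -s "$genome_dir/hg38.fa" "$jaffa_dir/hg38.fa"

# Sanity check: count the sequences visible through the link.
n_seqs=$(grep -c '^>' "$jaffa_dir/hg38.fa")
echo "sequences in hg38.fa: $n_seqs"
```

Counting the header lines afterwards is a cheap way to confirm that every chromosome file made it into the combined fasta.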

Input Type

The input to JAFFA should be either gzipped reads, i.e. files with an ending like ".fastq.gz", or a fasta file of contigs with an ending like ".fasta". JAFFA assumes there is one file (single-end) or one pair of files (paired-end) per sample.
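
To check the one-file-or-one-pair-per-sample assumption before a run, the read files can be grouped by sample name. The sketch below assumes names of the form SampleA_1.fastq.gz / SampleA_2.fastq.gz. The function name count_files_per_sample and the toy files are hypothetical, and the grouping is only an approximation of what bpipe does with fastqInputFormat.

```shell
# Print "<sample> <number of files>" for every sample in a directory,
# assuming names of the form <sample>_1.fastq.gz / <sample>_2.fastq.gz.
count_files_per_sample() {
    local dir="$1" f name
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        name=$(basename "$f" .fastq.gz)  # e.g. SampleA_1
        echo "${name%_[12]}"             # strip a trailing _1 or _2
    done | sort | uniq -c | awk '{print $2, $1}'
}

# Toy example: two paired-end samples, using empty placeholder files.
demo_dir=$(mktemp -d)
touch "$demo_dir/SampleA_1.fastq.gz" "$demo_dir/SampleA_2.fastq.gz" \
      "$demo_dir/SampleB_1.fastq.gz" "$demo_dir/SampleB_2.fastq.gz"
per_sample=$(count_files_per_sample "$demo_dir")
echo "$per_sample"
```

A count of 2 for every sample is what paired-end data should give, and a count of 1 is what single-end data should give. Anything else suggests the file names do not follow the expected pattern.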

Running

Create and change into the directory where you intend the output files of JAFFA to be placed. You then have a choice of three JAFFA running modes: Assembly, Hybrid and Direct. Which mode to use will depend on your read length.
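
If you are unsure of your read length, it can be read off a fastq file directly. This is a minimal sketch, assuming standard four-line fastq records. The function name max_read_length and the toy file are hypothetical.

```shell
# Print the longest read among the first 1000 records of a gzipped fastq file.
# Assumes four lines per record, with the sequence on the second line.
max_read_length() {
    gzip -dc < "$1" | head -n 4000 |
        awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) } END { print max + 0 }'
}

# Toy example: two reads, of 8 bp and 10 bp.
demo_fastq="$(mktemp -d)/SampleA_1.fastq.gz"
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTACGTAC\n+\nIIIIIIIIII\n' | gzip -c > "$demo_fastq"
read_length=$(max_read_length "$demo_fastq")
echo "$read_length"
```

Only the first 1000 records are inspected, which is normally enough when all reads come from the same sequencing run. Trimmed data may contain shorter reads, so the maximum is reported rather than the length of the first read.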

When to use which mode?

  • For reads shorter than 70bp use the Assembly mode. The reads are too short to be used as contigs directly.
  • For longer reads (between 70 and 95 bp), the Hybrid mode is the most sensitive. However, because it involves assembly, it requires a lot of memory and CPU time. If computational resources are a constraint, we recommend using the Direct mode.
  • For long reads (e.g. 100 bp up), there is no advantage in assembling, therefore we recommend the direct mode.
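
The guidance above can be written as a small helper. This is a sketch only: the function name suggest_jaffa_mode is hypothetical, and because the text does not cover reads of 96 to 99 bp, the code assumes that anything above 95 bp should use the direct mode.

```shell
# Suggest a JAFFA mode from the read length in bp, following the list above.
# Assumption: lengths of 96-99 bp are not covered by the text, so this sketch
# treats anything above 95 bp as "direct".
suggest_jaffa_mode() {
    local read_length="$1"
    if [ "$read_length" -lt 70 ]; then
        echo "assembly"
    elif [ "$read_length" -le 95 ]; then
        echo "hybrid"
    else
        echo "direct"
    fi
}

suggest_jaffa_mode 50
suggest_jaffa_mode 75
suggest_jaffa_mode 100
```

Remember that the hybrid suggestion is about sensitivity. If memory or CPU time is limited, the direct mode remains the recommended fallback for 70 to 95 bp reads.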

Assembly

JAFFA will call Velvet and Oases to assemble the reads. It will then search for fusions from amongst the assembled contigs.

<path to JAFFA>/tools/bin/bpipe run <path to JAFFA>/JAFFA.groovy <path_to_directory with fastq files>/*.gz

Direct

JAFFA will map reads to the known reference transcriptome and extract reads which do not map. It will then search for fusions from amongst the unmapped reads.

<path to JAFFA>/tools/bin/bpipe run <path to JAFFA>/JAFFA_direct.groovy <path_to_directory with fastq files>/*.gz

In this mode, you can also search for fusions in pre-assembled transcriptomes by providing a fasta file as input. In this case we skip the step where we filter for unmapped sequences.

<path to JAFFA>/tools/bin/bpipe run <path to JAFFA>/JAFFA_direct.groovy <path_to_directory with fasta file>/*.fasta

Hybrid

This is a combination of the previous two modes. First JAFFA will call Velvet and Oases to assemble the reads. It will then search for fusions from amongst the assembled contigs. Next it will map reads to both the known reference transcriptome and the assembled transcriptome. It will then search for fusions from amongst the unmapped reads.

<path to JAFFA>/tools/bin/bpipe run <path to JAFFA>/JAFFA_hybrid.groovy <path_to_directory with fastq files>/*.gz
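
The three invocations differ only in the pipeline file, so a small wrapper can select it by mode. This is a sketch, not part of JAFFA: the function name jaffa_command and the example paths are hypothetical, and the optional genome argument assumes that bpipe's -p name=value option is the way to pass the genomeFasta parameter described under Installing. The function prints the command instead of running it, so it can be inspected first.

```shell
# Print the bpipe command for a given JAFFA mode.
# Usage: jaffa_command <mode> <path to JAFFA> <input glob> [genome fasta]
jaffa_command() {
    local mode="$1" jaffa_dir="$2" inputs="$3" genome="${4:-}"
    local pipeline cmd
    case "$mode" in
        assembly) pipeline="JAFFA.groovy" ;;
        direct)   pipeline="JAFFA_direct.groovy" ;;
        hybrid)   pipeline="JAFFA_hybrid.groovy" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    cmd="$jaffa_dir/tools/bin/bpipe run"
    # Assumption: bpipe accepts parameter overrides as -p name=value.
    if [ -n "$genome" ]; then
        cmd="$cmd -p genomeFasta=$genome"
    fi
    echo "$cmd $jaffa_dir/$pipeline $inputs"
}

# Hypothetical paths, for illustration only.
jaffa_command direct /opt/JAFFA '/data/reads/*.gz'
jaffa_command hybrid /opt/JAFFA '/data/reads/*.gz' /refs/hg38.fa
```

The input glob is passed in quotes so that it is printed unexpanded. When you paste the printed command into a shell, the glob is expanded at that point, from whichever output directory you are in.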