Computational Genomics Lab, Genomics Institute, UC Santa Cruz

Toil RNA-Seq Pipeline

Scalable, reproducible, and robust RNA-seq expression quantification.

The Toil RNA-seq workflow converts RNA sequencing data into gene- and transcript-level expression quantification.

Please open issues for any bugs, errors, corrections, or feature requests.

If there are any questions not answered by this README or the wiki, contact John Vivian.

Appendix

Dependencies and Installation
Quickstart

For detailed information and troubleshooting, see the Wiki

Workflow Inputs
Examples
Methods
Troubleshooting
Auto-scaling on AWS

Overview

This workflow takes RNA sequencing reads (fastq / BAM) as input and outputs the following (if all options enabled):

<SAMPLE>
├── Kallisto
│   ├── abundance.h5
│   ├── abundance.tsv
│   ├── fusion.txt
│   └── run_info.json
├── QC
│   ├── fastQC
│   │   ├── R1_fastqc.html
│   │   ├── R1_fastqc.zip
│   │   ├── R2_fastqc.html
│   │   └── R2_fastqc.zip
│   └── STAR
│       ├── Log.final.out
│       └── SJ.out.tab
├── Hera
│   ├── abundance.h5
│   ├── abundance.tsv
│   ├── fusion.bedpe (paired-end data only)
│   └── summary
└── RSEM
    ├── Hugo
    │   ├── rsem_genes.hugo.results
    │   └── rsem_isoforms.hugo.results
    ├── rsem_genes.results
    └── rsem_isoforms.results
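
To show how the quantification files in this tree might be consumed downstream, here is a minimal sketch that reads transcript-level TPM values from Kallisto/abundance.tsv. It assumes the file keeps the standard kallisto header (target_id, length, eff_length, est_counts, tpm) and that the workflow does not alter it. The helper name read_tpm is hypothetical and not part of toil-rnaseq.

```python
import csv


def read_tpm(abundance_path):
    """Map each target_id to its TPM value from a kallisto-style abundance.tsv.

    Assumes a tab-separated file whose header contains the columns
    'target_id' and 'tpm' (the standard kallisto layout).
    """
    tpm = {}
    with open(abundance_path) as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            tpm[row["target_id"]] = float(row["tpm"])
    return tpm
```

Calling read_tpm on an extracted abundance.tsv returns a dictionary keyed by transcript identifier, which can then be filtered or joined with other samples.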

If the user enables options such as save-bam or wiggle, additional files appear in the output directory:

SAMPLE.sorted.bam
SAMPLE.wiggle.bg

The output tarball is named after the sample's unique name (e.g. SAMPLE.tar.gz).
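
The contents of an output tarball can be inspected without unpacking it. The sketch below uses Python's standard tarfile module. The helper name list_outputs is hypothetical and not part of toil-rnaseq, and the exact paths stored inside the archive are not specified here, so treat the result as whatever the tarball actually contains.

```python
import tarfile


def list_outputs(tarball_path):
    """Return the sorted paths of regular files stored in a gzipped tarball."""
    with tarfile.open(tarball_path, "r:gz") as tar:
        # Skip directory entries; keep only regular files.
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```

For example, list_outputs("SAMPLE.tar.gz") returns the file paths stored in that archive, which makes it easy to confirm that the expected tool outputs are present before extracting anything.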

Dependencies and Installation

This workflow has been tested on Ubuntu 14.04, Ubuntu 16.04, and Mac OS X, but should also run on other Unix-based systems.
apt-get and pip often require sudo privileges, so if the commands below fail, try prepending sudo.
If you do not have sudo privileges you will need to build these tools from source, or bug a sysadmin about how to get them (they don't mind).

General Dependencies

1. Python 2.7
2. Curl         apt-get install curl
3. Docker       http://docs.docker.com/engine/installation/

Python Dependencies

1. Toil         pip install toil
2. S3AM         pip install s3am (optional, needed for uploading output to S3)

System Dependencies

The workflow requires approximately 50-60 GB of RAM to run STAR alignment.
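
Before launching a run, it can help to confirm that the machine reports enough memory for STAR. The following is a minimal sketch, assuming a Linux or other Unix-like system where the POSIX sysconf values SC_PAGE_SIZE and SC_PHYS_PAGES are available. The function names and the 60 GiB default threshold (taken from the upper end of the range above) are illustrative assumptions, not part of toil-rnaseq.

```python
import os


def total_ram_gb():
    """Return total physical memory in GiB as reported by POSIX sysconf."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / float(1024 ** 3)


def has_enough_ram(required_gb=60.0):
    """Return True if the machine reports at least `required_gb` GiB of RAM.

    The 60 GiB default is an assumed threshold, not a value enforced by the workflow.
    """
    return total_ram_gb() >= required_gb
```

Note that operating systems usually report slightly less than the nominal installed memory, so a lower threshold such as has_enough_ram(50.0) may be more appropriate on some machines.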

Installation

The Toil RNA-seq workflow is pip installable!

For most users, the preferred installation method is inside a virtualenv to avoid dependency conflicts:

virtualenv ~/toil-rnaseq
source ~/toil-rnaseq/bin/activate
pip install toil-rnaseq

After installation, the workflow can be executed by typing toil-rnaseq into the terminal.

Quickstart

First, obtain all of the necessary workflow inputs.

Then, type toil-rnaseq to see the basic help menu and instructions.

Type toil-rnaseq generate to create an editable manifest and config in the current working directory.
Parameterize the workflow by editing the config.
Fill in the manifest with information pertaining to your samples.
Type toil-rnaseq run [jobStore] to execute the workflow.

Citation

If you use this workflow to produce data for published research please cite the Toil white paper:

Vivian, J. et al. 
Toil enables reproducible, open source, big biomedical data analyses. 
Nat Biotech 35, 314–316 (2017).
