Workflow Inputs
The Toil RNA-seq workflow requires input files in order to run. These files are hosted on Synapse and by UC Santa Cruz. Inputs were built using Gencode v23 and HG38 (see Methods).
To build your own indices, you need a FASTA reference file and an annotation GTF. Supply both files to toil-rnaseq-inputs.
An example command to generate indices for a mouse genome:
toil-rnaseq-inputs --ref /mnt/mm10.fa --gtf /mnt/mouse-annotation.gtf --star --rsem --kallisto --hera
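If you build indices for several genomes, it can help to assemble the command in a script. The following is a minimal sketch, not part of the workflow itself: the helper name build_inputs_command is hypothetical, and it only uses the flags that appear in the example command above.

```python
import shlex

# Tool flags taken from the example command above.
SUPPORTED_TOOLS = ("star", "rsem", "kallisto", "hera")


def build_inputs_command(ref, gtf, tools=SUPPORTED_TOOLS):
    """Return the toil-rnaseq-inputs argument list for a reference FASTA and GTF.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    unknown = [t for t in tools if t not in SUPPORTED_TOOLS]
    if unknown:
        raise ValueError(f"unsupported tool flag(s): {unknown}")
    cmd = ["toil-rnaseq-inputs", "--ref", ref, "--gtf", gtf]
    cmd += [f"--{tool}" for tool in tools]
    return cmd


cmd = build_inputs_command("/mnt/mm10.fa", "/mnt/mouse-annotation.gtf")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a shorter tools tuple, for example ("star", "rsem"), would request only those indices.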
- Register for a Synapse account
- Either download the samples from the website GUI or use the Python API
pip install synapseclient
python
import synapseclient
syn = synapseclient.Synapse()
syn.login('your_synapse_email', 'password')
- Get the RSEM reference (1 GB)
syn.get('syn5889216', downloadLocation='.')
- Get the Kallisto index (2 GB)
syn.get('syn5886142', downloadLocation='.')
- Get the STAR index (25 GB)
syn.get('syn5886182', downloadLocation='.')
- Get the Hera index (2 GB)
syn.get('syn11678373', downloadLocation='.')
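The four downloads above can also be done in one loop. This is a sketch under a few assumptions: the helper name download_inputs and the STANDARD_INPUTS mapping are made up for illustration, syn is an already logged-in synapseclient.Synapse object, and the returned entity is assumed to expose the local file location as a path attribute.

```python
# Synapse IDs exactly as listed in the steps above.
STANDARD_INPUTS = {
    "RSEM reference": "syn5889216",
    "Kallisto index": "syn5886142",
    "STAR index": "syn5886182",
    "Hera index": "syn11678373",
}


def download_inputs(syn, inputs, dest="."):
    """Fetch each Synapse entity into dest and return {name: local path}.

    syn is expected to behave like a logged-in synapseclient.Synapse object.
    """
    paths = {}
    for name, syn_id in inputs.items():
        entity = syn.get(syn_id, downloadLocation=dest)
        # Fall back to None if the entity does not report a local path.
        paths[name] = getattr(entity, "path", None)
    return paths


# Usage, after logging in as shown above:
#   paths = download_inputs(syn, STANDARD_INPUTS, dest=".")
#   for name, path in paths.items():
#       print(name, "->", path)
```

Note that the STAR index alone is about 25 GB, so check free disk space in the destination directory first.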
These inputs are used specifically for a test sample run during continuous integration. The test sample was generated from reads mapped to chromosome 6. If you run these test samples, enable the ci option in the config so that the appropriate resources are requested.
- Small RSEM Reference [8.8M]
- Small STAR Index [2.0G]
There are no specific test inputs for Hera and Kallisto; use the standard inputs for both.
python
- Get the sample (500 KB)
syn.get('syn9924961', downloadLocation='.')
syn.get('syn9924962', downloadLocation='.')
- Get the small RSEM reference (8 MB)
syn.get('syn9772189', downloadLocation='.')
- Get the small STAR reference (2 GB)
syn.get('syn9772190', downloadLocation='.')
- Get the Kallisto reference (1 GB, same as regular input)
syn.get('syn5886142', downloadLocation='.')
When running the pipeline, set the CI option in the config to true so that it requests an appropriate amount of memory when running STAR.
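To make the difference between a standard run and a CI test run explicit, the sketch below picks the Synapse IDs for each case. The names STANDARD_IDS, CI_OVERRIDES, CI_SAMPLE_IDS, and select_inputs are hypothetical; the IDs themselves are the ones listed on this page. Only the RSEM and STAR inputs are swapped for the small versions, since Kallisto and Hera have no test-specific inputs.

```python
# Standard inputs, as listed earlier on this page.
STANDARD_IDS = {
    "rsem": "syn5889216",
    "kallisto": "syn5886142",
    "star": "syn5886182",
    "hera": "syn11678373",
}

# Small references used only for the CI test sample.
CI_OVERRIDES = {
    "rsem": "syn9772189",
    "star": "syn9772190",
}

# The two entities that make up the test sample.
CI_SAMPLE_IDS = ["syn9924961", "syn9924962"]


def select_inputs(ci):
    """Return {tool: Synapse ID}, using the small RSEM and STAR inputs when ci is true."""
    inputs = dict(STANDARD_IDS)
    if ci:
        inputs.update(CI_OVERRIDES)
    return inputs


for tool, syn_id in select_inputs(ci=True).items():
    print(tool, syn_id)
```

Whichever set you download, the CI option in the config should match it: true for the small inputs, otherwise left at its default.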