JCAST (Junction Centric Alternative Splicing Translator) takes in alternative splicing events and returns custom protein sequence databases for isoform analysis.
See https://ed-lau.github.io/jcast/ for Documentation and Usage.
Install Python 3.7+ and pip. See the Python website for installation instructions specific to your operating system.
JCAST can be installed from PyPI via pip. We recommend using a virtual environment.
$ pip install jcast
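As a minimal sketch of the virtual environment recommendation, the snippet below only builds the two commands involved (creating an environment with the standard-library `venv` module, then running pip from inside it) and prints them. The environment directory name `jcast-env` and the helper `venv_install_commands` are illustrative assumptions, not part of JCAST.

```python
import shlex
import sys
from pathlib import Path


def venv_install_commands(env_dir, package="jcast"):
    """Return the commands that create a virtual environment and install a package in it.

    `env_dir` is a hypothetical directory name chosen for this example.
    """
    env_dir = Path(env_dir)
    # Virtual environments keep their interpreter in Scripts/ on Windows and bin/ elsewhere.
    if sys.platform == "win32":
        env_python = env_dir / "Scripts" / "python.exe"
    else:
        env_python = env_dir / "bin" / "python"
    return [
        [sys.executable, "-m", "venv", str(env_dir)],
        [str(env_python), "-m", "pip", "install", package],
    ]


# Print the commands rather than running them; each list could be passed
# to subprocess.run(cmd, check=True) to execute it.
for cmd in venv_install_commands("jcast-env"):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Running pip through the environment's own interpreter avoids having to activate the environment first.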
Launch JCAST as a module (Usage/Help):
$ python -m jcast
Alternatively:
$ jcast
Example command:
$ python -m jcast data/encode_human_pancreas/ data/gtf/Homo_sapiens.GRCh38.89.gtf data/genome/Homo_sapiens.GRCh38.dna.primary_assembly.fa -o encode_human_pancreas -q 0 1 -r 1 -m -c
To test that the installation can load the test data files in tests/data (a sample rMATS file and human chromosome 15 genome files):
$ pip install tox
$ tox
To run JCAST using the test files and write the results to the Desktop (replace {j} with the path to your local copy of the JCAST repository):
$ python -m jcast {j}/tests/data/rmats {j}/tests/data/genome/Homo_sapiens.GRCh38.89.chromosome.15.gtf {j}/tests/data/genome/Homo_sapiens.GRCh38.dna.chromosome.15.fa.gz -o ~/Desktop
$ python -m jcast -h
usage: __main__.py [-h] [-o OUT] [-r READ] [-m] [-c] [-q q_lo q_hi] [--g_or_ln G_OR_LN] rmats_folder gtf_file genome
jcast retrieves transcript splice junctions and translates them into amino acid sequences
positional arguments:
  rmats_folder          path to folder storing rMATS output
  gtf_file              path to Ensembl gtf file
  genome                path to genome file
optional arguments:
  -h, --help            show this help message and exit
  -o OUT, --out OUT     name of the output files [default: psq_out]
  -r READ, --read READ  the lowest skipped junction read count for a junction to be translated [default: 1]
  -m, --model           models junction read count cutoff using a Gaussian mixture model [default: False]
  -c, --canonical       write out canonical protein sequence even if transcript slices are untranslatable [default: False]
  -q q_lo q_hi, --qvalue q_lo q_hi
                        take junctions with rMATS fdr within this threshold [default: 0 1]
  --g_or_ln G_OR_LN     Switch on distribution to use for low end of histogram, 0 for Gamma, anything else for LogNorm
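For scripted runs, the options listed in the help text can be assembled into an argument list programmatically. The following is a minimal sketch, not part of JCAST: the helper name `build_jcast_command` and its keyword names are assumptions made for this example, and only flags that appear in the help output are used. The command is printed, not executed.

```python
import shlex
import sys


def build_jcast_command(rmats_folder, gtf_file, genome, out=None, read=None,
                        model=False, canonical=False, qvalue=None):
    """Assemble a `python -m jcast` argument list from the documented options.

    This is a hypothetical convenience wrapper; options left at None/False
    are omitted so that JCAST falls back to its own defaults.
    """
    cmd = [sys.executable, "-m", "jcast", str(rmats_folder), str(gtf_file), str(genome)]
    if out is not None:
        cmd += ["-o", str(out)]
    if read is not None:
        cmd += ["-r", str(read)]
    if model:
        cmd.append("-m")
    if canonical:
        cmd.append("-c")
    if qvalue is not None:
        q_lo, q_hi = qvalue  # -q takes two values: lower and upper bound
        cmd += ["-q", str(q_lo), str(q_hi)]
    return cmd


example = build_jcast_command(
    "data/encode_human_pancreas/",
    "data/gtf/Homo_sapiens.GRCh38.89.gtf",
    "data/genome/Homo_sapiens.GRCh38.dna.primary_assembly.fa",
    out="encode_human_pancreas", read=1, model=True, canonical=True, qvalue=(0, 1),
)
# Print a shell-safe rendering; subprocess.run(example, check=True) would execute it.
print(" ".join(shlex.quote(part) for part in example))
```

Passing the list to `subprocess.run` instead of a single string avoids shell quoting problems when paths contain spaces.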
JCAST has been tested in Python 3.7, 3.8, and 3.9, and uses the following packages:
biopython>=1.78
gtfparse>=1.2.1
pandas>=1.3.0
requests>=2.24.0
tqdm>=4.61.2
scikit-learn>=1.0
matplotlib==3.4.2
scipy>=1.7.0
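To see whether an existing environment satisfies these requirements, the installed versions can be compared against the list above. The sketch below is one possible way to do so with the standard library; the helpers `version_tuple`, `meets`, and `report` are names made up for this example, and the version comparison is deliberately simplified (it reads only the leading numeric components, so pre-release suffixes are ignored). `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata  # Python 3.8+; the importlib_metadata backport covers 3.7

# (package, version, exact) taken from the requirement list above.
REQUIREMENTS = [
    ("biopython", "1.78", False),
    ("gtfparse", "1.2.1", False),
    ("pandas", "1.3.0", False),
    ("requests", "2.24.0", False),
    ("tqdm", "4.61.2", False),
    ("scikit-learn", "1.0", False),
    ("matplotlib", "3.4.2", True),  # pinned with ==
    ("scipy", "1.7.0", False),
]


def version_tuple(text):
    """Parse leading numeric components, stopping at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets(installed, required, exact=False):
    """Compare two version strings after zero-padding them to equal length."""
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a == b if exact else a >= b


def report():
    """Print one status line per required package."""
    for name, required, exact in REQUIREMENTS:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed (need {'==' if exact else '>='}{required})")
            continue
        status = "ok" if meets(installed, required, exact) else "MISMATCH"
        print(f"{name}: {installed} ({'==' if exact else '>='}{required}) {status}")


if __name__ == "__main__":
    report()
```

For strict comparisons that also handle pre-release and post-release tags, the third-party `packaging` library is the more robust choice.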
- rMATS output with rows containing NA as gene name can fail.
- Upstream analyses should be performed using an unmasked genome. Currently JCAST cannot handle masked nucleotides (N).
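Both issues can be screened for before running JCAST. The sketch below is a hypothetical pre-check, not part of JCAST: it counts rows whose gene name is NA in a tab-separated rMATS table and counts N bases in FASTA lines. The column name `geneSymbol` is an assumption about the rMATS table layout and may need adjusting, and the inline sample data are made up for illustration.

```python
import csv
import io


def count_na_gene_rows(handle, column="geneSymbol"):
    """Count rows of a tab-separated table whose gene-name column equals NA.

    `column` is an assumed header name; change it to match your rMATS files.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return sum(1 for row in reader if (row.get(column) or "").strip() == "NA")


def count_n_bases(fasta_lines):
    """Return (number of N bases, total bases) over the sequence lines of a FASTA."""
    n_count = total = 0
    for line in fasta_lines:
        if line.startswith(">"):  # skip header lines
            continue
        seq = line.strip().upper()
        total += len(seq)
        n_count += seq.count("N")
    return n_count, total


# Made-up sample data standing in for real files.
rmats_table = io.StringIO(
    'ID\tGeneID\tgeneSymbol\n'
    '1\t"ENSG01"\t"ABC1"\n'
    '2\t"ENSG02"\tNA\n'
)
fasta = [">chr15", "ACGTNNNN", "acgtn"]

print("rows with NA gene name:", count_na_gene_rows(rmats_table))
print("N bases / total bases:", count_n_bases(fasta))
```

With real data, the same functions accept open file handles, for example `open(path)` for a plain-text table or `gzip.open(path, "rt")` for a compressed FASTA. Note that assemblies typically contain some N runs in gap regions even when unmasked, so the N count is a hint to inspect the genome file, not a definitive test for masking.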
Additional details on troubleshooting and result interpretation can be found in our publication in STAR Protocols.
Please contact us if you wish to contribute. Pull requests are welcome.
- Edward Lau, PhD - Code/design - ed-lau
- Maggie Lam, PhD - Code/design - Maggie-Lam
- Robert Wes Ludwig, BSc - Modeling - WesLudwig
This project is licensed under the MIT License. See the LICENSE.md file for details.