Targeted Iso Seq QC Wiki
Last Updated: 08/15/2017
This wiki shows scripts used for targeted Iso-Seq analysis.
- BioPython
- bx-python
You can install BioPython via pip install biopython and bx-python via pip install bx-python. It is highly recommended that you do these installations in a virtual environment, via Anaconda or Python's own virtualenv, so that you have a clean environment to work with.
Cupcake itself requires no installation. Just download the Cupcake repository from GitHub and add the sequence/ directory to your PYTHONPATH:
$ git clone https://github.com/Magdoll/cDNA_Cupcake.git
$ export PYTHONPATH=$PYTHONPATH:<path_to>/cDNA_Cupcake/sequence
To check that BioPython, bx-python, and the sequence/ subdirectory are all working, try the following in the Python interpreter:
$ python
Python 2.7.13 |Anaconda 2.4.1 (64-bit)| (default, Dec 20 2016, 23:09:15)
>>> import Bio # testing BioPython
>>> import bx # testing bx-python
>>> import BED # testing Cupcake sequence/
All of the above should work without error messages.
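The same check can be scripted rather than typed interactively. The following is a minimal sketch, not part of Cupcake; the helper name check_modules is hypothetical, and it only reports whether each module can be imported.

```python
import importlib


def check_modules(names):
    """Return a dict mapping each module name to True if it imports cleanly.

    check_modules is a hypothetical helper for illustration only.
    """
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # Bio = BioPython, bx = bx-python, BED = Cupcake sequence/ (needs PYTHONPATH)
    for name, ok in sorted(check_modules(["Bio", "bx", "BED"]).items()):
        print("{0}: {1}".format(name, "OK" if ok else "MISSING"))
```

If BED is reported as missing while the other two are found, the most likely cause is that PYTHONPATH does not include the sequence/ directory.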
You will need:
- The input fasta file
- The aligned SAM file (see GMAP command below)
- A tab-delimited probe BED file containing at least the first three columns, plus an optional 4th column, in the order <chrom> <start> <end> <gene_name>. No header line is allowed!
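Because a stray header line or space-delimited columns are easy to miss, it can help to sanity-check the probe BED file before running the script. The sketch below is an assumption about what such a check could look like; parse_probe_lines is a hypothetical helper and is not how Cupcake itself reads the file.

```python
def parse_probe_lines(lines):
    """Parse headerless, tab-delimited probe BED lines.

    Hypothetical helper for illustration. Returns a list of
    (chrom, start, end, gene_name) tuples; gene_name is None when the
    optional 4th column is absent.
    """
    probes = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption of this sketch: blank lines are skipped
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                "line %d: expected at least 3 tab-delimited columns" % line_no)
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            # Non-integer coordinates usually mean a header line slipped in.
            raise ValueError(
                "line %d: start/end are not integers (header line?)" % line_no)
        gene = fields[3] if len(fields) > 3 and fields[3] else None
        probes.append((fields[0], start, end, gene))
    return probes
```

A file can be checked by passing an open handle, for example `with open("my-probes.bed") as fh: probes = parse_probe_lines(fh)`.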
You can generate the aligned SAM file using GMAP:
gmap -D ~/share/gmap_db_new/ -d hg19 -f samse -n 0 -t 30 -z sense_force \
isoseq_flnc.fasta \
> isoseq_flnc.fasta.hg19.sam \
2> isoseq_flnc.fasta.hg19.sam.log
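If you prefer to launch the alignment from a Python pipeline, the GMAP command above can be wrapped with subprocess. This is only a sketch: build_gmap_command and run_gmap are hypothetical names, the flags are copied from the command above, and gmap is assumed to be on your PATH.

```python
import os
import subprocess


def build_gmap_command(db_dir, db_name, fasta, threads=30):
    """Return the GMAP argument list mirroring the command shown above."""
    return [
        "gmap",
        "-D", os.path.expanduser(db_dir),  # no shell is used, so expand "~" here
        "-d", db_name,
        "-f", "samse",
        "-n", "0",
        "-t", str(threads),
        "-z", "sense_force",
        fasta,
    ]


def run_gmap(db_dir, db_name, fasta, sam_path, threads=30):
    """Run GMAP, writing alignments to sam_path and stderr to sam_path + '.log'."""
    cmd = build_gmap_command(db_dir, db_name, fasta, threads)
    with open(sam_path, "w") as sam, open(sam_path + ".log", "w") as log:
        subprocess.check_call(cmd, stdout=sam, stderr=log)
```

For example, `run_gmap("~/share/gmap_db_new/", "hg19", "isoseq_flnc.fasta", "isoseq_flnc.fasta.hg19.sam")` would reproduce the shell command above.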
The script usage is:
python calc_probe_hit_from_sam.py [BED] [FASTA] [SAM]
--start_base {0 or 1}
--end_base {0 or 1}
-o [OUTPUT]
where --start_base and --end_base indicate whether the start and end indices are 0-based or 1-based.
For example:
python <path_to>/cDNA_Cupcake/targeted/calc_probe_hit_from_sam.py my-probes.bed isoseq_flnc.fasta isoseq_flnc.fasta.hg19.sam --start_base 0 --end_base 0 -o isoseq_flnc.fasta.hg19.sam.probe_hit.txt
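To make the meaning of --start_base and --end_base concrete, the sketch below converts a start/end pair into a 0-based, half-open interval. It is an illustration under a stated assumption, namely that both coordinates are inclusive in whichever base is given; it is not taken from calc_probe_hit_from_sam.py, and to_half_open is a hypothetical name.

```python
def to_half_open(start, end, start_base, end_base):
    """Convert inclusive start/end coordinates to a 0-based, half-open interval.

    Assumption of this sketch: both start and end are inclusive positions
    in their stated base (0 or 1).
    """
    if start_base not in (0, 1) or end_base not in (0, 1):
        raise ValueError("start_base and end_base must be 0 or 1")
    start0 = start - start_base       # shift a 1-based start down by one
    end_excl = end - end_base + 1     # inclusive end -> exclusive end
    return start0, end_excl


# A probe covering the first 100 bases of a chromosome, written three ways:
print(to_half_open(0, 99, 0, 0))    # both 0-based
print(to_half_open(1, 100, 1, 1))   # both 1-based
print(to_half_open(0, 100, 0, 1))   # 0-based start, 1-based end
```

Under this assumption, all three calls describe the same region, which is why the two options must match how the coordinates in the probe file were written.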