This repository has been archived by the owner on May 3, 2024. It is now read-only.

Targeted Iso Seq QC Wiki

Magdoll edited this page Aug 15, 2017 · 11 revisions

Last Updated: 08/15/2017

This wiki shows scripts used for targeted Iso-Seq analysis.

Python Requirement

  • BioPython
  • bx-python

Install BioPython with pip install biopython and bx-python with pip install bx-python. Installing into a dedicated virtual environment (Anaconda or Python's own virtualenv) is highly recommended, so that you have a clean environment to work with.

How to set up the scripts

No installation is required. Download the Cupcake repository from GitHub and add its sequence/ subdirectory to your PYTHONPATH.

$ git clone https://github.com/Magdoll/cDNA_Cupcake.git
$ export PYTHONPATH=$PYTHONPATH:<path_to>/cDNA_Cupcake/sequence

To confirm that BioPython, bx-python, and the sequence/ subdirectory are all set up correctly, try the following in the Python interpreter:

$ python
Python 2.7.13 |Anaconda 2.4.1 (64-bit)| (default, Dec 20 2016, 23:09:15) 
>>> import Bio  # testing BioPython
>>> import bx   # testing bx-python
>>> import BED  # testing Cupcake sequence/

All three imports should complete without error messages.
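
If you would rather run the same check non-interactively (for example at the top of a pipeline), a minimal sketch follows. The helper name check_modules is hypothetical and is not part of Cupcake; it only wraps the three imports shown above.

```python
import importlib


def check_modules(names):
    """Try to import each module name.

    Returns a dict mapping each name to None on success, or to the
    ImportError message on failure. check_modules is a hypothetical
    helper for illustration, not part of cDNA_Cupcake.
    """
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            results[name] = str(exc)
    return results


def format_report(results):
    """Turn the check_modules() result into printable lines."""
    lines = []
    for name in sorted(results):
        error = results[name]
        status = "OK" if error is None else "MISSING ({0})".format(error)
        lines.append("{0}: {1}".format(name, status))
    return lines
```

Calling print("\n".join(format_report(check_modules(["Bio", "bx", "BED"])))) lists which of the three modules could be imported. A missing BED usually means the PYTHONPATH export above was not applied in the current shell.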

List of scripts

Detailed Documentation

Calculate on-target rate from SAM alignment + probe BED file

You will need:

  1. The input fasta file
  2. The aligned SAM file (see GMAP command below)
  3. A tab-delimited probe BED file with at least the first three columns (<chrom> <start> <end>) and an optional fourth column (<gene_name>). Header lines are not allowed.
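
Since a stray header row or a space-delimited file is an easy mistake to make, it can help to sanity-check the probe BED file before running anything. The sketch below is an illustration only, not the parser Cupcake uses; the names Probe and parse_probe_bed are hypothetical, and it assumes the layout described in item 3.

```python
from collections import namedtuple

# Hypothetical record type for one probe region; gene is None when the
# optional fourth column is absent.
Probe = namedtuple("Probe", ["chrom", "start", "end", "gene"])


def parse_probe_bed(lines):
    """Parse tab-delimited probe BED lines into a list of Probe records.

    Raises ValueError on lines with fewer than three columns or with
    non-integer coordinates (which is how a header row would show up).
    """
    probes = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                "line {0}: expected at least 3 tab-separated columns, "
                "got {1}".format(lineno, len(fields)))
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            raise ValueError(
                "line {0}: start/end must be integers "
                "(is this a header row?)".format(lineno))
        if start < 0 or start > end:
            raise ValueError(
                "line {0}: invalid interval {1}-{2}".format(lineno, start, end))
        gene = fields[3] if len(fields) > 3 and fields[3] else None
        probes.append(Probe(fields[0], start, end, gene))
    return probes
```

For a file on disk, pass the open file handle directly, for example with open("probes.bed") as handle: probes = parse_probe_bed(handle), where probes.bed is a placeholder file name.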

You can generate the aligned SAM file using GMAP:

gmap -D ~/share/gmap_db_new/ -d hg19 -f samse -n 0 -t 30 -z sense_force \
    isoseq_flnc.fasta \
    > isoseq_flnc.fasta.hg19.sam \
    2> isoseq_flnc.fasta.hg19.sam.log
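
When the alignment step is driven from Python rather than the shell, the same command can be assembled with the standard subprocess module. This is a minimal sketch assuming gmap is on your PATH; the function names build_gmap_command and run_gmap are hypothetical, and the flags are copied unchanged from the command above.

```python
import os
import subprocess


def build_gmap_command(db_dir, db_name, fasta, threads=30):
    """Return the GMAP argument list used in this wiki's example.

    The shell expands "~" for you; subprocess does not, so the database
    directory is expanded explicitly here.
    """
    return [
        "gmap",
        "-D", os.path.expanduser(db_dir),  # GMAP database directory
        "-d", db_name,                     # database (genome) name
        "-f", "samse",                     # SAM output
        "-n", "0",
        "-t", str(threads),                # number of threads
        "-z", "sense_force",
        fasta,
    ]


def run_gmap(db_dir, db_name, fasta, sam_path, log_path, threads=30):
    """Run GMAP, writing stdout to sam_path and stderr to log_path."""
    cmd = build_gmap_command(db_dir, db_name, fasta, threads)
    with open(sam_path, "w") as sam, open(log_path, "w") as log:
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(cmd, stdout=sam, stderr=log)
```

With the paths from the example, the call would be run_gmap("~/share/gmap_db_new/", "hg19", "isoseq_flnc.fasta", "isoseq_flnc.fasta.hg19.sam", "isoseq_flnc.fasta.hg19.sam.log"). Passing the arguments as a list avoids shell quoting problems with unusual file names.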