TAMA GO: Degradation Signature

GenomeRIK edited this page Jun 12, 2019 · 8 revisions

This tool in TAMA-GO is used to assess the degradation signature of cDNA libraries and to get useful stats about the annotation.

The degradation signature (DegSig) is calculated by comparing the number of collapsed transcripts produced by TAMA Collapse with the "capped" algorithm against the number produced with the "no_cap" algorithm.

Formula:

DegSig = ((capped transcripts) - (no cap transcripts)) / (capped transcripts)

However, only transcripts from genes with multiple exons and multiple read support are used. This keeps transcript models that cannot reveal degradation out of the counts: genes with single read support will not differ between the two collapsing modes, and genes with only single exons have no splice junctions with which to assess exon cascading.
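
As a quick check of the formula, the calculation can be sketched in Python. The function name degradation_signature is hypothetical (it is not part of TAMA), and the two counts are the capped and no-cap "multi-exon, multi-read" transcript counts taken from the example output on this page.

```python
def degradation_signature(capped_count, nocap_count):
    """Return (capped - no_cap) / capped.

    Both arguments are assumed to be transcript counts restricted to
    genes with multiple exons and multiple read support.
    """
    if capped_count <= 0:
        raise ValueError("capped transcript count must be positive")
    return (capped_count - nocap_count) / capped_count


# Counts from the example output: 61187 capped, 35672 no-cap.
deg_sig = degradation_signature(61187, 35672)
print(round(deg_sig, 4))  # 0.417
```

If the two runs collapse to the same number of transcripts the signature is 0; the more transcripts the no_cap run merges away, the closer the value gets to 1.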

Note: The Degradation Signature calculation requires that the TAMA Collapse runs were done on mapped FLNC reads that have not yet been clustered. This is because clustering before mapping (as in the official Iso-Seq pipeline) already collapses a portion of the degraded products into longer reads, hiding them from TAMA Collapse. Running the Degradation Signature tool on TAMA Collapse runs from clustered/polished reads will therefore underestimate degradation.

tama_degradation_signature.py

To run tama_degradation_signature.py, you first need to run TAMA Collapse twice: once with the "capped" algorithm and once with the "no_cap" algorithm. Both runs should otherwise have identical parameters. The inputs for this tool are the trans_read.bed files from these two runs.

USAGE:

python tama_degradation_signature.py -c capped_trans_read.bed -nc nocap_trans_read.bed -o outfile_name

Input explanation:

capped_trans_read.bed - This is the trans_read.bed file that was output from the TAMA Collapse capped run.

nocap_trans_read.bed - This is the trans_read.bed file that was output from the TAMA Collapse no_cap run.

outfile_name - This is the name of the output file which will contain a summary of stats including the degradation signature.
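
When the tool is called from a pipeline, the command above could be assembled in Python roughly as follows. This is only a sketch: the helper build_degsig_command and the output name degsig_summary.txt are hypothetical, and the script is assumed to be in the current directory. Only the -c, -nc and -o flags come from the usage line above.

```python
import subprocess


def build_degsig_command(capped_bed, nocap_bed, outfile,
                         script="tama_degradation_signature.py"):
    """Build the argument list for the usage line shown above."""
    return ["python", script, "-c", capped_bed, "-nc", nocap_bed, "-o", outfile]


cmd = build_degsig_command("capped_trans_read.bed",
                           "nocap_trans_read.bed",
                           "degsig_summary.txt")
print(" ".join(cmd))

# Uncomment to actually run the tool (requires TAMA-GO and both input files):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.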

The output will look like this:

  Degradation Signature = 0.41700034321
  Capped multi-exon, multi-read, transcript count = 61187
  No-cap multi-exon, multi-read, transcript count = 35672
  Capped total transcript count = 76518
  No-cap total transcript count = 49544
  Capped single exon trans count = 28755
  No-cap single exon trans count = 15854
  Capped multi exon trans count = 47763
  No-cap multi exon trans count = 33690
  Capped total gene count = 21722
  No-cap total gene count = 21722
  Capped single exon gene count = 11158
  No-cap single exon gene count = 11158
  Capped multi exon gene count = 10564
  No-cap multi exon gene count = 10564
  Capped single exon single read gene count = 9794
  No-cap single exon single read gene count = 9794
  Capped multi exon single read gene count = 2019
  No-cap multi exon single read gene count = 2019

Note that gene counts should be the same for the capped and no_cap runs. These numbers are shown for troubleshooting in case the wrong input files are used.

A degradation signature higher than 0.25 is considered high and indicates a large number of degraded products in the sequenced RNA.
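
The two checks described above (matching gene counts and the 0.25 threshold) can be automated with a small parser. The sketch below assumes the summary file consists of "label = value" lines exactly as in the example output; the function names are hypothetical and the embedded text is an excerpt of that example.

```python
def parse_degsig_summary(text):
    """Parse 'label = value' lines into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        label, value = line.rsplit("=", 1)
        stats[label.strip()] = float(value)
    return stats


def gene_count_mismatches(stats):
    """Return the gene-count labels whose capped and no-cap values differ."""
    mismatches = []
    for label, value in stats.items():
        if label.startswith("Capped") and label.endswith("gene count"):
            partner = "No-cap" + label[len("Capped"):]
            if stats.get(partner) != value:
                mismatches.append(label[len("Capped "):])
    return mismatches


example = """\
  Degradation Signature = 0.41700034321
  Capped total gene count = 21722
  No-cap total gene count = 21722
  Capped multi exon gene count = 10564
  No-cap multi exon gene count = 10564
"""

stats = parse_degsig_summary(example)
print(gene_count_mismatches(stats))           # []
print(stats["Degradation Signature"] > 0.25)  # True
```

For a real run, the text would be read from the file given with -o. A non-empty mismatch list suggests that the two input files do not come from matching TAMA Collapse runs.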