
TAMA GO: Find Model Changes

GenomeRIK edited this page Oct 3, 2019 · 3 revisions

This tool in TAMA-GO is used to identify transcript model changes due to different pre-processing pipelines.

Please see the TAMA paper for more details: https://www.biorxiv.org/content/10.1101/780015v1

In order to run tama_find_model_changes.py, you will need to first create two different annotations using TAMA Collapse and then merge the annotations with TAMA Merge. You will also need to create a read support file for the merged annotation using tama_read_support_levels.py.

The full pipeline goes something like this:

  TAMA Collapse on pipeline A
  TAMA Collapse on pipeline B
  Read support on TAMA Collapse on pipeline A
  Read support on TAMA Collapse on pipeline B
  TAMA Merge the bed files from TAMA Collapse A and B
  Read support on the TAMA Merge

Now you should have all the files needed to run tama_find_model_changes.py.

tama_find_model_changes.py

To identify which reads have changed transcript and gene mapping due to pre-mapping processing, use tama_find_model_changes.py.

usage: tama_find_model_changes.py [-h] [-b B] [-r R] [-o O] [-ref REF] [-alt ALT]

optional arguments:

  -h, --help  show this help message and exit
  -b B        Merged annotation bed file
  -r R        Read support file for the merged bed file
  -o O        Output prefix (required)
  -ref REF    Name of the reference TAMA Collapse run. This should match the names used in the read support file.
  -alt ALT    Name of the alternative TAMA Collapse run. This should match the names used in the read support file.

A typical command looks like this:

python tama_find_model_changes.py -b bed -r readsupport -o prefix -ref A -alt B
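When the tool is run from a script, the same command can be assembled in Python. This is a minimal sketch that uses only the flags listed above; the function name build_command and the file names in the example are hypothetical placeholders, not part of TAMA.

```python
def build_command(bed, read_support, prefix, ref, alt,
                  script="tama_find_model_changes.py", python="python"):
    """Return the argument list for one tama_find_model_changes.py run.

    Only the flags documented on this page are used. The helper itself is
    an illustration and is not shipped with TAMA.
    """
    return [
        python, script,
        "-b", bed,            # merged annotation bed file (TAMA Merge output)
        "-r", read_support,   # read support file for the merged bed file
        "-o", prefix,         # output prefix
        "-ref", ref,          # name of the reference TAMA Collapse run
        "-alt", alt,          # name of the alternative TAMA Collapse run
    ]


if __name__ == "__main__":
    # Placeholder file names; substitute your own paths and source names.
    cmd = build_command("merged.bed", "merged_read_support.txt",
                        "changes", "A", "B")
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to launch the tool. Building the command as a list avoids shell quoting problems when paths contain spaces.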

Detailed explanation of arguments:

-b B

The bed file is the annotation bed file that is the result of running TAMA Merge to merge two annotation bed files (both of which should be the outputs from TAMA Collapse runs).

-r R

The readsupport file is the output file from running tama_read_support_levels.py on the output of the TAMA Merge run that was used for the bed file input.

-o O

This is the output prefix. The prefix will be used to create the output file names.

-ref REF

REF is just the name you gave to one of the TAMA Collapse bed files when merging with TAMA Merge. So this should match the name provided in the TAMA Merge input filelist file.

-alt ALT

ALT is just the name you gave to the other TAMA Collapse bed file when merging with TAMA Merge. So this should match the name provided in the TAMA Merge input filelist file. The REF/ALT distinction only determines the order of the output fields.

Outputs:

  prefix_diff_report.txt
  prefix_diff_genes.txt
  prefix_diff_trans.txt
  prefix_diff_one_source_genes.txt
  prefix_diff_one_source_trans.txt

Detailed explanation:

prefix_diff_report.txt

This is a report file summarizing the number of gene and transcript swaps/changes.

An example:

  num_diff_gene_reads: 34637
  num_diff_trans_reads: 830760
  num_merge_diff_gene: 6774
  num_merge_diff_trans: 134714
  this_source_diff_genes flnc: 4799
  this_source_diff_trans flnc: 108274
  this_source_diff_genes polish: 2793
  this_source_diff_trans polish: 38492
  only_source_num_genes flnc: 3230
  only_source_num_trans flnc: 83179
  only_source_num_genes polish: 104
  only_source_num_trans polish: 13280
  total_one_source_genes_count: 3334
  total_one_source_trans_count: 96459
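The report can be loaded into a dictionary for downstream checks. This is a sketch that assumes every line has the "name: count" form shown in the example, with the source name (here flnc or polish) kept as part of the key; parse_report and per_source_total are hypothetical helper names.

```python
def parse_report(lines):
    """Map each report label to its integer count.

    Assumes 'label: count' lines as in the example; lines without a
    colon are skipped.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        if ":" not in line:
            continue
        # Split on the last colon so the whole label is kept as the key.
        label, _, value = line.rpartition(":")
        counts[label.strip()] = int(value)
    return counts


def per_source_total(counts, name):
    """Sum the counts whose label is '<name> <source>'."""
    return sum(value for label, value in counts.items()
               if label.startswith(name + " "))


if __name__ == "__main__":
    example = [
        "only_source_num_genes flnc: 3230",
        "only_source_num_genes polish: 104",
        "total_one_source_genes_count: 3334",
    ]
    counts = parse_report(example)
    # In the example report the per-source counts add up to the total.
    print(per_source_total(counts, "only_source_num_genes"),
          counts["total_one_source_genes_count"])
```

In the example above, the per-source only_source counts sum to the total_one_source counts for both genes (3230 + 104 = 3334) and transcripts (83179 + 13280 = 96459), which makes a convenient sanity check.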

prefix_diff_genes.txt

This file shows the details of each read involved in a gene swap. "all_gene_line" lists the merged gene ID that the read maps to in each pipeline, written as pipeline:gene_id pairs.

  read_id num_genes       all_gene_line   all_pos_line    all_trans_line
  m64012_181221_231243/103351668/ccs      2       polish:G1,flnc:G29636   1:14359-201270,12:14474-32818   polish:G1.1,flnc:G29636.28
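A row of this file can be split into its per-pipeline parts as follows. The sketch assumes the columns are tab-separated and that the gene and transcript fields are comma-separated source:id pairs, as the example row suggests; both are assumptions about the file layout, and the function names are hypothetical.

```python
def parse_source_pairs(field):
    """Split 'polish:G1,flnc:G29636' into [('polish', 'G1'), ('flnc', 'G29636')]."""
    pairs = []
    for item in field.split(","):
        source, _, identifier = item.partition(":")
        pairs.append((source, identifier))
    return pairs


def parse_diff_genes_line(line):
    """Parse one data row of prefix_diff_genes.txt (assumed tab-separated)."""
    read_id, num_genes, genes, positions, transcripts = \
        line.rstrip("\n").split("\t")
    return {
        "read_id": read_id,
        "num_genes": int(num_genes),
        "genes": parse_source_pairs(genes),
        # Positions are kept as strings exactly as written in the file.
        "positions": positions.split(","),
        "transcripts": parse_source_pairs(transcripts),
    }


if __name__ == "__main__":
    row = ("m64012_181221_231243/103351668/ccs\t2\tpolish:G1,flnc:G29636\t"
           "1:14359-201270,12:14474-32818\tpolish:G1.1,flnc:G29636.28")
    record = parse_diff_genes_line(row)
    for source, gene in record["genes"]:
        print(source, gene)
```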

prefix_diff_trans.txt

This file shows the details of each read involved in a transcript swap.

  read_id alt_trans_diff_count    alt_diff_trans_id_list_line     alt_trans_id_list_line  ref_trans_id_list_line
  m64012_181221_231243/65014367/ccs       1       G1.1    G1.1    G1.182
  m64012_181221_231243/84937275/ccs       1       G1.1    G1.1    G1.43
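To see which transcript reassignments are most common, the rows can be tallied by their reference and alternative transcript IDs. This sketch assumes a tab-separated file with the header line shown above and treats the ID list columns as opaque strings, since the separator used inside those lists is not documented here; count_transcript_swaps is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def count_transcript_swaps(handle):
    """Count reads per (ref transcript IDs, alt transcript IDs) pair."""
    reader = csv.DictReader(handle, delimiter="\t")
    swaps = Counter()
    for row in reader:
        key = (row["ref_trans_id_list_line"], row["alt_trans_id_list_line"])
        swaps[key] += 1
    return swaps


if __name__ == "__main__":
    example = io.StringIO(
        "read_id\talt_trans_diff_count\talt_diff_trans_id_list_line\t"
        "alt_trans_id_list_line\tref_trans_id_list_line\n"
        "m64012_181221_231243/65014367/ccs\t1\tG1.1\tG1.1\tG1.182\n"
        "m64012_181221_231243/84937275/ccs\t1\tG1.1\tG1.1\tG1.43\n"
    )
    for (ref_ids, alt_ids), n_reads in count_transcript_swaps(example).most_common():
        print(ref_ids, "->", alt_ids, n_reads)
```

For a real run, pass an open file handle for prefix_diff_trans.txt instead of the in-memory example.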

prefix_diff_one_source_genes.txt

This file shows the genes which occur in only one of the two TAMA Collapse runs despite the supporting read being used in both pipelines.

  merge_source    merge_gene_id
  flnc    G97591
  flnc    G151641

prefix_diff_one_source_trans.txt

This file shows the transcripts which occur in only one of the two TAMA Collapse runs despite the supporting read being used in both pipelines.

  merge_source    merge_trans_id
  flnc    G143950.83
  flnc    G161009.42
  flnc    G50280.13
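Both one-source files share the same two-column layout, so a single helper can count how many genes or transcripts are unique to each source. The sketch assumes tab-separated columns with the header line shown above; count_by_source is a hypothetical name. One would expect these counts to line up with the only_source_* lines in the report file, though that correspondence is an assumption here.

```python
import csv
import io
from collections import Counter


def count_by_source(handle):
    """Count rows per merge_source in a prefix_diff_one_source_*.txt file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["merge_source"] for row in reader)


if __name__ == "__main__":
    example = io.StringIO(
        "merge_source\tmerge_trans_id\n"
        "flnc\tG143950.83\n"
        "flnc\tG161009.42\n"
        "flnc\tG50280.13\n"
    )
    for source, n_models in count_by_source(example).items():
        print(source, n_models)
```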