TAMA GO: Find Model Changes
This tool in TAMA-GO is used to identify transcript model changes due to different pre-processing pipelines.
Please see the TAMA paper for more details: https://www.biorxiv.org/content/10.1101/780015v1
In order to run tama_find_model_changes.py, you will need to first create two different annotations using TAMA Collapse and then merge the annotations with TAMA Merge. You will also need to create a read support file for the merged annotation using tama_read_support_levels.py.
The full pipeline goes something like this:
1. TAMA Collapse on pipeline A
2. TAMA Collapse on pipeline B
3. Read support on TAMA Collapse on pipeline A
4. Read support on TAMA Collapse on pipeline B
5. TAMA Merge the bed files from TAMA Collapse A and B
6. Read support on the TAMA Merge
Now you should have all the files needed to run tama_find_model_changes.py.
tama_find_model_changes.py
Use tama_find_model_changes.py to identify which reads changed their transcript and gene assignments because of pre-mapping processing.
usage: tama_find_model_changes.py [-h] [-b B] [-r R] [-o O] [-ref REF] [-alt ALT]
optional arguments:
  -h, --help  show this help message and exit
  -b B        Merged annotation bed file
  -r R        Read support file for the merged bed file
  -o O        Output prefix (required)
  -ref REF    Name of the reference TAMA Collapse run. This should match the names used in the read support file.
  -alt ALT    Name of the alternative TAMA Collapse run. This should match the names used in the read support file.
A typical command looks like this:
python tama_find_model_changes.py -b bed -r readsupport -o prefix -ref A -alt B
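When the tool is called from a larger workflow, the same command can be assembled programmatically. The sketch below is only an illustration: `build_find_model_changes_cmd` and `run_find_model_changes` are hypothetical helpers, not part of TAMA, and the interpreter name and script location are assumptions to adjust for your installation. Only the flags documented in the usage text are used.

```python
import subprocess


def build_find_model_changes_cmd(bed, read_support, prefix, ref, alt,
                                 script="tama_find_model_changes.py",
                                 python="python"):
    """Return the argument list for one tama_find_model_changes.py run.

    Hypothetical helper; `script` and `python` are assumed defaults.
    """
    return [
        python, script,
        "-b", bed,            # merged annotation bed file from TAMA Merge
        "-r", read_support,   # read support file for the merged bed file
        "-o", prefix,         # output prefix
        "-ref", ref,          # source name of the reference TAMA Collapse run
        "-alt", alt,          # source name of the alternative TAMA Collapse run
    ]


def run_find_model_changes(*args, **kwargs):
    """Run the tool; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_find_model_changes_cmd(*args, **kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.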
Detailed explanation of arguments:
-b B
The bed file is the annotation bed file that is the result of running TAMA Merge to merge two annotation bed files (both of which should be the outputs from TAMA Collapse runs).
-r R
The readsupport file is the output file from running tama_read_support_levels.py on the output of the TAMA Merge run that was used for the bed file input.
-o O
This is the output prefix. The prefix will be used to create the output file names.
-ref REF
REF is just the name you gave to one of the TAMA Collapse bed files when merging with TAMA Merge. So this should match the name provided in the TAMA Merge input filelist file.
-alt ALT
ALT is just the name you gave to the other TAMA Collapse bed file when merging with TAMA Merge. So this should match the name provided in the TAMA Merge input filelist file. The distinction between REF and ALT only determines the order of the output fields.
Outputs:
prefix_diff_report.txt
prefix_diff_genes.txt
prefix_diff_trans.txt
prefix_diff_one_source_genes.txt
prefix_diff_one_source_trans.txt
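A quick way to confirm that a run produced everything is to derive the expected file names from the prefix. This is a minimal sketch, assuming the names are simply the prefix followed by the suffixes listed above; `expected_outputs` and `missing_outputs` are hypothetical helper names.

```python
from pathlib import Path

# Suffixes taken from the output list above.
OUTPUT_SUFFIXES = (
    "_diff_report.txt",
    "_diff_genes.txt",
    "_diff_trans.txt",
    "_diff_one_source_genes.txt",
    "_diff_one_source_trans.txt",
)


def expected_outputs(prefix):
    """Return the five output paths expected for a given -o prefix."""
    return [Path(prefix + suffix) for suffix in OUTPUT_SUFFIXES]


def missing_outputs(prefix):
    """Return the expected output paths that do not exist on disk."""
    return [path for path in expected_outputs(prefix) if not path.is_file()]
```

An empty list from `missing_outputs("prefix")` means all five files are present.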
Detailed explanation:
prefix_diff_report.txt
This is a report file summarizing the number of gene and transcript swaps/changes.
An example:
num_diff_gene_reads: 34637
num_diff_trans_reads: 830760
num_merge_diff_gene: 6774
num_merge_diff_trans: 134714
this_source_diff_genes flnc: 4799
this_source_diff_trans flnc: 108274
this_source_diff_genes polish: 2793
this_source_diff_trans polish: 38492
only_source_num_genes flnc: 3230
only_source_num_trans flnc: 83179
only_source_num_genes polish: 104
only_source_num_trans polish: 13280
total_one_source_genes_count: 3334
total_one_source_trans_count: 96459
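The report can be loaded into a dictionary for downstream checks. The sketch below assumes one `label: value` entry per line, as in the example; `parse_diff_report` and `one_source_genes_consistent` are hypothetical names. In the example report the per-source `only_source_num_genes` counts add up to `total_one_source_genes_count` (3230 + 104 = 3334), so the second function tests that relationship. Whether it holds for every run is an assumption.

```python
def parse_diff_report(lines):
    """Map each report label (e.g. 'only_source_num_genes flnc') to its count."""
    counts = {}
    for line in lines:
        # Split on the last colon so the label keeps any source name in it.
        label, sep, value = line.rpartition(":")
        if not sep:
            continue  # blank line or anything without a 'label: value' shape
        counts[label.strip()] = int(value)
    return counts


def one_source_genes_consistent(counts):
    """Check that per-source single-source gene counts sum to the reported total."""
    per_source = sum(value for label, value in counts.items()
                     if label.startswith("only_source_num_genes "))
    return per_source == counts["total_one_source_genes_count"]


example = [
    "only_source_num_genes flnc: 3230",
    "only_source_num_genes polish: 104",
    "total_one_source_genes_count: 3334",
]
print(one_source_genes_consistent(parse_diff_report(example)))  # True
```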
prefix_diff_genes.txt
This file shows the details of each read involved in a gene swap. "all_gene_line" shows the merged gene ID of the read map and the associated pipelines.
read_id num_genes all_gene_line all_pos_line all_trans_line
m64012_181221_231243/103351668/ccs 2 polish:G1,flnc:G29636 1:14359-201270,12:14474-32818 polish:G1.1,flnc:G29636.28
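To work with these rows in a script, the `source:ID` lists can be split into dictionaries keyed by pipeline name. This is a minimal sketch, assuming the five columns are whitespace-separated and that the list fields use commas between entries and a colon between source and ID, as in the example row. `parse_source_list` and `parse_diff_gene_row` are hypothetical names.

```python
def parse_source_list(field):
    """Split 'polish:G1,flnc:G29636' into {'polish': 'G1', 'flnc': 'G29636'}."""
    pairs = (item.split(":", 1) for item in field.split(","))
    return {source: value for source, value in pairs}


def parse_diff_gene_row(line):
    """Parse one data row of prefix_diff_genes.txt into a dictionary."""
    read_id, num_genes, genes, positions, transcripts = line.split()
    return {
        "read_id": read_id,
        "num_genes": int(num_genes),
        "genes": parse_source_list(genes),
        # Positions are kept as 'chrom:start-end' strings in file order.
        "positions": positions.split(","),
        "transcripts": parse_source_list(transcripts),
    }


row = parse_diff_gene_row(
    "m64012_181221_231243/103351668/ccs 2 polish:G1,flnc:G29636 "
    "1:14359-201270,12:14474-32818 polish:G1.1,flnc:G29636.28"
)
print(row["genes"])  # {'polish': 'G1', 'flnc': 'G29636'}
```

For the example read, this shows that the polish pipeline assigned it to gene G1 while the flnc pipeline assigned it to gene G29636.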
prefix_diff_trans.txt
This file shows the details of each read involved in a transcript swap.
read_id alt_trans_diff_count alt_diff_trans_id_list_line alt_trans_id_list_line ref_trans_id_list_line
m64012_181221_231243/65014367/ccs 1 G1.1 G1.1 G1.182
m64012_181221_231243/84937275/ccs 1 G1.1 G1.1 G1.43
prefix_diff_one_source_genes.txt
This file shows the genes which occur in only one of the two TAMA Collapse runs despite the supporting read being used in both pipelines.
merge_source merge_gene_id
flnc G97591
flnc G151641
prefix_diff_one_source_trans.txt
This file shows the transcripts which occur in only one of the two TAMA Collapse runs despite the supporting read being used in both pipelines.
merge_source merge_trans_id
flnc G143950.83
flnc G161009.42
flnc G50280.13
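Both one-source files share the same two-column layout, so a single tally per source covers either of them. The sketch below assumes whitespace-separated columns and a header line starting with `merge_source`; `count_by_source` is a hypothetical name. The resulting counts would be expected to correspond to the `only_source_num_genes` and `only_source_num_trans` entries in prefix_diff_report.txt, although that correspondence is inferred from the field names rather than confirmed.

```python
from collections import Counter


def count_by_source(lines):
    """Count the entries per merge_source in a one-source genes or transcripts file."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "merge_source":
            continue  # skip blank lines and the header
        counts[fields[0]] += 1
    return counts


example = [
    "merge_source merge_trans_id",
    "flnc G143950.83",
    "flnc G161009.42",
    "flnc G50280.13",
]
print(count_by_source(example))  # Counter({'flnc': 3})
```

To process a real file, pass an open file handle, for example `count_by_source(open("prefix_diff_one_source_trans.txt"))`.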