TAMA GO: Read Support
This set of tools in TAMA-GO is used to collect all the information regarding supporting reads for each gene/transcript model.
tama_read_support_levels.py
To generate a file containing read support for each transcript, use tama_read_support_levels.py. This is a versatile tool that provides read support/count information at different levels of processing. It can be used to find read support for clustered reads, for collapsed transcript models, and for merged transcript models.
USAGE:
python tama_read_support_levels.py -f filelist -o prefix -m mergefile
trans_read.bed - This is the output file from TAMA Collapse.
cluster_file - This is the cluster file from running clustering with the official Iso-Seq pipelines. For Iso-Seq3 the file will look like "prefix.primer_5p--primer_3p.cluster"; for Iso-Seq1 the cluster file will look like "cluster_report.csv". Alternatively, if you did not do clustering and mapped the FLNC reads directly to the genome, you can use the "trans_read.bed" file as the cluster_file input.
output_file - The name that you want the output file to be called.
OUTPUT:
The format of the output file is as follows:
gene_id trans_id gene_num_reads trans_num_reads cluster_line
G1 G1.1 5 1 4_c18833:m160316_194043_42149_c100936532550000001823211106101647_s1_p0/27978/2288_50_CCS
cluster_line - This field shows the clusters and reads supporting each transcript model. The field is subdivided using ";" to delimit cluster groups, ":" to separate the cluster name from its read names, and "," to delimit read names.
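The delimiter scheme described for cluster_line can be unpacked with a few string splits. The following is a minimal sketch, not part of TAMA itself: the function name parse_cluster_line is hypothetical, and it assumes that each group contains exactly one ":" between the cluster name and its reads and that read names contain no ";" or ",".

```python
def parse_cluster_line(cluster_line):
    """Return {cluster_name: [read_name, ...]} for one cluster_line field.

    Hypothetical helper, not part of TAMA. Assumes ";" separates cluster
    groups, the first ":" separates cluster name from reads, and ","
    separates read names.
    """
    clusters = {}
    for group in cluster_line.split(";"):
        if not group:
            continue  # tolerate a trailing ";"
        cluster_name, _, reads = group.partition(":")
        clusters[cluster_name] = reads.split(",") if reads else []
    return clusters


# The cluster_line value from the example row above.
example = (
    "4_c18833:m160316_194043_42149_"
    "c100936532550000001823211106101647_s1_p0/27978/2288_50_CCS"
)
parsed = parse_cluster_line(example)
for cluster_name, read_names in parsed.items():
    print(cluster_name, len(read_names))
```

Counting the reads across all clusters in a row gives a quick cross-check against the trans_num_reads column.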
DEPRECATED TOOLS BELOW
tama_read_support_collapse_cluster.py
To find read support for each TAMA Collapse run use tama_read_support_collapse_cluster.py.
USAGE:
python tama_read_support_collapse_cluster.py trans_read.bed cluster_file output_file
trans_read.bed - This is the output file from TAMA Collapse.
cluster_file - This is the cluster file from running clustering with the official Iso-Seq pipelines. For Iso-Seq3 the file will look like "prefix.primer_5p--primer_3p.cluster"; for Iso-Seq1 the cluster file will look like "cluster_report.csv". Alternatively, if you did not do clustering and mapped the FLNC reads directly to the genome, you can use the "trans_read.bed" file as the cluster_file input.
output_file - The name that you want the output file to be called.
OUTPUT:
The format of the output file is as follows:
gene_id trans_id gene_num_reads trans_num_reads cluster_line
G1 G1.1 5 1 4_c18833:m160316_194043_42149_c100936532550000001823211106101647_s1_p0/27978/2288_50_CCS
cluster_line - This field shows the clusters and reads supporting each transcript model. The field is subdivided using ";" to delimit cluster groups, ":" to separate the cluster name from its read names, and "," to delimit read names.
tama_read_support_merge_collapse.py
To find read support for each TAMA Merge run use tama_read_support_merge_collapse.py.
USAGE:
python tama_read_support_merge_collapse.py filelist_file output_file
filelist_file - This is a file that lists all the read support files from the TAMA Collapse runs that were merged during TAMA Merge.
The format is as follows (tab delimited):
read_support_collapse1.txt collapse1 /path/to/file/
read_support_collapse2.txt collapse2 /path/to/file/
read_support_collapse1.txt - This is the name of the output file from running "tama_read_support_collapse_cluster" on the TAMA Collapse run.
collapse1 - This is the prefix used in the TAMA Merge run to identify the TAMA Collapse source run.
/path/to/file/ - This is the path to the "read_support_collapse1.txt" file.
output_file - The name that you want the output file to be called.
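Because the filelist must be tab delimited, it can be safer to generate it with a script than to type it by hand, where an editor may silently turn tabs into spaces. The sketch below is one way to do that under stated assumptions: the function name format_filelist is hypothetical, and the entries are the placeholder values from the format example above, not real paths.

```python
def format_filelist(entries):
    """Build tab-delimited filelist text.

    Hypothetical helper, not part of TAMA. Each entry is a tuple of
    (read_support_file, merge_source_prefix, path), matching the three
    columns described above.
    """
    lines = []
    for support_file, prefix, path in entries:
        lines.append("\t".join((support_file, prefix, path)))
    return "\n".join(lines) + "\n"


# Placeholder values copied from the format example; replace with your own.
entries = [
    ("read_support_collapse1.txt", "collapse1", "/path/to/file/"),
    ("read_support_collapse2.txt", "collapse2", "/path/to/file/"),
]
filelist_text = format_filelist(entries)
print(filelist_text, end="")
```

Writing filelist_text to a file and passing that file as filelist_file keeps one line per TAMA Collapse run.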
OUTPUT:
The format of the output file is as follows:
merge_gene_id merge_trans_id gene_read_support trans_read_support source_prefix source_trans_line source_read_line
G34 G34.2 8 3 ovary,testes ovary_G21.2,testes_G26.2 m160315_220438_42149_c100936532550000001823211106101642_s1_p0/121400/26_2480_CCS,m160316_194043_42149_c100936532550000001823211106101647_s1_p0/30762/26_2477_CCS;m160316_064304_42149_c100936532550000001823211106101644_s1_p0/108271/27_2482_CCS
merge_gene_id - The gene ID as given in the TAMA Merge output file.
merge_trans_id - The transcript ID as given in the TAMA Merge output file.
gene_read_support - The number of reads supporting this gene.
trans_read_support - The number of reads supporting this transcript.
source_prefix - A list of the sources supporting this transcript.
source_trans_line - A list of the source transcripts supporting the merged transcript model.
source_read_line - A list of the read names supporting this merged transcript model. The groups of reads supporting each source transcript are delimited by ";". The reads within each group are delimited by ",". The order of the read groups matches the order given in the "source_trans_line" field.
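Since the read groups in source_read_line follow the order of source_trans_line, the two fields can be zipped together to recover which reads support which source transcript. This is a minimal sketch, assuming the output columns are tab delimited (the text above does not state the delimiter) and that each row has exactly the seven fields listed; parse_merge_row is a hypothetical name, not a TAMA function.

```python
def parse_merge_row(line):
    """Parse one data row of the merge read support output.

    Hypothetical helper, not part of TAMA. Assumes seven tab-delimited
    fields in the documented order.
    """
    (gene_id, trans_id, gene_reads, trans_reads,
     source_prefix, source_trans_line, source_read_line) = (
        line.rstrip("\n").split("\t")
    )
    source_trans = source_trans_line.split(",")
    # ";" separates per-transcript groups, "," separates reads in a group.
    read_groups = [group.split(",") for group in source_read_line.split(";")]
    if len(source_trans) != len(read_groups):
        raise ValueError("source transcripts and read groups do not line up")
    return {
        "merge_gene_id": gene_id,
        "merge_trans_id": trans_id,
        "gene_read_support": int(gene_reads),
        "trans_read_support": int(trans_reads),
        "source_prefix": source_prefix.split(","),
        "reads_by_source": dict(zip(source_trans, read_groups)),
    }


# The example row shown above, joined with tabs.
example_row = "\t".join([
    "G34", "G34.2", "8", "3", "ovary,testes", "ovary_G21.2,testes_G26.2",
    "m160315_220438_42149_c100936532550000001823211106101642_s1_p0/121400/26_2480_CCS,"
    "m160316_194043_42149_c100936532550000001823211106101647_s1_p0/30762/26_2477_CCS;"
    "m160316_064304_42149_c100936532550000001823211106101644_s1_p0/108271/27_2482_CCS",
])
parsed = parse_merge_row(example_row)
for source_transcript, reads in parsed["reads_by_source"].items():
    print(source_transcript, len(reads))
```

In the example row, the per-source read counts sum to the trans_read_support value, which makes a convenient sanity check when post-processing the file.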