TAMA GO: Read Support

GenomeRIK edited this page Oct 2, 2019 · 33 revisions

This set of tools in TAMA-GO is used to collect all the information regarding supporting reads for each gene/transcript model.

tama_read_support_levels.py

To generate a file containing read support for each transcript, use tama_read_support_levels.py. This versatile tool provides read support/count information at different levels of processing. It can be used to find read support for clustered reads, for collapsed transcript models, and for merged transcript models.

USAGE:

python tama_read_support_levels.py -f filelist -o prefix -m mergefile

trans_read.bed - This is the output file from TAMA Collapse.

cluster_file - This is the cluster file from running clustering with the official Iso-Seq pipelines. For Iso-Seq3 the file will look like "prefix.primer_5p--primer_3p.cluster"; for Iso-Seq1 the cluster file will look like "cluster_report.csv". Alternatively, if you did not do clustering and mapped the FLNC reads directly to the genome, you can use the "trans_read.bed" file as the cluster_file input.

output_file - The name that you want the output file to be called.

OUTPUT:

The format of the output file is as follows:

  gene_id trans_id        gene_num_reads  trans_num_reads cluster_line
  G1      G1.1    5       1       4_c18833:m160316_194043_42149_c100936532550000001823211106101647_s1_p0/27978/2288_50_CCS

cluster_line - This field shows the clusters and reads supporting each transcript model. The field is subdivided using ";" to delimit cluster groups, ":" to separate the cluster name from its read names, and "," to delimit read names.
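The delimiters described above can be unpacked with a few lines of Python. This is only a minimal sketch, not part of TAMA: the function names are hypothetical, and it assumes the output file is tab-delimited with a header line that starts with "gene_id".

```python
def parse_cluster_line(cluster_line):
    """Split a cluster_line field into {cluster_name: [read_name, ...]}."""
    clusters = {}
    for group in cluster_line.split(";"):  # ";" separates cluster groups
        if not group:
            continue
        # ":" separates the cluster name from its reads; split on the first one only.
        cluster_name, _, reads = group.partition(":")
        # "," separates the read names within a cluster.
        clusters[cluster_name] = reads.split(",") if reads else []
    return clusters


def parse_read_support_row(line):
    """Parse one data row of the read support file (assumed tab-delimited)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError("expected 5 columns, found %d" % len(fields))
    gene_id, trans_id, gene_num_reads, trans_num_reads, cluster_line = fields
    return {
        "gene_id": gene_id,
        "trans_id": trans_id,
        "gene_num_reads": int(gene_num_reads),
        "trans_num_reads": int(trans_num_reads),
        "clusters": parse_cluster_line(cluster_line),
    }


def read_support_rows(lines):
    """Yield parsed rows, skipping blank lines and the assumed header line."""
    for line in lines:
        if not line.strip() or line.startswith("gene_id"):
            continue
        yield parse_read_support_row(line)
```

Passing an open file handle to read_support_rows gives one dictionary per transcript model, with the supporting reads grouped by cluster name.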

DEPRECATED TOOLS BELOW

tama_read_support_collapse_cluster.py

To find read support for each TAMA Collapse run use tama_read_support_collapse_cluster.py.

USAGE:

python tama_read_support_collapse_cluster.py trans_read.bed cluster_file output_file

trans_read.bed - This is the output file from TAMA Collapse.

cluster_file - This is the cluster file from running clustering with the official Iso-Seq pipelines. For Iso-Seq3 the file will look like "prefix.primer_5p--primer_3p.cluster"; for Iso-Seq1 the cluster file will look like "cluster_report.csv". Alternatively, if you did not do clustering and mapped the FLNC reads directly to the genome, you can use the "trans_read.bed" file as the cluster_file input.

output_file - The name that you want the output file to be called.

OUTPUT:

The format of the output file is as follows:

  gene_id trans_id        gene_num_reads  trans_num_reads cluster_line
  G1      G1.1    5       1       4_c18833:m160316_194043_42149_c100936532550000001823211106101647_s1_p0/27978/2288_50_CCS

cluster_line - This field shows the clusters and reads supporting each transcript model. The field is subdivided using ";" to delimit cluster groups, ":" to separate the cluster name from its read names, and "," to delimit read names.

tama_read_support_merge_collapse.py

To find read support for each TAMA Merge run use tama_read_support_merge_collapse.py.

USAGE:

python tama_read_support_merge_collapse.py filelist_file output_file

filelist_file - This is a file that lists all the read support files from the TAMA Collapse runs that were merged during TAMA Merge.

The format is as follows (tab delimited):

  read_support_collapse1.txt collapse1   /path/to/file/
  read_support_collapse2.txt collapse2   /path/to/file/

read_support_collapse1.txt - This is the name of the output file from running "tama_read_support_collapse_cluster" on the TAMA Collapse run.

collapse1 - This is the prefix used in the TAMA Merge run to identify the TAMA Collapse source run.

/path/to/file/ - This is the path to the "read_support_collapse1.txt" file.
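A filelist in the three-column, tab-delimited layout described above could be written with a short script. This is only a sketch: the helper names are hypothetical, and the file names, prefixes, and paths in the usage note are the placeholders from the example above, not real files.

```python
import csv
import io


def format_filelist(entries):
    """Return filelist text: one tab-delimited line per (file_name, prefix, path)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for file_name, prefix, path in entries:
        writer.writerow([file_name, prefix, path])
    return buffer.getvalue()


def write_filelist(entries, filelist_path):
    """Write the formatted filelist to filelist_path."""
    with open(filelist_path, "w", newline="") as handle:
        handle.write(format_filelist(entries))
```

For example, write_filelist([("read_support_collapse1.txt", "collapse1", "/path/to/file/"), ("read_support_collapse2.txt", "collapse2", "/path/to/file/")], "filelist.txt") produces the two-line file shown above, where "filelist.txt" is an arbitrary name chosen for illustration.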

output_file - The name that you want the output file to be called.

OUTPUT:

The format of the output file is as follows:

  merge_gene_id   merge_trans_id  gene_read_support       trans_read_support      source_prefix   source_trans_line       source_read_line
  G34     G34.2   8       3       ovary,testes    ovary_G21.2,testes_G26.2        m160315_220438_42149_c100936532550000001823211106101642_s1_p0/121400/26_2480_CCS,m160316_194043_42149_c100936532550000001823211106101647_s1_p0/30762/26_2477_CCS;m160316_064304_42149_c100936532550000001823211106101644_s1_p0/108271/27_2482_CCS

merge_gene_id - The gene ID as given in the TAMA Merge output file.

merge_trans_id - The transcript ID as given in the TAMA Merge output file.

gene_read_support - The number of reads supporting this gene.

trans_read_support - The number of reads supporting this transcript.

source_prefix - A list of the sources supporting this transcript.

source_trans_line - A list of the source transcripts supporting the merged transcript model.

source_read_line - A list of the read names supporting this merged transcript model. The groups of reads supporting each source transcript are delimited by ";". The reads within each group are delimited by ",". The order of the groups matches the order given in the "source_trans_line" field.
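Because the read groups in source_read_line follow the order of source_trans_line, the two fields can be paired up position by position. The sketch below is not part of TAMA: the function name is hypothetical, it assumes the output file is tab-delimited, and it assumes source_prefix and source_trans_line are comma-delimited as in the example row.

```python
def parse_merge_support_row(line):
    """Parse one data row of the merge read support file (assumed tab-delimited)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError("expected 7 columns, found %d" % len(fields))
    (gene_id, trans_id, gene_reads, trans_reads,
     source_prefix, source_trans_line, source_read_line) = fields

    source_transcripts = source_trans_line.split(",")
    # ";" separates the read groups; "," separates reads within a group.
    read_groups = [group.split(",") for group in source_read_line.split(";")]
    if len(source_transcripts) != len(read_groups):
        raise ValueError("source_trans_line and source_read_line differ in length")

    return {
        "merge_gene_id": gene_id,
        "merge_trans_id": trans_id,
        "gene_read_support": int(gene_reads),
        "trans_read_support": int(trans_reads),
        "sources": source_prefix.split(","),
        # Group i of source_read_line belongs to transcript i of source_trans_line.
        "reads_by_source_transcript": dict(zip(source_transcripts, read_groups)),
    }
```

Applied to the example row above, this would map ovary_G21.2 to its two supporting reads and testes_G26.2 to its single read, which together account for the trans_read_support value of 3.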