TAMA GO: Formatting

GenomeRIK edited this page Apr 20, 2022 · 12 revisions

Tools for converting between different file formats

This is a set of tools for converting between file formats. Because the GTF and GFF formats are very flexible, a universal GTF/GFF converter is virtually impossible, and these tools may not work for your specific file. Please check that the converted files are correct before proceeding with any downstream processing.

Note: Error output pertaining to CDS information is OK; it only flags possible issues with the input files. The converted file will still be generated.
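One way to check a converted bed12 file is to confirm that each line is internally consistent. The sketch below is not part of TAMA; the function name check_bed12_line is hypothetical, and the checks assume the standard 12-column BED layout (chromStart in column 2, chromEnd in column 3, blockCount, blockSizes and blockStarts in columns 10 to 12).

```python
def check_bed12_line(line):
    """Return a list of problems found in one BED12 line (empty if none).

    Hypothetical helper for illustration; assumes the standard BED12 layout.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        return [f"expected 12 tab-separated columns, found {len(fields)}"]

    try:
        start, end = int(fields[1]), int(fields[2])
        block_count = int(fields[9])
        # blockSizes and blockStarts are comma-separated and may end with a comma
        sizes = [int(x) for x in fields[10].split(",") if x]
        starts = [int(x) for x in fields[11].split(",") if x]
    except ValueError:
        return ["non-integer value in a numeric column"]

    problems = []
    if start >= end:
        problems.append("chromStart is not smaller than chromEnd")
    if fields[5] not in ("+", "-", "."):
        problems.append("strand is not one of '+', '-', '.'")
    if block_count < 1 or len(sizes) != block_count or len(starts) != block_count:
        problems.append("blockCount does not match blockSizes/blockStarts")
    else:
        # Block starts are relative to chromStart, so the first must be 0
        if starts[0] != 0:
            problems.append("first block does not start at chromStart")
        # The last block must end exactly at chromEnd
        if start + starts[-1] + sizes[-1] != end:
            problems.append("last block does not end at chromEnd")
    return problems
```

Running this over every line of the output file and printing any non-empty result gives a quick first pass; it does not replace inspecting a few models by eye or in a genome browser.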

tama_convert_bed_gtf_ensembl_no_cds.py

To convert a TAMA bed12 file into Ensembl GTF format without CDS represented, use tama_convert_bed_gtf_ensembl_no_cds.py.

USAGE:

python tama_convert_bed_gtf_ensembl_no_cds.py bed_file output_file

bed_file - This is the main annotation output file from either TAMA Collapse or TAMA Merge.

output_file - The name that you want the output file to be called.


tama_convert_bed_gtf_ensembl_orf_nmd.py

To convert a TAMA bed12 file into Ensembl GTF format with CDS represented use tama_convert_bed_gtf_ensembl_orf_nmd.py. Please note that you must use the output from the TAMA ORF/NMD Predictor tool for this.

USAGE:

python tama_convert_bed_gtf_ensembl_orf_nmd.py bed_file output_file

bed_file - This is the main annotation output file from the TAMA ORF/NMD Predictor tool.

output_file - The name that you want the output file to be called.


tama_format_gff_to_bed12_cupcake.py

To convert a Cupcake GFF file into bed12 format use tama_format_gff_to_bed12_cupcake.py.

USAGE:

python tama_format_gff_to_bed12_cupcake.py gff_file output_file

gff_file - This is the main annotation output file from Cupcake Collapse.

output_file - The name that you want the output file to be called.


tama_format_gtf_to_bed12_ensembl.py

To convert an Ensembl GTF file into bed12 format use tama_format_gtf_to_bed12_ensembl.py. Note: Make sure you use the correct version of the Ensembl GTF annotation. (For example: ftp://ftp.ensembl.org/pub/release-98/gtf/homo_sapiens/Homo_sapiens.GRCh38.98.gtf.gz)

USAGE:

python tama_format_gtf_to_bed12_ensembl.py gtf_file output_file

gtf_file - This is the GTF file from an Ensembl annotation.

output_file - The name that you want the output file to be called.
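As a rough check on a GTF to bed12 conversion, the number of transcripts in the GTF can be compared with the number of lines in the bed12 output. The sketch below is not part of TAMA. The helper names are hypothetical, and it assumes that every transcript with at least one exon row becomes exactly one bed12 line, which may not hold for every input.

```python
import re

# GTF attributes look like: gene_id "X"; transcript_id "Y"; ...
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(gtf_lines):
    """Collect the transcript_id values found on exon rows of a GTF."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon rows define the blocks of a model
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def bed_model_count(bed_lines):
    """Count the non-empty lines of a bed12 file (one per model)."""
    return sum(1 for line in bed_lines if line.strip())
```

With both files opened, `len(gtf_transcript_ids(gtf_handle))` and `bed_model_count(bed_handle)` would be expected to agree under the assumption above; a mismatch is a prompt to look closer, not proof of an error.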


tama_format_gtf_to_bed12_ncbi.py

To convert an NCBI GTF file into bed12 format use tama_format_gtf_to_bed12_ncbi.py.

USAGE:

python tama_format_gtf_to_bed12_ncbi.py gtf_file output_file

gtf_file - This is the GTF file from an NCBI annotation.

output_file - The name that you want the output file to be called.


tama_format_gtf_to_bed12_stringtie.py

To convert a stringtie/cufflinks GTF file into bed12 format use tama_format_gtf_to_bed12_stringtie.py.

USAGE:

python tama_format_gtf_to_bed12_stringtie.py gtf_file output_file

gtf_file - This is the main annotation output file from either Stringtie or Cufflinks.

output_file - The name that you want the output file to be called.


tama_format_gff_to_bed12_liftoff.py

To convert a Liftoff GFF3 file into bed12 format use tama_format_gff_to_bed12_liftoff.py.

USAGE:

python tama_format_gff_to_bed12_liftoff.py gff_file output_file

gff_file - This is the GFF3 file from Liftoff.

output_file - The name that you want the output file to be called.


tama_format_id_filter.py

Use tama_format_id_filter.py to re-arrange the ID field (4th column) in a bed12 file. It is usually used to make the public annotation IDs the primary IDs after using TAMA Merge to match project transcriptome annotations to public annotations (when TAMA Merge was run with the -s option).

USAGE:

python tama_format_id_filter.py -b bed_file -o output_file

bed_file - This is the main annotation output file from either TAMA Merge or the TAMA ORF/NMD pipeline.

output_file - The name that you want the output file to be called.

optional arguments:

  -h, --help  show this help message and exit
  -b B        bed file (required)
  -o O        Output file name (required)
  -f F        Filter level (default "none", use "only_match" to only include
              models with a match)
  -s S        Sub-field management method (default "ensembl_merge" for
              restructuring sub-fields from Ensembl ID, use "custom" to define
              sub-field shuffling)
  -r R        Sub-field reshuffle parameter (default "none")
  -d D        Sub-field reshuffle delimiters (default ";")

The default run is for a TAMA Merge output where the "-s" option was used to pull public annotation IDs.

You can also use this for a special type of ORF/NMD run in which you run Blastp against a reference annotation CDS peptide file to get a comparison between species.

Example:

python tama_format_id_filter.py -b bed_file -o output_file -s ensembl_orf

And if you want to filter for only models that match:

python tama_format_id_filter.py -b bed_file -o output_file -s ensembl_orf -f only_match

You can also use this to re-arrange the ID line in a bed12 file. For instance, if you want to re-arrange subfields 1,2,3 into the order 2,1,3 and the subfield delimiters are ";" and ",", you would use:

python tama_format_id_filter.py -b bed_file -o output_file -s custom -r 2,1,3 -d ";,"
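To make the idea of reshuffling subfields concrete, here is a minimal sketch of one possible interpretation. It is not the code of tama_format_id_filter.py and may not match its output. The function name reshuffle_id and the example ID are hypothetical, and the sketch assumes that the delimiters stay in their original positions while the subfields move.

```python
import re


def reshuffle_id(id_field, order, delimiters):
    """Reorder the subfields of an ID string, keeping delimiters in place.

    Illustrative only: 'order' holds 1-based subfield positions and
    'delimiters' is a string of single-character delimiters.
    """
    pattern = "[" + re.escape(delimiters) + "]"
    subfields = re.split(pattern, id_field)
    separators = re.findall(pattern, id_field)

    # The order must name every subfield exactly once
    if sorted(order) != list(range(1, len(subfields) + 1)):
        raise ValueError("order must be a permutation of the subfield positions")

    reordered = [subfields[i - 1] for i in order]
    result = reordered[0]
    for separator, subfield in zip(separators, reordered[1:]):
        result += separator + subfield
    return result


# Hypothetical ID with three subfields separated by ";" and ","
print(reshuffle_id("G1;G1.1,ENST0001", [2, 1, 3], ";,"))
```

Under the stated assumption, the first two subfields swap places and the ";" and "," remain where they were.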