TAMA GO: Sequence Cleanup

GenomeRIK edited this page Oct 8, 2019 · 10 revisions

This set of tools in TAMA-GO is used to clean up sequences. Right now there is only one tool, but more will be added later.

tama_flnc_polya_cleanup.py

To remove poly-A tail sequences from FLNC fasta files, use tama_flnc_polya_cleanup.py. This tool removes the poly-A tails left in the FLNC fasta files after running IsoSeq3 Refine without the "--require-polya" parameter. If you have Iso-Seq data generated from cDNA libraries prepared with the Teloprime kit, you should not use the "--require-polya" parameter, because it will remove many reads due to an issue with the Teloprime 3' primer sequence and the way LIMA works. Instead, run Refine with its default settings and then clean up the remaining poly-A tails using this tool.
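To make the idea of poly-A cleanup concrete, here is a minimal sketch of trimming a run of A's from the 3' end of a read. It is only an illustration: the function name `trim_trailing_polya` and the `min_len` parameter are hypothetical, and this is not a description of how tama_flnc_polya_cleanup.py works internally.

```python
def trim_trailing_polya(seq: str, min_len: int = 1) -> str:
    """Remove a run of A's from the 3' end if it is at least min_len long.

    Illustrative only; not the algorithm used by tama_flnc_polya_cleanup.py.
    """
    stripped = seq.rstrip("Aa")          # drop every trailing A (either case)
    tail_len = len(seq) - len(stripped)  # length of the run that was removed
    return stripped if tail_len >= min_len else seq


print(trim_trailing_polya("ACGTTGCAAAAAAAA"))  # prints ACGTTGC
```

A real cleanup step would probably also need to tolerate sequencing errors inside the tail, which this sketch does not attempt.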


Instructions for Teloprime Iso-Seq data:

Primer sequences (i.e. primers.fasta; the headers may need to change depending on the software version):

>primer_5p
TGGATTGATATGTAATACGACTCACTATAG
>primer_3p
AAAAAAAAAAAAAAAAAACGCCTGAGA
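If you would rather generate primers.fasta from a script than type it by hand, a small sketch like the following could be used. The helper name `format_primers` is hypothetical; the headers and sequences are the ones listed above, and the headers may still need adjusting for your software version.

```python
from pathlib import Path

# Teloprime primer sequences as listed on this page.
PRIMERS = {
    "primer_5p": "TGGATTGATATGTAATACGACTCACTATAG",
    "primer_3p": "AAAAAAAAAAAAAAAAAACGCCTGAGA",
}


def format_primers(primers: dict) -> str:
    """Return FASTA text with each header and sequence on its own line."""
    lines = []
    for name, seq in primers.items():
        lines.append(f">{name}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the file into the current working directory.
    Path("primers.fasta").write_text(format_primers(PRIMERS))
```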

Run LIMA with the options appropriate for the version you are using (for IsoSeq3 3.2):

lima --isoseq --dump-clips --no-pbi --peek-guess -j 24 ccs.bam primers.fasta demux.bam

Run Refine without the "--require-polya" argument (for IsoSeq3 3.2):

isoseq3 refine output.5p--3p.bam primers.fasta flnc.bam

Convert the flnc.bam file into a fasta file:

bamtools convert -format fasta -in flnc.bam > flnc.fa

Run tama_flnc_polya_cleanup.py to remove the remaining 3' poly-A tails:

python tama_flnc_polya_cleanup.py -f flnc.fa -p prefix

The resulting fasta file is now ready for genome mapping.
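The four steps above can be chained from a small Python driver. The sketch below only assembles the commands exactly as written on this page (for IsoSeq3 3.2) and prints them by default. The helper names `build_steps` and `run_steps` are hypothetical, and the file names are the placeholders used above, so adjust them to match your own LIMA output.

```python
import shlex
import subprocess


def build_steps(ccs_bam="ccs.bam", primers="primers.fasta", prefix="prefix", threads=24):
    """Return (command, stdout_file) pairs mirroring the steps listed above."""
    return [
        (["lima", "--isoseq", "--dump-clips", "--no-pbi", "--peek-guess",
          "-j", str(threads), ccs_bam, primers, "demux.bam"], None),
        (["isoseq3", "refine", "output.5p--3p.bam", primers, "flnc.bam"], None),
        # bamtools writes the fasta to stdout, so it is redirected to flnc.fa.
        (["bamtools", "convert", "-format", "fasta", "-in", "flnc.bam"], "flnc.fa"),
        (["python", "tama_flnc_polya_cleanup.py", "-f", "flnc.fa", "-p", prefix], None),
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd, stdout_file in steps:
        print(shlex.join(cmd) + (f" > {stdout_file}" if stdout_file else ""))
        if dry_run:
            continue
        if stdout_file:
            with open(stdout_file, "w") as out:
                subprocess.run(cmd, stdout=out, check=True)
        else:
            subprocess.run(cmd, check=True)


run_steps(build_steps())  # dry run: prints the four commands without running them
```

Calling `run_steps(build_steps(), dry_run=False)` would execute the commands, stopping at the first one that fails because of `check=True`.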


To convert a FLNC BAM file into a fasta file, you can use this command:

bamtools convert -format fasta -in bam_file > fasta_file

Note: bamtools is a separate program and is not part of TAMA.

usage: tama_flnc_polya_cleanup.py [-h] [-f] [-p]

optional arguments:

  -h, --help  show this help message and exit
  -f F        FLNC fasta file
  -p P        Prefix for output file

A typical command looks like this:

python tama_flnc_polya_cleanup.py -f flnc.fa -p prefix

Detailed explanation of arguments:

-f F

The FLNC fasta file is the output from running IsoSeq3 Refine and then the BAM to Fasta conversion.

-p P

This prefix is used to name all of the output files.

Outputs:

  prefix.fa
  prefix_polya_flnc_report.txt

Detailed explanation:

prefix.fa

This is the cleaned up FLNC fasta file.

prefix_polya_flnc_report.txt

This report file contains a table giving, for each poly-A count (polya_num), the number of sequences with that count (polya_num_count).

  polya_num       polya_num_count
  0       40676
  1       46986
  2       63718