Hi Martin,

Thanks for posting the first discussion for TAMA! Regarding your question, there are multiple ways of doing this.

The one I prefer is a reference-less quantification pipeline (no alignment to the genome) with Kallisto or Salmon. To use these with your TAMA annotation, you just need to convert the annotation into a FASTA file of transcript sequences for input to these tools. You can do so with bedtools, as shown in the first step of the ORF/NMD pipeline, i.e.:

If you would rather still use the reference genome assembly, you can guide STAR with the TAMA annotation in GTF format. For this pipeline you would just need to convert the TAMA BED file into a GTF file, which you can do with one of the TAMA BED-to-GTF converter tools. For your purposes I would use this one:

Let me know if this works for you and/or if you have more questions.

Cheers,
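Both conversions described above work from the exon blocks stored in the BED12 file. The Python sketch below is a hypothetical illustration, not TAMA's converter or the bedtools command from the ORF/NMD pipeline. It assumes that column 4 of the TAMA BED file holds `gene_id;transcript_id` (e.g. `G1;G1.1`) and that the genome has already been loaded into a dict of sequences. It emits GTF `transcript`/`exon` lines (1-based, closed coordinates) and the spliced, strand-aware transcript sequence, which is roughly what `bedtools getfasta -split -s` produces for BED12 input.

```python
"""Sketch: turn TAMA BED12 records into GTF lines and spliced transcript sequences.

Assumption: column 4 holds "gene_id;transcript_id" (e.g. "G1;G1.1").
Adjust parse_bed12() if your BED file uses a different naming scheme.
"""
from dataclasses import dataclass
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


@dataclass
class Bed12:
    chrom: str
    start: int                    # 0-based, inclusive (BED chromStart)
    end: int                      # 0-based, exclusive (BED chromEnd)
    gene_id: str
    transcript_id: str
    strand: str
    exons: List[Tuple[int, int]]  # absolute 0-based half-open exon blocks


def parse_bed12(line: str) -> Bed12:
    f = line.rstrip("\n").split("\t")
    chrom, start, end = f[0], int(f[1]), int(f[2])
    gene_id, transcript_id = f[3].split(";", 1)
    sizes = [int(x) for x in f[10].strip(",").split(",")]
    offsets = [int(x) for x in f[11].strip(",").split(",")]
    # blockStarts are relative to chromStart; convert to absolute coordinates.
    exons = [(start + o, start + o + s) for o, s in zip(offsets, sizes)]
    return Bed12(chrom, start, end, gene_id, transcript_id, f[5], exons)


def to_gtf(rec: Bed12, source: str = "TAMA") -> List[str]:
    """One transcript line plus one exon line per block (GTF is 1-based, closed)."""
    attrs = f'gene_id "{rec.gene_id}"; transcript_id "{rec.transcript_id}";'

    def row(feature: str, s: int, e: int) -> str:
        return "\t".join([rec.chrom, source, feature, str(s + 1), str(e),
                          ".", rec.strand, ".", attrs])

    return [row("transcript", rec.start, rec.end)] + [row("exon", s, e) for s, e in rec.exons]


def spliced_sequence(rec: Bed12, genome: Dict[str, str]) -> str:
    """Concatenate exon sequences; reverse-complement minus-strand transcripts."""
    seq = "".join(genome[rec.chrom][s:e] for s, e in rec.exons)
    if rec.strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


if __name__ == "__main__":
    genome = {"chr1": "AAAACCCCGGGGTTTTACGT"}  # toy sequence for illustration
    rec = parse_bed12("chr1\t2\t14\tG1;G1.1\t40\t-\t2\t14\t255,0,0\t2\t4,3,\t0,9,\n")
    print("\n".join(to_gtf(rec)))
    print(f">{rec.transcript_id}\n{spliced_sequence(rec, genome)}")
```

For real data the official TAMA converter and bedtools remain the tools to use; the sketch only shows how BED12 block sizes and block starts translate into exon coordinates and transcript sequences.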
Hi Richard,

Thanks a lot for taking the time to explain the two approaches. It helped me understand some of the differences between Kallisto/Salmon and STAR (though, as I recall, Kallisto/Salmon don't output an alignment? That, however, is not needed for DGE...). I would like to discuss some possible issues when working with non-model organisms for which only a low-quality reference genome is available:
What do you think about the following approach, which is based on suggestions made by the Cogent developers? Unmapped and poorly mapped transcripts could be clustered without a reference using Cogent (or isONclust?). Cogent then tries to reconstruct the gene to which the clustered transcripts belong. The reconstructed genes could then be appended to the reference genome. Finally, the new genome could be used as the reference for GMAP/minimap mapping, followed by transcript clustering with TAMA. The expression of the clusters can then be quantified with either of the two approaches you explained. Clustering with TAMA would produce annotations that also include transcripts that are not encoded in the original reference genome or that map to it with low coverage. Appending reconstructed genes to the reference genome would also make it possible to generate all-in-one visualizations that keep track of the differences between the reference genome and the PacBio transcripts.

Best,
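Appending the reconstructed genes to the genome mostly comes down to merging FASTA files while keeping sequence names unique, so that GMAP/minimap and TAMA can tell the original scaffolds apart from the added contigs. A minimal Python sketch, assuming both inputs are plain FASTA; the `cogent_` prefix and the example names are hypothetical:

```python
"""Sketch: append reconstructed gene contigs (e.g. from Cogent) to a draft genome FASTA.

The "cogent_" prefix and the file names below are hypothetical; the prefix only
serves to keep appended contig names distinct from the genome scaffolds.
"""
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word of each header."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def merge_references(genome: Iterable[Tuple[str, str]],
                     contigs: Iterable[Tuple[str, str]],
                     prefix: str = "cogent_") -> Dict[str, str]:
    """Genome records first, then prefixed contigs; refuse to overwrite a name."""
    merged: Dict[str, str] = dict(genome)
    for name, seq in contigs:
        new_name = prefix + name
        if new_name in merged:
            raise ValueError(f"duplicate sequence name: {new_name}")
        merged[new_name] = seq
    return merged


def write_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Serialize records as FASTA with fixed-width sequence lines."""
    out: List[str] = []
    for name, seq in records.items():
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    genome = read_fasta([">scaffold_1 draft assembly", "ACGT", "ACGT"])
    contigs = read_fasta([">contig7", "TTTTGGGG"])
    print(write_fasta(merge_references(genome, contigs)), end="")
    # With real files (hypothetical names):
    # with open("genome.fa") as g, open("cogent_genes.fa") as c, open("merged.fa", "w") as out:
    #     out.write(write_fasta(merge_references(read_fasta(g), read_fasta(c))))
```

Rejecting name clashes, rather than silently overwriting, avoids an appended contig replacing a genome scaffold with the same identifier.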
Hi Martin,
Yes, this is true.
This is true and problematic if the genome assembly is not good.
I think this is a good approach. You can use Cogent, isONclust or RATTLE for the unmapped reads. You just need to be careful with the gene models generated by the reference-less approach, since it is difficult to filter out problematic models without the genome assembly.
I think this sounds like a great idea. Let me know if you have more questions about any of this or if I missed one of your questions.

Cheers,
Dear Tama developers and community,
I am looking for some inspiration and suggestions. My aim is to do differential gene expression analysis using a PacBio IsoSeq transcriptome as the reference.
I did PacBio IsoSeq and Illumina paired-end sequencing of a eukaryotic alga. The two datasets were assembled de novo into a hybrid transcriptome. I mapped the hybrid transcriptome to a publicly available reference genome of the same species (but a different strain) using GMAP, and collapsed the contigs using tama_collapse.
If I understand correctly, I could now do one of two things for differential gene expression analysis. First, I could map the Illumina short reads to the reference genome (using e.g. STAR) and count reads based on the tama_collapse gene loci (and normalize the counts with e.g. DESeq2). However, I would prefer to map the Illumina short reads directly to the new hybrid transcriptome.
Here is my question: how can I get from the tama_collapse output to a reference transcriptome that can be used for differential gene expression analysis? (Preferably, I would like to generate an annotation file for the hybrid transcriptome, with information about gene loci and transcript isoforms, that can be used as input for STAR.)
Best,
Martin
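When reads are quantified directly against the transcriptome (Salmon or Kallisto), the transcript-level estimates can be summed per TAMA gene locus before running DESeq2; in R this is usually done with tximport and a transcript-to-gene table. A minimal Python sketch, assuming column 4 of the TAMA BED file is `gene_id;transcript_id` and that the counts come from Salmon's `quant.sf` (columns `Name` and `NumReads`); it builds the transcript-to-gene map from the BED file and sums estimated counts per gene:

```python
"""Sketch: sum transcript-level estimated counts to TAMA gene loci.

Assumptions: TAMA BED column 4 is "gene_id;transcript_id", and the counts
table is Salmon's quant.sf (tab-separated, with Name and NumReads columns).
"""
import csv
from collections import defaultdict
from typing import Dict, Iterable


def tx2gene_from_bed(bed_lines: Iterable[str]) -> Dict[str, str]:
    """Map each transcript ID to its gene ID using the BED name field."""
    mapping: Dict[str, str] = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track")):
            continue
        gene_id, transcript_id = line.rstrip("\n").split("\t")[3].split(";", 1)
        mapping[transcript_id] = gene_id
    return mapping


def gene_counts(quant_lines: Iterable[str], tx2gene: Dict[str, str]) -> Dict[str, float]:
    """Sum NumReads over all transcripts belonging to the same gene."""
    totals: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(quant_lines, delimiter="\t"):
        gene = tx2gene.get(row["Name"])
        if gene is None:
            raise KeyError(f"transcript not in annotation: {row['Name']}")
        totals[gene] += float(row["NumReads"])
    return dict(totals)


if __name__ == "__main__":
    bed = ["chr1\t0\t100\tG1;G1.1\t0\t+", "chr1\t0\t90\tG1;G1.2\t0\t+",
           "chr2\t5\t305\tG2;G2.1\t0\t-"]
    quant = ["Name\tLength\tEffectiveLength\tTPM\tNumReads",
             "G1.1\t100\t80\t5.0\t12.5", "G1.2\t90\t70\t2.0\t7.5",
             "G2.1\t300\t280\t1.0\t4"]
    print(gene_counts(quant, tx2gene_from_bed(bed)))
```

Summing estimated counts is the simplest aggregation; tximport also offers length-aware alternatives. DESeq2 expects integer counts, so the sums would need rounding. Kallisto's `abundance.tsv` names the columns `target_id` and `est_counts` instead.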