Error calling tRNAscan-SE #290
Hi @asierFernandezP, Thanks for trying out First of all I clearly need to change the error messages from '__ not found' to '__ failed' I think so thanks for that. I have had a crack at replicating this and I think the issue is that the GenBank output format of Therefore I have 2 options to fix this:
I am aware that the Genbank format is a nightmare, so I don't wish to spend a lot of time down the GenBank rabbit hole, but I do think I will need to change the gff to gbk conversion function of pharokka regardless of what I do to deal with cases where different contigs have different translation codes, so I think 2. would be the most preferable option. George |
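To illustrate why per-contig translation codes matter in a GFF to GenBank conversion, here is a minimal sketch that translates each CDS with the table assigned to its contig. It is not pharokka's conversion function: the names `TABLES`, `translate` and `translate_by_contig`, the contig IDs, and the choice of NCBI tables 11, 4 and 15 are assumptions made for illustration. Only the codon reassignments (TGA to W in table 4, TAG to Q in table 15) come from the NCBI genetic code tables.

```python
from itertools import product

# NCBI table 11 amino acids, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
TABLE_11 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_table(overrides=None):
    """Return a codon -> amino acid dict, applying optional reassignments."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    table = dict(zip(codons, TABLE_11))
    table.update(overrides or {})
    return table


# Illustrative subset of translation tables (assumption, not an exhaustive list).
TABLES = {
    11: build_table(),              # standard bacterial/archaeal/phage code
    4: build_table({"TGA": "W"}),   # TGA read as tryptophan
    15: build_table({"TAG": "Q"}),  # TAG read as glutamine
}


def translate(seq, code):
    """Translate a DNA sequence codon by codon; unknown codons become 'X'."""
    table = TABLES[code]
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def translate_by_contig(cds_records, contig_codes, default_code=11):
    """Translate (contig_id, dna) pairs using each contig's own table."""
    return [
        (contig, translate(dna, contig_codes.get(contig, default_code)))
        for contig, dna in cds_records
    ]


# Hypothetical contigs: the same DNA gives different proteins per table.
records = [("contig_1", "ATGTGATAG"), ("contig_2", "ATGTGATAG"), ("contig_3", "ATGTGATAG")]
codes = {"contig_1": 11, "contig_2": 4, "contig_3": 15}
proteins = translate_by_contig(records, codes)
```

A converter that writes a single translation table for the whole file would mistranslate `contig_2` and `contig_3` here, which is why the table has to be tracked per contig.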
Hi George, Thank you so much for your quick answer!! Indeed, I think the 2nd option would be ideal and more convenient. As you mentioned, the problem is that prodigal-gv outputs a Genbank formatted CSV file without the CDS sequences. Let me know if it would be possible for you to implement it or if any further information/help from my side is needed! And thanks again for the great tool :)) |
No problems @asierFernandezP! I will aim to implement this when I get some time hopefully in the next week or 2 - and this will make pharokka v1.5.0. I'll let you know regarding help - I might get you to do some testing before I release it. But otherwise I think I know what I need to implement to make this work. George |
I don't think I will integrate it to "stock" Pyrodigal, but I can probably figure out a way to get that into a different dedicated package that has the |
I think it would be very useful quite soon for lots of users in the gut metagenome/giant virus etc space. Also more generally speaking as sort of mentioned in althonos/pyrodigal#24, not sure what you have in mind but there's some scope to build upon your methods for alternative coding prediction and within-genome stop codon reassignment (see e.g. https://github.com/gatech-genemark/Mgcod ). Happy to help out and contribute whatever I can to help make this happen (and I'm sure @asierFernandezP and @apcamargo would be keen too by the looks). George |
@gbouras13 :
On that note, I think within-genome stop codon reassignment is probably out of scope for Prodigal, because the dynamic programming needs to have the codons from the entire genome to start finding the highest scoring gene path. However, alternative coding prediction could be feasible, similarly to the |
@althonos I've given it a test run (with the pyrodigal v3 alpha installed from source), and it looks phenomenal mate. The only thing lacking that I can think of that would be useful (in terms of recapitulating the pyrodigal-gv output and generally) is 'genetic_code' in the description (it's between rbs_spacer and gc_cont).
prodigal-gv v2.11 output
pyrodigal-gv output
the code:
George |
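For anyone parsing this kind of output downstream, here is a rough sketch of reading the `genetic_code` field from a Prodigal-style FASTA header. It assumes the usual layout `>id # start # end # strand # key=value;...`, with `genetic_code` sitting between `rbs_spacer` and `gc_cont` as described above. The function name `parse_header` and the example header are hypothetical and made up for illustration; they are not copied from the outputs in this thread.

```python
def parse_header(header):
    """Parse a Prodigal-style FASTA header into a dict.

    Returns genetic_code as an int, or None when the field is absent
    (for example, output from a tool that does not report it).
    """
    # Five ' # '-separated parts: id, start, end, strand, attribute string.
    gene_id, start, end, strand, attrs = [
        part.strip() for part in header.lstrip(">").split(" # ", 4)
    ]
    # Attributes are ';'-separated key=value pairs.
    fields = dict(item.split("=", 1) for item in attrs.split(";") if item)
    code = fields.get("genetic_code")
    return {
        "id": gene_id,
        "start": int(start),
        "end": int(end),
        "strand": int(strand),
        "genetic_code": int(code) if code is not None else None,
        "attributes": fields,
    }


# Made-up header for illustration only.
example = (
    ">contig_1_1 # 3 # 98 # 1 # ID=1_1;partial=00;start_type=ATG;"
    "rbs_motif=None;rbs_spacer=None;genetic_code=4;gc_cont=0.500"
)
gene = parse_header(example)
```

Returning `None` for a missing field lets the same parser handle headers with and without `genetic_code`, so the caller can decide on a fallback table.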
@gbouras13 : I have made a new Pyrodigal pre-release that supports that ( |
Beautiful, outputs look great to me. More than happy to do any testing that you would find useful Martin. George |
You're most likely the main user of this with @apcamargo at the moment so I'll keep on alpha for a bit while you try it around 👍 |
I'm still testing it out. But no problems so far! |
I've tested it out today on a number of single phages + a 673 contig gut phage test set (from Yutin et al), around 50MB fasta file. The results look great, I get exactly what is expected and the output is easily parsable for me. With Running on Intel Core i7-10700K CPU @ 3.80GHz on a machine running Ubuntu 20.04.6 LTS, it took 305 seconds to run the This compares to 184 seconds for regular For single phage contigs runtime is negligible for both (sub 1 second). Overall, I am happy with the output for my purposes, when you release Thanks @althonos and @apcamargo amazing stuff! George |
Nice to hear! 🙏 I also don't know if you've seen but there's an option to run meta mode only using viral models, in case you're looking for speedup 👍 Maybe something to expose as a flag? |
Thanks @gbouras13! I don't remember having a slowdown this dramatic when comparing Prodigal to prodigal-gv. Can you check if execution time decreases when you use the |
Hey @apcamargo and @althonos , This indeed seems to explain the difference - it took 178 seconds with For production I will follow your advice so I will leave it as was. Either way, it is more than quick enough for Pharokka's purposes, a few minutes to do CDS prediction on nearly a thousand contigs is pretty efficient. George |
Thank you guys @althonos @gbouras13 @apcamargo! That's called efficiency! Just tested it on a set of ~2000 metagenomic viral contigs and no problems observed! |
@althonos @apcamargo I'm releasing v1.5.0 today. I've added a citation to the citations section for Larradle M. and Camargo A., (2023) Pyrodigal-gv: A Pyrodigal extension to predict genes in giant viruses and viruses with alternative genetic code. https://github.com/althonos/pyrodigal-gv. If you want to change it just let me know. George |
Just spell my name right and we're good 😄 |
Easiest commit I have made, sorry about that @althonos but fixed now :) |
pharokka version: v1.4.1
Python version: Python 3.10.12
Operating System: Computing cluster
Description
I have run prodigal-gv on my viral genomes to get the ORFs (it also outputs a GenBank-formatted CSV file). Using the predicted ORFs as input, I want to run Pharokka to annotate them (without running Prodigal or PHANOTATE again).
What I Did
I ran the following command:
And I got the following error:
Thanks in advance!