Flextaxd creates empty kraken db #48
Comments
All the files (nodes.dmp, library.fna) seem OK. @davve2, could you please help me with this? |
I reverted to custom_taxonomy_databases version 0.3.5 and now it seems to work. I'm happy to provide more information if it helps to solve this. |
By the way, the log above comes from stderr. I think it would be good to have the kraken-build log in the flextaxd logfile as well, not only in stderr. |
Thank you SilasK. Good suggestion to include the kraken-build log; I will add that to the next version of flextaxd. I will look into this error. Did you change anything in the source files between the two runs, or were all files (taxonomy, library, etc.) the same? If not, I will try to produce files locally from the two versions and see if I can reproduce the error (and the completed database). Can you replicate the error using the --test parameter? This will help a lot during debugging (it uses only a handful of genomes to run through the pipeline within a few minutes). |
I did both, including rerunning flextaxd-create from the already existing library and mapping (I don't know which files the script updates and which it does not), but it ran into the same error. Only when I create the flextaxd database with the older version is the Kraken db built correctly. By the way, here is my code: https://github.com/SilasK/Kraken/blob/master/workflow/build.smk. I use Snakemake and usually start from a Greengenes-formatted file. |
Dear @SilasK, I'm still working on this issue. I may have found a bug related to the import of the official Greengenes file: sometimes empty nodes (g__;s__) lead to a node "" being annotated with multiple genomes. I've added a solution locally and will push an update, but this is unrelated to your problem. Preferably, I want to understand this issue and add a solution for it as well before I push the update. However, I cannot replicate the issue that you have, with Kraken not using the files on disk. What does your genome structure look like? At the moment the program cannot take one single large file of genomes (unfortunately; this is on my own list of updates). This is originally due to the structure of the NCBI genomeid2taxid file, which doesn't give you an identification of the genome name. I have worked with Greengenes and have it working locally using the following structure for my input genomes (splitting the original FASTA file into files named "taxid.fasta.gz").
Please let me know if this is of any help and otherwise perhaps you can supply some example data that I can work with to replicate the problem. Kind regards, |
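For readers who want to try the per-taxid layout mentioned above, here is a minimal Python sketch. It is not flextaxd code. It assumes you already have a mapping from sequence ID (the first word of each FASTA header) to taxid; the function name `split_fasta_by_taxid` and the `seqid_to_taxid` dict are hypothetical names used only for illustration.

```python
import gzip
from pathlib import Path


def split_fasta_by_taxid(fasta_path, seqid_to_taxid, out_dir):
    """Write each record of a multi-FASTA file to <out_dir>/<taxid>.fasta.gz.

    seqid_to_taxid is a hypothetical dict: first word of the FASTA header
    (without '>') -> taxid. Records without a mapping are skipped.
    Returns the sorted list of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}    # taxid -> open gzip text handle
    current = None  # handle for the record currently being copied
    try:
        with open(fasta_path) as fasta:
            for line in fasta:
                if line.startswith(">"):
                    words = line[1:].split()
                    seqid = words[0] if words else ""
                    taxid = seqid_to_taxid.get(seqid)
                    if taxid is None:
                        current = None  # unmapped record: skip its lines
                    else:
                        if taxid not in handles:
                            path = out_dir / f"{taxid}.fasta.gz"
                            handles[taxid] = gzip.open(path, "wt")
                        current = handles[taxid]
                if current is not None:
                    current.write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(out_dir / f"{taxid}.fasta.gz" for taxid in handles)
```

The sketch keeps one open file per taxid, so for taxonomies with many thousands of taxids it may hit the operating system's open-file limit; grouping records first or writing in batches would avoid that.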
Just to say, I got a similar result to @SilasK, but by a different route. I basically created the custom taxonomy with flextaxd, then manually edited the genome FASTA files to have the relevant "kraken:taxid" in the header.
Creating sequence ID to taxonomy ID map (step 1)...
Sequence ID to taxonomy ID map complete. [0.362s]
Estimating required capacity (step 2)...
Estimated hash table requirement: 37427200 bytes
Capacity estimation complete. [2.154s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 6 bits reserved for taxid.
Completed processing of 0 sequences, 0 bp
Writing data to disk... complete.
Database files completed. [1.754s]
Database construction complete. [Total: 5.115s]
The kraken db "looks" OK:
seqid2taxid.map is populated:
Yet the build clearly didn't work:
I am using
Has anything changed in the way flextaxd dumps the taxonomy to file? |
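The manual header edit described in this comment can be scripted. Below is a minimal sketch, assuming the Kraken 2 convention of appending `|kraken:taxid|<taxid>` to the sequence ID (the first word of the header). The function name `add_kraken_taxid` is hypothetical, and the taxid you pass must be the one used in your own custom taxonomy.

```python
def add_kraken_taxid(header, taxid):
    """Return a FASTA header with '|kraken:taxid|<taxid>' appended to the sequence ID.

    The description after the first whitespace is kept unchanged.
    Headers that already carry a kraken:taxid tag are returned as-is.
    """
    if not header.startswith(">"):
        raise ValueError("not a FASTA header line")
    parts = header[1:].strip().split(None, 1)
    if not parts:
        raise ValueError("FASTA header has no sequence ID")
    seqid = parts[0]
    description = parts[1] if len(parts) > 1 else ""
    if "kraken:taxid|" not in seqid:
        seqid = f"{seqid}|kraken:taxid|{taxid}"
    return f">{seqid} {description}" if description else f">{seqid}"
```

For example, `add_kraken_taxid(">seq1 some genome", 562)` gives `>seq1|kraken:taxid|562 some genome`. Note that correct headers alone were not enough in this thread: the build still processed 0 sequences until the taxonomy dump itself was fixed.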
Downgraded to
and got a successful build
Crucially, this bit:
So what changed in the dump of nodes.dmp and names.dmp between the two versions? What I can see is that the newer .dmp files have one extra line:
|
There are two root nodes in the newer names.dmp:
Only one in the older
|
Looking at nodes.dmp for the new version (where names.dmp has two root nodes) it looks like the first root node has nothing hanging off it:
So I would suggest this is the first place to look for the bug |
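The check described above (more than one node named "root", one of them with nothing hanging off it) can be automated. This is a minimal sketch, assuming NCBI-style pipe-separated .dmp files with `taxid | name | ...` in names.dmp and `taxid | parent taxid | ...` in nodes.dmp; the column order of your flextaxd dump may differ depending on the export options, so treat the indices as assumptions. `parse_dmp` and `root_report` are hypothetical helper names.

```python
def parse_dmp(text):
    """Split pipe-separated .dmp text into rows of whitespace-stripped fields."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith("|"):
            line = line[:-1]  # drop the trailing field terminator
        rows.append([field.strip() for field in line.split("|")])
    return rows


def root_report(names_text, nodes_text):
    """Return [(taxid, number_of_children)] for every names.dmp entry named 'root'.

    Assumes column 0 is the taxid and column 1 the name (names.dmp) or the
    parent taxid (nodes.dmp). Self-parented rows are not counted as children.
    """
    child_count = {}
    for row in parse_dmp(nodes_text):
        if len(row) < 2:
            continue
        taxid, parent = row[0], row[1]
        if taxid != parent:
            child_count[parent] = child_count.get(parent, 0) + 1
    return [
        (row[0], child_count.get(row[0], 0))
        for row in parse_dmp(names_text)
        if len(row) > 1 and row[1] == "root"
    ]
```

If the report lists more than one entry, or a root with zero children, the dump shows the same symptom as the newer files discussed here.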
Dear Mick, thanks for the information. I located how the bug happens and have implemented a fix. I have a few additional updates coming very soon, which will include a bugfix for this issue. I hope to get it updated today or during next week. Best, |
Thanks @davve2, keep up the excellent work and thanks for flextaxd! |
Did you manage to fix the bug? |
Yes, this should now be resolved in the latest release (v0.4.3). It was caused by two minor bugs: one adding two roots on join, another leading to incorrect taxonomy levels. However, it is also important to remember to export the taxonomy using --dbprogram kraken2. Otherwise the final database will not retain the information (Kraken trims at least one column from the right, and the default export format for flextaxd contains the node information in the last column). |
I am trying to build a custom Kraken db, but flextaxd creates an empty Kraken db and does not even fail.
I have the latest flextaxd (0.4.2) and kraken 2.1.2.
Could you help me solve this issue?
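Because the build finishes without an error even when the database is empty, a pipeline can guard against this by inspecting the captured build log. The sketch below is a minimal example, assuming the log contains a line of the form "Completed processing of N sequences, M bp" as in the output quoted in this thread; the function names are hypothetical and this is not part of flextaxd or Kraken.

```python
import re

# Matches the summary line printed during step 3 of the build.
PROCESSED = re.compile(r"Completed processing of (\d+) sequences, (\d+) bp")


def sequences_processed(log_text):
    """Return (n_sequences, n_bp) parsed from a captured build log."""
    match = PROCESSED.search(log_text)
    if match is None:
        raise ValueError("no 'Completed processing' line found in build log")
    return int(match.group(1)), int(match.group(2))


def assert_nonempty_build(log_text):
    """Raise RuntimeError if the build log reports that 0 sequences were processed."""
    n_seqs, n_bp = sequences_processed(log_text)
    if n_seqs == 0:
        raise RuntimeError("build processed 0 sequences; the database is empty")
    return n_seqs, n_bp
```

In a Snakemake rule, calling `assert_nonempty_build` on the captured stderr would make the job fail instead of silently producing an empty database.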