genome name changed in output NJ_tree #159
Comments
@flass did you use dendropy or rapidnj for the tree? The latter will be much faster for a large dataset.
It was using the default tool - I assumed it to be RapidNJ (it was pretty fast to compute indeed).
The default is to use dendropy, and the changes are probably down to dendropy's treatment of underscores, which is complex: https://dendropy.org/primer/taxa.html. I would add We should still take a look at the dendropy behaviour - have you got a test set of ~10 sequences with odd names that we could use, @flass? @johnlees I think we need to add
Ah, do we not use that? Yes, we should add the flag. This will be addressed in #148 then.
@nickjcroucher see the list of names below:
Fixed in 66ca2d2
Hi John,
I've got yet another minor bug to report here:
Versions
I am using PopPUNK v2.3.0 with pp-sketchlib 1.6.2, as provided by a conda environment built with
Command used and output returned
Describe the bug
When uploading the viz output files to Microreact, I realised something was wrong, as some genomes had their names edited in the NJ tree Newick file.
The name of the genome is RKI-ZBS2-CH129_TACAGC_L002.contigs_spades, but it appears in 950Vc_core_NJ.nwk as 'RKI-ZBS2-CH129 TACAGC L002', so with underscores replaced by spaces.
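One plausible reading of this symptom, in line with the dendropy explanation suggested in the comments, is the usual Newick convention: underscores in an unquoted label are read as blanks, and a label containing blanks is quoted when written back out. The sketch below only illustrates that convention with the standard library. `read_label` and `write_label` are hypothetical helpers, not PopPUNK or dendropy functions, and the sketch does not explain why only some names were affected.

```python
import re


def read_label(token: str) -> str:
    """Decode one Newick label token using the common convention."""
    if len(token) >= 2 and token.startswith("'") and token.endswith("'"):
        # Quoted label: taken literally, '' stands for one single quote.
        return token[1:-1].replace("''", "'")
    # Unquoted label: underscores stand for blanks.
    return token.replace("_", " ")


def write_label(label: str) -> str:
    """Encode a label, quoting it if it contains blanks or Newick punctuation."""
    if re.search(r"[\s()\[\]':;,]", label):
        return "'" + label.replace("'", "''") + "'"
    return label


name = "RKI-ZBS2-CH129_TACAGC_L002"

# Unquoted on input: underscores become blanks, so the writer has to quote.
round_tripped = write_label(read_label(name))
print(round_tripped)  # 'RKI-ZBS2-CH129 TACAGC L002'

# Quoted on input: underscores survive the round trip.
preserved = write_label(read_label("'" + name + "'"))
print(preserved)  # RKI-ZBS2-CH129_TACAGC_L002
```

Under this convention, a tool that wants labels to survive unchanged has to either quote them on input or ask the parser to keep underscores as they are.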
The other output files (.csv and .dot) have the correct spelling, even though in some of them the name is also edited to drop the .contigs_spades suffix:
in the 950Vc_perplexity20.0_accessory_tsne.dot file:
in the 950Vc_grapetree_clusters.csv and 950Vc_microreact_clusters.csv files:
in the 950Vc_clusters.csv file:
So the difference between 950Vc_core_NJ.nwk and 950Vc_microreact_clusters.csv leads to a bug when uploading to Microreact.
Weirdly, there are many other genomes that have underscores in their names, but none of the others have been replaced. Is this due to some peculiarity of that name that gets wrongly parsed when stripping the name tail? (It's true it's got dashes, underscores and dots.)
In this case it's only one name to correct, so it's easy to deal with, but I've had it before where many more names were missing or edited, which properly prevented me from enjoying the Microreact viz.
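A mismatch like this can be caught before uploading by comparing the tip labels in the Newick file with the identifiers in the clusters CSV. The following is a minimal sketch using only the standard library. It assumes the CSV has an identifier column called `id`, and it uses a simple regular expression rather than a full Newick parser. The function names, the sample tree and the `Cluster` column are made up for illustration.

```python
import csv
import io
import re

# A tip label follows "(" or "," and is either quoted or a run of plain characters.
LEAF = re.compile(r"[(,]\s*('(?:[^']|'')*'|[^\s(),:;']+)")


def leaf_labels(newick: str) -> set:
    """Return the tip labels as written in the tree, with quotes stripped."""
    labels = set()
    for token in LEAF.findall(newick):
        if token.startswith("'"):
            token = token[1:-1].replace("''", "'")
        labels.add(token)
    return labels


def mismatches(newick: str, csv_text: str, id_column: str = "id"):
    """Return (labels only in the tree, ids only in the CSV), both sorted."""
    ids = {row[id_column] for row in csv.DictReader(io.StringIO(csv_text))}
    tips = leaf_labels(newick)
    return sorted(tips - ids), sorted(ids - tips)


def suggest_renames(only_in_tree, only_in_csv):
    """Pair tree labels with CSV ids that differ only by '_' versus ' '."""
    by_spaced = {name.replace("_", " "): name for name in only_in_csv}
    return {tip: by_spaced[tip] for tip in only_in_tree if tip in by_spaced}


# Made-up example data reproducing the reported symptom.
newick = "(('RKI-ZBS2-CH129 TACAGC L002':0.1,sample_B:0.2):0.05,sample_C:0.3);"
csv_text = "id,Cluster\nRKI-ZBS2-CH129_TACAGC_L002,1\nsample_B,1\nsample_C,2\n"

only_in_tree, only_in_csv = mismatches(newick, csv_text)
renames = suggest_renames(only_in_tree, only_in_csv)
print(only_in_tree)
print(only_in_csv)
print(renames)
```

Empty lists mean the two files agree. Otherwise the suggested renames show which tree labels could be mapped back to their CSV ids by hand.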
Best,
Florent