Handling of ambiguous amino acids #346
Comments
Hello Donovan! Is J an ambiguous base? I haven't heard of it: http://www.bioinformatics.org/sms/iupac.html
Guess it depends on who you ask: Also, it appears in a non-trivial number of genomes from GenBank. :)
Hey @dparks1134 -- Thanks for your patience with this. I have implemented this, and as a diagnostic I have the following table:
This is how I interpret the ambiguity codes. Does that look right to you? I should also note that, as you can see in be77c7d, for some reason I was previously interpreting B as a synonym for N and Z as a synonym for Q. So making this change will make a minor difference to some folks' analyses in which those letters appear. Could you pull the new branch and give it a spin?
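For readers following along without the table: below is a minimal Python sketch of the standard IUPAC reading of the amino acid ambiguity codes (B = D or N, Z = E or Q, J = I or L, X = any residue). It is not pplacer's code (pplacer is written in OCaml), and the names `AMINO_ACIDS`, `AMBIGUITY`, and `likelihood_vector` are hypothetical, chosen only for illustration. The idea is that an ambiguous letter gets likelihood 1 for every residue it could stand for, rather than being collapsed to a single residue.

```python
# Illustrative sketch only; not taken from the pplacer source.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Standard IUPAC amino acid ambiguity codes and the residues they cover.
AMBIGUITY = {
    "B": "DN",          # aspartate or asparagine
    "Z": "EQ",          # glutamate or glutamine
    "J": "IL",          # isoleucine or leucine
    "X": AMINO_ACIDS,   # any residue
}


def likelihood_vector(code):
    """Return a 0/1 vector over AMINO_ACIDS for a single-letter code."""
    code = code.upper()
    if code in AMBIGUITY:
        allowed = AMBIGUITY[code]
    elif code in AMINO_ACIDS:
        allowed = code
    else:
        raise ValueError(f"unrecognized amino acid code: {code!r}")
    # 1.0 for every residue consistent with the observed letter.
    return [1.0 if aa in allowed else 0.0 for aa in AMINO_ACIDS]
```

Under this reading, treating B as a synonym for N amounts to setting only the N entry, whereas the ambiguity interpretation sets both the D and N entries.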
Thanks! Table looks good to me. I'm a bit swamped at the moment, but should be able to give this a spin in the next few weeks.
No worries! Just let me know if it looks good and I'll merge.
Can you send me the binaries for this new release? We don't have a build environment for pplacer.
We compiled the latest code and it appears to work great. I'd say make it official!
https://github.com/matsen/pplacer/releases/tag/v1.1.alpha18 <- here's the new release.
pplacer currently does not handle ambiguous bases. I appreciate that from an ML perspective fully handling such characters is challenging. However, I am wondering if ambiguous bases can simply be treated as unknowns and a warning generated. This would seem preferable to raising an exception that prevents such sequences from being inserted into a tree:
Such situations are extremely problematic when processing large data sets where quality control over the input sequences can be challenging.
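To make the request concrete, here is a rough Python sketch of the "treat as unknown and warn" behaviour, written as a preprocessing step rather than as anything inside pplacer. The function name `sanitize`, the `KNOWN` set, and the choice of `X` as the unknown symbol are assumptions for illustration only; pplacer's actual handling may differ.

```python
import warnings

# Assumption: the 20 standard residues, the unknown symbol X, and the gap
# character are accepted as-is; everything else is treated as unknown.
KNOWN = set("ACDEFGHIKLMNPQRSTVWY") | {"X", "-"}


def sanitize(name, seq, unknown="X"):
    """Replace unrecognized characters with `unknown` and emit one warning."""
    cleaned = []
    replaced = set()
    for ch in seq.upper():
        if ch in KNOWN:
            cleaned.append(ch)
        else:
            replaced.add(ch)
            cleaned.append(unknown)
    if replaced:
        # Warn instead of raising, so the sequence can still be placed.
        warnings.warn(
            f"{name}: treating characters {sorted(replaced)} as {unknown!r}"
        )
    return "".join(cleaned)
```

For example, `sanitize("seq1", "MKJB-A")` returns `"MKXX-A"` and issues a single warning naming `B` and `J`, instead of rejecting the sequence outright.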