improvements to contexts #134
Comments
In theory we only need one context for each nomenclatural code. From a UI Maybe it would be more useful to allow an arbitrary taxon to be used as a |
I agree with the idea that every higher taxon in OTT should be usable as a context.
But those would all be implementation details, and not visible to (or confusing to) the client. |
While I understand the benefit of tolerating synonyms, it actually seems much cleaner to me to require 2 calls:
|
Well, besides name disambiguation, the other advantage of contexts is that they limit the search space, so they are faster and provide better fuzzy matches. For example, "Felis domestica" (an invalid name for the housecat) is a close fuzzy match to "Malus domestica" (apple). I note that these are already separated by existing contexts, but the example at least illustrates the advantage of a more limited scope for fuzzy matching (and I will reiterate: the speed improvements for fuzzy matching could be significant). As for using any arbitrary taxon as a context, it is theoretically possible to do this currently, but it would require quadratic space and runtime to store and build the indexes: each one includes entries for all the descendants of the specified taxon. That seems prohibitive. Mark's ideas seem promising. In the meantime, adding a handful more contexts at shallow levels in the taxonomy could be helpful and would require almost no effort and only a moderate amount of disk space, but I'm not sure how many to add or which taxa to use. Maybe @uyedaj could provide some thoughts. |
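To make the fuzzy-matching point concrete, here is a minimal sketch of how restricting the candidate set changes the result. It is not how the TNRS is implemented: the toy `CONTEXT_INDEX`, the `fuzzy_match` helper, and the 0.6 cutoff are all hypothetical, and the similarity measure is Python's standard-library `difflib`, chosen only for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy data: context name -> taxon names indexed under it.
CONTEXT_INDEX = {
    "Animals": ["Felis catus", "Felis silvestris", "Canis lupus"],
    "Land plants": ["Malus domestica", "Prunus domestica", "Pyrus communis"],
}


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_match(query, context=None, cutoff=0.6):
    """Return (name, score) pairs at or above the cutoff, best first.

    With context=None every indexed name is a candidate; otherwise only
    the names in the given context are scored.
    """
    if context is None:
        candidates = [n for names in CONTEXT_INDEX.values() for n in names]
    else:
        candidates = CONTEXT_INDEX[context]
    scored = [(name, similarity(query, name)) for name in candidates]
    hits = [(name, score) for name, score in scored if score >= cutoff]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(fuzzy_match("Felis domestica"))
    print(fuzzy_match("Felis domestica", context="Animals"))
```

With this toy data the unrestricted search ranks "Malus domestica" first, while the search limited to the "Animals" context never scores it at all and also compares against half as many names.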
It's not quadratic, it's n log n. But I agree it's probably too big given
the current prices for AWS instances.
Awaiting examples and/or criteria. They're not hard to add.
|
I don't have specific examples... I guess recently I was working with a cephalopod and an elasmobranch phylogeny. I was hoping that you could turn any higher taxon into a context, so that the user could just query whatever name they wanted (e.g. sharks, selachii, selachimorpha), get the ottid, and then use it as a context for querying TNRS. Failing that, the standard textbook list of named animal clades would be useful. Some of these are already available, but others are not, e.g.: Porifera, Ctenophora, Rotifera, Onychophora, Echinodermata, Brachiopoda, Bilateria, Lophotrochozoa, Ecdysozoa, Protostomes, Deuterostomes. Within larger groups, it would be useful to have things like: This is by no means exhaustive. As Cody said, my main issue is not disambiguation but speed. Even querying OpenTree for ottids when the names are exact matches is slower than I would like it to be for large trees. |
It depends on the shape of the tree, right? If the tree were fully Mark's suggestion certainly seems more space efficient (and time efficient |
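The dependence on tree shape can be checked with a small calculation. The sketch below assumes that every taxon gets its own index containing itself and all of its descendants, so the total number of index entries is the sum of all subtree sizes. The helper names (`total_index_entries`, `chain`, `balanced_binary`) are made up for illustration, and the real OTT taxonomy is neither of these two extreme shapes.

```python
def total_index_entries(children, root):
    """Sum of subtree sizes over all nodes (each node counts itself).

    `children` maps a node to a list of its child nodes; leaves may be
    absent from the mapping. Uses an explicit stack so that very deep
    trees do not hit Python's recursion limit.
    """
    sizes = {}
    total = 0
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        kids = children.get(node, [])
        if expanded:
            # All children are done: this node's subtree size is known.
            size = 1 + sum(sizes[k] for k in kids)
            sizes[node] = size
            total += size
        else:
            stack.append((node, True))
            for k in kids:
                stack.append((k, False))
    return total


def chain(n):
    """Fully unbalanced 'ladder' of n nodes: 0 -> 1 -> ... -> n-1."""
    return {i: [i + 1] for i in range(n - 1)}


def balanced_binary(depth):
    """Perfect binary tree with 2**depth - 1 nodes, numbered heap-style from 1."""
    n = 2 ** depth - 1
    return {i: [c for c in (2 * i, 2 * i + 1) if c <= n] for i in range(1, n + 1)}


if __name__ == "__main__":
    print(total_index_entries(chain(1000), 0))           # n(n+1)/2 growth
    print(total_index_entries(balanced_binary(10), 1))   # roughly n log n growth
```

Under these assumptions a 1000-node chain needs 500500 entries (quadratic in n), while a perfect binary tree with 1023 nodes needs 9217 (on the order of n log n), which is why both estimates in this thread can be right for different tree shapes.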
Some feature requests from @uyedaj: