Allow annotation to gene product IDs not in neo #90

Open
cmungall opened this issue Jan 14, 2019 · 4 comments

Comments

@cmungall
Member

Context: https://docs.google.com/document/d/1RVlRNic37R3EQZfiNjn4R7Q6R3v_XSIg1DT7uQUwW-s/edit

Curators in the multi-organism group need to be able to annotate to species that may not be in neo

The GE allows pasting in of arbitrary IDs - we should make the form have similar functionality

@tmushayahama
Contributor

@cmungall @thomaspd can you write up the requirements? Some time ago we briefly discussed that a user would enter an ID for the GP and, if it is not in the autocomplete, Noctua Form would make an exception and accept it as is. However, the form would have no way of knowing whether the ID is correct, which is very error prone. Any thoughts?

@thomaspd

We want this to be restricted to UniProt identifiers, so the user needs to paste a valid UniProt ID, e.g. P12345. One way to do it would be this:

We could use the UniProt website API to do the request:
https://www.uniprot.org/uniprot/P12345.txt

If the service doesn't return a text entry (e.g. try the above URL with P123456 instead), the identifier is not valid and Noctua Form should pop up an error: "UniProt identifier P123456 not found". If the identifier is found, text will be returned, and you should parse the entry to get the lines starting with GN or OS and print those lines in a popup:
UniProt identifier P12345
GENE: (text from GN line)
ORGANISM: (text from OS line)
with a confirm button and a cancel button.
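
The check described in this comment could be sketched roughly as follows. This is only an illustration in Python, not Noctua Form code: the function names are hypothetical, the URL template is the one quoted above (the live endpoint may differ), and the accession pattern is the one UniProt documents for its accession numbers. A cheap syntax check runs first, so an obviously malformed ID never reaches the service.

```python
import re
import urllib.request
from typing import Optional

# URL pattern quoted in the comment above; treat it as an assumption.
UNIPROT_TXT_URL = "https://www.uniprot.org/uniprot/{accession}.txt"

# Accession pattern as documented by UniProt (6 or 10 characters).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_valid_accession(accession: str) -> bool:
    """Syntax check only; it does not prove that the entry exists."""
    return ACCESSION_RE.fullmatch(accession) is not None


def fetch_entry_text(accession: str,
                     url_template: str = UNIPROT_TXT_URL) -> Optional[str]:
    """Return the flat-file text for an accession, or None if the lookup fails."""
    url = url_template.format(accession=accession)
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8")
    except OSError:
        # URLError, HTTPError (e.g. a 404) and timeouts are all OSError subclasses.
        return None


def summarize_entry(accession: str, entry_text: Optional[str]) -> str:
    """Build the popup text: an error message, or the GN and OS lines."""
    if not entry_text:
        return f"UniProt identifier {accession} not found"
    lines = entry_text.splitlines()
    # Flat-file lines start with a two-letter code followed by three spaces.
    gene = [line[5:].strip() for line in lines if line.startswith("GN   ")]
    organism = [line[5:].strip() for line in lines if line.startswith("OS   ")]
    return "\n".join([
        f"UniProt identifier {accession}",
        f"GENE: {' '.join(gene)}",
        f"ORGANISM: {' '.join(organism)}",
    ])
```

A caller would run `is_valid_accession` first, then pass the result of `fetch_entry_text` to `summarize_entry` and show the returned text in the confirm/cancel popup. Keeping the parsing separate from the HTTP call means the parsing can be tested without network access.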

@cmungall
Member Author

Sorry, I didn't see @tmushayahama's request. Let's hold off until Thursday's software call. We don't want different parts of the stack calling different services.

My suggestion was just to allow pasting of IDs unchecked to bring parity with the GE. If we want to prioritize having this work properly (which is massively important for anyone outside MOD/human) then let's discuss the approach and implement universally:

  1. Ingest in neo (PR ready to be tested: Load swissprot neo#35)
  2. Use uniprot web services
  3. Use mygene web services

Note that if we go with services, we should use proper services, not the 1980s Swiss-Prot format at https://www.uniprot.org/uniprot/P12345.txt! 😄

I think any one of these should be quick to implement, but each has a few implications. For 1, the size of neo increases. For 2 or 3, labels will be missing from the RDF store, with implications for the downstream components that use it; in fact, lookups would have to be implemented at various other points in the stack to stop unlabeled IDs showing up.
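
One way to keep different parts of the stack from calling different services is to put every label lookup behind a single resolver that tries its backends in a fixed order. The sketch below is purely illustrative: `make_resolver`, `neo_lookup` and `external_lookup` are hypothetical names, and the data is a made-up placeholder rather than real neo content.

```python
from typing import Callable, Dict, Iterable, Optional

# A lookup backend maps a gene product ID to a label, or None if unknown.
LabelLookup = Callable[[str], Optional[str]]


def make_resolver(backends: Iterable[LabelLookup]) -> LabelLookup:
    """Combine backends into one lookup that returns the first label found."""
    ordered = list(backends)

    def resolve(gp_id: str) -> Optional[str]:
        for lookup in ordered:
            label = lookup(gp_id)
            if label is not None:
                return label
        return None  # still unlabeled; the caller decides how to display it

    return resolve


# Placeholder standing in for labels already loaded into neo (option 1).
NEO_LABELS: Dict[str, str] = {"EXAMPLE:0001": "example gene product"}


def neo_lookup(gp_id: str) -> Optional[str]:
    return NEO_LABELS.get(gp_id)


def external_lookup(gp_id: str) -> Optional[str]:
    # Stand-in for a web-service call (option 2 or 3); always misses here.
    return None


resolve_label = make_resolver([neo_lookup, external_lookup])
```

With this shape, every component that needs a label calls `resolve_label`, so switching between options 1, 2 and 3 only changes the backend list.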

cc @lpalbou @kltm @deepakunni3 @vanaukenk @balhoff

@tmushayahama
Contributor

@cmungall @vanaukenk @thomaspd @lpalbou it was decided that this would not happen, right? All the GPs should be in neo? Any updates or changes? However, thinking about the workflow, there should be a mechanism for adding or requesting a new GP and annotating to it without a long wait or a break in the curator's annotation workflow (i.e. if a GP is missing in NF, they cannot save or continue).
