Convert chemicals to ChEBI rather than Reacto #221

Closed
ukemi opened this issue Dec 12, 2022 · 7 comments

@ukemi

ukemi commented Dec 12, 2022

This came up during the QC checks of the last release. The initial discussion is pasted below. Also see the discussion from:
#176 (comment) onward.

http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-1474151

Yes, I see in the BioPAX that input [PTHP](https://reactome.org/content/detail/R-ALL-1474179#Homo%20sapiens) has CHEBI:17804 xref'd to its entityReference (in the External Reference Information section). However, the conversion code currently doesn't look at xrefs from entityReference elements on a SmallMolecule object and instead just uses its Reactome ID. Same with output [sepiapterin](https://reactome.org/content/detail/R-ALL-1497811#Homo%20sapiens) and likely every other small molecule in Reactome GO-CAMs. We can open a ticket to change this behavior to always fetch the CHEBI if that is desired.
For the enabled_by (I think this is the real ShEx violation), sepiapterin synthase (R-HSA-9693721) is in the BioPAX as a PhysicalEntity. See its [entry](https://reactome.org/content/detail/R-HSA-9693721) at Reactome and notice it does not have a CHEBI cross reference. As a result, in reacto.owl, R-HSA-9693721 only has subClassOf continuant, which is not specific enough to be inferred as either InformationBiomacromolecule or ProteinContainingComplex.
PD comment: Agreed - the sepiapterin synthase (R-HSA-9693721) genome encoded entity has neither a UniProt reference link nor a crossReference to [ChEBI:36080](https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:36080) "protein". In contrast, for example, the [MHDB decarboxylase (R-HSA-2167848)](https://reactome.org/content/detail/R-HSA-2167848) does have a ChEBI:36080 crossReference, and its pathway [Ubiquinol biosynthesis](https://reactome.org/content/detail/R-HSA-2142789) (R-HSA-2142789) yields a GO-CAM with no ShEx error. Now patched in Reactome for the ver 83 release. Bottom line: a Reactome curation mistake that occurred after the previous clean-up of physical entities with no acceptable link to UniProt or ChEBI. Again, something easy to flag and fix during the hypothetical future one-week clean-up period.
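
For illustration, the behavior change proposed above (prefer the ChEBI xref on a small molecule's entityReference, fall back to the Reactome ID) might look roughly like the sketch below. This is not the actual conversion code: the `Xref` and `SmallMolecule` classes, the function names, and the `REACTO:` fallback form are all hypothetical stand-ins for whatever the converter really uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Xref:
    """Hypothetical stand-in for one cross reference on an entityReference."""
    db: str  # database name, e.g. "ChEBI"
    id: str  # identifier, with or without the "CHEBI:" prefix


@dataclass
class SmallMolecule:
    """Hypothetical stand-in for a BioPAX SmallMolecule from Reactome."""
    reactome_id: str
    entity_reference_xrefs: List[Xref] = field(default_factory=list)


def chebi_curie(mol: SmallMolecule) -> Optional[str]:
    """Return the first ChEBI xref as a CURIE, or None if there is none."""
    for xref in mol.entity_reference_xrefs:
        if xref.db.strip().lower() == "chebi":
            local_id = xref.id.strip()
            # Normalize "CHEBI:17804" and "17804" to the same CURIE.
            if local_id.upper().startswith("CHEBI:"):
                local_id = local_id.split(":", 1)[1]
            return f"CHEBI:{local_id}"
    return None


def identifier_for(mol: SmallMolecule) -> str:
    """Prefer ChEBI; fall back to an assumed REACTO-style identifier."""
    return chebi_curie(mol) or f"REACTO:{mol.reactome_id}"


# PTHP carries CHEBI:17804 on its entityReference, per the comment above.
pthp = SmallMolecule("R-ALL-1474179", [Xref("ChEBI", "CHEBI:17804")])
no_xref = SmallMolecule("R-HSA-9693721")
print(identifier_for(pthp))
print(identifier_for(no_xref))
```

Entities that come back with the fallback identifier would be the ones to flag for curation at Reactome.
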
@deustp01
Collaborator

N.b. Once we are sure that all chemicals have ChEBI IDs, we need to clean up REACTO to remove the chemical IDs.
We also need a QA check and a way to automatically convert items like David Hill's manual mouse GO-CAMs that have REACTO chemicals in them.
Changes to ShEx are needed to fix this.
Long-term goal - retire REACTO. Short-term process - retire parts of it as possible. Chemicals here, proteoforms soon.
Link to Ben's global ticket
https://github.com/geneontology/noctua-models-migrations/issues

@ukemi
Author

ukemi commented Mar 10, 2023

Separate from the missing (unresolving) ChEBI identifiers that were already spotted, I need a way to check the integrity of the identifiers that are resolving. The best method that I can think of is to open the model in the graph editor, output the GPAD, cross-check the label in the graph editor against the ChEBI identifier in the GPAD, and then cross-check those against ChEBI. I will also check them against the cross references in Reactome. This is a labor-intensive manual process, but I think it is necessary to ensure that things happened correctly. I will start a spreadsheet and link it to this ticket. I'm not sure how many I will check, but I will look at several different pathways and several different kinds of reactions.

@nataled
Collaborator

nataled commented Mar 10, 2023

@ukemi that shouldn't have to be done manually. I can probably whip up a way to check automatically once given the GPAD. This is not to stop any manual work that could proceed while the automated check is in progress (that's my usual procedure anyway).
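
As a rough idea of what such an automated check could look like (a sketch only, not the script that was actually written): pull every ChEBI CURIE out of the GPAD and report the ones that are absent from a ChEBI ID-to-label table. The function names are made up, the label table is assumed to be supplied by the caller, and the regex deliberately scans whole lines so that no particular GPAD column layout is assumed.

```python
import re
from typing import Dict, Iterable, List, Set

# Matches CURIEs such as CHEBI:17804 anywhere in a line.
CHEBI_PATTERN = re.compile(r"\bCHEBI:\d+\b")


def extract_chebi_ids(gpad_lines: Iterable[str]) -> Set[str]:
    """Collect every ChEBI CURIE found in non-comment GPAD lines."""
    ids: Set[str] = set()
    for line in gpad_lines:
        if line.startswith("!"):  # header and comment lines
            continue
        ids.update(CHEBI_PATTERN.findall(line))
    return ids


def ids_missing_from_chebi(ids: Iterable[str],
                           chebi_labels: Dict[str, str]) -> List[str]:
    """Return, sorted, the IDs that the supplied ChEBI table does not know."""
    return sorted(i for i in ids if i not in chebi_labels)


# Made-up example lines; only the CHEBI ID comes from this ticket.
sample = [
    "!gpa-version: 1.1",
    "Reactome\tR-HSA-0000001\tenables\tGO:0000001\t\t\t\t\t\t\thas_input(CHEBI:17804)\t",
    "! CHEBI:99999 mentioned in a comment line is ignored",
]
found = extract_chebi_ids(sample)
print(found)
print(ids_missing_from_chebi(found, {}))
```
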

@ukemi
Author

ukemi commented Mar 10, 2023

That would be awesome @nataled! @dustine32 do you know if there are products generated from the development server? If so, is there a GPAD that @nataled could use? Even if it is a mega-file, it would be straightforward to filter on annotations from Reactome models. We might actually want to put something like this in place beyond just for this project.
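
Filtering a mega-file down to Reactome-derived annotations could be as simple as the sketch below. It assumes, without confirmation from this thread, that each annotation line carries its model ID somewhere (for example in an annotation-properties column) and that Reactome-derived models have IDs shaped like `gomodel:R-HSA-1474151`. The function name and the sample lines are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Model IDs shaped like gomodel:R-HSA-1474151 (assumed naming pattern).
REACTOME_MODEL = re.compile(r"gomodel:R-[A-Z]{3}-\d+")


def reactome_annotations(gpad_lines: Iterable[str]) -> Iterator[str]:
    """Yield only the annotation lines that mention a Reactome-style model ID."""
    for line in gpad_lines:
        if line.startswith("!"):  # skip header and comment lines
            continue
        if REACTOME_MODEL.search(line):
            yield line


# Made-up lines for illustration.
mega_file = [
    "!gpa-version: 1.1",
    "line A\tnoctua-model-id=gomodel:R-HSA-1474151",
    "line B\tnoctua-model-id=gomodel:0000000000000000",
]
kept = list(reactome_annotations(mega_file))
print(kept)
```
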

@deustp01
Collaborator

deustp01 commented Mar 10, 2023

> I need a way to check the integrity of the identifiers that are resolving.

Item for Monday "weeds" - what exactly are the integrity problems (wrong charge states of ionizable compounds? other?)? In principle, this is really a Reactome curation integrity issue: we should only be using correct ChEBI instances in the first place, so the follow-on question is how to change Reactome curation and QA practice to fix them at the source. And, as suggested on Wednesday, get rid of ChEBI terms used to identify polynucleotides where SO terms would work. And, probably, also identify classes of ChEBI instances that Reactome needs to annotate weird cases - perhaps we really need those electrons and photons - to add to Jim's list of ChEBI terms legal for GO-CAM.

Your spreadsheet will be a good resource for starting to sort this out.

@ukemi
Author

ukemi commented Mar 10, 2023

It's way simpler than that. In my own worrisome way, I just want to make sure that the process worked. That is, when I see a chemical in a model, it is the chemical that was in the original Reactome pathway, and the label on the chemical is correct through the chain of Reactome ID -> ChEBI ID in Reactome -> ChEBI ID in GO-CAM -> GO-CAM graph label.
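
The chain described here can be written down as a per-chemical check, sketched below under the assumption that the four values have already been gathered into one row per chemical (as in a spreadsheet). The `ChainRow` fields and the `chain_problems` function are hypothetical names, and the labels in the example are placeholders rather than real ChEBI names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChainRow:
    """One chemical, with the values seen at each step of the chain."""
    reactome_id: str        # e.g. "R-ALL-1474179"
    chebi_in_reactome: str  # ChEBI cross reference recorded in Reactome
    chebi_in_gocam: str     # ChEBI ID that ended up in the GO-CAM / GPAD
    gocam_label: str        # label shown in the graph editor
    chebi_label: str        # label ChEBI itself gives for chebi_in_gocam


def chain_problems(row: ChainRow) -> List[str]:
    """Return human-readable descriptions of any break in the chain."""
    problems: List[str] = []
    if row.chebi_in_reactome != row.chebi_in_gocam:
        problems.append(
            f"{row.reactome_id}: Reactome has {row.chebi_in_reactome}, "
            f"GO-CAM has {row.chebi_in_gocam}"
        )
    # Compare labels loosely so case and stray whitespace are not flagged.
    if row.gocam_label.strip().lower() != row.chebi_label.strip().lower():
        problems.append(
            f"{row.reactome_id}: graph label '{row.gocam_label}' "
            f"differs from ChEBI label '{row.chebi_label}'"
        )
    return problems


# Placeholder labels; only the IDs are taken from this ticket.
ok_row = ChainRow("R-ALL-1474179", "CHEBI:17804", "CHEBI:17804",
                  "Placeholder label", "placeholder label")
bad_row = ChainRow("R-ALL-1474179", "CHEBI:17804", "CHEBI:00000",
                   "placeholder label", "another label")
print(chain_problems(ok_row))
print(chain_problems(bad_row))
```

An empty list for every row would mean the transfer held up for the chemicals sampled.
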

@ukemi
Author

ukemi commented Mar 13, 2023

I did a bit of this today and I am convinced that the integrity of the information being transferred is intact:
https://docs.google.com/spreadsheets/d/1-NxsN6eVxxWuAGuH9tGX0W2FMi3mc90AodOOxaOP7sY/edit#gid=0
I looked at the ones in the spreadsheet in detail, but also checked other reactions in the pathways.
