Remove (don't load) models that are a single node #202
Comments
Here's a weedy suggestion for how to proceed. Agreed that a pathway with no content except subpathways yields a GO-CAM with no informative content, but before simply discarding these it would be prudent to get a list of all the single-node pathways for manual inspection to confirm that nothing is lost. I expect that everything on the list will be OK for removal. Even where a curator has made one of these as a placeholder and plans to fill in individual reaction children along with the pathway children, when and if that happens the pathway will then pass the rule proposed here and will get loaded OK. And a naive question: is the Reactome event hierarchy somehow preserved in the exported GO-CAM structure? I guess that it is not, and in that case these empty grouping pathways do not have a useful linking role. |
This is a great question that has now set me thinking. The Reactome hierarchy is not preserved because of the inability to discriminate is_a and part_of (another thing that I think we could brainstorm about at a face-2-face; I think you had some good ideas about this). However, let's say that there is a Reactome pathway that has no reactions, but only pathways as children. If the parent pathway has an asserted GO BP term mapped and none of the children do, it would be safe to put the parent pathway on generic children. It doesn't matter if the child is a subclass or a part of the parent because we won't represent that. The parent BP will just go to the new top node of the model. I'm not sure how many of these exist, but I think I've seen some. To follow up, Peter sent an e-mail to Guanming:
who replied:
to which Peter replied:
Guanming:
Peter:
Guanming:
|
Now deferred from things to do in connection with GO-CAM build from Reactome 82 - make this a headache for another day |
I'm trying to knock this ticket out as part of getting the comprehensive list of "done" Reactome GO-CAMs with no molecular event placeholder activities. When applying the criteria "contains no molecular events," I kept seeing these empty parent pathway models containing only a single BP node in the results. Simply blocking the write-out of these model files would clean up these results. So I applied this filter and did a before-after comparison to get the list of models (351 total) that would be removed, attached below: @ukemi @deustp01 Please take a look and let me know if you catch anything wrong with this list. Thanks! |
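The before-after comparison described in this comment amounts to a set difference over model IDs. A minimal sketch, assuming models are written one file per pathway and named by pathway ID; the `model_ids` helper, the `.ttl` suffix, and the `EXAMPLE-*` IDs are illustrative assumptions, not the actual build's layout:

```python
from pathlib import Path


def model_ids(directory, suffix=".ttl"):
    """Collect model IDs from file names in a directory (the suffix is an assumption)."""
    return {p.stem for p in Path(directory).glob(f"*{suffix}")}


def removed_models(before_ids, after_ids):
    """Return, sorted, the IDs present before filtering but missing afterwards."""
    return sorted(set(before_ids) - set(after_ids))


# Illustrative IDs only; R-HSA-71291 is the example pathway named in this issue.
before = {"R-HSA-71291", "EXAMPLE-1", "EXAMPLE-2"}
after = {"EXAMPLE-1", "EXAMPLE-2"}
print(removed_models(before, after))
```

With real output, something like `removed_models(model_ids("before_dir"), model_ids("after_dir"))` would produce the list for review; the directory names are placeholders.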
To track checking, made a Google Doc, "Getting rid of models that are a single node" in the "Getting rid of molecular events" folder. |
I've now checked the list. Bottom line - 350 pathways should be blocked as Dustin proposes and one should be edited out of existence in Reactome. Weedy details - Most of the pathways on the "Getting rid of models that are a single node" list have only one or more other pathways as children (indicated with a simple “no” in column 2 in the list / table). While they provide useful grouping information for a future GO-CAM structure that shows causal relationships among pathways, blocking their write-out now is correct. A subset of these have only a single pathway as a child (“no” in column 2, “has only one pathway child” in column 3). Dustin should block these just like the simple-no ones. They are flagged because in Reactome, as in GO, it does not make much sense for a higher-level grouping term to have only a single child, so a useful side-effect of this review of candidates for blocking is a list of pathways in Reactome that are candidates for rearrangement / merging to eliminate unneeded steps in the Reactome pathway hierarchy. One pathway, R-HSA-1630316, contains a single reaction child, but this can be fixed on the Reactome side by moving the reaction to a different pathway, the one to which it actually belongs. A number of pathways are composed of multiple drug (one of the participants is flagged as a drug) or stealth-drug (one of the participants is a set all of whose members are drugs) reactions. The GO-CAM script correctly suppresses the generation of activity units from these reactions, resulting in an empty pathway. These too should be blocked at the write-out stage. I’ve flagged them separately (“YES” in column 2 and drug verbiage in column 3) because I guess that Dustin may want to reinforce the say-no-to-drugs tests to filter them out more elegantly? |
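The categories used in this review can be approximated programmatically. A rough sketch, assuming a hypothetical in-memory view of a pathway; the `Pathway` and `Reaction` classes, the `is_drug` flag, and the label strings are illustrative simplifications, not Reactome's data model or the GO-CAM script's tests:

```python
from dataclasses import dataclass, field


@dataclass
class Reaction:
    # Assumption: True for both drug and "stealth-drug" reactions.
    is_drug: bool = False


@dataclass
class Pathway:
    pathway_id: str
    child_pathways: list = field(default_factory=list)
    reactions: list = field(default_factory=list)


def classify(pathway):
    """Assign a single-node candidate to one of the review categories."""
    if pathway.reactions:
        if all(r.is_drug for r in pathway.reactions):
            return "drug reactions only"
        return "other (needs manual review)"
    if len(pathway.child_pathways) == 1:
        return "has only one pathway child"
    if pathway.child_pathways:
        return "pathway children only"
    return "no children"


# Hypothetical examples, one per category of interest.
print(classify(Pathway("EXAMPLE-1", child_pathways=["A", "B"])))
print(classify(Pathway("EXAMPLE-2", child_pathways=["A"])))
print(classify(Pathway("EXAMPLE-3", reactions=[Reaction(True), Reaction(True)])))
```

A pass like this would only pre-sort the list; the manual check described above is still what confirms that nothing is lost.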
This is indeed true. One task to make use of this in GO-CAM would be to try to figure out a method for determining if the child pathways are is_a or part_of children of the grouping pathway. Right now it is a mix. |
While we've been aware of the distinction, we never had to deal with it within the Reactome event hierarchy, so now there's a big legacy clean-up problem. My hunch is that it would be straightforward for a human curator to make the classification correctly, but tedious. @dustine32 do you see any hope here for a script that could sort out the two classes of pathway reliably, and identify pathways that have both is_a and part_of children, because I expect we have some - there's nothing to prohibit them? |
Sorry @ukemi @deustp01, I just noticed this request as I came to announce I was ready to merge the single-node pathway fix code. For the two aspects of this request:
Also, I think this still means I'm good to start merging the single-node pathway filtering code into |
This might be something worth looking at in New York. I have looked before and discriminating between is_a and part_of was not obvious. However, one thing I didn't do was look at preceding reaction relationships. Another possibility is to look at the asserted BPs that are on the pathways and interrogate the ontology for relationships, but I'm not sure this will work. I think there are a lot of 'partial' pathways that are asserted to be the pathway. We might want to have a look at some of these in NYC. They are not asserted, so you won't see them in the BioPax. |
…hwys Skip model writeout if no events or functions; for #202
In some cases, a Reactome pathway doesn't have any reactions that are directly associated with it. Instead it has a collection of subpathways under it. In those cases, the parent gets imported as a single node with nothing else associated with it. We should not load these.
e.g., R-HSA-71291
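A minimal sketch of the rule proposed here, assuming a hypothetical in-memory model representation (the `Model` class, its fields, and the `EXAMPLE-POPULATED` ID are illustrative, not the converter's actual types): a model is written out only if it holds at least one event or function in addition to the pathway's BP node.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical stand-in for a generated GO-CAM model."""
    pathway_id: str
    bp_nodes: list = field(default_factory=list)    # biological process nodes
    activities: list = field(default_factory=list)  # molecular events / functions


def should_write(model):
    """Return False for models with no events or functions, e.g. a lone BP node."""
    return len(model.activities) > 0


def models_to_write(models):
    """Keep only the models that pass the rule."""
    return [m for m in models if should_write(m)]


# A grouping pathway with only a BP node, and a placeholder populated model.
grouping = Model("R-HSA-71291", bp_nodes=["pathway BP"])
populated = Model("EXAMPLE-POPULATED", bp_nodes=["pathway BP"], activities=["activity"])
kept = models_to_write([grouping, populated])
print([m.pathway_id for m in kept])
```

Because the check is made at write-out time, a grouping pathway that later gains reaction children would pass the rule without any further change.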