Wang's semantic similarity method #183
Comments
This is a fantastic idea. Let me look into it. Do you have additional information? We are also looking into implementing Yang's add-on, which uses the terms below the terms of interest. We have alpha code that is looking good, but right now we are too busy to add it because of the surrounding tests and notebooks that would also need to be added. Please excuse the delay in my response; I have been busy finishing a publication and thesis. GREAT idea about Wang's semantic similarity. I will check it out. |
That sounds good! Yang's semantic similarity method sounds good, since I'm especially looking for methods that are not IC-based. Wang's semantic similarity is described nicely in the documentation of the R package GOSemSim (https://www.bioconductor.org/packages/release/bioc/vignettes/GOSemSim/inst/doc/GOSemSim.html#wang-method) and in the original publication (https://doi.org/10.1093/bioinformatics/btm087). Thanks! |
Hello! I've got the first cut of Wang's semantic similarity. It is not yet ready for prime time, but will be soon. The current test passes on the data in Wang's GODag in Fig 1, with the expected results taken from Wang's Table 1 for the S-values and from Wang's section 2.1 for the semantic similarity value. I still need to add functionality for using alternate GO IDs and plan to add a special plotting class for pairs of GO IDs, which will be useful to researchers and will help us in debugging tests. So.... The effort to add Wang's semantic similarity is well underway. Thank you so much for opening this issue. What a great idea. |
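For readers following along, here is a minimal sketch of the S-value and similarity calculation as described in Wang's paper. It is not the GOATOOLS implementation. The toy DAG, the term IDs (T:root, T:a, ...), and the function names s_values and wang_sim are made up for illustration; the edge weights 0.8 for is_a and 0.6 for part_of are the ones proposed in the paper.

```python
# Hypothetical toy DAG (not real GO terms): child -> [(parent, relationship)]
PARENTS = {
    "T:root": [],
    "T:a": [("T:root", "is_a")],
    "T:b": [("T:root", "is_a")],
    "T:c": [("T:a", "is_a"), ("T:b", "part_of")],
    "T:d": [("T:a", "part_of")],
}

# Semantic contribution factors from Wang's paper
WEIGHTS = {"is_a": 0.8, "part_of": 0.6}


def s_values(term, parents=PARENTS, weights=WEIGHTS):
    """Return {t: S-value of t with respect to term}, for term and its ancestors."""
    svals = {term: 1.0}  # a term contributes fully to its own semantics
    stack = [term]
    while stack:
        child = stack.pop()
        for parent, rel in parents[child]:
            weight = weights.get(rel)
            if weight is None:
                continue  # relationship not requested, do not traverse it
            candidate = weight * svals[child]
            # Keep the maximum contribution over all paths to this ancestor
            if candidate > svals.get(parent, 0.0):
                svals[parent] = candidate
                stack.append(parent)
    return svals


def wang_sim(term_a, term_b):
    """Wang similarity: shared S-values divided by the total S-values."""
    sv_a, sv_b = s_values(term_a), s_values(term_b)
    common = sv_a.keys() & sv_b.keys()
    shared = sum(sv_a[t] + sv_b[t] for t in common)
    return shared / (sum(sv_a.values()) + sum(sv_b.values()))


if __name__ == "__main__":
    print(s_values("T:c"))
    print(wang_sim("T:c", "T:d"))  # about 0.492 on this toy DAG
```

Because every ancestor gets the maximum contribution over all paths from the term, a parent's S-value can be raised after it was first visited, which is why the sketch pushes the parent back on the stack whenever its value improves.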
That's awesome, thank you so much! I'm excited to try it on my data. |
This is incredible, as I was in need of such a tool just a few days ago. Thanks for all your work! I'd be interested in beta testing. |
I also saw this repo, which hasn't been updated or maintained in about 2 years, but I've been using this so far. |
Also, there is an updated version of Wang's original similarity metric, described here: https://pubmed.ncbi.nlm.nih.gov/26356015/ It addresses two issues with the original score: (1) the need for empirical weights, and (2) the computational cost of many pairwise term scores. |
@ejmolinelli , Thanks so much for the link. I'll take a look at it. |
I have been testing the new GOATOOLS Wang semantic similarity by comparing our values to the values generated by pygosemsim. I would have liked to compare the Wang values to Bioconductor's GOSemSim, but was not sure how to get the go-basic.obo that they used. If anybody knows how to do this, I will test our Wang values against theirs. Our speed is a bit faster than pygosemsim overall. I believe there is a mistake in pygosemsim: The only way to match GOATOOLS Wang values to the pygosemsim Wang values was to:
This is NOT the same as only using the part_of relationships to get the ancestors and then using the same edge_weights as above with the regulates relationships set to 0.0. The correct way is to get ancestors by traversing up only the relationships that are specifically requested by the user, as is done in GOATOOLS's get_go2ancestors, not by traversing up all relationships and then zeroing out the edge weights for the Wang S-value calculations, as seems to happen in pygosemsim. I will be submitting Wang's semantic similarity calculated in GOATOOLS soon, with tests and documentation. |
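To make the traversal point concrete, here is a small sketch of collecting ancestors by walking up only the requested relationships. It is an illustration only and is not the code of GOATOOLS's get_go2ancestors or of pygosemsim; the toy DAG, the term IDs, and the function name ancestors_via are hypothetical, and always following is_a is an assumption of this sketch.

```python
# Hypothetical toy DAG (not real GO terms): child -> [(parent, relationship)]
PARENTS = {
    "T:x": [("T:p", "is_a"), ("T:r", "regulates")],
    "T:p": [("T:root", "is_a")],
    "T:r": [("T:q", "part_of")],
    "T:q": [("T:root", "is_a")],
    "T:root": [],
}


def ancestors_via(term, parents, relationships):
    """Return ancestors reachable through is_a plus the requested relationships."""
    allowed = {"is_a"} | set(relationships)  # assumption: is_a is always followed
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent, rel in parents.get(current, []):
            # Edges of other relationship types are never walked at all
            if rel in allowed and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors_via("T:x", PARENTS, {"part_of"})))
    # ['T:p', 'T:root']
    print(sorted(ancestors_via("T:x", PARENTS, {"part_of", "regulates"})))
    # ['T:p', 'T:q', 'T:r', 'T:root']
```

With only part_of requested, T:r and everything above it through the regulates edge never enter the ancestor set, so they cannot take part in any later S-value step. Traversing all relationships first would put them in the set regardless of the weights applied afterwards.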
Another note: In pygosemsim, the function round is used multiple times, which is troublesome... The troubles with Python's round function are reported in https://stackoverflow.com/questions/13479163/round-float-to-x-decimals. And in May 2020, mdickinson (https://github.com/mdickinson) wished for the deprecation of the two-argument form of round in Python here: micropython/micropython#3516 (comment) |
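A short illustration of the kind of surprise round can cause, and one hedged alternative for comparing similarity values in tests: compare with a tolerance instead of rounding both sides. The helper name same_value and the tolerance 1e-9 are assumptions chosen for this example, not something taken from GOATOOLS or pygosemsim.

```python
import math

# 2.675 cannot be represented exactly in binary floating point; the stored
# value is slightly below 2.675, so it rounds down rather than up.
print(round(2.675, 2))  # 2.67

# Exact ties are rounded to the nearest even integer, not always up.
print(round(0.5), round(1.5), round(2.5))  # 0 2 2


def same_value(x, y, abs_tol=1e-9):
    """Compare two floats with an absolute tolerance instead of rounding them."""
    return math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol)


if __name__ == "__main__":
    print(same_value(0.1 + 0.2, 0.3))  # True
```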
Thanks for implementing this and for finding the flaws in pygosemsim! For GOSemSim: do you need the go-basic.obo to ensure that you are using the same file and that your results are comparable? GOSemSim uses GO.db, another R package, for the graph structure (https://bioconductor.org/packages/release/data/annotation/html/GO.db.html). Therefore you don't need to provide a go-basic.obo for the actual computations of semantic similarity. You just need a species, but since Wang's method is not IC-based it shouldn't matter which one you use. |
Yes, I use the same go-basic.obo for both GOATOOLS and pygosemsim. I would need the version used in R's GO.db package, of which their documentation says this: "Mappings were based on data provided by: Gene Ontology http://current.geneontology.org/ontology/gobasic.obo With a date stamp from the source of: 2020-05-02". I looked for the 2020-05-02 go-basic.obo on the Gene Ontology website, but only found source files that are run through a program to generate a go-basic.obo file. I don't believe that I have access to that program. Another item to consider is that we don't know how R's GO.db stores the GO DAG, so we can't really know if we would be comparing the exact same data. So it looks like we will not be able to compare to Bioconductor's GOSemSim. That is a shame, because it is always useful to compare results. Regardless, the GOATOOLS implementation is working well. It matches the small amount of data in the Wang paper and compares well to pygosemsim, if in the test we use the two points mentioned in my comment above. I will submit it soon... |
#183 2. Changed code to work around new formats in Gene Ontology Consortium's annotations https://github.com/geneontology/go-annotation/issues/3373 geneontology/go-annotation#3523 3. Moved reldepth calculations into their own module to support Wang's method and to give researchers the ability to calculate reldepths with a subset of relationships geneontology/go-annotation#3523
I have implemented Wang's semantic similarity, documented it with examples in a Jupyter notebook, and added tests. Please give it a try and let us know what you think. I am using another method in my thesis, but think Wang might be a good step in "future work." Thank you again, @ejmolinelli, for the link to the update on Wang's semantic similarity. This also looks like a step in the right direction and I would like to implement it. |
Thank you so much! I'll give it a try. The link to the notebook is not correct but I found it anyway! :) |
How can I access these changes? They're not released yet and I also cannot access them via the development version. |
@tanghaibao , can you release a new version of GOATOOLS so that the newly implemented Wang's Semantic Similarity can be easily available to all? @ThHarbig , I also corrected the link in the comment. Thanks for giving us the heads up. |
Updated to |
I still cannot find reference "semsim" after upgrading to v1.0.12 |
@ThHarbig , Thank you so much for commenting so quickly. You are correct. The |
@tanghaibao, can you release a new version of GOATOOLS? I added the missing new semantic similarity directories to the setup.py file and added a new test to ensure that we won't have this problem again. @ThHarbig , thank you so much for taking your time to report this issue. We should not see it again with the addition of the new test. |
Updated to |
Hello. I am starting to work with goatools. Which Python version works best with this package? I am using PyCharm to begin with. Any suggestions? Thank you for the work! |
Hi,
I'm working on a web application which uses goatools functions in its backend and I'd like to provide further semantic similarity methods to the users. I just wanted to ask how hard and feasible it would be to implement Wang's semantic similarity in goatools. I thought about doing it myself and creating a pull request, but I do not know if it is feasible because in Wang's method all terms in the DAG contribute to the semantics of a term.
Thanks!