Personalized PageRank using Semantic Similarity Measures
This is the code used to run our experiments for the paper "PPR-SSM: Personalized PageRank and Semantic Similarity Measures for Entity Linking".
The code has three steps:
- Generating the candidates file
- Running the PPR algorithm
- Analyzing the results
The code for each gold standard is organized in its own directory (hpo_src, chebi_src, and go_src). The main scripts of each gold standard are the ones whose names start with "parse". The other scripts contain helper functions to generate and process data.
You can build a Docker image using the Dockerfile provided in this repository or download it from Docker Hub: docker pull andrelamurias/pprssm
We used the following corpora:
- HPO GSC+ (
- ChEBI patents corpus (provided with this repo)
- CRAFT ( - put brat files inside CRAFT/GO_BP and CRAFT/GO_CC)
And the following ontologies:
- Gene Ontology
Each ontology requires an OBO file and a .db file processed by DiShIn. These can be obtained with the script.
First, start the Flask app, with DISHIN_DB pointing to the ontology's .db file:
export DISHIN_DB=chebi.db
flask run &
Then run the parse script for the gold standard with the following arguments:
- min distance
- min similarity
- corpus dir (or the ontology name for Gene Ontology entities in the CRAFT corpus: "GO_BP" for GO Biological Process entities, "GO_CC" for GO Cellular Component entities)
python chebi_src/ 1 0.5 ChebiPatents/
Run the PPRforNED script:
java ppr_for_ned_chebi resnik_dishin
For GO entities in the CRAFT corpus, change to the desired subontology in the script.
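For readers unfamiliar with Personalized PageRank, the idea behind this step can be illustrated with a minimal power-iteration sketch. This is not the PPRforNED implementation: the function name, the toy graph, and the use of edge weights as stand-ins for semantic similarity scores are all assumptions made for illustration only.

```python
def personalized_pagerank(graph, seeds, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration Personalized PageRank on a weighted directed graph.

    graph: dict mapping node -> dict of {neighbor: edge weight};
           every neighbor must also be a key of graph.
    seeds: dict mapping node -> teleport weight (the personalization).
    """
    nodes = list(graph)
    seed_total = sum(seeds.values())
    # Teleport distribution: random restarts land only on the seed nodes.
    teleport = {n: seeds.get(n, 0.0) / seed_total for n in nodes}
    rank = dict(teleport)

    for _ in range(max_iter):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            out_edges = graph[node]
            weight_sum = sum(out_edges.values())
            if weight_sum > 0:
                # Spread this node's score over its neighbors by edge weight.
                for neighbor, weight in out_edges.items():
                    new_rank[neighbor] += damping * rank[node] * weight / weight_sum
            else:
                # Dangling node: send its score back to the seed nodes.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * teleport[n]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: three candidate nodes, with node "a" as the only seed.
toy_graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0},
    "c": {"a": 1.0},
}
scores = personalized_pagerank(toy_graph, seeds={"a": 1.0})
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, round(score, 4))
```

The scores sum to 1, and nodes that are closer to the seed (or connected to it by heavier edges) receive a larger share.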
Process the results to obtain more detailed metrics than those reported by PPRforNED:
python src/ chebi
Example output:
one candidate 431
correct 909
wrong 105
total 1014
accuracy: 0.8964497041420119
accuracy (multiple candidates): 0.8198970840480274
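The two accuracy values in the example output can be related to the counts with a short calculation. The sketch below assumes that entities with a single candidate are always counted as correct and are excluded from the "multiple candidates" accuracy; this assumption is consistent with the numbers above, but it is inferred from them rather than taken from the analysis script. The function name is hypothetical.

```python
def accuracy_summary(correct, wrong, one_candidate):
    """Return (total, accuracy, accuracy over entities with multiple candidates)."""
    total = correct + wrong
    accuracy = correct / total
    # Assumption: single-candidate entities are trivially correct, so they are
    # removed from both the numerator and the denominator.
    multi_correct = correct - one_candidate
    multi_total = total - one_candidate
    return total, accuracy, multi_correct / multi_total


total, accuracy, accuracy_multi = accuracy_summary(
    correct=909, wrong=105, one_candidate=431
)
print("total", total)
print("accuracy:", accuracy)
print("accuracy (multiple candidates):", accuracy_multi)
```

Under this assumption, the overall accuracy is 909 / 1014 and the multiple-candidate accuracy is 478 / 583, which should agree with the values shown in the example output.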