A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs

This is the benchmark, code, and configuration accompanying the EMNLP-Findings 2023 paper A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs. The main branch holds code and information about the benchmark itself. The other branches hold code and configuration for the individual models evaluated in the study.

Download Benchmark

mkdir data
cd data
curl -O https://madata.bib.uni-mannheim.de/424/2/wikidata5m-si.tar.gz
tar -zxvf wikidata5m-si.tar.gz

Benchmark Content

All files are tab separated.

  • entity_ids.del
    • maps ids used in all files to Wikidata IDs
    • first column entity id, second column Wikidata entity id
  • entity_mentions.del
    • maps entity ids to entity mentions
  • entity_desc.del
    • maps entity ids to entity descriptions
  • relation_ids.del
    • maps relation ids to Wikidata relation ids
    • first column relation id, second column Wikidata relation id
  • relation_mentions.del
    • maps relation ids to relation mentions
  • train.del
    • contains training triples in the form of subject, relation, object
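
Since all files are tab separated, they can be read with a few lines of plain Python. The following is a minimal sketch and not part of the repository: the helper names `parse_rows`, `load_id_map`, and `load_triples` are hypothetical, and parsing the triple columns as integers is an assumption about the id format.

```python
def parse_rows(lines):
    """Yield each non-empty tab-separated line as a list of fields."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            yield line.split("\t")


def load_id_map(lines):
    """Map the first column to the second column, e.g. for entity_ids.del."""
    return {row[0]: row[1] for row in parse_rows(lines)}


def load_triples(lines):
    """Read train.del-style rows as (subject, relation, object) tuples.

    Assumption: all three columns hold integer ids.
    """
    return [(int(s), int(p), int(o)) for s, p, o in parse_rows(lines)]
```

Each helper accepts any iterable of lines, so an open file handle works directly, for example `with open("train.del", encoding="utf-8") as f: triples = load_triples(f)` (adjust the path to wherever the archive was extracted).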

Transductive

  • valid.del
    • contains transductive validation triples in the form of subject, relation, object
  • test.del
    • contains transductive test triples in the form of subject, relation, object

Semi-Inductive

  • all_entity_ids.del
    • contains ids from entity_ids.del and additionally all ids of unseen entities
  • all_entity_mentions.del
    • contains mentions from entity_mentions.del and additionally all mentions of unseen entities
  • all_entity_desc.del
    • contains descriptions from entity_desc.del and additionally all descriptions of unseen entities
  • valid_pool.del
    • contains all triples used for semi-inductive validation
    • columns
      • 1: unseen entity id
      • 2: slot of unseen entity (0: unseen entity is in subject slot, 1: unseen entity in object slot)
      • 3-5: validation triple
        • 3: subject
        • 4: relation
        • 5: object
    • use prepare_few_shot.py to create all semi-inductive tasks from this file
  • test_pool.del
    • contains all triples used for semi-inductive testing
    • columns
      • 1: unseen entity id
      • 2: slot of unseen entity (0: unseen entity is in subject slot, 1: unseen entity in object slot)
      • 3-5: test triple
        • 3: subject
        • 4: relation
        • 5: object
    • tab separated
    • use prepare_few_shot.py to create all semi-inductive tasks from this file
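
The five-column layout of valid_pool.del and test_pool.del can be parsed as sketched below. This is only an illustration and not a replacement for prepare_few_shot.py: the names `PoolRow`, `parse_pool_line`, and `group_by_unseen_entity` are hypothetical, and integer ids are an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class PoolRow(NamedTuple):
    unseen_entity: int  # column 1
    slot: int           # column 2: 0 = subject slot, 1 = object slot
    subject: int        # column 3
    relation: int       # column 4
    obj: int            # column 5


def parse_pool_line(line):
    """Parse one tab-separated pool line into a PoolRow (assumes integer ids)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}")
    row = PoolRow(*map(int, fields))
    if row.slot not in (0, 1):
        raise ValueError(f"unexpected slot value: {row.slot}")
    return row


def group_by_unseen_entity(lines):
    """Collect all pool rows that belong to the same unseen entity."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():
            row = parse_pool_line(line)
            groups[row.unseen_entity].append(row)
    return dict(groups)
```

Grouping by the first column gives, for every unseen entity, the set of triples from which few-shot context and evaluation triples are drawn.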

Generate Few Shot Tasks

  • use the file prepare_few_shot.py
  • create a few_shot_set_creator object
    • dataset_name: (str) name of the dataset
      • default: wikidata5m_v3_semi_inductive
    • use_inverse: (bool) whether to use inverse relations
      • default: False
      • if True: for all triples where the unseen entity is in the object slot, increase relation id by num-relations and invert triple
    • split: (str) which split to use
      • default: valid
    • context_selection: (str) which context_selection technique to use
      • default: most_common
      • options: most_common, least_common, random
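# assumes FewShotSetCreator is defined in prepare_few_shot.py
from prepare_few_shot import FewShotSetCreator

few_shot_set_creator = FewShotSetCreator(
	dataset_name="wikidata5m_v3_semi_inductive",
	use_inverse=True,
	split="test"
)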
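
The effect of use_inverse described above can be illustrated with a small sketch. This is not the code from prepare_few_shot.py: the function name `apply_inverse` and the example value of `num_relations` are hypothetical, and the sketch uses the 0/1 slot convention of the pool files.

```python
def apply_inverse(subject, relation, obj, unseen_slot, num_relations):
    """Return the triple so that the unseen entity ends up in the subject slot.

    If the unseen entity is in the object slot (unseen_slot == 1), the triple is
    inverted and the relation id is shifted by num_relations to mark the inverse
    relation. Triples with the unseen entity in the subject slot stay unchanged.
    """
    if unseen_slot == 1:
        return obj, relation + num_relations, subject
    return subject, relation, obj


# Hypothetical example with 100 relations: entity 7 is unseen and in the object slot.
print(apply_inverse(3, 2, 7, unseen_slot=1, num_relations=100))  # (7, 102, 3)
```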
  • generate the data using the few_shot_set_creator
    • num_shots: (int) the number of shots to use (between 0 and 10)
data = few_shot_set_creator.create_few_shot_dataset(num_shots=5)
  • evaluation is performed in the direction from the unseen entity to the seen entity
  • output format looks like this
[
{
	"unseen_entity": <id of unseen entity>,
	"unseen_slot": <slot of unseen entity: 0 for head/subject, 2 for tail/object>,
	"triple": <[s, p, o]>,
	"context": <[unseen_entity_id, unseen_entity_slot, s, p, o]>
},
...
]
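
Given records in this format, a query in the direction unseen to seen can be derived as sketched below. This is an illustration under assumptions, not the evaluation code used in the paper: the function name `to_query` is hypothetical, the example record is made up, and the sketch assumes that `unseen_slot` indexes into `triple` (0 or 2), so that the seen entity to predict sits at the opposite end.

```python
def to_query(record):
    """Split a few-shot record into (unseen entity, relation, slot to predict, answer).

    Assumption: record["unseen_slot"] is 0 or 2 and indexes into record["triple"].
    """
    triple = record["triple"]
    unseen_slot = record["unseen_slot"]
    if unseen_slot not in (0, 2):
        raise ValueError(f"unexpected unseen_slot: {unseen_slot}")
    relation = triple[1]
    target_slot = 2 - unseen_slot  # the seen entity sits at the other end
    answer = triple[target_slot]
    return record["unseen_entity"], relation, target_slot, answer


# Made-up record for illustration only.
example = {
    "unseen_entity": 7,
    "unseen_slot": 0,
    "triple": [7, 2, 3],
    "context": [[7, 0, 7, 4, 9]],
}
print(to_query(example))  # (7, 2, 2, 3)
```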

Create Benchmarks Based on Other Graphs

  • to create a similar benchmark based on other graphs, use the file create_semi_inductive_dataset.py
  • this file was used to create wikidata5m-si based on wikidata5m

How to Cite

  • if you use the proposed benchmark, the provided code, or insights presented in the paper, please cite:
@inproceedings{kochsiek2023benchmark,
title={A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs},
author={Kochsiek, Adrian and Gemulla, Rainer},
booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
year={2023}
}