- Entities are embedded in a continuous low-dimensional vector space, and each relation is embedded in its own separate vector space.
- Whether a triplet (head entity, relation, tail entity) can be considered a fact is judged by a distance-based similarity.
- The entity vectors are first projected into the vector space associated with the relation, and the squared L2 norm is then used as the similarity measure:
$$ f(\textbf{h},\textbf{r},\textbf{t}) = \| \textbf{M}_{r}\textbf{h}+\textbf{r}-\textbf{M}_{r}\textbf{t} \|_{2}^{2} $$
- Negative samples for training are constructed by corrupting fact triples.
- The loss calculation method is as follows:
- Finally, the model is updated with backpropagation (BP).
- In addition, CTransR uses the results of TransE to pre-cluster entities, and then learns a relation vector $ \textbf{r}_{c} $ for each cluster. The new score function is as follows:
$$ f_{r}(h,t) = \| \textbf{M}_{r}\textbf{h}+\textbf{r}-\textbf{M}_{r}\textbf{t} \|_{2}^{2} + \alpha \| \textbf{r}_{c}-\textbf{r} \|_{2}^{2} $$
- Moreover, the dimensions of the entity embedding vectors and the relation embedding vectors can be different.
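
The scoring and training steps listed above can be sketched in plain Python. This is only an illustration, not OpenHGNN's implementation: the helper names (`project`, `transr_score`, `corrupt`, `margin_loss`), the toy vectors, and the margin of 1.0 are all assumptions. The loss shown is the margin-based ranking loss from the original TransR paper, assumed here because the loss formula itself is not reproduced in this README.

```python
import random


def project(M_r, v):
    """Map a d-dimensional entity vector into the k-dimensional relation space."""
    return [sum(m * x for m, x in zip(row, v)) for row in M_r]


def transr_score(h, r, t, M_r):
    """Squared L2 norm of M_r h + r - M_r t; lower means more plausible."""
    h_r, t_r = project(M_r, h), project(M_r, t)
    return sum((a + b - c) ** 2 for a, b, c in zip(h_r, r, t_r))


def corrupt(triple, num_entities, rng):
    """Build a negative sample by replacing the head or the tail with a random entity id.

    Assumption: no filtering, so the result may occasionally equal the original triple.
    """
    h, r, t = triple
    if rng.random() < 0.5:
        return (rng.randrange(num_entities), r, t)
    return (h, r, rng.randrange(num_entities))


def margin_loss(pos_score, neg_score, margin=1.0):
    """Margin-based ranking loss (assumed): max(0, margin + f(pos) - f(neg))."""
    return max(0.0, margin + pos_score - neg_score)


# Toy example: entity dimension d = 3, relation dimension k = 2,
# which also shows that the two dimensions may differ.
M_r = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]
h = [1.0, 2.0, 3.0]
t = [2.0, 2.0, 5.0]
r = [1.0, 0.0]

pos = transr_score(h, r, t, M_r)           # projected h + r lands exactly on projected t
neg = transr_score(h, r, [0.0, 0.0, 0.0], M_r)
loss = margin_loss(pos, neg)
negative_triple = corrupt((0, 1, 2), num_entities=10, rng=random.Random(0))
```

In a real model the embeddings and `M_r` would be learnable tensors and the loss would be summed over a batch before the backward pass; the plain lists here only keep the arithmetic visible.
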
- Clone the Openhgnn-DGL

- Run the TransE model first

      # For link prediction task
      python main.py -m TransE -t link_prediction -d FB15k -g 0 --use_best_config

- Run the TransR model

      # For link prediction task
      python main.py -m TransR -t link_prediction -d FB15k -g 0 --use_best_config

If you do not have a GPU, set `-g -1`.
- Number of entities and relations

  | entities | relations |
  | -------- | --------- |
  | 14,951   | 1,345     |

- Size of dataset

  | set type       | size    |
  | -------------- | ------- |
  | train set      | 483,142 |
  | validation set | 50,000  |
  | test set       | 59,071  |

- Number of entities and relations

  | entities | relations |
  | -------- | --------- |
  | 40,493   | 18        |

- Size of dataset

  | set type       | size    |
  | -------------- | ------- |
  | train set      | 141,442 |
  | validation set | 5,000   |
  | test set       | 5,000   |

Evaluation metric: MRR (mean reciprocal rank)
Testing model performance...
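
For reference, MRR averages the reciprocal rank of the correct entity over all test queries. The snippet below is a minimal sketch of that computation, assuming distance-style scores where a lower value is better and ties are ranked optimistically. The function names are illustrative and are not taken from OpenHGNN's evaluator.

```python
def rank_of_true(scores, true_index):
    """1-based rank of the true candidate when a lower score means more plausible."""
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s < true_score)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries; ranks are 1-based."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy usage: three queries whose correct entity is ranked 1st, 2nd and 4th.
mrr = mean_reciprocal_rank([1, 2, 4])
```
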
TrainerFlow: TransX flow
You can modify the parameters under [TransE] in openhgnn/config.ini.
Xiaoke Yang
Submit an issue or email to [email protected].