
Support Predicting Unseen Entities #82

Open
chanlevan opened this issue Apr 15, 2019 · 16 comments
Labels: API (issues related to AmpliGraph programming interface), enhancement (New feature or request)

@chanlevan
Contributor

Background and Context
The GNN in Knowledge Transfer for Out-of-Knowledge-Base Entities: A Graph Neural Network Approach supports predicting unseen entities. We should implement this model.
Description

@chanlevan chanlevan self-assigned this Apr 15, 2019
@lukostaz lukostaz added this to the 1.1 milestone Apr 19, 2019
@chanlevan chanlevan changed the title Implement Graph Neural Network Support Predicting Unseen Entities May 17, 2019
@sumitpai sumitpai added the enhancement New feature or request label Jun 10, 2019
@lukostaz lukostaz added the API issues related to AmpliGraph programming interface label Jun 12, 2019
@lukostaz lukostaz modified the milestones: 1.1, 1.2 Jul 1, 2019
@mmercierTh

Here at Thales, we are currently trying to use this upcoming feature directly from the branch. We were previously on 1.0.2 and are running into some problems. The loss function multiclass_nll that we were using with our saved ComplEx model no longer seems to be supported; is that correct? This may impact our current work, so clarification would be greatly appreciated.

Also, we tried the unseen-entity example, which works if you use the model right after calling fit. In our workflow, we prefer to save the model to disk and restore it for later use. Unfortunately, the feature branch code then fails with the following error: AttributeError: 'ComplEx' object has no attribute 'ent_emb'. Any plan to fix this?

The sample code used is attached.
unseen_test.zip

Thanks

@sumitpai
Contributor

sumitpai commented Jul 12, 2019

multiclass_nll will be supported in our releases.

The problem is that this feature branch (feature/#82) was branched off an older version of AmpliGraph (which did not have the multiclass_nll loss) and does not contain the later changes made on the master branch. As you rightly pointed out, we will try to integrate the master branch changes into this branch to keep it up to date.

This feature is not complete, so you may notice some issues (like the fit/predict one you mentioned), but once complete it will be merged to develop.

@mmercierTh

Thanks for the clarification. What is the ETA for completing this feature? From what I understand, it is expected in milestone 1.2 (due August 8th?).

@chanlevan
Contributor Author

Hi @mmercierTh

I have pushed some changes to feature/82. The restored model can now predict unseen entities, and multiclass_nll is also supported. Because there are some changes from the multiclass_nll version, your example code should now look like:


```python
import numpy as np

from ampligraph.latent_features import ComplEx
from ampligraph.utils.model_utils import save_model as ampligraph_save_model
from ampligraph.utils.model_utils import restore_model as ampligraph_restore_model

model = ComplEx(batches_count=2, seed=555, epochs=20, k=10, loss="multiclass_nll")

X = np.array([["a", "y", "b"],
              ["b", "y", "a"],
              ["a", "y", "c"],
              ["c", "y", "a"],
              ["a", "y", "d"],
              ["c", "y", "d"],
              ["b", "y", "c"],
              ["f", "y", "e"]])

model.fit(X)

saved_name = "./model.pkl"

ampligraph_save_model(model, model_name_path=saved_name)
restored_model = ampligraph_restore_model(model_name_path=saved_name)

print(model.predict(np.array(["z", "y", "f"]), approximate_unseen={
    "pool": "avg",
    "neighbour_triples": [["z", "y", "c"], ["z", "y", "d"]]
}))

print(restored_model.predict(np.array(["z", "y", "f"]), approximate_unseen={
    "pool": "avg",
    "neighbour_triples": [["z", "y", "c"], ["z", "y", "d"]]
}))
```

@chanlevan chanlevan reopened this Jul 15, 2019
@chanlevan
Contributor Author

What we have implemented at the moment is the baseline from the paper, in which the unseen entity's vector is approximated from its neighbours using average, max, or sum pooling. We expect to have the Hamaguchi model implemented in version 1.2, with fully integrated tests and documentation.

@mmercierTh

Great, thanks. We tested the update, and it only works for a single prediction, not for two or more consecutive predictions from a model that is loaded once. Is there a method we should call to reset the model before each prediction? The error is below, with sample code.

```
Traceback (most recent call last):
  File "/home/user/src/NER/graph_embeddings/unseen_test.py", line 29, in <module>
    "neighbour_triples": [["z", "y", "c"],["z", "y", "d"]]
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 2079, in predict
    approximate_unseen={**approximate_unseen, "k_size": 2 * self.k})
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 1088, in predict
    X, e, app_embs = self._assign_unseen_idx(approximate_unseen)(to_idx)(X, ent_to_idx=self.ent_to_idx, rel_to_idx=self.rel_to_idx)
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 949, in inner_dec
    k_size=approximate_unseen["k_size"])
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 1015, in _approximate_embeddings
    neighbour_vectors = self.get_embeddings(N_ent, embedding_type='entity')
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 395, in get_embeddings
    return emb_list[idxs]
IndexError: index 6 is out of bounds for axis 0 with size 6
```

sample code:

```python
for i in range(2):
    print(model.predict(np.array(["z", "y", "f"]), approximate_unseen={
        "pool": "avg",
        "neighbour_triples": [["z", "y", "c"], ["z", "y", "d"]]
    }))

for i in range(2):
    print(restored_model.predict(np.array(["z", "y", "f"]), approximate_unseen={
        "pool": "avg",
        "neighbour_triples": [["z", "y", "c"], ["z", "y", "d"]]
    }))
```

@chanlevan
Contributor Author

The bug has been fixed and the new code has been pushed to feature/82.

@mmercierTh

Great, it is working fine now. Thanks for your responsiveness on this issue; it is greatly appreciated. A quick question related to the implementation: how exactly do you calculate the embedding of the unseen entity?

@chanlevan
Contributor Author

We collect all the neighbour entities of the unseen entity, get their corresponding trained vectors, and average, max, or sum those. You can find the details of the implementation at line 997 of ampligraph/latent_features/models.py, in the function _approximate_embeddings. Hope this helps.
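The pooling described above can be sketched as follows. This is a hypothetical stand-in for AmpliGraph's internal `_approximate_embeddings`, not its actual code; the function name and signature are my own:

```python
import numpy as np

def approximate_embedding(neighbour_vectors: np.ndarray, pool: str = "avg") -> np.ndarray:
    """Pool the trained embeddings of an unseen entity's neighbours.

    neighbour_vectors: array of shape (n_neighbours, embedding_dim).
    """
    if pool == "avg":
        return neighbour_vectors.mean(axis=0)
    if pool == "max":
        return neighbour_vectors.max(axis=0)
    if pool == "sum":
        return neighbour_vectors.sum(axis=0)
    raise ValueError(f"Unknown pooling strategy: {pool}")

# Example: two neighbours with 3-dimensional embeddings
neighbours = np.array([[1.0, 2.0, 3.0],
                       [3.0, 4.0, 5.0]])
print(approximate_embedding(neighbours, "avg"))  # [2. 3. 4.]
```

Note that the pooling is element-wise over the embedding dimensions, so the approximation lives in the same space as the trained entity vectors.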

@plgregoire

Hi,

That is what we figured out when we looked at the code. We wanted to be sure that we could use this code to get the approximate embedding of the unknown entity:

```python
neighbour_vectors = model.get_embeddings(["c", "d"])
approximate_embedding_of_z = np.mean(neighbour_vectors, axis=0)
```

So the embedding of z is calculated from the average of its neighbors c and d.

I find this way of approximating z a little odd, since it is an average of all its connected entities (neighbours) instead of an average of all entities that have the same neighbours as z.

Is there something I'm not understanding? Could you enlighten me on that?

@chanlevan
Contributor Author

We were targeting an implementation of the model in the paper Knowledge Transfer for Out-of-Knowledge-Base Entities: A Graph Neural Network Approach. At the moment the implementation is not done, so we only support the baseline. Page 6 of the paper describes the baseline approach for approximating the unseen entity, and that is the approach we used.

[image: screenshot of the baseline approximation from page 6 of the paper]
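For reference, the average-pooling variant of this baseline can be written as follows. This is my notation for what the thread describes, not the paper's exact formula; here N(u) denotes the set of seen neighbours of the unseen entity u:

```latex
% Approximate the unseen entity's vector by pooling its neighbours' trained vectors.
\mathbf{v}_u = \frac{1}{|\mathcal{N}(u)|} \sum_{e \in \mathcal{N}(u)} \mathbf{v}_e
% Max and sum pooling replace the mean with an element-wise max or sum.
```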

@plgregoire

Thank you for the explanation and your responsiveness
We will continue to follow the development of this feature

@captify-dieter

captify-dieter commented Aug 12, 2019

It seems to break when filter_unseen_entities is set to True, as no approximate_unseen is given and the variable unseen is used before declaration in the line:

```python
self._add_app_embs(unseen, app_embs)(self._load_model_from_trained_params)()
```

@sumitpai
Contributor

sumitpai commented Aug 12, 2019

Thanks for trying out this feature. It is under development, so it may be unstable. As described here, only the baselines have been implemented; the Hamaguchi model is not implemented yet.

We are targeting this feature for 1.2 release.

@captify-dieter

I mean that even without using a pooling method, it seems to break because of conflicts with the code structure further down. Before using any method, I wanted to get a baseline with unseen entities removed, using the same feature branch (i.e. make sure all entities are seen and do not pass anything as approximate_unseen). I will submit a PR if I restructure it and fix this.

@lukostaz lukostaz removed this from the 1.2 milestone Oct 22, 2019
@lukostaz lukostaz added this to the 2.1 milestone Feb 14, 2020
@lukostaz
Contributor

This will be addressed in long-term release 2.1.
