
Support Predicting Unseen Entities #82

Open
chanlevan opened this issue Apr 15, 2019 · 16 comments
Labels: API (issues related to AmpliGraph programming interface), enhancement (New feature or request)

@chanlevan
Contributor

Background and Context
The GNN in Knowledge Transfer for Out-of-Knowledge-Base Entities: A Graph Neural Network Approach supports predicting unseen entities. We should implement this model.
Description

@chanlevan chanlevan self-assigned this Apr 15, 2019
@lukostaz lukostaz added this to the 1.1 milestone Apr 19, 2019
@chanlevan chanlevan changed the title Implement Graph Neural Network Support Predicting Unseen Entities May 17, 2019
@sumitpai sumitpai added the enhancement New feature or request label Jun 10, 2019
@lukostaz lukostaz added the API issues related to AmpliGraph programming interface label Jun 12, 2019
@lukostaz lukostaz modified the milestones: 1.1, 1.2 Jul 1, 2019
@mmercierTh

Here at Thales, we are currently trying to use this upcoming feature directly from the branch. We were previously on 1.0.2 and are running into some problems. The loss function multiclass_nll that we were using with our saved ComplEx model no longer seems to be supported; is that correct? This may impact our current work, so clarification would be greatly appreciated.

Also, we tried the unseen-entity example, which works if you use the model right after calling fit. In our workflow, we prefer to save the model to disk and restore it for later use. Unfortunately, the feature branch code then fails with the following error: AttributeError: 'ComplEx' object has no attribute 'ent_emb'. Any plan to fix this?

The sample code used is attached.
unseen_test.zip

Thanks

@sumitpai
Contributor

sumitpai commented Jul 12, 2019

multiclass_nll will be supported in our releases.

The problem is that this feature branch (feature/#82) was branched off an older version of AmpliGraph (which did not have the multiclass_nll loss) and does not contain the later changes made on the master branch. As you rightly pointed out, we will try to integrate the master branch changes into this branch to keep it up to date.

This feature is not complete, so you may notice some issues (like the fit/predict one you mentioned), but once complete it will be merged to develop.

@mmercierTh

Thanks for the clarification. What is the ETA for completing this feature? From what I understand, it is expected in milestone 1.2 (due August 8th?).

@chanlevan
Contributor Author

Hi @mmercierTh

I have pushed some changes to feature/82. The restored model can now predict unseen entities, and multiclass_nll is also supported. Because there are some changes from the multiclass_nll version, your example code should now look like:


```python
import numpy as np

from ampligraph.latent_features import ComplEx
from ampligraph.utils.model_utils import save_model as ampligraph_save_model
from ampligraph.utils.model_utils import restore_model as ampligraph_restore_model

model = ComplEx(batches_count=2, seed=555, epochs=20, k=10, loss="multiclass_nll")

X = np.array([["a", "y", "b"],
              ["b", "y", "a"],
              ["a", "y", "c"],
              ["c", "y", "a"],
              ["a", "y", "d"],
              ["c", "y", "d"],
              ["b", "y", "c"],
              ["f", "y", "e"]])

model.fit(X)

saved_name = "./model.pkl"

ampligraph_save_model(model, model_name_path=saved_name)
restored_model = ampligraph_restore_model(model_name_path=saved_name)

print(model.predict(np.array(["z", "y", "f"]), approximate_unseen={
    "pool": "avg",
    "neighbour_triples": [["z", "y", "c"], ["z", "y", "d"]]
}))

print(restored_model.predict(np.array(["z", "y", "f"]), approximate_unseen={
    "pool": "avg",
    "neighbour_triples": [["z", "y", "c"], ["z", "y", "d"]]
}))
```

@chanlevan chanlevan reopened this Jul 15, 2019
@chanlevan
Contributor Author

What we have implemented at the moment is the baseline from the paper, in which the unseen entity's vector is approximated from its neighbours using average, max, or sum pooling. We expect to have the Hamaguchi model implemented in version 1.2, with fully integrated tests and documentation.

@mmercierTh

Great, thanks. We tested the update, and it only works for a single prediction, not for two or more consecutive predictions from a model that is loaded once. Is there a method we should call to reset the model before each prediction? The error is below, with sample code.

```
Traceback (most recent call last):
  File "/home/user/src/NER/graph_embeddings/unseen_test.py", line 29, in <module>
    "neighbour_triples": [["z", "y", "c"],["z", "y", "d"]]
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 2079, in predict
    approximate_unseen={**approximate_unseen, "k_size": 2 * self.k})
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 1088, in predict
    X, e, app_embs = self._assign_unseen_idx(approximate_unseen)(to_idx)(X, ent_to_idx=self.ent_to_idx, rel_to_idx=self.rel_to_idx)
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 949, in inner_dec
    k_size=approximate_unseen["k_size"])
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 1015, in _approximate_embeddings
    neighbour_vectors = self.get_embeddings(N_ent, embedding_type='entity')
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 395, in get_embeddings
    return emb_list[idxs]
IndexError: index 6 is out of bounds for axis 0 with size 6
```

sample code:

```python
for i in range(2):
    print(model.predict(np.array(["z", "y", "f"]), approximate_unseen={
        "pool": "avg",
        "neighbour_triples": [["z", "y", "c"], ["z", "y", "d"]]
    }))

for i in range(2):
    print(restored_model.predict(np.array(["z", "y", "f"]), approximate_unseen={
        "pool": "avg",
        "neighbour_triples": [["z", "y", "c"], ["z", "y", "d"]]
    }))
```

@chanlevan
Contributor Author

The bug has been fixed and the new code has been pushed to feature/82.

@mmercierTh

Great, it is working fine now. Thanks for your responsiveness on this issue; it is greatly appreciated. A quick question related to the implementation: how exactly do you calculate the embedding of the unseen entity?

@chanlevan
Contributor Author

We collect all the neighbour entities of the unseen entity, get their corresponding trained vectors, and average, max, or sum those. You can find the details of the implementation at line 997 of ampligraph/latent_features/models.py, in the function _approximate_embeddings. Hope this helps.
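The pooling described above can be sketched as follows. This is a hypothetical stand-in for AmpliGraph's internal `_approximate_embeddings`, not its actual code; the function name and signature are my own:

```python
import numpy as np

def approximate_embedding(neighbour_vectors: np.ndarray, pool: str = "avg") -> np.ndarray:
    """Pool the trained embeddings of an unseen entity's neighbours.

    neighbour_vectors: array of shape (n_neighbours, embedding_dim).
    """
    if pool == "avg":
        return neighbour_vectors.mean(axis=0)
    if pool == "max":
        return neighbour_vectors.max(axis=0)
    if pool == "sum":
        return neighbour_vectors.sum(axis=0)
    raise ValueError(f"Unknown pooling strategy: {pool}")

# Example: two neighbours with 3-dimensional embeddings
neighbours = np.array([[1.0, 2.0, 3.0],
                       [3.0, 4.0, 5.0]])
print(approximate_embedding(neighbours, "avg"))  # [2. 3. 4.]
```

Note that the pooling is element-wise over the embedding dimensions, so the approximation lives in the same space as the trained entity vectors.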

@plgregoire

Hi,

That is what we figured out when we looked at the code. We wanted to be sure that we could use this code to get the approximate embedding of the unknown entity:

```python
neighbour_vectors = model.get_embeddings(["c", "d"])
approximate_embedding_of_z = np.mean(neighbour_vectors, axis=0)
```

So the embedding of z is calculated from the average of its neighbors c and d.

I find this way of approximating z a little odd, since it is an average of all its connected entities (neighbours) instead of an average of all entities that have the same neighbours as z.

Is there something I'm not understanding? Could you enlighten me on that?

@chanlevan
Contributor Author

We were targeting an implementation of the model in the paper Knowledge Transfer for Out-of-Knowledge-Base Entities: A Graph Neural Network Approach. At the moment the implementation is not done, so we only support the baseline. Page 6 of the paper describes the baseline approach for approximating the unseen entity, and that is the approach we used.

[image: screenshot of the baseline approximation from page 6 of the paper]
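For reference, the average-pooling variant of this baseline can be written as follows. This is my notation for what the thread describes, not the paper's exact formula; here N(u) denotes the set of seen neighbours of the unseen entity u:

```latex
% Approximate the unseen entity's vector by pooling its neighbours' trained vectors.
\mathbf{v}_u = \frac{1}{|\mathcal{N}(u)|} \sum_{e \in \mathcal{N}(u)} \mathbf{v}_e
% Max and sum pooling replace the mean with an element-wise max or sum.
```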

@plgregoire

Thank you for the explanation and your responsiveness
We will continue to follow the development of this feature

@captify-dieter

captify-dieter commented Aug 12, 2019

It seems to break when filter_unseen_entities is set to True, as no approximate_unseen is given and the variable unseen is used before declaration in the line:

```python
self._add_app_embs(unseen, app_embs)(self._load_model_from_trained_params)()
```

@sumitpai
Contributor

sumitpai commented Aug 12, 2019

Thanks for trying out this feature. It is under development, so it may be unstable. As described here, only the baselines have been implemented; the Hamaguchi model is not implemented yet.

We are targeting this feature for 1.2 release.

@captify-dieter

I mean that even without using a pooling method, it seems to break because of conflicts with the code structure further down. Before using any method, I wanted to get a baseline with unseen entities removed, using the same feature branch (i.e. make sure all entities are seen and do not pass anything as approximate_unseen). I will submit a PR if I restructure it and fix this.

@lukostaz lukostaz removed this from the 1.2 milestone Oct 22, 2019
@lukostaz lukostaz added this to the 2.1 milestone Feb 14, 2020
@lukostaz
Contributor

This will be addressed in long-term release 2.1.
