-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug: fix embedding_dim mismatch in DocumentStore #3183
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, I left a comment for a possible follow up to your fix.
Also do you think we can add some unit tests to cover this code? (Open question, not sure how difficult can be reproducing the conditions to test the raise
)
"The shape of the DocumentStore index doesn't match with the shape of the embeddings from the Transformer. Please reinitiate your DocumentStore wth embedding_dim={}.".format( | ||
len(embeddings[0]) | ||
) | ||
) | ||
assert len(document_batch) == len(embeddings) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the risk of going a bit off topic for the PR, since we're here I would also get rid of these assert
s, I see in the memory docstore we raise a DocumentstoreError
- we can maybe align all the docstores to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, will work on the mentioned changes. Will figure out how to write the proper test cases and add them as well. Thanks for you time !!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed the issue resolved in memory docstore, will align my fixes to other docstores like the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a negative case test, recreating it becomes tedious with various models and mismatching the embedding dim for each doc store. Please advise. @masci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can't produce a fast unit test with the use of mocks, let's not do it 👍
@kalki7 there's a test failing due to a bug that was fixed in |
will do |
rebasing
@masci do let me know if I've made any mistakes |
@kalki7 tests failures are unrelated, I'll merge as soon as I sort that out |
Cool thanks ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests failures were unrelated, merging now 👍
Related Issues
Proposed Changes:
When a DocumentStore is initialized with a given embedding_dim value, and the dim of the embeddings from the EmbeddingRetriever with a model do not match, an assertion error with no message is thrown. Thus have added a check if the dimension match, if they don't a clear ValueError with message is thrown to the user.
How did you test it?
init DocumentStore with a random embedding_dim that doesn't match the output dim of the EmbeddingRetriever
Notes for the reviewer
Only a check and a appropriate error message has been added after discussion for #3090
Checklist