Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: fix embedding_dim mismatch in DocumentStore #3183

Merged
merged 6 commits into from
Sep 15, 2022

Conversation

kalki7
Copy link
Contributor

@kalki7 kalki7 commented Sep 8, 2022

Related Issues

Proposed Changes:

When a DocumentStore is initialized with a given embedding_dim value, and the dim of the embeddings from the EmbeddingRetriever with a model do not match, an assertion error with no message is thrown. Thus have added a check if the dimension match, if they don't a clear ValueError with message is thrown to the user.

How did you test it?

init DocumentStore with a random embedding_dim that doesn't match the output dim of the EmbeddingRetriever

Notes for the reviewer

Only a check and a appropriate error message has been added after discussion for #3090

Checklist

@kalki7 kalki7 requested a review from a team as a code owner September 8, 2022 05:54
@kalki7 kalki7 requested review from masci and removed request for a team September 8, 2022 05:54
@kalki7 kalki7 changed the title bug: fix embedding_dim mismatch in DocumentStore (deepset-ai#3090) bug: fix embedding_dim mismatch in DocumentStore (#3090) Sep 8, 2022
@kalki7 kalki7 changed the title bug: fix embedding_dim mismatch in DocumentStore (#3090) bug: fix embedding_dim mismatch in DocumentStore Sep 8, 2022
Copy link
Contributor

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, I left a comment for a possible follow up to your fix.

Also do you think we can add some unit tests to cover this code? (Open question, not sure how difficult can be reproducing the conditions to test the raise)

"The shape of the DocumentStore index doesn't match with the shape of the embeddings from the Transformer. Please reinitiate your DocumentStore wth embedding_dim={}.".format(
len(embeddings[0])
)
)
assert len(document_batch) == len(embeddings)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the risk of going a bit off topic for the PR, since we're here I would also get rid of these asserts, I see in the memory docstore we raise a DocumentstoreError - we can maybe align all the docstores to this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, will work on the mentioned changes. Will figure out how to write the proper test cases and add them as well. Thanks for you time !!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed the issue resolved in memory docstore, will align my fixes to other docstores like the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a negative case test, recreating it becomes tedious with various models and mismatching the embedding dim for each doc store. Please advise. @masci

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't produce a fast unit test with the use of mocks, let's not do it 👍

@kalki7 kalki7 requested a review from masci September 12, 2022 10:12
@masci
Copy link
Contributor

masci commented Sep 14, 2022

@kalki7 there's a test failing due to a bug that was fixed in main recently. Can you rebase this PR on top of main?

@kalki7
Copy link
Contributor Author

kalki7 commented Sep 14, 2022

main

will do

@kalki7
Copy link
Contributor Author

kalki7 commented Sep 14, 2022

@masci do let me know if I've made any mistakes

@masci
Copy link
Contributor

masci commented Sep 14, 2022

@kalki7 tests failures are unrelated, I'll merge as soon as I sort that out

@kalki7
Copy link
Contributor Author

kalki7 commented Sep 15, 2022

Cool thanks !

Copy link
Contributor

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests failures were unrelated, merging now 👍

@masci masci merged commit f50b496 into deepset-ai:main Sep 15, 2022
brandenchan pushed a commit that referenced this pull request Sep 21, 2022
* match index dim with embed dim (#3090)

* aligned messages across all docstores

* aligned messages across all docstores (#3090)

* aligned messages across all docstores (#3090)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Embedding Dimension of FAISSDocumentStore not working
2 participants