Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: FAISSDocumentStore - make write_documents properly work in combination w update_embeddings #5221

Merged
merged 6 commits into from
Jul 3, 2023

Conversation

anakin87
Copy link
Member

@anakin87 anakin87 commented Jun 28, 2023

Related Issues

Proposed Changes:

The bug was described in #2834.
In short:

  • when the write_documents method was called with duplicate_documents="overwrite" and without providing embeddings, for each document, FAISS vector_id was not set (even if the embedding already exists in the index).
  • calling update_embeddings method with update_existing_embeddings=False does not make the embeddings for existing documents be computed (this is correct), but then the overwritten documents have no vector_id (the embeddings are "lost").

Solution
When the write_documents method is called with duplicate_documents="overwrite" and without providing embeddings, the corresponding FAISS vector_ids are recovered and set.

(no changes in update_embeddings)
Then, if update_embeddings is called with update_existing_embeddings=False, the existing vector_ids are available. Otherwise, the FAISS index is regenerated.

How did you test it?

CI, new integration test

Notes for the reviewer

I also refactored the write_documents method a bit.

Checklist

@coveralls
Copy link
Collaborator

coveralls commented Jun 28, 2023

Pull Request Test Coverage Report for Build 5412737136

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 309 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.05%) to 44.03%

Files with Coverage Reduction New Missed Lines %
document_stores/filter_utils.py 83 41.91%
pipelines/base.py 110 40.61%
document_stores/faiss.py 116 18.68%
Totals Coverage Status
Change from base Build 5391855155: 0.05%
Covered Lines: 10074
Relevant Lines: 22880

💛 - Coveralls

@anakin87 anakin87 changed the title Fix: FAISSDocumentStore - make write_documents properly work in combination update_embeddings Fix: FAISSDocumentStore - make write_documents properly work in combination w update_embeddings Jun 28, 2023
@anakin87 anakin87 marked this pull request as ready for review June 28, 2023 14:20
@anakin87 anakin87 requested a review from a team as a code owner June 28, 2023 14:20
@anakin87 anakin87 requested review from ZanSara and removed request for a team June 28, 2023 14:20
@anakin87 anakin87 self-assigned this Jun 28, 2023
Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I would fix the test, if possible, then it's ready to merge 👍

test/document_stores/test_faiss.py Show resolved Hide resolved
haystack/document_stores/faiss.py Outdated Show resolved Hide resolved
haystack/document_stores/faiss.py Outdated Show resolved Hide resolved
@anakin87 anakin87 requested a review from ZanSara June 29, 2023 13:30
Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚢

@anakin87 anakin87 merged commit 1be3936 into main Jul 3, 2023
@anakin87 anakin87 deleted the fix_faiss_update_existing_embeddings branch July 3, 2023 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants