fix chroma update_document to embed entire documents, fixes a character-wise embedding bug #5584

cnellington · 2023-06-01T23:21:18Z

Chroma update_document full document embeddings bugfix

Chroma update_document takes a single document, but treats the page_content string of that document as a list when getting the new document embedding.

This causes two problems: the resulting embedding for the updated document is incorrect (it is only an embedding of the first character of the new page_content), and the embedding function is called for every character in the new page_content string, using many tokens in the process.
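A minimal sketch of the failure mode, not the actual chroma.py code: CountingEmbeddings is a hypothetical stand-in for an embedding function whose embed_documents method expects a list of texts, and the one-number "embedding" (the text length) is only there to make the difference visible. Passing a bare string makes Python iterate over its characters, while wrapping the string in a one-element list embeds the whole document once.

```python
from typing import List


class CountingEmbeddings:
    """Hypothetical stand-in for an embedding function (not a LangChain class)."""

    def __init__(self) -> None:
        self.texts_embedded = 0

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Iterates over `texts`; a bare string is iterated character by character.
        vectors = []
        for text in texts:
            self.texts_embedded += 1
            # Toy "embedding": just the length of the text that was embedded.
            vectors.append([float(len(text))])
        return vectors


page_content = "hello world"

# Buggy pattern: the string itself is passed where a list of texts is expected.
buggy = CountingEmbeddings()
buggy_embedding = buggy.embed_documents(page_content)[0]  # embedding of "h" only

# Fixed pattern: wrap the string in a list so the whole document is embedded once.
fixed = CountingEmbeddings()
fixed_embedding = fixed.embed_documents([page_content])[0]

print(buggy.texts_embedded, buggy_embedding)
print(fixed.texts_embedded, fixed_embedding)
```

With a paid embedding API, the buggy pattern would embed one text per character instead of one text per document, which is where the extra token usage described above comes from.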

Fixes #5582

Before submitting

Who can review?

Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested:

Tagging @dev2049 for vectorstore bugfix

dev2049

great catch!

tests/integration_tests/vectorstores/test_chroma.py

langchain/vectorstores/chroma.py

dev2049 · 2023-06-02T01:10:43Z

thanks @cnellington!

Caleb Ellington added 2 commits June 1, 2023 16:18

fix chroma update_document to embed entire documents, fixes a character-wise embedding bug

5b3e93a

added integration tests for update_document bug

ae55c34

dev2049 reviewed Jun 1, 2023

tests/integration_tests/vectorstores/test_chroma.py (outdated, resolved)

Caleb Ellington added 2 commits June 1, 2023 16:34

fix test typo

b44848c

fix 2nd test typo

db5fd83

dev2049 reviewed Jun 1, 2023

langchain/vectorstores/chroma.py (resolved)

remove unnecessary list transform of new embeddings

fa8b54e

cnellington added 3 commits June 1, 2023 20:57

make embeddings consistent to test consistency between saved embeddings and new content embeddings

6e39f1f

black format

0bdf198

ruff formatting

394e095

dev2049 merged commit c5a7a85 into langchain-ai:master Jun 2, 2023

danielchalef mentioned this pull request Jun 5, 2023

Zep Hybrid Search #5742

Merged

This was referenced Jun 25, 2023

Zep Authentication #6725

Closed

Zep Authentication #6728

Merged
