Vector dimensions mismatch in OpenSearch vector store using BedrockEmbeddings #7530

Open
5 tasks done
dixitt5 opened this issue Jan 15, 2025 · 2 comments
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

dixitt5 commented Jan 15, 2025

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

        const embeddings = new BedrockEmbeddings({
            region: process.env.AWS_REGION,
            credentials: {
                accessKeyId: process.env.AWS_ACCESS_KEY_ID,
                secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY
            },
            model: 'amazon.titan-embed-text-v2:0'
        });

        const vectorStore = new OpenSearchVectorStore(embeddings, {
            client,
            indexName: 'bedrock-knowledge-base-default-index',
            vectorFieldName: 'bedrock-knowledge-base-default-vector',
            service: 'aoss',
            vectorSearchOptions: {
                engine: 'hnsw'
            }
        });

        const doc = await vectorStore.similaritySearch('what do you know about poc?', 1);

Error Message and Stack Trace (if applicable)

error ResponseError: search_phase_execution_exception: [query_shard_exception] Reason: failed to create query: Query vector has invalid dimension: 8192. Dimension should be: 1024
    at onBody (/home/usr/Projects/backend/conversation/node_modules/@opensearch-project/opensearch/lib/Transport.js:426:23)
    at IncomingMessage.onEnd (/home/usr/Projects/backend/conversation/node_modules/@opensearch-project/opensearch/lib/Transport.js:341:11)
    at IncomingMessage.emit (node:events:531:35)
    at IncomingMessage.emit (node:domain:488:12)
    at endReadableNT (node:internal/streams/readable:1696:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)

Description

  • I'm using a vector store that I created in AWS OpenSearch Serverless. It has one index with the following configuration:
- Engine: faiss
- Precision: Binary
- Dimensions: 1024
- Distance Type: hamming
- M: 16
- ef_construction: 16
- ef_search: 512
  • I want to use the vector store instance to add vectors and to run similarity searches that return the relevant documents.

  • Adding vectors raised no errors, but the dashboard shows that no vectors were added. Running a similarity search throws the error above: the query embedding is reported as 8192 dimensions, even though Bedrock's Titan model can produce embeddings of at most 1024 dimensions.

  • To debug, I created embeddings using BedrockEmbeddings alone and verified that the resulting embedding has exactly 1024 dimensions.
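
The standalone check described in the last bullet could look roughly like the sketch below. This is not the code used in the report: `EmbeddingsLike`, `dimensionMismatch`, `checkQueryDimension`, and `EXPECTED_DIMENSION` are hypothetical names, and the only API assumed is the `embedQuery(text)` method that LangChain.js embeddings classes expose.

```typescript
// Hypothetical minimal shape of an embeddings client; LangChain.js embeddings
// classes such as BedrockEmbeddings expose an embedQuery method like this.
interface EmbeddingsLike {
  embedQuery(text: string): Promise<number[]>;
}

// Dimension configured on the index in the report (assumed constant here).
const EXPECTED_DIMENSION = 1024;

// Returns a human-readable mismatch message, or null when the lengths agree.
function dimensionMismatch(vector: number[], expected: number): string | null {
  return vector.length === expected
    ? null
    : `vector has ${vector.length} dimensions, index expects ${expected}`;
}

// Embeds one query and throws if its length differs from the index dimension.
async function checkQueryDimension(
  embeddings: EmbeddingsLike,
  text: string,
  expected: number = EXPECTED_DIMENSION
): Promise<number[]> {
  const vector = await embeddings.embedQuery(text);
  const problem = dimensionMismatch(vector, expected);
  if (problem !== null) {
    throw new Error(problem);
  }
  return vector;
}
```

Calling `checkQueryDimension(embeddings, 'what do you know about poc?')` with the `embeddings` instance from the example code would be expected to resolve without throwing if the model returns 1024 values, which is what the report says was observed.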

System Info

Dependencies:
@langchain/openai: >=0.1.0 <0.4.0
js-yaml: ^4.1.0
openapi-types: ^12.1.3
yaml: ^2.2.1
@langchain/textsplitters: >=0.0.0 <0.2.0
jsonpointer: ^5.0.1
p-retry: 4
zod-to-json-schema: ^3.22.3
js-tiktoken: ^1.0.12
langsmith: ^0.2.8
uuid: ^10.0.0
zod: ^3.22.4

dosubot bot commented Jan 15, 2025

Hey @dixitt5! I'm here to help you with any bugs, questions, or contributions. Let's get this sorted out together!

The error you're encountering indicates a mismatch between the expected vector dimensions in your OpenSearch index and the dimensions of the vectors being queried. Specifically, your OpenSearch index is configured to expect vectors with 1024 dimensions, but the query vector is being generated with 8192 dimensions.
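
The two numbers in the error differ by exactly a factor of eight (8192 = 1024 × 8), which may be a clue. One possible explanation, offered as an unconfirmed hypothesis and not a diagnosis from this thread: for an index with binary precision, the OpenSearch k-NN documentation describes the dimension as a bit count and expects each vector as signed bytes that pack 8 bits apiece, so a 1024-element array could be counted as 8192 bits. The sketch below only illustrates that arithmetic. `bitDimension` and `packBits` are hypothetical helpers, and the sign-based binarization rule is an assumption for illustration, not something LangChain.js or Bedrock is known to do.

```typescript
// Number of bits represented by a byte-packed binary vector.
function bitDimension(packed: number[]): number {
  return packed.length * 8;
}

// Packs a float vector into signed bytes, one bit per input value
// (bit = 1 when the value is positive). The binarization rule is assumed.
function packBits(vector: number[]): number[] {
  if (vector.length % 8 !== 0) {
    throw new Error("vector length must be a multiple of 8");
  }
  const packed: number[] = [];
  for (let i = 0; i < vector.length; i += 8) {
    let byte = 0;
    for (let bit = 0; bit < 8; bit++) {
      // Most significant bit first.
      byte = (byte << 1) | ((vector[i + bit] ?? 0) > 0 ? 1 : 0);
    }
    // Convert 0..255 to the signed int8 range -128..127.
    packed.push(byte > 127 ? byte - 256 : byte);
  }
  return packed;
}
```

Under this reading, a 1024-float vector sent unchanged yields a `bitDimension` of 8192, the figure in the error, while `packBits` reduces it to 128 bytes, which is 1024 bits.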

Here are a few steps to help resolve this issue:

  1. Verify Embedding Dimensions: Ensure that the BedrockEmbeddings model you are using (amazon.titan-embed-text-v2:0) is indeed configured to produce embeddings with 1024 dimensions. You mentioned that you verified this, so it seems correct.

  2. Check VectorStore Configuration: Double-check the configuration of your OpenSearchVectorStore to ensure that it is set up to handle vectors with 1024 dimensions. The dimension should be specified correctly when creating the index in OpenSearch.

  3. Debugging the Embedding Process: Since you verified that the embeddings are 1024 dimensions when created directly, there might be an issue in how the embeddings are being passed to the OpenSearchVectorStore. Ensure that the embeddings are not being altered or transformed in a way that changes their dimensions before being added to the vector store.

  4. Review the Code for Dimension Handling: Look into the code where the embeddings are being added to the vector store and queried. Ensure that there are no transformations or operations that might inadvertently change the dimensions of the vectors.

  5. Check for Updates or Known Issues: Ensure that you are using the latest version of LangChain.js and check if there are any known issues related to dimension handling with the OpenSearchVectorStore and BedrockEmbeddings.

If the issue persists, you might want to add logging or debugging statements to trace the dimensions of the vectors at different stages in your code to pinpoint where the discrepancy occurs.
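
The tracing suggested above could be sketched as a thin wrapper that logs the length of every vector an embeddings client returns. This is a minimal sketch under assumptions: `TraceableEmbeddings`, `distinctDimensions`, and `withDimensionLogging` are hypothetical names, and whether the wrapped object is accepted everywhere a LangChain.js embeddings instance is expected has not been verified here.

```typescript
// Hypothetical minimal shape shared by LangChain.js embeddings classes.
interface TraceableEmbeddings {
  embedQuery(text: string): Promise<number[]>;
  embedDocuments(texts: string[]): Promise<number[][]>;
}

// Distinct vector lengths seen in a batch, sorted ascending.
function distinctDimensions(vectors: number[][]): number[] {
  return Array.from(new Set(vectors.map((v) => v.length))).sort((a, b) => a - b);
}

// Wraps an embeddings client so the length of every returned vector is logged.
function withDimensionLogging(
  inner: TraceableEmbeddings,
  log: (message: string) => void = console.log
): TraceableEmbeddings {
  return {
    async embedQuery(text: string): Promise<number[]> {
      const vector = await inner.embedQuery(text);
      log(`embedQuery -> ${vector.length} dimensions`);
      return vector;
    },
    async embedDocuments(texts: string[]): Promise<number[][]> {
      const vectors = await inner.embedDocuments(texts);
      log(`embedDocuments -> dimensions: ${distinctDimensions(vectors).join(", ")}`);
      return vectors;
    },
  };
}
```

If the logged lengths are all 1024 while OpenSearch still reports 8192, the discrepancy would have to arise after embedding, for example in how the query body is built or how the index interprets it.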


dixitt5 commented Jan 15, 2025

@jacoblee93 @bracesproul could you please look into this? That would be really helpful!

@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Jan 15, 2025