LlamaCppEmbeddings does not work #3626
🤖 Hello @lanluu! 🖐️ I'm Dosu, a friendly bot here to help you out while we wait for a human maintainer. I specialize in solving bugs, answering questions, and even guiding you on your journey to becoming a contributor. Remember, I'm here to assist you. Let's work together to sort this out! 😊

The vector length appears to be randomized because the `encode` function tokenizes the input text, and the number of tokens depends on the text itself. In the `LlamaCppEmbeddings` class, each text is converted into token IDs using the model context's `encode` function. Here is the relevant code:

```ts
const encodings = await this.caller.call(
  () =>
    new Promise((resolve) => {
      resolve(this._context.encode(text));
    })
);

for (let i = 0; i < encodings.length; i += 1) {
  const token: number = +encodings[i];
  tokens.push(token);
}
```

In this code, `this._context.encode(text)` returns one token ID per token in the input, so the length of the resulting array varies with the input text. If you need more control over the length of the resulting vector, you might consider implementing a padding mechanism to ensure a consistent vector length, or truncating the vector to a fixed length. However, please note that both of these approaches may impact the quality of the embeddings.

For more details, you can refer to the source code of the `LlamaCppEmbeddings` class (langchain/src/embeddings/llama_cpp.ts).

I hope this helps! If you have any more questions, feel free to ask. Best regards, Dosu |
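As a rough illustration of the padding/truncation idea the bot mentions, here is a minimal sketch; the `toFixedLength` helper and the target size are hypothetical, not part of LangChain:

```ts
// Hypothetical helper, not LangChain API: force a token array to a fixed
// length by truncating long inputs and zero-padding short ones. Both
// operations can degrade embedding quality, as noted above.
function toFixedLength(tokens: number[], size: number): number[] {
  if (tokens.length >= size) {
    return tokens.slice(0, size); // truncate
  }
  return [...tokens, ...new Array(size - tokens.length).fill(0)]; // pad
}

console.log(toFixedLength([1824, 349, 418], 5)); // [1824, 349, 418, 0, 0]
```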
CC @nigel-daniels any ideas here? |
@lanluu do you have some more context on what you are trying to do and how the failure manifests? |
I too can confirm this issue. LlamaCppEmbeddings is generating random-length encodings, which in turn produces the error below when inserting into any DB:
|
@AnirudhaGohokar the length of the token array will be roughly proportional to the length of the input, so this will always result in a variable-length array. The error you shared looks like it comes from a breaking table constraint in ClickHouse. Can you share the code you are using? I don't see why the resulting embedding array would map to a database constraint. |
Hi, I also can confirm this issue. It happens with Faiss and HNSWLib. The code to reproduce is basically the example from the docs:

```ts
import { FaissStore } from "langchain/vectorstores/faiss";
import { LlamaCppEmbeddings } from "langchain/embeddings/llama_cpp";

const llamaPath = "models/mistral-7b-v0.1.Q5_K_M.gguf";

const embedding = new LlamaCppEmbeddings({
  modelPath: llamaPath,
  embedding: true,
});

const vectorStore = await FaissStore.fromTexts(
  ["Hello world", "Bye bye", "hello nice world"],
  [{ id: 2 }, { id: 1 }, { id: 3 }],
  embedding
);

const resultOne = await vectorStore.similaritySearch("hello world", 1);
```

which gives an error for Faiss, and a similar error when changing to HNSWLib.
|
Ok, got it, so the error is about the dimension size, not the specific array length. |
I think #3509 is related to this. When changing the order of the texts (moving "Bye bye" to the first position), HNSWLib returns a different error.

HNSWLib internally sets `this.args.numDimensions` to the length of the embedding of the first element passed:

```ts
async initIndex(vectors) {
  if (!this._index) {
    if (this.args.numDimensions === undefined) {
      this.args.numDimensions = vectors[0].length;
    }
    // ...
  }
}
```

Therefore this never fails for the first element, but in this case it fails for the second element, with an array length of 2. I think other vector stores work very similarly, for example Faiss:

```ts
const dv = vectors[0].length;
if (!this._index) {
  const { IndexFlatL2 } = await FaissStore.importFaiss();
  this._index = new IndexFlatL2(dv);
}
const d = this.index.getDimension();
if (dv !== d) {
  throw new Error(
    `Vectors must have the same length as the number of dimensions (${d})`
  );
}
```

Therefore I assume that the vectors/embeddings have to be of equal length for this to work? But that is just my wild guess. This article supports my theory that every vector/embedding has to have the same length (MongoDB article).

If that is indeed the case, the issue seems to be in node-llama-cpp, because in the following code example (using node-llama-cpp directly), val1, val2, and val3 are arrays of unequal length, whereas they should have been arrays of equal length, i.e. the dimension of the model:

```ts
import { LlamaModel, LlamaContext } from "node-llama-cpp";

const llamaPath = "models/mistral-7b-v0.1.Q5_K_M.gguf"; // english

const model = new LlamaModel({
  modelPath: llamaPath,
});
const context = new LlamaContext({ model });

const val1 = context.encode("hello nice world");
const val2 = context.encode("Hello world");
const val3 = context.encode("Bye bye");
```
|
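A short sketch of why those lengths differ, assuming (as the thread concludes) that `encode` is a tokenizer rather than an embedder; the logging loop is illustrative only:

```ts
import { LlamaModel, LlamaContext } from "node-llama-cpp";

// Same setup as in the snippet above (node-llama-cpp v2 API).
const model = new LlamaModel({ modelPath: "models/mistral-7b-v0.1.Q5_K_M.gguf" });
const context = new LlamaContext({ model });

// encode() yields one token ID per (sub)word piece, so the array length
// follows the token count of the input, not the model's embedding dimension.
for (const text of ["hello nice world", "Hello world", "Bye bye"]) {
  console.log(text, "->", context.encode(text).length, "token IDs");
}
```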
@awijshoff correct, I am just passing the results from `encode`. |
I haven't added support for embedding in node-llama-cpp yet. The current implementation for this in Langchain is wrong and won't work. I've opened an issue for the embedding feature on node-llama-cpp. |
@giladgd thanks for the update, I'll await the new release. |
@awijshoff Thanks for detailing this. The issue I created is exactly what you've described, and I'm very excited about the update coming down the pipe for it. :) |
this is not a bug. the context's `encode()` call tokenizes text; it does not produce fixed-size embeddings. a proper embedding model always returns vectors of the same dimension: in the case of the Xenova model above, you will always get a vector of dimension 384. then for each of your document chunks being stored in a vector db, you use the same embed call, so the vectors are comparable and the vector db can take it from there. |
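To illustrate that fixed-dimension behaviour, here is a sketch using a 384-dimension Transformers.js model; the specific model name is an assumption, since the thread only says "the Xenova model above":

```ts
import { pipeline } from "@xenova/transformers";

// Assumed model: Xenova/all-MiniLM-L6-v2, a 384-dimension sentence embedder.
const embed = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

for (const text of ["Hello world", "Bye bye", "hello nice world"]) {
  const output = await embed(text, { pooling: "mean", normalize: true });
  console.log(text, "->", output.data.length); // always 384, regardless of input
}
```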
@giladgd Hi, we are so excited to be able to work with node-llama-cpp. May I ask when the new version with embedding support will be released? |
@lanluu I predict it'll take about a month to finish the major features that contain breaking changes for the v3 beta, so the stable release would come a bit after that. |
I've added support for embedding in the beta version of node-llama-cpp. Note that since this is a beta version, the API may change before the stable release. |
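For reference, a minimal sketch of what calling the v3 beta embedding support can look like; this follows the node-llama-cpp v3 API as I understand it, and beta method names may change before the stable release:

```ts
import { getLlama } from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
  modelPath: "models/mistral-7b-v0.1.Q5_K_M.gguf",
});

// A dedicated embedding context, separate from chat/completion contexts.
const embeddingContext = await model.createEmbeddingContext();

const embedding = await embeddingContext.getEmbeddingFor("Hello world");
console.log(embedding.vector.length); // fixed: the model's embedding dimension
```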
@giladgd that's great, thanks. Looking forward to the 3.0.0 release :) |
I used node-llama-cpp (3.0.0-beta.3) with langchain (2.8.8), but still got `TypeError: this._context.encode is not a function`. |
I also used node-llama-cpp (3.0.0-beta.14) earlier, but found a new error in file:///Users/panjunlin/work/github/chat-mind/node_modules/node-llama-cpp/dist/evaluator/LlamaModel.js:22: `TypeError: Cannot destructure property '_llama' of 'undefined' as it is undefined.` |
Ran into this bug as well; the solution was to use |
Currently, we are employing llama.cpp via langchain as documented at https://js.langchain.com/docs/integrations/llms/llama_cpp (version: 0.0.204).

It looks like the embeddings via the LlamaCppEmbeddings class (located at langchain/src/embeddings/llama_cpp.ts) still do not work for us, as the resulting vector's length is quite randomized, for instance:

```ts
[
  1824,  349,   418,
   274, 1402, 28709,
   334, 8236, 28828,
 28804
]
```

It is worth noting that different gguf models, such as llama-2 or zephyr-β, produce the same behaviour.

Your tips would help us a lot here. Thank you in advance!