Error when doing similaritysearch with libsql and langchain #7553

Open
4 tasks done
binaryArrow opened this issue Jan 18, 2025 · 1 comment
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@binaryArrow

Discussed in #7531

Originally posted by binaryArrow January 15, 2025

Checked other resources

  • I added a very descriptive title to this question.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.

Commit to Help

  • I commit to help with one of those options 👆

Example Code

import {createClient} from "@libsql/client";
import path from "node:path";
import {LlamaCppEmbeddings} from "@langchain/community/embeddings/llama_cpp";
import {join} from "path";
import {PDFLoader} from "@langchain/community/document_loaders/fs/pdf";
import {RecursiveCharacterTextSplitter} from "@langchain/textsplitters";
import {Document} from '@langchain/core/documents';
import {fileURLToPath} from "node:url";
import {LibSQLVectorStore} from "@langchain/community/vectorstores/libsql";

// Pad embeddings with zeros (or truncate them) to the target size; the vector store's addDocument function does not seem to do this.
function padOrTruncateEmbedding(embedding: number[], targetSize: number): number[] {
    if (embedding.length < targetSize) {
        return [...embedding, ...new Array(targetSize - embedding.length).fill(0)];
    }
    // Truncate when the embedding is longer than the target, as the function name promises.
    return embedding.slice(0, targetSize);
}
// Create the DB and the index; the vector size for LlamaCpp is 4096.
const client = createClient({
    url: 'file:local.db'
});
await client.batch(
    [
        `CREATE TABLE IF NOT EXISTS vectors
         (
             id        INTEGER PRIMARY KEY AUTOINCREMENT,
             content   TEXT,
             metadata  TEXT,
             embedding F32_BLOB(4096)
         );`,
        `CREATE INDEX IF NOT EXISTS vector_idx ON vectors (libsql_vector_idx(embedding));`,
    ],
    'write'
);

// Create the embedding model.
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const llamaEmbeddings = await LlamaCppEmbeddings.initialize({
    modelPath: path.join(join(__dirname, '/llm-model'), 'Mistral-7B-Instruct-v0.3.Q4_K_M.gguf')
});

// Load the PDF and split it into chunks.
const loader = new PDFLoader('testpdf');
const docs = await loader.load();
const textSplitter = new RecursiveCharacterTextSplitter({
    chunkSize: 2000,
    chunkOverlap: 500
});
const splits: Document[] = await textSplitter.splitDocuments(docs);
console.log('Splits:', splits.length);

// create the vectorstore and embed the document
const testVectorStore = new LibSQLVectorStore(llamaEmbeddings, {
    db: client,
});
const embeddings = await llamaEmbeddings.embedDocuments(splits.map((doc) => doc.pageContent));
embeddings.forEach((embedding, i) => {
    embeddings[i] = padOrTruncateEmbedding(embedding, 4096);
});
await testVectorStore.addVectors(embeddings, splits);
const question = 'give me all words in the documents that relate to the word drugs'
console.log('Retrieving documents for question:', question);
const retrievedDocs = await testVectorStore.similaritySearch(question);
console.log(retrievedDocs)

Description

I am trying to use the LibSQL vector store to embed PDF content and run RAG over the documents. Even the similaritySearch call fails, with the following error: Error: vector index(search): failed to parse vector index parameters.
Running the code requires the LLM model; I opened a repo that reproduces the error: https://github.com/binaryArrow/libsql-index-error-reproduction.
The code creates the database and the embeddings, but querying them fails.
The resources I used are:
https://js.langchain.com/docs/integrations/vectorstores/libsql/
https://docs.turso.tech/features/ai-and-embeddings#vector-index

System Info

"@langchain/community": "^0.3.24",
"@langchain/core": "^0.3.29",
"@libsql/client": "^0.14.0",
"node-llama-cpp": "^3.4.0",
"pdf-parse": "^1.1.1"

Node version 20.18.0
System: Windows 10
@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Jan 18, 2025

dosubot bot commented Jan 18, 2025

A similar discussion regarding the error "vector index(search): failed to parse vector index parameters" when using LibSQL with LangChain can be found here [1].

To resolve this error, ensure the following:

  1. Table and Column Setup: Verify that the table and embedding column are correctly defined. The embedding column should match the dimensions of your vector model. For example, if using LlamaCpp embeddings with a vector size of 4096, the column should be defined as F32_BLOB(4096).

    CREATE TABLE IF NOT EXISTS vectors (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        content TEXT,
        metadata TEXT,
        embedding F32_BLOB(4096)
    );
  2. Index Creation: Ensure the index on the embedding column is created correctly. The index should use the libsql_vector_idx function.

    CREATE INDEX IF NOT EXISTS vector_idx ON vectors (libsql_vector_idx(embedding));
  3. Configuration: Check that your vector store configuration matches the expected parameters, including the vector size and index usage.

By following these steps, you should be able to resolve the parsing error related to vector index parameters [2].
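
The first two checks above can be sketched as a small helper that derives both statements from one specification, so the column size, column name, and index definition cannot drift apart. The names `VectorTableSpec` and `buildVectorDdl` are hypothetical and not part of LangChain or libSQL; the SQL mirrors the statements already shown in this issue.

```typescript
// Hypothetical types and helpers for illustration only.
interface VectorTableSpec {
    table: string;
    column: string;
    dimensions: number;
    indexName: string;
}

const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Builds the CREATE TABLE and CREATE INDEX statements from a single spec.
function buildVectorDdl(spec: VectorTableSpec): string[] {
    const { table, column, dimensions, indexName } = spec;
    for (const name of [table, column, indexName]) {
        if (!IDENTIFIER.test(name)) {
            throw new Error(`Invalid SQL identifier: ${name}`);
        }
    }
    if (!Number.isInteger(dimensions) || dimensions <= 0) {
        throw new Error(`Invalid vector dimensions: ${dimensions}`);
    }
    return [
        `CREATE TABLE IF NOT EXISTS ${table} (id INTEGER PRIMARY KEY AUTOINCREMENT, content TEXT, metadata TEXT, ${column} F32_BLOB(${dimensions}));`,
        `CREATE INDEX IF NOT EXISTS ${indexName} ON ${table} (libsql_vector_idx(${column}));`,
    ];
}

// Values taken from the example code in this issue.
const statements = buildVectorDdl({
    table: 'vectors',
    column: 'embedding',
    dimensions: 4096,
    indexName: 'vector_idx',
});
statements.forEach((statement) => console.log(statement));
```

The resulting array could be passed to `client.batch(statements, 'write')` as in the original example. The index name is an explicit parameter because this thread does not confirm whether the vector store expects a specific name; it is worth comparing it against the LangChain LibSQL integration page linked in the issue.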
