Error when doing similaritysearch with libsql and langchain #7553

binaryArrow · 2025-01-18T15:15:37Z

Discussed in #7531

Originally posted by binaryArrow January 15, 2025

Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

import {createClient} from "@libsql/client";
import path from "node:path";
import {LlamaCppEmbeddings} from "@langchain/community/embeddings/llama_cpp";
import {join} from "path";
import {PDFLoader} from "@langchain/community/document_loaders/fs/pdf";
import {RecursiveCharacterTextSplitter} from "@langchain/textsplitters";
import {Document} from '@langchain/core/documents';
import {fileURLToPath} from "node:url";
import {LibSQLVectorStore} from "@langchain/community/vectorstores/libsql";

// we need this function to pad the embeddings to the correct size; somehow the vector store's addDocuments function does not do this
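function padOrTruncateEmbedding(embedding: number[], targetSize: number): number[] {
    if (embedding.length < targetSize) {
        // pad with zeros up to targetSize
        return [...embedding, ...new Array(targetSize - embedding.length).fill(0)];
    }
    // truncate if longer than targetSize (returns a same-length copy when the sizes already match)
    return embedding.slice(0, targetSize);
}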
// create DB and indexes, the vector size for llamacpp is 4096
const client = createClient({
    url: 'file:local.db'
});
await client.batch(
    [
        `CREATE TABLE IF NOT EXISTS vectors
         (
             id        INTEGER PRIMARY KEY AUTOINCREMENT,
             content   TEXT,
             metadata  TEXT,
             embedding F32_BLOB(4096)
         );`,
        `CREATE INDEX IF NOT EXISTS vector_idx ON vectors (libsql_vector_idx(embedding));`,
    ],
    'write'
);

// create embeddingmodel
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const llamaEmbeddings = await LlamaCppEmbeddings.initialize({
    modelPath: path.join(join(__dirname, '/llm-model'), 'Mistral-7B-Instruct-v0.3.Q4_K_M.gguf')
});

// get pdfloader and split the pdf into chunks
const loader = new PDFLoader('testpdf');
const docs = await loader.load();
const textSplitter = new RecursiveCharacterTextSplitter({
    chunkSize: 2000,
    chunkOverlap: 500
});
const splits: Document[] = await textSplitter.splitDocuments(docs);
console.log('Splits:', splits.length);

// create the vectorstore and embed the document
const testVectorStore = new LibSQLVectorStore(llamaEmbeddings, {
    db: client,
});
const embeddings = await llamaEmbeddings.embedDocuments(splits.map((doc) => doc.pageContent));
embeddings.forEach((embedding, i) => {
    embeddings[i] = padOrTruncateEmbedding(embedding, 4096);
});
await testVectorStore.addVectors(embeddings, splits);
const question = 'give me all words in the documents that relate to the word drugs'
console.log('Retrieving documents for question:', question);
const retrievedDocs = await testVectorStore.similaritySearch(question);
console.log(retrievedDocs)

Description

I am trying to use the LibSQL vector store to embed PDF content and run RAG over the documents, but even the similarity search fails. I get the following error: Error: vector index(search): failed to parse vector index parameters.
To run the code you need the LLM model; I opened a repo that reproduces the error: https://github.com/binaryArrow/libsql-index-error-reproduction.
The code creates the database and the embeddings, but searching them fails.
The resources I used are:
https://js.langchain.com/docs/integrations/vectorstores/libsql/
https://docs.turso.tech/features/ai-and-embeddings#vector-index

System Info

"@langchain/community": "^0.3.24",
"@langchain/core": "^0.3.29",
"@libsql/client": "^0.14.0",
"node-llama-cpp": "^3.4.0",
"pdf-parse": "^1.1.1"

Node version 20.18.0
System: Windows 10

dosubot · 2025-01-18T15:20:19Z

A similar discussion regarding the error "vector index(search): failed to parse vector index parameters" when using LibSQL with LangChain can be found here ^[1].

To resolve this error, ensure the following:

Table and Column Setup: Verify that the table and embedding column are correctly defined. The embedding column should match the dimensions of your vector model. For example, if using LlamaCpp embeddings with a vector size of 4096, the column should be defined as F32_BLOB(4096).
```
CREATE TABLE IF NOT EXISTS vectors (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    content TEXT,
    metadata TEXT,
    embedding F32_BLOB(4096)
);
```
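A quick way to check this point before inserting anything is to compare each embedding's length with the size declared on the column. The following is only a minimal sketch: findDimensionMismatches is a hypothetical helper, not part of LangChain or libSQL, and it works on plain number arrays such as the ones embedDocuments returns in the example code.

```typescript
// Hypothetical helper (not a LangChain or libSQL API).
// Returns the positions of all embeddings whose length differs from the
// dimension declared on the column, e.g. 4096 for F32_BLOB(4096).
function findDimensionMismatches(embeddings: number[][], expectedDim: number): number[] {
    const mismatches: number[] = [];
    embeddings.forEach((embedding, i) => {
        if (embedding.length !== expectedDim) {
            mismatches.push(i);
        }
    });
    return mismatches;
}

// Toy example with 3-dimensional vectors; the second vector is too short,
// so its position (1) is reported.
const sampleEmbeddings = [[0.1, 0.2, 0.3], [0.4, 0.5]];
console.log(findDimensionMismatches(sampleEmbeddings, 3));
```

An empty result means every vector already matches the column size, in which case padding is not needed.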
Index Creation: Ensure the index on the embedding column is created correctly. The index should use the libsql_vector_idx function.
```
CREATE INDEX IF NOT EXISTS vector_idx ON vectors (libsql_vector_idx(embedding));
```
Configuration: Check that your vector store configuration matches the expected parameters, including the vector size and index usage.
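One way to keep the table, index, and configuration consistent is to derive both SQL statements from a single set of values. The sketch below is an illustration under assumptions, not a confirmed fix: buildVectorDdl and VectorTableSpec are hypothetical names, and the idx_<table>_<column> index name follows the pattern that the setup example on the LangChain LibSQL integration page linked in the question appears to use. Whether the store depends on that exact name is not confirmed in this thread.

```typescript
// Hypothetical helper types and function (not a LangChain or libSQL API).
interface VectorTableSpec {
    table: string;      // table name, e.g. "vectors"
    column: string;     // embedding column name, e.g. "embedding"
    dimensions: number; // should equal the embedding model's output size
}

function buildVectorDdl(spec: VectorTableSpec): string[] {
    const { table, column, dimensions } = spec;
    // Only allow simple identifiers, since the values are interpolated into SQL.
    const identifier = /^[A-Za-z_][A-Za-z0-9_]*$/;
    if (!identifier.test(table) || !identifier.test(column)) {
        throw new Error("table and column must be simple SQL identifiers");
    }
    if (!Number.isInteger(dimensions) || dimensions <= 0) {
        throw new Error("dimensions must be a positive integer");
    }
    // Assumption: index name follows the idx_<table>_<column> pattern.
    const indexName = `idx_${table}_${column}`;
    return [
        `CREATE TABLE IF NOT EXISTS ${table} (id INTEGER PRIMARY KEY AUTOINCREMENT, content TEXT, metadata TEXT, ${column} F32_BLOB(${dimensions}));`,
        `CREATE INDEX IF NOT EXISTS ${indexName} ON ${table} (libsql_vector_idx(${column}));`,
    ];
}

// Values taken from the example code in the question.
const statements = buildVectorDdl({ table: "vectors", column: "embedding", dimensions: 4096 });
statements.forEach((sql) => console.log(sql));
```

The resulting strings could be passed to client.batch(statements, 'write') in place of the hand-written statements from the question, so that the column size and the index definition come from the same source.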

By following these steps, you should be able to resolve the parsing error related to vector index parameters ^[2].


dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Jan 18, 2025
