Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KeyError: 'Field "metadata" does not exist in schema' #27005

Open
5 tasks done
Jeferson100 opened this issue Sep 30, 2024 · 4 comments
Open
5 tasks done

KeyError: 'Field "metadata" does not exist in schema' #27005

Jeferson100 opened this issue Sep 30, 2024 · 4 comments
Labels
investigate Flagged for investigation. Ɑ: vector store Related to vector store module

Comments

@Jeferson100
Copy link

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

The DataFrame is in the following format.
data_frame

import lancedb

uri = "data/sample-lancedb"
db = lancedb.connect(uri,)
try:
    table = db.create_table("movies",data=md_final)
except:
    table = db.open_table("movies")    
from langchain.vectorstores import LanceDB
from langchain.chains import RetrievalQA
import os
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

docsearch = LanceDB(connection = db, 
                    embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"), 
                    table_name="movies",
                    vector_key="vector",
                    id_key='title',
                    text_key='text',
                    uri=uri,
          
                    )

The error occurs in the following code.

query = "I'm looking for an animated action movie. What could you suggest to me?"
docs = docsearch.similarity_search(query) 

Error Message and Stack Trace (if applicable)

File c:\Users\jefer\Documents\Livros\LLMs\BUILDING_LLM_POWERED_APPLICATIONS\Building-LLM-Powered-Applications.conda\Lib\site-packages\langchain_community\vectorstores\lancedb.py:524, in LanceDB.similarity_search(self, query, k, name, filter, fts, **kwargs)
500 def similarity_search(
501 self,
502 query: str,
(...)
507 **kwargs: Any,
508 ) -> List[Document]:
509 """Return documents most similar to the query
510
511 Args:
(...)
522 List of documents most similar to the query.
523 """
--> 524 res = self.similarity_search_with_score(
525 query=query, k=k, name=name, filter=filter, fts=fts, score=False, **kwargs
526 )
527 return res

File c:\Users\jefer\Documents\Livros\LLMs\BUILDING_LLM_POWERED_APPLICATIONS\Building-LLM-Powered-Applications.conda\Lib\site-packages\langchain_community\vectorstores\lancedb.py:498, in LanceDB.similarity_search_with_score(self, query, k, filter, **kwargs)
496 embedding = self._embedding.embed_query(query)
497 res = self._query(embedding, k, filter=filter, **kwargs)
--> 498 return self.results_to_docs(res, score=score)

File c:\Users\jefer\Documents\Livros\LLMs\BUILDING_LLM_POWERED_APPLICATIONS\Building-LLM-Powered-Applications.conda\Lib\site-packages\langchain_community\vectorstores\lancedb.py:155, in LanceDB.results_to_docs(self, results, score)
152 score_col = None
154 if score_col is None or not score:
--> 155 return [
156 Document(
157 page_content=results[self._text_key][idx].as_py(),
158 metadata=results["metadata"][idx].as_py(),
159 )
160 for idx in range(len(results))
161 ]
162 elif score_col and score:
163 return [
164 (
165 Document(
(...)
171 for idx in range(len(results))
172 ]

File c:\Users\jefer\Documents\Livros\LLMs\BUILDING_LLM_POWERED_APPLICATIONS\Building-LLM-Powered-Applications.conda\Lib\site-packages\langchain_community\vectorstores\lancedb.py:158, in (.0)
152 score_col = None
154 if score_col is None or not score:
155 return [
156 Document(
157 page_content=results[self._text_key][idx].as_py(),
--> 158 metadata=results["metadata"][idx].as_py(),
159 )
160 for idx in range(len(results))
161 ]
162 elif score_col and score:
163 return [
164 (
165 Document(
(...)
171 for idx in range(len(results))
172 ]

File c:\Users\jefer\Documents\Livros\LLMs\BUILDING_LLM_POWERED_APPLICATIONS\Building-LLM-Powered-Applications.conda\Lib\site-packages\pyarrow\table.pxi:1539, in pyarrow.lib._Tabular.getitem()

File c:\Users\jefer\Documents\Livros\LLMs\BUILDING_LLM_POWERED_APPLICATIONS\Building-LLM-Powered-Applications.conda\Lib\site-packages\pyarrow\table.pxi:1625, in pyarrow.lib._Tabular.column()

File c:\Users\jefer\Documents\Livros\LLMs\BUILDING_LLM_POWERED_APPLICATIONS\Building-LLM-Powered-Applications.conda\Lib\site-packages\pyarrow\table.pxi:1561, in pyarrow.lib._Tabular._ensure_integer_index()

KeyError: 'Field "metadata" does not exist in schema'

Description

I connected to the LanceDB database and tried to create a table called "movies" using the data from the md_final DataFrame. If the table already existed, I opened it. Then, I initialized an instance of LanceDB with the "sentence-transformers/all-MiniLM-L6-v2" embeddings model, configuring the vector key, ID key, and text key options. I defined a text query and called the similarity_search method to obtain the relevant documents.

System Info

System Information
------------------
> OS:  Windows
> OS Version:  10.0.22631
> Python Version:  3.11.9 | packaged by Anaconda, Inc. | (main, Apr 19 2024, 16:40:41) [MSC v.1916 64 bit (AMD64)]

Package Information
-------------------
> langchain_core: 0.3.6
> langchain: 0.3.1
> langchain_community: 0.2.17
> langsmith: 0.1.129
> langchain_chroma: 0.1.4
> langchain_groq: 0.2.0
> langchain_huggingface: 0.1.0
> langchain_mistralai: 0.2.0
> langchain_openai: 0.2.1
> langchain_text_splitters: 0.3.0

Optional packages not installed
-------------------------------
> langgraph
> langserve

Other Dependencies
------------------
> aiohttp: 3.10.5
> async-timeout: Installed. No version info available.
> chromadb: 0.5.3
> dataclasses-json: 0.6.7
> fastapi: 0.115.0
> groq: 0.11.0
> httpx: 0.27.2
> httpx-sse: 0.4.0
> huggingface-hub: 0.24.7
> jsonpatch: 1.33
> numpy: 1.26.4
> openai: 0.28.0
> orjson: 3.10.7
> packaging: 23.2
> pydantic: 2.9.1
> PyYAML: 6.0.2
> requests: 2.32.3
> sentence-transformers: 3.1.1
> SQLAlchemy: 2.0.35
> tenacity: 8.5.0
> tiktoken: 0.7.0
> tokenizers: 0.19.1
> transformers: 4.44.2
> typing-extensions: 4.12.2
@langcarl langcarl bot added the investigate Flagged for investigation. label Sep 30, 2024
@dosubot dosubot bot added the Ɑ: vector store Related to vector store module label Sep 30, 2024
@biniyam69
Copy link

Hey can you give me a link to the dataframe that you are working with? I am trying to replicate your code to try and debug the issue

@Jeferson100
Copy link
Author

Codigo
Dados

@Jeferson100
Copy link
Author

I think the problem might be in _query

 def _query(
        self,
        query: Any,
        k: Optional[int] = None,
        filter: Optional[Any] = None,
        name: Optional[str] = None,
        **kwargs: Any,
    ) -> Any:
        if k is None:
            k = self.limit
        tbl = self.get_table(name)
        if isinstance(filter, dict):
            filter = to_lance_filter(filter)

        prefilter = kwargs.get("prefilter", False)
        query_type = kwargs.get("query_type", "vector")

        if metrics := kwargs.get("metrics"):
            lance_query = (
                tbl.search(query=query, vector_column_name=self._vector_key)
                .limit(k)
                .metric(metrics)
                .where(filter, prefilter=prefilter)
            )
        else:
            lance_query = (
                tbl.search(query=query, vector_column_name=self._vector_key)
                .limit(k)
                .where(filter, prefilter=prefilter)
            )
        if query_type == "hybrid" and self._reranker is not None:
            lance_query.rerank(reranker=self._reranker)

        docs = lance_query.to_arrow()
        if len(docs) == 0:
            warnings.warn("No results found for the query.")
        return docs
        

The error that occurs when using the function

tbl = docsearch.get_table(name="movies")
docsearch._query(query="I'm looking for an animated action movie. What could you suggest to me?", name="movies")
ValueError                                Traceback (most recent call last)
Cell In[32], [line 2](vscode-notebook-cell:?execution_count=32&line=2)
      [1](vscode-notebook-cell:?execution_count=32&line=1) tbl = docsearch.get_table(name="movies")
----> [2](vscode-notebook-cell:?execution_count=32&line=2) docsearch._query(query="I'm looking for an animated action movie. What could you suggest to me?", name="movies")

File c:\Users\jefer\Documents\Livros\LLMs\BUILDING_LLM_POWERED_APPLICATIONS\Building-LLM-Powered-Applications\.conda\Lib\site-packages\langchain_community\vectorstores\lancedb.py:391, in LanceDB._query(self, query, k, filter, name, **kwargs)
    [388](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/langchain_community/vectorstores/lancedb.py:388) if query_type == "hybrid" and self._reranker is not None:
    [389](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/langchain_community/vectorstores/lancedb.py:389)     lance_query.rerank(reranker=self._reranker)
--> [391](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/langchain_community/vectorstores/lancedb.py:391) docs = lance_query.to_arrow()
    [392](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/langchain_community/vectorstores/lancedb.py:392) if len(docs) == 0:
    [393](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/langchain_community/vectorstores/lancedb.py:393)     warnings.warn("No results found for the query.")

File c:\Users\jefer\Documents\Livros\LLMs\BUILDING_LLM_POWERED_APPLICATIONS\Building-LLM-Powered-Applications\.conda\Lib\site-packages\lancedb\query.py:810, in LanceFtsQueryBuilder.to_arrow(self)
    [793](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/query.py:793)     raise NotImplementedError(
    [794](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/query.py:794)         "Phrase query is not yet supported in Lance FTS. "
    [795](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/query.py:795)         "Use tantivy-based index instead for now."
    [796](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/query.py:796)     )
    [797](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/query.py:797) query = Query(
    [798](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/query.py:798)     columns=self._columns,
    [799](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/query.py:799)     filter=self._where,
   (...)
    [808](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/query.py:808)     offset=self._offset,
    [809](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/query.py:809) )
--> [810](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/query.py:810) results = self._table._execute_query(query)
    [811](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/query.py:811) results = results.read_all()
    [812](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/query.py:812) if self._reranker is not None:

File c:\Users\jefer\Documents\Livros\LLMs\BUILDING_LLM_POWERED_APPLICATIONS\Building-LLM-Powered-Applications\.conda\Lib\site-packages\lancedb\table.py:1752, in LanceTable._execute_query(self, query, batch_size)
   [1733](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1733) if len(query.vector) > 0:
   [1734](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1734)     nearest = {
   [1735](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1735)         "column": query.vector_column,
   [1736](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1736)         "q": query.vector,
   (...)
   [1740](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1740)         "refine_factor": query.refine_factor,
   [1741](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1741)     }
   [1742](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1742) return ds.scanner(
   [1743](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1743)     columns=query.columns,
   [1744](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1744)     limit=query.k,
   [1745](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1745)     filter=query.filter,
   [1746](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1746)     prefilter=query.prefilter,
   [1747](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1747)     nearest=nearest,
   [1748](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1748)     full_text_query=query.full_text_query,
   [1749](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1749)     with_row_id=query.with_row_id,
   [1750](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1750)     batch_size=batch_size,
   [1751](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1751)     offset=query.offset,
-> [1752](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lancedb/table.py:1752) ).to_reader()

File c:\Users\jefer\Documents\Livros\LLMs\BUILDING_LLM_POWERED_APPLICATIONS\Building-LLM-Powered-Applications\.conda\Lib\site-packages\lance\dataset.py:2540, in LanceScanner.to_reader(self)
   [2539](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lance/dataset.py:2539) def to_reader(self) -> pa.RecordBatchReader:
-> [2540](file:///C:/Users/jefer/Documents/Livros/LLMs/BUILDING_LLM_POWERED_APPLICATIONS/Building-LLM-Powered-Applications/.conda/Lib/site-packages/lance/dataset.py:2540)     return self._scanner.to_pyarrow()

ValueError: LanceError(IO): Full text search supports only one column right now, but got 0 columns, D:\a\lance\lance\rust\lance\src\dataset\scanner.rs:1147:17

@mori-dev
Copy link

In my case, I solved the problem with the following workaround on the app side.

from pydantic import BaseModel
from lancedb.pydantic import LanceModel, Vector
from datetime import datetime

class Metadata(BaseModel):
    key1: Optional[str] = ""
    key2: Optional[str] = ""

class LanceSchema(LanceModel):
    id: str
    vector: Vector(1536)
    source: str
    timestamp: datetime
    metadata: Metadata = Metadata()


local_db.create_table(table_name, filteredData, schema=LanceSchema, mode="overwrite")

ccurme added a commit that referenced this issue Dec 18, 2024
- [ x ] Fix when lancedb return table without metadata column
- **Description:** Check the table schema, if not has metadata column,
init the Document with metadata argument equal to empty dict
    - **Issue:** #27005

- [ x ] **Add tests and docs**

---------

Co-authored-by: ccurme <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
investigate Flagged for investigation. Ɑ: vector store Related to vector store module
Projects
None yet
Development

No branches or pull requests

3 participants