I am using the WeaviateEmbeddingRetriever to query my data.
It works fine with the default class in Weaviate.
Once I switch to a class I created myself with a customized schema, I get the following error:
  File "/root/TS_ph3/00_WeaviateEmbeddingRetriever.py", line 70, in <module>
    result = query_pipeline.run({"text_embedder": {"text": query}})
  File "/root/.cache/pypoetry/virtualenvs/search-infra-7HLB3Aeo-py3.10/lib/python3.10/site-packages/haystack/core/pipeline/pipeline.py", line 229, in run
    res: Dict[str, Any] = self._run_component(name, components_inputs[name])
  File "/root/.cache/pypoetry/virtualenvs/search-infra-7HLB3Aeo-py3.10/lib/python3.10/site-packages/haystack/core/pipeline/pipeline.py", line 67, in _run_component
    res: Dict[str, Any] = instance.run(**inputs)
  File "/root/.cache/pypoetry/virtualenvs/search-infra-7HLB3Aeo-py3.10/lib/python3.10/site-packages/haystack_integrations/components/retrievers/weaviate/embedding_retriever.py", line 138, in run
    documents = self._document_store._embedding_retrieval(
  File "/root/.cache/pypoetry/virtualenvs/search-infra-7HLB3Aeo-py3.10/lib/python3.10/site-packages/haystack_integrations/document_stores/weaviate/document_store.py", line 538, in _embedding_retrieval
    return [self._to_document(doc) for doc in result.objects]
  File "/root/.cache/pypoetry/virtualenvs/search-infra-7HLB3Aeo-py3.10/lib/python3.10/site-packages/haystack_integrations/document_stores/weaviate/document_store.py", line 538, in <listcomp>
    return [self._to_document(doc) for doc in result.objects]
  File "/root/.cache/pypoetry/virtualenvs/search-infra-7HLB3Aeo-py3.10/lib/python3.10/site-packages/haystack_integrations/document_stores/weaviate/document_store.py", line 306, in _to_document
    document_data["id"] = document_data.pop("_original_id")
KeyError: '_original_id'
I checked the code and found that _to_document pops _original_id from the object's properties and uses it as the Document ID.
As a workaround, I edited document_store.py and hard-coded document_data["id"].
With that change, I get the expected results.
I do not think this is the correct way to handle it, and I am unsure whether _original_id is required in the Weaviate schema.
I asked others and was told that the document ID was removed in 2.x and is not required.
I am not sure which approach is correct.
Could someone kindly help with this?
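One option that avoids patching the library might be to store an _original_id property on every object in the custom class, since the traceback shows _to_document reading that key from each object's properties. Below is a minimal sketch, assuming the custom schema declares a text property named _original_id and that each source record carries some unique key. The field name doc_id and the helper add_original_id are hypothetical and not part of weaviate-haystack.

```python
from typing import Any, Dict, Iterable, List


def add_original_id(
    records: Iterable[Dict[str, Any]], id_key: str = "doc_id"
) -> List[Dict[str, Any]]:
    """Return copies of the records with `id_key` moved to `_original_id`.

    Hypothetical helper: `id_key` is an assumed name for whatever unique
    field the custom data already has.
    """
    prepared = []
    for record in records:
        props = dict(record)  # copy so the caller's data is not mutated
        props["_original_id"] = str(props.pop(id_key))
        prepared.append(props)
    return prepared


records = [
    {"doc_id": 1, "content": "hello"},
    {"doc_id": 2, "content": "world"},
]
prepared = add_original_id(records)
```

The prepared dictionaries could then be used as the properties of the objects inserted into the custom class, so that the retriever finds _original_id when it converts results.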
The packages I am using are:
haystack-ai = "2.6.1"
fastembed-haystack = "1.3.0"
weaviate-client = "^4.9.0"
weaviate-haystack = "^4.0.0"
def _to_document(self, data: DataObject[Dict[str, Any], None]) -> Document:
    """
    Converts a data object read from Weaviate into a Document.
    """
    document_data = data.properties
    # The error is raised here, so I just hard-coded document_data["id"].
    document_data["id"] = document_data.pop("_original_id")
    if isinstance(data.vector, List):
        document_data["embedding"] = data.vector
    elif isinstance(data.vector, Dict):
        document_data["embedding"] = data.vector.get("default")
    else:
        document_data["embedding"] = None

    if (blob_data := document_data.get("blob_data")) is not None:
        document_data["blob"] = {
            "data": base64.b64decode(blob_data),
            "mime_type": document_data.get("blob_mime_type"),
        }

    # We always delete these fields as they're not part of the Document dataclass
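Instead of hard-coding the ID, a less brittle local workaround could fall back to the Weaviate object's own UUID when _original_id is missing. The following is only a sketch of that idea in isolation, not the library's implementation. The function resolve_document_id and its parameters are hypothetical, and it works on a plain dictionary rather than a real Weaviate data object.

```python
from typing import Any, Dict, Optional


def resolve_document_id(
    properties: Dict[str, Any], object_uuid: Optional[str] = None
) -> Dict[str, Any]:
    """Return a copy of `properties` with an `id` key set.

    Prefers the stored `_original_id`; otherwise falls back to the
    object's UUID (assumed to be passed in by the caller).
    """
    data = dict(properties)  # do not mutate the caller's dictionary
    original_id = data.pop("_original_id", None)
    if original_id is not None:
        data["id"] = original_id
    elif object_uuid is not None:
        data["id"] = str(object_uuid)
    else:
        raise ValueError("object has neither '_original_id' nor a UUID")
    return data


with_id = resolve_document_id({"_original_id": "abc", "content": "x"})
without_id = resolve_document_id({"content": "y"}, object_uuid="1234-uuid")
```

Using pop with a default avoids the KeyError shown in the traceback, while objects that do carry _original_id keep the same ID they had before.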