PGVector deprecated create_filter_clause uses incorrect method. Can't filter by OR or AND conditions #20445

Closed
5 tasks done
MartinGotelli opened this issue Apr 14, 2024 · 0 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: postgres Related to postgres integrations Ɑ: vector store Related to vector store module

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import os
os.environ["TEST_PGVECTOR_PORT"] = "5432"
os.environ["TEST_PGVECTOR_DATABASE"] = "langchain_tests"
os.environ["TEST_PGVECTOR_USER"] = "postgres"
os.environ["TEST_PGVECTOR_PASSWORD"] = "postgres"


from typing import List

from langchain_core.embeddings import Embeddings
from langchain_community.vectorstores.pgvector import PGVector


class FakeEmbeddings(Embeddings):
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[float(1.0)] * 9 + [float(i)] for i in range(len(texts))]
    def embed_query(self, text: str) -> List[float]:
        return [float(1.0)] * 9 + [float(0.0)]

CONNECTION_STRING = PGVector.connection_string_from_db_params(
    driver=os.environ.get("TEST_PGVECTOR_DRIVER", "psycopg2"),
    host=os.environ.get("TEST_PGVECTOR_HOST", "localhost"),
    port=int(os.environ.get("TEST_PGVECTOR_PORT", "6024")),
    database=os.environ.get("TEST_PGVECTOR_DATABASE", "langchain"),
    user=os.environ.get("TEST_PGVECTOR_USER", "langchain"),
    password=os.environ.get("TEST_PGVECTOR_PASSWORD", "langchain"),
)

texts = ["foo", "bar", "baz"]
metadatas = [{"page": str(i)} for i in range(len(texts))]
docsearch = PGVector.from_texts(
    texts=texts,
    collection_name="test_collection_filter",
    embedding=FakeEmbeddings(),
    metadatas=metadatas,
    connection_string=CONNECTION_STRING,
    pre_delete_collection=True,
)

output = docsearch.similarity_search_with_score(
    "foo", k=3, filter={"page": {"OR": [{"EQ": "0"}, {"EQ": "2"}]}}
)

# Alternatively:
output = docsearch.similarity_search_with_score(
    "foo", k=3, filter={"page": {"AND": [{"IN": ["0","1"]}, {"NIN": ["1"]}]}}
)

This raises:

TypeError: PGVector._create_filter_clause() takes 2 positional arguments but 3 were given
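
For reference, here is a rough sketch of what the two filters above would be expected to select if they worked. It is a plain in-memory evaluator, not LangChain code. The helper names `matches` and `select` are hypothetical, and it assumes the operators have their usual meanings (EQ is equality, IN and NIN are membership and non-membership, OR and AND combine sub-conditions on the same key).

```python
from typing import Any, Dict, List


def matches(value: str, condition: Dict[str, Any]) -> bool:
    """Evaluate a single-operator condition against one metadata value.

    Assumed semantics only; this does not reproduce PGVector's SQL generation.
    """
    op, operand = next(iter(condition.items()))
    op = op.upper()
    if op == "EQ":
        return value == operand
    if op == "IN":
        return value in operand
    if op == "NIN":
        return value not in operand
    if op == "OR":
        # Recurse into each sub-condition for the same key.
        return any(matches(value, sub) for sub in operand)
    if op == "AND":
        return all(matches(value, sub) for sub in operand)
    raise ValueError(f"unsupported operator: {op}")


# Same data as in the reproduction script.
texts = ["foo", "bar", "baz"]
metadatas = [{"page": str(i)} for i in range(len(texts))]


def select(flt: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the texts whose metadata satisfies every key in the filter."""
    return [
        text
        for text, meta in zip(texts, metadatas)
        if all(matches(meta[key], cond) for key, cond in flt.items())
    ]


or_hits = select({"page": {"OR": [{"EQ": "0"}, {"EQ": "2"}]}})
and_hits = select({"page": {"AND": [{"IN": ["0", "1"]}, {"NIN": ["1"]}]}})
```

Under these assumptions the OR filter would keep "foo" and "baz" (pages 0 and 2), and the AND filter would keep only "foo" (page 0).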

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mferenaz/Projects/personal/langchain/libs/community/langchain_community/vectorstores/pgvector.py", line 601, in similarity_search_with_score
    docs = self.similarity_search_with_score_by_vector(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mferenaz/Projects/personal/langchain/libs/community/langchain_community/vectorstores/pgvector.py", line 626, in similarity_search_with_score_by_vector
    results = self.__query_collection(embedding=embedding, k=k, filter=filter)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mferenaz/Projects/personal/langchain/libs/community/langchain_community/vectorstores/pgvector.py", line 945, in __query_collection
    filter_clauses = self._create_filter_clause_json_deprecated(filter)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mferenaz/Projects/personal/langchain/libs/community/langchain_community/vectorstores/pgvector.py", line 828, in _create_filter_clause_json_deprecated
    filter_by_metadata = self._create_filter_clause_deprecated(key, value)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mferenaz/Projects/personal/langchain/libs/community/langchain_community/vectorstores/pgvector.py", line 797, in _create_filter_clause_deprecated
    or_clauses = [
                 ^
  File "/Users/mferenaz/Projects/personal/langchain/libs/community/langchain_community/vectorstores/pgvector.py", line 798, in <listcomp>
    self._create_filter_clause(key, sub_value)
TypeError: PGVector._create_filter_clause() takes 2 positional arguments but 3 were given

Description

When PGVector is used without the newer JSONB format, filters are built by the _create_filter_clause_deprecated method. It supports EQ, NIN, IN, etc., and also OR and AND. The problem with these two is that the code should call itself recursively for each sub-condition, but it calls the newer _create_filter_clause instead. That method has a different signature, so the call raises a TypeError.

Expected Result:
When PGVector is created without the use_jsonb flag set to True, filtering with OR and AND conditions should work.

Actual Result:
Adding an OR or AND filter without the use_jsonb flag raises a TypeError.
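
The failure pattern can be shown without a database. The sketch below is a hypothetical stand-in, not the LangChain source: `FilterSketch` and its methods are invented for illustration, and they return strings instead of SQLAlchemy clauses. It only shows why passing `(key, sub_value)` to a method that accepts a single argument fails, and how recursing into the deprecated method avoids that.

```python
from typing import Any, Dict


class FilterSketch:
    """Hypothetical stand-in for the two filter code paths (illustration only)."""

    def _create_filter_clause(self, filters: Dict[str, Any]) -> str:
        # Newer-style method: accepts one argument besides self.
        return f"<new-style clause for {filters!r}>"

    def deprecated_buggy(self, key: str, value: Dict[str, Any]) -> str:
        op, operand = next(iter(value.items()))
        op = op.upper()
        if op in ("OR", "AND"):
            # Bug: the newer method is called with two arguments -> TypeError.
            parts = [self._create_filter_clause(key, sub) for sub in operand]
            return "(" + f" {op} ".join(parts) + ")"
        return f"{key} {op} {operand!r}"

    def deprecated_fixed(self, key: str, value: Dict[str, Any]) -> str:
        op, operand = next(iter(value.items()))
        op = op.upper()
        if op in ("OR", "AND"):
            # Fix: recurse into the deprecated method, which takes (key, value).
            parts = [self.deprecated_fixed(key, sub) for sub in operand]
            return "(" + f" {op} ".join(parts) + ")"
        return f"{key} {op} {operand!r}"


sketch = FilterSketch()
condition = {"OR": [{"EQ": "0"}, {"EQ": "2"}]}

try:
    sketch.deprecated_buggy("page", condition)
    buggy_error = None
except TypeError as exc:
    buggy_error = exc  # same kind of error as in the traceback above

fixed_clause = sketch.deprecated_fixed("page", condition)
```

With the recursive call, nested OR and AND conditions are handled by the same code path that already handles EQ, IN and NIN.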

System Info

langchain==0.1.14
langchain-community==0.0.31
langchain-core==0.1.40
langchain-openai==0.0.8
langchain-text-splitters==0.0.1

@dosubot dosubot bot added Ɑ: vector store Related to vector store module 🔌: postgres Related to postgres integrations 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Apr 14, 2024
eyurtsev pushed a commit that referenced this issue Apr 16, 2024
… and AND conditions (#20446)

**Description**: Support filter by OR and AND for deprecated PGVector
version
**Issue**: #20445 
**Dependencies**: N/A
**Twitter** handle: @martinferenaz
hinthornw pushed a commit that referenced this issue Apr 26, 2024
… and AND conditions (#20446)

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Jul 14, 2024
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 21, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Jul 21, 2024