Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend metadata filtering support in ElasticsearchDocumentStore #2108

Merged
merged 31 commits into from
Feb 4, 2022

Conversation

bogdankostic
Copy link
Contributor

@bogdankostic bogdankostic commented Feb 2, 2022

This PR allows users to apply a wide variety of filters on Documents inside their ElasticsearchDocumentStore.

Filters are defined as nested dictionaries. The keys of the dictionaries can be a logical operator ("$and", "$or", "$not"), a comparison operator ("$eq", "$in", "$gt", "$gte", "$lt", "$lte") or a metadata field name.

Logical operator keys take a dictionary of metadata field names and/or logical operators as value. Metadata field names take a dictionary of comparison operators as value. Comparison operator keys take a single value or (in case of "$in") a list of values as value.

If no logical operator is provided, "$and" is used as default operation.
If no comparison operator is provided, "$eq" (or "$in" if the comparison value is a list) is used as default operation. Therefore, we don't add breaking changes.

The tests pass when using an OpenSearchDocumentStore as well, so this filtering functionality can als be used right away with OpenSearch.

Example:

filters = {
    "$and": {
        "type": {"$eq": "article"},
        "date": {"$gte": "2015-01-01", "$lt": "2021-01-01"},
        "rating": {"$gte": 3},
        "$or": {
            "genre": {"$in": ["economy", "politics"]},
            "publisher": {"$eq": "nytimes"}
        }
    }
}
# or simpler using default operators
filters = {
    "type": "article",
    "date": {"$gte": "2015-01-01", "$lt": "2021-01-01"},
    "rating": {"$gte": 3},
    "$or": {
        "genre": ["economy", "politics"],
        "publisher": "nytimes"
    }
}

Open To-Do DONE

With the current implementation, we can't use the same logical operator twice on the same level, i.e., the following filter is not valid as dictionary keys have to be unique:

{"$or":
    {"$and": {
         "Type": "News Paper",
         "Date": {"$lt": "2019-01-01"},
     "$and": {
         "Type": "Blog post",
         "Date": {"$gte": "2019-01-01"}
    }
}

To be able to handle the above mentioned case, we will allow logical operators to optionally take a list of dictionaries as value, such that the above filter would like this:

{"$or": [
    {"$and": {
        "Type": "News Paper",
        "Date": {"$lt": "2019-01-01}"
    }},
    "$and": {
        "Type": "Blog post",
        "Date": {"$gte": "2019-01-01"}
    }}   

]}

(Thanks to @MichelBartels for spotting this!)

Closes #2029

@bogdankostic bogdankostic requested a review from tholor February 3, 2022 16:13
Copy link
Member

@tholor tholor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Left a few minor comments.

haystack/document_stores/elasticsearch.py Show resolved Hide resolved
haystack/document_stores/filter_utils.py Show resolved Hide resolved
return cls(conditions)

@abstractmethod
def convert_to_elasticsearch(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably I missed this one, but where does this method get implemented? I saw it being called in the ElasticsearchDocumentStore...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method gets implemented in each of the LogicalOperation and FilterOperation classes, such that each operation is responsible for converting their part of the whole filter clause.

Copy link
Member

@tholor tholor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add some labels to the PR before merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend filter functionality for ElasticsearchDocumentStore
2 participants