[FEATURE] Efficient filtering on parent document with nested field #1356
Comments
I will try to reproduce the issue and find out what is causing this behavior.
I was able to reproduce the issue and will add more details here soon.
Root Cause Analysis
To understand the behavior described above, we first need to understand how nested fields are indexed and how nested queries work in OpenSearch/Lucene. OpenSearch splits a document with a nested field into two parts: a parent document and child documents. The parent document contains all the top-level fields, and the nested fields are indexed as child documents. During query execution, if OpenSearch determines that a query targets the child documents, or may match them, it wraps the whole query in a ToParentBlockJoinQuery. Ref1, Ref2
In efficient filtering (for both Lucene and Faiss), to get the filtered IDs, we create a new filter query which has 2 conditions:
Hence, when the updated filter query runs, condition 1 fails because the vector field does not exist on the parent documents; it exists only on the child documents. But when the user-provided filter query is on nested documents, it gets converted to a ToParentBlockJoinQuery, which ensures that the right filtered documents are returned for the subsequent vector search.
Solution
The solution we are moving towards is: we will identify whether the vector field provided in the query is nested or not.
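The root cause and the nested-field check can be illustrated with a small, self-contained simulation. This is only a simplified model, not the plugin's implementation and not how ToParentBlockJoinQuery itself works. It assumes that condition 1 is "the document has the vector field" and condition 2 is "the document matches the user's filter" (inferred from the explanation above), and every field name (`nested_field`, `my_vector`, `color`, `parking`) and helper function is hypothetical.

```python
from typing import Any, Callable

VECTOR_FIELD = "nested_field.my_vector"  # hypothetical vector field path


def to_lucene_block(source: dict[str, Any], nested_path: str) -> list[dict[str, Any]]:
    """Flatten one source document into child docs followed by the parent doc."""
    children = [
        {f"{nested_path}.{key}": value for key, value in child.items()}
        for child in source.get(nested_path, [])
    ]
    parent = {key: value for key, value in source.items() if key != nested_path}
    parent["_is_parent"] = True
    return children + [parent]


def combined_filter(doc: dict[str, Any], user_filter: Callable[[dict[str, Any]], bool]) -> bool:
    """Assumed combined filter: vector field exists AND the user filter matches."""
    return VECTOR_FIELD in doc and user_filter(doc)


def is_nested_field(properties: dict[str, Any], field_path: str) -> bool:
    """Return True if any ancestor of field_path is mapped with type 'nested'."""
    node = properties
    for part in field_path.split(".")[:-1]:
        entry = node.get(part, {})
        if entry.get("type") == "nested":
            return True
        node = entry.get("properties", {})
    return False


def parent_filter(doc: dict[str, Any]) -> bool:
    return doc.get("parking") is True


def nested_filter(doc: dict[str, Any]) -> bool:
    return doc.get("nested_field.color") == "blue"


source = {
    "parking": True,
    "nested_field": [
        {"my_vector": [1.0, 1.0, 1.0], "color": "blue"},
        {"my_vector": [2.0, 2.0, 2.0], "color": "red"},
    ],
}
block = to_lucene_block(source, "nested_field")

# Filter on a parent field: children lack "parking", the parent lacks the vector.
parent_hits = [doc for doc in block if combined_filter(doc, parent_filter)]
# Filter on a nested field: the matching child carries both fields.
nested_hits = [doc for doc in block if combined_filter(doc, nested_filter)]

mapping_properties = {
    "parking": {"type": "boolean"},
    "nested_field": {
        "type": "nested",
        "properties": {"my_vector": {"type": "knn_vector", "dimension": 3}},
    },
}

print(len(parent_hits), len(nested_hits))  # prints: 0 1
print(is_nested_field(mapping_properties, VECTOR_FIELD))  # prints: True
```

In this model no single document can satisfy both conditions when the filter targets a parent field, which matches the empty result reported in the issue. Knowing that the vector field is nested is what would let the filter be handled differently in that case.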
PR: #1372
Test Plan
To make sure that the changes are backward compatible (BWC) and that all the different permutations and combinations are taken care of, we should test all these cases:
Case 1
Case 2
Case 3
Case 4
Case 5
Case 6
Case 7
Case 8
Case 9
Case 10
Case 11
Case 12
Case 13
Case 14
Resolving this issue as the feature is merged and will be released in 2.12.
Is your feature request related to a problem?
When efficient filtering runs against a nested field, the filter is applied to fields inside the nested field but not to fields in the parent document. Most filtering use cases, however, are on parent fields rather than nested fields.
What solution would you like?
I would like to filter documents on a field in the parent document rather than on a field inside the nested field.
What alternatives have you considered?
Post filtering on parent document
Do you have any additional context?
Create knn index with nested field.
Ingest sample data
Filter on field inside nested field.
Response
Filter on field in top level
Response
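The request and response bodies for the steps above are not included here. As a rough sketch only, the requests could look like the following Python dictionaries, which mirror the OpenSearch query DSL for a k-NN query inside a nested query. The index name and all field names (`my-knn-index`, `nested_field`, `my_vector`, `color`, `parking`), the vector values, and the helper `nested_knn_query` are assumptions for illustration and are not taken from the original report. No responses are shown because they cannot be reconstructed.

```python
import json
from typing import Any

INDEX = "my-knn-index"  # hypothetical index name

# Step 1: k-NN index whose vector lives inside a nested field.
create_index_body: dict[str, Any] = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "parking": {"type": "boolean"},  # top-level (parent) field
            "nested_field": {
                "type": "nested",
                "properties": {
                    "my_vector": {
                        "type": "knn_vector",
                        "dimension": 3,
                        "method": {
                            "name": "hnsw",
                            "space_type": "l2",
                            "engine": "lucene",
                        },
                    },
                    "color": {"type": "keyword"},  # field inside the nested field
                },
            },
        }
    },
}

# Step 2: one sample document with two nested entries.
sample_doc: dict[str, Any] = {
    "parking": True,
    "nested_field": [
        {"my_vector": [1.0, 1.0, 1.0], "color": "blue"},
        {"my_vector": [2.0, 2.0, 2.0], "color": "red"},
    ],
}


def nested_knn_query(filter_clause: dict[str, Any]) -> dict[str, Any]:
    """Build a nested k-NN search body with an efficient filter attached."""
    return {
        "query": {
            "nested": {
                "path": "nested_field",
                "query": {
                    "knn": {
                        "nested_field.my_vector": {
                            "vector": [1.0, 1.0, 1.0],
                            "k": 1,
                            "filter": filter_clause,
                        }
                    }
                },
            }
        }
    }


# Step 3: filter on a field inside the nested field.
nested_filter_query = nested_knn_query({"term": {"nested_field.color": "blue"}})
# Step 4: filter on a top-level field of the parent document.
parent_filter_query = nested_knn_query({"term": {"parking": True}})

print(json.dumps(parent_filter_query, indent=2))
```

These bodies would be sent with `PUT my-knn-index`, `PUT my-knn-index/_doc/1`, and `GET my-knn-index/_search` respectively. The two search bodies differ only in the `filter` clause, which is the difference this issue is about.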