Add k-NN Faiss filtering documentation #4476
Signed-off-by: Fanit Kolchina <[email protected]>
@@ -11,12 +11,24 @@ has_math: true

To refine k-NN results, you can filter a k-NN search using one of the following methods:

- [Scoring script filter](#scoring-script-filter): This approach involves pre-filtering a document set and then running an exact k-NN search on the filtered subset. It does not scale for large filtered subsets.
- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. This approach is supported by the following search engines:
- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. This approach is supported by the following search engines:
- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned (if present). This approach is supported by the following engines:
Reworded.
@@ -11,12 +11,24 @@ has_math: true

To refine k-NN results, you can filter a k-NN search using one of the following methods:

- [Scoring script filter](#scoring-script-filter): This approach involves pre-filtering a document set and then running an exact k-NN search on the filtered subset. It does not scale for large filtered subsets.
- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. This approach is supported by the following search engines:
- Lucene search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later)
- Lucene search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later)
- Lucene engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later)

- [Scoring script filter](#scoring-script-filter): This approach involves pre-filtering a document set and then running an exact k-NN search on the filtered subset. It does not scale for large filtered subsets.
- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. This approach is supported by the following search engines:
- Lucene search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later)
- Faiss search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.9 or later)
- Faiss search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.9 or later)
- Faiss engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.9 or later)

- [Boolean filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn) search and then applies a filter to the results. Because of post-filtering, it may return significantly fewer than `k` results for a restrictive filter.
- [Post filtering](#post-filtering): Because it is performed after the k-NN search, this approach may return significantly fewer than `k` results for a restrictive filter.
- [Boolean post filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently and then the intersection of their result sets is taken.
- [Boolean post filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently and then the intersection of their result sets is taken.
- [Boolean post filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently and then the results are combined based on the query operator (`should`, `must`, and so on) provided in the query.
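
For context, here is a minimal sketch of the kind of request body this bullet describes, written as a Python dictionary. The field names `location`, `rating`, and `parking` come from the hotel example used elsewhere in this page; the helper name `boolean_filter_query` and the default `k` of 20 are assumptions made only for illustration.

```python
def boolean_filter_query(vector, k=20, size=3):
    """Build a Boolean query that pairs an ANN clause with a filter clause.

    The knn clause and the filter clause are sibling elements of the bool
    query, so the filter is not passed into the ANN search itself.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Restriction on the documents (rating 8 to 10, with parking).
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"rating": {"gte": 8, "lte": 10}}},
                            {"term": {"parking": "true"}},
                        ]
                    }
                },
                # ANN clause over the `location` vector field.
                "must": [{"knn": {"location": {"vector": vector, "k": k}}}],
            }
        },
    }


body = boolean_filter_query([5, 4])
```

Because the `filter` and `knn` clauses sit side by side, swapping `must` for `should` changes how the two result sets are combined, which is the point the suggested wording makes.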

- [Lucene k-NN filter](#using-a-lucene-k-nn-filter): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. You can only use this method with the Hierarchical Navigable Small World (HNSW) algorithm implemented by the Lucene search engine in k-NN plugin versions 2.4 and later.
- [Scoring script filter](#scoring-script-filter): This approach involves pre-filtering a document set and then running an exact k-NN search on the filtered subset. It does not scale for large filtered subsets.
Let's also add that latencies can be high for this query.
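
To illustrate why latency grows with the filtered subset, the following is a small simulation of pre-filtering followed by a brute-force exact search. It is a sketch over in-memory dictionaries, not the plugin's implementation; the data set and the function name `exact_knn_on_filtered` are hypothetical.

```python
import heapq
import math


def exact_knn_on_filtered(docs, query, k, predicate):
    """Pre-filter the documents, then run a brute-force exact k-NN search.

    Returns the IDs of the k nearest matching documents and the number of
    distance computations, which equals the size of the filtered subset.
    """
    candidates = [d for d in docs if predicate(d)]
    scored = [(math.dist(d["location"], query), d["id"]) for d in candidates]
    nearest = heapq.nsmallest(k, scored)
    return [doc_id for _, doc_id in nearest], len(scored)


# Hypothetical data: 1,000 hotels on a line; ratings cycle through 1..10.
hotels = [
    {"id": i, "location": [float(i), 0.0], "rating": i % 10 + 1}
    for i in range(1000)
]

ids, distance_computations = exact_knn_on_filtered(
    hotels, query=[5.0, 0.0], k=3, predicate=lambda d: d["rating"] >= 8
)
```

Every document that passes the filter gets a distance computation, so the work is linear in the size of the filtered subset regardless of `k`.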

Filter | When the filter is applied | Type of search | Supported engines and methods | Where to place the `filter` clause
:--- | :--- | :--- | :---
Efficient k-NN filtering | During search (a hybrid of pre- and post-filtering) | Approximate | - `lucene` (`hnsw`) <br> - `faiss` (`hnsw`, `ivf`) | Inside the k-NN query clause.
Let's remove `ivf` from here.
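
As a point of reference for the "Inside the k-NN query clause" cell, here is a minimal sketch of an efficient-filtering request body built as a Python dictionary. The `location` field follows the hotel example in this page; the helper name `efficient_filter_query` and the sample values are assumptions.

```python
def efficient_filter_query(vector, k, filter_clause, size=3):
    """Build a k-NN query with the filter nested inside the knn clause."""
    return {
        "size": size,
        "query": {
            "knn": {
                "location": {
                    "vector": vector,
                    "k": k,
                    # The filter sits next to `vector` and `k`, inside the
                    # field's knn clause, rather than in a surrounding bool.
                    "filter": filter_clause,
                }
            }
        },
    }


hotel_filter = {
    "bool": {
        "must": [
            {"range": {"rating": {"gte": 8, "lte": 10}}},
            {"term": {"parking": "true"}},
        ]
    }
}
body = efficient_filter_query([5, 4], k=3, filter_clause=hotel_filter)
```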
| 10M | 80 | 100 | Scoring script | Efficient k-NN filtering |
| 1M | 2.5 | 100 | Efficient k-NN filtering | Scoring script |
| 1M | 38 | 100 | Efficient k-NN filtering | Efficient k-NN filtering/scoring script |
| 1M | 80 | 100 | Boolean filter | Efficient k-NN filtering |
| 1M | 80 | 100 | Boolean filter | Efficient k-NN filtering |
| 1M | 80 | 100 | Efficient k-NN filtering | Boolean filter |

A scoring script filter first filters the documents and then uses a brute-force exact k-NN search on the results. For example, the following query searches for hotels with a rating between 8 and 10, inclusive, that provide parking and then performs a k-NN search to return the 3 hotels that are closest to the specified `location`:
You can perform efficient k-NN filtering with the `lucene` or `faiss` search engines.
You can perform efficient k-NN filtering with the `lucene` or `faiss` search engines.
You can perform efficient k-NN filtering with the `lucene` or `faiss` engines.

**Step 3: Search your data with a filter**

Now you can create a k-NN search with filters. <!-- TODO: add details -->
The same point that we made for Lucene above applies to the filters here too.
Note that there are multiple ways to construct a filter that returns hotels that provide parking, for example:
- A `term` query clause in the `should` clause
- A `wildcard` query clause in the `should` clause
- A `regexp` query clause in the `should` clause
- A `must_not` clause to eliminate hotels with `parking` set to `false`
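
The four alternatives listed above could look roughly like the following filter clauses, shown as Python dictionaries. This sketch assumes that `parking` is indexed as a keyword-like field holding `"true"` or `"false"`; the wildcard and regular expression patterns, and the helper name, are illustrative only.

```python
def parking_filter_variants():
    """Return four filter clauses that each select hotels with parking."""
    return {
        # Exact match on the value "true".
        "term_in_should": {
            "bool": {"should": [{"term": {"parking": "true"}}]}
        },
        # Wildcard pattern that matches "true".
        "wildcard_in_should": {
            "bool": {"should": [{"wildcard": {"parking": {"value": "t*e"}}}]}
        },
        # Regular expression that matches "true".
        "regexp_in_should": {
            "bool": {"should": [{"regexp": {"parking": "[a-zA-Z]rue"}}]}
        },
        # Exclude hotels whose parking value is "false".
        "must_not_false": {
            "bool": {"must_not": [{"term": {"parking": "false"}}]}
        },
    }


variants = parking_filter_variants()
```

Any one of these dictionaries can be used as the `filter` value of a k-NN query clause; they differ in how the match is expressed, not in where the filter is placed.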
Signed-off-by: Fanit Kolchina <[email protected]>
LGTM. Minimal edits
Co-authored-by: Melissa Vagi <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>
@kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!

- [Boolean filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn) search and then applies a filter to the results. Because of post-filtering, it may return significantly fewer than `k` results for a restrictive filter.
- [Post filtering](#post-filtering): Because it is performed after the k-NN search, this approach may return significantly fewer than `k` results for a restrictive filter.
The sub-bullets here need to be introduced using a colon.
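
To make the "fewer than `k` results" behavior in the quoted bullets concrete, here is a small in-memory simulation. It uses an exact search as a stand-in for the approximate one, so it only illustrates the ordering of the two steps, not the plugin's behavior; all names and data are hypothetical.

```python
import math


def top_k(docs, query, k):
    """Return the k documents closest to the query (exact search)."""
    return sorted(docs, key=lambda d: math.dist(d["location"], query))[:k]


def post_filter_search(docs, query, k, predicate):
    """Run the k-NN search first, then drop results that fail the filter."""
    return [d for d in top_k(docs, query, k) if predicate(d)]


def filtered_search(docs, query, k, predicate):
    """Restrict the candidates to matching documents, then take the k nearest."""
    return top_k([d for d in docs if predicate(d)], query, k)


# Hypothetical data: 100 hotels on a line; only every 25th one has parking.
hotels = [
    {"id": i, "location": [float(i), 0.0], "parking": i % 25 == 0}
    for i in range(100)
]
has_parking = lambda d: d["parking"]

post = post_filter_search(hotels, [10.0, 0.0], 3, has_parking)
filtered = filtered_search(hotels, [10.0, 0.0], 3, has_parking)
```

With this restrictive filter, none of the three nearest hotels has parking, so the post-filtered search returns nothing, while the search restricted to matching hotels still returns three results.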

- [Boolean filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn) search and then applies a filter to the results. Because of post-filtering, it may return significantly fewer than `k` results for a restrictive filter.
- [Post filtering](#post-filtering): Because it is performed after the k-NN search, this approach may return significantly fewer than `k` results for a restrictive filter.
- [Boolean post filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently and then the results are combined based on the query operator (`should`, `must`, and so on) provided in the query.
"Boolean post-filter"? "elements" instead of "parts"? "run" instead of "executed"?

### Using a Faiss efficient filter

Consider an index that holds information about shirts for a clothing store. You want to find the top-rated shirts that are similar to the one you have but would like to restrict the results by shirt size.
Consider an index that holds information about shirts for a clothing store. You want to find the top-rated shirts that are similar to the one you have but would like to restrict the results by shirt size.
Consider an index containing information about a particular brand of shirt. You want to find the top-rated shirts that are similar to one you already have but would like to restrict the results by shirt size.
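
A rough sketch of what the shirt scenario might translate to follows: an index body that selects the `faiss` engine with the `hnsw` method, and a k-NN query filtered by size and rating. The field names `item_vector`, `size`, and `rating`, the dimension, and both helper names are assumptions for illustration and are not taken from the pull request.

```python
def faiss_hnsw_index_body(dimension=3):
    """Index body for a knn_vector field that uses Faiss with HNSW."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # `item_vector` is a hypothetical field name.
                "item_vector": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "space_type": "l2",
                        "engine": "faiss",
                    },
                }
            }
        },
    }


def shirt_query(vector, k, shirt_size, min_rating, max_rating):
    """k-NN query restricted by shirt size and rating inside the knn clause."""
    return {
        "size": k,
        "query": {
            "knn": {
                "item_vector": {
                    "vector": vector,
                    "k": k,
                    "filter": {
                        "bool": {
                            "must": [
                                {"range": {"rating": {"gte": min_rating, "lte": max_rating}}},
                                {"term": {"size": shirt_size}},
                            ]
                        }
                    },
                }
            }
        },
    }


index_body = faiss_hnsw_index_body()
query_body = shirt_query([2, 4, 3], k=3, shirt_size="small", min_rating=7, max_rating=10)
```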
@@ -466,4 +507,196 @@ POST /hotels-index/_search
}
}
```
{% include copy-curl.html %}

## Post filtering
This is hyphenated elsewhere. Ensure consistency across docs.

## Post filtering

You can achieve post filtering with a Boolean filter or by providing the `post_filter` parameter.
Same
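
For readers following the thread, here is a minimal sketch of the second option mentioned in the quoted sentence, the `post_filter` parameter, as a Python dictionary. The field names follow the hotel example in this page; the helper name `post_filter_query` is hypothetical.

```python
def post_filter_query(vector, k=3, size=3):
    """Build a search body that applies a filter after the k-NN query runs."""
    return {
        "size": size,
        # The k-NN query carries no filter of its own.
        "query": {"knn": {"location": {"vector": vector, "k": k}}},
        # `post_filter` is a top-level search parameter, a sibling of `query`.
        "post_filter": {
            "bool": {
                "must": [
                    {"range": {"rating": {"gte": 8, "lte": 10}}},
                    {"term": {"parking": "true"}},
                ]
            }
        },
    }


body = post_filter_query([5, 4])
```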
}
```

### Post filter parameter
"Post-filter"?
Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
* Add k-NN Faiss filtering documentation
Signed-off-by: Fanit Kolchina <[email protected]>
* Move the note
Signed-off-by: Fanit Kolchina <[email protected]>
* Add faiss and a filter table
Signed-off-by: Fanit Kolchina <[email protected]>
* Refactor boolean filtering section
Signed-off-by: Fanit Kolchina <[email protected]>
* Clarified that Faiss works with hnsw only
Signed-off-by: Fanit Kolchina <[email protected]>
* Add more Faiss filtering information
Signed-off-by: Fanit Kolchina <[email protected]>
* Apply suggestions from code review
Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
* Apply suggestions from code review
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
* Update _search-plugins/knn/filter-search-knn.md
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
* Implemented editorial comments
Signed-off-by: Fanit Kolchina <[email protected]>
* Implemented one more editorial comment
Signed-off-by: Fanit Kolchina <[email protected]>
---------
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Melissa Vagi <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Description
Fixes #4350