Add k-NN Faiss filtering documentation #4476

Merged
merged 12 commits into from
Jul 18, 2023

Collaborator

@kolchfa-aws kolchfa-aws commented Jul 3, 2023

Description

Fixes #4350

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kolchfa-aws kolchfa-aws self-assigned this Jul 3, 2023
@hdhalter hdhalter added the release-notes PR: Include this PR in the automated release notes label Jul 13, 2023
@@ -11,12 +11,24 @@ has_math: true

To refine k-NN results, you can filter a k-NN search using one of the following methods:

- [Scoring script filter](#scoring-script-filter): This approach involves pre-filtering a document set and then running an exact k-NN search on the filtered subset. It does not scale for large filtered subsets.
- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. This approach is supported by the following search engines:
Contributor

Suggested change
- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. This approach is supported by the following search engines:
- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned (if present). This approach is supported by the following engines:

Collaborator Author

Reworded.

- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. This approach is supported by the following search engines:
- Lucene search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later)
Contributor

Suggested change
- Lucene search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later)
- Lucene engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later)

- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. This approach is supported by the following search engines:
- Lucene search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later)
- Faiss search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.9 or later)
Contributor

Suggested change
- Faiss search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.9 or later)
- Faiss engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.9 or later)


- [Boolean filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn) search and then applies a filter to the results. Because of post-filtering, it may return significantly fewer than `k` results for a restrictive filter.
- [Post filtering](#post-filtering): Because it is performed after the k-NN search, this approach may return significantly fewer than `k` results for a restrictive filter.
- [Boolean post filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently and then the intersection of their result sets is taken.
Contributor

Suggested change
- [Boolean post filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently and then the intersection of their result sets is taken.
- [Boolean post filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently and then the results are combined based on the query operator (`should`, `must`, and so on) provided in the query.
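
For readers following this thread, the difference between post-filtering and filtering that is applied as part of the search can be illustrated with a toy brute-force model. This is only a sketch: the documents, the `parking` flag, and the function names are made up for illustration, and an exact search over the matching documents stands in for the filtering that the engines apply during an approximate search.

```python
import math

# Hypothetical toy data: (doc_id, 2-D vector, parking flag).
DOCS = [
    (1, (5.0, 4.0), False),
    (2, (5.1, 4.1), False),
    (3, (4.9, 3.9), False),
    (4, (1.0, 1.0), True),
    (5, (9.0, 9.0), True),
    (6, (5.5, 4.5), True),
]


def nearest(query, docs, k):
    """Exact (brute-force) k nearest neighbors by Euclidean distance."""
    return sorted(docs, key=lambda d: math.dist(query, d[1]))[:k]


def post_filter(query, docs, k):
    """Search first, then drop non-matching hits (may return fewer than k)."""
    return [d for d in nearest(query, docs, k) if d[2]]


def filter_then_search(query, docs, k):
    """Restrict candidates to matching documents, then take the k nearest."""
    return nearest(query, [d for d in docs if d[2]], k)


query = (5.0, 4.0)
post_ids = [d[0] for d in post_filter(query, DOCS, k=3)]
filtered_ids = [d[0] for d in filter_then_search(query, DOCS, k=3)]
print("post-filter:", post_ids)
print("filtered search:", filtered_ids)
```

In this toy data set the three nearest documents all fail the filter, so post-filtering returns nothing, while the filtered search still returns three matching documents.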


- [Lucene k-NN filter](#using-a-lucene-k-nn-filter): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. You can only use this method with the Hierarchical Navigable Small World (HNSW) algorithm implemented by the Lucene search engine in k-NN plugin versions 2.4 and later.
- [Scoring script filter](#scoring-script-filter): This approach involves pre-filtering a document set and then running an exact k-NN search on the filtered subset. It does not scale for large filtered subsets.
Contributor

Let's also add that latencies can be high for this query.


Filter | When the filter is applied | Type of search | Supported engines and methods | Where to place the `filter` clause
:--- | :--- | :--- | :--- | :---
Efficient k-NN filtering | During search (a hybrid of pre- and post-filtering) | Approximate | - `lucene` (`hnsw`) <br> - `faiss` (`hnsw`, `ivf`) | Inside the k-NN query clause.
Contributor

Let's remove `ivf` from here.

| 10M | 80 | 100 | Scoring script | Efficient k-NN filtering |
| 1M | 2.5 | 100 | Efficient k-NN filtering | Scoring script |
| 1M | 38 | 100 | Efficient k-NN filtering | Efficient k-NN filtering/scoring script |
| 1M | 80 | 100 | Boolean filter | Efficient k-NN filtering |
Contributor

Suggested change
| 1M | 80 | 100 | Boolean filter | Efficient k-NN filtering |
| 1M | 80 | 100 | Efficient k-NN filtering | Boolean filter |


A scoring script filter first filters the documents and then uses a brute-force exact k-NN search on the results. For example, the following query searches for hotels with a rating between 8 and 10, inclusive, that provide parking and then performs a k-NN search to return the 3 hotels that are closest to the specified `location`:
You can perform efficient k-NN filtering with the `lucene` or `faiss` search engines.
Contributor

Suggested change
You can perform efficient k-NN filtering with the `lucene` or `faiss` search engines.
You can perform efficient k-NN filtering with the `lucene` or `faiss` engines.
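
To make the two lines in this hunk concrete, the sketch below builds both request bodies for the hotels example as Python dictionaries: one that pre-filters and then scores with the k-NN scoring script, and one that places the same `filter` inside the `knn` clause. The field names `rating` and `parking`, the keyword value `"true"`, and the query vector are assumptions for illustration; only `location` is named in the text under review.

```python
import json

# Filter from the hotels example: rating 8-10 inclusive, parking available.
# Field names and the "true" keyword value are assumptions.
HOTEL_FILTER = {
    "bool": {
        "must": [
            {"range": {"rating": {"gte": 8, "lte": 10}}},
            {"term": {"parking": "true"}},
        ]
    }
}
QUERY_VECTOR = [5.0, 4.0]  # hypothetical query point


def scoring_script_request(k):
    """Pre-filter, then run an exact k-NN search through script_score."""
    return {
        "size": k,
        "query": {
            "script_score": {
                "query": {"bool": {"filter": HOTEL_FILTER}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": "location",
                        "query_value": QUERY_VECTOR,
                        "space_type": "l2",
                    },
                },
            }
        },
    }


def efficient_filter_request(k):
    """Approximate k-NN with the filter placed inside the knn clause."""
    return {
        "size": k,
        "query": {
            "knn": {
                "location": {
                    "vector": QUERY_VECTOR,
                    "k": k,
                    "filter": HOTEL_FILTER,
                }
            }
        },
    }


print(json.dumps(efficient_filter_request(3), indent=2))
```

Either body would be sent to the index's `_search` endpoint; the only structural difference is where the filter sits relative to the vector search.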


**Step 3: Search your data with a filter**

Now you can create a k-NN search with filters. <!-- TODO: add details -->
Contributor

What we wrote for Lucene above also applies to the filters here.

Note that there are multiple ways to construct a filter that returns hotels that provide parking, for example:

- A `term` query clause in the `should` clause
- A `wildcard` query clause in the `should` clause
- A `regexp` query clause in the `should` clause
- A `must_not` clause to eliminate hotels with `parking` set to `false`
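
As a rough illustration of the four options listed in this comment, the following sketch writes each one as a filter clause and wraps it in a k-NN query. It assumes that `parking` is a keyword field holding `"true"` or `"false"` and that the vector field is `location`; the patterns, the query vector, and the helper name are hypothetical.

```python
# Four hypothetical ways to express "hotels that provide parking".
PARKING_FILTERS = {
    "term_in_should": {
        "bool": {"should": [{"term": {"parking": "true"}}]}
    },
    "wildcard_in_should": {
        "bool": {"should": [{"wildcard": {"parking": {"value": "t*e"}}}]}
    },
    "regexp_in_should": {
        "bool": {"should": [{"regexp": {"parking": "[a-zA-Z]rue"}}]}
    },
    "must_not_false": {
        "bool": {"must_not": [{"term": {"parking": "false"}}]}
    },
}


def knn_with_filter(filter_clause, vector, k):
    """Build a k-NN search body with the filter inside the knn clause."""
    return {
        "size": k,
        "query": {
            "knn": {
                "location": {"vector": vector, "k": k, "filter": filter_clause}
            }
        },
    }


# One request body per filter variant, all for the same query vector.
requests = {
    name: knn_with_filter(clause, [5.0, 4.0], 3)
    for name, clause in PARKING_FILTERS.items()
}
print(sorted(requests))
```

The variants are not strictly interchangeable: the `must_not` form also matches documents that have no `parking` value at all, so which one is appropriate depends on the data.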

@kolchfa-aws kolchfa-aws marked this pull request as ready for review July 17, 2023 16:02
Contributor

@vagimeli vagimeli left a comment

LGTM. Minimal edits

Co-authored-by: Melissa Vagi
Signed-off-by: kolchfa-aws
Collaborator

@natebower natebower left a comment

@kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!


- [Boolean filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn) search and then applies a filter to the results. Because of post-filtering, it may return significantly fewer than `k` results for a restrictive filter.
- [Post filtering](#post-filtering): Because it is performed after the k-NN search, this approach may return significantly fewer than `k` results for a restrictive filter.
Collaborator

The sub-bullets here need to be introduced using a colon.


- [Boolean filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn) search and then applies a filter to the results. Because of post-filtering, it may return significantly fewer than `k` results for a restrictive filter.
- [Post filtering](#post-filtering): Because it is performed after the k-NN search, this approach may return significantly fewer than `k` results for a restrictive filter.
- [Boolean post filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently and then the results are combined based on the query operator (`should`, `must`, and so on) provided in the query.
Collaborator

"Boolean post-filter"? "elements" instead of "parts"? "run" instead of "executed"?


### Using a Faiss efficient filter

Consider an index that holds information about shirts for a clothing store. You want to find the top-rated shirts that are similar to the one you have but would like to restrict the results by shirt size.
Collaborator

Suggested change
Consider an index that holds information about shirts for a clothing store. You want to find the top-rated shirts that are similar to the one you have but would like to restrict the results by shirt size.
Consider an index containing information about a particular brand of shirt. You want to find the top-rated shirts that are similar to one you already have but would like to restrict the results by shirt size.
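
A minimal sketch of what the shirt example in this section could look like, written as Python dictionaries for an index definition and a filtered search. The field names (`item_vector`, `size`, `rating`), the vector dimension, the rating range, and the query vector are all assumptions for illustration and are not taken from the PR.

```python
import json

# Hypothetical index body: a Faiss HNSW vector field plus two filterable fields.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "item_vector": {
                "type": "knn_vector",
                "dimension": 3,
                "method": {
                    "name": "hnsw",
                    "space_type": "l2",
                    "engine": "faiss",
                },
            },
            "size": {"type": "keyword"},
            "rating": {"type": "integer"},
        }
    },
}

# Hypothetical search body: shirts similar to the query vector,
# restricted to small sizes with a high rating.
search_body = {
    "size": 2,
    "query": {
        "knn": {
            "item_vector": {
                "vector": [2.0, 4.0, 3.0],
                "k": 10,
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"rating": {"gte": 8, "lte": 10}}},
                            {"term": {"size": "small"}},
                        ]
                    }
                },
            }
        }
    },
}

print(json.dumps(search_body, indent=2))
```

The point of the sketch is the placement: the `filter` sits inside the `knn` clause for the vector field, next to `vector` and `k`.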

@@ -466,4 +507,196 @@ POST /hotels-index/_search
}
}
```
{% include copy-curl.html %}

## Post filtering
Collaborator

This is hyphenated elsewhere. Ensure consistency across docs.


## Post filtering

You can achieve post filtering with a Boolean filter or by providing the `post_filter` parameter.
Collaborator

Same

}
```

### Post filter parameter
Collaborator

"Post-filter"?

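
For context on the two post-filtering options discussed in this section (a Boolean filter or the `post_filter` parameter), the sketch below builds both request bodies as Python dictionaries. The `location` field, the filter contents, the query vector, and the helper names are assumptions for illustration.

```python
# Hypothetical filter shared by both variants.
FILTER = {
    "bool": {
        "must": [
            {"range": {"rating": {"gte": 8, "lte": 10}}},
            {"term": {"parking": "true"}},
        ]
    }
}


def boolean_filter_request(vector, k, size):
    """k-NN query combined with a filter inside a bool query."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": FILTER,
                "must": [{"knn": {"location": {"vector": vector, "k": k}}}],
            }
        },
    }


def post_filter_request(vector, k, size):
    """k-NN query with the filter supplied as a top-level post_filter."""
    return {
        "size": size,
        "query": {"knn": {"location": {"vector": vector, "k": k}}},
        "post_filter": FILTER,
    }


bool_body = boolean_filter_request([5.0, 4.0], k=20, size=3)
post_body = post_filter_request([5.0, 4.0], k=20, size=3)
print(list(post_body))
```

In both variants `k` is set larger than `size` here because filtering the nearest neighbors afterward can leave fewer than `k` hits; how much larger it needs to be depends on how restrictive the filter is.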
kolchfa-aws and others added 4 commits July 18, 2023 10:25
Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws
Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws
Signed-off-by: Fanit Kolchina
@kolchfa-aws kolchfa-aws merged commit 6c83dfd into main Jul 18, 2023
@hdhalter hdhalter mentioned this pull request Sep 25, 2023
29 tasks
@prudhvigodithi prudhvigodithi added release-notes PR: Include this PR in the automated release notes and removed release-notes PR: Include this PR in the automated release notes labels Oct 3, 2023
harshavamsi pushed a commit to harshavamsi/documentation-website that referenced this pull request Oct 31, 2023
* Add k-NN Faiss filtering documentation

Signed-off-by: Fanit Kolchina

* Move the note

Signed-off-by: Fanit Kolchina

* Add faiss and a filter table

Signed-off-by: Fanit Kolchina

* Refactor boolean filtering section

Signed-off-by: Fanit Kolchina

* Clarified that Faiss works with hnsw only

Signed-off-by: Fanit Kolchina

* Add more Faiss filtering information

Signed-off-by: Fanit Kolchina

* Apply suggestions from code review

Co-authored-by: Melissa Vagi
Signed-off-by: kolchfa-aws

* Apply suggestions from code review

Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws

* Update _search-plugins/knn/filter-search-knn.md

Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws

* Implemented editorial comments

Signed-off-by: Fanit Kolchina

* Implemented one more editorial comment

Signed-off-by: Fanit Kolchina

---------

Signed-off-by: Fanit Kolchina
Signed-off-by: kolchfa-aws
Co-authored-by: Melissa Vagi
Co-authored-by: Nathan Bower
vagimeli added a commit that referenced this pull request Dec 21, 2023
@Naarcha-AWS Naarcha-AWS deleted the knn-filter-update branch March 28, 2024 23:26
Labels
release-notes PR: Include this PR in the automated release notes v2.9.0
Development

Successfully merging this pull request may close these issues.

[DOC] Faiss Engine Efficient Filtering
6 participants