Enabled the efficient filtering support for Faiss Engine #907

Merged

Conversation

navneet1v
Collaborator

@navneet1v navneet1v commented May 22, 2023

Description

This is an initial PR to enable pre-filtering support for the Faiss engine.

Changes include:

  1. Upgrading Faiss to commit facebookresearch/faiss@3219e3d.
  2. Adding an optimization that selects between Faiss's IDSelectorArray and IDSelectorBitmap.
  3. Skipping the Faiss search when the filters return no docIds.
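
A minimal sketch of how the choice between the two selectors could be made on memory footprint alone. The class name, the method names, and the figure of 8 bytes per id are assumptions for illustration; the rule this PR actually applies may differ.

```java
// Hypothetical helper, not code from this PR: picks a selector type by memory footprint.
final class SelectorChoice {
    enum SelectorType { ARRAY, BITMAP }

    // Assumption: each filtered id is stored as a 64-bit integer.
    static final int BYTES_PER_ID = 8;

    // Bytes needed to store the filtered ids as a plain array.
    static long arrayBytes(int filteredIdCount) {
        return (long) filteredIdCount * BYTES_PER_ID;
    }

    // Bytes needed for a bitmap with one bit per doc id in [0, maxDocId].
    static long bitmapBytes(int maxDocId) {
        return maxDocId / 8 + 1L;
    }

    // Prefer whichever representation is smaller; ties go to the array.
    static SelectorType choose(int filteredIdCount, int maxDocId) {
        return arrayBytes(filteredIdCount) <= bitmapBytes(maxDocId)
            ? SelectorType.ARRAY
            : SelectorType.BITMAP;
    }

    public static void main(String[] args) {
        System.out.println(choose(10, 7_000_000));
        System.out.println(choose(1_000_000, 7_000_000));
    }
}
```

Memory is only one input. The two selectors also differ in lookup cost, which this sketch ignores.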

Next Steps:

  1. Optimize the JNI interfaces to expose a single query interface.
  2. Write unit tests for both the Java and C++ code.
  3. Write integration tests (ITs) for filtering.
  4. Run performance tests, compare with Lucene, and share the results on the GitHub issue.
  5. Figure out a better way to select the IDSelector type.
  6. Implement the exact search optimization when filterIds < k.
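
Step 6 could look roughly like the sketch below: when the filter leaves no more than k candidates, score all of them directly instead of traversing the graph. The class, the in-memory `vectors` array, and the squared-L2 distance are assumptions for illustration and do not reflect how the plugin stores or scores vectors.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical brute-force fallback, not code from this PR.
final class ExactSearchSketch {
    // Squared Euclidean distance between two vectors of equal length.
    static float l2Squared(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Scores every filtered id against the query and returns at most k ids, nearest first.
    static int[] exactSearch(float[][] vectors, int[] filteredIds, float[] query, int k) {
        return Arrays.stream(filteredIds)
            .boxed()
            .sorted(Comparator.comparingDouble((Integer id) -> l2Squared(vectors[id], query)))
            .limit(k)
            .mapToInt(Integer::intValue)
            .toArray();
    }

    public static void main(String[] args) {
        float[][] vectors = { {0f, 0f}, {1f, 1f}, {5f, 5f}, {0.1f, 0.1f} };
        int[] result = exactSearch(vectors, new int[] {1, 2}, new float[] {0f, 0f}, 5);
        System.out.println(Arrays.toString(result));
    }
}
```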

Issues Resolved

#903

Testing

Tested the code using the example given for Lucene here: https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#using-a-lucene-k-nn-filter

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@navneet1v navneet1v added the skip-changelog, Features, and feature branch labels May 22, 2023
@codecov

codecov bot commented May 22, 2023

Codecov Report

Merging #907 (3bf155b) into feature/faiss-filtering (f11f1f1) will decrease coverage by 1.15%.
The diff coverage is 30.00%.

❗ Current head 3bf155b differs from pull request most recent head 89a26b1. Consider uploading reports for the commit 89a26b1 to get more accurate results

@@                      Coverage Diff                      @@
##             feature/faiss-filtering     #907      +/-   ##
=============================================================
- Coverage                      85.13%   83.99%   -1.15%     
- Complexity                      1088     1094       +6     
=============================================================
  Files                            152      152              
  Lines                           4414     4485      +71     
  Branches                         392      400       +8     
=============================================================
+ Hits                            3758     3767       +9     
- Misses                           479      534      +55     
- Partials                         177      184       +7     
Impacted Files Coverage Δ
...rg/opensearch/knn/index/query/KNNQueryBuilder.java 83.55% <0.00%> (-0.66%) ⬇️
...java/org/opensearch/knn/index/query/KNNScorer.java 45.00% <0.00%> (-24.24%) ⬇️
...main/java/org/opensearch/knn/jni/FaissService.java 85.71% <ø> (ø)
...c/main/java/org/opensearch/knn/jni/JNIService.java 81.08% <0.00%> (-4.64%) ⬇️
...pensearch/knn/plugin/rest/RestGetModelHandler.java 77.77% <0.00%> (ø)
...earch/knn/plugin/transport/DeleteModelRequest.java 84.61% <0.00%> (ø)
.../java/org/opensearch/knn/index/query/KNNQuery.java 48.57% <11.76%> (-34.77%) ⬇️
...java/org/opensearch/knn/index/query/KNNWeight.java 62.28% <20.93%> (-27.20%) ⬇️
...rg/opensearch/knn/index/query/KNNQueryFactory.java 86.48% <83.33%> (-3.52%) ⬇️
...java/org/opensearch/knn/index/util/KNNLibrary.java 100.00% <100.00%> (ø)
... and 4 more

jni/include/org_opensearch_knn_jni_FaissService.h (outdated)
src/main/java/org/opensearch/knn/jni/FaissService.java (outdated)
jni/src/faiss_wrapper.cpp
jni/src/faiss_wrapper.cpp (outdated)
* Dimension = 128
* (1.1 * ( 4 * 128 + 8 * 16) * 7000000)/(1000*1000*1000) ~ 4.9GB
* Ids are sequential in a Segment which means for IDSelectorBitmap total size if the max ID has value of 7M will be
* 7000000/(8*1000) = 875KBs in worst case.
Member

nit: dividing by 8*1024 instead gives about 854 KB in the worst case.

Also, can we add offsets in order to prevent excess storage?

Collaborator Author

What do you mean by adding an offset to prevent excess storage?
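
The arithmetic in the code comment and in the nit above can be checked with a short sketch. Reading the `16` in the comment as the HNSW `m` parameter is an assumption, and the class and method names are made up for this example.

```java
// Hypothetical helper that reproduces the estimates quoted in the review thread.
final class MemoryEstimate {
    // 1.1 * (4 * dimension + 8 * m) bytes per vector, reported in GB (10^9 bytes).
    static double hnswGraphGb(long numVectors, int dimension, int m) {
        return 1.1 * (4L * dimension + 8L * m) * numVectors / 1e9;
    }

    // One bit per id up to maxId, reported in units of bytesPerKb (1000 or 1024).
    static double bitmapKb(long maxId, int bytesPerKb) {
        return maxId / (8.0 * bytesPerKb);
    }

    public static void main(String[] args) {
        System.out.printf("graph: %.3f GB%n", hnswGraphGb(7_000_000L, 128, 16));
        System.out.printf("bitmap: %.1f KB (1000), %.1f KB (1024)%n",
            bitmapKb(7_000_000L, 1000), bitmapKb(7_000_000L, 1024));
    }
}
```

Both bitmap figures describe the same 875,000 bytes; only the size of a "KB" differs.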

@navneet1v navneet1v deleted the branch opensearch-project:feature/faiss-filtering May 31, 2023 08:46
@navneet1v navneet1v closed this May 31, 2023
@navneet1v navneet1v reopened this May 31, 2023
@navneet1v navneet1v requested a review from jmazanec15 May 31, 2023 08:50

@Override
public float getMaxScore(int upTo) throws IOException {
return Float.MIN_VALUE;
Member

Float.MIN_VALUE is not negative though. I believe it is 2^−149. Also, k-nn scores are not negative.
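
The point about `Float.MIN_VALUE` can be confirmed with the standard library alone. The sketch below is illustrative; offering `Float.MAX_VALUE` as a conservative upper bound is an assumption about what `getMaxScore` should return, not something this thread settles.

```java
// Illustrative checks on float constants; the class and method names are hypothetical.
final class FloatBounds {
    // Float.MIN_VALUE is the smallest positive float, not the most negative one.
    static boolean minValueIsPositive() {
        return Float.MIN_VALUE > 0f;
    }

    // The smallest positive (subnormal) float equals 2^-149.
    static boolean minValueIsTwoToMinus149() {
        return Float.MIN_VALUE == Math.scalb(1.0f, -149);
    }

    // The most negative finite float is the negation of Float.MAX_VALUE.
    static float mostNegativeFinite() {
        return -Float.MAX_VALUE;
    }

    // Assumption: a bound that no finite score can exceed.
    static float conservativeUpperBound() {
        return Float.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(minValueIsPositive());
        System.out.println(minValueIsTwoToMinus149());
        System.out.println(mostNegativeFinite() < 0f);
    }
}
```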

src/main/java/org/opensearch/knn/jni/JNIService.java (outdated)
@@ -59,27 +59,42 @@ public static Query create(CreateQueryRequest createQueryRequest) {
final String fieldName = createQueryRequest.getFieldName();
final int k = createQueryRequest.getK();
final float[] vector = createQueryRequest.getVector();
final Query filterQuery = getFilterQuery(createQueryRequest);
if (filterQuery != null && KNNEngine.getEnginesThatCreateCustomSegmentFiles().contains(createQueryRequest.getKnnEngine())) {
Member

Would it be simpler to pass null in for the filter query and merge this if statement with the one on line 67?

Collaborator Author

I will see how I can simplify the checks.

Member

@jmazanec15 jmazanec15 left a comment

Overall, looks good! Added a few comments that can be addressed in future revisions.

jni/src/faiss_wrapper.cpp
jni/src/faiss_wrapper.cpp (outdated)
jni/src/faiss_wrapper.cpp (outdated)
jni/src/faiss_wrapper.cpp (outdated)
@navneet1v
Collaborator Author

Overall, looks good! Added a few comments that can be addressed in future revisions.

Resolved the comments and raised a new revision.

Member

@martin-gaievski martin-gaievski left a comment

Overall it looks good. Please try to address my comments on the Java code in upcoming PRs.

this.k = k;
this.indexName = indexName;
this.filterQuery = filterQuery;
}
Member

Maybe rewrite the constructor above to call this new one, passing null for the filter?
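
A minimal sketch of the constructor chaining suggested here. `ChainedQuery` and its field types are stand-ins for illustration, not the real `KNNQuery`.

```java
// Hypothetical stand-in for the query class; Object replaces the Lucene Query type.
final class ChainedQuery {
    private final String field;
    private final float[] queryVector;
    private final int k;
    private final String indexName;
    private final Object filterQuery; // null when no filter was supplied

    // The existing constructor delegates to the new one with a null filter.
    ChainedQuery(String field, float[] queryVector, int k, String indexName) {
        this(field, queryVector, k, indexName, null);
    }

    // The new constructor is the single place where fields are assigned.
    ChainedQuery(String field, float[] queryVector, int k, String indexName, Object filterQuery) {
        this.field = field;
        this.queryVector = queryVector;
        this.k = k;
        this.indexName = indexName;
        this.filterQuery = filterQuery;
    }

    Object getFilterQuery() {
        return filterQuery;
    }

    int getK() {
        return k;
    }
}
```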

final BooleanQuery booleanQuery = new BooleanQuery.Builder().add(this.getFilterQuery(), BooleanClause.Occur.FILTER)
.add(new FieldExistsQuery(this.getField()), BooleanClause.Occur.FILTER)
.build();
final Query rewritten = searcher.rewrite(booleanQuery);
Member

Does this call take care of the recursive part of the rewrite calls?

final Query rewritten = searcher.rewrite(booleanQuery);
return searcher.createWeight(rewritten, ScoreMode.COMPLETE_NO_SCORES, 1f);
}
return null;
Member

If we are aware that null is possible, can we return Optional here and check if it's empty in the client?

@@ -283,7 +283,7 @@ protected Query doToQuery(QueryShardContext context) {
);
}

- if (KNNEngine.getEnginesThatCreateCustomSegmentFiles().contains(knnEngine) && filter != null) {
+ if (KNNEngine.getEnginesThatCreateCustomSegmentFiles().contains(knnEngine) && filter != null && knnEngine != KNNEngine.FAISS) {
Member

This if condition has grown large enough to deserve its own method.
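
One way to act on this is to move the condition into a named predicate. The sketch below uses a stand-in `Engine` enum, and the set of engines that create custom segment files is an assumption made for the example.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical extraction of the condition; Engine stands in for KNNEngine.
final class FilterSupportCheck {
    enum Engine { NMSLIB, FAISS, LUCENE }

    // Assumption: the engines that write custom segment files.
    static final Set<Engine> CUSTOM_SEGMENT_ENGINES = EnumSet.of(Engine.NMSLIB, Engine.FAISS);

    // True when a filter was supplied for an engine that cannot apply it.
    static boolean isFilterUnsupported(Engine engine, Object filter) {
        return filter != null
            && engine != Engine.FAISS
            && CUSTOM_SEGMENT_ENGINES.contains(engine);
    }

    public static void main(String[] args) {
        System.out.println(isFilterUnsupported(Engine.NMSLIB, "filter"));
        System.out.println(isFilterUnsupported(Engine.FAISS, "filter"));
    }
}
```

The call site then reads `if (isFilterUnsupported(knnEngine, filter))`, and the rule lives in one place.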

return new KnnFloatVectorQuery(fieldName, vector, k);
}

private static Query getFilterQuery(CreateQueryRequest createQueryRequest) {
Member

Please consider returning Optional here, since we already know that null is possible.
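
A minimal sketch of the Optional-returning variant. `CreateRequest` and the String-typed filter are stand-ins for `CreateQueryRequest` and the Lucene `Query`, introduced only for illustration.

```java
import java.util.Optional;

// Hypothetical stand-ins; not the plugin's real request or query types.
final class OptionalFilterSketch {
    static final class CreateRequest {
        private final String filter; // may be null when the user supplied no filter

        CreateRequest(String filter) {
            this.filter = filter;
        }

        String getFilter() {
            return filter;
        }
    }

    // Makes the "no filter" case explicit in the return type instead of returning null.
    static Optional<String> getFilterQuery(CreateRequest request) {
        return Optional.ofNullable(request.getFilter());
    }

    // The caller handles both cases without a null check.
    static String describe(CreateRequest request) {
        return getFilterQuery(request)
            .map(filter -> "k-NN query with filter " + filter)
            .orElse("k-NN query without filter");
    }

    public static void main(String[] args) {
        System.out.println(describe(new CreateRequest("color:red")));
        System.out.println(describe(new CreateRequest(null)));
    }
}
```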

* @return KNNQueryResult array of k neighbors
*/
- public static KNNQueryResult[] queryIndex(long indexPointer, float[] queryVector, int k, String engineName) {
+ public static KNNQueryResult[] queryIndex(long indexPointer, float[] queryVector, int k, String engineName, int[] filteredIds) {
Member

We can leave the existing method signature unchanged and have it call the new method with filteredIds == null. That minimizes the number of call sites that have to pass null as a new argument.
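
The overload suggested here might be sketched as follows. The search body is a placeholder over ids 0..9 and returns plain ints rather than `KNNQueryResult` objects; both are assumptions made so the example runs without the native library.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical stand-in for the JNI service; the search logic is a placeholder.
final class QueryIndexOverload {
    static final int INDEX_SIZE = 10; // assumption: a tiny fake index with ids 0..9

    // Existing signature stays as it is and delegates with no filter.
    static int[] queryIndex(long indexPointer, float[] queryVector, int k, String engineName) {
        return queryIndex(indexPointer, queryVector, k, engineName, null);
    }

    // New signature: a null filteredIds means "search everything".
    static int[] queryIndex(long indexPointer, float[] queryVector, int k, String engineName, int[] filteredIds) {
        IntStream candidates = IntStream.range(0, INDEX_SIZE);
        if (filteredIds != null) {
            candidates = candidates.filter(id -> Arrays.stream(filteredIds).anyMatch(f -> f == id));
        }
        // Placeholder for the native call: returns the first k surviving ids.
        return candidates.limit(k).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(queryIndex(0L, new float[] {0f}, 3, "faiss")));
        System.out.println(Arrays.toString(queryIndex(0L, new float[] {0f}, 3, "faiss", new int[] {7, 2})));
    }
}
```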

@navneet1v navneet1v merged commit 119b8d6 into opensearch-project:feature/faiss-filtering Jun 2, 2023
navneet1v added a commit to navneet1v/k-NN that referenced this pull request Jun 14, 2023
…es include

 * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907)
 * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926)
 * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928)
 * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933)
 * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934)

Signed-off-by: Navneet Verma <[email protected]>
navneet1v added a commit that referenced this pull request Jun 14, 2023
…es include (#936)

* Enabled the efficient filtering support for Faiss Engine (#907)
 * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (#926)
 * Added exact search for cases when filteredIds < k to improve the recall for exact search (#928)
 * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (#933)
 * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (#934)

Signed-off-by: Navneet Verma <[email protected]>
navneet1v added a commit to navneet1v/k-NN that referenced this pull request Jun 14, 2023
…es include (opensearch-project#936)

* Enabled the efficient filtering support for Faiss Engine (opensearch-project#907)
 * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926)
 * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928)
 * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933)
 * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934)

Signed-off-by: Navneet Verma <[email protected]>
navneet1v added a commit to navneet1v/k-NN that referenced this pull request Jul 10, 2023
…es include (opensearch-project#936)

* Enabled the efficient filtering support for Faiss Engine (opensearch-project#907)
 * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926)
 * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928)
 * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933)
 * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934)

Signed-off-by: Navneet Verma <[email protected]>
navneet1v added a commit that referenced this pull request Jul 10, 2023
…es include (#936)

* Enabled the efficient filtering support for Faiss Engine (#907)
 * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (#926)
 * Added exact search for cases when filteredIds < k to improve the recall for exact search (#928)
 * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (#933)
 * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (#934)

Signed-off-by: Navneet Verma <[email protected]>