-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enabled the efficient filtering support for Faiss Engine #907
Enabled the efficient filtering support for Faiss Engine #907
Conversation
Codecov Report
@@ Coverage Diff @@
## feature/faiss-filtering #907 +/- ##
=============================================================
- Coverage 85.13% 83.99% -1.15%
- Complexity 1088 1094 +6
=============================================================
Files 152 152
Lines 4414 4485 +71
Branches 392 400 +8
=============================================================
+ Hits 3758 3767 +9
- Misses 479 534 +55
- Partials 177 184 +7
|
jni/src/faiss_wrapper.cpp
Outdated
* Dimension = 128 | ||
* (1.1 * ( 4 * 128 + 8 * 16) * 7000000)/(1000*1000*1000) ~ 4.9GB | ||
* Ids are sequential in a Segment which means for IDSelectorBitmap total size if the max ID has value of 7M will be | ||
* 7000000/(8*1000) = 875KBs in worst case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 8*1024 = 854KBs worst case
Also, can we add offsets in order to prevent excess storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean by add offset to prevent excess storage?
|
||
@Override | ||
public float getMaxScore(int upTo) throws IOException { | ||
return Float.MIN_VALUE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Float.MIN_VALUE is not negative though. I believe it is 2^−149. Also, k-nn scores are not negative.
@@ -59,27 +59,42 @@ public static Query create(CreateQueryRequest createQueryRequest) { | |||
final String fieldName = createQueryRequest.getFieldName(); | |||
final int k = createQueryRequest.getK(); | |||
final float[] vector = createQueryRequest.getVector(); | |||
final Query filterQuery = getFilterQuery(createQueryRequest); | |||
if (filterQuery != null && KNNEngine.getEnginesThatCreateCustomSegmentFiles().contains(createQueryRequest.getKnnEngine())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be simpler to pass null in for the filter query and merge this if statement with the one on line 67?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will see how i can simplify the checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good! Added a few comments that can be addressed in future revisions.
Signed-off-by: Navneet Verma <[email protected]>
resolved the comments and raised new revision |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks good, please try to address my comments for Java code in coming PRs
this.k = k; | ||
this.indexName = indexName; | ||
this.filterQuery = filterQuery; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rewrite above constructor to call this new one, only passing filter as null?
final BooleanQuery booleanQuery = new BooleanQuery.Builder().add(this.getFilterQuery(), BooleanClause.Occur.FILTER) | ||
.add(new FieldExistsQuery(this.getField()), BooleanClause.Occur.FILTER) | ||
.build(); | ||
final Query rewritten = searcher.rewrite(booleanQuery); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this call taking care of recursive part of rewrite calls?
final Query rewritten = searcher.rewrite(booleanQuery); | ||
return searcher.createWeight(rewritten, ScoreMode.COMPLETE_NO_SCORES, 1f); | ||
} | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are aware that null is possible, can we return Optional here and check if it's empty in the client?
@@ -283,7 +283,7 @@ protected Query doToQuery(QueryShardContext context) { | |||
); | |||
} | |||
|
|||
if (KNNEngine.getEnginesThatCreateCustomSegmentFiles().contains(knnEngine) && filter != null) { | |||
if (KNNEngine.getEnginesThatCreateCustomSegmentFiles().contains(knnEngine) && filter != null && knnEngine != KNNEngine.FAISS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if condition grew up to the size of it's own method
return new KnnFloatVectorQuery(fieldName, vector, k); | ||
} | ||
|
||
private static Query getFilterQuery(CreateQueryRequest createQueryRequest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider returning Optional here, seems that we already know that null is possible here
* @return KNNQueryResult array of k neighbors | ||
*/ | ||
public static KNNQueryResult[] queryIndex(long indexPointer, float[] queryVector, int k, String engineName) { | ||
public static KNNQueryResult[] queryIndex(long indexPointer, float[] queryVector, int k, String engineName, int[] filteredIds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can leave the existing method signature unchanged, and call new method with filteredIds == null, this way we'll minimize number of places where we call that method and passing null as a new argument.
…project#907) Signed-off-by: Navneet Verma <[email protected]>
…es include * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934) Signed-off-by: Navneet Verma <[email protected]>
…es include * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934) Signed-off-by: Navneet Verma <[email protected]>
…es include * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934) Signed-off-by: Navneet Verma <[email protected]>
…es include * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934) Signed-off-by: Navneet Verma <[email protected]>
…es include (#936) * Enabled the efficient filtering support for Faiss Engine (#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (#934) Signed-off-by: Navneet Verma <[email protected]>
…es include (opensearch-project#936) * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934) Signed-off-by: Navneet Verma <[email protected]>
…es include (opensearch-project#936) * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934) Signed-off-by: Navneet Verma <[email protected]>
…es include (#936) * Enabled the efficient filtering support for Faiss Engine (#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (#934) Signed-off-by: Navneet Verma <[email protected]>
Description
This is an initial PR for enabling the pre-filtering support for Faiss Engine.
Changes include:
Next Steps:
Issues Resolved
#903
###Testing
Tested the code using the example mentioned here: https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#using-a-lucene-k-nn-filter for lucene
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.