Improved Exact Search to return only K results and added client side latency metric for query Benchmarks #933

navneet1v · 2023-06-09T20:26:07Z

Description

Improved Exact Search to return only K results and added client side latency metric for query Benchmarks

Issues Resolved

Check List

- New functionality includes testing.
  - All tests pass
- New functionality has been documented.
  - New functionality has javadoc added
- Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

heemin32 · 2023-06-09T20:29:02Z

benchmarks/perf-tool/okpt/test/steps/steps.py

@@ -472,7 +475,7 @@ def _action(self):
        return results

    def _get_measures(self) -> List[str]:
-        measures = ['took', 'memory_kb']
+        measures = ['took', 'memory_kb', 'client_time']


It would be nice to include a unit in the name as well, e.g. client_time_millis or client_time_seconds.

Also, can we use a constant?

I don't think a constant makes much sense here. Plus, these are benchmark scripts, so I don't see a big advantage.

As for the time unit, we don't include it in the names of the other metrics, so for consistency I think we should keep the name as is. Everything is already in milliseconds.

As long as it's the same unit as took, that's fine.

It's the same unit.
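For illustration, here is a minimal sketch of how a client-side latency measure could be recorded in milliseconds next to the server-reported took value. Only the measure names took and client_time come from the diff above; timed_query and fake_search are hypothetical names invented for this example and are not taken from steps.py.

```python
import time
from typing import Any, Callable, Dict


def timed_query(search: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Run one query and report server-side and client-side latency in ms.

    `search` is a hypothetical zero-argument callable that returns a
    response dict containing a 'took' field (milliseconds).
    """
    start = time.perf_counter()
    response = search()
    # perf_counter() returns seconds, so convert to milliseconds to match 'took'.
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {
        'took': response['took'],
        'client_time': elapsed_ms,
    }


def fake_search() -> Dict[str, Any]:
    """Stand-in for a real search call; sleeps briefly to simulate latency."""
    time.sleep(0.01)
    return {'took': 3, 'hits': []}


result = timed_query(fake_search)
```

Because client_time is measured around the whole call, it includes network and serialization overhead that took does not, so it should normally be at least as large as took.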

heemin32 · 2023-06-09T20:36:53Z

src/main/java/org/opensearch/knn/index/query/KNNWeight.java

                float score = spaceType.getVectorSimilarityFunction().compare(queryVector, vector);
-                docToScore.put(docId, score);
+                if(score > topDoc.score) {
+                    topDoc.score = score;


I don't understand this logic.
You are updating topDoc with the new score and the new docId, which means the queue will replace the current top document with the new one. Shouldn't we instead remove the bottom doc and add the new doc to the queue?

This is a min heap, not a max heap. The heap is also pre-initialized: the init sets every entry to DocId MAX_DOC_ID and a score of -INF. So every time we update the top, the new top element will again have a score of -INF.

I will add comments around this.

Ah I see. So we basically need to keep track of the worst result we have come across so far so that we know which one to replace.

Yes, that is correct. This is what a min heap does for you.
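The idea discussed in this thread can be sketched in Python with heapq, which is also a min heap. This is only an analogue of the Java code under review, not the actual KNNWeight implementation: the function name top_k and the tuple layout (score, doc_id) are assumptions made for the example, and MAX_DOC_ID is a stand-in sentinel value.

```python
import heapq
import math
from typing import Iterable, List, Tuple

MAX_DOC_ID = 2**31 - 1  # stand-in sentinel doc id (assumption for this sketch)


def top_k(scored_docs: Iterable[Tuple[int, float]], k: int) -> List[Tuple[float, int]]:
    """Keep the k best (score, doc_id) pairs using a pre-filled min heap."""
    if k <= 0:
        return []
    # Pre-fill the heap with k placeholder entries scored -inf.
    heap = [(-math.inf, MAX_DOC_ID)] * k
    heapq.heapify(heap)
    for doc_id, score in scored_docs:
        # heap[0] is always the worst entry currently kept.
        if score > heap[0][0]:
            # Replace the worst entry; the next-worst entry rises to the top.
            heapq.heapreplace(heap, (score, doc_id))
    # Best score first. Placeholders remain if fewer than k docs were seen.
    return sorted(heap, reverse=True)


docs = [(1, 0.2), (2, 0.9), (3, 0.5), (4, 0.7)]
best = top_k(docs, 2)  # [(0.9, 2), (0.7, 4)]
```

The top of the heap is the entry to beat, so each candidate needs one comparison and at most one O(log k) replacement, and the heap never grows beyond k entries.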

heemin32 · 2023-06-09T20:38:01Z

src/main/java/org/opensearch/knn/index/query/KNNWeight.java

+                }
+            }
+            // If scores are negative we will remove them
+            while (queue.size() > 0 && queue.top().score < 0) {


If the queue.top() score is negative, doesn't that mean every score in the queue is negative? Then can't we simply empty it instead of looping through it?

The heap is initialized with every entry having a score of -INF. So when filterIds < k, some entries in the heap can still have a -INF score, and we need to remove them.
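A small Python sketch of this case, again using heapq as a stand-in for the Java queue: with k = 3 and only two candidates, one placeholder entry is left on top of the min heap while the real results sit below it, so the queue cannot simply be cleared. The name drop_placeholders is hypothetical, and the score < 0 check mirrors the loop in the diff, which assumes that real scores are non-negative.

```python
import heapq
import math
from typing import List, Tuple

MAX_DOC_ID = 2**31 - 1  # stand-in sentinel doc id (assumption for this sketch)


def drop_placeholders(heap: List[Tuple[float, int]]) -> List[Tuple[float, int]]:
    """Pop entries from the top of a min heap while their score is negative."""
    while heap and heap[0][0] < 0:
        heapq.heappop(heap)
    return heap


k = 3
heap = [(-math.inf, MAX_DOC_ID)] * k
heapq.heapify(heap)

# Only two candidates pass the filter, which is fewer than k.
for doc_id, score in [(7, 0.4), (9, 0.8)]:
    if score > heap[0][0]:
        heapq.heapreplace(heap, (score, doc_id))

top_before = heap[0]                # still a -inf placeholder
remaining = drop_placeholders(heap)  # only the two real results are left
```

The loop stops as soon as the top entry has a non-negative score, so it removes at most k entries and leaves every real result in place.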

jmazanec15 · 2023-06-09T23:02:57Z

src/main/java/org/opensearch/knn/index/query/KNNWeight.java

+                if(score > topDoc.score) {
+                    topDoc.score = score;
+                    topDoc.doc = docId;
+                    // As the HitQueue is min heap, updating top will bring the doc with -INF score value on top.


Not just -INF - the lowest score in the queue, correct?

Yes, the lowest score too.

Let me update the documentation.

jmazanec15

Looks good to me. Might want to add a few comments around heap operations.

navneet1v · 2023-06-09T23:06:15Z

Looks good to me. Might want to add a few comments around heap operations.

updated

…es include
* Enabled the efficient filtering support for Faiss Engine (opensearch-project#907)
* Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926)
* Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928)
* Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933)
* Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934)

Signed-off-by: Navneet Verma <[email protected]>

navneet1v requested review from heemin32, VijayanB, vamshin, jmazanec15, naveentatikonda, junqiu-lei and martin-gaievski as code owners June 9, 2023 20:26

navneet1v added the skip-changelog label Jun 9, 2023

heemin32 reviewed Jun 9, 2023

navneet1v force-pushed the feature/faiss-filtering branch from 7e67a34 to 93c7595 Compare June 9, 2023 22:09

heemin32 approved these changes Jun 9, 2023

jmazanec15 reviewed Jun 9, 2023

jmazanec15 approved these changes Jun 9, 2023

Improved Exact Search to return only K results and added client side …

825dbc9

…latency metric for query Benchmarks Signed-off-by: Navneet Verma <[email protected]>

navneet1v force-pushed the feature/faiss-filtering branch from 93c7595 to 825dbc9 Compare June 9, 2023 23:05

navneet1v merged commit f5ff953 into opensearch-project:feature/faiss-filtering Jun 9, 2023

navneet1v added a commit to navneet1v/k-NN that referenced this pull request Jun 14, 2023

Improved Exact Search to return only K results and added client side …

a81ee6a

…latency metric for query Benchmarks (opensearch-project#933) Signed-off-by: Navneet Verma <[email protected]>

navneet1v mentioned this pull request Jun 14, 2023

Added support for Efficient Pre-filtering for Faiss Engine #936

Merged

5 tasks

navneet1v added a commit to navneet1v/k-NN that referenced this pull request Jun 14, 2023

Improved Exact Search to return only K results and added client side …

eeca861

…latency metric for query Benchmarks (opensearch-project#933) Signed-off-by: Navneet Verma <[email protected]>
