-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved Exact Search to return only K results and added client side latency metric for query Benchmarks #933
Conversation
@@ -472,7 +475,7 @@ def _action(self): | |||
return results | |||
|
|||
def _get_measures(self) -> List[str]: | |||
measures = ['took', 'memory_kb'] | |||
measures = ['took', 'memory_kb', 'client_time'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be nice to have a unit as well. client_time_millis
or client_time_seconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can we use constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this I don't see constant will make a lot of sense. Plus these are benchmark scripts, so I don't see a big advantage here.
For time unit, we don't do it for other metrics. Hence I feel for consistency we should keep it. Everything is already in milliseconds only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as its the same unit as took, thats fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its the same unit.
float score = spaceType.getVectorSimilarityFunction().compare(queryVector, vector); | ||
docToScore.put(docId, score); | ||
if(score > topDoc.score) { | ||
topDoc.score = score; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this logic.
You are updating topDoc with new score and new docId. Which means, the queue will replace current top document with new top document. Instead shouldn't we remove the bottom doc and add the new topDoc in the queue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is a min heap and not max heap. Plus we did the init of the heap. The init of the heap will set all the values as DocId: MAX_DOC_ID and Score as -INF. Now, everytime we update the top, the top element will have score of -INF.
I will add comments around this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. So we basically need to keep track of the worst result we have come across so far so that we know which one to replace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is correct. This is what min heap does for you.
} | ||
} | ||
// If scores are negative we will remove them | ||
while (queue.size() > 0 && queue.top().score < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If queue.top() score is negative, doesn't that mean every score in the queue is negative? Then, we can simply empties it instead of looping through it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that heap is init with all values having score of -INF. So in case filterIds < k, some ids in the heap can have -INF value. Hence we need to remove them.
7e67a34
to
93c7595
Compare
if(score > topDoc.score) { | ||
topDoc.score = score; | ||
topDoc.doc = docId; | ||
// As the HitQueue is min heap, updating top will bring the doc with -INF score value on top. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not just -INF - the lowest score in the queue, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the lowest score too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me update the documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Might want to add a few comments around heap operations.
…latency metric for query Benchmarks Signed-off-by: Navneet Verma <[email protected]>
93c7595
to
825dbc9
Compare
updated |
…latency metric for query Benchmarks (opensearch-project#933) Signed-off-by: Navneet Verma <[email protected]>
…latency metric for query Benchmarks (opensearch-project#933) Signed-off-by: Navneet Verma <[email protected]>
…es include * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934) Signed-off-by: Navneet Verma <[email protected]>
…es include * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934) Signed-off-by: Navneet Verma <[email protected]>
…es include * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934) Signed-off-by: Navneet Verma <[email protected]>
…es include * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934) Signed-off-by: Navneet Verma <[email protected]>
…es include (#936) * Enabled the efficient filtering support for Faiss Engine (#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (#934) Signed-off-by: Navneet Verma <[email protected]>
…es include (opensearch-project#936) * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934) Signed-off-by: Navneet Verma <[email protected]>
…es include (opensearch-project#936) * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934) Signed-off-by: Navneet Verma <[email protected]>
…es include (#936) * Enabled the efficient filtering support for Faiss Engine (#907) * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (#926) * Added exact search for cases when filteredIds < k to improve the recall for exact search (#928) * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (#933) * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (#934) Signed-off-by: Navneet Verma <[email protected]>
Description
Improved Exact Search to return only K results and added client side latency metric for query Benchmarks
Issues Resolved
#903
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.