-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Leverage Weight#count when size is set to 0 #94858
Leverage Weight#count when size is set to 0 #94858
Conversation
We have removed shortcut total hit count with elastic#89047 and later noticed a couple of benchmark regressions. While we have moved to skip counting when possible when not collecting hits (e.g. size=0), which is the case where Elasticsearch uses TotalHitCountCollector and the shortcutting is supported natively in Lucene. For the case where hits are collected, the total hit count is counted as part of the collection in TopScoreDocCollector and TopFieldCollector, where Lucene does not support skipping the counting as it is hard to determine whether more competitive hits need to be collected or not. The previous change caused a regression specifically when collecting hits because we ended up removing our manual shortcut in favour of counting which causes overhead. With this change we reintroduce the shortcut total hit count method, and only use it when strictly necessary. When size is 0, we rely entirely on Lucene to shortcut the total hit counting, while when hits are collected we do it our way, for now. While at it, a few more tests are added to cover for situations that were not covered before.
Pinging @elastic/es-search (Team:Search) |
run elasticsearch-ci/part-2 |
That's what this change does, but without reinstating calling the method when size is set to 0 (EmptyTopDocsCollectorContext). What's the part that you find hard to follow? |
assertThat(context.queryResult().topDocs().topDocs.totalHits.relation, equalTo(TotalHits.Relation.EQUAL_TO)); | ||
assertThat(context.queryResult().topDocs().topDocs.scoreDocs.length, equalTo(0)); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this up because we were trying to replace the parsed query in the existing context but that silently fails, hence we were testing a different scenario and having the wrong expectations as well.
@@ -496,6 +540,7 @@ public void testIndexSortingEarlyTermination() throws Exception { | |||
|
|||
QueryPhase.executeInternal(context); | |||
assertThat(context.queryResult().topDocs().topDocs.totalHits.value, equalTo((long) numDocs)); | |||
assertThat(context.queryResult().topDocs().topDocs.totalHits.relation, equalTo(TotalHits.Relation.EQUAL_TO)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no changes in all the equal_to assertions, I just wanted to make sure we don't have regressions in the future.
@dnhatn this is ready ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @javanna.
/** | ||
* Collector that wraps another collector and collects only documents that have a score that's greater or equal than the | ||
* provided minimum score. Given that this collector filters documents out, it does and should not override {@link #setWeight(Weight)}, | ||
* as that may lead to exposing total hit count that does not reflect the filtering. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this doc. I found it helpful :). Can we override and make the setWeight
method final this class and have this comment in the method instead? It's totally optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I considered that option too. It did feel slightly weird to have the same impl as the default one though. But enforcing it in the code is better than just javadocs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will take these two out of this PR and open another one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened #94886
Hi @javanna, I've created a changelog YAML for you. |
We have removed shortcut total hit count with #89047 and later noticed a couple of benchmark regressions, which made us restore our shortcut total hit count mechanism.
When not collecting hits (e.g. size=0) we can leverage Lucene skipping mechanism instead of our handmade shortcut total hit count, as Elasticsearch uses
TotalHitCountCollector
which callsWeight#count
. The advantage of this is that it supports shortcutting for many more queries than the only 3 which our manual mechanism supports (match_all, term and field exists).While at it, a few more tests are added to cover for situations that were not covered before.