Leverage Weight#count when size is set to 0 #94858

javanna · 2023-03-29T10:02:28Z

We removed the shortcut total hit count in #89047 and later noticed a couple of benchmark regressions, which made us restore our shortcut total hit count mechanism.

When not collecting hits (e.g. size=0) we can leverage Lucene's skipping mechanism instead of our handmade shortcut total hit count, as Elasticsearch uses TotalHitCountCollector, which calls Weight#count. The advantage is that it supports shortcutting for many more queries than the three that our manual mechanism supports (match_all, term and field exists).

While at it, a few more tests are added to cover situations that were not covered before.
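To make the size=0 case concrete, here is a minimal sketch of the "ask for a cheap count first, iterate only if needed" pattern. It is a toy model, not Lucene code: `SegmentMatcher`, `matchAll`, `evenDocs` and `totalHits` are hypothetical names, and the only thing borrowed from Lucene is the convention that a count of -1 means "cannot be computed cheaply".

```java
import java.util.List;

// Toy model of count shortcutting. None of these types are Lucene classes.
final class CountShortcutSketch {

    /** Hypothetical view of one segment for a given query. */
    interface SegmentMatcher {
        int maxDoc();

        boolean matches(int doc);

        /** Cheap match count, or -1 when it cannot be computed without iterating. */
        default int count() {
            return -1;
        }
    }

    /** Total hits plus the number of documents that had to be visited. */
    static final class Result {
        final long total;
        final long visited;

        Result(long total, long visited) {
            this.total = total;
            this.visited = visited;
        }
    }

    static Result totalHits(List<SegmentMatcher> segments) {
        long total = 0;
        long visited = 0;
        for (SegmentMatcher segment : segments) {
            int cheap = segment.count();
            if (cheap >= 0) {
                // Shortcut: the segment knows its count, so no document is visited.
                total += cheap;
                continue;
            }
            // Fallback: count by visiting every document in the segment.
            for (int doc = 0; doc < segment.maxDoc(); doc++) {
                visited++;
                if (segment.matches(doc)) {
                    total++;
                }
            }
        }
        return new Result(total, visited);
    }

    /** Matches everything, so the count is known up front. */
    static SegmentMatcher matchAll(int maxDoc) {
        return new SegmentMatcher() {
            public int maxDoc() { return maxDoc; }
            public boolean matches(int doc) { return true; }
            public int count() { return maxDoc; }
        };
    }

    /** Matches even doc ids and offers no cheap count, so it must be iterated. */
    static SegmentMatcher evenDocs(int maxDoc) {
        return new SegmentMatcher() {
            public int maxDoc() { return maxDoc; }
            public boolean matches(int doc) { return doc % 2 == 0; }
        };
    }
}
```

In this sketch only the segment without a cheap count is iterated; the more query types can answer `count()` directly, the less work is left for the fallback loop.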

We removed the shortcut total hit count in elastic#89047 and later noticed a couple of benchmark regressions. We have moved to skip counting whenever possible when not collecting hits (e.g. size=0), which is the case where Elasticsearch uses TotalHitCountCollector and shortcutting is supported natively in Lucene. When hits are collected, the total hit count is computed as part of the collection in TopScoreDocCollector and TopFieldCollector, where Lucene does not support skipping the counting because it is hard to determine whether more competitive hits still need to be collected. The previous change caused a regression specifically when collecting hits, because we removed our manual shortcut in favour of counting, which adds overhead. With this change we reintroduce the shortcut total hit count method and use it only when strictly necessary: when size is 0 we rely entirely on Lucene to shortcut the total hit counting, while when hits are collected we do it our way, for now. While at it, a few more tests are added to cover situations that were not covered before.
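The resulting split can be summarised with a small decision sketch. This is an illustration of the rule described above, not the actual Elasticsearch code: `QueryKind`, `Strategy` and `choose` are hypothetical names, and any further conditions the real implementation checks are deliberately left out.

```java
// Hypothetical sketch of which counting strategy applies; not Elasticsearch code.
final class HitCountStrategySketch {

    /** The three query types the manual shortcut supports, plus everything else. */
    enum QueryKind { MATCH_ALL, TERM, FIELD_EXISTS, OTHER }

    enum Strategy {
        LUCENE_WEIGHT_COUNT,    // size == 0: leave shortcutting to Lucene
        MANUAL_SHORTCUT,        // hits collected and the query type is supported
        COUNT_WHILE_COLLECTING  // hits collected, no shortcut available
    }

    static Strategy choose(int size, QueryKind kind) {
        if (size == 0) {
            // No hits are collected, so the count-only path handles shortcutting.
            return Strategy.LUCENE_WEIGHT_COUNT;
        }
        switch (kind) {
            case MATCH_ALL:
            case TERM:
            case FIELD_EXISTS:
                return Strategy.MANUAL_SHORTCUT;
            default:
                return Strategy.COUNT_WHILE_COLLECTING;
        }
    }
}
```

For example, `choose(0, QueryKind.OTHER)` selects the Lucene path regardless of query type, while `choose(10, QueryKind.TERM)` selects the manual shortcut.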

elasticsearchmachine · 2023-03-29T10:02:53Z

Pinging @elastic/es-search (Team:Search)

javanna · 2023-03-29T11:46:06Z

run elasticsearch-ci/part-2

dnhatn · 2023-03-29T15:21:52Z

> With this change we reintroduce the shortcut total hit count method, and only use it when strictly necessary.

@javanna Would it be possible to revert #89047 first, and then have only the adjustment in this PR? That would make the review easier. Thanks!

javanna · 2023-03-29T15:45:48Z

> > With this change we reintroduce the shortcut total hit count method, and only use it when strictly necessary.
>
> @javanna Would it be possible to revert #89047 first, and then have only the adjustment in this PR? That would make the review easier. Thanks!

That's what this change does, but without reinstating the call to the method when size is set to 0 (EmptyTopDocsCollectorContext). Which part do you find hard to follow?

dnhatn · 2023-03-29T16:11:22Z

@javanna The PR partially reverts #89047 and includes additional changes. I had to compare this PR with #89047 to make sure we don't overlook anything.

javanna · 2023-03-29T18:03:46Z

server/src/test/java/org/elasticsearch/search/query/QueryPhaseTests.java

+            assertThat(context.queryResult().topDocs().topDocs.totalHits.relation, equalTo(TotalHits.Relation.EQUAL_TO));
+            assertThat(context.queryResult().topDocs().topDocs.scoreDocs.length, equalTo(0));
+        }
+


I moved this up because we were trying to replace the parsed query in the existing context but that silently fails, hence we were testing a different scenario and having the wrong expectations as well.

javanna · 2023-03-29T18:04:46Z

server/src/test/java/org/elasticsearch/search/query/QueryPhaseTests.java

@@ -496,6 +540,7 @@ public void testIndexSortingEarlyTermination() throws Exception {

        QueryPhase.executeInternal(context);
        assertThat(context.queryResult().topDocs().topDocs.totalHits.value, equalTo((long) numDocs));
+        assertThat(context.queryResult().topDocs().topDocs.totalHits.relation, equalTo(TotalHits.Relation.EQUAL_TO));


no changes in all the equal_to assertions, I just wanted to make sure we don't have regressions in the future.

javanna · 2023-03-29T18:05:10Z

@dnhatn this is ready ;)

dnhatn

LGTM. Thanks @javanna.

dnhatn · 2023-03-29T21:00:09Z

server/src/main/java/org/elasticsearch/common/lucene/MinimumScoreCollector.java

+/**
+ * Collector that wraps another collector and collects only documents that have a score that's greater or equal than the
+ * provided minimum score. Given that this collector filters documents out, it does and should not override {@link #setWeight(Weight)},
+ * as that may lead to exposing total hit count that does not reflect the filtering.
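Why forwarding the weight would be wrong for a filtering collector can be shown with a toy model. This is a sketch under stated assumptions, not the real MinimumScoreCollector: `CountingCollector`, `setPrecomputedCount` and `run` are hypothetical names, with a precomputed count standing in for what a wrapped collector could derive from the weight.

```java
// Toy model of a min-score filter in front of a hit counter; not Elasticsearch code.
final class MinScoreFilterSketch {

    /** Hypothetical hit counter that prefers a precomputed count when it has one. */
    static final class CountingCollector {
        private long counted = 0;
        private long precomputed = -1;

        /** Stand-in for the count a collector could obtain once it sees the weight. */
        void setPrecomputedCount(long count) {
            this.precomputed = count;
        }

        void collect(float score) {
            counted++;
        }

        long totalHits() {
            return precomputed >= 0 ? precomputed : counted;
        }
    }

    /**
     * Filters documents by score before handing them to the counter.
     * When forwardCount is true the unfiltered count leaks through, which is
     * the mistake the javadoc above warns about.
     */
    static long run(float[] scores, float minScore, boolean forwardCount) {
        CountingCollector inner = new CountingCollector();
        if (forwardCount) {
            inner.setPrecomputedCount(scores.length);
        }
        for (float score : scores) {
            if (score >= minScore) {
                inner.collect(score);
            }
        }
        return inner.totalHits();
    }
}
```

With scores `{0.2f, 0.9f, 1.5f}` and a minimum score of `0.5f`, the filtered count is 2, but forwarding the precomputed count reports all 3 documents.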


Thanks for adding this doc. I found it helpful :). Can we override the setWeight method, make it final in this class, and have this comment in the method instead? It's totally optional.

yes I considered that option too. It did feel slightly weird to have the same impl as the default one though. But enforcing it in the code is better than just javadocs.

I will take these two out of this PR and open another one.

I opened #94886

elasticsearchmachine · 2023-03-30T07:27:26Z

Hi @javanna, I've created a changelog YAML for you.

javanna added the :Search/Search, >refactoring and v8.8.0 labels Mar 29, 2023

elasticsearchmachine added the Team:Search label Mar 29, 2023

javanna mentioned this pull request Mar 29, 2023

Reintroduce shortcut total hit count when collecting hits #94170

Closed

javanna requested a review from dnhatn March 29, 2023 12:28

Merge branch 'main' into refactoring/restore_shortcut_total_hit_count

8356db5

javanna changed the title ~~Restore shortcut total hit count~~ Leverage Weight#count when size is set to 0 Mar 29, 2023

dnhatn approved these changes Mar 29, 2023

javanna added >enhancement and removed >refactoring labels Mar 30, 2023

javanna and others added 2 commits March 30, 2023 09:27

Update docs/changelog/94858.yaml

1e6f035

remove unrelated changes, will open a separate PR

05104c7

javanna merged commit affcad1 into elastic:main Mar 30, 2023

javanna deleted the refactoring/restore_shortcut_total_hit_count branch March 30, 2023 08:29
