Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leverage Weight#count when size is set to 0 #94858

Merged

Conversation

javanna
Copy link
Member

@javanna javanna commented Mar 29, 2023

We have removed shortcut total hit count with #89047 and later noticed a couple of benchmark regressions, which made us restore our shortcut total hit count mechanism.

When not collecting hits (e.g. size=0) we can leverage Lucene skipping mechanism instead of our handmade shortcut total hit count, as Elasticsearch uses TotalHitCountCollector which calls Weight#count. The advantage of this is that it supports shortcutting for many more queries than the only 3 which our manual mechanism supports (match_all, term and field exists).

While at it, a few more tests are added to cover for situations that were not covered before.

We have removed shortcut total hit count with elastic#89047 and later noticed a
couple of benchmark regressions. While we have moved to skip counting
when possible when not collecting hits (e.g. size=0), which is the case
where Elasticsearch uses TotalHitCountCollector and the shortcutting is
supported natively in Lucene.

For the case where hits are collected, the total hit count is counted as
part of the collection in TopScoreDocCollector and TopFieldCollector,
where Lucene does not support skipping the counting as it is hard to
determine whether more competitive hits need to be collected or not.

The previous change caused a regression specifically when collecting
hits because we ended up removing our manual shortcut in favour of
counting which causes overhead.

With this change we reintroduce the shortcut total hit count method,
and only use it when strictly necessary. When size is 0, we rely
entirely on Lucene to shortcut the total hit counting, while when hits
are collected we do it our way, for now.

While at it, a few more tests are added to cover for situations that
were not covered before.
@javanna javanna added :Search/Search Search-related issues that do not fall into other categories >refactoring v8.8.0 labels Mar 29, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Mar 29, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@javanna
Copy link
Member Author

javanna commented Mar 29, 2023

run elasticsearch-ci/part-2

@javanna javanna requested a review from dnhatn March 29, 2023 12:28
@dnhatn
Copy link
Member

dnhatn commented Mar 29, 2023

With this change we reintroduce the shortcut total hit count method, and only use it when strictly necessary.

@javanna Would it be possible to revert #89047 first, then having the only adjustment in this PR? That would make the review easier. Thanks!

@javanna
Copy link
Member Author

javanna commented Mar 29, 2023

With this change we reintroduce the shortcut total hit count method, and only use it when strictly necessary.

@javanna Would it be possible to revert #89047 first, then having the only adjustment in this PR? That would make the review easier. Thanks!

That's what this change does, but without reinstating calling the method when size is set to 0 (EmptyTopDocsCollectorContext). What's the part that you find hard to follow?

@dnhatn
Copy link
Member

dnhatn commented Mar 29, 2023

@javanna The PR partially reverts #89047 and includes additional changes. I had to compare this PR with #89047 to make sure we don't overlook anything.

@javanna javanna changed the title Restore shortcut total hit count Leverage Weight#count when size is set to 0 Mar 29, 2023
assertThat(context.queryResult().topDocs().topDocs.totalHits.relation, equalTo(TotalHits.Relation.EQUAL_TO));
assertThat(context.queryResult().topDocs().topDocs.scoreDocs.length, equalTo(0));
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this up because we were trying to replace the parsed query in the existing context but that silently fails, hence we were testing a different scenario and having the wrong expectations as well.

@@ -496,6 +540,7 @@ public void testIndexSortingEarlyTermination() throws Exception {

QueryPhase.executeInternal(context);
assertThat(context.queryResult().topDocs().topDocs.totalHits.value, equalTo((long) numDocs));
assertThat(context.queryResult().topDocs().topDocs.totalHits.relation, equalTo(TotalHits.Relation.EQUAL_TO));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no changes in all the equal_to assertions, I just wanted to make sure we don't have regressions in the future.

@javanna
Copy link
Member Author

javanna commented Mar 29, 2023

@dnhatn this is ready ;)

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @javanna.

/**
* Collector that wraps another collector and collects only documents that have a score that's greater or equal than the
* provided minimum score. Given that this collector filters documents out, it does and should not override {@link #setWeight(Weight)},
* as that may lead to exposing total hit count that does not reflect the filtering.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this doc. I found it helpful :). Can we override and make the setWeight method final this class and have this comment in the method instead? It's totally optional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I considered that option too. It did feel slightly weird to have the same impl as the default one though. But enforcing it in the code is better than just javadocs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take these two out of this PR and open another one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #94886

@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've created a changelog YAML for you.

@javanna javanna merged commit affcad1 into elastic:main Mar 30, 2023
@javanna javanna deleted the refactoring/restore_shortcut_total_hit_count branch March 30, 2023 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.8.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants