-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize composite aggregation based on index sorting #48399
Conversation
…sort is applicable. Add index sort uts.
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
@howardhuanghua I couldn't push directly to your branch so I opened a new pr that retains your commit. I hope you don't mind if we continue to work here since we both have the right to push modifications. I merged the two approaches that we discussed in the original pr so I think this pr is ready for a review. We still need to update the documentation but I wanted to get your opinion first. |
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeAggregator.java
Show resolved
Hide resolved
Hi @jimczi , I am ok and thanks for your merging. It's my pleasure that I could contribute on this optimization with you together. I left a comment about the merged code, please help to check. Thank you. |
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeAggregator.java
Outdated
Show resolved
Hide resolved
...st/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeAggregatorTests.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeAggregator.java
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeAggregator.java
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeAggregator.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeAggregator.java
Outdated
Show resolved
Hide resolved
Thanks for looking @jpountz and @howardhuanghua . I pushed some commits to address your comments and added a section in the docs to explain the early termination. Could you take another look ? |
private void testSearchCase(List<Query> queries, | ||
List<Map<String, List<Object>>> dataset, | ||
Supplier<CompositeAggregationBuilder> create, | ||
Consumer<InternalComposite> verify) throws IOException { | ||
for (Query query : queries) { | ||
executeTestCase(false, query, dataset, create, verify); | ||
executeTestCase(true, query, dataset, create, verify); | ||
executeTestCase(false, false, query, dataset, create, verify); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these two TestCase set useIndexSort
as false, then it could not be early terminated. This would make testEarlyTermination()
failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testEarlyTermination
uses executeTestCase
directly and sets useIndexSort
to true so I think we're good here. However I found a test bug when the AssertingCodec is used, this codec wraps docvalues and that breaks DocValues#unwrapSingleton
contract. So I pushed 1fa763b to suppress this code in the early termination tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even suppressed codecs, it seems still got the chance to use
keyword -> {AssertingDocValuesFormat$AssertingDocValuesProducer@4968}
. And testEarlyTermination
would failed occasionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch thanks, I pushed b52d5c7 to force the suppression of the Asserting codec but that means that the test is ignored when the randomization choose this codec. I need to dig a bit more to understand how we can ensure that the test always runs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jimczi , I have checked that system properties like tests.codec
, tests.docvaluesformat
and tests.postingsformat
default value is random
in LuceneTestCase.java
, AssertingCodec
would be selected.
It seems we could set default Lucene80
codec directly in executeTestCase
method as follow to solve the problem:
if (indexSort != null) {
config.setCodec(TestUtil.getDefaultCodec()); // set codec as default to avoid AssertingCodec
config.setIndexSort(indexSort);
}
|
||
The `composite` aggregation can early terminate the collection under some circumstances: | ||
|
||
* If the primary source extracts values from a numeric or a keyword field and the query matches all documents (`match_all` query). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Early terminate optimization could also be used for non match_all
query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only if the index is sorted, yes. I also wanted to mention that we early terminate if the index is not sorted but I see how it's confusing now. I pushed 1fa763b to reword the paragraph to emphasize on index sorting first.
...st/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeAggregatorTests.java
Show resolved
Hide resolved
I pushed another commit to address the remaining TODOs and the reviews. There is still one TODO left regarding how we could handle |
...java/org/elasticsearch/search/aggregations/bucket/composite/CompositeValuesSourceConfig.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/search/aggregations/bucket/composite/CompositeValuesSourceConfig.java
Show resolved
Hide resolved
Hi @jimczi , LGTM. Thanks for your great efforts to fix these issues. |
Hi @jimczi, would you please help to merge this PR? Or we need further review? Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first pass and left some questions. I will be doing a second pass and focus on the tests.
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeAggregator.java
Show resolved
Hide resolved
...st/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeAggregatorTests.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/searchafter/SearchAfterBuilder.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/internal/ContextIndexSearcher.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic makes sense to me. lgtm
Thanks @howardhuanghua and sorry for the delay. I merged in master and added you as co-author. I'll now backport to 7.x. |
Co-authored-by: Daniel Huang <[email protected]> This is a spinoff of #48130 that generalizes the proposal to allow early termination with the composite aggregation when leading sources match a prefix or the entire index sort specification. In such case the composite aggregation can use the index sort natural order to early terminate the collection when it reaches a composite key that is greater than the bottom of the queue. The optimization is also applicable when a query other than match_all is provided. However the optimization is deactivated for sources that match the index sort in the following cases: * Multi-valued source, in such case early termination is not possible. * missing_bucket is set to true
@jimczi It's my pleasure to co-work with you on this optimization. Thanks for for your help. |
Co-authored-by: Daniel Huang <[email protected]> This is a spinoff of #48130 that generalizes the proposal to allow early termination with the composite aggregation when leading sources match a prefix or the entire index sort specification. In such case the composite aggregation can use the index sort natural order to early terminate the collection when it reaches a composite key that is greater than the bottom of the queue. The optimization is also applicable when a query other than match_all is provided. However the optimization is deactivated for sources that match the index sort in the following cases: * Multi-valued source, in such case early termination is not possible. * missing_bucket is set to true
Co-authored-by: Daniel Huang <[email protected]> This is a spinoff of elastic#48130 that generalizes the proposal to allow early termination with the composite aggregation when leading sources match a prefix or the entire index sort specification. In such case the composite aggregation can use the index sort natural order to early terminate the collection when it reaches a composite key that is greater than the bottom of the queue. The optimization is also applicable when a query other than match_all is provided. However the optimization is deactivated for sources that match the index sort in the following cases: * Multi-valued source, in such case early termination is not possible. * missing_bucket is set to true
@@ -73,6 +86,8 @@ | |||
private RoaringDocIdSet.Builder docIdSetBuilder; | |||
private BucketCollector deferredCollectors; | |||
|
|||
private boolean earlyTerminated; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
提前结束
This is a spinoff of #48130 that generalizes the proposal to allow early termination with the composite aggregation when leading sources match a prefix or the entire index sort specification.
In such case the composite aggregation can use the index sort natural order to early terminate the collection when it reaches a composite key that is greater than the bottom of the queue.
The optimization is also applicable when a query other than
match_all
is provided. However the optimization is deactivated for sources that match the index sort in the following cases:missing_bucket
is set to trueI retained the commit from #48130 and merged the original idea that @howardhuanghua described in the pr to early terminate even when the sort is reversed in the leading source.