Improve performance of aggregation operator #19425
Draft: fgwang7w wants to merge 8 commits into prestodb:master from fgwang7w:optimizemultichannelgroupby
Cherry-pick of trinodb/trino@0a70468 co-authored-by: Karol Sobczak <[email protected]>
Cherry-pick of trinodb/trino@301ff47 Co-authored-by: skrzypo987<[email protected]>
If the number of combinations of all dictionaries in a page is below a certain number, we can store the results in a small array and reuse the groups already found. Cherry-pick of trinodb/trino@ffd1ee8 Co-authored-by: skrzypo987 <[email protected]>
For simplicity and a tiny performance gain. Cherry-pick of trinodb/trino@7ec3bd0 Co-authored-by: skrzypo987 <[email protected]>
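The dictionary shortcut described in this commit can be sketched as follows. This is a minimal illustration and not the code from the cherry-picked commit: the class `DictionaryComboCache`, the threshold `MAX_COMBINATIONS`, and the use of plain `String[]` dictionaries with a `HashMap` standing in for the group-by hash table are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a group-by hash over dictionary-encoded channels.
final class DictionaryComboCache {
    // Assumed threshold; the actual limit used by the commit is not stated here.
    static final int MAX_COMBINATIONS = 1 << 16;

    private final Map<List<String>, Integer> groupIds = new HashMap<>();
    int hashLookups; // counts how often the (expensive) hash table is consulted

    // ids[channel][row] is the dictionary id of the row's value in that channel.
    int[] assignGroupIds(String[][] dictionaries, int[][] ids, int rowCount) {
        long combinations = 1;
        for (String[] dictionary : dictionaries) {
            combinations *= dictionary.length;
            if (combinations > MAX_COMBINATIONS) {
                break; // too many combinations, also avoids overflow
            }
        }
        int[] result = new int[rowCount];
        if (combinations > MAX_COMBINATIONS) {
            // Fallback: one hash lookup per row.
            for (int row = 0; row < rowCount; row++) {
                result[row] = findOrCreateGroup(dictionaries, ids, row);
            }
            return result;
        }
        // Small array indexed by combination id; -1 means "not resolved yet".
        int[] cache = new int[(int) combinations];
        Arrays.fill(cache, -1);
        for (int row = 0; row < rowCount; row++) {
            int combo = 0;
            for (int c = 0; c < dictionaries.length; c++) {
                combo = combo * dictionaries[c].length + ids[c][row];
            }
            if (cache[combo] == -1) {
                cache[combo] = findOrCreateGroup(dictionaries, ids, row);
            }
            result[row] = cache[combo]; // reuse the group found earlier
        }
        return result;
    }

    private int findOrCreateGroup(String[][] dictionaries, int[][] ids, int row) {
        hashLookups++;
        String[] key = new String[dictionaries.length];
        for (int c = 0; c < dictionaries.length; c++) {
            key[c] = dictionaries[c][ids[c][row]];
        }
        List<String> groupKey = Arrays.asList(key);
        Integer id = groupIds.get(groupKey);
        if (id == null) {
            id = groupIds.size();
            groupIds.put(groupKey, id);
        }
        return id;
    }
}
```

With two dictionaries of two entries each, a five-row page that contains three distinct combinations needs only three hash lookups in this sketch; the remaining rows are served from the small array.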
Cherry-pick of trinodb/trino@7ee53ea Co-authored-by: skrzypo987 <[email protected]>
Cherry-pick of trinodb/trino@27e0c32 Co-authored-by: skrzypo987<[email protected]>
Previously the hash table capacity was checked on every row to see whether a rehash was needed. Now the input page is split into batches, it is assumed that every row in a batch will create a new group (which is rarely the case), and rehashing is done in advance, before the batch is processed. This may slightly increase the memory footprint for a small number of groups; however, there is a tiny performance gain because the capacity is no longer checked on every row. Cherry-pick of trinodb/trino@88cd492 Co-authored-by: skrzypo987 <[email protected]>
There's an off-by-one error in the check that can cause a failure when the page is empty. Cherry-pick of trinodb/trino@08db4fb Co-authored-by: Karol Sobczak <[email protected]>
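The batching idea from this commit message can be sketched roughly as below. This is an illustration under stated assumptions rather than the cherry-picked implementation: `BatchedGroupByHash`, `BATCH_SIZE`, `FILL_RATIO`, and the single `long` grouping key with linear probing are all invented for the example.

```java
import java.util.Arrays;

// Hypothetical open-addressing hash that assigns group ids to long keys.
final class BatchedGroupByHash {
    static final int BATCH_SIZE = 1024;    // assumed batch size
    static final double FILL_RATIO = 0.75; // assumed load factor

    private long[] keys;
    private int[] groupIds; // -1 marks an empty slot
    private int mask;
    private int maxFill;
    int groupCount;
    int rehashCount;

    BatchedGroupByHash() {
        allocate(16);
    }

    void addPage(long[] values) {
        int position = 0;
        while (position < values.length) { // an empty page never enters the loop
            int batch = Math.min(BATCH_SIZE, values.length - position);
            // Pessimistic: assume every row in the batch creates a new group.
            ensureCapacity(groupCount + batch);
            for (int i = position; i < position + batch; i++) {
                putIfAbsent(values[i]); // no capacity check per row
            }
            position += batch;
        }
    }

    private void ensureCapacity(int neededGroups) {
        if (neededGroups <= maxFill) {
            return;
        }
        int newCapacity = keys.length;
        while ((int) (newCapacity * FILL_RATIO) < neededGroups) {
            newCapacity *= 2;
        }
        long[] oldKeys = keys;
        int[] oldIds = groupIds;
        allocate(newCapacity);
        for (int i = 0; i < oldIds.length; i++) {
            if (oldIds[i] != -1) {
                int slot = findSlot(oldKeys[i]);
                keys[slot] = oldKeys[i];
                groupIds[slot] = oldIds[i];
            }
        }
        rehashCount++;
    }

    private void allocate(int capacity) {
        keys = new long[capacity];
        groupIds = new int[capacity];
        Arrays.fill(groupIds, -1);
        mask = capacity - 1; // capacity is always a power of two
        maxFill = (int) (capacity * FILL_RATIO);
    }

    // Linear probing: returns the slot holding key, or the first empty slot.
    private int findSlot(long key) {
        int slot = Long.hashCode(key * 0x9E3779B97F4A7C15L) & mask;
        while (groupIds[slot] != -1 && keys[slot] != key) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    private int putIfAbsent(long key) {
        int slot = findSlot(key);
        if (groupIds[slot] == -1) {
            keys[slot] = key;
            groupIds[slot] = groupCount++;
        }
        return groupIds[slot];
    }
}
```

In this sketch a 5000-row page with only 10 distinct keys triggers a single rehash up front, and the table ends up sized for a full batch of new groups. That is the memory-for-speed trade-off the commit message mentions.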
fgwang7w force-pushed the optimizemultichannelgroupby branch from 3cb3d2a to cfcbaae (April 18, 2023 20:16)
@tdcmeehan Do you know how we solve the CLA problems?
@tdcmeehan @yingsu00 Gentle ping. We still have unresolved CLA compliance issues and need the community's support to figure out how to get the check to pass. Thanks.
Reduce large long[] memory usage and improve group-by performance
For memory optimization:
MultiChannelGroupByHash
e.g., we are looking at 64MB of long[] * 15 = 960MB of memory allocation that can be avoided
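The arithmetic behind the 960MB figure can be checked with a few lines of Java. This is only a sanity check of the numbers quoted above: the class `LongArrayFootprint` is hypothetical, the "15" is taken to mean 15 arrays of that size (the description does not say what it counts), and array object headers are ignored.

```java
// Hypothetical helper for checking the figures quoted in the description.
final class LongArrayFootprint {
    static final long MB = 1024L * 1024L;

    // Payload bytes of a long[] with the given length (array header ignored).
    static long payloadBytes(long length) {
        return length * Long.BYTES;
    }

    public static void main(String[] args) {
        long elementsPerArray = 64 * MB / Long.BYTES; // longs in a 64MB array
        int arrayCount = 15; // assumed meaning of "* 15" in the description
        long totalMb = arrayCount * payloadBytes(elementsPerArray) / MB;
        // prints "8388608 longs per array, 960 MB in total"
        System.out.println(elementsPerArray + " longs per array, " + totalMb + " MB in total");
    }
}
```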
Cherry-pick of trinodb/trino#9514
Cherry-pick of trinodb/trino#10965
Cherry-pick of trinodb/trino#12336
Cherry-pick of trinodb/trino#12597
Test Result: (sample query from tpcds-q10 with multiple grouping sets)
| Metric | Before | After |
| --- | --- | --- |
| Peak User Memory | 11.37MB | 5.65MB |
| Peak Total Memory | 78.63MB | 61.71MB |
| Elapsed Time | 7.68s | 2.08s |
Performance test on TPC-H 1TB benchmark: