Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Be more conservative when turning off partial aggregation #17143

Merged
merged 5 commits into from
Apr 25, 2023

Conversation

sopel39
Copy link
Member

@sopel39 sopel39 commented Apr 20, 2023

  • Compare input pages size with PA buffer size as a threshold when to consider turning PA off
  • Turn PA on periodically

RN

# General
* Make aggregation performance more stable with regard to input file size variability. ({issue}`issuenumber`)

Fixes #11361

sopel39 added 2 commits April 19, 2023 16:11
Page with 0 rows is indistinguishable
from not processing a page.
Copy link
Member

@lukasz-stec lukasz-stec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments.
Generally looks good

@sopel39
Copy link
Member Author

sopel39 commented Apr 25, 2023

Unpart results:

  label TPCH wall time TPC-DS wall time TPCH CPU time TPC-DS CPU time TPCH peak mem TPC-DS peak mem
baseline 578.440167 1261.163333 45826.2 144182.661333 6.818936e+08 1.224555e+09
baseline 570.306667 1200.626500 42817.3 135459.880500 6.725836e+08 1.227770e+09
after 560.129333 1210.207667 43115.6 137519.206667 6.681480e+08 1.236771e+09
after 579.676000 1215.169333 43279.5 136996.734167 6.666234e+08 1.233815e+09
after 574.821500 1267.355833 45980.5 145342.129333 6.770095e+08 1.227259e+09
after 559.821667 1215.303833 43221.8 137020.437000 6.707842e+08 1.227357e+09

part results:

label TPCH wall time TPC-DS wall time TPCH CPU time TPC-DS CPU time TPCH peak mem TPC-DS peak mem
baseline 551.025167 834.002500 48395.1 80963.572667 1.338932e+09 1.034911e+09
baseline 591.820500 878.372000 50361.3 84346.527167 1.282509e+09 1.032088e+09
after 547.558667 830.258833 47708.3 80774.878500 1.286471e+09 1.036311e+09
after 570.878500 832.361667 47283.5 79942.997333 1.307153e+09 1.034362e+09

sopel39 added 3 commits April 25, 2023 13:18
Partial aggregation might have better efficiency when PA buffer
is fully utilized. Therefore it doesn't make sense to disable
partial aggregation eagerly after small amount of input data was
processed (e.g. for few small initial splits which didn't fully
utilized PA buffer).
Re-enable partial aggregation periodically in
case aggregation efficiency improved while
data is being processed (e.g. larger splits
are observed).
@sopel39 sopel39 merged commit e4ad6b6 into trinodb:master Apr 25, 2023
@sopel39 sopel39 deleted the ks/pa_improve branch April 25, 2023 11:21
@github-actions github-actions bot added this to the 415 milestone Apr 25, 2023
prithvip added a commit to prithvip/presto that referenced this pull request Nov 27, 2023
When partial aggregation is not effectively reducing cardinality, instead send
raw input rows directly to the final aggregation step.

Port of:
github.com/trinodb/trino/pull/11011
github.com/trinodb/trino/pull/17143

Co-authored-by: Lukasz Stec <[email protected]>
Co-authored-by: Karol Sobczak <[email protected]>
prithvip added a commit to prestodb/presto that referenced this pull request Nov 27, 2023
When partial aggregation is not effectively reducing cardinality, instead send
raw input rows directly to the final aggregation step.

Port of:
github.com/trinodb/trino/pull/11011
github.com/trinodb/trino/pull/17143

Co-authored-by: Lukasz Stec <[email protected]>
Co-authored-by: Karol Sobczak <[email protected]>
mlyublena pushed a commit to prestodb/presto that referenced this pull request Nov 28, 2023
When partial aggregation is not effectively reducing cardinality, instead send
raw input rows directly to the final aggregation step.

Port of:
github.com/trinodb/trino/pull/11011
github.com/trinodb/trino/pull/17143

Co-authored-by: Lukasz Stec <[email protected]>
Co-authored-by: Karol Sobczak <[email protected]>
kaikalur pushed a commit to kaikalur/presto that referenced this pull request Mar 14, 2024
When partial aggregation is not effectively reducing cardinality, instead send
raw input rows directly to the final aggregation step.

Port of:
github.com/trinodb/trino/pull/11011
github.com/trinodb/trino/pull/17143

Co-authored-by: Lukasz Stec <[email protected]>
Co-authored-by: Karol Sobczak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Support re-enabling partial aggregation adaptively
3 participants