-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-22.2: xform: use ordering from LIMIT as a hint for streaming group-by #94603
Conversation
1f154d4
to
f7a9ff4
Compare
f7a9ff4
to
f8033dc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r1, 10 of 10 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach and @rharding6373)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a pressing motivation for backporting this rule to v22.2? I think we should probably gate it behind a session setting to avoid changing customers' query plans in a patch release.
FYI, you can backport commits from two PRs with the backport tool: backport 12345 98765 -r 22.2
(not sure if that's what you did, but looks like you did this more manually based on the PR description).
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach and @rharding6373)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a pressing motivation for backporting this rule to v22.2?
This is being backported per request from @Leeeeeeeroy-Jenkins for issue https://github.com/cockroachlabs/support/issues/1949
I think we should probably gate it behind a session setting to avoid changing customers' query plans in a patch release.
It is gated behind session setting optimizer_use_limit_ordering_for_streaming_group_by
in this PR.
FYI, you can backport commits from two PRs with the backport tool...
Thanks
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach and @rharding6373)
I just saw #94672. Can you include that commit in this backport? |
Oh I see, sorry! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented the session setting first on 22.2, that's why you don't see a backport message of the 2nd commit.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach and @rharding6373)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @cucaroach)
Fixes cockroachdb#93410 A query with a grouped aggregation, a LIMIT and an ORDER BY may not always explore the best-cost query plan. Due to the existence of unique constraints on a table, the set of grouping columns may be reduced during normalization via rule ReduceGroupingCols such that it no longer includes columns present in the ORDER BY clause. This eliminates possibly cheap plans from consideration, for example, if the input to the aggregation is a lookup join, it may be cheaper to sort the input to the lookup join on the ORDER BY columns if they overlap with the grouping columns, so that a streaming group-by with no TopK operator can be used, and a full scan of the inputs to the join is avoided. This fix adds a new exploration rule which ensures that a grouped aggregation with a LIMIT and ORDER BY clause considers using streaming group-by with no TopK when possible. Release note (bug fix): This patch fixes join queries involving tables with unique constraints using LIMIT, GROUP BY and ORDER BY clauses to ensure the optimizer considers streaming group-by with no TopK operation, when possible. This is often the most efficient query plan.
…setting This commit adds the `optimizer_use_limit_ordering_for_streaming_group_by` session setting that defaults to `false`. When `true`, an exploration rule which uses the ordering specified in a `SELECT ... GROUP BY ... ORDER BY ... LIMIT n;` statement as an interesting ordering to require from the input to the group-by expression, possibly eliminating a top-k operation. When `false`, the exploration rule is disabled. This setting allows backport of the new exploration rule without impacting query plans of running clusters. Release note: None
f8033dc
to
d102acb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @cucaroach and @DrewKimball)
Backport 1/1 commits from #93858.
/cc @cockroachdb/release
Fixes #93410
A query with a grouped aggregation, a LIMIT and an ORDER BY may not always explore the best-cost query plan.
Due to the existence of unique constraints on a table, the set of grouping columns may be reduced during normalization via rule ReduceGroupingCols such that it no longer includes columns present in the ORDER BY clause. This eliminates possibly cheap plans from consideration, for example, if the input to the aggregation is a lookup join, it may be cheaper to sort the input to the lookup join on the ORDER BY columns if they overlap with the grouping columns, so that a streaming group-by with no TopK operator can be used, and a full scan of the inputs to the join is avoided.
This fix adds a new exploration rule which ensures that a grouped aggregation with a LIMIT and ORDER BY clause considers using streaming group-by with no TopK when possible.
Release note (bug fix): This patch fixes join queries involving tables with unique constraints using LIMIT, GROUP BY and ORDER BY clauses to ensure the optimizer considers streaming group-by with no TopK operation, when possible. This is often the most efficient query plan.
Release justification: low risk fix for missing exploration of streaming group-by, disabled by default
sql: add optimizer_use_limit_ordering_for_streaming_group_by session setting
This commit adds the
optimizer_use_limit_ordering_for_streaming_group_by
session setting that defaults to
false
. Whentrue
, an explorationrule which uses the ordering specified in a
SELECT ... GROUP BY ... ORDER BY ... LIMIT n;
statement as aninteresting ordering to require from the input to the group-by
expression, possibly eliminating a top-k operation. When
false
, theexploration rule is disabled. This setting allows backport of the new
exploration rule without impacting query plans of running clusters.
Release note: None