Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use stable grouping set symbol orderings #18721

Merged
merged 2 commits into from
Sep 1, 2023

Conversation

pettyjamesm
Copy link
Member

Description

Previously, GroupIdNode grouping set and output symbol orderings were potentially unstable for the same logical sub-plan due to the use of set constructions that were not order-preserving. While this did not affect correctness, it could result in different symbol orderings within grouping sets in differing branches of UNION ALL operations with the same query on both sides. For example, a query like:

(SELECT
  shippriority, custkey, sum(totalprice)
FROM
  orders
GROUP BY ROLLUP (shippriority, custkey))
UNION ALL
(SELECT
  shippriority, custkey, sum(totalprice)
FROM
  orders
GROUP BY ROLLUP (shippriority, custkey))

could result in logical plans with GroupIdNode grouping sets of either:

  1. [[],[“tpch:shippriority$gid”],[“tpch:shippriority$gid”,“tpch:custkey$gid”]]
  2. [[],[“tpch:shippriority$gid”],[“tpch:custkey$gid”,“tpch:shippriority$gid”]]

This unnecessary variation could make the logical plan harder than necessary for a human to interpret.

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@pettyjamesm pettyjamesm requested a review from martint August 17, 2023 18:00
@cla-bot cla-bot bot added the cla-signed label Aug 17, 2023
@pettyjamesm pettyjamesm marked this pull request as ready for review August 18, 2023 14:07
@pettyjamesm pettyjamesm force-pushed the stable-rollup-plans branch 3 times, most recently from d4681c2 to 2b4af81 Compare August 22, 2023 17:43
@pettyjamesm pettyjamesm requested review from kasiafi and sopel39 August 23, 2023 17:12
@sopel39 sopel39 requested a review from lukasz-stec August 25, 2023 10:12
Copy link
Member

@lukasz-stec lukasz-stec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@pettyjamesm pettyjamesm requested a review from findepi September 1, 2023 15:13
@@ -1887,7 +1886,7 @@ public PhysicalOperation visitGroupId(GroupIdNode node, LocalExecutionPlanContex

int outputChannel = 0;

for (Symbol output : node.getGroupingSets().stream().flatMap(Collection::stream).collect(Collectors.toSet())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be enough to say toImmutableSet here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this particular usage site it would be, but the same logic of producing distinct grouping set symbols is used in a few other places inside of GroupIdNode so I think it makes sense to just use it as a public method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it would be nice to separate "fix stability" from "reduce code duplication". maybe two commits?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately all of these changes are necessary to avoid the usage of unstable-ordering Set implementations, so breaking the commit into two would just introduce temporary changes that would immediately be deleted by the code de-duplication refactor- so I think it's simpler to leave it as a single commit.

Previously, GroupIdNode grouping set and output symbol orderings were
potentially unstable for the same logical sub-plan due to the use of
set constructions that were not order-preserving. While this did not
affect correctness, it could result in different symbol orderings within
grouping sets in differing branches of UNION ALL operations with the
same query on both sides. For example, a query like:

(SELECT
  shippriority, custkey, sum(totalprice)
FROM
  orders
GROUP BY ROLLUP (shippriority, custkey))
UNION ALL
(SELECT
  shippriority, custkey, sum(totalprice)
FROM
  orders
GROUP BY ROLLUP (shippriority, custkey))

could result in logical plans with GroupIdNode grouping sets of either:
1. [[],[“tpch:shippriority$gid”],[“tpch:shippriority$gid”,“tpch:custkey$gid”]]
2. [[],[“tpch:shippriority$gid”],[“tpch:custkey$gid”,“tpch:shippriority$gid”]]

This does not affect correctness, but would make the logical plan harder
than necessary for a human to interpret unnecessarily.
@pettyjamesm pettyjamesm merged commit f1556ff into trinodb:master Sep 1, 2023
@github-actions github-actions bot added this to the 426 milestone Sep 1, 2023
@pettyjamesm pettyjamesm deleted the stable-rollup-plans branch September 1, 2023 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants