Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: alias group_by exprs in single_distinct_to_groupby optimizer #3305

Merged
merged 6 commits into from
Sep 7, 2022

Conversation

waynexia
Copy link
Member

Signed-off-by: Ruihang Xia [email protected]

Which issue does this PR close?

Closes #2994.

Rationale for this change

The original issue is apache/horaedb#59, and a minimal reproducer can be found in #2994.

This is trying to fix the single_distinct_to_groupby optimizer cannot handle a GROUP BY with an expression, like GROUP BY c+1. To fix this, we need to alias the "original" GROUP BY in the subquery and refer to that aliased column in outer. E.g.:

Before optimize
SELECT id+1, count(distinct(bank_account))
FROM users
GROUP BY id+1;

After optimize

SELECT group_by_alias1 as "id+1", count(bank_account)
FROM (
  SELECT id+1 as group_by_alias1, bank_account
  FROM users 
  GROUP BY id+1, bank_account
)
GROUP BY group_by_alias1;

What changes are included in this PR?

  • alias group_by exprs
  • rename some symbols and add comment
  • a test to cover this case group_by_with_expr

Are there any user-facing changes?

No

@codecov-commenter
Copy link

Codecov Report

Merging #3305 (578de65) into master (c574269) will decrease coverage by 0.08%.
The diff coverage is 97.67%.

@@            Coverage Diff             @@
##           master    #3305      +/-   ##
==========================================
- Coverage   85.83%   85.75%   -0.09%     
==========================================
  Files         293      294       +1     
  Lines       53170    53750     +580     
==========================================
+ Hits        45637    46091     +454     
- Misses       7533     7659     +126     
Impacted Files Coverage Δ
...fusion/optimizer/src/single_distinct_to_groupby.rs 98.89% <97.67%> (+0.07%) ⬆️
datafusion/expr/src/expr.rs 77.97% <0.00%> (-7.43%) ⬇️
datafusion/expr/src/expr_schema.rs 63.58% <0.00%> (-6.02%) ⬇️
datafusion/expr/src/expr_visitor.rs 59.30% <0.00%> (-4.45%) ⬇️
datafusion/expr/src/expr_rewriter.rs 83.75% <0.00%> (-2.34%) ⬇️
datafusion/sql/src/utils.rs 49.62% <0.00%> (-2.33%) ⬇️
datafusion/proto/src/to_proto.rs 50.99% <0.00%> (-1.86%) ⬇️
...tafusion/optimizer/src/common_subexpr_eliminate.rs 92.85% <0.00%> (-1.45%) ⬇️
datafusion/core/src/physical_plan/planner.rs 79.44% <0.00%> (-1.44%) ⬇️
datafusion/expr/src/window_frame.rs 92.43% <0.00%> (-0.85%) ⬇️
... and 32 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

input.schema().metadata().clone(),
)
.unwrap();
let grouped_agg = LogicalPlan::Aggregate(Aggregate {
let grouped_aggr = LogicalPlan::Aggregate(Aggregate {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Aggregate::try_new here, once #3286 is merged

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have removed all unwraps

Signed-off-by: Ruihang Xia <[email protected]>
@andygrove
Copy link
Member

test failure caused by #3327

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me -- thanks @waynexia

@alamb
Copy link
Contributor

alamb commented Sep 6, 2022

Sorry for the delay in reviews -- I was out for a week and the DataFusion has been very busy

@alamb
Copy link
Contributor

alamb commented Sep 6, 2022

Screen Shot 2022-09-06 at 10 50 13 AM

I took the liberty of merging this branch with master to resolve merge conflicts. I'll plan to merge it when CI passes

@waynexia
Copy link
Member Author

waynexia commented Sep 7, 2022

I took the liberty of merging this branch with master to resolve merge conflicts.

Thanks! I've reviewed 94c014e and it looks fine 👍

@alamb alamb merged commit 79922b4 into apache:master Sep 7, 2022
@alamb
Copy link
Contributor

alamb commented Sep 7, 2022

Thanks again @waynexia

@ursabot
Copy link

ursabot commented Sep 7, 2022

Benchmark runs are scheduled for baseline = d16457a and contender = 79922b4. 79922b4 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@waynexia waynexia deleted the fix-2994 branch September 12, 2022 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

single_distinct_to_groupby panic when group by expr is a binaryExpr
5 participants