Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop copying LogicalPlan and Exprs in SingleDistinctToGroupBy #10527

Conversation

appletreeisyellow
Copy link
Contributor

@appletreeisyellow appletreeisyellow commented May 15, 2024

Which issue does this PR close?

Closes #10295

Rationale for this change

Make the optimizer faster by not copying

What changes are included in this PR?

  1. Use the OptimizerRule::rewrite API
  2. Stop copying LogicalPlan and Exprs in SingleDistinctToGroupBy

Are these changes tested?

Existing CI

Are there any user-facing changes?

Functional change: No
Performance change: maybe 1-2% faster

Details
$ critcmp main SingleDistinctToGroupBy
group                                         SingleDistinctToGroupBy                main
-----                                         -----------------------                ----
interleave_batches                            1.00   469.5±15.18µs        ? ?/sec    1.01   476.4±13.45µs        ? ?/sec
logical_aggregate_with_join                   1.00   818.8±13.45µs        ? ?/sec    1.00   820.9±15.87µs        ? ?/sec
logical_plan_tpcds_all                        1.00    120.3±1.16ms        ? ?/sec    1.01    121.7±8.87ms        ? ?/sec
logical_plan_tpch_all                         1.00     12.2±0.30ms        ? ?/sec    1.00     12.2±0.27ms        ? ?/sec
logical_select_all_from_1000                  1.00     37.6±1.32ms        ? ?/sec    1.00     37.4±0.17ms        ? ?/sec
logical_select_one_from_700                   1.00   673.0±17.66µs        ? ?/sec    1.00   672.2±16.47µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   612.6±20.57µs        ? ?/sec    1.00   614.8±21.51µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   583.4±15.09µs        ? ?/sec    1.02   592.3±20.22µs        ? ?/sec
merge_batches_no_overlap_large                1.00   472.0±13.96µs        ? ?/sec    1.01   477.6±11.62µs        ? ?/sec
merge_batches_no_overlap_small                1.00   478.9±17.43µs        ? ?/sec    1.02   486.5±19.50µs        ? ?/sec
merge_batches_small_into_large                1.00   400.6±11.79µs        ? ?/sec    1.02   408.5±18.37µs        ? ?/sec
merge_batches_some_overlap_large              1.00   473.4±17.64µs        ? ?/sec    1.01   478.2±23.23µs        ? ?/sec
merge_batches_some_overlap_small              1.00   475.1±12.63µs        ? ?/sec    1.01   482.1±14.89µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1021.7±6.53ms        ? ?/sec    1.00   1021.9±5.70ms        ? ?/sec
physical_plan_tpch_all                        1.00     66.0±0.57ms        ? ?/sec    1.00     66.1±2.30ms        ? ?/sec
physical_plan_tpch_q1                         1.00      3.3±0.05ms        ? ?/sec    1.00      3.2±0.04ms        ? ?/sec
physical_plan_tpch_q10                        1.00      3.1±0.03ms        ? ?/sec    1.00      3.1±0.06ms        ? ?/sec
physical_plan_tpch_q11                        1.01      2.6±0.03ms        ? ?/sec    1.00      2.6±0.03ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.1±0.05ms        ? ?/sec    1.02      2.1±0.10ms        ? ?/sec
physical_plan_tpch_q13                        1.00  1416.7±20.17µs        ? ?/sec    1.01  1435.7±46.31µs        ? ?/sec
physical_plan_tpch_q14                        1.00  1779.1±20.39µs        ? ?/sec    1.00  1776.0±28.18µs        ? ?/sec
physical_plan_tpch_q16                        1.00      2.5±0.05ms        ? ?/sec    1.00      2.5±0.03ms        ? ?/sec
physical_plan_tpch_q17                        1.00      2.4±0.04ms        ? ?/sec    1.00      2.4±0.03ms        ? ?/sec
physical_plan_tpch_q18                        1.00      2.8±0.04ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
physical_plan_tpch_q19                        1.01      4.8±0.50ms        ? ?/sec    1.00      4.8±0.10ms        ? ?/sec
physical_plan_tpch_q2                         1.00      5.8±0.07ms        ? ?/sec    1.01      5.8±0.23ms        ? ?/sec
physical_plan_tpch_q20                        1.02      3.3±0.49ms        ? ?/sec    1.00      3.2±0.04ms        ? ?/sec
physical_plan_tpch_q21                        1.01      4.6±0.21ms        ? ?/sec    1.00      4.6±0.06ms        ? ?/sec
physical_plan_tpch_q22                        1.01      2.3±0.04ms        ? ?/sec    1.00      2.3±0.04ms        ? ?/sec
physical_plan_tpch_q3                         1.01      2.2±0.15ms        ? ?/sec    1.00      2.2±0.03ms        ? ?/sec
physical_plan_tpch_q4                         1.00  1627.1±25.40µs        ? ?/sec    1.00  1632.7±28.36µs        ? ?/sec
physical_plan_tpch_q5                         1.00      3.3±0.03ms        ? ?/sec    1.00      3.3±0.04ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1080.6±25.76µs        ? ?/sec    1.01  1087.2±30.44µs        ? ?/sec
physical_plan_tpch_q7                         1.01      4.2±0.28ms        ? ?/sec    1.00      4.1±0.09ms        ? ?/sec
physical_plan_tpch_q8                         1.00      5.4±0.07ms        ? ?/sec    1.00      5.4±0.05ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.0±0.04ms        ? ?/sec    1.00      4.0±0.05ms        ? ?/sec
physical_select_all_from_1000                 1.00    115.4±1.69ms        ? ?/sec    1.00    115.0±0.42ms        ? ?/sec
physical_select_one_from_700                  1.00      3.6±0.03ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec

@github-actions github-actions bot added the optimizer Optimizer rules label May 15, 2024
@erratic-pattern
Copy link
Contributor

Since what I was working on overlapped a bit with the code changes here, I made a PR on this branch to remove some additional clones,

@appletreeisyellow appletreeisyellow#8

@erratic-pattern
Copy link
Contributor

There are still some clones but I think they are necessary here because we are building two aggregates from one

erratic-pattern and others added 2 commits May 17, 2024 13:51
* refactor: use HashSet<&Expr> instead of HashSet<String>

* refactor: remove more cloning
@appletreeisyellow appletreeisyellow marked this pull request as ready for review May 17, 2024 19:46
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @appletreeisyellow and @erratic-pattern

This is looking quite good. The only concern I have is with using the hash value of an expression to check for set membership. Other than that this I think this PR is ready to go

I found whitespace blind diff easier to review: https://github.com/apache/datafusion/pull/10527/files?w=1

datafusion/optimizer/src/single_distinct_to_groupby.rs Outdated Show resolved Hide resolved
let arg = args.swap_remove(0);

let expr_id = distinct_aggr_exprs.hasher().hash_one(&arg);
if distinct_aggr_exprs.insert(expr_id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code appears to deduplicate distinct aggregate expressions by comparing their hash values rather than by equality. While expressions that hash to the same value are very likely the same expression, this isn't guaranteed I don't think.

I think for correctness the distinct_agg_exprs should hold the arguments by value -- e.g. distinct_agg_exprs.insert(arg.clone()) or use the display name as in the previous version

I realize this is another clone, but I think it is required for correctness

In general finding some way to compute a digest / fingerprint of an expression that can be used for equality would be great, but maybe it should be done as a separate project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping the correctness is more important here 👍

Since arg.display_name is a smaller clone than arg.clone, so I updated the code to use the display name as in the previous version. Updated in 2ba102f

@appletreeisyellow
Copy link
Contributor Author

@alamb Thanks for the review! I have updated the code according to your feedback

I found whitespace blind diff easier to review: #10527 (files)

Indeed! It greatly reduced my mental matching effort 😄

@appletreeisyellow appletreeisyellow requested a review from alamb May 20, 2024 20:48
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thanks again @appletreeisyellow

@alamb alamb merged commit 8cffb68 into apache:main May 22, 2024
24 checks passed
@appletreeisyellow appletreeisyellow deleted the chunchun/stop-copy-in-SingleDistinctToGroupBy branch May 22, 2024 13:41
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
…he#10527)

* chore: merge main and resolve conflict

* chore: use less copy

* chore: remove clone

* remove more clones (apache#8)

* refactor: use HashSet<&Expr> instead of HashSet<String>

* refactor: remove more cloning

* chore: reduce string allocation

Co-authored-by: Adam Curtis <[email protected]>

* chore: return internal error instead of panacing

* chore: use arg display_name as hash key instead of a hashed value

---------

Co-authored-by: Adam Curtis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stop copying LogicalPlan and Exprs in SingleDistinctToGroupBy
3 participants