Improve `SingleDistinctToGroupBy` to get the same plan as the `group by` query #11360
Comments
For a query like the one in the datafusion test and the duckdb test, I think we can produce the same plan in both cases.
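For illustration, a minimal sketch of the kind of query this rule handles (table and column names are hypothetical, not the ones from the linked tests):

```sql
-- Hypothetical example: a single distinct aggregate.
SELECT COUNT(DISTINCT a) FROM t;

-- SingleDistinctToGroupBy rewrites it into a two-level aggregate,
-- roughly equivalent to this explicit group-by form:
SELECT COUNT(a) FROM (SELECT a FROM t GROUP BY a);
```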
What is the rationale of single distinct to group by?
Before we had specialized and optimized accumulators (e.g. the groups accumulators), I think the hash group by was more efficient than using special aggregators. Now that we have special accumulators for distinct aggregates, it may not be needed; we could run some benchmarks to check. It also lets you reuse the grouping if the same argument is shared between accumulators.
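A sketch of the grouping-reuse point (hypothetical names, assuming two distinct aggregates over the same column):

```sql
-- Both distinct aggregates share the argument a, so one inner
-- grouping on (b, a) can serve both of them.
SELECT COUNT(DISTINCT a), SUM(DISTINCT a) FROM t GROUP BY b;

-- Sketch of the rewritten two-level form: the inner aggregate
-- deduplicates (b, a) once, the outer one computes both results.
SELECT COUNT(a), SUM(a)
FROM (SELECT b, a FROM t GROUP BY b, a)
GROUP BY b;
```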
There is no clear advantage either way, whether we convert to group by or not.
Is your feature request related to a problem or challenge?

While working on #11299, I ran into the issue that the single distinct plan is different from the `group by` plan: https://github.com/apache/datafusion/pull/11299/files#r1667248774

I worked around it by handling the different values received in `update_batch`, but I don't think that addresses the root cause. `SingleDistinctToGroupBy` converts a `distinct` aggregate into a `group by` expression, so ideally the optimized plan should be the same as the `group by` version, but the resulting plan is not what I expect.

Describe the solution you'd like
Rewrite `SingleDistinctToGroupBy` so that the optimized plan is the same as the `group by` version. I think it is possible to produce just one `Aggregate` when the outer group by expression list is empty and the inner aggregate expression list is empty.
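A minimal sketch of that special case (hypothetical table and column names; the SQL forms below stand in for the logical plans):

```sql
-- Two-level form the current rewrite effectively produces:
SELECT COUNT(alias1) FROM (SELECT a AS alias1 FROM t GROUP BY a);

-- With no outer group by and no inner aggregate expressions, a single
-- Aggregate relying on the distinct accumulator could suffice:
SELECT COUNT(DISTINCT a) FROM t;
```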
Describe alternatives you've considered

Do nothing, but add docs explaining why we can't produce the same plan.
Additional context
No response