Make schema calculations for LogicalPlan::Aggregate and LogicalPlan::Distinct consistent with the HashExec #8766
Draft
Which issue does this PR close?
Closes #8738
Rationale for this change
This is a second sketch of how to close #8738
#8291 / #7647 changed DataFusion's Grouping operator so that it never dictionary encoded the output grouping columns.
Previously, the types of the input grouping expressions were the same as the types of the output group by columns.
The idea, I think, is that since the values in the group columns are unique, there is no reason to dictionary encode them (as each dictionary entry would have a single value). I am actually not sure about this, for reasons I will explain shortly.
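To make the rationale above concrete, here is a minimal, self-contained sketch in plain Rust. It does not use the real arrow-rs or DataFusion types: `ToyType`, `unencode_type`, and `dictionary_encode` are hypothetical names invented for illustration. It shows two things under those assumptions: how an output group type could be derived by stripping the dictionary wrapper from the input type, and why a dictionary built over already-unique group values ends up with one entry per row.

```rust
use std::collections::HashMap;

/// Toy stand-in for an Arrow data type (hypothetical, NOT the arrow-rs `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int32,
    Utf8,
    /// Dictionary(key type, value type)
    Dictionary(Box<ToyType>, Box<ToyType>),
}

/// Strip dictionary encoding from a type, returning the underlying value type.
/// Non-dictionary types are returned unchanged.
fn unencode_type(t: &ToyType) -> ToyType {
    match t {
        ToyType::Dictionary(_, value) => unencode_type(value),
        other => other.clone(),
    }
}

/// Toy dictionary encoding: returns (distinct values, per-row keys into them).
/// Hypothetical helper for illustration only.
fn dictionary_encode(values: &[&str]) -> (Vec<String>, Vec<usize>) {
    let mut dictionary: Vec<String> = Vec::new();
    let mut index_of: HashMap<String, usize> = HashMap::new();
    let mut keys = Vec::with_capacity(values.len());
    for v in values {
        let key = match index_of.get(*v) {
            // Value seen before: reuse its dictionary slot.
            Some(&k) => k,
            // New value: append it to the dictionary.
            None => {
                let k = dictionary.len();
                dictionary.push(v.to_string());
                index_of.insert(v.to_string(), k);
                k
            }
        };
        keys.push(key);
    }
    (dictionary, keys)
}
```

In this toy model, a grouping input of type `Dictionary(Int32, Utf8)` maps to an output type of `Utf8`, and encoding a column of distinct group values yields a dictionary as long as the column itself, so the keys add an extra array without deduplicating anything. Whether that reasoning fully holds in DataFusion is exactly the open question raised above.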
What changes are included in this PR?
This PR changes LogicalPlan::Aggregate and LogicalPlan::Distinct (which both use the HashAggregateExecutionPlan) to report a schema that is dictionary unencoded.
Are these changes tested?
Yes, there is a regression test added in #8750
Are there any user-facing changes?