Re-organize and rename aggregates physical plan #2388

Merged: 3 commits, May 2, 2022

Member

@yjshen yjshen commented Apr 30, 2022

Which issue does this PR close?

Closes #2387.

Rationale for this change

  • We currently have a hash-based implementation, GroupedHashAggregateStream, for aggregates with grouping keys, and a non-hash implementation for aggregates without grouping keys that is nonetheless named HashAggregateStream.
  • We could further extend the aggregation method from hash-based to sort-based at runtime when we run out of memory, as described in Memory Limited GroupBy (Externalized / Spill) #1570.

What changes are included in this PR?

  1. Promote hash_aggregates to a directory, aggregates, and reorganize the code inside this aggregates module.
  2. Rename HashAggregateExec to AggregateExec, since it's not always hashing.
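To illustrate why "Hash" is a misleading name for the no-grouping case, here is a minimal sketch of the two aggregation paths. The names `AggregateOutput` and `aggregate_sum` are hypothetical and are not part of DataFusion; the sketch only assumes a sum over `(key, value)` rows. With grouping keys a hash table is needed; without them a single running state is enough and nothing is hashed.

```rust
use std::collections::HashMap;

/// Hypothetical result type: either one value for the whole input
/// (no grouping keys) or one value per group (hash-based grouping).
#[derive(Debug, PartialEq)]
enum AggregateOutput {
    Single(i64),
    Grouped(HashMap<String, i64>),
}

/// Hypothetical helper: sums `value` over all rows, optionally grouped by `key`.
fn aggregate_sum(rows: &[(String, i64)], group_by_key: bool) -> AggregateOutput {
    if group_by_key {
        // Grouped path: one accumulator per distinct key, stored in a hash table.
        let mut groups: HashMap<String, i64> = HashMap::new();
        for (key, value) in rows {
            *groups.entry(key.clone()).or_insert(0) += value;
        }
        AggregateOutput::Grouped(groups)
    } else {
        // No-grouping path: a single accumulator, no hashing involved.
        AggregateOutput::Single(rows.iter().map(|(_, v)| *v).sum())
    }
}
```

Calling `aggregate_sum(&rows, false)` takes the single-state path, which is the case the old HashAggregateStream name described inaccurately.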

Are there any user-facing changes?

No.

@github-actions github-actions bot added the ballista and datafusion labels Apr 30, 2022
};

/// stream struct for aggregation without grouping columns
pub(crate) struct NoGroupingAggregateStream {
Member Author

@yjshen yjshen Apr 30, 2022

Any suggestions on the name of this single-state aggregation?

Contributor

Maybe just AggregateStream?

@@ -151,7 +151,7 @@ fn build_exec_plan_diagram(
     id: &mut AtomicUsize,
     draw_entity: bool,
 ) -> Result<usize> {
-    let operator_str = if plan.as_any().downcast_ref::<HashAggregateExec>().is_some() {
+    let operator_str = if plan.as_any().downcast_ref::<AggregateExec>().is_some() {
         "HashAggregateExec"
Member

Should we also change the operator string here to remove "Hash"?

Member Author

Thanks @andygrove. I've updated all these explanation strings and checked all occurrences of hash_aggregate and HashAggregate.
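For readers unfamiliar with the pattern in the hunk above, here is a minimal, self-contained sketch of choosing a display string by downcasting through `std::any::Any`. The trait and structs below are simplified stand-ins defined only for this example, not DataFusion's real types, and `operator_str` is a hypothetical helper. It shows why the type check and the string literal are independent and both have to be renamed.

```rust
use std::any::Any;

/// Simplified stand-in for a physical plan trait (assumption: only `as_any`).
trait ExecutionPlan {
    fn as_any(&self) -> &dyn Any;
}

/// Stand-in plan nodes used only for this illustration.
struct AggregateExec;
struct FilterExec;

impl ExecutionPlan for AggregateExec {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl ExecutionPlan for FilterExec {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Hypothetical helper: returns a label for the diagram based on the concrete type.
/// The label is a plain string literal, so renaming the type does not rename it.
fn operator_str(plan: &dyn ExecutionPlan) -> &'static str {
    if plan.as_any().downcast_ref::<AggregateExec>().is_some() {
        "AggregateExec"
    } else if plan.as_any().downcast_ref::<FilterExec>().is_some() {
        "FilterExec"
    } else {
        "Unknown"
    }
}
```

Because the compiler only checks the type parameter of `downcast_ref`, a stale label such as "HashAggregateExec" would still compile, which is why the strings had to be searched for by hand.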

Member

@andygrove andygrove left a comment

LGTM. I left one question.

@yjshen yjshen self-assigned this May 1, 2022
Contributor

@alamb alamb left a comment

Looks great -- thanks @yjshen

@@ -306,19 +306,21 @@ impl AsExecutionPlan for PhysicalPlanNode {
         Arc::new((&input_schema).try_into()?),
     )?))
 }
-PhysicalPlanType::HashAggregate(hash_agg) => {
+PhysicalPlanType::Aggregate(hash_agg) => {
Contributor

👍

@yjshen yjshen merged commit 3b42f3d into apache:master May 2, 2022