Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream #13995

kosiew · 2025-01-03T06:04:49Z

Which issue does this PR close?

Rationale for this change

When an aggregation operator spills intermediate (partial) state to disk, it needs a schema that includes both the group-by columns and partial-aggregator columns (e.g., partial sums, counts, etc.). Previously, the code used the original input schema for spilling, which does not match the additional columns representing aggregator states. As a result, reading back the spilled data caused a mismatch error:

ArrowError(InvalidArgumentError(
  "number of columns(3) must match number of fields(2) in schema"
))

This PR addresses that by introducing a partial aggregation schema that combines group columns and aggregator state columns, ensuring consistency when spilling and later reading the spilled data.

What changes are included in this PR?

A new helper function, build_partial_agg_schema(), creates a partial schema by merging:

Group-by fields
Each aggregator’s internal “state fields”

The aggregate operator is updated to use this partial schema when spilling or merging spilled data rather than the original (input) schema, which fixes the column mismatch error.
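To make the mismatch concrete, here is a minimal std-only sketch, not DataFusion's actual code. `Field`, `Aggregator`, `check_batch`, and the column names `b_sum` and `b_count` are hypothetical stand-ins for Arrow's schema types and an aggregator's state fields. It assumes a spilled batch carries one group column plus two state columns, and shows that the input schema rejects it while the merged partial schema accepts it.

```rust
// Simplified stand-in for an Arrow field; only the name matters here.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
}

fn field(name: &str) -> Field {
    Field { name: name.to_string() }
}

/// Hypothetical aggregator description: just its intermediate state columns.
struct Aggregator {
    state_fields: Vec<Field>,
}

/// Build `[group_col_1, ..., state_col_1, ...]` from group fields and state fields.
fn build_partial_agg_schema(group_fields: &[Field], aggregators: &[Aggregator]) -> Vec<Field> {
    let mut fields = group_fields.to_vec();
    for agg in aggregators {
        fields.extend(agg.state_fields.iter().cloned());
    }
    fields
}

/// Mimics the column-count check behind the error quoted in the description.
fn check_batch(num_columns: usize, schema: &[Field]) -> Result<(), String> {
    if num_columns != schema.len() {
        return Err(format!(
            "number of columns({}) must match number of fields({}) in schema",
            num_columns,
            schema.len()
        ));
    }
    Ok(())
}

fn main() {
    // Input schema: one group column `a` and one value column `b`.
    let input_schema = vec![field("a"), field("b")];
    // An aggregator with two state columns, so a spilled batch has 3 columns.
    let agg = Aggregator { state_fields: vec![field("b_sum"), field("b_count")] };
    let partial_schema = build_partial_agg_schema(&[field("a")], &[agg]);

    // Reading the 3-column spill with the input schema fails...
    println!("{:?}", check_batch(3, &input_schema));
    // ...while the partial aggregation schema matches.
    println!("{:?}", check_batch(3, &partial_schema));
}
```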

Are these changes tested?

Yes

Are there any user-facing changes?

No


2010YOUY01

Thank you. I found the fix easy to follow 😄, and the change makes sense to me.

I have a suggestion to improve test coverage:
Since min/max only has one intermediate aggregate state (partial min/max), we should also test aggregate functions that produce more than one intermediate state, like avg (partial sum and count).
Duplicating the existing test and modifying one of the aggregate functions to avg should be sufficient.
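As a rough illustration of why avg exercises the spill path harder than min/max, the sketch below models avg's intermediate state as two values (sum and count) that have to be written out and merged together. `AvgState` and its methods are hypothetical names for this example only, not DataFusion's accumulator API.

```rust
/// Intermediate state of AVG: two values, hence two state columns when spilled.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AvgState {
    sum: f64,
    count: u64,
}

impl AvgState {
    fn new() -> Self {
        AvgState { sum: 0.0, count: 0 }
    }

    /// Fold one input value into the partial state.
    fn update(&mut self, value: f64) {
        self.sum += value;
        self.count += 1;
    }

    /// Merge another partial state (for example, one read back from a spill).
    fn merge(&mut self, other: &AvgState) {
        self.sum += other.sum;
        self.count += other.count;
    }

    /// Produce the final result; `None` if no rows were seen.
    fn evaluate(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

fn main() {
    // First partial state, imagined as spilled to disk after two rows.
    let mut spilled = AvgState::new();
    spilled.update(1.0);
    spilled.update(2.0);

    // Second partial state, still in memory.
    let mut in_memory = AvgState::new();
    in_memory.update(6.0);

    // Merging needs both columns of the spilled state to be present.
    in_memory.merge(&spilled);
    println!("{:?} -> {:?}", in_memory, in_memory.evaluate());
}
```

A min or max state would carry a single value, so a test built only on those functions has one state column per aggregate and never covers the multi-column state case.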

2010YOUY01 · 2025-01-04T16:06:25Z

datafusion/core/src/dataframe/mod.rs

+
+        let result =
+            common::collect(single_aggregate.execute(0, Arc::clone(&task_ctx))?).await?;
+


I suggest adding an assertion here to make sure spilling actually happened in the test cases that are expected to spill. For example:

let metrics = single_aggregate.metrics(); // ...and assert some metrics inside like 'spill count' is > 0

Thanks @2010YOUY01 for the review and suggestions.
I have implemented both.

korowa · 2025-01-06T10:01:19Z

datafusion/core/src/dataframe/mod.rs

@@ -2743,6 +2754,143 @@ mod tests {
        Ok(())
    }

+    // test for https://github.com/apache/datafusion/issues/13949
+    async fn run_test_with_spill_pool_if_necessary(


I think it would be better to move this test next to the other aggregate tests in datafusion/physical-plan/src/mod.rs

hi @korowa ,

i.e., move it to datafusion/physical-plan/src/aggregates/mod.rs, am I correct?

My bad, yes, I meant aggregates/mod.rs

korowa · 2025-01-06T10:08:51Z

datafusion/physical-plan/src/aggregates/row_hash.rs

@@ -522,7 +527,7 @@ impl GroupedHashAggregateStream {
        let spill_state = SpillState {
            spills: vec![],
            spill_expr,
-            spill_schema: Arc::clone(&agg_schema),
+            spill_schema: partial_agg_schema,


It seems the issue affected only the AggregateMode::Single[Partitioned] cases: for both Final and FinalPartitioned, spill_schema is reassigned right before spilling, and the new value is the schema of the Partial output, which is exactly group_by + state fields. Perhaps we can remove this reassignment now and rely on the original spill_schema value set on stream creation? Before removing it, we need to ensure that the spill schema equals the intermediate result schema for every aggregation mode that supports spilling.

hi @korowa ,

remove this reassignment now

In other words, remove these lines, am I correct?

datafusion/datafusion/physical-plan/src/aggregates/row_hash.rs

Lines 967 to 969 in 487b952

// Use input batch (Partial mode) schema for spilling because
// the spilled data will be merged and re-evaluated later.
self.spill_state.spill_schema = batch.schema();

Yes, this line seems to be redundant now -- I'd expect all aggregation modes to have the same spill schema (which is set by this PR), so it shouldn't depend on stream input anymore.

Thanks for confirming.
The lines are removed.

korowa · 2025-01-06T10:14:25Z

datafusion/physical-plan/src/aggregates/row_hash.rs

+/// This helper function constructs such a schema:
+/// `[group_col_1, group_col_2, ..., state_col_1, state_col_2, ...]`
+/// so that partial aggregation data can be handled consistently.
+fn build_partial_agg_schema(


Perhaps instead of the new helper we could reuse aggregates::create_schema?

I checked create_schema and it handles aggregates like MIN, MAX well but it does not handle AVG which has multiple intermediate states (partial sum, partial count).

If I'm not mistaken, it should when mode = AggregateMode::Partial: in that case it returns state_fields instead of the result field.

Aaa..... 🤔
Thanks for the pointer. It does work.
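The point settled in this thread can be sketched as follows. This is a simplified std-only model, assuming only what the comments above state: in Partial mode the schema is the group columns plus each aggregate's state fields, and in other modes it is the group columns plus one result field per aggregate. `AggExpr`, the string-based fields, and the column names are hypothetical; the real aggregates::create_schema works on Arrow types.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AggregateMode {
    Partial,
    Final,
    FinalPartitioned,
    Single,
    SinglePartitioned,
}

/// Hypothetical aggregate description: one result column, one or more state columns.
struct AggExpr {
    result_field: String,
    state_fields: Vec<String>,
}

/// Toy model of a mode-dependent schema builder.
fn create_schema(group_fields: &[String], aggs: &[AggExpr], mode: AggregateMode) -> Vec<String> {
    let mut fields = group_fields.to_vec();
    match mode {
        // Partial output exposes intermediate state: group_by + state fields.
        AggregateMode::Partial => {
            for agg in aggs {
                fields.extend(agg.state_fields.iter().cloned());
            }
        }
        // Every other mode exposes one final result column per aggregate.
        AggregateMode::Final
        | AggregateMode::FinalPartitioned
        | AggregateMode::Single
        | AggregateMode::SinglePartitioned => {
            for agg in aggs {
                fields.push(agg.result_field.clone());
            }
        }
    }
    fields
}

fn main() {
    let group = vec!["a".to_string()];
    let aggs = vec![AggExpr {
        result_field: "avg_b".to_string(),
        state_fields: vec!["b_sum".to_string(), "b_count".to_string()],
    }];

    // Whatever mode the stream runs in, the spill schema is the Partial one.
    let spill_schema = create_schema(&group, &aggs, AggregateMode::Partial);
    for mode in [
        AggregateMode::Partial,
        AggregateMode::Final,
        AggregateMode::FinalPartitioned,
        AggregateMode::Single,
        AggregateMode::SinglePartitioned,
    ] {
        let output = create_schema(&group, &aggs, mode);
        println!("{:?}: output={:?} spill={:?}", mode, output, spill_schema);
    }
}
```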


korowa

LGTM, thank you @kosiew @2010YOUY01

Going to merge it tomorrow, in case anyone else would like to review it.

korowa · 2025-01-07T17:51:03Z

datafusion/core/src/dataframe/mod.rs

@@ -43,6 +38,10 @@ use crate::physical_plan::{
    ExecutionPlan, SendableRecordBatchStream,
 };
 use crate::prelude::SessionContext;
+use std::any::Any;


minor: this import reordering can be reverted to leave the file unmodified

alamb · 2025-01-08T14:16:01Z

❤️

Friede80 · 2025-01-08T15:23:01Z

Thanks for the rapid fix, @kosiew!

Refactor spill handling in GroupedHashAggregateStream to use partial aggregate schema

da2b11a

github-actions bot added the physical-expr Physical Expressions label Jan 3, 2025

kosiew added 2 commits January 3, 2025 14:12

Implement aggregate functions with spill handling in tests

01d2b60

Merge branch 'main' into fix-spill

e094adb

github-actions bot added the core Core DataFusion crate label Jan 3, 2025

Add tests for aggregate functions with and without spill handling

d066aff

kosiew marked this pull request as ready for review January 3, 2025 06:43

kosiew changed the title ~~Refactor spill handling in GroupedHashAggregateStream to use partial …~~ Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream Jan 3, 2025

kosiew added 4 commits January 3, 2025 16:32

Move test related imports into mod test

04d9123

Rename spill pool test functions for clarity and consistency

242f5ab

Refactor aggregate function imports to use fully qualified paths

270efd7

Remove outdated comments regarding input batch schema for spilling in GroupedHashAggregateStream

38ade08

2010YOUY01 reviewed Jan 4, 2025


kosiew added 2 commits January 6, 2025 09:45

Update aggregate test to use AVG instead of MAX

f4fedea

assert spill count

9d6f405

korowa reviewed Jan 6, 2025


kosiew added 5 commits January 6, 2025 20:14

Refactor partial aggregate schema creation to use create_schema function

5471775

Refactor partial aggregation schema creation and remove redundant function

b682e8c

Remove unused import of Schema from arrow::datatypes in row_hash.rs

8a00829

move spill pool testing for aggregate functions to physical-plan/src/aggregates

4e312e1

Use Arc::clone for schema references in aggregate functions

f521846

korowa approved these changes Jan 7, 2025


korowa reviewed Jan 7, 2025


2010YOUY01 approved these changes Jan 8, 2025


korowa merged commit 81b50c4 into apache:main Jan 8, 2025
25 checks passed
