fix: scalar reductions on empty inputs #1715
Merged
If you have comments or can explain your changes, please do so below
When performing a scalar reduction on an empty DataFrame with default options, Polars, pandas, and PyArrow produced different results. For example, computing the sum of an empty column (via Narwhals) did not give the same output on every backend.
The pandas and PyArrow backends now produce output that is consistent with Polars.
Furthermore, the result differs between `.select` and `.with_columns`: the former may return a value even if the input was empty, whereas the latter returns an empty DataFrame since its input was empty. Both the Polars and PyArrow backends already exhibited this behavior, so pandas now does the same.
See the following example for how the behavior of `.sum` was changed.
Old behavior
New behavior
`select` operations will produce a consistent scalar output even if the input was empty:
- `all` → True
- `any` → False
- `sum` → 0
- `max` → Null, or NaN if non-nullable
- `min` → Null, or NaN if non-nullable
- `mean` → Null, or NaN if non-nullable

`with_columns` returns empty if the input was empty.
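
The rules above can be sketched in plain Python, without any DataFrame backend. This is only an illustrative model, not the Narwhals implementation: the names `REDUCTIONS`, `select`, and `with_columns` below are hypothetical stand-ins, `None` plays the role of Null, and the NaN case for non-nullable dtypes is not modelled. It assumes that `with_columns` broadcasts the reduced scalar to the height of its input, which is why an empty input yields an empty result.

```python
from statistics import mean
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical plain-Python stand-ins for the reductions (not Narwhals API).
REDUCTIONS: Dict[str, Callable[[Sequence[Any]], Any]] = {
    "all": all,  # identity of logical AND is True
    "any": any,  # identity of logical OR is False
    "sum": sum,  # identity of addition is 0
    "max": lambda xs: max(xs, default=None),  # no identity element: null
    "min": lambda xs: min(xs, default=None),  # no identity element: null
    "mean": lambda xs: mean(xs) if xs else None,  # 0 / 0 is undefined: null
}


def select(column: Sequence[Any], name: str) -> List[Any]:
    """Model of `select`: a reduction always yields exactly one row."""
    return [REDUCTIONS[name](column)]


def with_columns(column: Sequence[Any], name: str) -> List[Any]:
    """Model of `with_columns`: the scalar is broadcast to the input height."""
    return [REDUCTIONS[name](column)] * len(column)


empty: List[int] = []
for name in REDUCTIONS:
    # Each reduction over an empty column still produces a single value.
    print(name, select(empty, name))

# Broadcasting a scalar to zero rows leaves nothing to show.
print(with_columns(empty, "sum"))
```

In this model `select` always returns a one-element list, while `with_columns` returns a list as long as its input, so the two only disagree when the input has no rows.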