-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: fix pyarrow len
aggregation behaviour for depth-1 exprs
#1589
fix: fix pyarrow len
aggregation behaviour for depth-1 exprs
#1589
Conversation
@pytest.mark.parametrize( | ||
("expr", "expected"), | ||
[ | ||
(nw.col("b").sum(), {"a": [1, 2], "b": [3, 3]}), | ||
(nw.col("b").mean(), {"a": [1, 2], "b": [1.5, 3]}), | ||
(nw.col("b").max(), {"a": [1, 2], "b": [2, 3]}), | ||
(nw.col("b").min(), {"a": [1, 2], "b": [1, 3]}), | ||
(nw.col("b").std(), {"a": [1, 2], "b": [0.707107, None]}), | ||
(nw.col("b").len(), {"a": [1, 2], "b": [3, 1]}), | ||
(nw.col("b").n_unique(), {"a": [1, 2], "b": [3, 1]}), | ||
(nw.col("b").count(), {"a": [1, 2], "b": [2, 1]}), | ||
], | ||
) | ||
def test_group_by_depth_1_agg( | ||
constructor: Constructor, | ||
expr: nw.Expr, | ||
expected: dict[str, list[int | float]], | ||
) -> None: | ||
data = {"a": [1, 1, 1, 2], "b": [1, None, 2, 3]} | ||
result = nw.from_native(constructor(data)).group_by("a").agg(expr).sort("a") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also median
is failing here for pyarrow π€, reason why I've left it out and maintained the original test for it. Somehow, though, we're pointing it out in docs that median's results might slightly differ across backends due to differences in the underlying algorithms...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a tonne for spotting and doing this, just got a comment
narwhals/_arrow/group_by.py
Outdated
function_name = POLARS_TO_ARROW_AGGREGATIONS.get(function_name, function_name) | ||
|
||
option = get_function_name_option(function_name) | ||
if expr._function_name != "col->len": | ||
option = get_function_name_option(function_name) | ||
else: | ||
option = pc.CountOptions(mode="all") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be cleaner to have
function_name, option = POLARS_TO_ARROW_AGGREGATIONS[function_name]
, and map to all the options in POLARS_TO_ARROW_AGGREGATIONS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below
Adds a specific condition for dealing with pyarrow depth-1
len
expressions in group-by contexts.