patch: pandas-like and pyarrow scalar reduction fix #716

FBruzzesi · 2024-08-04T15:40:22Z

What type of PR is this? (check all applicable)

Related issues

Related issue: [Bug]: DaskExpr reduction to scalar can't be select-ed #694. As mentioned in a comment there, the behavior reported in #694 also fails to match polars in some cases for pandas and pyarrow.

Compared to dask, this is easier to fix: these backends are eager, so we know the length of each series.

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below.

narwhals/_arrow/utils.py

FBruzzesi · 2024-08-04T16:09:29Z

narwhals/_pandas_like/utils.py

+    lengths = [len(s) for s in series]
+    max_length = max(lengths)
+
+    idx = series[lengths.index(max_length)]._native_series.index


This picks the index of the first series that has max length, so the left-most non-scalar series still determines the resulting index.
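A minimal sketch of that selection rule, using plain pandas Series rather than narwhals' internal wrappers (so `.index` stands in for `._native_series.index`). The helper name `pick_guiding_index` and the sample data are hypothetical and are not part of the PR.

```python
import pandas as pd


def pick_guiding_index(series_list):
    """Return the index of the first (left-most) series with maximal length."""
    lengths = [len(s) for s in series_list]
    max_length = max(lengths)
    # list.index returns the position of the first match,
    # so ties between equally long series go to the left-most one.
    return series_list[lengths.index(max_length)].index


# A length-1 series standing in for a scalar reduction such as max().
scalar_like = pd.Series([6], index=[0])
left = pd.Series([1, 2, 3], index=[10, 11, 12])
right = pd.Series([4, 5, 6], index=[20, 21, 22])

idx = pick_guiding_index([scalar_like, left, right])

# Broadcast the reduced value along the chosen index.
broadcast = pd.Series([scalar_like.iloc[0]] * len(idx), index=idx)
```

Here `left` and `right` have the same length, and the index of `left` is the one that gets reused for the broadcast scalar.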

FBruzzesi · 2024-08-04T16:10:40Z

tests/expr_and_series/reduction_test.py

+    expected = {"min": [1, 1, 1], "max": [6, 6, 6], "a": [2, 2, 2], "b": [5, 5, 5]}
+    compare_dicts(result, expected)
+
+    df = nw.from_native(constructor(data))


I am recreating the dataframe each time because modin was hitting a very weird bug, or at least I think so.

I will try to refactor later on.

MarcoGorelli

nice one! thanks for noticing plus fixing

narwhals/_arrow/utils.py

narwhals/_pandas_like/utils.py

narwhals/_arrow/utils.py

FBruzzesi · 2024-08-05T17:46:38Z

narwhals/_arrow/utils.py

+                pa.array(
+                    np.full(shape=max_length, fill_value=value, dtype=np_dtype),
+                    type=pa_dtype,


Using pa.array directly, following the pyarrow-numpy integration docs.

FBruzzesi · 2024-08-05T18:10:49Z

tests/expr_and_series/reduction_test.py

+def test_scalar_reduction_select(
+    request: Any, constructor: Any, expr: list[Any], expected: dict[str, list[Any]]
+) -> None:
+    if "dask" in str(constructor) and int(request.node.callspec.id[-1]) != 1:


Don't panic (just yet): this friendly request.node.callspec.id looks like <constructor_name>-<id>. The id is the one specified in @pytest.mark.parametrize and ranges from 0 to 4 (one per test case).

Dask passes the second test case, which corresponds to request.node.callspec.id = "dask_lazy_constructor-1", so I am extracting the id value to single that case out.
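For illustration, the id extraction can be tried outside pytest on plain strings. Both helper names below are hypothetical. The first mirrors the `[-1]` lookup from the diff; the second is an alternative, not used in the PR, that would also cope with ids of more than one digit.

```python
def case_id_last_char(callspec_id: str) -> int:
    """Mirror the diff: read the last character of '<constructor_name>-<id>'."""
    # Only valid while the parametrize ids stay single-digit.
    return int(callspec_id[-1])


def case_id_rsplit(callspec_id: str) -> int:
    """Alternative: split on the last '-' so multi-digit ids also parse."""
    return int(callspec_id.rsplit("-", 1)[1])


def is_non_passing_dask_case(constructor_name: str, callspec_id: str) -> bool:
    """Same condition as the diff, taking the constructor name as a string."""
    return "dask" in constructor_name and case_id_last_char(callspec_id) != 1


dask_passing = is_non_passing_dask_case(
    "dask_lazy_constructor", "dask_lazy_constructor-1"
)
dask_other = is_non_passing_dask_case(
    "dask_lazy_constructor", "dask_lazy_constructor-3"
)
```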

MarcoGorelli

nice one, thanks @FBruzzesi !

MarcoGorelli · 2024-08-05T20:44:12Z

narwhals/_arrow/utils.py

+            if hasattr(value, "as_py"):  # pragma: no cover
+                value = value.as_py()


does this need doing everywhere or just for self._backend_version < (13,)?

Need to check; numpy was certainly failing to create an array from an arrow scalar value.

FBruzzesi added 4 commits August 4, 2024 17:30

patch: scalar reduction select for pandas-like and pyarrow

d4eaf63

patch: scalar reduction select for pandas-like and pyarrow

1522cdb

increadibly weird bug for moding?

e7e0b56

different implementation to find index

be86ef3

FBruzzesi commented Aug 4, 2024

FBruzzesi commented Aug 4, 2024

FBruzzesi added the fix label Aug 4, 2024

FBruzzesi added 2 commits August 5, 2024 16:14

Merge branch 'main' into patch/pandas-pyarrow-scalar-reduction

9a0ee91

WIP

1445257

MarcoGorelli reviewed Aug 5, 2024

arrow to numpy dtype

cde6048

FBruzzesi commented Aug 5, 2024

FBruzzesi commented Aug 5, 2024

FBruzzesi added 2 commits August 5, 2024 19:49

series item -> iloc

bcdb720

test refactor

7220f88

FBruzzesi commented Aug 5, 2024

rollback arrow broadcast

f660e7f

MarcoGorelli approved these changes Aug 5, 2024

MarcoGorelli merged commit b0625fe into main Aug 5, 2024
23 checks passed

FBruzzesi deleted the patch/pandas-pyarrow-scalar-reduction branch August 5, 2024 20:55

aivanoved pushed a commit to aivanoved/narwhals that referenced this pull request Aug 6, 2024

patch: pandas-like and pyarrow scalar reduction fix (narwhals-dev#716)

cbb1165
