Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patch: pandas-like and pyarrow scalar reduction fix #716

Merged
merged 10 commits into from
Aug 5, 2024

Conversation

FBruzzesi
Copy link
Member

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Compared to dask, this is easier to fix as we know series length due to eagerness

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below.

narwhals/_arrow/utils.py Outdated Show resolved Hide resolved
lengths = [len(s) for s in series]
max_length = max(lengths)

idx = series[lengths.index(max_length)]._native_series.index
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first index that has max length, hence the left-most non scalar index will still guide the resulting index

expected = {"min": [1, 1, 1], "max": [6, 6, 6], "a": [2, 2, 2], "b": [5, 5, 5]}
compare_dicts(result, expected)

df = nw.from_native(constructor(data))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am recreating the dataframe each time because modin was resulting is a weird weird bug, or at least I think so.

I will try to re-factor later on

@FBruzzesi FBruzzesi added the fix label Aug 4, 2024
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice one! thanks for noticing plus fixing

narwhals/_arrow/utils.py Outdated Show resolved Hide resolved
narwhals/_pandas_like/utils.py Outdated Show resolved Hide resolved
narwhals/_pandas_like/utils.py Outdated Show resolved Hide resolved
narwhals/_arrow/utils.py Outdated Show resolved Hide resolved
Comment on lines 280 to 282
pa.array(
np.full(shape=max_length, fill_value=value, dtype=np_dtype),
type=pa_dtype,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using pa.array directly because of pyarrow-numpy integration docs

def test_scalar_reduction_select(
request: Any, constructor: Any, expr: list[Any], expected: dict[str, list[Any]]
) -> None:
if "dask" in str(constructor) and int(request.node.callspec.id[-1]) != 1:
Copy link
Member Author

@FBruzzesi FBruzzesi Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't panic (just yet): this friendly request.node.callspec.id will look like the following: <constructor_name>-<id>. The id is specified in @pytest.mark.parametrize and ranges from 0 to 4 (number of test cases).

As dask passes the second test which corresponds to request.node.callspec.id = "dask_lazy_constructor-1", I am extracting the id value

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice one, thanks @FBruzzesi !

Comment on lines +270 to +271
if hasattr(value, "as_py"): # pragma: no cover
value = value.as_py()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need doing everywhere or just for self._backend_version < (13,)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check, numpy was certainly failing to create an array with a value of arrow scalar

@MarcoGorelli MarcoGorelli merged commit b0625fe into main Aug 5, 2024
23 checks passed
@FBruzzesi FBruzzesi deleted the patch/pandas-pyarrow-scalar-reduction branch August 5, 2024 20:55
aivanoved pushed a commit to aivanoved/narwhals that referenced this pull request Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants