GH-41098: [Python] Add copy keyword in Array.__array__ for numpy 2.0+ compatibility #41071

Merged (8 commits) on Apr 15, 2024
3 changes: 2 additions & 1 deletion python/pyarrow/array.pxi
@@ -1543,7 +1543,8 @@ cdef class Array(_PandasConvertible):
     def _to_pandas(self, options, types_mapper=None, **kwargs):
         return _array_like_to_pandas(self, options, types_mapper=types_mapper)

-    def __array__(self, dtype=None):
+    def __array__(self, dtype=None, copy=None):
+        # TODO honor the copy keyword
Member Author

Need to open an issue for that

         values = self.to_numpy(zero_copy_only=False)
         if dtype is None:
             return values
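
For readers unfamiliar with what "honoring the copy keyword" involves: NumPy 2.0 treats `copy` as three-valued (`True` means always copy, `None` means copy only if needed, `False` means never copy and raise if a copy would be required). The following is a minimal sketch of that decision logic in plain Python. The helper name `should_copy` and its `zero_copy_possible` argument are hypothetical and are not part of pyarrow or of this PR.

```python
def should_copy(copy, zero_copy_possible):
    """Return True if a conversion should copy, following NumPy 2.0 `copy` semantics.

    Hypothetical helper for illustration only; not pyarrow API.
    """
    if copy is True:
        # The caller explicitly asked for a fresh copy.
        return True
    if copy is None:
        # Default: copy only when a zero-copy view is not available.
        return not zero_copy_possible
    if copy is False:
        # The caller forbids copying; fail loudly if that cannot be satisfied.
        if not zero_copy_possible:
            raise ValueError("a copy is required, but copy=False was requested")
        return False
    raise TypeError(f"copy must be True, False or None, got {copy!r}")
```

An `__array__` implementation could consult such a helper before deciding between a zero-copy view and a copying conversion.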
6 changes: 4 additions & 2 deletions python/pyarrow/table.pxi
@@ -525,7 +525,8 @@ cdef class ChunkedArray(_PandasConvertible):

         return values

-    def __array__(self, dtype=None):
+    def __array__(self, dtype=None, copy=None):
+        # copy keyword can be ignored because to_numpy() already returns a copy
Member

Well, copy=False should then raise an error, no?

Member Author

Ah yes, that comment predates my decision to handle the copy=False case for Array. Indeed, we should just raise an error here saying that a no-copy conversion is not possible.

Updated.

         values = self.to_numpy()
         if dtype is None:
             return values
@@ -1533,7 +1534,8 @@ cdef class _Tabular(_PandasConvertible):
         raise TypeError(f"Do not call {self.__class__.__name__}'s constructor directly, use "
                         f"one of the `{self.__class__.__name__}.from_*` functions instead.")

-    def __array__(self, dtype=None):
+    def __array__(self, dtype=None, copy=None):
+        # copy keyword can be ignored as this always already returns a copy
Member

Same here?

         column_arrays = [
             np.asarray(self.column(i), dtype=dtype) for i in range(self.num_columns)
         ]
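
The review comments on `ChunkedArray` and `_Tabular` conclude that a conversion which always copies should reject `copy=False` rather than silently ignore it. A rough sketch of that behaviour follows, using a hypothetical `AlwaysCopies` class as a stand-in; the class, its error message, and its list-backed storage are assumptions for illustration and do not reproduce the pyarrow implementation.

```python
import numpy as np


class AlwaysCopies:
    """Hypothetical container whose conversion to NumPy always allocates."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        if copy is False:
            # A no-copy conversion is impossible here, so refuse explicitly.
            raise ValueError(
                "Unable to avoid a copy while creating a NumPy array as requested."
            )
        # copy=None and copy=True both receive a freshly allocated array.
        return np.array(self._values, dtype=dtype)


obj = AlwaysCopies([1, 2, 3])
default_result = np.asarray(obj)  # default conversion, copies as needed

try:
    obj.__array__(copy=False)  # called directly so the sketch also runs on NumPy 1.x
    refused = False
except ValueError:
    refused = True
```

The sketch calls `__array__` directly for the `copy=False` case because `np.asarray` only accepts a `copy` argument from NumPy 2.0 onward.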
12 changes: 12 additions & 0 deletions python/pyarrow/tests/test_array.py
@@ -3301,6 +3301,18 @@ def test_array_from_large_pyints():
         pa.array([int(2 ** 63)])


+def test_numpy_array_protocol():
+    # test the __array__ method on pyarrow.Array
+    arr = pa.array([1, 2, 3])
+    result = np.asarray(arr)
+    expected = np.array([1, 2, 3], dtype="int64")
+    np.testing.assert_array_equal(result, expected)
+
+    # this should not raise a deprecation warning with numpy 2.0+
+    result = np.asarray(arr, copy=False)
+    np.testing.assert_array_equal(result, expected)
+
+
 def test_array_protocol():

     class MyArray: