-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: add ExtensionArray.to_numpy to have control over conversion to numpy array #30322
Changes from 4 commits
a3dab78
8e669fc
fd1c04a
92f14d2
bf9592c
278447d
1da45bd
3889986
6832f42
e8460ac
f1e34c6
1624712
34307ca
2b43793
afc7350
008b54f
91a4639
fff20a0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -780,7 +780,7 @@ def array(self) -> ExtensionArray: | |
|
||
return result | ||
|
||
def to_numpy(self, dtype=None, copy=False): | ||
def to_numpy(self, dtype=None, copy=False, **kwargs): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we instead add na_value here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is changing the signature of all EA, but are you adding tests? (e.g. StringArray / IntArray), or as followup? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Note this is not changing the signature of the EA method (this is the Series/Index method) I think in general it might be useful to pass through kwargs, in that way ExtensionArray authors can have more specific control over the conversion to numpy (if we do this, we should add a test for it for one of the test EAs, like DecimalArray, with an additional keyword in the Specifically for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. right, actually meant that. yeah I would be explict here I think. the question is can you? e.g. you want the default to be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that's what makes it more complicated here. It's really a EA-specific keyword, and the general solution for that is passing through keywords (as done here), I think. Now, since the other dtypes don't yet implement this, the default of If we add it here explicitly, I would probably use the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think this would be the better long term soln ,yes? |
||
""" | ||
A NumPy ndarray representing the values in this Series or Index. | ||
|
||
|
@@ -795,6 +795,11 @@ def to_numpy(self, dtype=None, copy=False): | |
another array. Note that ``copy=False`` does not *ensure* that | ||
``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that | ||
a copy is made, even if not strictly necessary. | ||
**kwargs | ||
Additional keywords passed through to the ``to_numpy`` method | ||
of the underlying array (for extension arrays). | ||
|
||
.. versionadded:: 1.0.0 | ||
|
||
Returns | ||
------- | ||
|
@@ -864,6 +869,14 @@ def to_numpy(self, dtype=None, copy=False): | |
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00...'], | ||
dtype='datetime64[ns]') | ||
""" | ||
if is_extension_array_dtype(self.dtype) and hasattr(self.array, "to_numpy"): | ||
return self.array.to_numpy(dtype, copy=copy, **kwargs) | ||
else: | ||
if kwargs: | ||
msg = "to_numpy() got an unexpected keyword argument '{}'".format( | ||
list(kwargs.keys())[0] | ||
) | ||
raise TypeError(msg) | ||
if is_datetime64tz_dtype(self.dtype) and dtype is None: | ||
# note: this is going to change very soon. | ||
# I have a WIP PR making this unnecessary, but it's | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we give some guidance on when no-copy is possible? Is it only when there are no missing values and we're going to the numpy dtype (bool in this case)?
And thinking forward, can a pyarrow array with no NAs be converted to an ndarray without any copies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it is only possible with no NAs and bool dtype. I first thought int8 would also be possible, but numpy doesn't seem to do such conversion without copy.
For boolean not (since it is bits, not bytes), but in general (eg for IntegerArray without nulls) yes