API: Uses pd.NA in IntegerArray #29964
pandas/core/arrays/integer.py
Outdated
@@ -377,14 +377,28 @@ def __getitem__(self, item):
             return self._data[item]
         return type(self)(self._data[item], self._mask[item])

-    def _coerce_to_ndarray(self):
+    def _coerce_to_ndarray(self, dtype=None, na_value=libmissing.NA):
I think we'll want to make a to_array that's basically this method.
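For context, here is a minimal sketch of what a method with the new signature might do, assuming the array keeps its values in an integer ndarray plus a boolean mask (the `_data` / `_mask` pair visible in `__getitem__` above). The function name `coerce_to_ndarray_sketch` and the object-dtype default are illustrative assumptions, not the code in this PR:

```python
import numpy as np
import pandas as pd


def coerce_to_ndarray_sketch(data, mask, dtype=None, na_value=pd.NA):
    """Hypothetical stand-in for the method under review."""
    if dtype is None:
        # Assumption: default to object dtype, which can hold pd.NA itself.
        dtype = object
    out = data.astype(dtype)  # astype copies, so `data` is left untouched
    out[mask] = na_value      # overwrite the masked (missing) positions
    return out


data = np.array([1, 2, 3], dtype="int64")
mask = np.array([False, True, False])

as_object = coerce_to_ndarray_sketch(data, mask)
as_float = coerce_to_ndarray_sketch(data, mask, dtype="float64", na_value=np.nan)
```

Passing `na_value=np.nan` together with a float dtype gives the NaN-filled result, while the default keeps `pd.NA` in an object array.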
we already have to_numeric, which is the canonical form of to_array (I'd rather do conversions there)
Jeff, the discussion about to_numpy (the to_array was a typo, I think) moved to #30038 in the meantime. Can you move your comment there if relevant?
Note that to_numeric is a function that converts anything to a numeric type, while the method here converts a numeric type (this IntegerArray) to any other NumPy dtype.
Sorry, to_array should have been to_numpy.
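The two directions contrasted in this thread can be illustrated with a short sketch. It assumes a pandas version in which `to_numpy` on nullable arrays accepts `dtype` and `na_value` (the API that came out of the linked discussion); the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

# pd.to_numeric goes *towards* a numeric dtype: here strings are parsed to integers.
parsed = pd.to_numeric(["1", "2", "3"])

# The conversion discussed here goes the other way: from a nullable integer array
# to a NumPy array of a chosen dtype, with an explicit fill value for missing entries.
arr = pd.array([1, 2, None], dtype="Int64")
as_float = arr.to_numpy(dtype="float64", na_value=np.nan)

print(parsed.dtype, as_float.dtype)
```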
Still some failing tests. Will have to put this on hold for a day or two now.
I'm not able to reproduce these locally.
What's the status of this? Is this ready to go? (with the special cases for pow etc. already merged in other PRs)
@@ -643,25 +665,30 @@ def cmp_method(self, other):
     if len(self) != len(other):
         raise ValueError("Lengths must match to compare")

-# numpy will show a DeprecationWarning on invalid elementwise
-# comparisons, this will raise in the future
See previous question about this. Is this comment no longer relevant or correct? Or why was it removed?
I'm not sure, do you know how this is actually hit? If NumPy is going to raise in the future, shouldn't they be seeing that warning?
I think this is about the warning you get with comparisons with objects / non-broadcastable arrays. Eg:
In [29]: np.array([1, 2]) == "b"
/home/joris/miniconda3/envs/dev/bin/ipython:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
#!/home/joris/miniconda3/envs/dev/bin/python
Out[29]: False
In [30]: pd.array([1, 2]) == "b"
Out[30]: array([False, False])
(it seems IntegerArray already handles this fine, not sure there is an explicit test for that)
> it seems IntegerArray already handles this fine,
Gotcha. It's silencing the same warning from NumPy, and falling back to invalid_comparison, which returns the expected result. I'll restore the comment.
Actually... the comment is incorrect. NumPy will perform elementwise comparison in the future, not raise. If they were to raise on that in the future the implementation would be incorrect.
Though I'm still a bit confused, as the NumPy op is returning NotImplemented since we're calling it directly. Will that continue to return NotImplemented? Or will the elementwise result be different?
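A small script along these lines can be used to check the behaviour discussed in this thread on a given install. It is only a sketch: what the two NumPy expressions return depends on the NumPy version (a scalar, an elementwise array, or `NotImplemented` for the direct dunder call), so no output is asserted for them here.

```python
import warnings

import numpy as np
import pandas as pd

with warnings.catch_warnings():
    # Older NumPy versions emit a FutureWarning here; silence it for the demo.
    warnings.simplefilter("ignore")
    # Depending on the NumPy version this is a scalar False or an elementwise array.
    numpy_result = np.array([1, 2]) == "b"
    # Calling the dunder directly skips Python's fallback to the reflected operand,
    # so a NotImplemented return value (if any) is visible to the caller.
    direct_result = np.array([1, 2]).__eq__("b")

# IntegerArray handles the invalid comparison itself and answers elementwise.
pandas_result = pd.array([1, 2]) == "b"

print(type(numpy_result), type(direct_result))
print(pandas_result)
```

The pandas result is the part that should stay stable: one `False` per element, regardless of what the underlying NumPy call returned.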
I think it's ready. All the
I am planning to merge this in a bit. I am not sure if @jreback's review was a full one, but the comments have been addressed/answered. But it would be good to have this in master for IntegerArray as well, and this is blocking some follow-ups (completing
let me have a look in an hour or 2
@jreback did you still want to / have time to take a look?
let me look
lgtm. I think we need a big whatsnew sub-section that explains the implications of this change.
- getitem is now NA
- comparisons now return BooleanArray
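The two user-visible changes listed above can be seen in a short example, assuming a pandas version that includes this change (the variable names are illustrative):

```python
import pandas as pd

arr = pd.array([1, 2, None], dtype="Int64")

missing = arr[2]   # scalar access to a missing slot yields pd.NA rather than np.nan
is_one = arr == 1  # comparison yields a nullable boolean array, not a NumPy bool array

print(missing is pd.NA)
print(is_one.dtype)
```

Note that the missing position stays missing in the comparison result instead of being coerced to `False`.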
@@ -39,6 +41,12 @@ NumPy's ``'int64'`` dtype:

    pd.array([1, 2, np.nan], dtype="Int64")

+All NA-like values are replaced with :attr:`pandas.NA`.
may want to add a versionchanged tag here (and below)
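As an illustration of the documentation sentence under review, the following sketch builds an `Int64` array from several NA-like inputs and checks what comes back out. It assumes a pandas version that includes this change:

```python
import numpy as np
import pandas as pd

# np.nan, None and pd.NA are all accepted as "missing" on input ...
arr = pd.array([1, np.nan, None, pd.NA], dtype="Int64")

# ... and each of them is exposed as pd.NA when the values are read back.
na_flags = [value is pd.NA for value in arr]

print(arr)
print(na_flags)
```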
Thanks, added whatsnew for those changes.
All green. Anything else @jreback or were the two in #29964 (review) it?
thanks @TomAugspurger
Just a WIP for now. I haven't closely reviewed this since pd.NA was merged into master.
cc @jorisvandenbossche
Closes #15375