-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI/TST: Don't require length for construct_1d_arraylike_from_scalar cast to float64 #47393
Conversation
mroeschke
commented
Jun 16, 2022
•
edited
Loading
edited
- closes CI: nighlty numpy broke ci #47391 (Replace xxxx with the Github issue number)
- Tests added and passed if fixing a bug or adding a new feature
- All code checks passed.
Most of the remaining ones look like things we want to change anyway? Saw one test that was not supposed to raise a FutureWarning |
Correct, the numpy The additional |
pandas/core/dtypes/cast.py
Outdated
if is_integer_dtype(dtype) and isna(value): | ||
if not length: | ||
# GH 47391: numpy > 1.24 will raise filling np.nan into int dtypes | ||
return np.array([], dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a weird case. how does it happen? is it clear that we'd want to prioritize the dtype as being "right" instead of the value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I posted an example of the state that's reached in numpy/numpy#21784
Namely before this change, length=0
would pass all the if checks down to np.empty(0, dtype=integer).fill(np.nan)
which in numpy < 1.24 would just return np.array([], dtype=integer)
but in numpy >=1.24 will raise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the question (also for me from NumPy!) is which pandas code runs into this path. For pandas, the question is what the actual result should be (in the future). For me the question is how bad it will be if that code path breaks. Because especially if it is bad, we may want to make sure it doesn't break yet (from within NumPy).
(At this point I suspect that at least the NaN case may need a work-around in NumPy as well.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So an example test where this is hit is
s = Series([], index=pd.date_range(start="2018-01-01", periods=0), dtype=int)
result = s.apply(lambda x: x)
tm.assert_series_equal(result, s)
so I think when operating over these empty Series/DataFrames, the value representation is np.nan
when construct_1d_arraylike_from_scalar(np.nan, length=0, dtype=dtype)
is called.
construct_1d_arraylike_from_scalar
has a if length and is_integer_dtype(dtype) and isna(value)
condition to ensure that integer dtypes were not coerced to float64 (because we relied on np.empty(0, dtype=integer).fill(np.nan) == np.array([], dtype=integer)
) for these empty Series/DataFrames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for these empty-like cases in pandas, it appears pandas was relying on np.empty(0, dtype=integer).full(np.nan)
to preserve integer dtypes. Having pandas explicitly preserve the dtype for these empty cases e.g. np.array([], dtype=integer)
is an okay change to make on our end IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm. (but why not keep this bit unchanged and just put the subarr.fill(value)
on L1715 inside a if length:
to skip for all zero-length arrays?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sure I can make the change there instead
this is failing |
"are not equal to their int representation.", | ||
UserWarning, | ||
) | ||
# GH 47391 numpy > 1.24 will raise a RuntimeError for nan -> int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for 1.5 we ought to actually remove the nans first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in 1.5. add a deprecation noting that nans will be dropped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no i mean i think u can remove the nans before comparing to avoid the warning (this is all internal anyhow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah gotcha. Yeah can clean this for 1.5 in a separate PR
pandas/core/dtypes/cast.py
Outdated
@@ -1696,7 +1696,7 @@ def construct_1d_arraylike_from_scalar( | |||
|
|||
else: | |||
|
|||
if length and is_integer_dtype(dtype) and isna(value): | |||
if is_integer_dtype(dtype) and isna(value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think still need the length here as this part of the code is this logic to determine the the dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. revert to original.
will merge later today if no objections |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
…uct_1d_arraylike_from_scalar cast to float64
Thanks @mroeschke |
…construct_1d_arraylike_from_scalar cast to float64) (#47460) * Backport PR #47393: CI/TST: Don't require length for construct_1d_arraylike_from_scalar cast to float64 Co-authored-by: Matthew Roeschke <[email protected]> Co-authored-by: Simon Hawkins <[email protected]>