-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: IntegerArray/FloatingArray constructors mismatched NAs #44514
BUG: IntegerArray/FloatingArray constructors mismatched NAs #44514
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The setitem
change / bug fix is unrelated to the constructor fix? Or it's because you are testing that through setitem as well?
pandas/core/arrays/floating.py
Outdated
mask2 = isna(values) | ||
if not (mask == mask2).all(): | ||
# e.g. if we have a timedelta64("NaT") | ||
raise TypeError(f"{values.dtype} cannot be converted to a FloatingDtype") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, could it be libmissing.is_numeric_na
that already raises on encountering a "non-numeric NA"? (is there a use case for is_numeric_na to not be strict about this, i.e. to get a "partial" mask?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were you planning to address this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
planning to address in a follow-up
The setitem bug was identified first and the cause tracked back to the constructor. |
ok to merge, @jorisvandenbossche comments could be afollow up (or here is ok too) |
Let’s do as follow up, these constructors will need plenty more work.
…On Sun, Nov 21, 2021 at 3:40 PM Jeff Reback ***@***.***> wrote:
ok to merge, @jorisvandenbossche <https://github.com/jorisvandenbossche>
comments could be afollow up (or here is ok too)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#44514 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB5UM6HWBAVF7UB36P6IPDTUNF7O3ANCNFSM5IKC63JQ>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
@@ -357,6 +364,31 @@ def test_setitem_series(self, data, full_indexer): | |||
) | |||
self.assert_series_equal(result, expected) | |||
|
|||
def test_setitem_frame_2d_values(self, data, using_array_manager, request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this test out of the extension
base tests, or remove the need to use using_array_manager
? (this is not defined by external users of those tests, and would be a bit annoying to replicate)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
gentle ping; this is a blocker for fixing a bug in Series.where, which in turn should allow us to share some more Block methods. |
df = pd.DataFrame({"A": data}) | ||
|
||
# Avoiding using_array_manager fixture | ||
# https://github.com/pandas-dev/pandas/pull/44514#discussion_r754002410 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for changing to not use the fixture. But generally, is this actually needed to have as base extension test? (since the fix was inside the Blocks code, it's not really testing a specific behaviour that the EA needs to have?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this actually needed to have as base extension test?
This seems like the best way to systematically test it for all EA dtypes.
it's not really testing a specific behaviour that the EA needs to have?
That seems like it applies to most tests that aren't directly testing EA methods.
Two questions, but feel free to merge as well |
updated to raise inside is_numeric_na |
@jorisvandenbossche if you want to look or can merge |
i think comments have all been addressed here? bugfix follow-up is ready. |
This seems to give a big slowdown in some benchmarks (eg 10x in https://pandas.pydata.org/speed/pandas/#groupby.Cumulative.time_frame_transform?python=3.8&Cython=0.29.24&p-dtype='Float64'&p-method='cumsum'&commits=f5107e41-12afff15). Can you take a look? |
Yah, looks like a lot of time is being taken in is_numeric_na.
Looks like we are calling is_numeric_na in cases where we have float64 dtype, so can just use np.isnan. Should be an easy patch. |
Same yak-shaving as #44495 (which turned out to be a dead end for this particular yak, but still a perf bump)