Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: fillna on f32 column raising for f64 #43455

Merged
merged 6 commits into from
Sep 9, 2021

Conversation

mzeitlin11
Copy link
Member

@mzeitlin11 mzeitlin11 commented Sep 8, 2021

For now have put whatsnew in 1.3.3, but might be better for 1.4 since regression is later (worked in 1.1.5) and the can_hold_element change causes subtle indexing behavior differences (but that are now more consistent between int and float)

@mzeitlin11 mzeitlin11 added Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Regression Functionality that used to work in a prior pandas version labels Sep 8, 2021
if tipo.kind not in ["f", "i", "u"]:
# Anything other than float/integer we cannot hold
return False
elif dtype.itemsize < tipo.itemsize:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the itemsize only be compared if dtype.kind and tipo.kind are the same? i.e. tipo.kind is also 'f'. How is the itemsize of an int or uint relevant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this up. I think it ultimately comes down to -> if can_hold_element holds True, then using np.putmask is assumed to be safe. So itemsize of int or uint matters because numpy allows safe casting of i16 to f32 for example, but not i64 to f32.

But I think the itemsize check is not sufficient here, because numpy won't putmask int32 infloat32. Maybe np.can_cast should be used instead?

@jreback
Copy link
Contributor

jreback commented Sep 8, 2021

does this break any current tests?

@mzeitlin11
Copy link
Member Author

does this break any current tests?

Think the failures are flaky

@mzeitlin11 mzeitlin11 marked this pull request as draft September 8, 2021 18:47
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have a tests for the complex case? (e.g. you are closing the issue).

this looks ok to backport

cc @jbrockmendel

@mzeitlin11 mzeitlin11 marked this pull request as ready for review September 8, 2021 22:07
@mzeitlin11
Copy link
Member Author

mzeitlin11 commented Sep 8, 2021

Have changed the approach to be less invasive, so backport I think safer now.

I think the can_hold_element changes were in the right direction, but was unsure of the exact contract can_hold_element is meant to uphold and what subtle changes might be caused by what I changed.

EDIT: more reasoning for changing approach was not being able satisfactorily answer #43455 (comment) and other failing test cases I discovered that weren't handled by just the itemsize patch. The test now covers all numpy int/floating types

@jreback jreback added this to the 1.3.3 milestone Sep 9, 2021
@jreback
Copy link
Contributor

jreback commented Sep 9, 2021

lgtm. cc @jbrockmendel @phofl

Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jreback jreback merged commit 0e92697 into pandas-dev:master Sep 9, 2021
@jreback
Copy link
Contributor

jreback commented Sep 9, 2021

thanks @mzeitlin11

@jreback
Copy link
Contributor

jreback commented Sep 9, 2021

@meeseeksdev backport 1.3.x

@lumberbot-app
Copy link

lumberbot-app bot commented Sep 9, 2021

Something went wrong ... Please have a look at my logs.

@mzeitlin11 mzeitlin11 deleted the fillna_regr branch September 9, 2021 12:55
phofl pushed a commit that referenced this pull request Sep 9, 2021
…g for f64) (#43474)

* Backport PR #43455: REGR: fillna on f32 column raising for f64

* Update pandas/tests/series/methods/test_fillna.py

Co-authored-by: Matthew Zeitlin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Series.fillna raising with float32 dtype when using value arg
4 participants