BUG: Series.any() and .all() don't return bool values if dtype=object #30416

Closed
wants to merge 1 commit into from

Member

@ShaharNaveh ShaharNaveh commented Dec 22, 2019

pandas/core/nanops.py Outdated Show resolved Hide resolved
@alimcmaster1 alimcmaster1 added Dtype Conversions Unexpected or buggy dtype conversions Compat pandas objects compatability with Numpy or Python functions labels Dec 23, 2019
@ShaharNaveh ShaharNaveh force-pushed the COMPAT-fix-12863 branch 2 times, most recently from f9037cf to 876b763 Compare December 23, 2019 12:47
@pep8speaks

pep8speaks commented Dec 23, 2019

Hello @MomIsBestFriend! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-12 16:37:00 UTC

@ShaharNaveh ShaharNaveh force-pushed the COMPAT-fix-12863 branch 3 times, most recently from 78da081 to 48f08c0 Compare December 23, 2019 13:14
@ShaharNaveh ShaharNaveh marked this pull request as ready for review December 23, 2019 13:47
@jreback jreback changed the title BUG: Fix for #12863 Series.any() and .all() don't return bool values if dtype=object Dec 23, 2019
@jreback jreback changed the title Series.any() and .all() don't return bool values if dtype=object BUG: Series.any() and .all() don't return bool values if dtype=object Dec 23, 2019
Contributor

@jreback jreback left a comment

lgtm. Can you add a whatsnew note? I think this deserves a small subsection (even though it's a bug fix); you can put it in api-breaking changes and just show the before and after (model it after some existing changes like this).

pandas/core/nanops.py Show resolved Hide resolved
@jreback jreback added this to the 1.0 milestone Dec 23, 2019
@ShaharNaveh ShaharNaveh force-pushed the COMPAT-fix-12863 branch 2 times, most recently from 35391e9 to 650c906 Compare December 24, 2019 21:47
@ShaharNaveh
Member Author

ShaharNaveh commented Dec 24, 2019

The only implementation I can think of right now is to create a function that iterates over the values and returns a bool indicating whether any/all of the values evaluate to True.

tl;dr
Implement any and all methods.
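
A minimal sketch of what such a brute-force function could look like, in plain Python. The names `brute_force_any` and `brute_force_all` are hypothetical and are not taken from the PR; missing-value handling (`skipna`) is deliberately left out.

```python
def brute_force_any(values):
    """Return True if at least one element is truthy, always as a plain bool."""
    for value in values:
        if value:
            return True
    return False


def brute_force_all(values):
    """Return True if every element is truthy, always as a plain bool."""
    for value in values:
        if not value:
            return False
    return True


letters = ["a", "b", "c", "d", "e"]
print(brute_force_any(letters))  # True
print(brute_force_all(letters))  # True
print(brute_force_all(["a", "", "c"]))  # False, the empty string is falsy
```

The result is always a `bool` regardless of the element types, at the cost of a Python-level loop over the values.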

# GH #12863
# Checking if the `axis` is None because numpy
# doesn't handle ``any`` and ``all`` on
# object arrays correclty. see
Member

correclty -> correctly. small preference for this to go inside the "if", not sure how widely shared that is

assert s1.all(skipna=True)
assert np.isnan(s2.any(skipna=False)) # nan || False => nan
assert s1.any(skipna=True)
Member

pls add the GH ref here too

# Alternative types, with implicit 'object' dtype.
s = Series(["abc", True])
assert "abc" == s.any() # 'abc' || True => 'abc'
# GH 12863
Member

might as well make this a separate test test_all_any_object_dtype

@jbrockmendel
Member

Does this bugfix affect DataFrame at all? If so, can you add a test for that?

optional: could test the affected nanops functions directly in test_nanops.

@jbrockmendel
Member

what case is breaking that necessitates the brute force patch?

@ShaharNaveh
Member Author

what case is breaking that necessitates the brute force patch?

The original test case (from #12863)

pd.Series(index=range(5), data=['a', 'b', 'c', 'd', 'e'])

@ShaharNaveh ShaharNaveh force-pushed the COMPAT-fix-12863 branch 2 times, most recently from d6996ca to cf72a99 Compare December 25, 2019 22:21
@ShaharNaveh
Member Author

I can't figure it out: when the original case is working, every test is failing.
I can xfail them all, but it just seems wrong to have so many failures.

if is_object_dtype(dtype) and axis is None:
    output = values.any()
else:
    output = values.any(axis)
Contributor

you can just return directly from here.

else:
    output = values.any(axis)

if isinstance(output, bool):
Contributor

when is this NOT true?

Member Author

That's exactly the bug, for example (from the original issue).

master:

>>> import pandas as pd
>>> s = pd.Series(index=range(5), data=['a', 'b', 'c', 'd', 'e'], dtype=object)
>>> s.any()
'a'
>>> s.all()
'e'

This branch (with brute force function):

>>> import pandas as pd
>>> s = pd.Series(index=range(5), data=['a', 'b', 'c', 'd', 'e'], dtype=object)
>>> s.any()
True
>>> s.all()
True
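
The master output can be understood if the reduction over an object array behaves like chained Python `or`/`and`, which return one of their operands rather than a bool. A minimal sketch in plain Python under that assumption (it does not call pandas or NumPy; `chained_or` and `chained_and` are illustrative names):

```python
from functools import reduce

data = ["a", "b", "c", "d", "e"]

# `x or y` returns x when x is truthy, so the chain keeps the first element.
chained_or = reduce(lambda x, y: x or y, data)

# `x and y` returns y when x is truthy, so the chain ends at the last element.
chained_and = reduce(lambda x, y: x and y, data)

print(chained_or)   # a
print(chained_and)  # e

# Wrapping the result in bool() gives the values this branch returns.
print(bool(chained_or), bool(chained_and))  # True True
```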

@jreback
Contributor

jreback commented Jan 1, 2020

can you merge master and we'll take a look

@ShaharNaveh
Member Author

can you merge master and we'll take a look

I have merged master @jreback

@jreback
Contributor

jreback commented Jan 1, 2020

@MomIsBestFriend so the tests are failing, ping when green.

@ShaharNaveh
Member Author

ShaharNaveh commented Jan 1, 2020

@MomIsBestFriend so the tests are failing, ping when green.

@jreback I'm really not sure what to do. Pretty much everywhere I see the error message:

AttributeError: type object 'bool' has no attribute 'any'

I have no idea where to even start. Should I change the nanany/nanall functions?

Or change about 30-50 tests? It just seems wrong to do so.
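
The wording "type object 'bool'" suggests that the attribute lookup happened on the class `bool` itself rather than on an array or on a boolean value. Where that happens in the branch is not shown in this thread, so the following is only a hypothetical reproduction of the message in plain Python (`values` is an illustrative name):

```python
# Hypothetical: `values` ends up bound to the class `bool`, not to an array.
values = bool
try:
    values.any()
except AttributeError as exc:
    class_message = str(exc)

# For contrast, the same lookup on a bool instance produces a different message.
try:
    True.any()
except AttributeError as exc:
    instance_message = str(exc)

print(class_message)     # type object 'bool' has no attribute 'any'
print(instance_message)  # 'bool' object has no attribute 'any'
```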

@jreback
Contributor

jreback commented Jan 1, 2020

try using np.any

@ShaharNaveh
Member Author

This also doesn't work.

@jbrockmendel
Member

This also doesn't work.

@MomIsBestFriend now that #29847 has been merged, I'm planning to do a significant cleanup of DataFrame._reduce and core.nanops. So I think the path forward is for you and me to trade tasks. Up for something new?

@ShaharNaveh
Member Author

Up for something new?

Yes just take it away from me 😆

@jreback jreback removed this from the 1.0 milestone Jan 3, 2020
@ShaharNaveh ShaharNaveh force-pushed the COMPAT-fix-12863 branch 2 times, most recently from 7864171 to 2217be0 Compare January 12, 2020 16:34
@ShaharNaveh
Member Author

Closing this, because I don't know how to fix it without breaking everything else.

@gimseng
Contributor

gimseng commented Oct 8, 2020

Did the issue eventually get fixed? Can someone give a TL;DR of the thread? Thanks!

Labels
Compat pandas objects compatability with Numpy or Python functions Dtype Conversions Unexpected or buggy dtype conversions
Development

Successfully merging this pull request may close these issues.

Series.any() and .all() don't return bool values if dtype=object
6 participants