API: BooleanArray any/all with NA logic #30062
pandas/core/arrays/boolean.py
Outdated
@@ -557,6 +557,30 @@ def _values_for_argsort(self) -> np.ndarray:
        data[self._mask] = -1
        return data

    def any(self, skipna=True):
What happens with `np.any` with this? Do we need any keywords for compatibility?
Is the expected behavior here different from `nanops.nanany` / `nanops.nanall`?
Yes, still need to do that. If we want this to work (without getting into
Ah, didn't look yet at those. They actually accept a Also still need to add docstrings.
So I didn't use those methods, because indeed the behaviour that is now implemented in nanany/nanall for the
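On the `np.any` compatibility question above, a minimal sketch may help. It assumes NumPy's usual behaviour of delegating `np.any(obj)` to `obj.any(...)` for objects that are not plain ndarrays, forwarding its own keywords along the way. `TinyBoolContainer` is a hypothetical stand-in written for this illustration, not pandas' `BooleanArray`.

```python
import numpy as np


class TinyBoolContainer:
    """Hypothetical stand-in for an array-like; not pandas' BooleanArray."""

    def __init__(self, values):
        self.values = list(values)
        self.seen_kwargs = None

    def any(self, skipna=True, **kwargs):
        # np.any forwards its own keywords (such as axis and out) to this
        # method, so it has to tolerate arguments it does not use itself.
        self.seen_kwargs = kwargs
        return any(self.values)


box = TinyBoolContainer([False, True])
result = np.any(box)  # delegates to box.any(...)
print(result, sorted(box.seen_kwargs))
```

Without the `**kwargs` catch-all, the forwarded keywords would raise a `TypeError`, which is one plausible reason for accepting extra keywords in the method signature.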
Implementing these directly on BooleanArray makes sense to me.
can you add this issue number in the whatsnew where BooleanArray was added
pandas/core/arrays/boolean.py
Outdated
@@ -560,6 +561,143 @@ def _values_for_argsort(self) -> np.ndarray:
        data[self._mask] = -1
        return data

    def any(self, skipna=True, **kwargs):
can you type
Done
        if skipna:
            return result
        else:
            if result or len(self) == 0:
use not len(self)
In pandas/core, we actually use the `len(..) == 0` pattern more than `not len(..)`. I personally also find that easier to read.
(The typical Pythonic idiom recommendation is about doing `if (not) container:` instead of `if (not) len(container)` for empty containers, but that of course doesn't hold for arrays.)
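To illustrate why the `if (not) container:` idiom does not carry over to arrays, here is a small sketch using plain NumPy arrays (chosen for illustration; the variable names are made up). An explicit length check works for any array, while truth-testing a multi-element array raises.

```python
import numpy as np

values = np.array([True, False])
empty = np.array([], dtype=bool)

# An explicit length check is well defined for any array length.
print(len(values) == 0, len(empty) == 0)

# Truth-testing a multi-element array is ambiguous, so NumPy raises.
try:
    bool(values)
    raised = False
except ValueError:
    raised = True
print(raised)
```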
pandas/core/arrays/boolean.py
Outdated
            else:
                return self.dtype.na_value

    def all(self, skipna=True, **kwargs):
type
Done

        See Also
        --------
        numpy.all : Numpy version of this method.
might want to add a link for kleene logic here
In the See Also section, we can only add links to other API pages. But, in the long description of the docstring a bit above, I already included a link about the Kleene logic.
        if skipna:
            return result
        else:
            if not result or len(self) == 0:
same as above
@@ -656,6 +794,10 @@ def cmp_method(self, other):
        return set_function_name(cmp_method, name, cls)

    def _reduce(self, name, skipna=True, **kwargs):

        if name in {"any", "all"}:
we usually use lists for these checks
In this file we actually use `in {}` more than `in []` (both are used), but since Tom and I wrote this file, that's probably not an argument ;)
Happy to change it; purely performance-wise the set is faster (but this is about nanoseconds, of course).
Heh, I'm probably to blame for the sets :) I like them more for membership tests, though it doesn't matter for small sets.
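For anyone curious about the set-versus-list point, a rough way to compare the two membership tests is sketched below. The variable names and the iteration count are arbitrary choices for illustration, and the timings will vary by machine and Python version.

```python
import timeit

name = "all"

# Both spellings give the same answer for a membership test.
in_set = name in {"any", "all"}
in_list = name in ["any", "all"]

# Time each form; for two elements the gap is tiny either way.
set_time = timeit.timeit(
    'name in {"any", "all"}', globals={"name": name}, number=100_000
)
list_time = timeit.timeit(
    'name in ["any", "all"]', globals={"name": name}, number=100_000
)
print(f"set: {set_time:.4f}s  list: {list_time:.4f}s")
```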
Thanks for the review!
This is good to go? The failure on Azure is the flaky resource warning thing.
Closes #29686
Implementation and tests for `any`/`all` with the updated logic as discussed in the linked issue.
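As a rough illustration of the Kleene-style semantics being discussed, here is a pure-Python sketch. It is not the code from this PR: `kleene_any` and `kleene_all` are hypothetical helpers, `None` stands in for the NA value, and the explicit check for missing values is an assumption of the sketch.

```python
from typing import Iterable, Optional

NA = None  # assumption: None stands in for pandas' NA in this sketch


def kleene_any(values: Iterable[Optional[bool]], skipna: bool = True) -> Optional[bool]:
    values = list(values)
    # Treat missing values as False; they cannot make "any" True.
    result = any(v is True for v in values)
    if skipna or result:
        return result
    # No True was seen, but a missing value might have been True.
    return NA if any(v is NA for v in values) else False


def kleene_all(values: Iterable[Optional[bool]], skipna: bool = True) -> Optional[bool]:
    values = list(values)
    # Treat missing values as True; they cannot make "all" False.
    result = all(v is not False for v in values)
    if skipna or not result:
        return result
    # No False was seen, but a missing value might have been False.
    return NA if any(v is NA for v in values) else True
```

With `skipna=False`, a definite answer is returned whenever the known values already decide it (`kleene_any([True, NA], skipna=False)` is `True`, `kleene_all([False, NA], skipna=False)` is `False`); otherwise the missing value propagates.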