-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: EA._hasnans, EADtype.empty #45024
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -425,7 +425,7 @@ def __contains__(self, item: object) -> bool | np.bool_: | |
if not self._can_hold_na: | ||
return False | ||
elif item is self.dtype.na_value or isinstance(item, self.dtype.type): | ||
return self.isna().any() | ||
return self._hasnans | ||
else: | ||
return False | ||
else: | ||
|
@@ -603,6 +603,16 @@ def isna(self) -> np.ndarray | ExtensionArraySupportsAnyAll: | |
""" | ||
raise AbstractMethodError(self) | ||
|
||
@property | ||
def _hasnans(self) -> bool: | ||
# GH#22680 | ||
""" | ||
Equivalent to `self.isna().any()`. | ||
|
||
Some ExtensionArray subclasses may be able to optimize this check. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this is a method that subclasses can override, can you add it to the documentation at the top of this file? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would also rename it to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I used _hasnans to be consistent with the datetimelike arrays (didn't realize the masked arrays had _hasna until just now). Also consistent-ish with the hasnans we have in Index and Series. We should revisit the naming convention(s) once the NA/NaN API design discussion reaches some kind of conclusion.
I chose to keep this private so we would have the option of changing it later. (Side-note: "private" for EA methods is really ambiguous, could mean a) user shouldn't use use, b) should only be called from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The current situation is IMO implicitly: everything in the base class is automatically "public" for developers of third party EAs, unless explicitly documented.
The question about making it public for users (no leading underscore) is indeed something that can also be handled later. |
||
""" | ||
return self.isna().any() | ||
|
||
def _values_for_argsort(self) -> np.ndarray: | ||
""" | ||
Return values for sorting. | ||
|
@@ -686,7 +696,7 @@ def argmin(self, skipna: bool = True) -> int: | |
ExtensionArray.argmax | ||
""" | ||
validate_bool_kwarg(skipna, "skipna") | ||
if not skipna and self.isna().any(): | ||
if not skipna and self._hasnans: | ||
raise NotImplementedError | ||
return nargminmax(self, "argmin") | ||
|
||
|
@@ -710,7 +720,7 @@ def argmax(self, skipna: bool = True) -> int: | |
ExtensionArray.argmin | ||
""" | ||
validate_bool_kwarg(skipna, "skipna") | ||
if not skipna and self.isna().any(): | ||
if not skipna and self._hasnans: | ||
raise NotImplementedError | ||
return nargminmax(self, "argmax") | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,6 +17,7 @@ | |
from pandas._libs.hashtable import object_hash | ||
from pandas._typing import ( | ||
DtypeObj, | ||
Shape, | ||
npt, | ||
type_t, | ||
) | ||
|
@@ -208,6 +209,19 @@ def construct_array_type(cls) -> type_t[ExtensionArray]: | |
""" | ||
raise AbstractMethodError(cls) | ||
|
||
def empty(self, shape: Shape) -> type_t[ExtensionArray]: | ||
""" | ||
Construct an ExtensionArray of this dtype with the given shape. | ||
|
||
Analogous to numpy.empty. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add a Parameters section? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sure, just added this to me next follow-ups branch. |
||
Returns | ||
------- | ||
ExtensionArray | ||
""" | ||
cls = self.construct_array_type() | ||
return cls._empty(shape, dtype=self) | ||
|
||
@classmethod | ||
def construct_from_string( | ||
cls: type_t[ExtensionDtypeT], string: str | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -126,6 +126,12 @@ def test_construct_empty_dataframe(self, dtype): | |
def test_empty(self, dtype): | ||
cls = dtype.construct_array_type() | ||
result = cls._empty((4,), dtype=dtype) | ||
|
||
assert isinstance(result, cls) | ||
assert result.dtype == dtype | ||
assert result.shape == (4,) | ||
|
||
# GH#19600 method on ExtensionDtype | ||
result2 = dtype.empty((4,)) | ||
assert isinstance(result2, cls) | ||
assert result2.dtype == dtype | ||
assert result2.shape == (4,) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we make this public, we should maybe a test that passing a scalar integer works as well (if we want to allow that, following np.empty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we not use a cached_property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, EAs are mutable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right (though actually we could do this as we know when they are mutated, e.g keep a dirty bit). but out of scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've looked into this w/r/t
DatetimeArray.freq
and found that propagating it through views (e.g.arr[:5][0] = np.nan
) makes this a giant pain