-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: EA._hasnans, EADtype.empty #45024
Conversation
jbrockmendel
commented
Dec 23, 2021
- closes Add ExtesnsionDtype.empty classmethod #19600
- closes Add a method to ExtensionArray interface for whether an array contains NAs #22680
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comment
@@ -603,6 +603,16 @@ def isna(self) -> np.ndarray | ExtensionArraySupportsAnyAll: | |||
""" | |||
raise AbstractMethodError(self) | |||
|
|||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we not use a cached_property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, EAs are mutable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right (though actually we could do this as we know when they are mutated, e.g keep a dirty bit). but out of scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've looked into this w/r/t DatetimeArray.freq
and found that propagating it through views (e.g. arr[:5][0] = np.nan
) makes this a giant pain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this! A couple of comments though.
Can you add a reference to this public EADtype.empty from ExtensionArray._empty (which you added in #40602), indicating that EADtype.empty is the official public variant people should use?
Construct an ExtensionArray of this dtype with the given shape. | ||
|
||
Analogous to numpy.empty. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a Parameters section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, just added this to me next follow-ups branch.
""" | ||
Equivalent to `self.isna().any()`. | ||
|
||
Some ExtensionArray subclasses may be able to optimize this check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a method that subclasses can override, can you add it to the documentation at the top of this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also rename it to _hasna
or so, to be consistent with our public NA related properties (we could also make this public?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also rename it to _hasna or so, to be consistent with our public NA related properties
I used _hasnans to be consistent with the datetimelike arrays (didn't realize the masked arrays had _hasna until just now). Also consistent-ish with the hasnans we have in Index and Series.
We should revisit the naming convention(s) once the NA/NaN API design discussion reaches some kind of conclusion.
(we could also make this public?)
I chose to keep this private so we would have the option of changing it later.
(Side-note: "private" for EA methods is really ambiguous, could mean a) user shouldn't use use, b) should only be called from self
, c) 3rd party authors shouldn't use, d) 3rd party implementors shouldn't override. Not sure how we can clarify or tighten this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current situation is IMO implicitly: everything in the base class is automatically "public" for developers of third party EAs, unless explicitly documented.
_hasnans
is being used in methods that EA authors can override, and is useful for them as well, and they are also allowed to override it. So I think this is "public" for developers (regardless of it being public for users or not), and thus should decide on a name now, IMO.
The question about making it public for users (no leading underscore) is indeed something that can also be handled later.
result2 = dtype.empty((4,)) | ||
assert isinstance(result2, cls) | ||
assert result2.dtype == dtype | ||
assert result2.shape == (4,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we make this public, we should maybe a test that passing a scalar integer works as well (if we want to allow that, following np.empty)