Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: EA._hasnans, EADtype.empty #45024

Merged
merged 3 commits into from
Dec 23, 2021
Merged

Conversation

jbrockmendel
Copy link
Member

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment

@@ -603,6 +603,16 @@ def isna(self) -> np.ndarray | ExtensionArraySupportsAnyAll:
"""
raise AbstractMethodError(self)

@property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we not use a cached_property?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, EAs are mutable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right (though actually we could do this as we know when they are mutated, e.g keep a dirty bit). but out of scope

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've looked into this w/r/t DatetimeArray.freq and found that propagating it through views (e.g. arr[:5][0] = np.nan) makes this a giant pain

@jreback jreback added this to the 1.4 milestone Dec 23, 2021
@jreback jreback added API - Consistency Internal Consistency of API/Behavior ExtensionArray Extending pandas with custom dtypes or arrays. labels Dec 23, 2021
@jreback jreback merged commit 8e38f28 into pandas-dev:master Dec 23, 2021
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this! A couple of comments though.

Can you add a reference to this public EADtype.empty from ExtensionArray._empty (which you added in #40602), indicating that EADtype.empty is the official public variant people should use?

Construct an ExtensionArray of this dtype with the given shape.

Analogous to numpy.empty.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a Parameters section?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, just added this to me next follow-ups branch.

"""
Equivalent to `self.isna().any()`.

Some ExtensionArray subclasses may be able to optimize this check.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a method that subclasses can override, can you add it to the documentation at the top of this file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also rename it to _hasna or so, to be consistent with our public NA related properties (we could also make this public?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also rename it to _hasna or so, to be consistent with our public NA related properties

I used _hasnans to be consistent with the datetimelike arrays (didn't realize the masked arrays had _hasna until just now). Also consistent-ish with the hasnans we have in Index and Series.

We should revisit the naming convention(s) once the NA/NaN API design discussion reaches some kind of conclusion.

(we could also make this public?)

I chose to keep this private so we would have the option of changing it later.

(Side-note: "private" for EA methods is really ambiguous, could mean a) user shouldn't use use, b) should only be called from self, c) 3rd party authors shouldn't use, d) 3rd party implementors shouldn't override. Not sure how we can clarify or tighten this)

Copy link
Member

@jorisvandenbossche jorisvandenbossche Dec 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current situation is IMO implicitly: everything in the base class is automatically "public" for developers of third party EAs, unless explicitly documented.

_hasnans is being used in methods that EA authors can override, and is useful for them as well, and they are also allowed to override it. So I think this is "public" for developers (regardless of it being public for users or not), and thus should decide on a name now, IMO.

The question about making it public for users (no leading underscore) is indeed something that can also be handled later.

result2 = dtype.empty((4,))
assert isinstance(result2, cls)
assert result2.dtype == dtype
assert result2.shape == (4,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make this public, we should maybe a test that passing a scalar integer works as well (if we want to allow that, following np.empty)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API - Consistency Internal Consistency of API/Behavior ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
3 participants