API: ExtensionArray.astype to only return extension arrays ? #24877
Comments
This is worth considering.
In theory :) In practice, see #24866 (though I'm not really happy with that PR right now).
It's also worth spelling out whether we would change Categorical.astype (probably? Should be easy to deprecate). SparseArray.astype does the "right" thing (returning an ExtensionArray) already.
I'd be in favor of this.
As mentioned in the comment of #24674, I'd obviously be in favour of this. I like the consistency of calls to methods on pandas objects staying within the pandas-universe, with getting to numpy-land requiring an explicit conversion. It would also circumvent some numpy issues like numpy/numpy#12550, which I discovered while working on the still-WIP PR for #23833.
Would require some small adjustments to corner cases in
Regarding IntervalArray
Yes, that is currently the case for all Arrays. It would then return a PandasArray[object].
That currently indeed works. But should we actually allow this? It feels strange to me to special-case the all-NA case, as in general, casting to float raises an error that it cannot cast IntervalArray to float64 (although I see in the code that this is not actually being special-cased, but is simply a result of how the cast is implemented).
It seems we have agreement on this? In that case, I think ideally we should do this for 0.24.0.
This might be fine. But let's actually get 0.24.0 out the door. This is an 11th-hour change.
We're again approaching the 11th hour of the release, but this issue should not be postponed again IMO.
@h-vetinari if there is a PR we could consider it
This probably isn't a blocker for 1.0 right?
I'm not wild about this.
So now Series.astype (or something else in the call stack) has to do something like
which isn't that onerous, but de facto we have to use something like this pattern everywhere that we currently do a .astype while working with ArrayLike values.
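The snippet referred to in the comment above is not preserved in this copy of the thread. As a rough illustration only, an unwrapping step of the kind described might look like the sketch below. It uses toy stand-in classes, not pandas internals; `ToyWrappedArray`, `to_raw`, and `unwrap` are hypothetical names introduced here.

```python
# Hypothetical sketch with toy classes; this is not pandas code.


class ToyWrappedArray:
    """Stand-in for a thin extension-array wrapper around a plain array."""

    def __init__(self, values):
        self._values = list(values)

    def to_raw(self):
        # Hand back the underlying plain container.
        return self._values


def unwrap(result):
    """Return the plain container if `result` is a thin wrapper, else `result`."""
    if isinstance(result, ToyWrappedArray):
        return result.to_raw()
    return result


wrapped = ToyWrappedArray([1, 2, 3])
plain = [4, 5, 6]
```

The concern raised in the comment is that a check like `unwrap` would have to follow every internal `.astype` call that may receive either kind of value.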
Agree with @jbrockmendel here. I don't see it as feasible to return PandasArray ever; I don't see why you would always return only EAs when we have perfectly usable numpy dtypes.
Related to the discussion happening in #24674 (comment), further xref #24773, #22384
The question being posed here is what the return type of ExtensionArray.astype(..) should be.
Currently, it can be either a numpy.ndarray or an ExtensionArray. The astype method in the base class is actually advertised as "Cast to a NumPy array with 'dtype'" (this is not fully correct, as currently casting e.g. a DatetimeArray to period dtype will give a PeriodArray EA, not a numpy array).
However, this raises the question of what e.g. DatetimeArray.astype("datetime64[ns]") should return, as we actually have a DatetimeArray EA that supports that dtype.
You get similar inconsistencies / dubious cases in e.g. DatetimeArray.astype('int64'), which currently returns an int64 ndarray, while pd.array(DatetimeArray(), dtype='int64') will give a PandasArray[int64].
So one idea would be to restrict ExtensionArray.astype to be a method to convert from one type of ExtensionArray to another type of ExtensionArray, i.e. the output type would be expected to always be an ExtensionArray.
(Similarly, in the end, to how ndarray.astype only gives other ndarrays, or pyarrow.Array.cast only gives other pyarrow arrays; there are other (more explicit) methods to convert to a numpy array.)
In practice this would mean returning a PandasArray for numpy dtypes instead of a numpy ndarray. Storing the result in a Series/DataFrame should not change, because we unpack such a PandasArray anyway.
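To make the proposed contract concrete, here is a minimal toy model of it. This is a sketch under stated assumptions, not pandas code: `ToyExtensionArray`, `ToyWrappedArray`, and `to_raw` are hypothetical names, with `ToyWrappedArray` playing the role a PandasArray would play and `to_raw` standing in for an explicit conversion to a NumPy array.

```python
# Toy model only: hypothetical stand-ins for illustration, not pandas code.


class ToyExtensionArray:
    """Minimal stand-in for an ExtensionArray: values plus a dtype name."""

    def __init__(self, values, dtype):
        self._values = list(values)
        self.dtype = dtype

    def astype(self, dtype):
        # Proposed contract: always return an extension-array-like object,
        # even when the target is a plain "numpy-style" dtype.
        converters = {"int64": int, "float64": float, "object": lambda v: v}
        converted = [converters[dtype](v) for v in self._values]
        return ToyWrappedArray(converted, dtype)

    def to_raw(self):
        # The explicit exit from the extension-array world.
        return list(self._values)


class ToyWrappedArray(ToyExtensionArray):
    """Stand-in for a thin wrapper around a plain array (the PandasArray role)."""


arr = ToyExtensionArray([1, 2, 3], "Int64")
result = arr.astype("float64")
```

Under this model, `astype` never leaves the array hierarchy, and callers that want the plain container have to ask for it through `to_raw`.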
cc @TomAugspurger @jreback @jbrockmendel @shoyer @h-vetinari