API: pd.array(index_or_series[object]) should infer like Series and Index constructors #39117
What's the motivation here? The docs in https://pandas.pydata.org/docs/reference/api/pandas.array.html state that dtype is optional, and
I'd prefer to avoid special casing object-dtype here, unless we have a compelling reason to (especially since object-dtype should become less common now that we have more extension types). |
To get pd.array behavior to match Index and Series xref #27460
I'm on board with the sentiment, but the point 1 you quote means we're special-casing ndarray vs EA/Index/Series. Among other things, this means that |
It's not possible for |
Although unfortunate, I think that's inevitable because ndarray cannot hold all the data types that the others can hold. Currently eg nullable integer, or tz-aware datetime64, or period, etc, become an ndarray without proper dtype. So for those I think it makes sense to do more inference on ndarrays. |
There's a step in the logic here I don't understand. EA can hold dtypes that ndarray cannot, but this is about dtypes that both can hold. |
@jbrockmendel can you give some practical code examples? I think that would help a lot to clear up the misunderstanding/confusion (eg I don't know which dtypes you are speaking about) |
The motivation comes from DatetimeLikeArrayMixin._validate_listlike, which uses pd.array for inference and is under the hood of a bunch of methods.
Of course, "don't use pd.array for this" is also a viable approach. |
I am only talking about object dtype, for which we do lib.infer_dtype with ndarray but not Index/Series/PandasArray |
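For readers following along, here is a minimal sketch of what value-based inference reports for object-dtype data. It uses the public `pandas.api.types.infer_dtype` as a stand-in for the internal `lib.infer_dtype` mentioned above (an assumption for illustration); the variable names are hypothetical. It shows only that the values are inferable as datetimes regardless of the container, not which constructors act on that inference.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

# The same object-dtype data held in three different containers.
arr = np.array([pd.Timestamp("2020-01-01")], dtype=object)
ser = pd.Series(arr, dtype=object)
idx = pd.Index(arr, dtype=object)

# infer_dtype inspects the values themselves, not the container's dtype,
# so it reports the same kind for all three.
kinds = {
    "ndarray": infer_dtype(arr),
    "Series": infer_dtype(ser),
    "Index": infer_dtype(idx),
}
print(kinds)
```

Whether a given constructor then converts to that inferred dtype is the separate question being discussed in this thread.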
Can you make the example even more concrete? In the title you mention Index or Series of object dtype, if I understand correctly. With a quick test, I don't see different type inference between Series and array constructor for such input:
Both preserve the object dtype when passed a Series? It's actually when passed an ndarray that both infer differently:
|
Your example was using datetimes, and also for that I don't see a difference in behaviour for Series vs array:
In [13]: arr = np.array([pd.Timestamp("2020-01-01")], dtype=object)
In [14]: s = pd.Series(arr, dtype=object)
# both Series and array infer object-dtype np.ndarray
In [15]: pd.Series(arr)
Out[15]:
0 2020-01-01
dtype: datetime64[ns]
In [16]: pd.array(arr)
Out[16]:
<DatetimeArray>
['2020-01-01 00:00:00']
Length: 1, dtype: datetime64[ns]
# and both Series and array do not infer object-dtype Series
In [17]: pd.Series(s)
Out[17]:
0 2020-01-01 00:00:00
dtype: object
In [18]: pd.array(s)
Out[18]:
<PandasArray>
[Timestamp('2020-01-01 00:00:00')]
Length: 1, dtype: object
but there is actually one for Index (both for the Index constructor and when passing an Index object to the Series constructor):
# the Index constructor infers both for ndarray and Series
In [23]: pd.Index(s)
Out[23]: DatetimeIndex(['2020-01-01'], dtype='datetime64[ns]', freq=None)
In [24]: pd.Index(arr)
Out[24]: DatetimeIndex(['2020-01-01'], dtype='datetime64[ns]', freq=None)
# and passing an Index to the Series constructor also infers (in contrast to passing a Series)
In [19]: idx = pd.Index(arr, dtype=object)
In [20]: idx
Out[20]: Index([2020-01-01 00:00:00], dtype='object')
In [21]: pd.Series(idx)
Out[21]:
0 2020-01-01
dtype: datetime64[ns]
In [22]: pd.array(idx)
Out[22]:
<PandasArray>
[Timestamp('2020-01-01 00:00:00')]
Length: 1, dtype: object |
Additional example: Series constructor does not infer the type when the object-dtype Index holds integers (so in contrast with the example above of an object-dtype Index with timestamps), but the Index constructor does also infer in that case:
|
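The original snippets for the comparisons above are not reproduced here, so the following is only a sketch of how one might check them on a given pandas installation. The helper `report_dtypes` is hypothetical (not part of pandas), and no expected output is shown because the results depend on the pandas version; some of the inference discussed in this thread was later deprecated and removed.

```python
import numpy as np
import pandas as pd


def report_dtypes(values):
    """Return the dtype each constructor produces for `values` (no dtype passed)."""
    return {
        "Series": str(pd.Series(values).dtype),
        "Index": str(pd.Index(values).dtype),
        "array": str(pd.array(values).dtype),
    }


# Object-dtype integers held in an ndarray, a Series, and an Index.
int_arr = np.array([1, 2, 3], dtype=object)
inputs = {
    "ndarray": int_arr,
    "Series": pd.Series(int_arr, dtype=object),
    "Index": pd.Index(int_arr, dtype=object),
}

# One row per input container, one column per constructor.
results = {name: report_dtypes(values) for name, values in inputs.items()}
for name, dtypes in results.items():
    print(name, dtypes)
```

Swapping `int_arr` for an object-dtype array of `pd.Timestamp` values gives the datetime variant of the same comparison.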
I can't figure out what past-me had in mind here. Best guess is it involved some of the now-deprecated-and-removed string inference that used to be done in the Series constructor. Closing. |