[Python] Pandas roundtrip doesn't preserve list of datetime objects #19770
Comments
Joris Van den Bossche / @jorisvandenbossche: I think that, in general, ListArrays are always converted to columns of NumPy arrays (the original column of lists is not preserved). Here there is the additional question of whether the nested array should have object dtype or datetime64 dtype.
Antoine Pitrou / @pitrou: |
Krisztian Szucs / @kszucs: |
Joris Van den Bossche / @jorisvandenbossche:
>>> df = pd.DataFrame({'a': datetime_data})
>>> table = pa.table(df)
>>> table
pyarrow.Table
a: list<item: timestamp[us]>
  child 0, item: timestamp[us]
>>> table.to_pandas()['a'][0]
array(['2015-01-05T12:00:00.000000', '2020-08-22T10:05:00.000000'],
      dtype='datetime64[us]')
>>> df['a'][0]
[datetime.datetime(2015, 1, 5, 12, 0), datetime.datetime(2020, 8, 22, 10, 5)]
But as I mentioned above, not sure we actually want to change this behaviour.
Krisztian Szucs / @kszucs: So |
I think this can be closed as the datetime object can now be preserved with the use of
import pyarrow as pa
pa.__version__
# '12.0.0.dev279+gb20734438'
import pandas as pd
from datetime import datetime
datetime_data = [
[datetime(2015, 1, 5, 12, 0, 0), datetime(2020, 8, 22, 10, 5, 0)],
[datetime(2024, 5, 5, 5, 49, 1), datetime(2015, 12, 24, 22, 10, 17)],
[datetime(1996, 4, 30, 2, 38, 11)],
None,
[datetime(1987, 1, 27, 8, 21, 59)]
]
df = pd.DataFrame({'a': datetime_data})
table = pa.table(df)
table.to_pandas(timestamp_as_object=True).values
# array([[array([datetime.datetime(2015, 1, 5, 12, 0),
# datetime.datetime(2020, 8, 22, 10, 5)], dtype=object)],
# [array([datetime.datetime(2024, 5, 5, 5, 49, 1),
# datetime.datetime(2015, 12, 24, 22, 10, 17)], dtype=object)],
# [array([datetime.datetime(1996, 4, 30, 2, 38, 11)], dtype=object)],
# [None],
# [array([datetime.datetime(1987, 1, 27, 8, 21, 59)], dtype=object)]],
#       dtype=object)
There is still an issue where the list roundtrips to a NumPy array of NumPy arrays, but other issues already track this (#34574, #20222). We could also think of supporting an option to preserve the list dtype, but this should come after the optimisation of
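Until such an option exists, the nested arrays can be turned back into lists after the conversion. The following is only a sketch of that idea, not part of pyarrow or pandas: `arrays_to_lists` is a hypothetical helper name, and it assumes each cell is either None or an object exposing a `tolist()` method, as NumPy arrays do.

```python
def arrays_to_lists(value):
    """Convert an array-like cell back to a plain Python list.

    Hypothetical helper: None and values without a tolist() method
    are returned unchanged.
    """
    if value is None:
        return None
    tolist = getattr(value, "tolist", None)
    # NumPy arrays expose tolist(); plain lists and scalars do not.
    return tolist() if callable(tolist) else value
```

Applied to the example above, this would be something like `table.to_pandas(timestamp_as_object=True)['a'].map(arrays_to_lists)`. Using it together with `timestamp_as_object=True` matters, because the nested arrays are then object dtype and their elements are already datetime objects.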
Adding the following to the pandas_example.py::dataframe_with_lists function:
breaks the test cases, because the roundtrip doesn't preserve the object type.
Reporter: Krisztian Szucs / @kszucs
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-3448. Please see the migration documentation for further details.