ARROW-3448: [Python] Add test for roundtrip of pandas DataFrame with column of lists of datetime objects #10866

Closed
wants to merge 1 commit into from
18 changes: 16 additions & 2 deletions python/pyarrow/tests/pandas_examples.py
@@ -16,7 +16,7 @@
# under the License.

from collections import OrderedDict
-from datetime import date, time
+from datetime import date, time, datetime

import numpy as np
import pandas as pd
@@ -143,13 +143,27 @@ def dataframe_with_lists(include_index=False, parquet_compatible=False):
         None,
         [time(0, 0, 0), time(18, 0, 2), time(12, 7, 3)]
     ]
+    datetime_data = [
+        [datetime(2015, 1, 5, 12, 0, 0), datetime(2020, 8, 22, 10, 5, 0)],
+        [datetime(2024, 5, 5, 5, 49, 1), datetime(2015, 12, 24, 22, 10, 17)],
+        [datetime(1996, 4, 30, 2, 38, 11)],
+        None,
+        [datetime(1987, 1, 27, 8, 21, 59)]
+    ]
+    pandas_timestamp_data = [
+        list(map(pd.Timestamp, x)) if x else None for x in datetime_data
+    ]

     temporal_pairs = [
         (pa.date32(), date_data),
         (pa.date64(), date_data),
         (pa.time32('s'), time_data),
         (pa.time32('ms'), time_data),
-        (pa.time64('us'), time_data)
+        (pa.time64('us'), time_data),
+        (pa.timestamp('s'), datetime_data),
+        (pa.timestamp('ms'), datetime_data),
+        (pa.timestamp('us'), datetime_data),
+        (pa.timestamp('ns'), pandas_timestamp_data),
Member:
Do you get an error when you pass `datetime_data` here? There are no nanoseconds in this data, so I would expect it not to matter.

Member Author:
Yes:

E   AssertionError: DataFrame.iloc[:, 12] (column name="timestamp[ns]_list") are different
E
E   DataFrame.iloc[:, 12] (column name="timestamp[ns]_list") values are different (80.0 %)
E   [index]: [0, 1, 2, 3, 4]
E   [left]:  [[2015-01-05T12:00:00.000000000, 2020-08-22T10:05:00.000000000], [2024-05-05T05:49:01.000000000, 2015-12-24T22:10:17.000000000], [1996-04-30T02:38:11.000000000], None, [1987-01-27T08:21:59.000000000, NaT]]
E   [right]: [[2015-01-05 12:00:00, 2020-08-22 10:05:00], [2024-05-05 05:49:01, 2015-12-24 22:10:17], [1996-04-30 02:38:11], None, [1987-01-27 08:21:59, None]]
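
For context on the reviewer's expectation, here is a minimal stdlib-only sketch. It is not part of the PR and is not how pyarrow converts values; the helper names `to_epoch_count` and `from_epoch_count` are hypothetical. It checks that the sample datetimes have no sub-microsecond component, so each one maps to an integer count in all four timestamp units and back without loss. If that holds, the mismatch reported above looks like a difference in how the values are represented after the roundtrip (the `[left]` side prints `NaT` and nanosecond-formatted values, the `[right]` side prints `None` and plain datetimes) rather than a loss of precision.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def to_epoch_count(value, unit):
    """Hypothetical helper: naive datetime -> integer count of `unit` since the epoch."""
    # timedelta // timedelta yields an exact integer number of microseconds.
    micros = (value - EPOCH) // timedelta(microseconds=1)
    count, remainder = divmod(micros * UNITS_PER_SECOND[unit], 10**6)
    if remainder:
        raise ValueError(f"{value!r} is not exactly representable in unit {unit!r}")
    return count


def from_epoch_count(count, unit):
    """Hypothetical helper: integer count of `unit` since the epoch -> naive datetime."""
    micros, remainder = divmod(count * 10**6, UNITS_PER_SECOND[unit])
    if remainder:
        raise ValueError("sub-microsecond precision does not fit in datetime")
    return EPOCH + timedelta(microseconds=micros)


# Same sample values as the datetime_data added in this diff.
datetime_data = [
    [datetime(2015, 1, 5, 12, 0, 0), datetime(2020, 8, 22, 10, 5, 0)],
    [datetime(2024, 5, 5, 5, 49, 1), datetime(2015, 12, 24, 22, 10, 17)],
    [datetime(1996, 4, 30, 2, 38, 11)],
    None,
    [datetime(1987, 1, 27, 8, 21, 59)],
]

# True when every value survives the trip through every unit unchanged.
lossless = all(
    from_epoch_count(to_epoch_count(value, unit), unit) == value
    for row in datetime_data if row
    for value in row
    for unit in UNITS_PER_SECOND
)
print("lossless in every unit:", lossless)
```

A value with a microsecond component, such as `datetime(1970, 1, 1, 0, 0, 0, 5)`, would raise `ValueError` for the `'s'` and `'ms'` units in this sketch, which is the kind of case where the choice of unit would matter.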

     ]
     if not parquet_compatible:
         temporal_pairs += [