[BUG] to_pandas is not preserving datetime & timedelta types present in StructColumn #9383

Closed
galipremsagar opened this issue Oct 6, 2021 · 0 comments · Fixed by #9388
Labels: bug (Something isn't working), Python (Affects Python cuDF API)

Comments

@galipremsagar
Contributor

Describe the bug
When a StructColumn contains datetime or timedelta fields, converting it to pandas returns plain integer values instead of datetime or timedelta values. This appears to be a bug in Arrow's to_pandas; however, we can work around it in cudf by using tolist, for StructColumn only.

Steps/Code to reproduce bug

>>> import cudf
>>> df = cudf.DataFrame({'a':[1, 2,  3], 'b':[1202312, 23123312, 133212]})
>>> df['b'] = df['b'].astype('datetime64[ns]')
>>> df
   a                             b
0  1 1970-01-01 00:00:00.001202312
1  2 1970-01-01 00:00:00.023123312
2  3 1970-01-01 00:00:00.000133212
>>> df.to_struct()
0     {'a': 1, 'b': 1202312}
1    {'a': 2, 'b': 23123312}
2      {'a': 3, 'b': 133212}
dtype: struct
>>> df.to_struct()[0]
{'a': 1, 'b': Timestamp('1970-01-01 00:00:00.001202312')}
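
The integers in the struct output above are the raw nanosecond offsets from the epoch that back the `datetime64[ns]` column. On an affected version, a user-side repair could look like the sketch below. `restore_timestamps` and its `fields` argument are hypothetical names introduced here, not part of cudf or pandas, and the rows are typed in by hand to match the output shown in this report.

```python
import pandas as pd

# Rows as the affected conversion showed them: 'b' holds raw int64
# nanoseconds since the epoch instead of a timestamp.
rows = pd.Series(
    [
        {"a": 1, "b": 1202312},
        {"a": 2, "b": 23123312},
        {"a": 3, "b": 133212},
    ],
    dtype="object",
)


def restore_timestamps(row, fields=("b",)):
    """Return a copy of row with the named integer fields read as ns timestamps.

    Hypothetical helper for illustration only.
    """
    return {
        key: pd.Timestamp(value, unit="ns") if key in fields else value
        for key, value in row.items()
    }


fixed = pd.Series([restore_timestamps(row) for row in rows], dtype="object")
print(fixed[0])
```

This requires knowing which fields were datetimes, which is exactly the type information the conversion dropped.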

Expected behavior

>>> df.to_struct()
0    {'a': 1, 'b': 1970-01-01 00:00:00.001202312}
1    {'a': 2, 'b': 1970-01-01 00:00:00.023123312}
2    {'a': 3, 'b': 1970-01-01 00:00:00.000133212}
dtype: object

Environment overview (please complete the following information)

  • Environment location: [Bare-metal]
  • Method of cuDF install: [source]

Additional context
Surfaced while running fuzz-tests for ORC reader/writer.

@galipremsagar added the bug and Python labels Oct 6, 2021
@galipremsagar self-assigned this Oct 6, 2021
@rapids-bot closed this as completed in #9388 Oct 6, 2021
rapids-bot pushed a commit that referenced this issue Oct 6, 2021
Fixes: #9383 

This PR preserves `datetime` and `timedelta` types in a `StructColumn` when converting to `pd.Series`, and keeps `int` values as `int` rather than converting them to `float`.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Michael Wang (https://github.com/isVoid)

URL: #9388