[BUG] cudf.to_datetime with arg with Z drops UTC offset incorrectly #14039

Closed
mroeschke opened this issue Sep 6, 2023 · 2 comments · Fixed by #14074
Labels
bug Something isn't working Python Affects Python cuDF API.

Comments

@mroeschke
Contributor

Describe the bug
cudf.to_datetime silently drops the UTC designator when the input string ends in "Z", returning a timezone-naive result.

Steps/Code to reproduce bug

In [1]: import cudf

In [2]: cudf.to_datetime(["2019-01-01T00:00:00.000Z"])
Out[2]: DatetimeIndex(['2019-01-01'], dtype='datetime64[ns]')

Expected behavior
Since cudf does not currently support timezones, I would expect this to raise a NotImplementedError.

It appears there was a commit that tried to address this, 9508210, but it looks to simply ignore "Z"
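
A minimal sketch of the kind of check this report asks for, written in plain Python rather than against cuDF internals. The helper name `reject_tz_strings` and the regular expression are illustrative assumptions; they are not the code from commit 9508210 or from #14074.

```python
import re

# Hypothetical pattern: a time-of-day followed by a trailing "Z" or a
# numeric UTC offset such as +05:30 or -0800.
_TZ_SUFFIX = re.compile(r"\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?\s*(Z|[+-]\d{2}:?\d{2})$")


def reject_tz_strings(values):
    """Return the values unchanged, or raise if any carries a timezone component."""
    for value in values:
        if _TZ_SUFFIX.search(value.strip()):
            raise NotImplementedError(
                f"timezone-aware input is not supported: {value!r}"
            )
    return list(values)
```

With this sketch, `reject_tz_strings(["2019-01-01T00:00:00.000Z"])` raises instead of quietly returning a naive value, while strings without a timezone suffix pass through.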

Environment overview

  • Environment location: Bare-metal
  • Method of cuDF install: conda


@mroeschke mroeschke added bug Something isn't working Python Affects Python cuDF API. labels Sep 6, 2023
rapids-bot bot pushed a commit that referenced this issue Sep 14, 2023
…ring (#14074)

closes #14039
Avoids this discrepancy when a date string has a tz component:

```python
In [1]: import pandas

In [2]: import cudf

In [3]: data = ["2019-01-01T00:00:00.000Z"]

In [4]: cudf.to_datetime(data)
Out[4]: DatetimeIndex(['2019-01-01'], dtype='datetime64[ns]')

In [5]: pandas.to_datetime(data)
Out[5]: DatetimeIndex(['2019-01-01 00:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
```

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #14074
@cwharris
Contributor

cwharris commented Jan 2, 2024

> It appears there was a commit that tried to address this, 9508210, but it looks to simply ignore "Z"

This is exactly how it should work. Adding Z does not change the timezone. All time data should be assumed to be in UTC unless otherwise specified by a timezone offset. Throwing an exception is not appropriate and is a breaking change that is affecting nv-morpheus/morpheus as we try to update to newer versions of cuDF.

@bdice
Contributor

bdice commented Jan 3, 2024

> All time data should be assumed to be in UTC unless otherwise specified by a timezone offset.

Pandas differentiates between "naive" datetimes and timezone-aware datetimes. In pandas, parsing a Z will return a datetime64[ns, UTC] type instead of a datetime64[ns] type. cudf needs to align with pandas behavior.
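
The naive/aware distinction can be illustrated with the standard library alone. This sketch uses `datetime.strptime` instead of pandas or cuDF, and it assumes Python 3.7 or newer, where the `%z` directive accepts a literal `Z`.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S.%f"

# Same wall-clock value, parsed without and with the trailing "Z".
naive = datetime.strptime("2019-01-01T00:00:00.000", FMT)
aware = datetime.strptime("2019-01-01T00:00:00.000Z", FMT + "%z")

print(naive.tzinfo)                       # None: no timezone attached
print(aware.utcoffset() == timedelta(0))  # True: an explicit zero offset, i.e. UTC

# Ordering a naive value against an aware one is rejected outright.
try:
    naive < aware
except TypeError as exc:
    print(f"cannot order naive against aware: {exc}")
```

The two values print the same clock time but are not interchangeable, which is why dropping the `Z` changes the type of the result and not just its formatting.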

We are temporarily reverting this change of behavior in #14701 when cuDF is used with pandas compatibility mode turned off. See my proposal for a deprecation step followed by a behavior change in cuDF here: #14701 (review)
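
One way to picture a mode-dependent behavior with a deprecation step is the sketch below. It is plain Python, not the contents of #14701: the function name `parse_with_z`, the `pandas_compatible` flag, and the warning text are all hypothetical stand-ins.

```python
import warnings
from datetime import datetime


def parse_with_z(value, pandas_compatible=False):
    """Hypothetical gate: strict in compatibility mode, lenient with a warning otherwise."""
    if value.endswith("Z"):
        if pandas_compatible:
            # Strict path: refuse input that would need a timezone-aware result.
            raise NotImplementedError("timezone-aware parsing is not supported")
        # Lenient path: keep the old behavior, but signal that it may change.
        warnings.warn(
            "Ignoring the trailing 'Z'; a future version may return a UTC-aware result.",
            FutureWarning,
            stacklevel=2,
        )
        value = value[:-1]
    return datetime.fromisoformat(value)
```

Callers that rely on the lenient path keep working and see a `FutureWarning`, while callers that opt into compatibility mode get the exception immediately.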
