[BUG] to_datetime fails to parse nanoseconds properly if %f is provided in format #7945

Closed
kumaranvpl opened this issue Apr 13, 2021 · 2 comments
Labels: bug, doc, good first issue, Python

Comments

@kumaranvpl

Describe the bug
As per the docs, cudf.to_datetime should parse nanoseconds if '%f' is provided in format. Instead, cudf replaces the last three fractional digits (the nanosecond part) with '000' when '%f' is provided in format. This does not happen if format is omitted.

Steps/Code to reproduce bug

import cudf

print(cudf.to_datetime("2021-04-13 12:30:04.123456789", format="%Y-%m-%d %H:%M:%S.%f"))
print(cudf.to_datetime("2021-04-13 12:30:04.123456789"))

This outputs

2021-04-13T12:30:04.123456000
2021-04-13T12:30:04.123456789

Expected behavior

2021-04-13T12:30:04.123456789
2021-04-13T12:30:04.123456789

Environment overview

  • Environment location: Docker
  • Method of cuDF install: Docker
  • Command used to pull docker: docker pull rapidsai/rapidsai:0.18-cuda11.0-base-ubuntu20.04-py3.8

Additional context
Pandas parses nanoseconds correctly whether or not format is provided:

import pandas as pd

print(pd.to_datetime("2021-04-13 12:30:04.123456789", format="%Y-%m-%d %H:%M:%S.%f"))
print(pd.to_datetime("2021-04-13 12:30:04.123456789"))

This outputs

2021-04-13T12:30:04.123456789
2021-04-13T12:30:04.123456789
@kumaranvpl added the Needs Triage and bug labels on Apr 13, 2021
@davidwendt
Contributor

The cudf documentation linked in the description has the following for the format parameter:

format : str, default None
  The strftime to parse time, eg “%d/%m/%Y”, note that “%f” will parse all the way up to nanoseconds. 
  ...

Ignoring the poor sentence structure, the accompanying link for strftime indicates that %f only supports microseconds and not nanoseconds. So the cudf documentation here would probably need to be updated.
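
The microsecond limit of %f can be checked with the Python standard library alone. The following is a minimal sketch, not cudf code; the helper name try_parse is made up for illustration. It shows that datetime.strptime accepts six fractional digits with %f and rejects nine:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def try_parse(text):
    """Return the parsed datetime, or the ValueError message if parsing fails."""
    try:
        return datetime.strptime(text, FMT)
    except ValueError as exc:
        return str(exc)


# Six fractional digits (microseconds) parse fine with %f.
micro = try_parse("2021-04-13 12:30:04.123456")
# Nine fractional digits (nanoseconds) do not: %f consumes at most six digits,
# so the trailing digits are left over and strptime raises ValueError.
nano = try_parse("2021-04-13 12:30:04.123456789")

print(micro)
print(nano)
```

This only illustrates the standard strftime/strptime semantics that the linked page describes; pandas and cudf implement their own parsers and can behave differently.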

For now, you can force cudf to parse nanoseconds (9 digits) by specifying %9f instead:

>>> print(cudf.to_datetime("2021-04-13 12:30:04.123456789", format="%Y-%m-%d %H:%M:%S.%9f"))
2021-04-13T12:30:04.123456789
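
If the input may carry either microsecond or nanosecond precision, the directive could be chosen from the string itself. This is only a sketch under assumptions: fraction_directive is a hypothetical helper, not part of cudf, and it relies on the %9f workaround described above. The cudf call is left as a comment because it needs a GPU environment.

```python
import re


def fraction_directive(timestamp: str) -> str:
    """Hypothetical helper: return "%9f" for more than six fractional digits, else "%f"."""
    match = re.search(r"\.(\d+)$", timestamp)
    digits = len(match.group(1)) if match else 0
    return "%9f" if digits > 6 else "%f"


ts = "2021-04-13 12:30:04.123456789"
fmt = "%Y-%m-%d %H:%M:%S." + fraction_directive(ts)
print(fmt)

# With cudf installed, the workaround above would then be applied as:
# cudf.to_datetime(ts, format=fmt)
```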

@kkraus14 added the Python and doc labels and removed the Needs Triage label on Apr 15, 2021
@kkraus14 added the good first issue label on Apr 15, 2021
@beckernick added this to the Time Series Analysis milestone on Jul 14, 2021
@marlenezw self-assigned this on Aug 20, 2021
rapids-bot bot pushed a commit that referenced this issue Aug 23, 2021
This is a quick fix to close issue #7945.
This PR checks whether `%f` is passed as part of `format` to `cudf.to_datetime`. Previously, cudf did not return nanoseconds in that case, whereas pandas does.

Authors:
  - Marlene (https://github.com/marlenezw)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)

URL: #9081
@marlenezw
Contributor

Closing this issue since #9081 was merged.

5 participants