Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict the allowed pandas timezone objects in cudf #16013

Merged
merged 13 commits into from
Jun 24, 2024

Conversation

mroeschke
Copy link
Contributor

Description

Since cudf's timezone support is based on the OS's tz data and hence zoneinfo, cudf cannot naturally support the variety of timezone objects supported by pandas (pytz, dateutil, etc). Therefore:

  • In pandas compatible mode, only accept pandas objects with zoneinfo timezones.
  • Otherwise, try to convert the pandas timezone to an equivalent zoneinfo object e.g. pytz.timezone("US/Pacific")-> zoneinfo.ZoneInfo("US/Pacific")

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jun 12, 2024
@mroeschke mroeschke requested a review from a team as a code owner June 12, 2024 20:49
@mroeschke mroeschke requested review from wence- and charlesbluca June 12, 2024 20:49
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor queries, and one name-typo fix.

python/cudf/cudf/core/_internals/timezones.py Show resolved Hide resolved
from cudf.core.column.timedelta import TimeDeltaColumn


def get_compatable_timezone(dtype: pd.DatetimeTZDtype) -> pd.DatetimeTZDtype:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def get_compatable_timezone(dtype: pd.DatetimeTZDtype) -> pd.DatetimeTZDtype:
def get_compatible_timezone(dtype: pd.DatetimeTZDtype) -> pd.DatetimeTZDtype:

et seq., throughout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch!

python/cudf/cudf/core/_internals/timezones.py Show resolved Hide resolved
@mroeschke
Copy link
Contributor Author

Looks like we're back to 91% as of 413690d

Merging this PR would result in 194973/212690 (91.67%)

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mroeschke

@mroeschke
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 0c6b828 into rapidsai:branch-24.08 Jun 24, 2024
73 checks passed
@mroeschke mroeschke deleted the fix/pd/timezone branch July 26, 2024 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants