Add tz_convert method to convert between timestamps #13328

shwina · 2023-05-10T16:27:20Z

Description

Closes #13329

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

python/cudf/cudf/core/_internals/timezones.py

shwina · 2023-05-12T12:01:29Z

rerun tests

mroeschke

My comments have been addressed

shwina · 2023-05-16T19:33:17Z

python/cudf/cudf/core/column/datetime.py

@@ -589,6 +601,18 @@ def as_string_column(
    ) -> "cudf.core.column.StringColumn":
        return self._local_time.as_string_column(dtype, format, **kwargs)

+    def __repr__(self):


This is really unrelated to the rest of the PR, but a quality-of-life thing for debugging. Pandas always prints the local timestamps when looking at a tz-aware column and pyarrow always prints the UTC timestamps.

python/cudf/cudf/core/_internals/timezones.py

bdice · 2023-05-16T19:37:26Z

python/cudf/cudf/core/column/datetime.py

+    def __repr__(self):
+        # Arrow prints the UTC timestamps, but we want to print the
+        # local timestamps:
+        arr = self._local_time.to_arrow().cast(


Maybe a silly question. Why convert to arrow and then attempt to mirror pandas repr conventions instead of just converting it to pandas and using that repr?

Mainly because pyarrow has the convenient to_string() method that lets us assemble the repr with a custom class name.

In [7]: print(dti._column.to_arrow().to_string())
[
  2001-01-01 05:00:00.000000000,
  2001-01-01 06:00:00.000000000,
  2001-01-01 07:00:00.000000000,
  2001-01-01 08:00:00.000000000,
  2001-01-01 09:00:00.000000000,
  2001-01-01 10:00:00.000000000,
  2001-01-01 11:00:00.000000000,
  2001-01-01 12:00:00.000000000,
  2001-01-01 13:00:00.000000000,
  2001-01-01 14:00:00.000000000
]
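To illustrate the idea of a repr that shows local rather than UTC timestamps under a custom class name, here is a minimal pure-Python sketch. It is not the cudf implementation: `TzAwareColumn` is a hypothetical stand-in, and a fixed UTC-05:00 offset replaces a real time zone so the example needs only the standard library.

```python
from datetime import datetime, timedelta, timezone


class TzAwareColumn:
    """Hypothetical stand-in: stores naive UTC values, prints local ones."""

    def __init__(self, utc_values, tz):
        self._utc = list(utc_values)  # naive datetimes interpreted as UTC
        self._tz = tz                 # any datetime.tzinfo instance

    def _local_strings(self):
        # Attach UTC, shift to the target zone, then format without offset.
        return [
            v.replace(tzinfo=timezone.utc)
            .astimezone(self._tz)
            .strftime("%Y-%m-%d %H:%M:%S")
            for v in self._utc
        ]

    def __repr__(self):
        # Assemble the repr around the custom class name.
        body = ",\n".join(f"  {s}" for s in self._local_strings())
        return f"<{type(self).__name__}>\n[\n{body}\n]"


fixed_minus_five = timezone(timedelta(hours=-5))  # assumed fixed offset
col = TzAwareColumn(
    [datetime(2001, 1, 1, 5), datetime(2001, 1, 1, 6)], fixed_minus_five
)
print(repr(col))
```

The stored values stay in UTC; only the printed form is shifted, which matches the pandas convention described above.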

python/cudf/cudf/core/index.py

bdice · 2023-05-16T19:40:41Z

python/cudf/cudf/core/index.py

+        Parameters
+        ----------
+        tz: str
+            Time zone for time. Corresponding timestamps would be converted


The pandas docstring is worded in a weird way. However, I would leave this as-is to match pandas.

bdice · 2023-05-16T19:43:43Z

python/cudf/cudf/core/index.py

+                       '2018-03-03 14:00:00+00:00'],
+                      dtype='datetime64[ns, Europe/London]')
+        """
+        from cudf.core._internals.timezones import convert, localize


I forgot, is there a reason we defer this import in each function rather than doing it once at the top?

Circular imports :-(
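For readers unfamiliar with the pattern, the sketch below shows why a function-level import can sidestep a circular import. The module names (`eager_index`, `lazy_index`, and so on) are hypothetical and are written to a temporary directory; they only mimic the shape of the dependency between an index module and a timezones module, not cudf's actual layout.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two modules that import each other at module level.
EAGER = {
    "eager_index": "from eager_timezones import convert\n\nclass Index:\n    pass\n",
    "eager_timezones": "from eager_index import Index\n\ndef convert(index, tz):\n    return tz\n",
}

# Same dependency, but one side defers its import into the function body.
LAZY = {
    "lazy_index": (
        "class Index:\n"
        "    def tz_convert(self, tz):\n"
        "        from lazy_timezones import convert  # deferred import\n"
        "        return convert(self, tz)\n"
    ),
    "lazy_timezones": (
        "from lazy_index import Index\n\n"
        "def convert(index, tz):\n"
        "    return (type(index).__name__, tz)\n"
    ),
}


def run_in_temp_dir(sources, action):
    """Write the modules to a temp dir, run action(), then clean up."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in sources.items():
            Path(tmp, f"{name}.py").write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            return action()
        finally:
            sys.path.remove(tmp)
            for name in sources:
                sys.modules.pop(name, None)


def eager_outcome():
    try:
        importlib.import_module("eager_index")
    except ImportError:
        return "ImportError"
    return "ok"


def lazy_outcome():
    module = importlib.import_module("lazy_index")
    return module.Index().tz_convert("UTC")


eager_result = run_in_temp_dir(EAGER, eager_outcome)
lazy_result = run_in_temp_dir(LAZY, lazy_outcome)
print(eager_result, lazy_result)
```

In the eager case the second module asks for a name the first has not yet defined. In the deferred case the import runs only when the method is called, by which point both modules are fully initialized.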

python/cudf/cudf/core/series.py

bdice · 2023-05-16T19:47:14Z

python/cudf/cudf/tests/series/test_datetimelike.py

+    "to_tz", ["Europe/London", "America/Chicago", "UTC", None]
+)
+def test_convert(from_tz, to_tz):
+    ps = pd.Series(pd.date_range("2023-01-01", periods=3, freq="H"))


I’d love it if we could add some complexity to our test inputs. Maybe a data fixture that has some times on either side of a DST change, ambiguous times, pre-1900 times, etc. Include some times that we know have raised issues in the past (issue tracker has a few).

#7314 (comment)

I agree. @mroeschke does Pandas do something like this? Just wondering if there's tooling we can borrow/steal/vendor from Pandas

Unfortunately not. The only related fixture we have is a fixed set of timezones that we use where applicable:

https://github.com/pandas-dev/pandas/blob/0fa150016911de08025d82ef6975750278c5ad7b/pandas/conftest.py#L1196-L1214

OK - I threw the problem at ChatGPT and it generated some edge case tests that I added here.
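As a rough illustration of the kind of edge cases requested above (times on either side of a DST change, ambiguous times), the sketch below uses only the standard library's `zoneinfo`. It is not the test added in this PR; the helper name `utc_offset_hours` and the chosen 2023 dates for America/Chicago are illustrative, and it assumes the system has a tz database available.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

CHICAGO = ZoneInfo("America/Chicago")


def utc_offset_hours(naive, fold=0):
    """UTC offset, in hours, of a naive wall-clock time read in Chicago."""
    aware = naive.replace(tzinfo=CHICAGO, fold=fold)
    return aware.utcoffset() / timedelta(hours=1)


# Either side of the spring-forward transition on 2023-03-12.
before_spring = utc_offset_hours(datetime(2023, 3, 12, 1, 59))
after_spring = utc_offset_hours(datetime(2023, 3, 12, 3, 0))

# 01:30 on 2023-11-05 occurs twice; `fold` selects which occurrence.
ambiguous = datetime(2023, 11, 5, 1, 30)
first_pass = utc_offset_hours(ambiguous, fold=0)
second_pass = utc_offset_hours(ambiguous, fold=1)

# The two occurrences are distinct instants, one hour apart in UTC.
first_utc = ambiguous.replace(tzinfo=CHICAGO, fold=0).astimezone(timezone.utc)
second_utc = ambiguous.replace(tzinfo=CHICAGO, fold=1).astimezone(timezone.utc)

print(before_spring, after_spring, first_pass, second_pass)
print(second_utc - first_utc)
```

Values like these could seed a parametrized fixture, so that a conversion is checked on both sides of each transition rather than only on three consecutive hours in January.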

pre-1900 times

I did find that we return a result different from Pandas for this pre-1900 example:

>>> pd.Series(["1899-01-01 12:00"], dtype="datetime64[s]").dt.tz_localize("Europe/Paris").dt.tz_convert("America/New_York")
0   1899-01-01 06:55:00-04:56
dtype: datetime64[ns, America/New_York]
>>> cudf.Series(["1899-01-01 12:00"], dtype="datetime64[s]").dt.tz_localize("Europe/Paris").dt.tz_convert("America/New_York")
0   1899-01-01 06:50:39-04:56
dtype: datetime64[s, America/New_York]

However, our result is the same as you would get with zoneinfo:

>>> datetime(1899, 1, 1, 12, 0, tzinfo=ZoneInfo("Europe/Paris")).astimezone(ZoneInfo("America/New_York"))
datetime.datetime(1899, 1, 1, 6, 50, 39, tzinfo=zoneinfo.ZoneInfo(key='America/New_York'))

@mroeschke I'm curious if this aligns with your experience with the difference between Pandas (pytz) and ZoneInfo?

@shwina If you want to add pre-1900 times in a later PR, that's fine. I think you hit a decent number of edge cases for now. But if we know we disagree with pandas for this specific case, I'd like to document that in an issue. I would consider that a bug.
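The arithmetic behind the two results can be checked with fixed offsets. This is a sketch under stated assumptions, not a confirmed diagnosis: it assumes the tz database gives Paris an offset of +0:09:21 and New York an offset of -5:00 in January 1899, and it assumes (unverified in this thread) that the pandas value comes from local mean time offsets rounded to whole minutes, +0:09 and -4:56.

```python
from datetime import datetime, timedelta, timezone

# Assumed tz database offsets in force in January 1899.
PARIS_1899 = timezone(timedelta(minutes=9, seconds=21))
NEW_YORK_1899 = timezone(timedelta(hours=-5))

local_paris = datetime(1899, 1, 1, 12, 0, tzinfo=PARIS_1899)
as_utc = local_paris.astimezone(timezone.utc)
in_new_york = local_paris.astimezone(NEW_YORK_1899)

# Hypothetical alternative: local mean time offsets rounded to minutes.
PARIS_LMT_ROUNDED = timezone(timedelta(minutes=9))
NEW_YORK_LMT_ROUNDED = timezone(-timedelta(hours=4, minutes=56))

rounded = datetime(1899, 1, 1, 12, 0, tzinfo=PARIS_LMT_ROUNDED).astimezone(
    NEW_YORK_LMT_ROUNDED
)

print(as_utc.time(), in_new_york.time(), rounded.time())
```

Under the first pair of offsets the wall-clock result is 06:50:39, matching the cudf and zoneinfo output above. Under the rounded pair it is 06:55:00, matching the pandas output.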

Co-authored-by: Bradley Dice

python/cudf/cudf/tests/series/test_datetimelike.py

shwina

Suggestion

Co-authored-by: Ashwin Srinath

shwina · 2023-05-18T09:50:40Z

/merge

shwina added 2 commits May 10, 2023 12:08

Add tz_convert method

c287c7c

Update implementation for None

949a95b

shwina added the feature request, Python, and non-breaking labels May 10, 2023

Merge branch 'branch-23.06' into add-tz-convert

904f3d3

shwina commented May 11, 2023

python/cudf/cudf/core/_internals/timezones.py

mroeschke reviewed May 11, 2023

python/cudf/cudf/core/_internals/timezones.py

shwina added 3 commits May 11, 2023 16:38

Short-circuit when zone names are the same

5e745fb

Merge branch 'add-tz-convert' of github.com:shwina/cudf into add-tz-c…

40fad81

…onvert

Fix repr

7a470ff

github-actions bot added the conda label May 11, 2023

Merge branch 'branch-23.06' of github.com:rapidsai/cudf into add-tz-c…

123ddd6

…onvert

shwina force-pushed the add-tz-convert branch from 8ce6f31 to 7a470ff Compare May 11, 2023 23:31

shwina and others added 5 commits May 12, 2023 13:22

Merge branch 'branch-23.06' into add-tz-convert

95de2bf

Merge branch 'branch-23.06' into add-tz-convert

6c3e403

Merge branch 'branch-23.06' of github.com:rapidsai/cudf into add-tz-c…

c7626c5

…onvert

Add test for slicing tzdatetimes

c8f58a8

Merge branch 'branch-23.06' into add-tz-convert

5629269

shwina marked this pull request as ready for review May 16, 2023 19:27

shwina requested a review from a team as a code owner May 16, 2023 19:27

shwina requested review from wence- and vyasr May 16, 2023 19:27

mroeschke approved these changes May 16, 2023

shwina commented May 16, 2023

bdice reviewed May 16, 2023

shwina and others added 3 commits May 16, 2023 16:59

Apply suggestions from code review

47a65cb

Co-authored-by: Bradley Dice

Add edge case tests

991a3e9

Fix for .convert(None) in the index case

8ca7e46

Merge branch 'add-tz-convert' of github.com:shwina/cudf into add-tz-c…

ea3fe0c

…onvert

shwina commented May 17, 2023

python/cudf/cudf/tests/series/test_datetimelike.py

shwina commented May 17, 2023

shwina requested a review from bdice May 17, 2023 19:23

bdice approved these changes May 17, 2023

bdice and others added 2 commits May 17, 2023 14:27

Update python/cudf/cudf/tests/series/test_datetimelike.py

3998c4f

Co-authored-by: Ashwin Srinath

Merge branch 'branch-23.06' into add-tz-convert

66a543f

rapids-bot bot merged commit e883a11 into rapidsai:branch-23.06 May 18, 2023

shwina mentioned this pull request May 18, 2023

tz_convert sometimes returns results different from Pandas (but same as zoneinfo) #13380

Open
