Support `freq` in DatetimeIndex #14593

shwina · 2023-12-07T14:06:47Z

When a DatetimeIndex has a fixed frequency offset, pandas sets its `.freq` attribute by default. Because cudf doesn't support that attribute, we raise in pandas-compatible mode.

As a result, working with datetimes is practically impossible in pandas-compatible mode, because so many datetime operations (for example, resample and groupby) involve setting a datetime column as an index.

This PR adds rudimentary support for the freq attribute.

bdice

Can we add some tests to this PR?

python/cudf/cudf/core/index.py

galipremsagar · 2023-12-07T14:57:32Z

python/cudf/cudf/core/index.py

@@ -2142,6 +2141,8 @@ def __init__(
        if yearfirst is not False:
            raise NotImplementedError("yearfirst == True is not yet supported")

+        self._freq = _validate_freq(freq)


While looking into adding freq support earlier, I found that some APIs change freq to new values and return new results (I vaguely remember this happening in binops). Should we add a TODO comment here noting that this is not fully functional yet and that freq support still needs to be added in the rest of the codebase?

Yes, although maybe the default behaviour could be for DatetimeIndex to infer freq from its values. Then this should just work.

Also, we should probably only do that in compatibility mode for perf reasons.
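To make the "infer freq from its values" idea concrete, here is a minimal pure-Python sketch. It is not cudf or pandas code: `infer_fixed_freq` is a hypothetical helper that only handles fixed (constant-step) frequencies, and it uses the standard-library `datetime` types rather than a GPU column. It also shows why restricting inference to compatibility mode could matter for performance: deciding whether a fixed frequency exists needs a pass over every value.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def infer_fixed_freq(values: Sequence[datetime]) -> Optional[timedelta]:
    """Return the constant step between consecutive values, or None.

    Hypothetical helper for illustration; not part of cudf or pandas.
    """
    if len(values) < 3:
        # Too few points to tell a regular spacing from a coincidence.
        return None
    step = values[1] - values[0]
    if step <= timedelta(0):
        # Only strictly increasing, evenly spaced values qualify here.
        return None
    # One pass over all remaining neighbours: O(n) in the index length.
    for prev, cur in zip(values[1:], values[2:]):
        if cur - prev != step:
            return None
    return step


daily = [datetime(2023, 12, 7) + timedelta(days=i) for i in range(4)]
irregular = [datetime(2023, 12, 7), datetime(2023, 12, 8), datetime(2023, 12, 11)]

print(infer_fixed_freq(daily))      # 1 day, 0:00:00
print(infer_fixed_freq(irregular))  # None
```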


bdice · 2023-12-07T22:31:05Z

python/cudf/cudf/tests/test_datetime.py

+                    }
+                )
+            ),
+            reason="Nanosecond offsets being dropped by pandas, which is "


Is this better solved by fixing the condition on the parameter, which should be "pandas < 2.0"?

https://github.com/shwina/cudf/blob/ed3ba3ff17cf686d1e6e38f01073d27b1be64799/python/cudf/cudf/tests/test_datetime.py#L1512

I wanted to do that, but the failure happens only for a few parameter combinations, and we currently xpass/xfail strictly. That's the reason for the current approach.

I know we have two diverging approaches in the same place, but I plan on dropping these in the pandas-2.0 feature branch.

Okay. We can clean it up later.

bdice · 2023-12-07T22:32:47Z

python/cudf/cudf/core/tools/datetimes.py

@@ -463,13 +463,19 @@ class DateOffset:
    }

    _CODES_TO_UNITS = {
+        "N": "nanoseconds",


I have some vague recollection that we left these out on purpose... hmm. I think there was some pandas behavior for which "L" and "ms" were okay but "N", "U", "T", etc. were not supported. We'd probably be able to tell if there are any newly failing pandas tests? I'd just check to see where _CODES_TO_UNITS is used and if there are any inconsistencies with this across different APIs.

Without these changes a number of tests failed; adding these units makes the cudf pytests pass.

There is only a slight increase in pandas-pytest failures:

This PR: 12094 failed, 174794 passed, 3850 skipped, 3314 xfailed, 8 xpassed, 21406 warnings, 102 errors in 1516.39s (0:25:16)

`branch-24.02`: 11607 failed, 175286 passed, 3849 skipped, 3312 xfailed, 11 xpassed, 21414 warnings, 97 errors in 1493.35s (0:24:53)
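For reference, the difference between the two runs can be computed directly from the summary lines quoted above. This is only a small standard-library sketch; `parse_summary` is a hypothetical helper, not part of pytest or cudf.

```python
import re


def parse_summary(line: str) -> dict:
    """Turn 'N failed, M passed, ...' into {'failed': N, 'passed': M, ...}."""
    return {name: int(count) for count, name in re.findall(r"(\d+) ([a-z]+)", line)}


# Counts copied from the two pytest summary lines quoted in this thread.
this_pr = ("12094 failed, 174794 passed, 3850 skipped, 3314 xfailed, "
           "8 xpassed, 21406 warnings, 102 errors")
base = ("11607 failed, 175286 passed, 3849 skipped, 3312 xfailed, "
        "11 xpassed, 21414 warnings, 97 errors")

pr_counts = parse_summary(this_pr)
base_counts = parse_summary(base)

# Positive numbers mean this PR has more of that outcome than branch-24.02.
delta = {name: pr_counts[name] - base_counts[name] for name in pr_counts}
print(delta)
```

With these numbers the PR has 487 more failures and 492 fewer passes than `branch-24.02`, an increase in failures of roughly 4%.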

Sounds good, thanks for checking.
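As a rough illustration of what a code-to-unit table like `_CODES_TO_UNITS` is used for, the sketch below parses a frequency string such as `"5T"` into a unit name and a count. Everything here is an assumption made for illustration: only the `"N": "nanoseconds"` entry appears in the diff, the remaining entries are the legacy pandas offset aliases, and `CODES_TO_UNITS` and `parse_freq` are hypothetical names rather than cudf's actual implementation.

```python
import re

# Hypothetical alias table. Only "N" -> "nanoseconds" is shown in the diff;
# the other entries are assumed from the legacy pandas offset aliases.
CODES_TO_UNITS = {
    "N": "nanoseconds",
    "U": "microseconds",
    "us": "microseconds",
    "L": "milliseconds",
    "ms": "milliseconds",
    "S": "seconds",
    "T": "minutes",
    "H": "hours",
    "D": "days",
}

# Optional integer multiple followed by an alphabetic code, e.g. "5T" or "ms".
_FREQ_RE = re.compile(r"(\d*)([A-Za-z]+)")


def parse_freq(freq: str) -> dict:
    """Map a string like '5T' to {'minutes': 5} (illustrative helper)."""
    match = _FREQ_RE.fullmatch(freq)
    if match is None:
        raise ValueError(f"Invalid frequency string: {freq!r}")
    count, code = match.groups()
    if code not in CODES_TO_UNITS:
        raise ValueError(f"Unsupported frequency code: {code!r}")
    # A missing multiple means one unit, so "N" is the same as "1N".
    return {CODES_TO_UNITS[code]: int(count) if count else 1}


print(parse_freq("5T"))  # {'minutes': 5}
print(parse_freq("N"))   # {'nanoseconds': 1}
```

Keeping the table in one place is what makes the consistency check suggested above straightforward: every API that accepts a frequency string either goes through the same lookup or visibly does not.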

bdice

Approving with a few final comments.

python/cudf/cudf/core/index.py

wence-

I am confused by some of the validation steps.

python/cudf/cudf/core/index.py

bdice

Fix the repr -- then this is good from my side.

python/cudf/cudf/core/index.py

galipremsagar · 2023-12-12T17:25:47Z

/merge

Support freq in DatetimeIndex

9602715

github-actions bot added the Python Affects Python cuDF API. label Dec 7, 2023

bdice reviewed Dec 7, 2023

galipremsagar reviewed Dec 7, 2023

galipremsagar self-assigned this Dec 7, 2023

shwina and others added 9 commits December 7, 2023 11:12

"T" is minutes

98e5e1e

Add more string aliases

6b0beee

Define resamplers

20ca2bb

fix metadata issues

b461ecb

Merge branch 'support-freq-in-datetime-index' of https://github.com/s…

45fb6b2

…hwina/cudf into support-freq-in-datetime-index

Merge branch 'branch-24.02' into support-freq-in-datetime-index

f68a689

Fix more cases

0337840

Apply suggestions from code review

957c7c5

Co-authored-by: Bradley Dice <[email protected]>

fix more cases

ed3ba3f

galipremsagar marked this pull request as ready for review December 7, 2023 22:08

galipremsagar requested a review from a team as a code owner December 7, 2023 22:08

galipremsagar requested review from wence- and brandon-b-miller December 7, 2023 22:08

galipremsagar added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 7, 2023

bdice reviewed Dec 7, 2023

address reviews

ce4f3bd

galipremsagar approved these changes Dec 8, 2023

Merge branch 'branch-24.02' into support-freq-in-datetime-index

6ecc6e6

bdice approved these changes Dec 8, 2023

Apply suggestions from code review

cd00345

Co-authored-by: Bradley Dice <[email protected]>

wence- requested changes Dec 8, 2023

galipremsagar added 2 commits December 8, 2023 15:56

fix freq calculations

6d00347

Merge branch 'support-freq-in-datetime-index' of https://github.com/s…

e1d2315

…hwina/cudf into support-freq-in-datetime-index

Add validation

55266cd

galipremsagar requested a review from wence- December 8, 2023 16:04

bdice approved these changes Dec 8, 2023

galipremsagar added 2 commits December 8, 2023 16:24

Simplify repr

e1b697f

Merge branch 'branch-24.02' into support-freq-in-datetime-index

32f622a

wence- reviewed Dec 8, 2023

wence- approved these changes Dec 8, 2023

galipremsagar added 2 commits December 12, 2023 02:26

Handle freq in groupby ops

0d5c452

Merge branch 'branch-24.02' into support-freq-in-datetime-index

112dbc1

rapids-bot bot merged commit a9dc521 into rapidsai:branch-24.02 Dec 12, 2023
67 checks passed
