-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support freq
in DatetimeIndex
#14593
Changes from all commits
9602715
98e5e1e
6b0beee
20ca2bb
b461ecb
45fb6b2
f68a689
0337840
957c7c5
ed3ba3f
ce4f3bd
6ecc6e6
cd00345
6d00347
e1d2315
55266cd
e1b697f
32f622a
0d5c452
112dbc1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -463,13 +463,19 @@ class DateOffset: | |
} | ||
|
||
_CODES_TO_UNITS = { | ||
"N": "nanoseconds", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have some vague recollection that we left these out on purpose... hmm. I think there was some pandas behavior for which There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There were a bunch of failing tests without these changes, adding these units passed the cudf pytests. There is only slight increase in pandas-pytest failures:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds good, thanks for checking. |
||
"ns": "nanoseconds", | ||
"U": "microseconds", | ||
"us": "microseconds", | ||
"ms": "milliseconds", | ||
"L": "milliseconds", | ||
"s": "seconds", | ||
"S": "seconds", | ||
"m": "minutes", | ||
"min": "minutes", | ||
"T": "minutes", | ||
"h": "hours", | ||
"H": "hours", | ||
"D": "days", | ||
"W": "weeks", | ||
"M": "months", | ||
|
@@ -487,7 +493,7 @@ class DateOffset: | |
pd_offset.Nano: "nanoseconds", | ||
} | ||
|
||
_FREQSTR_REGEX = re.compile("([0-9]*)([a-zA-Z]+)") | ||
_FREQSTR_REGEX = re.compile("([-+]?[0-9]*)([a-zA-Z]+)") | ||
|
||
def __init__(self, n=1, normalize=False, **kwds): | ||
if normalize: | ||
|
@@ -843,10 +849,6 @@ def date_range( | |
arr = cp.linspace(start=start, stop=end, num=periods) | ||
result = cudf.core.column.as_column(arr).astype("datetime64[ns]") | ||
return cudf.DatetimeIndex._from_data({name: result}) | ||
elif cudf.get_option("mode.pandas_compatible"): | ||
raise NotImplementedError( | ||
"`DatetimeIndex` with `freq` cannot be constructed." | ||
) | ||
|
||
# The code logic below assumes `freq` is defined. It is first normalized | ||
# into `DateOffset` for further computation with timestamps. | ||
|
@@ -940,7 +942,7 @@ def date_range( | |
arr = cp.arange(start=start, stop=stop, step=step, dtype="int64") | ||
res = cudf.core.column.as_column(arr).astype("datetime64[ns]") | ||
|
||
return cudf.DatetimeIndex._from_data({name: res}) | ||
return cudf.DatetimeIndex._from_data({name: res}, freq=freq) | ||
|
||
|
||
def _has_fixed_frequency(freq: DateOffset) -> bool: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While looking on adding
freq
support before, I found some APIs manipulatefreq
(to new values) and return new results. (I vaguely remember..but I think that happens in binops?) Should we add a TODO comment here that this is not fully functional yet andfreq
support needs to be added in rest of the code-base?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, although maybe the default behaviour could be for
DatetimeIndex
to inferfreq
from its values. Then this should just work.Also, we should probably only do that in compatibility mode for perf reasons.