-
Notifications
You must be signed in to change notification settings - Fork 920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support more units in cudf.DateOffset
#7078
Merged
rapids-bot
merged 61 commits into
rapidsai:branch-0.20
from
brandon-b-miller:enh-dateoffset-more-units
Apr 22, 2021
Merged
Changes from 20 commits
Commits
Show all changes
61 commits
Select commit
Hold shift + click to select a range
4a4b4af
Merge branch 'branch-0.17' into branch-0.18
shwina 223f2b5
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina abd6ad2
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina 18863b5
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina 0fbdd31
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina dc9b943
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina 0504f88
add years to test params
brandon-b-miller e32ea3d
Added yet another parameter
shwina 4cf00ae
start allowing years and framework for timedeltas
brandon-b-miller 537514a
Support years in cudf.DateOffset
shwina ba5fb76
Add test TODO
shwina b88d5dd
relocate binop logic to DateOffset class
brandon-b-miller afaac8a
Add support for remaining units w/ basic tests
shwina 3b718b5
raise if op isnt add or sub
brandon-b-miller 53562ec
disable reflected ops with sub
brandon-b-miller 0eadf71
Add tests for reflected ops with DateOffsets
shwina 49dd46f
improve _DateOffsetScalars and implement from_scalars
brandon-b-miller 3b9a77a
implement negation and allow for multiple kwargs
brandon-b-miller c637b9c
add tests for multiple units
brandon-b-miller d586aa7
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina cb6e5a9
fractional periods tests and xfails
brandon-b-miller 03ac7e7
create test_offset.py
brandon-b-miller f335d58
Style, etc
shwina 719271f
More style
shwina 996fda8
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina 09b1309
fix pytest and pacify CI
brandon-b-miller 7c9ac23
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina 8ae778a
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina d23b8b8
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina 9a0db21
bpMerge branch 'branch-0.18' of https://github.com/rapidsai/cudf into…
shwina 7baecdc
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into e…
shwina a1fd20e
Copyright
shwina b1283e3
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina ed4b022
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina 3f19f82
Merge branch 'branch-0.18' into enh-dateoffset-more-units
shwina e8dbccc
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into e…
shwina d81ba85
Add failing construction tests
shwina 1006d0f
Change to combine_timedeltas_to_widest
shwina 8c596cc
updated logic to combine to seconds
brandon-b-miller 0958d44
Improvements
shwina 18c9c31
manually raise for fractional periods
brandon-b-miller 65dffc3
improve error logic and messages
brandon-b-miller df26991
rework binop logic
brandon-b-miller 3b76e53
Small fixes
shwina 486ce58
disallow all fractional periods
brandon-b-miller 9b4eb13
cleanup
brandon-b-miller b2a148a
style
brandon-b-miller 26156b6
Merge branch 'branch-0.19' into enh-dateoffset-more-units
shwina 8f09138
Add a DateOffset._from_freqstr
shwina 56176dd
Changelog
shwina 297c64f
cleanup
brandon-b-miller c7a6620
merge 0.19, resolve conflicts, fix tests
brandon-b-miller 371bc53
Merge branch 'branch-0.20' into enh-dateoffset-more-units
shwina bc995c1
Merge branch 'enh-dateoffset-more-units' of github.com:brandon-b-mill…
shwina 99b5123
Whitespace
shwina 306f83a
Add is_integer
shwina 2d1fa07
Use is_integer when checking the scalars in DateOffset
shwina dcf4735
OverflowError -> NotImplementedError
shwina 7225564
Fix test
shwina b4eefa0
Call `pd.api.types.is_integer_dtype()` when dtype conversion fails
shwina e0f1b5c
Merge branch 'branch-0.20' into enh-dateoffset-more-units
brandon-b-miller File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
# Copyright (c) 2019-2020, NVIDIA CORPORATION. | ||
# Copyright (c) 2019-2021, NVIDIA CORPORATION. | ||
|
||
import warnings | ||
|
||
|
@@ -7,6 +7,7 @@ | |
from pandas.core.tools.datetimes import _unit_map | ||
|
||
import cudf | ||
from cudf import _lib as libcudf | ||
from cudf._lib.strings.char_types import is_integer as cpp_is_integer | ||
from cudf.core import column | ||
from cudf.core.index import as_index | ||
|
@@ -334,9 +335,8 @@ def get_units(value): | |
return value | ||
|
||
|
||
class _DateOffsetScalars(object): | ||
def __init__(self, scalars): | ||
self._gpu_scalars = scalars | ||
class _DateOffsetScalars(dict): | ||
pass | ||
|
||
|
||
class _UndoOffsetMeta(pd._libs.tslibs.offsets.OffsetMeta): | ||
|
@@ -434,10 +434,6 @@ def __init__(self, n=1, normalize=False, **kwds): | |
"normalize not yet supported for DateOffset" | ||
) | ||
|
||
# TODO: Pandas supports combinations | ||
if len(kwds) > 1: | ||
raise NotImplementedError("Multiple time units not yet supported") | ||
|
||
all_possible_kwargs = { | ||
"years", | ||
"months", | ||
|
@@ -456,36 +452,101 @@ def __init__(self, n=1, normalize=False, **kwds): | |
"minute", | ||
"second", | ||
"microsecond", | ||
"millisecond" "nanosecond", | ||
"millisecond", | ||
"nanosecond", | ||
} | ||
|
||
supported_kwargs = {"months"} | ||
supported_kwargs = { | ||
"years", | ||
"months", | ||
"weeks", | ||
"days", | ||
"hours", | ||
"minutes", | ||
"seconds", | ||
"microseconds", | ||
"nanoseconds", | ||
} | ||
|
||
super().__init__(n=n, normalize=normalize, **kwds) | ||
|
||
kwds = self._combine_months_and_years(**kwds) | ||
kwds = self._combine_timedeltas_to_nanos(**kwds) | ||
|
||
scalars = {} | ||
for k, v in kwds.items(): | ||
if k in all_possible_kwargs: | ||
# Months must be int16 | ||
dtype = "int16" if k == "months" else None | ||
if k == "months": | ||
dtype = "int16" | ||
elif k == "nanoseconds": | ||
dtype = "timedelta64[ns]" | ||
else: | ||
dtype = None | ||
scalars[k] = cudf.Scalar(v, dtype=dtype) | ||
|
||
super().__init__(n=n, normalize=normalize, **kwds) | ||
|
||
wrong_kwargs = set(kwds.keys()).difference(supported_kwargs) | ||
if len(wrong_kwargs) > 0: | ||
raise ValueError( | ||
raise NotImplementedError( | ||
f"Keyword arguments '{','.join(list(wrong_kwargs))}'" | ||
" are not yet supported in cuDF DateOffsets" | ||
) | ||
self._scalars = _DateOffsetScalars(scalars) | ||
|
||
def _generate_column(self, size, op): | ||
months = self._scalars._gpu_scalars["months"] | ||
def _combine_months_and_years(self, **kwargs): | ||
kwargs["months"] = kwargs.pop("years", 0) * 12 + kwargs.pop( | ||
"months", 0 | ||
) | ||
return kwargs | ||
|
||
def _combine_timedeltas_to_nanos(self, **kwargs): | ||
weeks = kwargs.pop("weeks", 0) | ||
days = kwargs.pop("days", 0) + 7 * weeks | ||
hours = kwargs.pop("hours", 0) + days * 24 | ||
minutes = kwargs.pop("minutes", 0) + hours * 60 | ||
seconds = kwargs.pop("seconds", 0) + minutes * 60 | ||
milliseconds = kwargs.pop("milliseconds", 0) + seconds * 1000 | ||
microseconds = kwargs.pop("microseconds", 0) + milliseconds * 1000 | ||
nanoseconds = kwargs.pop("nanoseconds", 0) + microseconds * 1000 | ||
kwargs["nanoseconds"] = nanoseconds | ||
return kwargs | ||
|
||
def _datetime_binop(self, datetime_col, op, reflect=False): | ||
if reflect and op == "sub": | ||
raise TypeError( | ||
f"Can not subtract a {type(datetime_col).__name__}" | ||
f" from a {type(self).__name__}" | ||
) | ||
if op not in {"add", "sub"}: | ||
raise TypeError( | ||
f"{op} not supported between {type(self).__name__}" | ||
f" and {type(datetime_col).__name__}" | ||
) | ||
if self._is_no_op: | ||
return datetime_col | ||
else: | ||
if "months" in self._scalars: | ||
rhs = self._generate_months_column(len(datetime_col), op) | ||
datetime_col = libcudf.datetime.add_months(datetime_col, rhs) | ||
if "nanoseconds" in self._scalars: | ||
datetime_col = datetime_col + self._generate_nanos_column( | ||
len(datetime_col), op | ||
) | ||
return datetime_col | ||
|
||
def _generate_months_column(self, size, op): | ||
months = self._scalars["months"] | ||
months = -months if op == "sub" else months | ||
# TODO: pass a scalar instead of constructing a column | ||
# https://github.com/rapidsai/cudf/issues/6990 | ||
col = cudf.core.column.as_column(months, length=size) | ||
return col | ||
|
||
def _generate_nanos_column(self, size, op): | ||
nanos = self._scalars["nanoseconds"] | ||
nanos = -nanos if op == "sub" else nanos | ||
return cudf.core.column.as_column(nanos, length=size) | ||
|
||
@property | ||
def _is_no_op(self): | ||
# some logic could be implemented here for more complex cases | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could write a |
||
|
@@ -497,3 +558,7 @@ def __setattr__(self, name, value): | |
raise AttributeError("DateOffset objects are immutable.") | ||
else: | ||
object.__setattr__(self, name, value) | ||
|
||
def __neg__(self): | ||
new_scalars = {k: -v for k, v in self.kwds.items()} | ||
return DateOffset(**new_scalars) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Copyright (c) 2021, NVIDIA CORPORATION. | ||
|
||
import pytest | ||
|
||
from cudf import DateOffset | ||
|
||
|
||
@pytest.mark.parametrize("period", [1.5, 0.5, "string", "1", "1.0"]) | ||
@pytest.mark.parametrize("freq", ["years", "months"]) | ||
def test_construction_invalid(period, freq): | ||
kwargs = {freq: period} | ||
with pytest.raises(ValueError): | ||
DateOffset(**kwargs) |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has huge potential for overflowing the underlying
int64
. What resolutions do we support in cuDF / libcudf? Should we more strategically use those resolutions?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a chat with @shwina , we'll move to returning the least common denominator of resolutions that we support.