Support more units in cudf.DateOffset #7078

Merged
Changes from 20 commits
61 commits
4a4b4af
Merge branch 'branch-0.17' into branch-0.18
shwina Dec 11, 2020
223f2b5
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Dec 15, 2020
abd6ad2
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Dec 17, 2020
18863b5
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 4, 2021
0fbdd31
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 5, 2021
dc9b943
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 5, 2021
0504f88
add years to test params
brandon-b-miller Jan 5, 2021
e32ea3d
Added yet another parameter
shwina Jan 5, 2021
4cf00ae
start allowing years and framework for timedeltas
brandon-b-miller Jan 5, 2021
537514a
Support years in cudf.DateOffset
shwina Jan 5, 2021
ba5fb76
Add test TODO
shwina Jan 5, 2021
b88d5dd
relocate binop logic to DateOffset class
brandon-b-miller Jan 5, 2021
afaac8a
Add support for remaining units w/ basic tests
shwina Jan 5, 2021
3b718b5
raise if op isnt add or sub
brandon-b-miller Jan 6, 2021
53562ec
disable reflected ops with sub
brandon-b-miller Jan 6, 2021
0eadf71
Add tests for reflected ops with DateOffsets
shwina Jan 6, 2021
49dd46f
improve _DateOffsetScalars and implement from_scalars
brandon-b-miller Jan 6, 2021
3b9a77a
implement negation and allow for multiple kwargs
brandon-b-miller Jan 6, 2021
c637b9c
add tests for multiple units
brandon-b-miller Jan 6, 2021
d586aa7
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 7, 2021
cb6e5a9
fractional periods tests and xfails
brandon-b-miller Jan 8, 2021
03ac7e7
create test_offset.py
brandon-b-miller Jan 8, 2021
f335d58
Style, etc
shwina Jan 8, 2021
719271f
More style
shwina Jan 8, 2021
996fda8
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 8, 2021
09b1309
fix pytest and pacify CI
brandon-b-miller Jan 12, 2021
7c9ac23
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 15, 2021
8ae778a
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 21, 2021
d23b8b8
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 26, 2021
9a0db21
bpMerge branch 'branch-0.18' of https://github.com/rapidsai/cudf into…
shwina Jan 27, 2021
7baecdc
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into e…
shwina Jan 28, 2021
a1fd20e
Copyright
shwina Jan 28, 2021
b1283e3
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 29, 2021
ed4b022
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Feb 1, 2021
3f19f82
Merge branch 'branch-0.18' into enh-dateoffset-more-units
shwina Feb 1, 2021
e8dbccc
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into e…
shwina Feb 1, 2021
d81ba85
Add failing construction tests
shwina Feb 1, 2021
1006d0f
Change to combine_timedeltas_to_widest
shwina Feb 1, 2021
8c596cc
updated logic to combine to seconds
brandon-b-miller Feb 2, 2021
0958d44
Improvements
shwina Feb 2, 2021
18c9c31
manually raise for fractional periods
brandon-b-miller Feb 2, 2021
65dffc3
improve error logic and messages
brandon-b-miller Feb 2, 2021
df26991
rework binop logic
brandon-b-miller Feb 2, 2021
3b76e53
Small fixes
shwina Feb 9, 2021
486ce58
disallow all fractional periods
brandon-b-miller Feb 9, 2021
9b4eb13
cleanup
brandon-b-miller Feb 9, 2021
b2a148a
style
brandon-b-miller Feb 9, 2021
26156b6
Merge branch 'branch-0.19' into enh-dateoffset-more-units
shwina Feb 9, 2021
8f09138
Add a DateOffset._from_freqstr
shwina Feb 9, 2021
56176dd
Changelog
shwina Feb 9, 2021
297c64f
cleanup
brandon-b-miller Feb 9, 2021
c7a6620
merge 0.19, resolve conflicts, fix tests
brandon-b-miller Mar 26, 2021
371bc53
Merge branch 'branch-0.20' into enh-dateoffset-more-units
shwina Apr 13, 2021
bc995c1
Merge branch 'enh-dateoffset-more-units' of github.com:brandon-b-mill…
shwina Apr 13, 2021
99b5123
Whitespace
shwina Apr 13, 2021
306f83a
Add is_integer
shwina Apr 13, 2021
2d1fa07
Use is_integer when checking the scalars in DateOffset
shwina Apr 13, 2021
dcf4735
OverflowError -> NotImplementedError
shwina Apr 13, 2021
7225564
Fix test
shwina Apr 13, 2021
b4eefa0
Call `pd.api.types.is_integer_dtype()` when dtype conversion fails
shwina Apr 13, 2021
e0f1b5c
Merge branch 'branch-0.20' into enh-dateoffset-more-units
brandon-b-miller Apr 19, 2021
5 changes: 3 additions & 2 deletions python/cudf/cudf/core/column/datetime.py
@@ -1,4 +1,5 @@
# Copyright (c) 2019-2020, NVIDIA CORPORATION.
# Copyright (c) 2019-2021, NVIDIA CORPORATION.

from __future__ import annotations

import datetime as dt
@@ -252,7 +253,7 @@ def binary_operator(
        reflect: bool = False,
    ) -> ColumnBase:
        if isinstance(rhs, cudf.DateOffset):
            return binop_offset(self, rhs, op)
            return rhs._datetime_binop(self, op, reflect=reflect)
        lhs, rhs = self, rhs
        if op in ("eq", "ne", "lt", "gt", "le", "ge"):
            out_dtype = np.bool
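
The hunk above moves the add/subtract logic out of the datetime column and into the offset object via `rhs._datetime_binop(self, op, reflect=reflect)`. A minimal, GPU-free sketch of that delegation pattern follows. `ToyColumn` and `ToyOffset` are hypothetical stand-ins, not cuDF classes, and integer day numbers stand in for datetimes; only the `reflect`/`op` checks mirror the diff.

```python
# Hypothetical stand-ins for illustration only; these are not cuDF classes.
class ToyOffset:
    def __init__(self, days):
        self.days = days

    def _datetime_binop(self, column, op, reflect=False):
        # `offset - column` has no meaning, so reflected subtraction is rejected.
        if reflect and op == "sub":
            raise TypeError("Can not subtract a ToyColumn from a ToyOffset")
        if op not in {"add", "sub"}:
            raise TypeError(f"{op} not supported between ToyOffset and ToyColumn")
        step = -self.days if op == "sub" else self.days
        return ToyColumn([v + step for v in column.values])


class ToyColumn:
    def __init__(self, values):
        self.values = list(values)  # day numbers standing in for datetimes

    def binary_operator(self, op, rhs, reflect=False):
        # The column does not interpret the offset; it hands control to it.
        if isinstance(rhs, ToyOffset):
            return rhs._datetime_binop(self, op, reflect=reflect)
        raise TypeError(f"unsupported operand: {type(rhs).__name__}")


col = ToyColumn([10, 20])
shifted = col.binary_operator("sub", ToyOffset(days=3))
print(shifted.values)
```

Keeping the checks on the offset side means the column needs only one `isinstance` branch, however many units the offset later supports.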
97 changes: 81 additions & 16 deletions python/cudf/cudf/core/tools/datetimes.py
@@ -1,4 +1,4 @@
# Copyright (c) 2019-2020, NVIDIA CORPORATION.
# Copyright (c) 2019-2021, NVIDIA CORPORATION.

import warnings

@@ -7,6 +7,7 @@
from pandas.core.tools.datetimes import _unit_map

import cudf
from cudf import _lib as libcudf
from cudf._lib.strings.char_types import is_integer as cpp_is_integer
from cudf.core import column
from cudf.core.index import as_index
@@ -334,9 +335,8 @@ def get_units(value):
    return value


class _DateOffsetScalars(object):
    def __init__(self, scalars):
        self._gpu_scalars = scalars
class _DateOffsetScalars(dict):
    pass


class _UndoOffsetMeta(pd._libs.tslibs.offsets.OffsetMeta):
@@ -434,10 +434,6 @@ def __init__(self, n=1, normalize=False, **kwds):
                "normalize not yet supported for DateOffset"
            )

        # TODO: Pandas supports combinations
        if len(kwds) > 1:
            raise NotImplementedError("Multiple time units not yet supported")

        all_possible_kwargs = {
            "years",
            "months",
@@ -456,36 +452,101 @@ def __init__(self, n=1, normalize=False, **kwds):
            "minute",
            "second",
            "microsecond",
            "millisecond" "nanosecond",
            "millisecond",
            "nanosecond",
        }

        supported_kwargs = {"months"}
        supported_kwargs = {
            "years",
            "months",
            "weeks",
            "days",
            "hours",
            "minutes",
            "seconds",
            "microseconds",
            "nanoseconds",
        }

        super().__init__(n=n, normalize=normalize, **kwds)

        kwds = self._combine_months_and_years(**kwds)
        kwds = self._combine_timedeltas_to_nanos(**kwds)

        scalars = {}
        for k, v in kwds.items():
            if k in all_possible_kwargs:
                # Months must be int16
                dtype = "int16" if k == "months" else None
                if k == "months":
                    dtype = "int16"
                elif k == "nanoseconds":
                    dtype = "timedelta64[ns]"
                else:
                    dtype = None
                scalars[k] = cudf.Scalar(v, dtype=dtype)

        super().__init__(n=n, normalize=normalize, **kwds)

        wrong_kwargs = set(kwds.keys()).difference(supported_kwargs)
        if len(wrong_kwargs) > 0:
            raise ValueError(
            raise NotImplementedError(
                f"Keyword arguments '{','.join(list(wrong_kwargs))}'"
                " are not yet supported in cuDF DateOffsets"
            )
        self._scalars = _DateOffsetScalars(scalars)

    def _generate_column(self, size, op):
        months = self._scalars._gpu_scalars["months"]
    def _combine_months_and_years(self, **kwargs):
        kwargs["months"] = kwargs.pop("years", 0) * 12 + kwargs.pop(
            "months", 0
        )
        return kwargs

    def _combine_timedeltas_to_nanos(self, **kwargs):
        weeks = kwargs.pop("weeks", 0)
        days = kwargs.pop("days", 0) + 7 * weeks
        hours = kwargs.pop("hours", 0) + days * 24
        minutes = kwargs.pop("minutes", 0) + hours * 60
        seconds = kwargs.pop("seconds", 0) + minutes * 60
        milliseconds = kwargs.pop("milliseconds", 0) + seconds * 1000
        microseconds = kwargs.pop("microseconds", 0) + milliseconds * 1000
        nanoseconds = kwargs.pop("nanoseconds", 0) + microseconds * 1000
        kwargs["nanoseconds"] = nanoseconds
        return kwargs
Collaborator

This has huge potential for overflowing the underlying int64. What resolutions do we support in cuDF / libcudf? Should we more strategically use those resolutions?

Contributor Author

Had a chat with @shwina; we'll move to returning the least common denominator of the resolutions that we support.
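
To make the overflow concern in this thread concrete, the sketch below folds offset keywords into a nanosecond total using the same unit chain as `_combine_timedeltas_to_nanos`, then checks the total against the signed 64-bit range. It is plain Python, where integers never overflow, so the check is explicit. `fold_to_nanos` and `fits_int64` are hypothetical helpers, not cuDF functions.

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)

# Nanoseconds per unit, following the same chain as the diff:
# weeks -> days -> hours -> minutes -> seconds -> ms -> us -> ns.
NANOS_PER_UNIT = {
    "weeks": 7 * 24 * 60 * 60 * 10**9,
    "days": 24 * 60 * 60 * 10**9,
    "hours": 60 * 60 * 10**9,
    "minutes": 60 * 10**9,
    "seconds": 10**9,
    "milliseconds": 10**6,
    "microseconds": 10**3,
    "nanoseconds": 1,
}


def fold_to_nanos(**kwargs):
    """Hypothetical helper: total nanoseconds for the given unit counts."""
    return sum(NANOS_PER_UNIT[unit] * count for unit, count in kwargs.items())


def fits_int64(value):
    """True if `value` is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


small = fold_to_nanos(weeks=1, days=2, seconds=30)
large = fold_to_nanos(weeks=20_000)  # roughly 383 years

print(small, fits_int64(small))
print(large, fits_int64(large))
```

A signed 64-bit nanosecond count spans only about ±292 years, so an offset of a few hundred years expressed in weeks or days already falls outside it. Storing the total at a coarser resolution widens that range.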


    def _datetime_binop(self, datetime_col, op, reflect=False):
        if reflect and op == "sub":
            raise TypeError(
                f"Can not subtract a {type(datetime_col).__name__}"
                f" from a {type(self).__name__}"
            )
        if op not in {"add", "sub"}:
            raise TypeError(
                f"{op} not supported between {type(self).__name__}"
                f" and {type(datetime_col).__name__}"
            )
        if self._is_no_op:
            return datetime_col
        else:
            if "months" in self._scalars:
                rhs = self._generate_months_column(len(datetime_col), op)
                datetime_col = libcudf.datetime.add_months(datetime_col, rhs)
            if "nanoseconds" in self._scalars:
                datetime_col = datetime_col + self._generate_nanos_column(
                    len(datetime_col), op
                )
            return datetime_col

    def _generate_months_column(self, size, op):
        months = self._scalars["months"]
        months = -months if op == "sub" else months
        # TODO: pass a scalar instead of constructing a column
        # https://github.com/rapidsai/cudf/issues/6990
        col = cudf.core.column.as_column(months, length=size)
        return col

    def _generate_nanos_column(self, size, op):
        nanos = self._scalars["nanoseconds"]
        nanos = -nanos if op == "sub" else nanos
        return cudf.core.column.as_column(nanos, length=size)

    @property
    def _is_no_op(self):
        # some logic could be implemented here for more complex cases
Contributor

We could write a nanos method and check if self.nanos == 0.
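
One possible reading of this suggestion, sketched without cuDF: since the constructor in this diff folds every keyword into a `months` entry and a `nanoseconds` entry, a no-op check can test those two values. `ToyOffset` below is a hypothetical stand-in, and checking both fields rather than nanoseconds alone is an assumption of this sketch, because an offset such as `months=1` has zero nanoseconds but is not a no-op.

```python
class ToyOffset:
    """Hypothetical stand-in that stores the two folded fields from the diff."""

    def __init__(self, months=0, nanoseconds=0):
        self._scalars = {"months": months, "nanoseconds": nanoseconds}

    @property
    def nanos(self):
        # Fixed-width part of the offset, already folded to nanoseconds.
        return self._scalars["nanoseconds"]

    @property
    def _is_no_op(self):
        # An offset changes nothing only when every folded component is zero.
        return self.nanos == 0 and self._scalars["months"] == 0


print(ToyOffset()._is_no_op)
print(ToyOffset(months=1)._is_no_op)
print(ToyOffset(nanoseconds=5)._is_no_op)
```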

@@ -497,3 +558,7 @@ def __setattr__(self, name, value):
            raise AttributeError("DateOffset objects are immutable.")
        else:
            object.__setattr__(self, name, value)

    def __neg__(self):
        new_scalars = {k: -v for k, v in self.kwds.items()}
        return DateOffset(**new_scalars)
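
The `__neg__` above negates each keyword and rebuilds the offset, which is what lets the tests compare `op(series, -offset)` against pandas. A small stdlib-only sketch of the same idea follows. `ToyOffset` and its `apply` method are hypothetical and cover only fixed-width units that `datetime.timedelta` accepts.

```python
from datetime import datetime, timedelta


class ToyOffset:
    """Hypothetical fixed-width offset built from timedelta keywords."""

    def __init__(self, **kwds):
        self.kwds = kwds

    def __neg__(self):
        # Same approach as the diff: negate each keyword and rebuild.
        return ToyOffset(**{k: -v for k, v in self.kwds.items()})

    def apply(self, when, op="add"):
        delta = timedelta(**self.kwds)
        return when - delta if op == "sub" else when + delta


start = datetime(2000, 1, 31)
offset = ToyOffset(days=2, hours=3)

print((-offset).kwds)
# Adding the negated offset should match subtracting the original.
print((-offset).apply(start) == offset.apply(start, op="sub"))
```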
181 changes: 175 additions & 6 deletions python/cudf/cudf/tests/test_binops.py
@@ -1474,12 +1474,32 @@ def test_scalar_power_invalid(dtype_l, dtype_r):
    ],
)
@pytest.mark.parametrize("n_periods", [0, 1, -1, 12, -12])
@pytest.mark.parametrize("frequency", ["months"])
@pytest.mark.parametrize(
    "frequency",
    [
        "months",
        "years",
        "days",
        "hours",
        "minutes",
        "seconds",
        "microseconds",
        pytest.param(
            "nanoseconds",
            marks=pytest.mark.xfail(
                reason="https://github.com/pandas-dev/pandas/issues/36589"
            ),
        ),
    ],
)
@pytest.mark.parametrize(
    "dtype",
    ["datetime64[ns]", "datetime64[us]", "datetime64[ms]", "datetime64[s]"],
)
def test_datetime_dateoffset_binaryop(date_col, n_periods, frequency, dtype):
@pytest.mark.parametrize("op", [operator.add, operator.sub])
def test_datetime_dateoffset_binaryop(
    date_col, n_periods, frequency, dtype, op
):
    gsr = cudf.Series(date_col, dtype=dtype)
    psr = gsr.to_pandas()  # converts to nanos

@@ -1488,16 +1508,165 @@ def test_datetime_dateoffset_binaryop(date_col, n_periods, frequency, dtype):
    goffset = cudf.DateOffset(**kwargs)
    poffset = pd.DateOffset(**kwargs)

    expect = psr + poffset
    got = gsr + goffset
    expect = op(psr, poffset)
    got = op(gsr, goffset)

    utils.assert_eq(expect, got)

    expect = op(psr, -poffset)
    got = op(gsr, -goffset)

    utils.assert_eq(expect, got)


@pytest.mark.parametrize(
    "date_col",
    [
        [
            "2000-01-01 00:00:00.012345678",
            "2000-01-31 00:00:00.012345678",
            "2000-02-29 00:00:00.012345678",
        ]
    ],
)
@pytest.mark.parametrize(
    "kwargs",
    [
        {"weeks": 0.5},
        {"days": 0.5},
        {"hours": 0.5},
        {"minutes": 0.5},
        {"seconds": 0.5},
        utils.xfail_param({"seconds": 0.5e-6}),
        utils.xfail_param({"microseconds": 0.5}),
    ],
)
@pytest.mark.parametrize(
    "dtype",
    ["datetime64[ns]", "datetime64[us]", "datetime64[ms]", "datetime64[s]"],
)
@pytest.mark.parametrize("op", [operator.add, operator.sub])
def test_datetime_dateoffset_binaryop_fractional_periods(
    date_col, kwargs, dtype, op
):
    gsr = cudf.Series(date_col, dtype=dtype)
    psr = gsr.to_pandas()  # converts to nanos

    goffset = cudf.DateOffset(**kwargs)
    poffset = pd.DateOffset(**kwargs)

    expect = op(psr, poffset)
    got = op(gsr, goffset)

    utils.assert_eq(expect, got)


@pytest.mark.parametrize(
    "date_col",
    [
        [
            "2000-01-01 00:00:00.012345678",
            "2000-01-31 00:00:00.012345678",
            "2000-02-29 00:00:00.012345678",
        ]
    ],
)
@pytest.mark.parametrize(
    "kwargs",
    [
        {"months": 2, "years": 5},
        {"microseconds": 1, "seconds": 1},
        {"months": 2, "years": 5, "seconds": 923, "microseconds": 481},
        pytest.param(
            {"milliseconds": 4},
            marks=pytest.mark.xfail(
                reason="Pandas gets the wrong answer for milliseconds"
            ),
        ),
        pytest.param(
            {"milliseconds": 4, "years": 2},
            marks=pytest.mark.xfail(
                reason="Pandas construction fails with these keywords"
            ),
        ),
        pytest.param(
            {"nanoseconds": 12},
            marks=pytest.mark.xfail(
                reason="Pandas gets the wrong answer for nanoseconds"
            ),
        ),
    ],
)
@pytest.mark.parametrize("op", [operator.add, operator.sub])
def test_datetime_dateoffset_binaryop_multiple(date_col, kwargs, op):

    gsr = cudf.Series(date_col, dtype="datetime64[ns]")
    psr = gsr.to_pandas()

    poffset = pd.DateOffset(**kwargs)
    goffset = cudf.DateOffset(**kwargs)

    expect = op(psr, poffset)
    got = op(gsr, goffset)

    utils.assert_eq(expect, got)

    expect = psr - poffset
    got = gsr - goffset

@pytest.mark.parametrize(
    "date_col",
    [
        [
            "2000-01-01 00:00:00.012345678",
            "2000-01-31 00:00:00.012345678",
            "2000-02-29 00:00:00.012345678",
        ]
    ],
)
@pytest.mark.parametrize("n_periods", [0, 1, -1, 12, -12])
@pytest.mark.parametrize(
    "frequency",
    [
        "months",
        "years",
        "days",
        "hours",
        "minutes",
        "seconds",
        "microseconds",
        pytest.param(
            "nanoseconds",
            marks=pytest.mark.xfail(
                reason="https://github.com/pandas-dev/pandas/issues/36589"
            ),
        ),
    ],
)
@pytest.mark.parametrize(
    "dtype",
    ["datetime64[ns]", "datetime64[us]", "datetime64[ms]", "datetime64[s]"],
)
def test_datetime_dateoffset_binaryop_reflected(
    date_col, n_periods, frequency, dtype
):
    gsr = cudf.Series(date_col, dtype=dtype)
    psr = gsr.to_pandas()  # converts to nanos

    kwargs = {frequency: n_periods}

    goffset = cudf.DateOffset(**kwargs)
    poffset = pd.DateOffset(**kwargs)

    expect = poffset + psr
    got = goffset + gsr

    utils.assert_eq(expect, got)

    with pytest.raises(TypeError):
        poffset - psr

    with pytest.raises(TypeError):
        goffset - gsr
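
The test dates used above (`2000-01-01`, `2000-01-31`, `2000-02-29`) exercise month-end behaviour. As a reference point, the sketch below implements calendar-month addition with day-of-month clamping in plain Python, which is how `pd.DateOffset(months=...)` treats a day that does not exist in the target month. `add_months` here is a hypothetical helper, not the libcudf function called in the diff; these tests exist to check that the GPU result agrees with pandas.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift `d` by whole calendar months, clamping to the month's last day."""
    # Work with a zero-based month index so divmod handles year rollover,
    # including negative offsets.
    year, month_index = divmod(d.year * 12 + (d.month - 1) + months, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


for d in [date(2000, 1, 1), date(2000, 1, 31), date(2000, 2, 29)]:
    print(d, "->", add_months(d, 1), add_months(d, -12))
```

Because of the clamping, adding and then subtracting the same number of months does not always return the starting date, which is one reason month offsets are kept separate from the fixed-width nanosecond total.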


@pytest.mark.parametrize("frame", [cudf.Series, cudf.Index, cudf.DataFrame])
@pytest.mark.parametrize(
13 changes: 13 additions & 0 deletions python/cudf/cudf/tests/test_offset.py
@@ -0,0 +1,13 @@
# Copyright (c) 2021, NVIDIA CORPORATION.

import pytest

from cudf import DateOffset


@pytest.mark.parametrize("period", [1.5, 0.5, "string", "1", "1.0"])
@pytest.mark.parametrize("freq", ["years", "months"])
def test_construction_invalid(period, freq):
    kwargs = {freq: period}
    with pytest.raises(ValueError):
        DateOffset(**kwargs)
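
The test above expects construction to fail with `ValueError` for fractional and string periods. As an illustration of the kind of check that would reject those inputs, here is a stdlib-only sketch. `validate_period` is a hypothetical helper, and using `numbers.Integral` (and rejecting `bool`) is an assumption of this sketch rather than what cuDF does internally.

```python
import numbers


def validate_period(name, value):
    """Hypothetical check: accept only true integers for a period."""
    # bool is a subclass of int; rejecting it is a choice made in this sketch.
    if isinstance(value, bool) or not isinstance(value, numbers.Integral):
        raise ValueError(
            f"Non-integer periods not supported for {name!r}: {value!r}"
        )
    return int(value)


for candidate in [3, 1.5, 0.5, "string", "1", "1.0"]:
    try:
        print(candidate, "->", validate_period("months", candidate))
    except ValueError as err:
        print(candidate, "-> ValueError:", err)
```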