Support more units in cudf.DateOffset #7078

Merged
61 commits
4a4b4af
Merge branch 'branch-0.17' into branch-0.18
shwina Dec 11, 2020
223f2b5
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Dec 15, 2020
abd6ad2
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Dec 17, 2020
18863b5
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 4, 2021
0fbdd31
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 5, 2021
dc9b943
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 5, 2021
0504f88
add years to test params
brandon-b-miller Jan 5, 2021
e32ea3d
Added yet another parameter
shwina Jan 5, 2021
4cf00ae
start allowing years and framework for timedeltas
brandon-b-miller Jan 5, 2021
537514a
Support years in cudf.DateOffset
shwina Jan 5, 2021
ba5fb76
Add test TODO
shwina Jan 5, 2021
b88d5dd
relocate binop logic to DateOffset class
brandon-b-miller Jan 5, 2021
afaac8a
Add support for remaining units w/ basic tests
shwina Jan 5, 2021
3b718b5
raise if op isnt add or sub
brandon-b-miller Jan 6, 2021
53562ec
disable reflected ops with sub
brandon-b-miller Jan 6, 2021
0eadf71
Add tests for reflected ops with DateOffsets
shwina Jan 6, 2021
49dd46f
improve _DateOffsetScalars and implement from_scalars
brandon-b-miller Jan 6, 2021
3b9a77a
implement negation and allow for multiple kwargs
brandon-b-miller Jan 6, 2021
c637b9c
add tests for multiple units
brandon-b-miller Jan 6, 2021
d586aa7
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 7, 2021
cb6e5a9
fractional periods tests and xfails
brandon-b-miller Jan 8, 2021
03ac7e7
create test_offset.py
brandon-b-miller Jan 8, 2021
f335d58
Style, etc
shwina Jan 8, 2021
719271f
More style
shwina Jan 8, 2021
996fda8
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 8, 2021
09b1309
fix pytest and pacify CI
brandon-b-miller Jan 12, 2021
7c9ac23
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 15, 2021
8ae778a
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 21, 2021
d23b8b8
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 26, 2021
9a0db21
bpMerge branch 'branch-0.18' of https://github.com/rapidsai/cudf into…
shwina Jan 27, 2021
7baecdc
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into e…
shwina Jan 28, 2021
a1fd20e
Copyright
shwina Jan 28, 2021
b1283e3
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 29, 2021
ed4b022
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Feb 1, 2021
3f19f82
Merge branch 'branch-0.18' into enh-dateoffset-more-units
shwina Feb 1, 2021
e8dbccc
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into e…
shwina Feb 1, 2021
d81ba85
Add failing construction tests
shwina Feb 1, 2021
1006d0f
Change to combine_timedeltas_to_widest
shwina Feb 1, 2021
8c596cc
updated logic to combine to seconds
brandon-b-miller Feb 2, 2021
0958d44
Improvements
shwina Feb 2, 2021
18c9c31
manually raise for fractional periods
brandon-b-miller Feb 2, 2021
65dffc3
improve error logic and messages
brandon-b-miller Feb 2, 2021
df26991
rework binop logic
brandon-b-miller Feb 2, 2021
3b76e53
Small fixes
shwina Feb 9, 2021
486ce58
disallow all fractional periods
brandon-b-miller Feb 9, 2021
9b4eb13
cleanup
brandon-b-miller Feb 9, 2021
b2a148a
style
brandon-b-miller Feb 9, 2021
26156b6
Merge branch 'branch-0.19' into enh-dateoffset-more-units
shwina Feb 9, 2021
8f09138
Add a DateOffset._from_freqstr
shwina Feb 9, 2021
56176dd
Changelog
shwina Feb 9, 2021
297c64f
cleanup
brandon-b-miller Feb 9, 2021
c7a6620
merge 0.19, resolve conflicts, fix tests
brandon-b-miller Mar 26, 2021
371bc53
Merge branch 'branch-0.20' into enh-dateoffset-more-units
shwina Apr 13, 2021
bc995c1
Merge branch 'enh-dateoffset-more-units' of github.com:brandon-b-mill…
shwina Apr 13, 2021
99b5123
Whitespace
shwina Apr 13, 2021
306f83a
Add is_integer
shwina Apr 13, 2021
2d1fa07
Use is_integer when checking the scalars in DateOffset
shwina Apr 13, 2021
dcf4735
OverflowError -> NotImplementedError
shwina Apr 13, 2021
7225564
Fix test
shwina Apr 13, 2021
b4eefa0
Call `pd.api.types.is_integer_dtype()` when dtype conversion fails
shwina Apr 13, 2021
e0f1b5c
Merge branch 'branch-0.20' into enh-dateoffset-more-units
brandon-b-miller Apr 19, 2021
2 changes: 1 addition & 1 deletion python/cudf/cudf/core/column/datetime.py
@@ -274,7 +274,7 @@ def binary_operator(
reflect: bool = False,
) -> ColumnBase:
if isinstance(rhs, cudf.DateOffset):
return binop_offset(self, rhs, op)
return rhs._datetime_binop(self, op, reflect=reflect)
lhs, rhs = self, rhs
if op in ("eq", "ne", "lt", "gt", "le", "ge", "NULL_EQUALS"):
out_dtype = np.dtype(np.bool_) # type: Dtype
258 changes: 180 additions & 78 deletions python/cudf/cudf/core/tools/datetimes.py
@@ -4,16 +4,16 @@
from typing import Sequence, Union

import numpy as np
import pandas as pd
from pandas.core.tools.datetimes import _unit_map

import cudf
from cudf import _lib as libcudf
from cudf._lib.strings.convert.convert_integers import (
is_integer as cpp_is_integer,
)
from cudf.core import column
from cudf.core.index import as_index
from cudf.utils.dtypes import is_scalar
from cudf.utils.dtypes import is_integer, is_scalar

_unit_dtype_map = {
"ns": "datetime64[ns]",
@@ -337,67 +337,31 @@ def get_units(value):
return value


class _DateOffsetScalars(object):
def __init__(self, scalars):
self._gpu_scalars = scalars
class DateOffset:

_UNITS_TO_CODES = {
"nanoseconds": "ns",
"microseconds": "us",
"milliseconds": "ms",
"seconds": "s",
"minutes": "m",
"hours": "h",
"days": "D",
"weeks": "W",
"months": "M",
"years": "Y",
}

class _UndoOffsetMeta(pd._libs.tslibs.offsets.OffsetMeta):
"""
For backward compatibility reasons, `pd.DateOffset` is defined
with a metaclass `OffsetMeta`, which makes it such that any
subclass of `pd._libs.tslibs.offset.BaseOffset` is reported as
a subclass of `pd.DateOffset`.

Because we subclass `pd.DateOffset`, we inherit this behaviour,
but don't want to. This metaclass inherits from `OffsetMeta`
and restores normal instance and subclass checking to any
classes that use it.
"""

@classmethod
def __instancecheck__(cls, obj) -> bool:
return type.__instancecheck__(cls, obj)

@classmethod
def __subclasscheck__(cls, obj) -> bool:
return type.__subclasscheck__(cls, obj)

_CODES_TO_UNITS = {v: k for k, v in _UNITS_TO_CODES.items()}

class DateOffset(pd.DateOffset, metaclass=_UndoOffsetMeta):
def __init__(self, n=1, normalize=False, **kwds):
"""
An object used for binary ops where calendrical arithmetic
is desired rather than absolute time arithmetic. Used to
add or subtract a whole number of periods, such as several
months or years, to a series or index of datetime dtype.
Works similarly to pd.DateOffset, and currently supports a
subset of its functionality. The arguments that aren't yet
supported are:
- years
- weeks
- days
- hours
- minutes
- seconds
- microseconds
- milliseconds
- nanoseconds
In addition, cuDF does not yet support DateOffset arguments
that 'replace' units in the datetime data being operated on
such as
- year
- month
- week
- day
- hour
- minute
- second
- microsecond
- millisecond
- nanosecond
Finally, cuDF does not yet support rounding via a `normalize`
keyword argument.
Works similarly to pd.DateOffset, but stores the offset
on the device (GPU).

Parameters
----------
@@ -431,24 +395,40 @@ def __init__(self, n=1, normalize=False, **kwds):
1 1999-01-31 00:00:00.012345678
2 1999-02-28 00:00:00.012345678
dtype: datetime64[ns]

Notes
-----
Note that cuDF does not yet support DateOffset arguments
that 'replace' units in the datetime data being operated on
such as
- year
- month
- week
- day
- hour
- minute
- second
- microsecond
- millisecond
- nanosecond

cuDF does not yet support rounding via a `normalize`
keyword argument.
"""
if normalize:
raise NotImplementedError(
"normalize not yet supported for DateOffset"
)

# TODO: Pandas supports combinations
if len(kwds) > 1:
raise NotImplementedError("Multiple time units not yet supported")

all_possible_kwargs = {
all_possible_units = {
"years",
"months",
"weeks",
"days",
"hours",
"minutes",
"seconds",
"milliseconds",
"microseconds",
"nanoseconds",
"year",
@@ -459,30 +439,120 @@ def __init__(self, n=1, normalize=False, **kwds):
"minute",
"second",
"microsecond",
"millisecond" "nanosecond",
"millisecond",
"nanosecond",
}

supported_units = {
"years",
"months",
"weeks",
"days",
"hours",
"minutes",
"seconds",
"milliseconds",
"microseconds",
"nanoseconds",
}

supported_kwargs = {"months"}
unsupported_units = all_possible_units - supported_units

invalid_kwds = set(kwds) - supported_units - unsupported_units
if invalid_kwds:
raise TypeError(
f"Keyword arguments '{','.join(list(invalid_kwds))}'"
" are not recognized"
)

unsupported_kwds = set(kwds) & unsupported_units
if unsupported_kwds:
raise NotImplementedError(
f"Keyword arguments '{','.join(list(unsupported_kwds))}'"
" are not yet supported."
)

if any(not is_integer(val) for val in kwds.values()):
raise ValueError("Non-integer periods not supported")

self._kwds = kwds
kwds = self._combine_months_and_years(**kwds)
kwds = self._combine_kwargs_to_seconds(**kwds)

scalars = {}
for k, v in kwds.items():
if k in all_possible_kwargs:
if k in all_possible_units:
# Months must be int16
dtype = "int16" if k == "months" else None
if k == "months":
# TODO: throw for out-of-bounds int16 values
dtype = "int16"
else:
unit = self._UNITS_TO_CODES[k]
dtype = np.dtype(f"timedelta64[{unit}]")
scalars[k] = cudf.Scalar(v, dtype=dtype)

super().__init__(n=n, normalize=normalize, **kwds)
self._scalars = scalars

wrong_kwargs = set(kwds.keys()).difference(supported_kwargs)
if len(wrong_kwargs) > 0:
raise ValueError(
f"Keyword arguments '{','.join(list(wrong_kwargs))}'"
" are not yet supported in cuDF DateOffsets"
@property
def kwds(self):
return self._kwds

def _combine_months_and_years(self, **kwargs):
# TODO: if months is zero, don't do a binop
kwargs["months"] = kwargs.pop("years", 0) * 12 + kwargs.pop(
"months", 0
)
return kwargs

def _combine_kwargs_to_seconds(self, **kwargs):
"""
Combine days, weeks, hours and minutes to a single
scalar representing the total seconds
"""
seconds = 0
seconds += kwargs.pop("weeks", 0) * 604800
seconds += kwargs.pop("days", 0) * 86400
seconds += kwargs.pop("hours", 0) * 3600
seconds += kwargs.pop("minutes", 0) * 60
seconds += kwargs.pop("seconds", 0)

if seconds > np.iinfo("int64").max:
raise NotImplementedError(
"Total days + weeks + hours + minutes + seconds can not exceed"
f" {np.iinfo('int64').max} seconds"
)

if seconds != 0:
kwargs["seconds"] = seconds
return kwargs
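
A note on the two `_combine_*` helpers above: the folding arithmetic can be sketched in plain Python, without `cudf.Scalar` or a GPU. This is only an illustration under stated assumptions, not the PR's code. The name `combine_units` and the module-level constants are hypothetical, and unlike the diff the sketch drops a zero `months` entry and checks the magnitude of the total rather than only its upper bound.

```python
# Hypothetical stand-in for _combine_months_and_years and
# _combine_kwargs_to_seconds; works on plain ints, not GPU scalars.
SECONDS_PER_UNIT = {
    "weeks": 604800,
    "days": 86400,
    "hours": 3600,
    "minutes": 60,
    "seconds": 1,
}
INT64_MAX = 2**63 - 1


def combine_units(**kwargs):
    """Fold years into months, and weeks/days/hours/minutes into seconds."""
    out = dict(kwargs)

    # Calendar part: a year is always 12 months.
    months = out.pop("years", 0) * 12 + out.pop("months", 0)

    # Fixed-length part: each unit is a constant number of seconds.
    seconds = sum(
        out.pop(unit, 0) * factor for unit, factor in SECONDS_PER_UNIT.items()
    )
    if abs(seconds) > INT64_MAX:
        raise NotImplementedError(
            f"Total seconds can not exceed {INT64_MAX} in magnitude"
        )

    # Assumption: zero totals are dropped so they need no binop later.
    if months != 0:
        out["months"] = months
    if seconds != 0:
        out["seconds"] = seconds
    return out


print(combine_units(years=1, months=2, days=1, milliseconds=5))
```

Sub-second units (milliseconds, microseconds, nanoseconds) pass through untouched, matching the diff, where each of them becomes its own timedelta scalar.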

def _datetime_binop(self, datetime_col, op, reflect=False):
if reflect and op == "sub":
raise TypeError(
f"Can not subtract a {type(datetime_col).__name__}"
f" from a {type(self).__name__}"
)
self._scalars = _DateOffsetScalars(scalars)
if op not in {"add", "sub"}:
raise TypeError(
f"{op} not supported between {type(self).__name__}"
f" and {type(datetime_col).__name__}"
)
if not self._is_no_op:
if "months" in self._scalars:
rhs = self._generate_months_column(len(datetime_col), op)
datetime_col = libcudf.datetime.add_months(datetime_col, rhs)

for unit, value in self._scalars.items():
if unit != "months":
value = -value if op == "sub" else value
datetime_col += cudf.core.column.as_column(
value, length=len(datetime_col)
)

return datetime_col
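
For readers without a GPU at hand, the two-phase shape of `_datetime_binop` (months first, then fixed-length units) can be sketched on a single timestamp with the standard library. This is a rough sketch, not the libcudf kernel: `add_months` and `apply_offset` are hypothetical names, and the month-end clamping is an assumption modelled on how pandas handles `DateOffset(months=...)`.

```python
import calendar
from datetime import datetime, timedelta


def add_months(ts, months):
    """Shift ts by whole calendar months, clamping the day to the month end."""
    total = ts.year * 12 + (ts.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return ts.replace(year=year, month=month, day=min(ts.day, last_day))


def apply_offset(ts, months=0, seconds=0, op="add"):
    """Apply the calendar part first, then the absolute part."""
    if op not in {"add", "sub"}:
        raise TypeError(f"{op} not supported")
    sign = -1 if op == "sub" else 1
    ts = add_months(ts, sign * months)
    return ts + timedelta(seconds=sign * seconds)


print(apply_offset(datetime(1999, 1, 31), months=1))
print(apply_offset(datetime(2000, 3, 31), months=1, seconds=86400, op="sub"))
```

The point of the split is that months have no fixed length, so they cannot be expressed as a timedelta, while every other supported unit can.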

def _generate_column(self, size, op):
months = self._scalars._gpu_scalars["months"]
def _generate_months_column(self, size, op):
months = self._scalars["months"]
months = -months if op == "sub" else months
# TODO: pass a scalar instead of constructing a column
# https://github.com/rapidsai/cudf/issues/6990
Expand All @@ -493,13 +563,45 @@ def _generate_column(self, size, op):
def _is_no_op(self):
# some logic could be implemented here for more complex cases
Contributor review comment: We could write a nanos method and check if self.nanos == 0.

# such as +1 year, -12 months
return all([i == 0 for i in self.kwds.values()])
return all([i == 0 for i in self._kwds.values()])
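
One possible reading of the nanos suggestion and of the "+1 year, -12 months" case mentioned in the code comment, sketched in plain Python. The function `is_no_op` and the `NANOS_PER_UNIT` table are hypothetical and not part of the PR. Months are kept apart from the nanosecond total because a month has no fixed length.

```python
# Hypothetical sketch: decide whether an offset cancels out after
# normalization, rather than checking each keyword on its own.
NANOS_PER_UNIT = {
    "weeks": 604800 * 10**9,
    "days": 86400 * 10**9,
    "hours": 3600 * 10**9,
    "minutes": 60 * 10**9,
    "seconds": 10**9,
    "milliseconds": 10**6,
    "microseconds": 10**3,
    "nanoseconds": 1,
}


def is_no_op(**kwds):
    """Return True when the calendar and absolute parts both sum to zero."""
    months = kwds.get("years", 0) * 12 + kwds.get("months", 0)
    nanos = sum(kwds.get(unit, 0) * factor for unit, factor in NANOS_PER_UNIT.items())
    return months == 0 and nanos == 0


print(is_no_op(years=1, months=-12))
print(is_no_op(days=1, hours=-24))
print(is_no_op(months=1))
```

The per-keyword check in the diff would report `years=1, months=-12` as a real operation; a normalized check like this one would let it be skipped.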

def __setattr__(self, name, value):
if not isinstance(value, _DateOffsetScalars):
raise AttributeError("DateOffset objects are immutable.")
else:
object.__setattr__(self, name, value)
def __neg__(self):
new_scalars = {k: -v for k, v in self._kwds.items()}
return DateOffset(**new_scalars)

def __repr__(self):
includes = []
for unit in sorted(self._UNITS_TO_CODES):
val = self._kwds.get(unit, None)
if val is not None:
includes.append(f"{unit}={val}")
unit_data = ", ".join(includes)
repr_str = f"<{self.__class__.__name__}: {unit_data}>"

return repr_str

@classmethod
def _from_freqstr(cls, freqstr):
"""
Parse a string and return a DateOffset object
expects strings of the form 3D, 25W, 10ms, 42ns, etc.
"""
numeric_part = ""
freq_part = ""

for x in freqstr:
if x.isdigit():
numeric_part += x
else:
freq_part += x

if (
freq_part not in cls._CODES_TO_UNITS
or not numeric_part + freq_part == freqstr
):
raise ValueError(f"Cannot interpret frequency str: {freqstr}")

return cls(**{cls._CODES_TO_UNITS[freq_part]: int(numeric_part)})
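
The parsing rule in `_from_freqstr` can be tried out without cuDF using the sketch below. It is an illustration, not the PR's implementation: `parse_freqstr` is a hypothetical name, it returns a plain keyword dict instead of a `DateOffset`, and it uses a regular expression where the diff uses a character loop. The unit table is copied from `_UNITS_TO_CODES` in the diff.

```python
import re

UNITS_TO_CODES = {
    "nanoseconds": "ns",
    "microseconds": "us",
    "milliseconds": "ms",
    "seconds": "s",
    "minutes": "m",
    "hours": "h",
    "days": "D",
    "weeks": "W",
    "months": "M",
    "years": "Y",
}
CODES_TO_UNITS = {code: unit for unit, code in UNITS_TO_CODES.items()}

# One or more digits followed by a unit code, e.g. "3D" or "10ms".
_FREQ_RE = re.compile(r"(\d+)([A-Za-z]+)")


def parse_freqstr(freqstr):
    """Return the keyword dict a frequency string such as '3D' stands for."""
    match = _FREQ_RE.fullmatch(freqstr)
    if match is None or match.group(2) not in CODES_TO_UNITS:
        raise ValueError(f"Cannot interpret frequency str: {freqstr}")
    count, code = match.groups()
    return {CODES_TO_UNITS[code]: int(count)}


print(parse_freqstr("3D"))
print(parse_freqstr("10ms"))
```

Requiring at least one digit means a bare code such as "D" is rejected with the same message as any other malformed string. In the diff's loop-based version, that input appears to reach `int("")` instead, which also raises `ValueError` but with a less specific message.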


def _isin_datetimelike(