Interval index and interval_range #7182

Merged: 116 commits, merged Apr 2, 2021
Diff below: changes from 1 commit

Commits
fd6fb9c
interval dtype and tests
marlenezw Dec 11, 2020
c653ac5
changelog
marlenezw Dec 11, 2020
9549f61
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into i…
marlenezw Dec 14, 2020
4a16266
updated closed parameter and tests.
marlenezw Dec 14, 2020
f843042
adding both and neither parameters.
marlenezw Dec 14, 2020
5e61f1a
updates to accomodate interval dataframes.
marlenezw Dec 16, 2020
82ef308
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into i…
marlenezw Dec 16, 2020
80d0c40
removing comments and resolving interval_dtype issues.
marlenezw Dec 16, 2020
446e849
Update python/cudf/cudf/core/column/column.py
marlenezw Dec 16, 2020
6b3a9f0
resolving merge conflicts
marlenezw Jan 20, 2021
43ee680
Merge branch 'interval_dtype' of https://github.com/marlenezw/cudf in…
marlenezw Jan 20, 2021
ed2edb1
fixing style issues
marlenezw Jan 20, 2021
11ad3bc
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into i…
marlenezw Jan 21, 2021
18a82ae
code for intervalindex and interval range
marlenezw Jan 21, 2021
4255b83
fixing style issues.
marlenezw Jan 21, 2021
8da544b
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into i…
marlenezw Jan 21, 2021
bc6b5b3
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into i…
marlenezw Jan 22, 2021
1a15565
updates to interval_range
marlenezw Jan 25, 2021
88d3390
fixing merge conflicts.
marlenezw Jan 25, 2021
19135c9
fixing merge conflicts
marlenezw Jan 25, 2021
a6fd058
updates to interval_range after tests.
marlenezw Jan 26, 2021
86e4a20
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into i…
marlenezw Jan 26, 2021
48d00a4
Update python/cudf/cudf/core/column/column.py
marlenezw Jan 26, 2021
daeef06
Merge branch 'intervalIndex' of https://github.com/marlenezw/cudf int…
marlenezw Jan 26, 2021
d4ee46a
more tests and changes to range_interval
marlenezw Jan 27, 2021
d88fdb4
added more tests and comments for clarity.
marlenezw Jan 28, 2021
777cdb5
fixing style changes.
marlenezw Jan 28, 2021
a5d192b
fixing merge conflicts
marlenezw Feb 1, 2021
d56ae9a
style changes
marlenezw Feb 1, 2021
e5baeae
updates for branch-0.19
marlenezw Feb 1, 2021
f24f893
adding type annotations.
marlenezw Feb 1, 2021
9914c3a
initial changes
marlenezw Feb 2, 2021
1375e70
changes based on comments.
marlenezw Feb 16, 2021
a6e2cbf
fixing code that caused csv test to fail.
marlenezw Feb 16, 2021
578c54c
style changes.
marlenezw Feb 16, 2021
7d6f40e
fixing merge conflicts.
marlenezw Feb 16, 2021
cca756d
removing changes to changelog.
marlenezw Feb 17, 2021
eb6855b
base changes.
marlenezw Feb 17, 2021
1606292
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Feb 17, 2021
c8e528c
addressing chnages to fame.py and interval.py
marlenezw Feb 17, 2021
0cfe8f9
fixing mypy styling issue.
marlenezw Feb 17, 2021
dba282a
fixing failing index test.
marlenezw Feb 17, 2021
76955cc
Update python/cudf/cudf/core/column/interval.py
marlenezw Feb 18, 2021
4e9ee08
updating from interval dtype changes.
marlenezw Feb 18, 2021
ab788cf
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Feb 18, 2021
7a360d0
adding periods.
marlenezw Feb 18, 2021
444919c
updates to add new param periods and more tests
marlenezw Feb 23, 2021
be4a77a
fixing merge conflicts
marlenezw Feb 23, 2021
4cf229a
fixing mypy style issues.
marlenezw Feb 25, 2021
d16ec1a
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Feb 25, 2021
524709b
fixing mypy failures.
marlenezw Feb 25, 2021
31c9b75
changes to index.py.
marlenezw Feb 25, 2021
1401c11
making code dryer.
marlenezw Feb 25, 2021
9ffa7d2
style changes.
marlenezw Feb 25, 2021
b3f707c
style issue.
marlenezw Feb 25, 2021
45cde0a
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Feb 26, 2021
16d73f6
cleaning up code.
marlenezw Feb 26, 2021
6dc86ec
sytyle.
marlenezw Feb 26, 2021
6764e0b
fixes for mypy.
marlenezw Feb 26, 2021
223f366
cleaning up code.
marlenezw Feb 26, 2021
725847f
changing interval_range to function
marlenezw Mar 2, 2021
2be072d
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Mar 2, 2021
f3245c2
making sure style is ok.
marlenezw Mar 2, 2021
8420555
fixing mypy style issue.
marlenezw Mar 2, 2021
49de389
updates to code.
marlenezw Mar 3, 2021
606bfe1
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Mar 3, 2021
4ea34c3
changes to interval_range and intervalindex
marlenezw Mar 3, 2021
af4ea1d
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Mar 4, 2021
c00ca33
changes to for tests to pass.
marlenezw Mar 5, 2021
819d04a
fixing merge conflicts
marlenezw Mar 5, 2021
6412ab4
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Mar 8, 2021
f6189cf
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Mar 9, 2021
6fd7eb9
cleaning up docs and adding from_breaks method in intervalindex.
marlenezw Mar 9, 2021
92be49f
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Mar 9, 2021
7085def
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Mar 10, 2021
6d062c9
Update python/cudf/cudf/core/index.py
marlenezw Mar 25, 2021
8ec8716
Update python/cudf/cudf/core/index.py
marlenezw Mar 25, 2021
906d82c
Update python/cudf/cudf/core/index.py
marlenezw Mar 25, 2021
37e988b
Update python/cudf/cudf/core/index.py
marlenezw Mar 25, 2021
3a5fd3f
Update python/cudf/cudf/core/index.py
marlenezw Mar 25, 2021
5e38756
Update python/cudf/cudf/core/index.py
marlenezw Mar 25, 2021
4140525
Update python/cudf/cudf/core/index.py
marlenezw Mar 25, 2021
c1cdb93
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Mar 29, 2021
25a9112
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Mar 29, 2021
f38e34b
adding initial updated changes.
marlenezw Mar 29, 2021
dbfcd13
addressing review comments.
marlenezw Mar 30, 2021
121a9be
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Mar 30, 2021
1790abd
updated changes.
marlenezw Mar 31, 2021
a5fa304
Update python/cudf/cudf/core/index.py
marlenezw Mar 31, 2021
191743e
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Mar 31, 2021
8175120
changes for determining bin_edges dtype
marlenezw Mar 31, 2021
edb68ab
removing my example notebook :)
marlenezw Mar 31, 2021
caab20e
style changes
marlenezw Mar 31, 2021
d48e8f0
Merge branch 'intervalIndex' of https://github.com/marlenezw/cudf int…
marlenezw Mar 31, 2021
b66d16c
determining final dtype.
marlenezw Apr 1, 2021
375a00f
slight changes to docs.
marlenezw Apr 1, 2021
a7eddb7
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Apr 1, 2021
a9f6eb0
adding changes to start.
marlenezw Apr 1, 2021
c00b609
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Apr 1, 2021
ecb6245
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into i…
marlenezw Apr 1, 2021
4b37080
changing periods to int
marlenezw Apr 1, 2021
00ae319
finding common type to get overall type.
marlenezw Apr 1, 2021
9a306a0
figuring out mypy issues.
marlenezw Apr 1, 2021
33e6699
updates to code and style changes.
marlenezw Apr 1, 2021
31dfa59
Adding support for heterogenous data type spec for start, end, freq
isVoid Apr 2, 2021
f5f1453
Update python/cudf/cudf/core/index.py
marlenezw Apr 2, 2021
479b6df
Update python/cudf/cudf/core/index.py
marlenezw Apr 2, 2021
5c192ff
column empty
marlenezw Apr 2, 2021
d3e2204
Merge branch 'intervalIndex' of https://github.com/marlenezw/cudf int…
marlenezw Apr 2, 2021
dd290ae
change use of as_host_type to as_type
isVoid Apr 2, 2021
ba75420
argument data type check logic fix
isVoid Apr 2, 2021
62fe73d
bug fix, coerce periods to int before
isVoid Apr 2, 2021
f4ff52c
adding empty column.
marlenezw Apr 2, 2021
9044d57
Remove list instantiation for any
isVoid Apr 2, 2021
e715c6c
Merge branch 'intervalIndex' of https://github.com/marlenezw/cudf int…
isVoid Apr 2, 2021
603ba75
Update python/cudf/cudf/core/index.py
marlenezw Apr 2, 2021
75 changes: 50 additions & 25 deletions python/cudf/cudf/core/index.py
@@ -11,34 +11,36 @@
import pandas as pd
from nvtx import annotate
from pandas._config import get_option
from cudf._lib.filling import sequence

import cudf
from cudf._lib.filling import sequence
from cudf._typing import DtypeObj
from cudf.core.abc import Serializable
from cudf.core.column import (
    CategoricalColumn,
    IntervalColumn,
    ColumnBase,
    DatetimeColumn,
    IntervalColumn,
    NumericalColumn,
    StringColumn,
    TimeDeltaColumn,
    column,
    arange,
    column,
)
from cudf.core.column.string import StringMethods as StringMethods
from cudf.core.dtypes import IntervalDtype
from cudf.core.frame import Frame
from cudf.utils import ioutils, utils
from cudf.utils.docutils import copy_docstring
from cudf.utils.dtypes import (
    find_common_type,
    is_categorical_dtype,
    is_interval_dtype,
    is_list_like,
    is_mixed_with_object_dtype,
    is_numerical_dtype,
    is_scalar,
    numeric_normalize_types,
    is_interval_dtype,
)
from cudf.utils.utils import cached_property, search_range

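The `interval_range` hunk that follows uses `find_common_type` and `is_numerical_dtype` (both imported above) so that `start`, `end`, and `freq` may arrive as different numeric types, for example a Python `int` alongside an `np.float32`. A minimal host-side sketch of that normalization step, assuming NumPy's promotion rules (`np.result_type` applied to dtypes) are a fair stand-in for cudf's `find_common_type`; `common_numeric_dtype` is a hypothetical helper, not a cudf function.

```python
import numpy as np


def common_numeric_dtype(*values):
    """Return one dtype that every non-None numeric argument can be cast to.

    Hypothetical helper: NumPy's promotion rules (np.result_type on dtypes)
    stand in for cudf's find_common_type; this is an approximation.
    """
    dtypes = []
    for value in values:
        if value is None:
            # Unspecified arguments do not take part in promotion.
            continue
        dtype = np.asarray(value).dtype
        if not (
            np.issubdtype(dtype, np.integer) or np.issubdtype(dtype, np.floating)
        ):
            # Roughly mirrors the diff's rejection of non-numeric inputs.
            raise NotImplementedError("Non-numeric values not yet supported")
        dtypes.append(dtype)
    # Promote on dtypes (not values), so int8 with float32 gives float32.
    return np.result_type(*dtypes)


# Mixed inputs like those drawn from INTERVAL_BOUNDARY_TYPES in the tests:
print(common_numeric_dtype(np.int8(24), np.int16(42)))  # int16
print(common_numeric_dtype(np.int32(5), np.float32(70), None))  # float64
```

The sketch skips arguments that are `None`; the diff instead filters with `[x.dtype for x in args if x]`, that is, by the truthiness of each wrapped `cudf.Scalar`.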
@@ -2811,57 +2813,80 @@ def interval_range(
            "Of the four parameters: start, end, periods, and "
            "freq, exactly three must be specified"
        )
    if not isinstance(start or freq or end, int) and not isinstance(
        start or freq or end, float
    args = [start, end, freq, periods]
    *args, periods = [cudf.Scalar(x) if x is not None else None for x in args]
    if any(
        [
            not is_numerical_dtype(x.dtype) if x is not None else False
            for x in args
        ]
    ):
        raise NotImplementedError("Non-numeric values not yet supported")
    elif periods and not freq:
        raise ValueError("start, end, freq must be numeric values.")
    common_dtype = find_common_type([x.dtype for x in args if x])
    start, end, freq = args

    if periods and not freq:
        # if statement for mypy to pass
        if end is not None and start is not None:
            # determine if periods are float or integer
            periods = int(periods)
            quotient, remainder = divmod((end - start), periods)
            periods = periods._as_host_type("int64")
            # divmod only supported on host side scalars
            quotient, remainder = divmod((end - start).value, periods.value)
            if remainder:
                freq_step = cudf.Scalar((end - start) / periods)
            else:
                freq_step = cudf.Scalar(quotient)
            start = cudf.Scalar(start)
            common_dtype = find_common_type([common_dtype, freq_step.dtype])
            if start.dtype != freq_step.dtype:
                start = cudf.Scalar(start.value.astype(freq_step.dtype))
            start = start.device_value
            freq_step = freq_step.device_value
            bin_edges = sequence(size=periods + 1, init=start, step=freq_step,)
            left_col = bin_edges[:-1]
            right_col = bin_edges[1:]
            start = start._as_host_type(freq_step.dtype)
            bin_edges = sequence(
                size=periods + 1,
                init=start.device_value,
                step=freq_step.device_value,
            )
            left_col = bin_edges[:-1].astype(common_dtype)
            right_col = bin_edges[1:].astype(common_dtype)
    elif freq and periods:
        if end:
            start = end - (freq * periods)
        if start:
            end = freq * periods + start
        if end is not None and start is not None:
            left_col = arange(start, end, freq)
            left_col = arange(
                start.value, end.value, freq.value, dtype=common_dtype
            )
            end = end + 1
            start = start + freq
            right_col = arange(start, end, freq)
            right_col = arange(
                start.value, end.value, freq.value, dtype=common_dtype
            )
    elif freq and not periods:
        if end is not None and start is not None:
            end = end - freq + 1
            left_col = arange(start, end, freq)
            left_col = arange(
                start.value, end.value, freq.value, dtype=common_dtype
            )
            end = end + freq + 1
            start = start + freq
            right_col = arange(start, end, freq)
            right_col = arange(
                start.value, end.value, freq.value, dtype=common_dtype
            )
    elif start is not None and end is not None:
        # if statements for mypy to pass
        if freq:
            left_col = arange(start, end, freq)
            left_col = arange(
                start.value, end.value, freq.value, dtype=common_dtype
            )
        else:
            left_col = arange(start, end)
            left_col = arange(start.value, end.value, dtype=common_dtype)
        start = start + 1
        end = end + 1
        if freq:
            right_col = arange(start, end, freq)
            right_col = arange(
                start.value, end.value, freq.value, dtype=common_dtype
            )
        else:
            right_col = arange(start, end)
            right_col = arange(start.value, end.value, dtype=common_dtype)
    else:
        raise ValueError(
            "Of the four parameters: start, end, periods, and "
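When `periods` is given without `freq`, the `interval_range` hunk computes a step that is the integer quotient of `end - start` by `periods` when the division is exact and a float otherwise, generates `periods + 1` evenly spaced edges with `sequence`, and uses `bin_edges[:-1]` and `bin_edges[1:]` as the left and right bounds. When `freq` and `periods` are both given, it first derives `start` from `end` (`start = end - (freq * periods)`). A minimal NumPy sketch of those two branches, assuming plain host-side numbers instead of `cudf.Scalar` values; `interval_bounds` is a hypothetical name, not part of cudf.

```python
import numpy as np


def interval_bounds(start=None, end=None, periods=None, freq=None):
    """Return (left, right) NumPy arrays of interval bounds.

    Hypothetical sketch of two interval_range branches for plain numbers;
    it does not use cudf.Scalar or libcudf's sequence.
    """
    if periods is not None and freq is None:
        if start is None or end is None:
            raise ValueError("start and end are required when freq is omitted")
        periods = int(periods)  # periods may arrive as a float such as 2.0
        quotient, remainder = divmod(end - start, periods)
        # Exact division keeps an integer step; otherwise use a float step.
        step = (end - start) / periods if remainder else quotient
        # Same values as sequence(size=periods + 1, init=start, step=step).
        edges = start + step * np.arange(periods + 1)
        return edges[:-1], edges[1:]
    if periods is not None and freq is not None:
        periods = int(periods)
        # As in the diff, a given end determines start.
        if end is not None:
            start = end - freq * periods
        elif start is None:
            raise ValueError("one of start or end is required")
        # Build exactly `periods` left bounds from start, one freq apart.
        left = start + freq * np.arange(periods)
        # Each right bound is one step past its left bound.
        return left, left + freq
    raise ValueError("this sketch only covers the periods-based branches")


left, right = interval_bounds(end=10, freq=3, periods=2)
```

With the inputs of `test_interval_range_periods_freq_end_dtype` (`periods=2`, `freq=3`, `end=10`), the sketch yields left bounds `[4, 7]` and right bounds `[7, 10]`. Unlike the diff, which builds the right bounds with a second `arange` that stops at `end + 1`, the sketch derives both arrays from `periods` and `freq`, so they stay the same length even when `freq` is fractional or floating-point rounding would otherwise add an element to `np.arange`.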
3 changes: 3 additions & 0 deletions python/cudf/cudf/core/scalar.py
@@ -360,6 +360,9 @@ def _dispatch_scalar_unaop(self, op):
    def astype(self, dtype):
        return Scalar(self.device_value, dtype)

    def _as_host_type(self, dtype):
        return Scalar(self.value, dtype)


class _NAType(object):
    def __init__(self):
102 changes: 102 additions & 0 deletions python/cudf/cudf/tests/test_index.py
@@ -1361,6 +1361,18 @@ def test_categorical_index_basic(data, categories, dtype, ordered, name):
    assert_eq(pindex, gindex)


INTERVAL_BOUNDARY_TYPES = [
    int,
    np.int8,
    np.int16,
    np.int32,
    np.int64,
    np.float32,
    np.float64,
    cudf.Scalar,
]


@pytest.mark.parametrize("closed", ["left", "right", "both", "neither"])
@pytest.mark.parametrize("start", [0, 1, 2, 3])
@pytest.mark.parametrize("end", [4, 5, 6, 7])
@@ -1371,6 +1383,18 @@ def test_interval_range_basic(start, end, closed):
    assert_eq(pindex, gindex)


@pytest.mark.parametrize("start_t", INTERVAL_BOUNDARY_TYPES)
@pytest.mark.parametrize("end_t", INTERVAL_BOUNDARY_TYPES)
def test_interval_range_dtype_basic(start_t, end_t):
    start, end = start_t(24), end_t(42)
    start_val = start.value if isinstance(start, cudf.Scalar) else start
    end_val = end.value if isinstance(end, cudf.Scalar) else end
    pindex = pd.interval_range(start=start_val, end=end_val, closed="left")
    gindex = cudf.interval_range(start=start, end=end, closed="left")

    assert_eq(pindex, gindex)


@pytest.mark.parametrize("closed", ["left", "right", "both", "neither"])
@pytest.mark.parametrize("start", [0])
@pytest.mark.parametrize("end", [0])
@@ -1394,6 +1418,24 @@ def test_interval_range_freq_basic(start, end, freq, closed):
    assert_eq(pindex, gindex)


@pytest.mark.parametrize("start_t", INTERVAL_BOUNDARY_TYPES)
@pytest.mark.parametrize("end_t", INTERVAL_BOUNDARY_TYPES)
@pytest.mark.parametrize("freq_t", INTERVAL_BOUNDARY_TYPES)
def test_interval_range_freq_basic_dtype(start_t, end_t, freq_t):
    start, end, freq = start_t(5), end_t(70), freq_t(3)
    start_val = start.value if isinstance(start, cudf.Scalar) else start
    end_val = end.value if isinstance(end, cudf.Scalar) else end
    freq_val = freq.value if isinstance(freq, cudf.Scalar) else freq
    pindex = pd.interval_range(
        start=start_val, end=end_val, freq=freq_val, closed="left"
    )
    gindex = cudf.interval_range(
        start=start, end=end, freq=freq, closed="left"
    )

    assert_eq(pindex, gindex)


@pytest.mark.parametrize("closed", ["left", "right", "both", "neither"])
@pytest.mark.parametrize("periods", [1, 1.0, 2, 2.0, 3.0, 3])
@pytest.mark.parametrize("start", [0, 0.0, 1.0, 1, 2, 2.0, 3.0, 3])
@@ -1409,6 +1451,26 @@ def test_interval_range_periods_basic(start, end, periods, closed):
    assert_eq(pindex, gindex)


@pytest.mark.parametrize("start_t", INTERVAL_BOUNDARY_TYPES)
@pytest.mark.parametrize("end_t", INTERVAL_BOUNDARY_TYPES)
@pytest.mark.parametrize("periods_t", INTERVAL_BOUNDARY_TYPES)
def test_interval_range_periods_basic_dtype(start_t, end_t, periods_t):
    start, end, periods = start_t(0), end_t(4), periods_t(1.0)
    start_val = start.value if isinstance(start, cudf.Scalar) else start
    end_val = end.value if isinstance(end, cudf.Scalar) else end
    periods_val = (
        periods.value if isinstance(periods, cudf.Scalar) else periods
    )
    pindex = pd.interval_range(
        start=start_val, end=end_val, periods=periods_val, closed="left"
    )
    gindex = cudf.interval_range(
        start=start, end=end, periods=periods, closed="left"
    )

    assert_eq(pindex, gindex)


@pytest.mark.parametrize("closed", ["left", "right", "both", "neither"])
@pytest.mark.parametrize("periods", [1, 2, 3])
@pytest.mark.parametrize("freq", [1, 2, 3, 4])
@@ -1424,6 +1486,26 @@ def test_interval_range_periods_freq_end(end, freq, periods, closed):
    assert_eq(pindex, gindex)


@pytest.mark.parametrize("periods_t", INTERVAL_BOUNDARY_TYPES)
@pytest.mark.parametrize("freq_t", INTERVAL_BOUNDARY_TYPES)
@pytest.mark.parametrize("end_t", INTERVAL_BOUNDARY_TYPES)
def test_interval_range_periods_freq_end_dtype(periods_t, freq_t, end_t):
    periods, freq, end = periods_t(2), freq_t(3), end_t(10)
    freq_val = freq.value if isinstance(freq, cudf.Scalar) else freq
    end_val = end.value if isinstance(end, cudf.Scalar) else end
    periods_val = (
        periods.value if isinstance(periods, cudf.Scalar) else periods
    )
    pindex = pd.interval_range(
        end=end_val, freq=freq_val, periods=periods_val, closed="left"
    )
    gindex = cudf.interval_range(
        end=end, freq=freq, periods=periods, closed="left"
    )

    assert_eq(pindex, gindex)


@pytest.mark.parametrize("closed", ["left", "right", "both", "neither"])
@pytest.mark.parametrize("periods", [1, 2, 3])
@pytest.mark.parametrize("freq", [1, 2, 3, 4])
@@ -1439,6 +1521,26 @@ def test_interval_range_periods_freq_start(start, freq, periods, closed):
    assert_eq(pindex, gindex)


@pytest.mark.parametrize("periods_t", INTERVAL_BOUNDARY_TYPES)
@pytest.mark.parametrize("freq_t", INTERVAL_BOUNDARY_TYPES)
@pytest.mark.parametrize("start_t", INTERVAL_BOUNDARY_TYPES)
def test_interval_range_periods_freq_start_dtype(periods_t, freq_t, start_t):
    periods, freq, start = periods_t(2), freq_t(3), start_t(9)
    freq_val = freq.value if isinstance(freq, cudf.Scalar) else freq
    start_val = start.value if isinstance(start, cudf.Scalar) else start
    periods_val = (
        periods.value if isinstance(periods, cudf.Scalar) else periods
    )
    pindex = pd.interval_range(
        start=start_val, freq=freq_val, periods=periods_val, closed="left"
    )
    gindex = cudf.interval_range(
        start=start, freq=freq, periods=periods, closed="left"
    )

    assert_eq(pindex, gindex)


@pytest.mark.parametrize("closed", ["right", "left", "both", "neither"])
@pytest.mark.parametrize(
    "data",
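Each new `*_dtype` test in `test_index.py` stacks one `@pytest.mark.parametrize` decorator per argument, each ranging over all eight entries of `INTERVAL_BOUNDARY_TYPES`, and pytest runs the Cartesian product of stacked parametrizations. A quick arithmetic check of the resulting case counts, using string labels as illustrative stand-ins for the type objects; `stacked_case_count` is a hypothetical helper.

```python
import itertools

# String stand-ins for the eight entries of INTERVAL_BOUNDARY_TYPES.
BOUNDARY_TYPE_LABELS = [
    "int",
    "np.int8",
    "np.int16",
    "np.int32",
    "np.int64",
    "np.float32",
    "np.float64",
    "cudf.Scalar",
]


def stacked_case_count(n_decorators: int) -> int:
    """Cases generated when n_decorators parametrize marks, each over the
    full boundary-type list, are stacked on one test (a Cartesian product)."""
    combos = itertools.product(BOUNDARY_TYPE_LABELS, repeat=n_decorators)
    return sum(1 for _ in combos)


print(stacked_case_count(2))  # 64: start_t x end_t
print(stacked_case_count(3))  # 512: e.g. start_t x end_t x freq_t
```

So `test_interval_range_dtype_basic` runs 64 cases, and each of the four three-argument variants (`freq_basic_dtype`, `periods_basic_dtype`, `periods_freq_end_dtype`, `periods_freq_start_dtype`) runs 512.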