part1: Simplify BaseIndex to an abstract class #10389

Merged
merged 67 commits into from
Oct 6, 2022
b791c36
create new PR
skirui-source Mar 2, 2022
ec7cacd
Merge branch 'branch-22.04' of https://github.com/rapidsai/cudf into …
skirui-source Mar 22, 2022
97ca0e1
Merge branch 'branch-22.06' of https://github.com/rapidsai/cudf into …
skirui-source Mar 23, 2022
96f9069
Merge branch 'BaseIndex_simplify' of github.com:skirui-source/cudf in…
skirui-source Jul 20, 2022
cb14c85
Merge branch 'branch-22.08' of https://github.com/rapidsai/cudf into …
skirui-source Jul 26, 2022
5c71342
uncomment methods calling to _values() in baseindex class
skirui-source Jul 26, 2022
bfe627e
all references to _values removed
skirui-source Jul 27, 2022
219c21a
Merge branch 'branch-22.08' of https://github.com/rapidsai/cudf into …
skirui-source Jul 27, 2022
7025353
wip: adding methods to GenericIndex class
skirui-source Jul 27, 2022
0f3ef27
forgot to add isin in GenericIndex Class
skirui-source Jul 28, 2022
37e64f1
changed to GenericIndex, add _clean_nulls_from_index to RangeIndex to…
skirui-source Jul 28, 2022
ac49b9c
Merge branch 'branch-22.08' of https://github.com/rapidsai/cudf into …
skirui-source Jul 28, 2022
89d4fab
fix style issue
skirui-source Jul 28, 2022
008b8d8
fix doctrings for append()
skirui-source Jul 28, 2022
da15cf8
added isin() and to_frame() to RangeIndex class
skirui-source Jul 28, 2022
f35d739
Merge branch 'branch-22.08' of https://github.com/rapidsai/cudf into …
skirui-source Jul 28, 2022
47718a7
Merge branch 'BaseIndex_simplify' of github.com:skirui-source/cudf in…
skirui-source Aug 5, 2022
5bf62d5
fixed merge conflict in index.py
skirui-source Aug 5, 2022
2368c3b
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Aug 5, 2022
d69cc2e
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Aug 30, 2022
d63df44
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Sep 1, 2022
c964884
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Sep 6, 2022
def9e5a
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Sep 7, 2022
b76520a
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Sep 7, 2022
5cb3de7
added _values() in BI, values() in GI
skirui-source Sep 8, 2022
1b5a63c
uncomment methods for rangeindex
skirui-source Sep 8, 2022
a98e08d
property decorator for values() in RI
skirui-source Sep 8, 2022
cc33c5a
implem. values() w/o _values in RI/GI
skirui-source Sep 8, 2022
1d043bc
implem. to_series, any and append in RangeIndex
skirui-source Sep 9, 2022
e58c169
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Sep 9, 2022
d000c8e
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Sep 14, 2022
6d4c48f
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Sep 15, 2022
2427c35
fix merge conflict in _base_index.py
skirui-source Sep 20, 2022
867944c
Merge branch 'BaseIndex_simplify' of github.com:skirui-source/cudf in…
skirui-source Sep 21, 2022
7c63953
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Sep 21, 2022
f56c693
added tests for RI - values, to_frame, to_series, any, append isin
skirui-source Sep 23, 2022
f77a6c4
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Sep 23, 2022
277b317
added benchmark for rangeindex.isin
skirui-source Sep 23, 2022
c1267b7
comment code: cupy implem. for RI.isin
skirui-source Sep 23, 2022
f66589a
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Sep 27, 2022
74acfb3
Update python/cudf/cudf/core/column/column.py
skirui-source Sep 27, 2022
ab0e1c4
Merge branch 'BaseIndex_simplify' of github.com:skirui-source/cudf in…
skirui-source Sep 27, 2022
f493400
Update python/cudf/cudf/core/index.py
skirui-source Sep 27, 2022
5680d30
Merge branch 'BaseIndex_simplify' of github.com:skirui-source/cudf in…
skirui-source Sep 27, 2022
8004014
change to abstract in BI: values, to_frame, any, to_pandas, append
skirui-source Sep 27, 2022
e5a391e
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Sep 27, 2022
3af1cf4
change to abstract in BI: to_series, isin, unique
skirui-source Sep 27, 2022
6194c27
change any() implem. for RI
skirui-source Sep 27, 2022
3da35f0
Merge branch 'BaseIndex_simplify' of github.com:skirui-source/cudf in…
skirui-source Sep 28, 2022
29de223
raise NotImeplementedError fixes
skirui-source Sep 28, 2022
3381579
update any() in RI
skirui-source Sep 28, 2022
828a4e4
moved overlap methods from RI/GI -> BI
skirui-source Sep 28, 2022
6336f6d
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Sep 29, 2022
cae92e9
Merge branch 'BaseIndex_simplify' of github.com:skirui-source/cudf in…
skirui-source Sep 29, 2022
8e1a5bf
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Sep 30, 2022
d59fd8a
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Oct 3, 2022
d88b83a
address change to append() in GI
skirui-source Oct 3, 2022
1441ded
revert append(), isna() and notna()
skirui-source Oct 4, 2022
84627f6
uncomment test_index_append_list
skirui-source Oct 4, 2022
8dcd8b7
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
skirui-source Oct 5, 2022
e912399
change implem. of any() in RI, moved _clean_nulls to BI
skirui-source Oct 5, 2022
f396256
split isin() to subclasses
skirui-source Oct 5, 2022
e6fdb6f
added more tests
skirui-source Oct 6, 2022
0a9d7e6
minor edit
skirui-source Oct 6, 2022
26216f8
condense tests
skirui-source Oct 6, 2022
82f5dfc
Merge branch 'branch-22.12' of https://github.com/rapidsai/cudf into …
skirui-source Oct 6, 2022
407a2cc
fix merge conflict
skirui-source Oct 6, 2022
5 changes: 5 additions & 0 deletions python/cudf/benchmarks/API/bench_rangeindex.py
Expand Up @@ -40,3 +40,8 @@ def bench_min(benchmark, rangeindex):
def bench_where(benchmark, rangeindex):
cond = rangeindex % 2 == 0
benchmark(rangeindex.where, cond, 0)


def bench_isin(benchmark, rangeindex):
values = [10, 100]
benchmark(rangeindex.isin, values)
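
The new `bench_isin` passes a callable and its arguments to the `benchmark` fixture, which times the call. A minimal sketch of that calling convention follows, assuming a hand-rolled stand-in for the fixture and a plain Python `range` in place of a cudf `RangeIndex`. The names `benchmark_sketch` and `range_isin` are hypothetical and are not part of cudf or pytest-benchmark.

```python
import time


def benchmark_sketch(func, *args, rounds=5):
    """Hypothetical stand-in for a benchmark fixture: call func(*args)
    several times and report the result plus the best wall-clock time."""
    best = float("inf")
    result = None
    for _ in range(rounds):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return result, best


def range_isin(index, values):
    """Membership mask for a range; an illustration only, not cudf's code."""
    lookup = set(values)
    return [item in lookup for item in index]


# Mirrors the shape of bench_isin: the callable first, then its arguments.
mask, seconds = benchmark_sketch(range_isin, range(1000), [10, 100])
```

The mask has one entry per index element, so for the values `[10, 100]` only positions 10 and 100 are set.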
240 changes: 108 additions & 132 deletions python/cudf/cudf/core/_base_index.py
Expand Up @@ -27,10 +27,7 @@
from cudf.core.column import ColumnBase, column
from cudf.core.column_accessor import ColumnAccessor
from cudf.utils import ioutils
from cudf.utils.dtypes import (
is_mixed_with_object_dtype,
numeric_normalize_types,
)
from cudf.utils.dtypes import is_mixed_with_object_dtype

_index_astype_docstring = """\
Create an Index with values cast to dtypes.
Expand Down Expand Up @@ -90,7 +87,7 @@ def size(self):

@property
def values(self):
return self._values.values
raise NotImplementedError

def get_loc(self, key, method=None, tolerance=None):
raise NotImplementedError
Expand Down Expand Up @@ -188,12 +185,7 @@ def _clean_nulls_from_index(self):
methods using this method to replace or handle representation
of the actual types correctly.
"""
if self._values.has_nulls():
return cudf.Index(
self._values.astype("str").fillna(cudf._NA_REP), name=self.name
)
else:
return self
raise NotImplementedError

@property
def is_monotonic(self):
Expand Down Expand Up @@ -549,13 +541,11 @@ def to_frame(self, index=True, name=None):
Set the index of the returned DataFrame as the original Index
name : str, default None
Name to be used for the column

Returns
-------
DataFrame
cudf DataFrame
"""

if name is not None:
col_name = name
elif self.name is None:
Expand All @@ -570,7 +560,40 @@ def any(self):
"""
Return whether any elements is True in Index.
"""
return self._values.any()
raise NotImplementedError

def isna(self):
"""
Detect missing values.

Return a boolean same-sized object indicating if the values are NA.
NA values, such as ``None``, :attr:`numpy.NaN` or :attr:`cudf.NaN`, get
mapped to ``True`` values.
Everything else get mapped to ``False`` values.

Returns
-------
numpy.ndarray[bool]
A boolean array to indicate which entries are NA.

"""
raise NotImplementedError

def notna(self):
"""
Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to ``True``.
NA values, such as None or :attr:`numpy.NaN`, get mapped to ``False``
values.

Returns
-------
numpy.ndarray[bool]
A boolean array to indicate which entries are not NA.
"""
raise NotImplementedError

def to_pandas(self):
"""
Expand All @@ -589,7 +612,75 @@ def to_pandas(self):
>>> type(idx)
<class 'cudf.core.index.Int64Index'>
"""
return pd.Index(self._values.to_pandas(), name=self.name)
raise NotImplementedError

def isin(self, values):
"""Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in
the passed set of values. The length of the returned boolean
array matches the length of the index.

Parameters
----------
values : set, list-like, Index
Sought values.

Returns
-------
is_contained : cupy array
CuPy array of boolean values.

Examples
--------
>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
"""
# To match pandas behavior, even though only list-like objects are
# supposed to be passed, only scalars throw errors. Other types (like
# dicts) just transparently return False (see the implementation of
# ColumnBase.isin).
raise NotImplementedError

def unique(self):
"""
Return unique values in the index.

Returns
-------
Index without duplicates
"""
raise NotImplementedError

def to_series(self, index=None, name=None):
"""
Create a Series with both index and values equal to the index keys.
Useful with map for returning an indexer based on an index.

Parameters
----------
index : Index, optional
Index of resulting Series. If None, defaults to original index.
name : str, optional
Name of resulting Series. If None, defaults to name of original
index.

Returns
-------
Series
The dtype will be based on the type of the Index values.
"""
return cudf.Series._from_data(
self._data,
index=self.copy(deep=False) if index is None else index,
name=self.name if name is None else name,
)

@ioutils.doc_to_dlpack()
def to_dlpack(self):
Expand All @@ -599,7 +690,7 @@ def to_dlpack(self):

def append(self, other):
"""
Append a collection of Index options together.
Append a collection of Index objects together.

Parameters
----------
Expand All @@ -626,45 +717,7 @@ def append(self, other):
>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
"""

if is_list_like(other):
to_concat = [self]
to_concat.extend(other)
else:
this = self
if len(other) == 0:
# short-circuit and return a copy
to_concat = [self]

other = cudf.Index(other)

if len(self) == 0:
to_concat = [other]

if len(self) and len(other):
if is_mixed_with_object_dtype(this, other):
got_dtype = (
other.dtype
if this.dtype == cudf.dtype("object")
else this.dtype
)
raise TypeError(
f"cudf does not support appending an Index of "
f"dtype `{cudf.dtype('object')}` with an Index "
f"of dtype `{got_dtype}`, please type-cast "
f"either one of them to same dtypes."
)

if isinstance(self._values, cudf.core.column.NumericalColumn):
if self.dtype != other.dtype:
this, other = numeric_normalize_types(self, other)
to_concat = [this, other]

for obj in to_concat:
if not isinstance(obj, BaseIndex):
raise TypeError("all inputs must be Index")

return self._concat(to_concat)
raise NotImplementedError

def difference(self, other, sort=None):
"""
Expand Down Expand Up @@ -1119,18 +1172,6 @@ def sort_values(
else:
return index_sorted

def unique(self):
"""
Return unique values in the index.

Returns
-------
Index without duplicates
"""
return cudf.core.index._index_from_data(
{self.name: self._values.unique()}, name=self.name
)

def join(
self, other, how="left", level=None, return_indexers=False, sort=False
):
Expand Down Expand Up @@ -1263,30 +1304,6 @@ def rename(self, name, inplace=False):
out.name = name
return out

def to_series(self, index=None, name=None):
"""
Create a Series with both index and values equal to the index keys.
Useful with map for returning an indexer based on an index.

Parameters
----------
index : Index, optional
Index of resulting Series. If None, defaults to original index.
name : str, optional
Dame of resulting Series. If None, defaults to name of original
index.

Returns
-------
Series
The dtype will be based on the type of the Index values.
"""
return cudf.Series(
self._values,
index=self.copy(deep=False) if index is None else index,
name=self.name if name is None else name,
)

def get_slice_bound(self, label, side, kind=None):
"""
Calculate slice bound that corresponds to given label.
Expand Down Expand Up @@ -1339,47 +1356,6 @@ def __array_function__(self, func, types, args, kwargs):
else:
return NotImplemented

def isin(self, values):
"""Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in
the passed set of values. The length of the returned boolean
array matches the length of the index.

Parameters
----------
values : set, list-like, Index
Sought values.

Returns
-------
is_contained : cupy array
CuPy array of boolean values.

Examples
--------
>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
"""

# To match pandas behavior, even though only list-like objects are
# supposed to be passed, only scalars throw errors. Other types (like
# dicts) just transparently return False (see the implementation of
# ColumnBase.isin).
if is_scalar(values):
raise TypeError(
"only list-like objects are allowed to be passed "
f"to isin(), you passed a {type(values).__name__}"
)

return self._values.isin(values).values

@classmethod
def from_pandas(cls, index, nan_as_null=None):
"""
Expand Down
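
The `_base_index.py` changes above follow one pattern: methods that relied on `_values` now raise `NotImplementedError` in the base class, and each subclass supplies its own implementation. A minimal sketch of that pattern is below, assuming pure-Python stand-ins. `BaseIndexSketch` and `RangeIndexSketch` are hypothetical names, and the method bodies are illustrations rather than cudf's actual implementations. The scalar check reuses the error message from the removed `BaseIndex.isin`.

```python
from collections.abc import Iterable


class BaseIndexSketch:
    """Hypothetical abstract base: declares the interface only."""

    def isin(self, values):
        raise NotImplementedError

    def any(self):
        raise NotImplementedError


class RangeIndexSketch(BaseIndexSketch):
    """Hypothetical subclass backed by a plain Python range."""

    def __init__(self, start, stop, step=1):
        self._range = range(start, stop, step)

    def any(self):
        # True if at least one element is non-zero.
        return any(self._range)

    def isin(self, values):
        # Reject scalars (strings count as scalars here); accept list-likes.
        if isinstance(values, (str, bytes)) or not isinstance(values, Iterable):
            raise TypeError(
                "only list-like objects are allowed to be passed "
                f"to isin(), you passed a {type(values).__name__}"
            )
        lookup = set(values)
        return [item in lookup for item in self._range]
```

Calling a method on the base class directly fails fast, while a subclass can answer without materializing a column, which is presumably the motivation for giving `RangeIndex` its own implementations.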
2 changes: 1 addition & 1 deletion python/cudf/cudf/core/column/categorical.py
Expand Up @@ -104,7 +104,7 @@ def __init__(self, parent: SeriesOrSingleColumnIndex):
super().__init__(parent=parent)

@property
def categories(self) -> "cudf.core.index.BaseIndex":
def categories(self) -> "cudf.core.index.GenericIndex":
Contributor

Is this always CategoricalIndex?

Contributor Author


Accepts CategoricalAccessor or SingleColumnCategoricalIndex and returns the following:

  • Index (pandas) or StringIndex (cudf) >> object dtypes

  • Int64Index or Float64Index >> numeric dtypes

  • FloatIndex (pandas) and StringIndex (cudf) >> None dtypes

  • Index (pandas) and GenericIndex (cudf) >> boolean dtypes

Contributor


I'm confused. This is a property of CategoricalAccessor, right? What do you mean by "accepts"? Isn't the accessor always created via Series.cat?

Contributor Author


Apologies for the confusion, I didn't realize that CategoricalIndex has its own .categories method (separate from the CategoricalAccessor class).

"""
The categories of this categorical.
"""