Index class deprecation enforcements #13204

Merged
Changes from 8 commits
Commits
27 commits
a0e8501
Index refactor
galipremsagar Apr 24, 2023
732a23e
cleanup
galipremsagar Apr 24, 2023
1bfd351
Merge branch 'pandas_2.0_feature_branch' into Index_2.0
shwina Apr 24, 2023
79295af
Merge branch 'pandas_2.0_feature_branch' into Index_2.0
galipremsagar Apr 24, 2023
e04b773
Fix repr
galipremsagar Apr 24, 2023
64c5a22
cleanup
galipremsagar Apr 24, 2023
856d76f
Merge branch 'Index_2.0' of https://github.com/galipremsagar/cudf int…
galipremsagar Apr 24, 2023
bb37649
Drop child metaclass
galipremsagar Apr 24, 2023
316f7fc
Merge branch 'pandas_2.0_feature_branch' into Index_2.0
galipremsagar Apr 25, 2023
fee57e7
Fix MultiIndex.append
galipremsagar Apr 25, 2023
37c6c96
Merge branch 'pandas_2.0_feature_branch' into Index_2.0
galipremsagar Apr 27, 2023
e3d7045
Simplify setting names
galipremsagar Apr 27, 2023
c569891
Merge
galipremsagar May 15, 2023
f119a29
Address reviews
galipremsagar May 15, 2023
211f986
address review
galipremsagar May 16, 2023
5d91a4f
Merge remote-tracking branch 'upstream/pandas_2.0_feature_branch' int…
galipremsagar May 16, 2023
94c42ec
Fix dtype for Timedelta days
galipremsagar May 16, 2023
aec5d74
merge
galipremsagar May 17, 2023
b77f286
Address reviews
galipremsagar May 17, 2023
12399b7
make my-py happy
galipremsagar May 17, 2023
819f0c5
cleanup
galipremsagar May 18, 2023
d63c875
Add comment
galipremsagar May 19, 2023
2a3c559
Merge remote-tracking branch 'upstream/pandas_2.0_feature_branch' int…
galipremsagar May 22, 2023
54875d9
Merge remote-tracking branch 'upstream/pandas_2.0_feature_branch' int…
galipremsagar May 23, 2023
eaa1143
Merge branch 'pandas_2.0_feature_branch' into Index_2.0
galipremsagar May 25, 2023
11e3ccc
Merge branch 'pandas_2.0_feature_branch' into Index_2.0
galipremsagar May 26, 2023
fcb4313
fix import
galipremsagar May 26, 2023
12 changes: 6 additions & 6 deletions docs/cudf/source/developer_guide/library_design.md
@@ -22,7 +22,7 @@ Finally we tie these pieces together to provide a more holistic view of the proj
% class IndexedFrame
% class SingleColumnFrame
% class BaseIndex
% class GenericIndex
% class Index
% class MultiIndex
% class RangeIndex
% class DataFrame
@@ -42,8 +42,8 @@ Finally we tie these pieces together to provide a more holistic view of the proj
% BaseIndex <|-- MultiIndex
% Frame <|-- MultiIndex
%
% BaseIndex <|-- GenericIndex
% SingleColumnFrame <|-- GenericIndex
% BaseIndex <|-- Index
% SingleColumnFrame <|-- Index
%
% @enduml

@@ -89,12 +89,12 @@ While we've highlighted some exceptional cases of Indexes before, let's start wi
In practice, `BaseIndex` does have concrete implementations of a small set of methods.
However, many of these implementations currently do not apply to all subclasses and will eventually be removed.

Almost all indexes are subclasses of `GenericIndex`, a single-columned index with the class hierarchy:
Almost all indexes are subclasses of `Index`, a single-columned index with the class hierarchy:
```python
class GenericIndex(SingleColumnFrame, BaseIndex)
class Index(SingleColumnFrame, BaseIndex)
```
Integer, float, or string indexes are all composed of a single column of data.
Most `GenericIndex` methods are inherited from `Frame`, saving us the trouble of rewriting them.
Most `Index` methods are inherited from `Frame`, saving us the trouble of rewriting them.
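
As a rough illustration of the hierarchy described in this hunk, the sketch below uses stand-in classes (empty stubs, not cuDF's real implementations; the `describe` method is hypothetical) to show how a method defined on `Frame` reaches `Index` through `SingleColumnFrame`, while `BaseIndex` comes later in the method resolution order.

```python
# Stand-in stubs that only mirror the class names from the developer guide.
# They are NOT the real cuDF classes; `describe` is a hypothetical method.


class Frame:
    def describe(self) -> str:
        # Defined once on Frame, reused by every subclass below.
        return f"{type(self).__name__} handled by Frame"


class SingleColumnFrame(Frame):
    """A Frame restricted to exactly one column (stub)."""


class BaseIndex:
    """Abstract index interface (stub)."""


class Index(SingleColumnFrame, BaseIndex):
    """Single-column index combining both parents (stub)."""


# Python's C3 linearization puts the Frame branch ahead of BaseIndex.
mro_names = [cls.__name__ for cls in Index.__mro__]
print(mro_names)

# Index inherits `describe` from Frame without redefining it.
inherited = Index().describe()
print(inherited)
```

In this sketch, a method implemented on both `Frame` and `BaseIndex` would resolve to the `Frame` version, because the `SingleColumnFrame` branch is listed first in the class definition.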
Contributor

Is point 3 below (Index is a factory class) still correct with these changes?

Contributor Author

Good catch. Updated it. Let me know if you think it needs more clarification.


We now consider the three main exceptions to this model:

24 changes: 0 additions & 24 deletions python/cudf/cudf/__init__.py
@@ -35,22 +35,10 @@
BaseIndex,
CategoricalIndex,
DatetimeIndex,
Float32Index,
Float64Index,
GenericIndex,
Index,
Int8Index,
Int16Index,
Int32Index,
Int64Index,
IntervalIndex,
RangeIndex,
StringIndex,
TimedeltaIndex,
UInt8Index,
UInt16Index,
UInt32Index,
UInt64Index,
interval_range,
)
from cudf.core.missing import NA
@@ -124,15 +112,8 @@
"DatetimeIndex",
"Decimal32Dtype",
"Decimal64Dtype",
"Float32Index",
"Float64Index",
"GenericIndex",
"Grouper",
"Index",
"Int16Index",
"Int32Index",
"Int64Index",
"Int8Index",
"IntervalDtype",
"IntervalIndex",
"ListDtype",
@@ -141,13 +122,8 @@
"RangeIndex",
"Scalar",
"Series",
"StringIndex",
"StructDtype",
"TimedeltaIndex",
"UInt16Index",
"UInt32Index",
"UInt64Index",
"UInt8Index",
"api",
"concat",
"crosstab",
6 changes: 2 additions & 4 deletions python/cudf/cudf/_typing.py
@@ -1,4 +1,4 @@
# Copyright (c) 2021-2022, NVIDIA CORPORATION.
# Copyright (c) 2021-2023, NVIDIA CORPORATION.

import sys
from typing import TYPE_CHECKING, Any, Callable, Dict, Iterable, TypeVar, Union
@@ -37,9 +37,7 @@

DataFrameOrSeries = Union["cudf.Series", "cudf.DataFrame"]
SeriesOrIndex = Union["cudf.Series", "cudf.core.index.BaseIndex"]
SeriesOrSingleColumnIndex = Union[
"cudf.Series", "cudf.core.index.GenericIndex"
]
SeriesOrSingleColumnIndex = Union["cudf.Series", "cudf.core.index.Index"]

# Groupby aggregation
AggType = Union[str, Callable]
70 changes: 35 additions & 35 deletions python/cudf/cudf/core/_base_index.py
@@ -57,9 +57,9 @@
>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
Index([1.0, 2.0, 3.0], dtype='float64')
"""

BaseIndexT = TypeVar("BaseIndexT", bound="BaseIndex")
@@ -136,7 +136,7 @@ def get_level_values(self, level):
>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
Index(['a', 'b', 'c'], dtype='object')
"""

if level == self.name:
@@ -226,15 +226,15 @@ def hasnans(self):
>>> import numpy as np
>>> index = cudf.Index([1, 2, np.nan, 3, 4], nan_as_null=False)
>>> index
Float64Index([1.0, 2.0, nan, 3.0, 4.0], dtype='float64')
Index([1.0, 2.0, nan, 3.0, 4.0], dtype='float64')
>>> index.hasnans
True

`hasnans` returns `True` for the presence of any `NA` values:

>>> index = cudf.Index([1, 2, None, 3, 4])
>>> index
Int64Index([1, 2, <NA>, 3, 4], dtype='int64')
Index([1, 2, <NA>, 3, 4], dtype='int64')
>>> index.hasnans
True
"""
@@ -287,9 +287,9 @@ def set_names(self, names, level=None, inplace=False):
>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
@@ -348,7 +348,7 @@ def union(self, other, sort=None):
>>> idx1 = cudf.Index([1, 2, 3, 4])
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')
Index([1, 2, 3, 4, 5, 6], dtype='int64')

MultiIndex case

@@ -438,7 +438,7 @@ def intersection(self, other, sort=False):
>>> idx1 = cudf.Index([1, 2, 3, 4])
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')
Index([3, 4], dtype='int64')

MultiIndex case

@@ -542,9 +542,9 @@ def fillna(self, value, downcast=None):
>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, <NA>, 4], dtype='int64')
Index([1, 2, <NA>, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
Index([1, 2, 3, 4], dtype='int64')
"""
if downcast is not None:
raise NotImplementedError(
@@ -636,13 +636,13 @@ def to_pandas(self, nullable=False):
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
<class 'pandas.core.indexes.base.Index'>
>>> type(idx)
<class 'cudf.core.index.Int64Index'>
<class 'cudf.core.index.Index'>
"""
raise NotImplementedError

@@ -667,7 +667,7 @@ def isin(self, values):
--------
>>> idx = cudf.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')
Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

@@ -737,17 +737,17 @@ def append(self, other):
>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
"""
raise NotImplementedError

@@ -779,14 +779,14 @@ def difference(self, other, sort=None):
>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
Index([2, 1], dtype='int64')
"""
if sort not in {None, False}:
raise ValueError(
@@ -1232,18 +1232,18 @@ def sort_values(
>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices `idx` was
sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
(Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
dtype=int32))

Sorting values in a MultiIndex:
@@ -1320,7 +1320,7 @@ def join(
names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a": [1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
(1, 2)],
@@ -1403,12 +1403,12 @@ def rename(self, name, inplace=False):
>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
"""
@@ -1498,9 +1498,9 @@ def from_pandas(cls, index, nan_as_null=None):
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
Index([10.0, 20.0, 30.0, nan], dtype='float64')
"""
if not isinstance(index, pd.Index):
raise TypeError("not a pandas.Index")
@@ -1671,7 +1671,7 @@ def take(self, indices, axis=0, allow_fill=True, fill_value=None):
--------
>>> idx = cudf.Index(['a', 'b', 'c', 'd', 'e'])
>>> idx.take([2, 0, 4, 3])
StringIndex(['c' 'a' 'e' 'd'], dtype='object')
Index(['c', 'a', 'e', 'd'], dtype='object')
"""

if axis not in {0, "index"}:
@@ -1722,9 +1722,9 @@ def repeat(self, repeats, axis=None):
--------
>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
33, 33, 33, 33, 55, 55, 55, 55, 55],
dtype='int64')
"""
8 changes: 4 additions & 4 deletions python/cudf/cudf/core/algorithms.py
@@ -46,7 +46,7 @@ def factorize(values, sort=False, use_na_sentinel=True, size_hint=None):
>>> codes
array([0, 1, 1], dtype=int8)
>>> uniques
StringIndex(['a' 'c'], dtype='object')
Index(['a', 'c'], dtype='object')

When ``use_na_sentinel=True`` (the default), missing values are indicated
in the `codes` with the sentinel value ``-1`` and missing values are not
@@ -56,7 +56,7 @@ def factorize(values, sort=False, use_na_sentinel=True, size_hint=None):
>>> codes
array([ 1, -1, 0, 2, 1], dtype=int8)
>>> uniques
StringIndex(['a' 'b' 'c'], dtype='object')
Index(['a', 'b', 'c'], dtype='object')

If NA is in the values, and we want to include NA in the uniques of the
values, it can be achieved by setting ``use_na_sentinel=False``.
@@ -66,12 +66,12 @@ def factorize(values, sort=False, use_na_sentinel=True, size_hint=None):
>>> codes
array([ 0, 1, 0, -1], dtype=int8)
>>> uniques
Float64Index([1.0, 2.0], dtype='float64')
Index([1.0, 2.0], dtype='float64')
>>> codes, uniques = cudf.factorize(values, use_na_sentinel=False)
>>> codes
array([1, 2, 1, 0], dtype=int8)
>>> uniques
Float64Index([<NA>, 1.0, 2.0], dtype='float64')
Index([<NA>, 1.0, 2.0], dtype='float64')
"""

return_cupy_array = isinstance(values, cp.ndarray)
4 changes: 2 additions & 2 deletions python/cudf/cudf/core/column/categorical.py
@@ -61,7 +61,7 @@ class CategoricalAccessor(ColumnMethods):
dtype: category
Categories (3, int64): [1, 2, 3]
>>> s.cat.categories
Int64Index([1, 2, 3], dtype='int64')
Index([1, 2, 3], dtype='int64')
>>> s.cat.reorder_categories([3,2,1])
0 1
1 2
@@ -104,7 +104,7 @@ def __init__(self, parent: SeriesOrSingleColumnIndex):
super().__init__(parent=parent)

@property
def categories(self) -> "cudf.core.index.GenericIndex":
def categories(self) -> "cudf.core.index.Index":
"""
The categories of this categorical.
"""
4 changes: 2 additions & 2 deletions python/cudf/cudf/core/column/methods.py
@@ -1,4 +1,4 @@
# Copyright (c) 2020-2022, NVIDIA CORPORATION.
# Copyright (c) 2020-2023, NVIDIA CORPORATION.

from __future__ import annotations

@@ -8,7 +8,7 @@

import cudf

ParentType = Union["cudf.Series", "cudf.core.index.GenericIndex"]
ParentType = Union["cudf.Series", "cudf.core.index.Index"]


class ColumnMethods: