Estimators adaptation toward CPU/GPU interoperability #4918

Merged
merged 68 commits into from
Dec 8, 2022
68 commits
06df028
Add device and memory type settings
viclafargue Aug 18, 2022
ba70967
Apply changes
viclafargue Aug 24, 2022
d19a6ef
Add pytests
viclafargue Aug 24, 2022
687bf73
CPU/GPU dispatch according to device_type
viclafargue Aug 26, 2022
f120ac5
Testing different input types
viclafargue Sep 1, 2022
1556de8
Add code doc
viclafargue Sep 1, 2022
62f4ea7
Remove None of acceptable values for device_type and memory_type
viclafargue Sep 1, 2022
85c06a7
Update for thrust 1.17 and fixes to accommodate for cuDF Buffer refac…
dantegd Aug 25, 2022
ef55502
Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#4862)
robertmaynard Aug 25, 2022
ac68939
Remove unused cuDF imports (#4873)
beckernick Aug 26, 2022
dadedda
Import treelite models into FIL in a different precision (#4839)
canonizer Aug 26, 2022
9afdff5
All points membership vector for HDBSCAN (#4800)
tarang-jain Aug 26, 2022
0cdafd5
merging
divyegala Sep 2, 2022
ff6d60a
merging and style fix
divyegala Sep 7, 2022
8c64303
base to experimental, attributes available on cuml model, is_cuda_ava…
divyegala Sep 8, 2022
68e816f
copyright check
divyegala Sep 8, 2022
ecd8a0a
fixing failing tests and MRO for predict
divyegala Sep 9, 2022
279004b
fixing mirroring of input/output
divyegala Sep 14, 2022
eb54d3f
Requested changes
viclafargue Sep 26, 2022
d6c2dde
Merge branch 'branch-22.10' into cpu-gpu-interop
viclafargue Sep 29, 2022
0f3922a
Requested changes 2
viclafargue Sep 29, 2022
c3dda60
Requested changes 3
viclafargue Oct 3, 2022
bfe238f
Fix import
viclafargue Oct 4, 2022
5c03489
Fix tests
viclafargue Oct 5, 2022
09db802
Using raft::KeyValuePair instead of cub::KeyValuePair
viclafargue Oct 5, 2022
0dbe8a4
Revert "Using raft::KeyValuePair instead of cub::KeyValuePair"
cjnolet Oct 6, 2022
495c676
Fix _check_internal_model
viclafargue Oct 7, 2022
73bb16c
Generic testing
viclafargue Oct 7, 2022
a232ff6
Adding UMAP
viclafargue Oct 7, 2022
edc4d84
Adding LogisticRegression
viclafargue Oct 7, 2022
3c3328e
Merge branch 'branch-22.10' into cpu-gpu-interop-models
viclafargue Oct 10, 2022
e5f88cf
Adding LogisticRegression 2
viclafargue Oct 11, 2022
2f9b2ac
Improved estimator initialization
viclafargue Oct 13, 2022
26f1fd1
Checker for similarity of hyperparameters default values
viclafargue Oct 13, 2022
d2a857a
Hyperparameters check as a test
viclafargue Oct 14, 2022
0c37722
info instead of warn
viclafargue Oct 14, 2022
d91d924
Add Lasso, ElasticNet, Ridge
viclafargue Oct 19, 2022
49a7c3c
Add PCA
viclafargue Oct 20, 2022
4594f3a
Add TSVD
viclafargue Oct 20, 2022
8cdecb7
Merge branch-22.10
viclafargue Oct 21, 2022
523aaed
Add NearestNeighbors + improve CumlArrayDescriptor
viclafargue Oct 26, 2022
b4584fe
Use of the same attributes
viclafargue Oct 26, 2022
1650a82
Improve coverage for LogisticRegression
viclafargue Oct 28, 2022
9bcc59f
Improve coverage for LinearRegression
viclafargue Oct 28, 2022
4ddb837
Improve coverage for lasso, elasticnet, ridge
viclafargue Oct 28, 2022
2a4427e
Update headers
viclafargue Oct 28, 2022
5802cfd
flake8 style
viclafargue Oct 28, 2022
f038485
Redirection to CPU
viclafargue Oct 28, 2022
7a784ac
Fix numerous minor issues
viclafargue Oct 31, 2022
5bc59e0
Methods coverage for UMAP, PCA, tSVD and NN
viclafargue Nov 2, 2022
7f25eb0
two score functions in mixins
viclafargue Nov 2, 2022
63fa608
Fix header
viclafargue Nov 2, 2022
9743a7a
Merge branch 'branch-22.12' into cpu-gpu-interop-models
viclafargue Nov 3, 2022
dc19310
better testing of LogisticRegression log_proba
viclafargue Nov 4, 2022
c9acaea
deepcopy for Ridge input
viclafargue Nov 4, 2022
36f5a12
skip some of the tests
viclafargue Nov 4, 2022
2e9d04d
avoid double kwargs processing when using a child class
viclafargue Nov 4, 2022
4a32589
restore tests
viclafargue Nov 4, 2022
c0d48ef
Fix remaining issues
viclafargue Nov 8, 2022
3a37b5a
minor improvements
viclafargue Nov 8, 2022
21002e6
Adress reviews
viclafargue Nov 10, 2022
92e44a6
Fix base doc test
viclafargue Nov 11, 2022
06e0f8b
fix typo
viclafargue Nov 14, 2022
0632ebc
address review
viclafargue Nov 16, 2022
4f5e08f
Merge branch 'branch-22.12' into cpu-gpu-interop-models
viclafargue Nov 17, 2022
40165e6
Merge branch 'branch-22.12' into cpu-gpu-interop-models
viclafargue Dec 1, 2022
09f714b
restore init file
viclafargue Dec 1, 2022
19e4d24
Merge branch 'branch-23.02' into cpu-gpu-interop-models
viclafargue Dec 8, 2022
6 changes: 6 additions & 0 deletions python/cuml/common/array_descriptor.py
@@ -54,8 +54,14 @@ class CumlArrayDescriptor():
     Python descriptor object to control getting/setting `CumlArray` attributes
     on `Base` objects. See the Estimator Guide for an in depth guide.
     """
+    def __init__(self, order='K'):
+        # order corresponds to the order that the CumlArray attribute
+        # should be in to work with the C++ algorithms.
+        self.order = order
+
     def __set_name__(self, owner, name):
         self.name = name
+        setattr(owner, name + '_order', self.order)

     def _get_meta(self,
                   instance,
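The hunk above relies on Python's `__set_name__` hook to publish each array attribute's expected memory order as a sibling class attribute. The following is a minimal sketch of that mechanism only, not the cuML implementation: `OrderedDescriptor` and `ToyEstimator` are hypothetical names, and the storage logic in `__get__`/`__set__` is an assumption made to keep the example self-contained.

```python
class OrderedDescriptor:
    """Toy stand-in for a descriptor that records a memory order (hypothetical)."""

    def __init__(self, order='K'):
        self.order = order

    def __set_name__(self, owner, name):
        # Called once per attribute when the owning class is created.
        self.name = name
        # Expose the order next to the attribute, e.g. coef_ -> coef__order.
        setattr(owner, name + '_order', self.order)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Assumption: values live in the instance dict under the same name.
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class ToyEstimator:
    coef_ = OrderedDescriptor(order='F')
    labels_ = OrderedDescriptor()


est = ToyEstimator()
est.coef_ = [1.0, 2.0]
print(ToyEstimator.coef__order, ToyEstimator.labels__order, est.coef_)
```

Because the attribute name already ends in an underscore, the generated name contains a double underscore (`coef__order`).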
2 changes: 1 addition & 1 deletion python/cuml/common/input_utils.py
@@ -535,7 +535,7 @@ def input_to_host_array(X,

     if isinstance(X, (int, float, complex, bool, str,
                       type(None), dict, set, list, tuple)):
-        return X
+        return (X,)

     if isinstance(X, np.ndarray):
         if len(X.shape) > 1:
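The one-line change above wraps scalar and builtin inputs in a one-element tuple instead of returning them bare. The diff does not state the motivation, so the sketch below is an assumption: if callers read the converted value as the first element of the result, a tuple keeps that access uniform. `HostInput`, `to_host`, and `first_field` are hypothetical names, not cuML API.

```python
from collections import namedtuple

# Hypothetical result type; the real return type is not shown in the diff.
HostInput = namedtuple('HostInput', ['array', 'n_rows'])

BUILTIN_TYPES = (int, float, complex, bool, str,
                 type(None), dict, set, list, tuple)


def to_host(X):
    """Return a tuple-like result whose first element is the converted input."""
    if isinstance(X, BUILTIN_TYPES):
        # Pass builtins through, wrapped so that result[0] still works.
        return (X,)
    data = list(X)
    return HostInput(array=data, n_rows=len(data))


def first_field(X):
    # Callers can index position 0 regardless of which branch ran.
    return to_host(X)[0]


print(first_field(5), first_field(range(3)))
```

With a bare `return X`, `first_field(5)` would raise a `TypeError`, since an `int` cannot be indexed.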
56 changes: 55 additions & 1 deletion python/cuml/common/mixins.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2021, NVIDIA CORPORATION.
+# Copyright (c) 2021-2022, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -214,6 +214,33 @@ def score(self, X, y, **kwargs):
         preds = self.predict(X, **kwargs)
         return r2_score(y, preds, handle=handle)

+    # TODO : remove score function duplicate
+    # once updated CPU/GPU interoperability class is ready
+    @generate_docstring(
+        return_values={
+            'name': 'score',
+            'type': 'float',
+            'description': 'R^2 of self.predict(X) '
+                           'wrt. y.'
+        })
+    @cuml.internals.api_base_return_any_skipall
+    def _score(self, X, y, **kwargs):
+        """
+        Scoring function for regression estimators
+
+        Returns the coefficient of determination R^2 of the prediction.
+
+        """
+        from cuml.metrics.regression import r2_score
+
+        if hasattr(self, 'handle'):
+            handle = self.handle
+        else:
+            handle = None
+
+        preds = self._predict(X, **kwargs)
+        return r2_score(y, preds, handle=handle)
+
     @staticmethod
     def _more_static_tags():
         return {
@@ -253,6 +280,33 @@ def score(self, X, y, **kwargs):
         preds = self.predict(X, **kwargs)
         return accuracy_score(y, preds, handle=handle)

+    # TODO : remove score function duplicate
+    # once updated CPU/GPU interoperability class is ready
+    @generate_docstring(
+        return_values={
+            'name':
+            'score',
+            'type':
+            'float',
+            'description': ('Accuracy of self.predict(X) wrt. y '
+                            '(fraction where y == pred_y)')
+        })
+    @cuml.internals.api_base_return_any_skipall
+    def _score(self, X, y, **kwargs):
+        """
+        Scoring function for classifier estimators based on mean accuracy.
+
+        """
+        from cuml.metrics.accuracy import accuracy_score
+
+        if hasattr(self, 'handle'):
+            handle = self.handle
+        else:
+            handle = None
+
+        preds = self._predict(X, **kwargs)
+        return accuracy_score(y, preds, handle=handle)
+
     @staticmethod
     def _more_static_tags():
         return {
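Both mixins gain a `_score` that mirrors `score` but routes through `_predict` rather than `predict`. A minimal sketch of that pairing follows, assuming `_predict` is the underlying implementation that the public `predict` wraps; the diff does not show that relationship. `ToyRegressorMixin`, `DoublingRegressor`, and the pure-Python `r2_score` are hypothetical stand-ins, not the cuML classes or `cuml.metrics.regression.r2_score`.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination, written out in plain Python."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


class ToyRegressorMixin:
    def score(self, X, y):
        # Public path: goes through the public predict().
        return r2_score(y, self.predict(X))

    def _score(self, X, y):
        # Internal path: bypasses predict() and calls _predict() directly.
        return r2_score(y, self._predict(X))


class DoublingRegressor(ToyRegressorMixin):
    def __init__(self):
        self.public_calls = 0

    def _predict(self, X):
        return [2.0 * x for x in X]

    def predict(self, X):
        # Assumption: the public method wraps the internal one.
        self.public_calls += 1
        return self._predict(X)


model = DoublingRegressor()
X, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(model._score(X, y), model.public_calls)
```

The TODO comments in the diff mark this duplication as temporary, to be removed once the updated interoperability class is ready.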
3 changes: 1 addition & 2 deletions python/cuml/dask/common/base.py
@@ -23,7 +23,6 @@
 from cuml.dask.common.utils import get_client

 from cuml.common.base import Base
-from cuml.experimental.common.base import Base as experimentalBase
 from cuml.common.array import CumlArray
 from cuml.dask.common.utils import wait_and_raise_from_futures
 from raft_dask.common.comms import Comms
@@ -129,7 +128,7 @@ def _check_internal_model(model):
             if model.type is None:
                 wait_and_raise_from_futures([model])

-            if not issubclass(model.type, (Base, experimentalBase)):
+            if not issubclass(model.type, Base):
                 raise ValueError("Dask Future expected to contain cuml.Base "
                                  "but found %s instead." % model.type)

8 changes: 4 additions & 4 deletions python/cuml/decomposition/base_mg.pyx
@@ -67,9 +67,9 @@ class BaseDecompositionMG(object):
         self._set_n_features_in(n_cols)

         if self.n_components is None:
-            self._n_components = min(total_rows, n_cols)
+            self.n_components_ = min(total_rows, n_cols)
         else:
-            self._n_components = self.n_components
+            self.n_components_ = self.n_components

         X_arys = []
         for i in range(len(X)):
@@ -102,11 +102,11 @@ class BaseDecompositionMG(object):
         trans_arg = opg.build_data_t(trans_arys)

         trans_part_desc = opg.build_part_descriptor(total_rows,
-                                                    self._n_components,
+                                                    self.n_components_,
                                                     rank_to_sizes,
                                                     rank)

-        self._initialize_arrays(self._n_components, total_rows, n_cols)
+        self._initialize_arrays(self.n_components_, total_rows, n_cols)
         decomp_params = self._build_params(total_rows, n_cols)

         if _transform:
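The rename above moves the resolved component count from a private `_n_components` to a fitted attribute `n_components_`, while the user-supplied `n_components` hyperparameter stays untouched. A minimal sketch of the resolution rule shown in the hunk, using a hypothetical `ToyDecomposition` class rather than `BaseDecompositionMG`:

```python
class ToyDecomposition:
    """Hypothetical estimator illustrating the hyperparameter/fitted split."""

    def __init__(self, n_components=None):
        # Hyperparameter: stored exactly as the user passed it.
        self.n_components = n_components

    def fit_shape(self, total_rows, n_cols):
        # Fitted attribute (trailing underscore): resolved from the data shape.
        if self.n_components is None:
            self.n_components_ = min(total_rows, n_cols)
        else:
            self.n_components_ = self.n_components
        return self


auto = ToyDecomposition().fit_shape(total_rows=100, n_cols=8)
fixed = ToyDecomposition(n_components=3).fit_shape(total_rows=100, n_cols=8)
print(auto.n_components_, fixed.n_components_)
```

The trailing underscore follows the scikit-learn convention for values that exist only after fitting.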
2 changes: 1 addition & 1 deletion python/cuml/decomposition/incremental_pca.py
@@ -349,7 +349,7 @@ def partial_fit(self, X, y=None, check_input=True) -> "IncrementalPCA":
         explained_variance = S ** 2 / (n_total_samples - 1)
         explained_variance_ratio = S ** 2 / cp.sum(col_var * n_total_samples)

-        self.n_rows = n_total_samples
+        self.n_samples_ = n_total_samples
         self.n_samples_seen_ = n_total_samples
         self.components_ = V[:self.n_components_]
         self.singular_values_ = S[:self.n_components_]
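The context lines of this hunk compute the explained variance and its ratio from the singular values. As a rough illustration of those two formulas only, here is a plain-Python sketch; the original uses CuPy arrays, and the function names and sample numbers below are made up for the example.

```python
def explained_variance(singular_values, n_total_samples):
    # S ** 2 / (n_total_samples - 1), element by element.
    return [s ** 2 / (n_total_samples - 1) for s in singular_values]


def explained_variance_ratio(singular_values, col_var, n_total_samples):
    # S ** 2 / sum(col_var * n_total_samples), element by element.
    total = sum(v * n_total_samples for v in col_var)
    return [s ** 2 / total for s in singular_values]


S = [3.0, 1.0]           # hypothetical singular values
col_var = [0.75, 0.25]   # hypothetical per-column variances
n = 10

print(explained_variance(S, n))
print(explained_variance_ratio(S, col_var, n))
```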