Enhance cuML benchmark utility and refactor hdbscan import utilities #5242

Merged 11 commits on Mar 6, 2023.
python/cuml/benchmark/algorithms.py: 40 changes (37 additions & 3 deletions)
@@ -59,7 +59,7 @@
decomposition,
linear_model,
) # noqa: F401
-from cuml.internals.import_utils import has_umap
+from cuml.internals.import_utils import has_umap, has_hdbscan_prediction
from cuml.internals.safe_imports import cpu_only_import

np = cpu_only_import("numpy")
@@ -69,6 +69,10 @@
import umap


+if has_hdbscan_prediction(raise_if_unavailable=False):
@beckernick (Member, Author) commented on Feb 17, 2023:

Our import utilities have two hdbscan availability checks. I don't believe the prediction namespace is optional in HDBSCAN, so I've opted to use this one as a placeholder. If neither the prediction nor the plots namespace is optional, we can probably unify these utilities into a single has_hdbscan, as we have for other libraries (and customize the raised error in hdbscan.pyx).

Alternatively, I can add a has_hdbscan in this PR and use it.

Reviewer (Member) replied:

I think has_hdbscan makes more sense. Initially we only cared that the plotting package was available, so it was named accordingly, but since then we've added the prediction check and we only really care that hdbscan itself is available.

@beckernick (Member, Author) replied on Feb 21, 2023:

Sounds good. Would you prefer I open a separate PR for the refactor, or fold it into this one? I'll add the utility and refactor hdbscan.pyx here.
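
A unified check along the lines discussed above might look roughly like the sketch below. This is only an illustration of the pattern; the helper name, its placement in cuml.internals.import_utils, and the error message are assumptions, not necessarily the code that was ultimately merged.

```python
# Illustrative sketch only: a unified hdbscan availability check modeled on
# the existing has_umap / has_hdbscan_prediction helpers. The exact name,
# location, and error text are assumptions, not the merged implementation.
def has_hdbscan(raise_if_unavailable=False):
    try:
        import hdbscan  # noqa: F401
        return True
    except ImportError:
        if raise_if_unavailable:
            raise ImportError(
                "The hdbscan package is required for this functionality; "
                "install it, e.g. `conda install -c conda-forge hdbscan`."
            )
        return False
```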

+    import hdbscan


class AlgorithmPair:
"""
Wraps a cuML algorithm and (optionally) a cpu-based algorithm
@@ -272,6 +276,16 @@ def all_algorithms():
name="DBSCAN",
accepts_labels=False,
),
+AlgorithmPair(
+hdbscan.HDBSCAN
+if has_hdbscan_prediction(raise_if_unavailable=True)
+else None,
+cuml.cluster.HDBSCAN,
+shared_args={},
+cpu_args={},
+name="HDBSCAN",
+accepts_labels=False,
+),
AlgorithmPair(
sklearn.linear_model.LinearRegression,
cuml.linear_model.LinearRegression,
@@ -315,7 +329,8 @@
AlgorithmPair(
sklearn.ensemble.RandomForestClassifier,
cuml.ensemble.RandomForestClassifier,
shared_args={"max_features": 1.0, "n_estimators": 10},
shared_args={"max_features": "sqrt", "n_estimators": 50},
cpu_args={"n_jobs": 1},
name="RandomForestClassifier",
accepts_labels=True,
cpu_data_prep_hook=_labels_to_int_hook,
@@ -325,7 +340,8 @@
AlgorithmPair(
sklearn.ensemble.RandomForestRegressor,
cuml.ensemble.RandomForestRegressor,
shared_args={"max_features": 1.0, "n_estimators": 10},
shared_args={"max_features": 1.0, "n_estimators": 50},
cpu_args={"n_jobs": 1},
name="RandomForestRegressor",
accepts_labels=True,
accuracy_function=metrics.r2_score,
@@ -382,6 +398,24 @@ def all_algorithms():
accepts_labels=True,
accuracy_function=cuml.metrics.r2_score,
),
+AlgorithmPair(
+sklearn.svm.LinearSVC,
+cuml.svm.LinearSVC,
+shared_args={},
+cuml_args={},
+name="LinearSVC",
+accepts_labels=True,
+accuracy_function=cuml.metrics.accuracy_score,
+),
+AlgorithmPair(
+sklearn.svm.LinearSVR,
+cuml.svm.LinearSVR,
+shared_args={},
+cuml_args={},
+name="LinearSVR",
+accepts_labels=True,
+accuracy_function=cuml.metrics.accuracy_score,
+),
AlgorithmPair(
sklearn.neighbors.KNeighborsClassifier,
cuml.neighbors.KNeighborsClassifier,
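
For context on how these new benchmark entries are typically exercised, a rough usage sketch follows. The helper names and signatures (algorithm_by_name, datagen.gen_data, AlgorithmPair.run_cuml) are recalled from the cuml.benchmark module and should be treated as assumptions to verify, not as documented API.

```python
# Rough usage sketch; names and signatures are assumptions to verify against
# the cuml.benchmark module rather than a definitive example.
from cuml.benchmark import datagen
from cuml.benchmark.algorithms import algorithm_by_name

# Generate a small synthetic dataset and look up the newly added HDBSCAN pair.
data = datagen.gen_data("blobs", "numpy", n_samples=10000, n_features=32)
pair = algorithm_by_name("HDBSCAN")

# Run the cuML (GPU) side; pair.run_cpu(data) would exercise the CPU reference.
pair.run_cuml(data)
```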