[BUG] Non-linear SVC and SVR are incompatible with dask-ml GridsearchCV #4192

Closed
beckernick opened this issue Sep 7, 2021 · 3 comments · Fixed by #4198

@beckernick

beckernick commented Sep 7, 2021

Referencing the coef_ attribute on cuML's SVC and SVR raises a generic RuntimeError rather than an AttributeError for non-linear kernels. This breaks dask-ml's GridSearchCV (and potentially other components), which catches specific exception types when it inspects estimator attributes: https://github.com/dask/dask-ml/blob/f1933bbd344452d8cec900ca902da4e417e1683e/dask_ml/model_selection/_normalize.py#L37-L40

Note that this still fails even if the parameter dict passed to dask-ml specifies only the linear kernel. Because the default kernel is rbf, you would also need to set the linear kernel explicitly on the estimator passed to GridSearchCV.

        for attr in attrs:
            if attr in exclude:
                continue
            try:
                val = getattr(est, attr)
            except (sklearn.exceptions.NotFittedError, AttributeError):
                continue

If possible, we should raise the more specific AttributeError in this scenario.
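To make the failure mode concrete, here is a minimal pure-Python sketch. ToySVM and collect_attrs are hypothetical stand-ins (not cuML or dask-ml code), and the loop only approximates the snippet above by catching AttributeError. It shows that the scan skips an attribute that raises AttributeError, while a RuntimeError escapes the except clause and propagates.

```python
class ToySVM:
    """Hypothetical stand-in for an SVM estimator; not the cuML class."""

    def __init__(self, kernel="rbf", error_type=RuntimeError):
        self.kernel = kernel
        self._error_type = error_type

    @property
    def coef_(self):
        # Only linear kernels expose coefficients.
        if self.kernel != "linear":
            raise self._error_type("coef_ is only available for linear kernels")
        return [0.0]


def collect_attrs(est, attrs):
    """Approximate the attribute scan: skip attributes raising AttributeError."""
    found = {}
    for attr in attrs:
        try:
            found[attr] = getattr(est, attr)
        except AttributeError:
            continue
    return found


# AttributeError is caught, so the scan completes without coef_.
ok = collect_attrs(ToySVM(error_type=AttributeError), ["kernel", "coef_"])

# RuntimeError is not listed in the except clause, so it propagates.
try:
    collect_attrs(ToySVM(error_type=RuntimeError), ["kernel", "coef_"])
    propagated = False
except RuntimeError:
    propagated = True
```

The scan runs on the estimator instance as it was constructed, which is why the estimator's own default kernel matters regardless of what the parameter grid lists.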

from sklearn.svm import SVC, SVR
import cuml

getattr(SVR(), "coef_")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_55047/1241473841.py in <module>
      2 import cuml
      3 
----> 4 getattr(SVR(), "coef_")

/raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/sklearn/svm/_base.py in coef_(self)
    499     def coef_(self):
    500         if self.kernel != 'linear':
--> 501             raise AttributeError('coef_ is only available when using a '
    502                                  'linear kernel')
    503 

AttributeError: coef_ is only available when using a linear kernel

from sklearn.svm import SVC, SVR
import cuml

getattr(cuml.svm.SVR(), "coef_")
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_55047/3521656981.py in <module>
      2 import cuml
      3 
----> 4 getattr(cuml.svm.SVR(), "coef_")

/raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner(*args, **kwargs)
    593 
    594                 # Call the function
--> 595                 ret_val = func(*args, **kwargs)
    596 
    597             return cm.process_return(ret_val)

cuml/svm/svm_base.pyx in cuml.svm.svm_base.SVMBase.coef_()

RuntimeError: coef_ is only available for linear kernels

Dask-ML Failing Example (the error is propagated up rather than caught):

import dask_ml.model_selection as dcv
from sklearn.datasets import load_iris
import cuml

iris = load_iris()

parameters = {'kernel': ['rbf'], 'C': [1, 10]}
svc = cuml.svm.SVC()
clf = dcv.GridSearchCV(svc, parameters)
clf.fit(iris.data, iris.target)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_55610/1025904613.py in <module>
      8 svc = cuml.svm.SVC()
      9 clf = dcv.GridSearchCV(svc, parameters)
---> 10 clf.fit(iris.data, iris.target)

/raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/dask_ml/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
   1234 
   1235         candidate_params = list(self._get_param_iterator())
-> 1236         dsk, keys, n_splits, _ = build_cv_graph(
   1237             estimator,
   1238             self.cv,

/raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/dask_ml/model_selection/_search.py in build_cv_graph(estimator, cv, scorer, candidate_params, X, y, groups, fit_params, iid, error_score, return_train_score, cache_cv)
    214     fields, tokens, params = normalize_params(candidate_params)
    215     main_token = tokenize(
--> 216         normalize_estimator(estimator),
    217         fields,
    218         params,

/raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/dask_ml/model_selection/_normalize.py in normalize_estimator(est)
     36                 continue
     37             try:
---> 38                 val = getattr(est, attr)
     39             except (sklearn.exceptions.NotFittedError, AttributeError):
     40                 continue

/raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner(*args, **kwargs)
    593 
    594                 # Call the function
--> 595                 ret_val = func(*args, **kwargs)
    596 
    597             return cm.process_return(ret_val)

cuml/svm/svm_base.pyx in cuml.svm.svm_base.SVMBase.coef_()

RuntimeError: coef_ is only available for linear kernels
conda list | grep "rapids\|dask"
# packages in environment at /raid/nicholasb/miniconda3/envs/rapids-21.10:
cucim                     21.10.00a210907 cuda_11.2_py38_g27da55a_20    rapidsai-nightly
cudf                      21.10.00a210907 cuda_11.2_py38_g8d602b7e99_267    rapidsai-nightly
cudf_kafka                21.10.00a210907 py38_g8d602b7e99_267    rapidsai-nightly
cugraph                   21.10.00a210907 cuda11.2_py38_gdcc08bc6_67    rapidsai-nightly
cuml                      21.10.00a210907 cuda11.2_py38_g0e770fa50_71    rapidsai-nightly
cusignal                  21.10.00a210907 py38_g5ae8e6a_13    rapidsai-nightly
cuspatial                 21.10.00a210907 py38_gf11540e_13    rapidsai-nightly
custreamz                 21.10.00a210907 py38_g8d602b7e99_267    rapidsai-nightly
cuxfilter                 21.10.00a210907 py38_gd7204b9_13    rapidsai-nightly
dask                      2021.9.0           pyhd8ed1ab_0    conda-forge
dask-core                 2021.9.0           pyhd8ed1ab_0    conda-forge
dask-cuda                 21.10.00a210907         py38_35    rapidsai-nightly
dask-cudf                 21.10.00a210907 py38_g8d602b7e99_267    rapidsai-nightly
dask-glm                  0.2.0                      py_1    conda-forge
dask-ml                   1.9.0              pyhd8ed1ab_0    conda-forge
libcucim                  21.10.00a210907 cuda11.2_g27da55a_20    rapidsai-nightly
libcudf                   21.10.00a210907 cuda11.2_g8d602b7e99_267    rapidsai-nightly
libcudf_kafka             21.10.00a210907 g8d602b7e99_267    rapidsai-nightly
libcugraph                21.10.00a210907 cuda11.2_gdcc08bc6_67    rapidsai-nightly
libcuml                   21.10.00a210907 cuda11.2_g0e770fa50_71    rapidsai-nightly
libcumlprims              21.10.00a210826 cuda11.2_ga512fc5_5    rapidsai-nightly
libcuspatial              21.10.00a210907 cuda11.2_gf11540e_13    rapidsai-nightly
librmm                    21.10.00a210907 cuda11.2_g8527317_28    rapidsai-nightly
libxgboost                1.4.2dev.rapidsai21.10      cuda11.2_0    rapidsai-nightly
py-xgboost                1.4.2dev.rapidsai21.10  cuda11.2py38_0    rapidsai-nightly
rapids                    21.10.00a210907 cuda11.2_py38_g24de107_61    rapidsai-nightly
rapids-xgboost            21.10.00a210907 cuda11.2_py38_g24de107_61    rapidsai-nightly
rmm                       21.10.00a210907 cuda_11.2_py38_g8527317_28    rapidsai-nightly
ucx                       1.9.0+gcd9efd3       cuda11.2_0    rapidsai-nightly
ucx-proc                  1.0.0                       gpu    rapidsai-nightly
ucx-py                    0.22.0a210907   py38_gcd9efd3_18    rapidsai-nightly
xgboost                   1.4.2dev.rapidsai21.10  cuda11.2py38_0    rapidsai-nightly
@beckernick beckernick added bug Something isn't working ? - Needs Triage Need team to review and classify Cython / Python Cython or Python issue Dask / cuml.dask Issue/PR related to Python level dask or cuml.dask features. labels Sep 7, 2021
@beckernick beckernick changed the title [BUG] Referencing the coef_ attribute on SVC and SVR throws a RuntimeError instead of AttributeError, breaking dask-ml GridsearchCV [BUG] Non-linear kernel SVC and SVR are incompatible with dask-ml GridsearchCV Sep 7, 2021
@beckernick beckernick changed the title [BUG] Non-linear kernel SVC and SVR are incompatible with dask-ml GridsearchCV [BUG] SVC and SVR are incompatible with dask-ml GridsearchCV Sep 7, 2021
@beckernick beckernick changed the title [BUG] SVC and SVR are incompatible with dask-ml GridsearchCV [BUG] Non-linear SVC and SVR are incompatible with dask-ml GridsearchCV Sep 7, 2021
@teju85

teju85 commented Sep 8, 2021

Tagging @tfeher and @achirkin

@achirkin

achirkin commented Sep 8, 2021

Thanks for the report! There seems to be nothing speaking against changing the error type here; I'm on it.

@rapids-bot rapids-bot bot closed this as completed in #4198 Sep 8, 2021
rapids-bot bot pushed a commit that referenced this issue Sep 8, 2021
Change the error type when trying to predict before fitting SVM to match sklearn.

Fixes #4192

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4198
@beckernick

Confirmed that the dask-ml code in the issue now works. cc @sauravdev
