Am i using randomForest Classifier with gridsearch wrong & is xgboost supported? #4194

Closed
zahs123 opened this issue Sep 7, 2021 · 2 comments · Fixed by #4207
Labels
Cython / Python (Cython or Python issue) · Dask / cuml.dask (Issue/PR related to Python level dask or cuml.dask features) · question (Further information is requested)

Comments

zahs123 commented Sep 7, 2021

What is your question?
I am trying to get a quicker grid search than the sklearn version. I have tried the following as an experiment:

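# Imports are assumed from the names used below; they were not in the original post.
import numpy as np
import dask_ml.model_selection as dcv
from sklearn.ensemble import RandomForestClassifier

X = np.random.normal(size=(10, 4)).astype(np.float32)
y = np.asarray([0, 1] * 5, dtype=np.int32)
rf_params = {'n_estimators': [10, 12]}
cuml_model = RandomForestClassifier()
grid = dcv.GridSearchCV(cuml_model, rf_params, scoring='accuracy')
%time grid.fit(X, y)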

The above works.

But when I use the cuML version of the classifier:

from cuml.ensemble import RandomForestClassifier as cuRFC  # import assumed from the alias used below

clf = cuRFC(max_depth=16, max_features=1.0)
grid = dcv.GridSearchCV(clf, rf_params, scoring='accuracy')
grid.fit(X, y)

I get the error:
AttributeError: 'NoneType' object has no attribute 'fit'

Why am I getting this error? Am I doing something wrong?
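For readers who land here with the same traceback, one plausible way to get this exact message is an estimator whose `set_params` does not return `self`, so that a grid search which chains `set_params(...)` and `fit(...)` ends up calling `fit` on `None`. Whether that is the precise cause in cuML is an assumption, not something confirmed in this thread. The sketch below uses toy classes only (`BrokenEstimator`, `FixedEstimator`, and `fit_candidate` are hypothetical names, not cuML or dask-ml code):

```python
class BrokenEstimator:
    """Toy estimator (hypothetical) whose set_params forgets to return self."""

    def __init__(self, n_estimators=10):
        self.n_estimators = n_estimators

    def get_params(self, deep=True):
        return {"n_estimators": self.n_estimators}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        # Bug: there is no "return self", so callers receive None.

    def fit(self, X, y):
        return self


class FixedEstimator(BrokenEstimator):
    """Same toy estimator, but set_params returns self as sklearn's API expects."""

    def set_params(self, **params):
        super().set_params(**params)
        return self


def fit_candidate(estimator, params, X, y):
    """Hypothetical stand-in for a grid-search step that chains set_params and fit."""
    candidate = estimator.set_params(**params)
    return candidate.fit(X, y)


X, y = [[0.0], [1.0]], [0, 1]

try:
    fit_candidate(BrokenEstimator(), {"n_estimators": 12}, X, y)
except AttributeError as err:
    broken_message = str(err)

fitted = fit_candidate(FixedEstimator(), {"n_estimators": 12}, X, y)
```

If this is indeed the failure mode, the estimator itself needs the fix; nothing in the calling code above is wrong by sklearn's conventions.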

zahs123 added the labels "? - Needs Triage" (Need team to review and classify) and "question" (Further information is requested) on Sep 7, 2021
zahs123 changed the title from "Am i using randomForest Classifier wrong & is xgboost supported?" to "Am i using randomForest Classifier with gridsearch wrong & is xgboost supported?" on Sep 7, 2021
teju85 (Member) commented Sep 8, 2021

The issue reported regarding RF seems to be the same as in #4193.

@RAMitchell, @trivialfis and/or @hcho3, do you folks know if we can use cuml.model_selection.GridSearchCV (which is basically sklearn.model_selection.GridSearchCV) with xgboost?

trivialfis (Member) commented Sep 8, 2021

With xgboost and a gradient boosted model, yes. With xgboost-dask, no. I haven't tried the xgboost random forest, but that should be a no too.

viclafargue added the labels "Cython / Python" (Cython or Python issue) and "Dask / cuml.dask" (Issue/PR related to Python level dask or cuml.dask features) and removed the label "? - Needs Triage" (Need team to review and classify) on Sep 15, 2021
rapids-bot bot pushed a commit that referenced this issue Sep 21, 2021
This PR ⬇️
* fixes #4193 and fixes #4194, which relate to an API incompatibility with dask-ml's GridSearchCV
* changes the behaviour of cuML RF in the following cases:
    * In the not-so-uncommon case where `n_bins` is greater than the number of rows in the training sample, the estimator now prints a warning and sets `n_bins` to the number of training samples, instead of throwing an error and exiting.
    * When `.predict()` is called on `float64` data, the estimator now displays a warning and implicitly falls back from the default GPU-based prediction to CPU-based prediction, instead of throwing an error asking the user to explicitly specify `predict_model="CPU"` and rerun.
* adds corresponding tests to capture the warnings above
* the estimators now accept both numbers and strings for the `split_criterion` parameter, in parity with sklearn's API, which takes strings as the criterion
* the `split_algo` and `use_experimental_backend` parameters of the estimator class have been completely removed, from both documentation and warnings, after deprecation in previous releases (for both single-GPU and dask RF)
* the `num_classes` parameter of the predict and score methods has similarly been removed

Authors:
  - Venkat (https://github.com/venkywonka)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Rory Mitchell (https://github.com/RAMitchell)

URL: #4207
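
Two of the behaviour changes listed in the commit message can be sketched in plain Python: warning and reducing `n_bins` to the row count instead of raising, and accepting `split_criterion` as either a number or a string. This is only an illustration under stated assumptions, not cuML's implementation: the helper names `clamp_n_bins` and `normalize_split_criterion` are hypothetical, the numeric codes in the mapping are illustrative, and the `n_bins` bullet is read here as "reduce `n_bins` to the number of training rows".

```python
import warnings

# Illustrative mapping only; the numeric codes are an assumption for this sketch.
_CRITERION_BY_CODE = {0: "gini", 1: "entropy", 2: "mse"}


def normalize_split_criterion(criterion):
    """Accept a numeric code or a string name and return the string name."""
    if isinstance(criterion, str):
        name = criterion.lower()
        if name not in _CRITERION_BY_CODE.values():
            raise ValueError(f"Unknown split_criterion: {criterion!r}")
        return name
    if criterion in _CRITERION_BY_CODE:
        return _CRITERION_BY_CODE[criterion]
    raise ValueError(f"Unknown split_criterion: {criterion!r}")


def clamp_n_bins(n_bins, n_rows):
    """Warn and reduce n_bins to n_rows instead of raising an error."""
    if n_bins > n_rows:
        warnings.warn(
            f"n_bins ({n_bins}) exceeds the number of training rows ({n_rows}); "
            f"using n_bins={n_rows} instead.",
            UserWarning,
        )
        return n_rows
    return n_bins


# Usage: a 10-row training set with a requested 128 bins.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    effective_bins = clamp_n_bins(128, 10)

criterion_from_code = normalize_split_criterion(0)
criterion_from_name = normalize_split_criterion("Entropy")
```

Warning instead of raising matters for grid search in particular, since a single hard error on a small cross-validation fold would otherwise abort the whole search.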
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this issue Oct 9, 2023