Am i using randomForest Classifier with gridsearch wrong & is xgboost supported? #4194

zahs123 · 2021-09-07T14:51:56Z

What is your question?
I am trying to get a quicker grid search than the sklearn version. I have tried the following as an experiment:

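import numpy as np
import dask_ml.model_selection as dcv  # assumed: `dcv` is dask-ml's model_selection module
# RandomForestClassifier: the import is not shown in the original post

X = np.random.normal(size=(10, 4)).astype(np.float32)
y = np.asarray([0, 1] * 5, dtype=np.int32)
rf_params = {'n_estimators': [10, 12]}
cuml_model = RandomForestClassifier()
grid = dcv.GridSearchCV(cuml_model, rf_params, scoring='accuracy')
%time grid.fit(X, y)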

The above works, but when I use the cuML version with grid search:

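# assumed: cuRFC is cuML's random forest, e.g. `from cuml.ensemble import RandomForestClassifier as cuRFC`
clf = cuRFC(max_depth=16, max_features=1.0)
grid = dcv.GridSearchCV(clf, rf_params, scoring='accuracy')
grid.fit(X, y)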

I get the error:
AttributeError: 'NoneType' object has no attribute 'fit'

Why am I getting this error? Am I doing something wrong?


teju85 · 2021-09-08T08:24:40Z

The issue reported regarding RF seems to be the same as in #4193.

@RAMitchell, @trivialfis, and/or @hcho3: do you folks know whether cuml.model_selection.GridSearchCV (which is basically sklearn.model_selection.GridSearchCV) works with xgboost?

trivialfis · 2021-09-08T10:02:22Z

With xgboost and a gradient boosted model, yes. With xgboost-dask, no. I haven't tried xgboost random forest, but that should be a no too.

This PR ⬇️

* fixes #4193 and fixes #4194, which relate to API incompatibility with dask-ml GridSearchCV
* changes the behaviour of cuml RF in the following cases:
  * In the not-so-uncommon case when `n_bins` > number of rows in the training sample, instead of throwing an error and exiting, the estimator prints a warning and sets `n_bins` to the number of training samples.
  * When `.predict()` is called using `float64` data, instead of throwing an error asking the user to explicitly specify `predict_model="CPU"` and rerun, a warning is displayed and prediction implicitly falls back from the default GPU-based prediction to CPU-based prediction.
  * Corresponding tests to capture the warnings from above have been added.
* the estimators now accept both numbers and strings as input for the `split_criterion` parameter, in parity with sklearn's API, which takes strings as the criterion.
* `split_algo` and `use_experimental_backend` parameters of the estimator class have now been completely removed from both documentation and warnings after deprecation in previous releases (from both single-gpu and dask RF).
* `num_classes` parameter of the predict and score methods has also been similarly removed

Authors:
- Venkat (https://github.com/venkywonka)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)
- Rory Mitchell (https://github.com/RAMitchell)

URL: #4207
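The two "warn instead of raise" behaviours described in the PR can be sketched in plain Python. This is only an illustration of the pattern, not cuML's implementation: the helper names `resolve_n_bins` and `choose_predict_backend`, their signatures, and the warning messages are all hypothetical.

```python
import warnings


def resolve_n_bins(n_bins: int, n_rows: int) -> int:
    """Hypothetical helper: clamp n_bins to the row count, warning instead of raising."""
    if n_bins > n_rows:
        warnings.warn(
            f"n_bins ({n_bins}) is greater than the number of training rows "
            f"({n_rows}); using n_bins={n_rows} instead."
        )
        return n_rows
    return n_bins


def choose_predict_backend(dtype_name: str, requested: str = "GPU") -> str:
    """Hypothetical helper: fall back to CPU prediction for float64 input."""
    if requested == "GPU" and dtype_name == "float64":
        warnings.warn(
            "float64 input is not supported by GPU prediction; "
            "falling back to CPU-based prediction."
        )
        return "CPU"
    return requested


print(resolve_n_bins(128, 10))            # 10, after emitting a warning
print(choose_predict_backend("float64"))  # CPU, after emitting a warning
```

Warning rather than raising keeps the estimator usable inside tools such as a grid search, where small cross-validation folds can easily have fewer rows than `n_bins`.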


zahs123 added the "? - Needs Triage" and "question" labels Sep 7, 2021

zahs123 changed the title ~~Am i using randomForest Classifier wrong & is xgboost supported?~~ Am i using randomForest Classifier with gridsearch wrong & is xgboost supported? Sep 7, 2021

venkywonka mentioned this issue Sep 14, 2021

RF: python api behaviour refactor #4207

Merged

viclafargue added the "Cython / Python" and "Dask / cuml.dask" labels and removed the "? - Needs Triage" label Sep 15, 2021

rapids-bot bot closed this as completed in #4207 Sep 21, 2021
