[BUG] RF regression test fails intermittently #1934

Closed
Salonijain27 opened this issue Mar 26, 2020 · 6 comments
Assignees: Salonijain27
Labels: bug (Something isn't working), inactive-30d, inactive-90d, tests (Unit testing for project)

Comments

@Salonijain27
Contributor

The Dask RF regression test fails intermittently.
When tested on a CUDA 10.1 nightly dev container, the test failed once or twice every 120 runs. The worst failure I saw on CUDA 10.1 was a score of 0.58, and that happened only once; the other failures were in the range of 0.62 to 0.68. The mean score across runs is always around 0.70, and the variance on CUDA 10.1 has been around 0.0003.
I have not seen it fail on CUDA 10.0 on my local system, but it has failed for CUDA 10.0 in the CI.
Furthermore, it fails around 8% of the time with CUDA 10.2, with a few failures between 0.5 and 0.6 and one as low as 0.48.
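
A rough way to reproduce these statistics locally is to refit and rescore the model repeatedly on a fixed split. This is only a sketch, not the actual Dask/pytest test: it uses the single-GPU estimator, and the dataset shape, RF parameters, and the 0.70 threshold are assumptions based on the numbers above.

# Hypothetical harness: repeatedly fit/score a cuml RF regressor on fixed data
# and summarize the r2 score distribution. Dataset shape, RF parameters, and
# the 0.70 threshold are assumptions, not the exact values of the failing test.
import numpy as np
from cuml.ensemble import RandomForestRegressor as curfr
from cuml.metrics import r2_score
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=10000, n_features=20, n_informative=10,
                       random_state=123)
X = X.astype(np.float32)
y = y.astype(np.float32)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1000,
                                                    random_state=0)

scores = []
for _ in range(120):  # roughly the cadence at which failures were observed
    model = curfr(n_estimators=50, max_depth=16, n_bins=16)
    model.fit(X_train, y_train)
    scores.append(r2_score(y_test, model.predict(X_test)))

scores = np.asarray(scores)
print("mean %.4f  var %.5f  min %.2f" % (scores.mean(), scores.var(), scores.min()))
print("runs below 0.70:", int((scores < 0.70).sum()))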

@Salonijain27 added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Mar 26, 2020
@Salonijain27 self-assigned this on Mar 26, 2020
@JohnZed changed the title from "[BUG] Dask RF regression test fails" to "[BUG] Dask RF regression test fails intermittently" on Apr 3, 2020
@Salonijain27
Contributor Author

The code below shows lower accuracy for cuml RF regression when the train_test_split random_state is set to 3201:

import numpy as np
from cuml.ensemble import RandomForestRegressor as curfr
from cuml.metrics import r2_score
from sklearn.ensemble import RandomForestRegressor as skrfr
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split


# 10,000 rows; 20 features, 10 of them informative; float32 data
column_info = [20, 10]
nrows = 10000
datatype = np.float32
ncols, n_info = column_info
X, y = make_regression(n_samples=nrows, n_features=ncols,
                       n_informative=n_info,
                       random_state=123)
X = X.astype(datatype)
y = y.astype(datatype)
# random_state=3201 for the split is the case that produces the low cuml score
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1000,
                                                    random_state=3201)
cu_rf_params = {
    'n_estimators': 50,
    'max_depth': 16,
    'n_bins': 16,
    }

# Fit and score the cuml RF regressor
cu_rf_mg = curfr(**cu_rf_params)
cu_rf_mg.fit(X_train, y_train)
cu_preds = cu_rf_mg.predict(X_test)
acc_score = r2_score(y_test, cu_preds)  # cuml's r2_score expects the true values first
print(acc_score)

# Fit a matching sklearn RF for comparison
sk_model = skrfr(max_depth=16, random_state=10, n_estimators=50)
sk_model.fit(X_train, y_train)
sk_preds = sk_model.predict(X_test)
sk_r2 = r2_score(y_test, sk_preds, convert_dtype=datatype)
print(sk_r2)

@Salonijain27
Contributor Author

For the above code:
The sklearn score fluctuates by at most 0.02, with a mean of about 0.87. However, the cuml RF r2 score ranges from 0.60 to 0.84 depending on the random_state value passed to the train_test_split function.
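
To see that spread directly, here is a minimal sketch (reusing X, y, datatype, cu_rf_params, and the imports from the snippet above; the choice of 20 seeds is arbitrary and only for illustration) that sweeps the train_test_split random_state and compares the cuml and sklearn r2 ranges:

# Hypothetical sweep over train_test_split seeds; the cuml scores should show a
# much wider range than the sklearn scores if the behavior described above holds.
cu_scores, sk_scores = [], []
for seed in range(20):
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=1000,
                                          random_state=seed)
    cu_model = curfr(**cu_rf_params)
    cu_model.fit(Xtr, ytr)
    cu_scores.append(r2_score(yte, cu_model.predict(Xte)))

    sk_model = skrfr(max_depth=16, random_state=10, n_estimators=50)
    sk_model.fit(Xtr, ytr)
    sk_scores.append(r2_score(yte, sk_model.predict(Xte), convert_dtype=datatype))

print("cuml    r2: min %.2f max %.2f" % (min(cu_scores), max(cu_scores)))
print("sklearn r2: min %.2f max %.2f" % (min(sk_scores), max(sk_scores)))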

@Salonijain27 changed the title from "[BUG] Dask RF regression test fails intermittently" to "[BUG] RF regression test fails intermittently" on May 7, 2020
@Salonijain27
Contributor Author

@vishalmehta1991 @vinaydes any idea why this is happening?

@Garfounkel added the tests (Unit testing for project) label and removed the ? - Needs Triage (Need team to review and classify) label on May 14, 2020
@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@dantegd
Member

dantegd commented May 12, 2021

The failures for the MNMG tests are being tracked in #3820, and CUDA 10.1/10.2 are no longer supported, so I will close this issue; any remaining problems can be reported on the linked issue.

@dantegd closed this as completed on May 12, 2021