[BUG] cuml RF Regressor hangs indefinitely for `n_bins` that are not multiples of TPB
#3919
Comments
The line https://github.com/rapidsai/cuml/blob/branch-21.06/cpp/src/decisiontree/batched-levelalgo/kernels.cuh#L372 seems to be the issue.
If
This worked on my initial test.
Thank you Vinay, I will update the issue description accordingly based on this info.
… `n_bins > TPB && n_bins % TPB != 0` (#3921) * This mini-(but important)-PR fixes the bug in `pdf_to_cdf` device function that causes hang when `n_bins > TPB && n_bins % TPB != 0` * This closes #3919 Authors: - Venkat (https://github.com/venkywonka) Approvers: - Philip Hyunsu Cho (https://github.com/hcho3) - Dante Gama Dessavre (https://github.com/dantegd) URL: #3921
Describe the bug
The RF regressor hangs indefinitely for values of `n_bins` that are greater than the threads-per-block of the regression kernel (`TPB=64` when this was encountered) AND are not multiples of `TPB`.
In other words, in the current regression kernel implementation, every `n_bins > 64 && n_bins % 64 != 0` causes the kernel to deadlock.
Steps/Code to reproduce bug
The quickest way is to change this line from `(1, 1.0, True, 32)` to `(1, 1.0, True, 100)` and run this pytest command after building cuml from source:
Expected behavior
The test should pass instead of hanging indefinitely.
Environment details (please complete the following information):
- `cmake` & `gcc/g++` and commit hash of build
  - `cmake`: 3.20.1
  - `g++`: 9.3.0
  - 95efa251e (branch-21.06)
Additional context
This seems to happen with regression when `split_algo=1` and the new backend are used with `n_bins=100`, irrespective of other hyperparameters.
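To make the reported condition concrete, here is a small Python sketch (not cuml code) that evaluates `n_bins > TPB && n_bins % TPB != 0` and counts how many iterations each thread would run. It assumes the device function walks the bins in a block-stride loop bounded by `max(TPB, n_bins)`. That loop shape, and the helper names `hang_condition`, `trip_counts`, and `round_up`, are assumptions for illustration only. If a block-wide barrier or scan sits inside such a loop, threads with different iteration counts can never all meet at it, which would explain a hang.

```python
TPB = 64  # threads per block reported in this issue


def hang_condition(n_bins: int, tpb: int = TPB) -> bool:
    """Condition quoted in this issue: n_bins > TPB && n_bins % TPB != 0."""
    return n_bins > tpb and n_bins % tpb != 0


def trip_counts(bound: int, tpb: int = TPB) -> set:
    """Distinct per-thread iteration counts for the ASSUMED loop
    `for (i = tid; i < bound; i += tpb)`, over tid in [0, tpb)."""
    return {len(range(tid, bound, tpb)) for tid in range(tpb)}


def round_up(n_bins: int, tpb: int = TPB) -> int:
    """Smallest multiple of tpb that is >= n_bins (hypothetical helper)."""
    return -(-n_bins // tpb) * tpb


for n_bins in (32, 64, 100, 128):
    bound = max(TPB, n_bins)  # assumed loop bound, not taken from the source
    print(n_bins, hang_condition(n_bins), sorted(trip_counts(bound)))
```

With `n_bins=100`, threads 0 to 35 would run two iterations and threads 36 to 63 only one, so the counts diverge. Under the same assumptions, rounding the bound up to a multiple of `TPB` (`round_up(100) == 128`) gives every thread the same count; out-of-range threads would then need to contribute a neutral value instead of reading past `n_bins`.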