[BUG] CUDA 11.2 libcuml++ C++ test failures EDIT: Updated with 11.2 update 2 #3406
Comments
This issue has been labeled
Results have been updated with 11.2 update 1.
With 11.2.2, I get failures in the following gtests on a V100 in CentOS 7:
@venkywonka has found and fixed a bug in the BatchedLevelAlgo test. That should fix this issue.
I'm working to figure out the UMAP side.
Working on fixing the BatchedLevelAlgo test bug; will send a PR ASAP.
* This PR fixes the regressions shown by `BatchedLevelAlgo/DtClsTestF` and `BatchedLevelAlgo/DtRegTestF`, wherein the quantiles parameter passed to the `grow_tree` function was uninitialized garbage memory instead of the quantiles computed for each column.
* It also replaces the old method of computing quantiles (`preprocess_quantiles`) with a new, more accurate one (`computeQuantiles`).
* It removes an unnecessary memory allocation to `tempmem` in the setup phase of the test fixture.
* This fixes the failing `BatchedLevelAlgo/DtRegTestF` tests reported in issue #3406.
* It also fixes the failing `BatchedLevelAlgo/DtClsTestF` tests in PR #3616.

cc @teju85 @vinaydes @JohnZed @hcho3

Authors:
- Venkat (https://github.com/venkywonka)

Approvers:
- Thejaswi. N. S (https://github.com/teju85)
- John Zedlewski (https://github.com/JohnZed)

URL: #3690
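The fix described above amounts to filling the quantiles buffer from the data before the tree-growing code reads it, rather than handing it an allocated-but-unwritten buffer. The following is a minimal host-side sketch, assuming column-major `float` input; `compute_column_quantiles` is a hypothetical helper, not cuML's `computeQuantiles`, whose signature and binning scheme may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (NOT cuML's computeQuantiles): for each column of a
// column-major n_rows x n_cols matrix, return n_bins quantile cut points.
// Output layout: quantiles[col * n_bins + (k - 1)] is the k/n_bins quantile.
std::vector<float> compute_column_quantiles(const std::vector<float>& data,
                                            std::size_t n_rows,
                                            std::size_t n_cols,
                                            std::size_t n_bins) {
  // Preconditions: non-empty columns, at least one bin, matching data size.
  assert(n_rows > 0 && n_bins > 0 && data.size() == n_rows * n_cols);

  // Every entry of this buffer is written in the loop below.
  std::vector<float> quantiles(n_cols * n_bins);
  std::vector<float> column(n_rows);
  for (std::size_t col = 0; col < n_cols; ++col) {
    // Column-major layout: the rows of one column are contiguous.
    std::copy(data.begin() + col * n_rows, data.begin() + (col + 1) * n_rows,
              column.begin());
    std::sort(column.begin(), column.end());
    for (std::size_t k = 1; k <= n_bins; ++k) {
      // 0-based index of the ceil(k * n_rows / n_bins)-th smallest value.
      std::size_t idx = (k * n_rows + n_bins - 1) / n_bins - 1;
      quantiles[col * n_bins + (k - 1)] = column[idx];
    }
  }
  return quantiles;
}
```

With this simple scheme the last cut point of each column is always the column maximum, so every sample falls into some bin.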
Closes #3406. There are a couple of things to note in this PR:

1. There is a kernel in the UMAP gtest that computes an L1 distance between two different embeddings, and it wasn't using atomics for the addition, so we may simply not have been seeing these failures until now.
2. The Python code wasn't failing because it uses random init instead of spectral init.

I'm going to open an issue in RAFT to investigate why the spectral init differs from run to run with the same inputs, even when the random seed is set.

Authors:
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Victor Lafargue (https://github.com/viclafargue)
- John Zedlewski (https://github.com/JohnZed)

URL: #3696
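When many threads add into one shared accumulator with a plain `total += x`, concurrent read-modify-write updates can overwrite each other, so the reported L1 distance can come out too small and vary between runs. The sketch below is a host-side C++ analogue of the fix, not the actual UMAP gtest kernel: `l1_distance` and `atomic_add` are hypothetical names, and `std::thread` workers stand in for GPU threads. In a CUDA kernel the equivalent step would be an `atomicAdd` on the shared total.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Atomically add `value` to `target` with a compare-exchange loop. This is
// the same pattern CUDA code uses to build a floating-point atomicAdd out of
// atomicCAS on hardware without a native one.
void atomic_add(std::atomic<double>& target, double value) {
  double expected = target.load();
  // On failure, `expected` is refreshed with the current value; retry.
  while (!target.compare_exchange_weak(expected, expected + value)) {
  }
}

// Hypothetical helper: L1 distance between two embeddings, with the work
// split across `num_threads` workers that share one accumulator.
double l1_distance(const std::vector<double>& a, const std::vector<double>& b,
                   unsigned num_threads) {
  assert(a.size() == b.size() && num_threads > 0);
  std::atomic<double> total{0.0};
  std::vector<std::thread> workers;
  const std::size_t n = a.size();
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      // Each worker sums a strided slice privately, then publishes it once.
      double local = 0.0;
      for (std::size_t i = t; i < n; i += num_threads) {
        local += std::fabs(a[i] - b[i]);
      }
      // A plain `total = total + local` here would be a data race.
      atomic_add(total, local);
    });
  }
  for (auto& w : workers) w.join();
  return total.load();
}
```

Even with atomics, floating-point addition is not associative, so the order in which partial sums arrive can still change the last bits of the result. A test comparing such values should use a tolerance rather than exact equality.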
Describe the bug
During the weekend I was updating my workstation to add CUDA 11.2 as one of the installed versions, and ran the `libcuml++` C++ tests to see what issues we could run into. The prims tests all passed. Since we have relatively few failures, I'm opening this issue to track them.

EDIT: Updated with CUDA 11.2 update 2 results (identical to update 1).
Steps/Code to reproduce bug
Build `libcuml++` with CUDA 11.2. Running on my 3080 workstation, these are the tests that fail:

Summary:

Note: I only get the `BatchedLevelAlgo` failures if I run multiple tests (like the whole suite), not if I run only the `BatchedLevelAlgo` tests.

Detailed failures:
Environment details (please complete the following information):
Old failures before update 1: