BUG fix BatchedLevelAlgo DtClsTest & DtRegTest failing tests #3690
* updated to new method for quantiles computation
* deleted unused tempmem
Changes LGTM. Thanks @venkywonka for the quick fix.
Codecov Report
| Coverage Diff | branch-0.19 | #3690 | +/- |
|---|---|---|---|
| Coverage | 80.70% | 82.48% | +1.77% |
| Files | 227 | 227 | |
| Lines | 17615 | 17541 | -74 |
| Hits | 14217 | 14469 | +252 |
| Misses | 3398 | 3072 | -326 |
@gpucibot merge
…ogram calculation and occupancy optimization (#3616)

This PR introduces:

* A faster way to calculate the histograms containing splits in the `ML::DecisionTree::computeSplitClassificationKernel`. These histograms are used for node splitting in decision trees for classification.
* A change in the default `gridDim.x` in the launch configuration of the above kernel from `4` to a value based on the occupancy calculator and the other grid dimensions, improving the occupancy to theoretical limits.
* Previously, too many atomic adds to shared memory limited kernel times. Block-wide sum-scans now produce the same histogram with fewer atomic writes to shared memory.
* The resulting kernel-time speedups are significant (up to 30x for some nodes).
* `computeSplitRegressionKernel` has different shared-memory write patterns that deserve their own PR for optimization 😬
* Tests will pass once #3690 is merged.

Authors:
- Venkat (https://github.com/venkywonka)

Approvers:
- AJ Schmidt (https://github.com/ajschmidt8)
- Philip Hyunsu Cho (https://github.com/hcho3)
- Thejaswi. N. S (https://github.com/teju85)
- John Zedlewski (https://github.com/JohnZed)

URL: #3616
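The scan-based histogram idea from the commit message can be illustrated with a sequential, host-side sketch. This is not the kernel code from #3616: the function names `cumulative_hist_naive` and `cumulative_hist_scan` are hypothetical, and `std::partial_sum` stands in for the block-wide sum-scan. Both functions build a cumulative histogram in which `hist[b]` counts the samples whose value is at most `bin_edges[b]`. The naive version increments every bin a sample falls under, while the scan version increments one bin per sample and lets a prefix sum do the rest.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helpers for illustration only (not part of cuML).
// bin_edges must be sorted in ascending order.

// Naive version: every sample increments every bin it falls under.
// On a GPU, each of these increments would be an atomic add.
std::vector<int> cumulative_hist_naive(const std::vector<float>& values,
                                       const std::vector<float>& bin_edges) {
  std::vector<int> hist(bin_edges.size(), 0);
  for (float v : values) {
    for (std::size_t b = 0; b < bin_edges.size(); ++b) {
      if (v <= bin_edges[b]) ++hist[b];
    }
  }
  return hist;
}

// Scan version: one increment per sample, then a prefix sum turns the
// per-bin counts into cumulative counts.
std::vector<int> cumulative_hist_scan(const std::vector<float>& values,
                                      const std::vector<float>& bin_edges) {
  std::vector<int> hist(bin_edges.size(), 0);
  for (float v : values) {
    // First bin whose edge is >= v; samples above the last edge are skipped.
    auto it = std::lower_bound(bin_edges.begin(), bin_edges.end(), v);
    if (it != bin_edges.end()) ++hist[it - bin_edges.begin()];
  }
  // Inclusive prefix sum (the sequential analogue of a block-wide sum-scan).
  std::partial_sum(hist.begin(), hist.end(), hist.begin());
  return hist;
}
```

The naive version performs up to `values.size() * bin_edges.size()` increments, whereas the scan version performs at most `values.size()` increments plus one pass over the bins. That is the same trade the commit message describes for shared-memory atomics.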
This PR fixes the failing `BatchedLevelAlgo/DtClsTestF` and `BatchedLevelAlgo/DtRegTestF` tests, in which the quantiles parameter passed to the `grow_tree` function was uninitialized garbage memory instead of the quantiles computed for each column.

* Replaces the old quantile computation (`preprocess_quantiles`) with the new, more accurate one (`computeQuantiles`).
* Removes the unused `tempmem` in the setup phase of the test fixture.
* Fixes the `BatchedLevelAlgo/DtRegTestF` tests reported in issue "[BUG] CUDA 11.2 libcuml++ C++ test failures EDIT: Updated with 11.2 update 2" (#3406).
* Fixes the `BatchedLevelAlgo/DtClsTestF` tests in PR "ENH Decision Tree new backend `computeSplitClassificationKernel` histogram calculation and occupancy optimization" (#3616).

cc @teju85 @vinaydes @JohnZed @hcho3
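For readers unfamiliar with what "quantiles computed for each column" means here, the following is a minimal host-side sketch. It is not cuML's `computeQuantiles`: the function name `column_quantiles`, its signature, the column-major layout, and the index rule for picking each quantile are all assumptions made for illustration. The point is only that the quantile buffer has to be filled from the data before the tree builder reads it.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only (not cuML's computeQuantiles).
// Assumes column-major data holding at least n_rows * n_cols values:
// column c occupies data[c * n_rows] .. data[(c + 1) * n_rows - 1].
// Returns n_bins quantiles per column, stored column by column.
std::vector<float> column_quantiles(const std::vector<float>& data,
                                    std::size_t n_rows, std::size_t n_cols,
                                    std::size_t n_bins) {
  std::vector<float> quantiles(n_cols * n_bins, 0.0f);
  if (n_rows == 0) return quantiles;

  std::vector<float> column(n_rows);
  for (std::size_t c = 0; c < n_cols; ++c) {
    // Sort a copy of the column so the input stays untouched.
    std::copy(data.begin() + c * n_rows, data.begin() + (c + 1) * n_rows,
              column.begin());
    std::sort(column.begin(), column.end());

    for (std::size_t b = 0; b < n_bins; ++b) {
      // Bin b ends at the ((b + 1) / n_bins)-th quantile of the column.
      std::size_t idx = ((b + 1) * n_rows) / n_bins;
      idx = (idx == 0) ? 0 : idx - 1;
      quantiles[c * n_bins + b] = column[idx];
    }
  }
  return quantiles;
}
```

A test fixture would call such a routine during setup and pass the resulting buffer to the tree builder, rather than passing memory that was allocated but never written.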