Implement vector leaf for random forest #4191
Conversation
Force-pushed from 2954020 to f192c03.
I have deleted the file.
Before and after charts from the experiments in #3764:
Some benchmark results below. The classification datasets are more accurate (statistically significant), although there may be a slight performance reduction for airline.
std::size_t max_batch_size = min(std::size_t(100000), tree->size());
rmm::device_uvector<NodeT> d_tree(max_batch_size, handle.get_stream());
rmm::device_uvector<InstanceRange> d_instance_ranges(max_batch_size, handle.get_stream());
tree->vector_leaf.resize(tree->sparsetree.size() * input.numOutputs);
Just confirming: are we allocating twice the memory needed because we want to avoid a prefix operation (and the added complexity of leaf indices)? Twice because there would be an equal number of interior nodes and leaves. Since vector_leaf is part of the model, which lives on beyond the training phase, is compacting the array worth considering?
Personally I don't think it's important; I favor the simple approach. I'm assuming this is not related to the performance regression; I will investigate that further.
Just wondering about the conclusion of this conversation: would array compaction be a worthwhile optimization in a follow-up PR? (I doubt the memory savings alone would justify it, but wanted to ask anyway.)
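For context, a minimal sketch of the two layouts under discussion, using hypothetical names (compact_leaf_index, is_leaf) rather than the actual cuML code: the node-indexed layout reserves a slot per node and wastes the interior-node slots, while a compacted layout would assign each leaf a dense index via a prefix sum over leaf flags.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the compaction alternative. Node-indexed
// storage puts the leaf values of node n at vector_leaf[n * num_outputs],
// leaving interior-node slots unused. Compacted storage instead maps
// each leaf to a dense id via an exclusive prefix sum over leaf flags.
std::vector<std::size_t> compact_leaf_index(const std::vector<bool>& is_leaf)
{
  std::vector<std::size_t> leaf_id(is_leaf.size());
  std::size_t next = 0;
  for (std::size_t n = 0; n < is_leaf.size(); ++n) {
    leaf_id[n] = next;           // dense index, valid when n is a leaf
    next += is_leaf[n] ? 1 : 0;  // exclusive prefix sum
  }
  return leaf_id;  // a compacted buffer needs only next * num_outputs slots
}
```

The prefix sum and the extra index table are the "added complexity of leaf indices" referred to above.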
instance_ranges.data(),
tree->sparsetree.size(),
handle.get_stream());
raft::update_device(
Is this copy operation needed? Isn't d_leaves the output variable for the kernel?
Changed to a cudaMemsetAsync, just so we don't get any garbage values in the unused part of the vector.
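A hedged sketch of that fix, with assumed names (zero_leaf_buffer, d_leaves, n_values) rather than the actual cuML code: the buffer is zeroed asynchronously on the stream before the kernel writes the leaf slots, so the unused interior slots contain zeros instead of uninitialised memory.

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// Zero-fill a device buffer asynchronously; the real code sizes the
// buffer as sparsetree.size() * numOutputs and only ever reads the
// slots that belong to leaves.
cudaError_t zero_leaf_buffer(float* d_leaves, std::size_t n_values, cudaStream_t stream)
{
  // cudaMemsetAsync takes a byte count and is ordered on the stream,
  // so a kernel launched afterwards on the same stream sees the zeros.
  return cudaMemsetAsync(d_leaves, 0, n_values * sizeof(float), stream);
}
```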
  total += shist[i].x;
}
for (int i = 0; i < nclasses; i++) {
  out[i] = DataT(shist[i].x) / total;
This division can be avoided by computing the inverse of total once and multiplying. I'm not sure whether the compiler already figures this out; I haven't looked at the SASS. The same applies to the other two SetLeafVector functions.
This calculation is performed only once per leaf, per class. Are you sure this optimisation is necessary?
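For reference, a sketch of the suggested micro-optimisation, simplified to a plain int histogram (the kernel actually accumulates into a struct field, shist[i].x, of type DataT) and with made-up names: one reciprocal replaces one division per class.

```cpp
// Normalise a class histogram into probabilities. Computing the
// reciprocal once turns nclasses divisions into one division plus
// nclasses multiplications; division is substantially more expensive
// than multiplication on most GPUs.
void normalise_histogram(const int* shist, float* out, int nclasses, int total)
{
  const float inv_total = 1.0f / static_cast<float>(total);  // single division
  for (int i = 0; i < nclasses; ++i) {
    out[i] = static_cast<float>(shist[i]) * inv_total;  // multiply per class
  }
}
```

As the reply above notes, this code runs only once per leaf and class, so the saving is unlikely to be measurable.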
@RAMitchell Any idea where the performance regression is coming from? Is it all from the …
Codecov Report
@@            Coverage Diff             @@
##           branch-21.10    #4191   +/- ##
===============================================
  Coverage              ?       86.07%
===============================================
  Files                 ?          231
  Lines                 ?        18636
  Branches              ?            0
===============================================
  Hits                  ?        16041
  Misses                ?         2595
  Partials              ?            0
The slight performance regression in the benchmarks above is due to a Treelite performance issue, which is addressed in dmlc/treelite#311. I think we should proceed with merging this and update Treelite afterwards.
Changes look great to me; just had one quick question.
Just wondering about the conclusion of this conversation: would array compaction be a worthwhile optimization in a follow-up PR? (I doubt the memory savings alone would justify it, but wanted to ask anyway.)
I don't think array compaction will affect performance, and the code would get more complicated. Note that the array gets compacted anyway during conversion to Treelite, so GPU prediction is unaffected.
@gpucibot merge
The 2.1.0 version of Treelite incorporates the following major improvements:
* dmlc/treelite#311
* dmlc/treelite#302
* dmlc/treelite#303
* dmlc/treelite#296
In particular, dmlc/treelite#311 is a critical follow-up to #4191 and addresses a performance regression.
Requires rapidsai/integration#353
Authors:
- Philip Hyunsu Cho (https://github.com/hcho3)
Approvers:
- Jordan Jacobelli (https://github.com/Ethyling)
- Dante Gama Dessavre (https://github.com/dantegd)
URL: #4220
Fixes rapidsai#3764, rapidsai#2518
To do:
- post charts confirming the improvement in accuracy
- address python tests
- benchmark
Authors:
- Rory Mitchell (https://github.com/RAMitchell)
Approvers:
- Vinay Deshpande (https://github.com/vinaydes)
- Dante Gama Dessavre (https://github.com/dantegd)
URL: rapidsai#4191