
Implement vector leaf for random forest #4191

Merged · 42 commits · into rapidsai:branch-21.10 on Sep 17, 2021

Conversation

@RAMitchell (Contributor)

Fixes #3764, #2518

To do:

  • post charts confirming the improvement in accuracy
  • address python tests
  • benchmark
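
For context on the change itself: classifier leaves now store a per-class probability vector rather than a single majority class, and the forest averages these vectors across trees. A minimal sketch of the idea in plain C++ (illustrative only; the names and layout are assumptions, not the actual cuML data structures):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration of a "vector leaf": each leaf stores a
// class-probability vector, and the forest prediction averages those
// vectors over all trees (assumes leaf_vectors is non-empty).
std::vector<float> predict_proba(
  const std::vector<std::vector<float>>& leaf_vectors,  // one hit leaf per tree
  std::size_t num_classes)
{
  std::vector<float> out(num_classes, 0.0f);
  for (const auto& leaf : leaf_vectors) {
    for (std::size_t c = 0; c < num_classes; ++c) {
      out[c] += leaf[c];
    }
  }
  // Average over trees; the predicted class is the arg-max of this vector.
  for (auto& p : out) { p /= static_cast<float>(leaf_vectors.size()); }
  return out;
}
```

Averaging the full per-leaf class distribution is what closes the accuracy gap reported in #3764; per-tree majority voting discards that distribution.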

@RAMitchell (Contributor Author)

I have deleted the file cpp/test/sg/rf_treelite_test.cu for the following reasons:

  • It takes over two minutes to run very simple tests.
  • It creates directories and manipulates files on disk, which is hard to maintain.
  • There are bugs in the test setup: the generated data does not contain what the comments suggest.
  • Treelite integration is already well covered: property-based tests convert models via rf -> treelite -> fil (see the sketch below), and there are additional Python-level tests for Treelite.
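
For illustration, the round-trip property those tests check looks roughly like this (a sketch only; the Predictor alias and the predictor arguments are hypothetical stand-ins for the real cuML RF and FIL entry points):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for "predict a batch of rows with some model".
using Predictor = std::function<std::vector<float>(const std::vector<float>&)>;

// Property: for any input batch, the trained forest and the model obtained
// via the rf -> treelite -> fil conversion must produce matching predictions.
void check_roundtrip(const Predictor& rf_predict,
                     const Predictor& fil_predict,
                     const std::vector<float>& X,
                     float tol = 1e-6f)
{
  const auto expected = rf_predict(X);
  const auto actual   = fil_predict(X);
  assert(expected.size() == actual.size());
  for (std::size_t i = 0; i < expected.size(); ++i) {
    assert(std::fabs(expected[i] - actual[i]) <= tol);
  }
}
```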

@RAMitchell (Contributor Author)

Before and after charts from the experiments in #3764:

[chart: before/after random forest accuracy comparison]

@RAMitchell (Contributor Author)

Some benchmark results are below. Accuracy on the classification datasets improves, and the difference is statistically significant, although there may be a slight runtime regression on airline.

gbm_bench results:

| # | dataset | params | treatment | time_elapsed | train_acc | train_acc_std | cv_acc | cv_acc_std | branch | train_rmse | train_rmse_std | cv_rmse | cv_rmse_std | train_r2 | train_r2_std | cv_r2 | cv_r2_std |
|--:|---|---|---|--:|--:|--:|--:|--:|---|--:|--:|--:|--:|--:|--:|--:|--:|
| 0 | airline | params1 | cuml_new | 511.92 | 0.749906 | 0.000191809 | 0.747992 | 0.000262846 | old | nan | nan | nan | nan | nan | nan | nan | nan |
| 1 | airline | params1 | cuml_new | 556.626 | 0.756004 | 0.000140702 | 0.753602 | 0.000243605 | new | nan | nan | nan | nan | nan | nan | nan | nan |
| 2 | covtype | params1 | cuml_new | 7.88025 | 0.816124 | 0.00085696 | 0.81222 | 0.00217357 | old | nan | nan | nan | nan | nan | nan | nan | nan |
| 3 | covtype | params1 | cuml_new | 7.88539 | 0.817422 | 0.000895598 | 0.813465 | 0.00208934 | new | nan | nan | nan | nan | nan | nan | nan | nan |
| 4 | airline_regression | params1 | cuml_new | 413.84 | nan | nan | nan | nan | old | 27.2991 | 0.00553821 | 27.4596 | 0.0503616 | 0.21604 | 0.000151055 | 0.206791 | 0.000834585 |
| 5 | airline_regression | params1 | cuml_new | 414.647 | nan | nan | nan | nan | new | 27.2991 | 0.00553821 | 27.4596 | 0.0503616 | 0.21604 | 0.000151055 | 0.206791 | 0.000834585 |
| 6 | epsilon | params1 | cuml_new | 108.708 | 0.740931 | 0.000887601 | 0.729875 | 0.0025245 | old | nan | nan | nan | nan | nan | nan | nan | nan |
| 7 | epsilon | params1 | cuml_new | 108.373 | 0.742499 | 0.000942313 | 0.731225 | 0.00239919 | new | nan | nan | nan | nan | nan | nan | nan | nan |
| 8 | fraud | params1 | cuml_new | 3.03547 | 0.998992 | 9.96128e-05 | 0.998977 | 0.000199244 | old | nan | nan | nan | nan | nan | nan | nan | nan |
| 9 | fraud | params1 | cuml_new | 3.07072 | 0.998992 | 9.96128e-05 | 0.998977 | 0.000199244 | new | nan | nan | nan | nan | nan | nan | nan | nan |
| 10 | higgs | params1 | cuml_new | 53.3062 | 0.735349 | 6.4184e-05 | 0.729104 | 0.000516602 | old | nan | nan | nan | nan | nan | nan | nan | nan |
| 11 | higgs | params1 | cuml_new | 55.5475 | 0.736475 | 7.97032e-05 | 0.729818 | 0.000538038 | new | nan | nan | nan | nan | nan | nan | nan | nan |
| 12 | year | params1 | cuml_new | 10.5822 | nan | nan | nan | nan | old | 9.52344 | 0.00887475 | 9.61514 | 0.0357043 | 0.242169 | 0.00111813 | 0.227483 | 0.0035101 |
| 13 | year | params1 | cuml_new | 10.7327 | nan | nan | nan | nan | new | 9.52344 | 0.00887475 | 9.61514 | 0.0357043 | 0.242169 | 0.00111813 | 0.227483 | 0.0035101 |

@RAMitchell changed the title from "[WIP] Implement vector leaf for random forest" to "Implement vector leaf for random forest" on Sep 13, 2021
Resolved review threads on:
  • cpp/src/decisiontree/batched-levelalgo/builder.cuh (outdated)
  • cpp/src/randomforest/randomforest.cu (outdated)
  • cpp/include/cuml/tree/flatnode.h (outdated)
std::size_t max_batch_size = min(std::size_t(100000), tree->size());
rmm::device_uvector<NodeT> d_tree(max_batch_size, handle.get_stream());
rmm::device_uvector<InstanceRange> d_instance_ranges(max_batch_size, handle.get_stream());
tree->vector_leaf.resize(tree->sparsetree.size() * input.numOutputs);

Contributor:

Just confirming: are we allocating twice the memory needed because we want to avoid a prefix-sum operation (and the added complexity of leaf indices)? Twice, because there would be about as many interior nodes as leaves.
Since vector_leaf is part of the model, which lives on beyond the training phase, is compacting the array worth considering?

Contributor Author:

Personally I don't think it's important; I favor the simple approach. I'm assuming this is not related to the performance regression, but will investigate that further.

Member:

Just wondering about the conclusion of this conversation: would array compaction be a worthwhile optimization in a follow-up PR? (I doubt the memory savings would be worth much, but wanted to ask anyway.)
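
For reference, the compaction under discussion would look roughly like this (a hypothetical Thrust sketch; is_leaf, offset, and the sizing comments are assumptions, not the PR's actual code). An exclusive prefix sum over a leaf-indicator array gives each leaf a dense slot, so vector_leaf could be sized num_leaves * num_outputs instead of num_nodes * num_outputs:

```cpp
#include <thrust/device_vector.h>
#include <thrust/scan.h>

// Hypothetical sketch: compute a compacted leaf index for every tree node.
// is_leaf[i] is 1 if node i is a leaf, 0 otherwise.
thrust::device_vector<int> leaf_offsets(const thrust::device_vector<int>& is_leaf)
{
  thrust::device_vector<int> offset(is_leaf.size());
  // After the scan, offset[i] counts the leaves before node i, i.e. the
  // compacted leaf slot of node i (meaningful only where is_leaf[i] == 1).
  thrust::exclusive_scan(is_leaf.begin(), is_leaf.end(), offset.begin());
  return offset;
}
// Leaf i's distribution would then live at vector_leaf[offset[i] * num_outputs].
```

The trade-off raised above is exactly this extra scan plus the indirection, in exchange for roughly halving the size of vector_leaf.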

instance_ranges.data(),
tree->sparsetree.size(),
handle.get_stream());
raft::update_device(

Contributor:

Is this copy operation needed? Isn't d_leaves the output variable of the kernel?

Contributor Author:

Changed to a cudaMemsetAsync, just so we don't get any garbage values in the unused part of the vector.
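
For illustration, the zero-fill referred to here looks like the following (a sketch with assumed names; zero_fill and d_leaves are hypothetical, while cudaMemsetAsync and rmm::device_uvector are real APIs):

```cpp
#include <cuda_runtime.h>
#include <rmm/device_uvector.hpp>

// Sketch: zero the buffer asynchronously on the given stream, so slots the
// kernel never writes contain zeros instead of uninitialized device memory.
template <typename T>
void zero_fill(rmm::device_uvector<T>& d_leaves, cudaStream_t stream)
{
  cudaMemsetAsync(d_leaves.data(), 0, d_leaves.size() * sizeof(T), stream);
}
```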

total += shist[i].x;
}
for (int i = 0; i < nclasses; i++) {
out[i] = DataT(shist[i].x) / total;

Contributor:

This division can be avoided by first computing the inverse of total. I'm not sure whether the compiler already figures this out; I haven't looked at the SASS. The same applies to the other two SetLeafVector functions.

Contributor Author:

This calculation is performed only once per leaf, per class. Are you sure this optimisation is necessary?
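
For reference, the suggested rewrite of the quoted hunk would hoist the reciprocal out of the loop (a sketch reusing names from the hunk above; DataT, shist, nclasses, total, and out are assumed to keep their meanings):

```cpp
// One division and nclasses multiplications, instead of nclasses divisions.
const DataT inv_total = DataT(1) / total;
for (int i = 0; i < nclasses; i++) {
  out[i] = DataT(shist[i].x) * inv_total;
}
```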

@vinaydes (Contributor)

@RAMitchell Any idea where the performance regression is coming from? Is it all from SetLeafPredictions?

@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@0e770fa). The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.10    #4191   +/-   ##
===============================================
  Coverage                ?   86.07%
===============================================
  Files                   ?      231
  Lines                   ?    18636
  Branches                ?        0
===============================================
  Hits                    ?    16041
  Misses                  ?     2595
  Partials                ?        0

| Flag | Coverage Δ |
|---|---|
| dask | 47.04% <0.00%> (?) |
| non-dask | 78.73% <0.00%> (?) |

Δ = absolute <relative> (impact), ? = missing data. Last update 0e770fa...dfa43dd.

@RAMitchell (Contributor Author)

The slight performance regression in the benchmarks above is due to a Treelite performance issue, which is addressed in dmlc/treelite#311. I think we should proceed with merging this and update Treelite afterwards.

@RAMitchell (Contributor Author)

Here are some updated benchmarks. I changed the max_samples parameter to 1.0 and separated training time from prediction time; 'new' includes the Treelite fix. In summary, there is a minor increase in prediction time from the change to vector leaf, but it is fairly insignificant compared to training time.

[chart: gbm_bench training/prediction time comparison]

@dantegd added the breaking (Breaking change) label on Sep 16, 2021

@dantegd (Member) left a comment:

Changes look great to me; just had one quick question (the array-compaction follow-up in the thread above).


@dantegd added the improvement (Improvement / enhancement to an existing function) label on Sep 16, 2021
@RAMitchell (Contributor Author)

I don't think array compaction will affect performance, and the code would get more complicated. Note that the array gets compacted anyway during conversion to Treelite, so GPU prediction is unaffected.

@dantegd (Member) commented on Sep 17, 2021:

@gpucibot merge

@rapids-bot (bot) merged commit d657178 into rapidsai:branch-21.10 on Sep 17, 2021
@hcho3 mentioned this pull request on Sep 21, 2021
rapids-bot pushed a commit that referenced this pull request on Sep 23, 2021:
The 2.1.0 version of Treelite incorporates the following major improvements:

* dmlc/treelite#311
* dmlc/treelite#302
* dmlc/treelite#303
* dmlc/treelite#296

In particular, dmlc/treelite#311 is a critical follow-up to #4191 and addresses a performance regression.

Requires rapidsai/integration#353

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - Jordan Jacobelli (https://github.com/Ethyling)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4220
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request on Oct 9, 2023 (a mirror of this PR; its description is the same as at the top of this thread).

Authors:
  - Rory Mitchell (https://github.com/RAMitchell)

Approvers:
  - Vinay Deshpande (https://github.com/vinaydes)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4191
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request on Oct 9, 2023 (a mirror of #4220 above; description, authors, and approvers as above).
Labels
breaking (Breaking change), CMake, CUDA/C++, Cython / Python (Cython or Python issue), improvement (Improvement / enhancement to an existing function)
Development

Successfully merging this pull request may close these issues.

[BUG] Random forest classification accuracy gap
4 participants