Implement vector leaf for random forest #4191
Conversation
Force-pushed from 2954020 to f192c03.
I have deleted the file.
Before and after charts from the experiments in #3764:
Some benchmark results below. The classification datasets are more accurate (statistically significant), although there may be a slight performance reduction for airline.
std::size_t max_batch_size = min(std::size_t(100000), tree->size());
rmm::device_uvector<NodeT> d_tree(max_batch_size, handle.get_stream());
rmm::device_uvector<InstanceRange> d_instance_ranges(max_batch_size, handle.get_stream());
tree->vector_leaf.resize(tree->sparsetree.size() * input.numOutputs);
Just confirming: are we allocating twice the memory needed because we want to avoid a prefix operation (and the added complexity of leaf indices)? Twice because there would be an equal number of interior nodes and leaves. Since vector_leaf is part of the model, which lives on beyond the training phase, is compacting the array worth considering?
Personally I don't think it's important; I favor the simple approach. I'm assuming this is not related to the performance regression; I will investigate that further.
Just wondering about the conclusion of this conversation: would array compaction be a worthwhile optimization in a follow-up PR? (I doubt the memory savings alone would justify it, but wanted to ask anyway.)
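For context, a minimal sketch of the two layouts under discussion, using hypothetical names (compact_leaf_index, is_leaf) rather than the actual cuML code: the node-indexed layout reserves a slot per node and wastes the interior-node slots, while a compacted layout would assign each leaf a dense index via a prefix sum over leaf flags.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the compaction alternative. Node-indexed
// storage puts the leaf values of node n at vector_leaf[n * num_outputs],
// leaving interior-node slots unused. Compacted storage instead maps
// each leaf to a dense id via an exclusive prefix sum over leaf flags.
std::vector<std::size_t> compact_leaf_index(const std::vector<bool>& is_leaf)
{
  std::vector<std::size_t> leaf_id(is_leaf.size());
  std::size_t next = 0;
  for (std::size_t n = 0; n < is_leaf.size(); ++n) {
    leaf_id[n] = next;           // dense index, valid when n is a leaf
    next += is_leaf[n] ? 1 : 0;  // exclusive prefix sum
  }
  return leaf_id;  // a compacted buffer needs only next * num_outputs slots
}
```

The prefix sum and the extra index table are the "added complexity of leaf indices" referred to above.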
instance_ranges.data(),
tree->sparsetree.size(),
handle.get_stream());
raft::update_device(
Is this copy operation needed? Isn't d_leaves the output variable for the kernel?
Changed to a cudaMemsetAsync, just so we don't get any garbage values in the unused part of the vector.
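A hedged sketch of that fix, with assumed names (zero_leaf_buffer, d_leaves, n_values) rather than the actual cuML code: the buffer is zeroed asynchronously on the stream before the kernel writes the leaf slots, so the unused interior slots contain zeros instead of uninitialised memory.

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// Zero-fill a device buffer asynchronously; the real code sizes the
// buffer as sparsetree.size() * numOutputs and only ever reads the
// slots that belong to leaves.
cudaError_t zero_leaf_buffer(float* d_leaves, std::size_t n_values, cudaStream_t stream)
{
  // cudaMemsetAsync takes a byte count and is ordered on the stream,
  // so a kernel launched afterwards on the same stream sees the zeros.
  return cudaMemsetAsync(d_leaves, 0, n_values * sizeof(float), stream);
}
```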
  total += shist[i].x;
}
for (int i = 0; i < nclasses; i++) {
  out[i] = DataT(shist[i].x) / total;
This division can be avoided by computing the inverse of total once and multiplying. I'm not sure whether the compiler already figures this out; I haven't looked at the SASS. The same applies to the other two SetLeafVector functions.
This calculation is performed only once per leaf, per class. Are you sure this optimisation is necessary?
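For reference, a sketch of the suggested micro-optimisation, simplified to a plain int histogram (the kernel actually accumulates into a struct field, shist[i].x, of type DataT) and with made-up names: one reciprocal replaces one division per class.

```cpp
// Normalise a class histogram into probabilities. Computing the
// reciprocal once turns nclasses divisions into one division plus
// nclasses multiplications; division is substantially more expensive
// than multiplication on most GPUs.
void normalise_histogram(const int* shist, float* out, int nclasses, int total)
{
  const float inv_total = 1.0f / static_cast<float>(total);  // single division
  for (int i = 0; i < nclasses; ++i) {
    out[i] = static_cast<float>(shist[i]) * inv_total;  // multiply per class
  }
}
```

As the reply above notes, this code runs only once per leaf and class, so the saving is unlikely to be measurable.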
@RAMitchell Any idea where the performance regression is coming from? Is it all from the …
Codecov Report
@@            Coverage Diff             @@
##           branch-21.10    #4191   +/- ##
===============================================
  Coverage              ?       86.07%
===============================================
  Files                 ?          231
  Lines                 ?        18636
  Branches              ?            0
===============================================
  Hits                  ?        16041
  Misses                ?         2595
  Partials              ?            0
The slight performance regression in the benchmarks above is due to a Treelite performance issue, which is addressed in dmlc/treelite#311. I think we should proceed with merging this and update Treelite afterwards.
Changes look great to me; just had one quick question.
Just wondering about the conclusion of this conversation: would array compaction be a worthwhile optimization in a follow-up PR? (I doubt the memory savings alone would justify it, but wanted to ask anyway.)
I don't think array compaction will affect performance, and the code would get more complicated. Note that the array gets compacted anyway during conversion to Treelite, so GPU prediction is unaffected.
@gpucibot merge
The 2.1.0 version of Treelite incorporates the following major improvements:
* dmlc/treelite#311
* dmlc/treelite#302
* dmlc/treelite#303
* dmlc/treelite#296
In particular, dmlc/treelite#311 is a critical follow-up to #4191 and addresses a performance regression.
Requires rapidsai/integration#353
Authors:
- Philip Hyunsu Cho (https://github.com/hcho3)
Approvers:
- Jordan Jacobelli (https://github.com/Ethyling)
- Dante Gama Dessavre (https://github.com/dantegd)
URL: #4220
Fixes rapidsai#3764, rapidsai#2518
To do:
- post charts confirming the improvement in accuracy
- address python tests
- benchmark
Authors:
- Rory Mitchell (https://github.com/RAMitchell)
Approvers:
- Vinay Deshpande (https://github.com/vinaydes)
- Dante Gama Dessavre (https://github.com/dantegd)
URL: rapidsai#4191