Rewrite random forest gtests #4038

RAMitchell · 2021-07-08T03:42:38Z

Use a property based testing methodology to consolidate most of the tests and extend them to a broader range of inputs. Coverage of input parameters is significantly increased and code size is substantially reduced. The majority of tests are now generated in rf_test.cu.

Some dead code is also removed.

Testing has exposed a few bugs that should be resolved in later PRs:

- The max_leaves parameter is not obeyed in some cases
- Hard CUDA crash for n_bins > 128, presumably due to shared memory requirements
- Classification algorithms are sometimes not deterministic (not 100% sure if this is expected or not)

hcho3 · 2021-07-08T03:56:18Z

> property based testing methodology

Can you clarify what this means? From a quick look, this PR rewrites the gtests to be more like pytest, with parametrized tests and hypothesis.

RAMitchell · 2021-07-08T04:10:44Z

Here is an article from the author of hypothesis:
https://increment.com/testing/in-praise-of-property-based-testing/

In this case I'm defining the input space of parameters and data, then using a sampling algorithm to generate (potentially many) test cases. Each time we run a test, we test the outputs for consistency.
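As a rough illustration of that idea (not the code in rf_test.cu), the sketch below declares an input space, samples test cases from it with a fixed seed, and checks a property on every case. `Params`, `sample_params`, `fake_leaf_count`, and the parameter ranges are hypothetical stand-ins; a real test would train a forest instead of calling the stand-in function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical parameter set. The names echo options mentioned in this PR,
// but this struct is not part of cuML.
struct Params {
  int max_depth;
  int max_leaves;
  int n_bins;
};

// Draw n parameter sets from the declared input space. A fixed seed makes
// repeated runs of the same binary generate the same cases.
std::vector<Params> sample_params(int n, unsigned seed)
{
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> depth(1, 10);
  std::uniform_int_distribution<int> leaves(1, 64);
  std::uniform_int_distribution<int> bins(2, 128);
  std::vector<Params> out;
  for (int i = 0; i < n; ++i) {
    out.push_back(Params{depth(rng), leaves(rng), bins(rng)});
  }
  return out;
}

// Stand-in for the system under test: a full binary tree of the given depth,
// capped at max_leaves. A real test would build a tree and count its leaves.
int fake_leaf_count(const Params& p)
{
  int full_tree_leaves = 1 << p.max_depth;  // 2^max_depth, safe for depth <= 10
  return std::min(full_tree_leaves, p.max_leaves);
}

// The property under test: every sampled case respects max_leaves.
bool leaf_limit_holds(const std::vector<Params>& cases)
{
  return std::all_of(cases.begin(), cases.end(), [](const Params& p) {
    int leaves = fake_leaf_count(p);
    return leaves >= 1 && leaves <= p.max_leaves;
  });
}
```

The point of the design is that the assertion is written once against a property (here, "the leaf count never exceeds max_leaves") and the sampler decides how many concrete cases exercise it.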

cpp/test/sg/rf_test.cu

venkywonka

LGTM, awesome PR 🙏🏻
Will dig into the bugs. Thank you for exposing them, Rory 👍🏻

robertmaynard

CMake changes LGTM

vinaydes · 2021-07-12T17:37:09Z

> Classification algorithms are sometimes not deterministic (not 100% sure if this is expected or not)

@RAMitchell The sparsetree vectors from two forests can differ, but the actual trees should be the same. This happens because the position of a node in the sparsetree vector is decided by an atomicAdd operation, which is not 100% reproducible.
I ran an experiment on my end and found that the mismatch was always in the left_child_id field, which stores the index of the left child in the vector. So the trees are not really different.
Are you observing any other irreproducibility?
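To make the distinction concrete, here is a minimal sketch of how two node vectors with different layouts can encode the same tree. `Node` is a simplified, hypothetical stand-in for the real node type, and the convention that the right child sits directly after the left child is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical node. The field names follow the discussion above,
// but this is not cuML's SparseTreeNode.
struct Node {
  float prediction;
  int colid;          // split feature; -1 marks a leaf
  int left_child_id;  // index of the left child in the vector; -1 for a leaf
};

// Element-wise comparison: sensitive to where each node was placed.
bool same_layout(const std::vector<Node>& a, const std::vector<Node>& b)
{
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i].prediction != b[i].prediction || a[i].colid != b[i].colid ||
        a[i].left_child_id != b[i].left_child_id) {
      return false;
    }
  }
  return true;
}

// Structural comparison: follows child indices, so placement does not matter.
// Assumption for this sketch: the right child is stored at left_child_id + 1.
bool same_subtree(const std::vector<Node>& a, int ia, const std::vector<Node>& b, int ib)
{
  const Node& na = a[ia];
  const Node& nb = b[ib];
  if (na.prediction != nb.prediction || na.colid != nb.colid) return false;
  bool leaf_a = na.left_child_id < 0;
  bool leaf_b = nb.left_child_id < 0;
  if (leaf_a || leaf_b) return leaf_a && leaf_b;
  return same_subtree(a, na.left_child_id, b, nb.left_child_id) &&
         same_subtree(a, na.left_child_id + 1, b, nb.left_child_id + 1);
}

// The same tree, with the two pairs of leaves placed in a different order.
std::vector<Node> layout_a()
{
  return {{0.0f, 0, 1},    // 0: root, children at 1 and 2
          {0.0f, 1, 3},    // 1: left internal node, children at 3 and 4
          {0.0f, 2, 5},    // 2: right internal node, children at 5 and 6
          {1.0f, -1, -1}, {2.0f, -1, -1},   // 3, 4: leaves under node 1
          {3.0f, -1, -1}, {4.0f, -1, -1}};  // 5, 6: leaves under node 2
}

std::vector<Node> layout_b()
{
  return {{0.0f, 0, 1},    // 0: root, children at 1 and 2
          {0.0f, 1, 5},    // 1: left internal node, children at 5 and 6
          {0.0f, 2, 3},    // 2: right internal node, children at 3 and 4
          {3.0f, -1, -1}, {4.0f, -1, -1},   // 3, 4: leaves under node 2
          {1.0f, -1, -1}, {2.0f, -1, -1}};  // 5, 6: leaves under node 1
}
```

In this example `same_layout` reports a mismatch while `same_subtree` starting from the two roots reports equality, which is the sense in which the trees are "not really different".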

RAMitchell · 2021-07-12T23:28:03Z

I have limited the Google tests to around 6s so that they do not monopolise CI time.

vinaydes

Apart from the comments I made, everything looks good to go.

vinaydes · 2021-07-19T14:07:05Z

cpp/include/cuml/tree/flatnode.h

+bool operator==(const SparseTreeNode<DataT, LabelT, IdxT>& lhs,
+                const SparseTreeNode<DataT, LabelT, IdxT>& rhs)
+{
+  return (lhs.prediction == rhs.prediction) && (lhs.colid == rhs.colid) &&


Do we expect floating-point values such as prediction to match exactly?

I've disabled any checks like this for now, but in the future I don't think that's an unreasonable goal for classification.
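For reference, if exact matching turned out to be too strict, one alternative would be a tolerance-based comparison. A minimal sketch, where `almost_equal` and its default tolerances are assumptions for illustration and not something this PR contains:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true when a and b agree within a relative tolerance,
// with an absolute floor so that values near zero can still compare equal.
bool almost_equal(float a, float b, float rel_tol = 1e-5f, float abs_tol = 1e-8f)
{
  float diff  = std::fabs(a - b);
  float scale = std::max(std::fabs(a), std::fabs(b));
  return diff <= std::max(abs_tol, rel_tol * scale);
}
```

Such a helper could replace `==` on a floating-point field while integer fields keep exact comparison.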

cpp/src/decisiontree/quantile/quantile.cuh

vinaydes · 2021-07-19T14:18:19Z

cpp/src/decisiontree/decisiontree.cuh

@@ -289,11 +290,6 @@ class DecisionTree {
      (std::numeric_limits<L>::is_integer) ? CRITERION::ENTROPY : CRITERION::MSE;

    validity_check(tree_params);
-    if (tree_params.n_bins > n_sampled_rows) {


Why is this check removed? Has it been moved somewhere else?

The check is not correct because the quantiles have already been computed at this stage; I confirmed this with @venkywonka. I also don't see any reason to enforce n_bins < n_rows.

cpp/test/sg/rf_test.cu

vinaydes · 2021-07-19T14:53:15Z

cpp/test/sg/rf_test.cu

+  {
+    TestAccuracyImprovement();
+    // Bugs
+    // TestDeterminism();


Is classification reproducibility still a problem?

Yes. I think the node queue, which relies on atomics, needs to be reworked in a later PR.

codecov-commenter · 2021-07-20T06:34:46Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@bcc4cad).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.08    #4038   +/-   ##
===============================================
  Coverage                ?   85.72%           
===============================================
  Files                   ?      230           
  Lines                   ?    18191           
  Branches                ?        0           
===============================================
  Hits                    ?    15595           
  Misses                  ?     2596           
  Partials                ?        0

| Flag | Coverage Δ |
|---|---|
| dask | `48.20% <0.00%> (?)` |
| non-dask | `78.16% <0.00%> (?)` |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bcc4cad...c990473.

dantegd · 2021-07-20T21:53:21Z

@gpucibot merge

Use a property based testing methodology to consolidate most of the tests and extend them to a broader range of inputs. Coverage of input parameters is significantly increased and code size is substantially reduced. The majority of tests are now generated in rf_test.cu.

Some dead code is also removed.

Testing has exposed a few bugs that should be resolved in later PRs.

- The max_leaves parameter is not obeyed in some cases
- Hard CUDA crash for n_bins > 128, presumably due to shared memory requirements
- Classification algorithms are sometimes not deterministic (not 100% sure if this is expected or not)

Authors:
- Rory Mitchell (https://github.com/RAMitchell)

Approvers:
- Venkat (https://github.com/venkywonka)
- Robert Maynard (https://github.com/robertmaynard)
- Vinay Deshpande (https://github.com/vinaydes)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4038

Rf gtest rewrite

812d9c7

github-actions bot added the CMake, CUDA/C++, and Cython / Python labels Jul 8, 2021

venkywonka self-requested a review July 8, 2021 03:44

RAMitchell added 2 commits July 8, 2021 16:03

Merge branch 'branch-21.08' of https://github.com/rapidsai/cuml into …

2ac77c3

…tests

Use c++17 constexpr if

735c983

RAMitchell marked this pull request as ready for review July 9, 2021 03:56

RAMitchell requested review from a team as code owners July 9, 2021 03:56

venkywonka reviewed Jul 9, 2021

Merge branch 'branch-21.08' of https://github.com/rapidsai/cuml into …

b077cf3

…tests

venkywonka approved these changes Jul 12, 2021

robertmaynard approved these changes Jul 12, 2021

Lint

1f88053

dantegd added the improvement and non-breaking labels Jul 12, 2021

RAMitchell added 4 commits July 15, 2021 16:54

Fix uninitialised

af0aede

Fix illegal memorry access

6f202ce

Fix uninitialised memory

7ce4c66

Enable regression metrics

c26d631

vinaydes approved these changes Jul 19, 2021

Review comments

c990473

dantegd approved these changes Jul 20, 2021

rapids-bot bot merged commit 5b36ced into rapidsai:branch-21.08 Jul 20, 2021
