[REVIEW] Store data frequencies in tree nodes of RF #3647
…ort instance count
rerun tests
Rerun tests
The Naive Bayes tests keep failing because the dataset download times out.
Looks great. I just had tiny comments.
    check_instance_count_for_non_leaf(tree['children'][0])
    check_instance_count_for_non_leaf(tree['children'][1])

for tree in json_obj:
This looks good, but it seems like it might be slow for deep trees. We are currently testing a lot of parameter variants here. How slow is the test overall? Should we reduce the number of parameter variants, or only run this test for some combinations?
It completes in about a minute on my machine.
@gpucibot merge
rerun tests
Codecov Report
@@             Coverage Diff              @@
##           branch-0.19    #3647   +/-   ##
===============================================
+ Coverage        80.87%   81.15%   +0.28%
===============================================
  Files              228      228
  Lines            17630    17813    +183
===============================================
+ Hits             14258    14457    +199
+ Misses            3372     3356     -16
The RF model should store the number of data points associated with each tree node. This information is useful in many ways, including:

To that end, this PR does the following:

- Add the `instance_count` field in the `SparseTreeNode` structure
- Add the `instance_count` field in the JSON dump

Note that this feature works with the new backend only. If the old backend is used, the `instance_count` field will be absent from the JSON dump.

Closes #3131
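
For readers who want to consume the new field, here is a minimal sketch of a consistency check over a JSON dump. It assumes the dump is a list of trees, that internal nodes carry a `children` list, and that every node carries `instance_count`; only the key names `children` and `instance_count` come from this PR. The sample dump, its values, and the helper name `check_instance_count` are made up for illustration and are not the test code in this PR.

```python
import json

# Hypothetical dump: the structure and the numbers are invented for this sketch.
dump = """
[
  {"instance_count": 10,
   "children": [
     {"instance_count": 4},
     {"instance_count": 6,
      "children": [{"instance_count": 1}, {"instance_count": 5}]}
   ]}
]
"""


def check_instance_count(node):
    """Check one subtree and return the number of nodes visited.

    Raises AssertionError if a node lacks 'instance_count' or if an
    internal node's count differs from the sum of its children's counts.
    """
    assert "instance_count" in node
    children = node.get("children")
    if not children:
        # Leaf node: nothing further to compare.
        return 1
    total = sum(child["instance_count"] for child in children)
    assert node["instance_count"] == total
    return 1 + sum(check_instance_count(child) for child in children)


visited = [check_instance_count(tree) for tree in json.loads(dump)]
print(visited)  # one entry per tree: the number of nodes checked
```

The check relies on the fact that each data point reaching an internal node is routed to exactly one child, so the children's counts should add up to the parent's count. With the old backend the field would be absent, and the first assertion would fail.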