[REVIEW] RF: fix stream bug causing performance regressions #4644
rerun tests

@gpucibot merge
Codecov Report

```diff
@@              Coverage Diff               @@
##             branch-22.04    #4644   +/-   ##
===============================================
  Coverage                ?   83.86%
===============================================
  Files                   ?      251
  Lines                   ?    20293
  Branches                ?        0
===============================================
  Hits                    ?    17018
  Misses                  ?     3275
  Partials                ?        0
```

Flags with carried forward coverage won't be shown. Continue to review full report at Codecov.
This nanoPR fixes a performance regression caused by improper stream assignment of the decision trees.

Before fix:

| sno | algo | input | cu_time | cpu_time | cuml_acc | cpu_acc | speedup | n_samples | n_features | max_depth | n_estimators | n_bins | n_streams | n_jobs | n_classes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | RandomForestClassifier | numpy | 32.635321855545044 | 0.0 | 0.99468 | 0.0 | 0.0 | 800000 | 64 | 8 | 500 | 128 | 4 | -1 | 2 |
| 1 | RandomForestClassifier | numpy | 40.36453413963318 | 0.0 | 0.994855 | 0.0 | 0.0 | 800000 | 64 | 10 | 500 | 128 | 4 | -1 | 2 |
| 2 | RandomForestClassifier | numpy | 61.35148477554321 | 0.0 | 0.99504 | 0.0 | 0.0 | 800000 | 64 | 16 | 500 | 128 | 4 | -1 | 2 |

After fix:

| sno | algo | input | cu_time | cpu_time | cuml_acc | cpu_acc | speedup | n_samples | n_features | max_depth | n_estimators | n_bins | n_streams | n_jobs | n_classes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | RandomForestClassifier | numpy | 28.637776374816895 | 0.0 | 0.99468 | 0.0 | 0.0 | 800000 | 64 | 8 | 500 | 128 | 4 | -1 | 2 |
| 1 | RandomForestClassifier | numpy | 34.11380743980408 | 0.0 | 0.994855 | 0.0 | 0.0 | 800000 | 64 | 10 | 500 | 128 | 4 | -1 | 2 |
| 2 | RandomForestClassifier | numpy | 47.153409481048584 | 0.0 | 0.99504 | 0.0 | 0.0 | 800000 | 64 | 16 | 500 | 128 | 4 | -1 | 2 |

`cu_time` drops at every depth while `cuml_acc` is unchanged. The `cpu_time`, `cpu_acc` and `speedup` columns are 0.0 because the CPU baseline was skipped (`--skip-cpu`).

Command run in `cuml/`:

```shell
python python/cuml/run_benchmarks.py --num-rows 800000 --num-features 64 \
    --skip-cpu --test-split 0.2 \
    --cuml-param-sweep "n_bins=[128]" "n_streams=[4]" \
    --cpu-param-sweep "n_jobs=[-1]" \
    --param-sweep "max_depth=[8,10,16]" "n_estimators=[500]" \
    --n-reps 1 --csv pool-2112-cls-800000.csv \
    --dataset-param-sweep "n_classes=[2]" \
    --dtype "fp32" --dataset classification \
    -- RandomForestClassifier
```

Authors:
- Venkat (https://github.com/venkywonka)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4644
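To put the before/after timings in relative terms, the `cu_time` values from the two benchmark tables can be turned into per-depth speedups. This is a small sketch that only divides the copied measurements; the helper names `speedup` and `time_reduction_pct` are illustrative, and since the benchmark ran with `--n-reps 1`, each number is a single measurement subject to run-to-run noise.

```python
# cu_time values copied from the "Before fix" / "After fix" tables, keyed by max_depth.
CU_TIME_BEFORE = {8: 32.635321855545044, 10: 40.36453413963318, 16: 61.35148477554321}
CU_TIME_AFTER = {8: 28.637776374816895, 10: 34.11380743980408, 16: 47.153409481048584}


def speedup(before: float, after: float) -> float:
    """Ratio of old to new time; values above 1.0 mean the fix made the run faster."""
    return before / after


def time_reduction_pct(before: float, after: float) -> float:
    """Percentage of wall time saved by the fix."""
    return 100.0 * (1.0 - after / before)


if __name__ == "__main__":
    for depth in sorted(CU_TIME_BEFORE):
        b, a = CU_TIME_BEFORE[depth], CU_TIME_AFTER[depth]
        print(f"max_depth={depth}: {speedup(b, a):.2f}x, "
              f"{time_reduction_pct(b, a):.1f}% less cu_time")
```

For depths 8, 10 and 16 this gives speedups of roughly 1.14x, 1.18x and 1.30x (about 12%, 15% and 23% less `cu_time`), so the deeper forests benefit the most.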
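The description attributes the regression to improper stream assignment of the decision trees but does not include the diff, so the exact bug is not shown here. As a hedged illustration only, the sketch below models one common scheme, round-robin assignment of tree `i` to stream `i % n_streams`, and contrasts it with a hypothetical faulty mapping that places every tree on a single stream. All function names are hypothetical, the cost model (equal per-tree cost, perfectly overlapping streams) is an assumption, and none of this is cuML's actual implementation.

```python
from collections import Counter


def assign_streams(n_estimators: int, n_streams: int) -> list:
    """Hypothetical round-robin mapping: tree i is built on stream i % n_streams."""
    if n_streams < 1:
        raise ValueError("n_streams must be at least 1")
    return [i % n_streams for i in range(n_estimators)]


def assign_streams_faulty(n_estimators: int, n_streams: int) -> list:
    """Hypothetical faulty mapping (illustration only): every tree lands on
    stream 0, so work that could overlap across streams is serialized."""
    return [0] * n_estimators


def trees_per_stream(assignment: list) -> dict:
    """Count how many trees each stream index receives."""
    return dict(Counter(assignment))


def busiest_stream_load(assignment: list) -> int:
    """Under the assumed model (equal per-tree cost, streams fully overlapping),
    wall time scales with the number of trees on the busiest stream."""
    return max(trees_per_stream(assignment).values(), default=0)


if __name__ == "__main__":
    # n_estimators=500 and n_streams=4 match the benchmark parameters.
    good = assign_streams(500, 4)
    bad = assign_streams_faulty(500, 4)
    print("round-robin:", trees_per_stream(good), "busiest:", busiest_stream_load(good))
    print("faulty:     ", trees_per_stream(bad), "busiest:", busiest_stream_load(bad))
```

With the benchmark's parameters the round-robin mapping places 125 trees on each of the 4 streams, while the faulty one stacks all 500 on stream 0. In practice, how much GPU streams actually overlap is limited by available device resources, so the measured improvement in the tables (roughly 12% to 23% less `cu_time`) is far smaller than the idealized 4x this toy model would suggest.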