Add weighted_n_node_samples field in sklearn importer #330

Merged: 1 commit, Jan 6, 2022

@hcho3 hcho3 commented Dec 20, 2021

Scikit-learn tree models have two fields for per-node sample counts: n_node_samples (unweighted count, int64) and weighted_n_node_samples (weighted count, float64). Until now, Treelite kept only the unweighted count and discarded the weighted count.

This PR stores weighted_n_node_samples in the Treelite object, using the sum_hess field.
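
For readers unfamiliar with the two counts, here is a minimal pure-Python sketch of how they differ for the same training rows. It is a toy illustration, not Treelite or scikit-learn code: the helper `node_sample_counts` and the sample data are hypothetical, and the only names taken from the description above are `n_node_samples` and `weighted_n_node_samples`.

```python
# Toy illustration (not Treelite or scikit-learn code): unweighted vs. weighted
# per-node sample counts computed from the same rows.
from collections import defaultdict


def node_sample_counts(node_ids, sample_weights):
    """Return (n_node_samples, weighted_n_node_samples) as dicts keyed by node ID."""
    n_node_samples = defaultdict(int)             # integer count of rows per node
    weighted_n_node_samples = defaultdict(float)  # sum of row weights per node
    for node_id, weight in zip(node_ids, sample_weights):
        n_node_samples[node_id] += 1
        weighted_n_node_samples[node_id] += weight
    return dict(n_node_samples), dict(weighted_n_node_samples)


# Hypothetical data: the node each of five training rows lands in, and its weight.
node_of_row = [1, 1, 2, 2, 2]
row_weight = [0.5, 0.5, 2.0, 1.0, 1.0]

counts, weighted = node_sample_counts(node_of_row, row_weight)
print(counts)    # {1: 2, 2: 3}
print(weighted)  # {1: 1.0, 2: 4.0}
```

When every row has weight 1.0 the two counts agree numerically, which is why dropping the weighted count only loses information for models trained with non-uniform sample weights.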

Required by rapidsai/cuml#4447

codecov bot commented Dec 20, 2021

Codecov Report

Merging #330 (5c6861a) into mainline (5d80909) will decrease coverage by 0.89%.
The diff coverage is 100.00%.

@@              Coverage Diff               @@
##             mainline     #330      +/-   ##
==============================================
- Coverage       85.06%   84.16%   -0.90%     
  Complexity         42       42              
==============================================
  Files             108      108              
  Lines            8374     8350      -24     
  Branches           40       40              
==============================================
- Hits             7123     7028      -95     
- Misses           1228     1299      +71     
  Partials           23       23              
| Impacted Files | Coverage Δ |
|---|---|
| include/treelite/frontend.h | 90.00% <ø> (ø) |
| python/treelite/sklearn/importer.py | 94.17% <100.00%> (+0.11%) ⬆️ |
| src/c_api/c_api.cc | 94.48% <100.00%> (ø) |
| src/frontend/sklearn.cc | 100.00% <100.00%> (ø) |
| python/treelite/contrib/msvc.py | 0.00% <0.00%> (-67.15%) ⬇️ |
| python/treelite/libpath.py | 67.85% <0.00%> (-14.29%) ⬇️ |
| runtime/python/treelite_runtime/libpath.py | 67.85% <0.00%> (-14.29%) ⬇️ |
| python/treelite/contrib/__init__.py | 69.67% <0.00%> (-5.74%) ⬇️ |
| python/treelite/contrib/util.py | 86.17% <0.00%> (-4.26%) ⬇️ |
| python/treelite/core.py | 78.57% <0.00%> (-3.58%) ⬇️ |

... and 11 more
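
The report's headline says coverage drops by 0.89%, while the diff shows -0.90%. The sketch below reproduces both figures from the Hits and Lines rows of the report. It rests on two assumptions that the report itself does not state: that coverage is Hits / Lines, and that the displayed percentages are truncated (not rounded) to two decimals. The helper names `coverage_pct` and `truncate2` are hypothetical.

```python
import math


def coverage_pct(hits, lines):
    """Line coverage as a percentage (assumed to be hits / lines)."""
    return 100.0 * hits / lines


def truncate2(x):
    """Drop everything past two decimal places (assumed display rule)."""
    return math.floor(x * 100) / 100


before = coverage_pct(7123, 8374)  # mainline (5d80909)
after = coverage_pct(7028, 8350)   # #330 (5c6861a)

# Difference of the unrounded percentages, then rounded for display.
exact_drop = round(before - after, 2)
# Difference of the already-truncated percentages shown in the diff.
displayed_drop = round(truncate2(before) - truncate2(after), 2)

print(truncate2(before), truncate2(after))  # 85.06 84.16
print(exact_drop, displayed_drop)           # 0.89 0.9
```

Under these assumptions the two figures are consistent: the gap comes from subtracting before versus after the percentages are shortened for display.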

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@divyegala divyegala left a comment

Looks great!

@hcho3 hcho3 merged commit 8b7105e into dmlc:mainline Jan 6, 2022
@hcho3 hcho3 deleted the add_weighted_sample_cnt branch January 6, 2022 20:14
This was referenced Jan 13, 2022
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Jan 25, 2022
The 2.2.0 version of Treelite incorporates the following major improvements:

* dmlc/treelite#314
* dmlc/treelite#322, dmlc/treelite#327
* dmlc/treelite#325
* dmlc/treelite#332
* dmlc/treelite#330
* dmlc/treelite#333
* dmlc/treelite#334
* dmlc/treelite#304
* dmlc/treelite#335

In particular, dmlc/treelite#332, dmlc/treelite#330, dmlc/treelite#333 are required for #4447.

Requires rapidsai/integration#412.

EDIT: Now using the 2.2.1 patch release to incorporate a hotfix (dmlc/treelite#340).

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4484
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Jan 26, 2022
Add support for:
- [x] cuML RF classifiers
- [x] scikit-learn RF regressors
- [x] scikit-learn RF classifiers

TODOs
- [x] Add test cases
- [x] De-duplicate path extraction logic

Requires dmlc/treelite#330

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - William Hicks (https://github.com/wphicks)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4447
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023