Fix handling of NaNs in categorical splits of LightGBM models #304

Merged
hcho3 merged 4 commits into dmlc:mainline from fix_lightgbm_categorical_split
Jan 12, 2022


@hcho3 (Collaborator) commented Jul 21, 2021

Closes #277

The behavior of LightGBM when a NaN input reaches a categorical split was clarified in microsoft/LightGBM#4468. For categorical splits, the missing_type field should be ignored, and NaNs should always map to the right child node.
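
The rule described above can be illustrated with a minimal sketch. This is not the code changed in this PR (which lives in src/frontend/lightgbm.cc); the function name `next_child`, the `left_categories` set, and the string return values are all hypothetical, chosen only to show that `missing_type` plays no role once the split is categorical.

```python
import math


def next_child(fvalue, left_categories, missing_type="None"):
    """Pick the child for a categorical split (illustrative sketch only).

    `left_categories` is a hypothetical set of integer category codes that
    the split sends to the left child. `missing_type` is accepted but
    deliberately never consulted.
    """
    # NaN always goes to the right child, whatever missing_type says.
    if math.isnan(fvalue):
        return "right"
    # Assumption: categories are integer codes stored as floats.
    return "left" if int(fvalue) in left_categories else "right"


# The outcome for NaN is the same for every missing_type value.
for mt in ("None", "Zero", "NaN"):
    assert next_child(float("nan"), {0, 2}, missing_type=mt) == "right"
```

Non-NaN inputs fall through to an ordinary membership test, so only the NaN path is affected by the rule.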

codecov bot commented Jul 21, 2021

Codecov Report

Merging #304 (d8b1d62) into mainline (76d8e79) will decrease coverage by 0.86%.
The diff coverage is n/a.


@@              Coverage Diff               @@
##             mainline     #304      +/-   ##
==============================================
- Coverage       84.83%   83.96%   -0.87%     
  Complexity         42       42              
==============================================
  Files             108      108              
  Lines            8425     8388      -37     
  Branches           40      470     +430     
==============================================
- Hits             7147     7043     -104     
- Misses           1255     1322      +67     
  Partials           23       23              
Impacted Files Coverage Δ
src/frontend/lightgbm.cc 85.39% <ø> (-0.18%) ⬇️
python/treelite/contrib/msvc.py 0.00% <0.00%> (-67.15%) ⬇️
python/treelite/libpath.py 67.85% <0.00%> (-14.29%) ⬇️
runtime/python/treelite_runtime/libpath.py 67.85% <0.00%> (-14.29%) ⬇️
python/treelite/contrib/__init__.py 69.67% <0.00%> (-5.74%) ⬇️
python/treelite/contrib/util.py 86.17% <0.00%> (-4.26%) ⬇️
python/treelite/core.py 78.57% <0.00%> (-3.58%) ⬇️
runtime/python/treelite_runtime/util.py 92.10% <0.00%> (-2.64%) ⬇️
python/treelite/sklearn/common.py 94.11% <0.00%> (-1.13%) ⬇️
python/treelite/sklearn/gbm_regressor.py 92.85% <0.00%> (-0.90%) ⬇️
... and 8 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@hcho3 hcho3 merged commit 09521bc into dmlc:mainline Jan 12, 2022
@hcho3 hcho3 deleted the fix_lightgbm_categorical_split branch January 12, 2022 20:30
This was referenced Jan 13, 2022
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Jan 25, 2022
The 2.2.0 version of Treelite incorporates the following major improvements:

* dmlc/treelite#314
* dmlc/treelite#322, dmlc/treelite#327
* dmlc/treelite#325
* dmlc/treelite#332
* dmlc/treelite#330
* dmlc/treelite#333
* dmlc/treelite#334
* dmlc/treelite#304
* dmlc/treelite#335

In particular, dmlc/treelite#332, dmlc/treelite#330, dmlc/treelite#333 are required for #4447.

Requires rapidsai/integration#412.

EDIT: Now using the 2.2.1 patch release to incorporate a hotfix (dmlc/treelite#340).

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4484
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
Development

Successfully merging this pull request may close these issues.

Different outputs from Treelite vs LightGBM on Categorical NaNs