Stump has no leaf_count inside dump_model() output #5962
Comments
Thanks very much for the write-up! This request makes sense to me, and I'd support adding that information at the root node. Are you interested in working on this and submitting a pull request? If not, we can treat this as a feature request in the project's backlog, and subscribing here will notify you when someone picks it up.
Great! Thank you @jameslamb, I'll take a stab at a PR. I'll report back if it doesn't work out.
@thatlittleboy Your post here led me to go look at the
I'd be happy to help out with some things if you need more assistance. Feel free to
Awesome, thanks for the offer! Myself, @connortann, and @dsgibbons have been working on getting Off the top of my head, I think we require additional expertise in C extensions (our If we get specific issues on these (as well as lightgbm-core, of course), we'll be sure to ping you. Thanks once again.
This was fixed by #6569. |
Description
I'll preface this by saying I'm not quite sure if this is a bug, just that it's a little bit of an inconsistent API.
When we have a stump (single-node tree), the `dump_model()` dictionary output doesn't contain a `leaf_count`. It only has a `leaf_value`. I'm wondering why that is. It should be possible to assign a count (i.e., the number of samples that was used to train the model) to the root node, or am I mistaken?
Reproducible example
(I'm intentionally creating a stump here.)
The output is something like this:
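The actual output block was also lost in this scrape; roughly, the relevant part of the dump (abridged, with made-up values) looks like:

```python
# Illustrative, abridged shape of dump_model() for a stump; values are invented.
dump = {
    "tree_info": [
        {
            "tree_index": 0,
            "num_leaves": 1,
            "tree_structure": {
                "leaf_value": 0.0123  # note: no 'leaf_count' key here
            },
        }
    ]
}
```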
Note how there isn't a `leaf_count` inside the `tree_info`, which exists normally if the tree were allowed to grow to higher depths. (Just bump up `n_samples` to 5000 above, say, and inspect the dump output to see what I mean.) Thank you!
Environment info
LightGBM version or commit hash: 3.3.5
Command(s) you used to install LightGBM:
Python 3.10, macOS 12.5.1
Additional Comments
This is a bit of an edge case, but I would still appreciate if you could somehow unify the API (dump_model output) slightly wherever reasonable.
The background is that I'm working on a bug fix for the `shap` package, and we are parsing the `dump_model()` output. Right now, if we receive a stump tree, we aren't able to recover the number of samples that were used to train the model (i.e., the count in the root node) unless we get the user to pass in their training data.