Summary
I would like to request a feature that logs or exposes all features considered for splits at each node during tree construction, along with their associated gain values (impurity reductions). The goal is to access not only the best split but also the alternatives that were evaluated, in order to identify and potentially trim variables that are essentially duplicative in their contribution to the model.
Motivation
This feature would help modelers better understand how LightGBM weighs features during tree building. It would be particularly useful for feature engineering and model optimization, because it would let practitioners detect features that frequently compete for the same splits, which often indicates that they are highly correlated or redundant in their predictive power. Identifying such variables makes it possible to simplify models, reduce dimensionality, and improve interpretability with little or no loss of accuracy.
Description
I propose adding functionality to LightGBM that would:
Log or expose all considered features and split thresholds at each node, not just the selected split.
Capture the gain (or impurity reduction) for all potential splits, allowing users to see which features were close competitors in terms of gain.
This feature could be made accessible through a custom callback, an internal API hook, or a configurable parameter that enables detailed logging of splits during model training.
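To make the request concrete, here is a minimal pure-Python sketch of what such a log could contain for a single node. It assumes the standard second-order gain used by gradient-boosting libraries, gain = G_L²/(H_L+λ) + G_R²/(H_R+λ) − G²/(H+λ), where G and H are sums of gradients and Hessians. It scans exact thresholds instead of LightGBM's histogram bins and ignores L1 regularization and other constraints. `CandidateSplit`, `enumerate_candidate_splits`, `l2_reg` and `min_hess` are hypothetical names for illustration, not LightGBM APIs or internals.

```python
from dataclasses import dataclass


@dataclass
class CandidateSplit:
    """One evaluated split: feature index, threshold, and its gain."""
    feature: int
    threshold: float
    gain: float


def leaf_score(g_sum, h_sum, l2_reg):
    # Second-order leaf term G^2 / (H + lambda); larger is better
    return g_sum * g_sum / (h_sum + l2_reg)


def enumerate_candidate_splits(X, grad, hess, l2_reg=0.0, min_hess=1e-3):
    """Return every split evaluated at one node, sorted by gain (best first).

    X is a list of rows (lists of floats); grad and hess hold one
    gradient / Hessian value per row for the rows in this node.
    """
    n_rows = len(X)
    n_features = len(X[0]) if n_rows else 0
    g_total, h_total = sum(grad), sum(hess)
    parent = leaf_score(g_total, h_total, l2_reg)
    candidates = []
    for f in range(n_features):
        # Sort rows by feature f so one left-to-right scan covers all thresholds
        order = sorted(range(n_rows), key=lambda i: X[i][f])
        g_left = h_left = 0.0
        for pos in range(n_rows - 1):
            i, nxt = order[pos], order[pos + 1]
            g_left += grad[i]
            h_left += hess[i]
            if X[i][f] == X[nxt][f]:
                continue  # equal values cannot be separated by a threshold
            g_right, h_right = g_total - g_left, h_total - h_left
            if h_left < min_hess or h_right < min_hess:
                continue  # child too small (hypothetical constraint)
            gain = (leaf_score(g_left, h_left, l2_reg)
                    + leaf_score(g_right, h_right, l2_reg) - parent)
            threshold = (X[i][f] + X[nxt][f]) / 2.0
            candidates.append(CandidateSplit(f, threshold, gain))
    candidates.sort(key=lambda c: c.gain, reverse=True)
    return candidates


# Toy node: feature 1 is feature 0 rescaled by 10; feature 2 is unrelated
X = [[1.0, 10.0, 5.0], [2.0, 20.0, 3.0], [3.0, 30.0, 4.0], [4.0, 40.0, 1.0]]
grad = [-1.0, -1.0, 1.0, 1.0]
hess = [1.0, 1.0, 1.0, 1.0]

cands = enumerate_candidate_splits(X, grad, hess)
best_per_feature = {}
for c in cands:
    best_per_feature.setdefault(c.feature, c.gain)  # first seen is that feature's best
# Features 0 and 1 tie at gain 4.0; feature 2 only reaches about 1.33
print(best_per_feature)
```

Only the top entry is used to grow the tree today; the request is essentially to keep the rest of this list (or the best entry per feature) instead of discarding it.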
This could be useful for:
Model optimization: Trimming redundant variables that offer little marginal value compared to similar features.
Feature selection: Understanding which variables frequently compete for splits can aid in feature selection or combination.
Model interpretability: Providing insights into the decision-making process of the algorithm, beyond just the final tree structure.
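Given per-node logs of the kind requested above, spotting features that frequently compete for splits could start with simply counting near-ties. The sketch below assumes a hypothetical log format (one dict per node mapping feature name to the best gain that feature reached there) and a hypothetical `tolerance` ratio; it is one possible heuristic, not an established LightGBM method, and the example data are made up.

```python
from collections import Counter
from itertools import combinations


def competing_feature_pairs(node_logs, tolerance=0.95):
    """Count how often pairs of features are near-ties for the best split.

    node_logs: one dict per node, mapping feature name -> best gain that
    feature reached at that node (hypothetical log format).
    A feature counts as a close competitor at a node when its best gain is
    at least `tolerance` times the top gain at that node.
    """
    pair_counts = Counter()
    for gains in node_logs:
        if not gains:
            continue
        top = max(gains.values())
        if top <= 0:
            continue  # no split improved the objective at this node
        close = sorted(name for name, g in gains.items() if g >= tolerance * top)
        for a, b in combinations(close, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Made-up logs for illustration only
logs = [
    {"age": 4.0, "age_years": 3.9, "income": 1.0},
    {"age": 2.0, "age_years": 2.0, "income": 0.5},
    {"age": 0.2, "age_years": 0.1, "income": 3.0},
]
pairs = competing_feature_pairs(logs)
# ("age", "age_years") is a near-tie at two of the three nodes
print(pairs.most_common())
```

Pairs that tie often are candidates for dropping one member or combining them. The tolerance would need tuning, and near-ties can also occur by chance in small nodes, so counts are best read relative to how many nodes each feature appears in.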
If this feature already exists or can be achieved through custom means (such as callbacks or hooks), please provide guidance on how to implement it.