[R-package] add support for specifying training indices in lgb.cv() #3924
Comments
Hi @julioasotodv. Thanks for using LightGBM, and for taking the time to write up this excellent feature request! Adding a few more details for whenever this is picked up.

I think this example from that Stack Overflow question describes the situation well.

I've added this feature to #2302, where we organize the list of feature requests. I've also edited the title so that it's understandable for people who don't have prior context. @julioasotodv, are you interested in contributing this feature?
@jameslamb sure! I just did. I will create a PR soon!
At the moment, this feature is not being actively worked on (see #3989 (comment)). Per the current policy in this repository, I'm going to close this GitHub issue but keep the feature marked as open in #2302. Anyone who is interested in contributing this feature or who wants to add something to this discussion is encouraged to comment here, and the issue can be re-opened.
Summary
The addition of a train_folds argument to lgb.cv() would allow for more fine-grained fold generation, which is useful in some scenarios such as time series forecasting (just like the xgboost R package does).

Motivation
The R function lgb.cv() currently has an argument, folds, that allows you to specify manual folds. This argument expects a list whose elements are the test-set indices for each fold; all the other indices go to the train set for that fold.

However, for some types of datasets and tasks (such as time series), you may actually want folds in which some indices are not used at all, neither in the train set nor in the test set, to avoid leaking information from the future.
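To make the current behaviour concrete, here is a minimal sketch of passing manual folds to lgb.cv(); the data, the assumption that rows are ordered in time, and the fold boundaries are all invented purely for illustration. For the first fold, the "future" rows 81:100 end up in the train set, which is exactly the leakage problem described above.

```r
# A minimal sketch (invented data and fold boundaries) of the current lgb.cv()
# behaviour: each element of `folds` lists the TEST indices for one fold, and
# every other row is automatically used for training in that fold.
library(lightgbm)

set.seed(42)
X <- matrix(rnorm(100 * 5), ncol = 5)  # pretend rows 1..100 are ordered in time
y <- rnorm(100)
dtrain <- lgb.Dataset(data = X, label = y)

folds <- list(
  61:80,   # fold 1: test on rows 61-80
  81:100   # fold 2: test on rows 81-100
)

# For fold 1, rows 1:60 AND the "future" rows 81:100 all end up in the train
# set; there is currently no way to exclude those future rows.
cv_result <- lgb.cv(
  params = list(objective = "regression", verbose = -1),
  data = dtrain,
  nrounds = 10L,
  folds = folds
)
```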
Description
The xgboost R package included this feature a while back. It essentially consists of adding one more argument, train_folds, to the cv() function. If train_folds is specified, only those indices go to the train set in each fold; if it is not specified, the train indices are simply the complement of the ones in the folds argument, just like lgb.cv() works right now.
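A rough sketch of what this could look like is below. The xgb.cv() call uses the train_folds argument that already exists in the xgboost R package, while the lgb.cv() call shows only the proposed equivalent and is left commented out because that argument does not exist in LightGBM yet; the data and indices are again invented for illustration.

```r
# Sketch of the proposed interface. The xgb.cv() call uses the train_folds
# argument the xgboost R package already provides; the commented-out lgb.cv()
# call is only the PROPOSED equivalent and will not run today.
library(xgboost)

set.seed(42)
X <- matrix(rnorm(100 * 5), ncol = 5)
y <- rnorm(100)

folds       <- list(61:80, 81:100)  # test indices for each fold
train_folds <- list(1:60,  1:80)    # train indices for each fold (no future rows)

# Works today in the xgboost R package:
xgb_cv <- xgb.cv(
  params = list(objective = "reg:squarederror"),
  data = xgb.DMatrix(X, label = y),
  nrounds = 10,
  folds = folds,
  train_folds = train_folds
)

# Proposed for lgb.cv() (hypothetical, not yet implemented):
# lgb.cv(
#   params = list(objective = "regression"),
#   data = lgb.Dataset(X, label = y),
#   nrounds = 10,
#   folds = folds,
#   train_folds = train_folds
# )
```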
References
Please see the train_folds argument in xgb.cv() here, and the relevant code in xgboost can be found here.

Thank you!