add chapter on validation and internal tuning #829
Conversation
Minor wording suggestions (but these can also be left out).
Otherwise, great job!
I'm trying to add early stopping to the XGBoost learner in my benchmark based on this chapter, and I'm not sure whether I just misunderstand a few things or whether the chapter could be extended in that regard. My problem is that I'm using an AutoTuner with an explicit search space around a GraphLearner. One of my naive attempts is below:

```r
library(mlr3)
library(mlr3tuning)
library(mlr3pipelines)
library(mlr3proba)
library(mlr3extralearners)

task = tsk("lung")

# base learner with early stopping and internally tuned number of boosting rounds
xgb_base = lrn("surv.xgboost.cox",
  early_stopping_rounds = 10,
  nrounds = to_tune(upper = 1000, internal = TRUE),
  tree_method = "hist", booster = "gbtree")

# preprocessing pipeline wrapped into a GraphLearner
xgb_glearn = po("fixfactors") %>>%
  po("imputesample", affect_columns = selector_type("factor")) %>>%
  po("encode", method = "treatment") %>>%
  po("removeconstants") %>>%
  xgb_base |>
  as_learner()

# use the tuning test set as validation data for early stopping
set_validate(xgb_glearn, "test")

xgb_autotuner = auto_tuner(
  learner = xgb_glearn,
  search_space = ps(
    surv.xgboost.cox.eta = p_dbl(0.001, 1, logscale = TRUE),
    surv.xgboost.cox.max_depth = p_int(1, 20),
    surv.xgboost.cox.subsample = p_dbl(0, 1),
    surv.xgboost.cox.colsample_bytree = p_dbl(0, 1),
    surv.xgboost.cox.grow_policy = p_fct(c("depthwise", "lossguide"))
  ),
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  terminator = trm("evals", n_evals = 20, k = 0),
  tuner = tnr("random_search")
)
```

This results in the not unexpected error.
I'm not sure how to indicate to my AutoTuner that `nrounds` should still be tuned internally when I pass an explicit search space.
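One way this might be resolved (a sketch only, based on my reading of the internal-tuning docs rather than a verified answer; the `tags = "internal_tuning"` and `aggr` arguments of `p_int()` are assumptions on my part): clear the `to_tune()` token on `nrounds` and declare it inside the explicit search space instead, tagged as internally tuned and with an aggregation rule for combining the early-stopped values across the CV folds.

```r
# Sketch: internal tuning of nrounds declared inside the explicit search space.
# Assumes p_int() accepts tags = "internal_tuning" and an aggr function, as
# described for internal tuning in paradox/mlr3tuning.

# replace the to_tune() token on the (prefixed) parameter with a plain upper bound,
# so that the explicit search_space no longer conflicts with a tune token
xgb_glearn$param_set$set_values(surv.xgboost.cox.nrounds = 1000)

xgb_autotuner = auto_tuner(
  learner = xgb_glearn,
  search_space = ps(
    surv.xgboost.cox.eta = p_dbl(0.001, 1, logscale = TRUE),
    surv.xgboost.cox.max_depth = p_int(1, 20),
    surv.xgboost.cox.subsample = p_dbl(0, 1),
    surv.xgboost.cox.colsample_bytree = p_dbl(0, 1),
    surv.xgboost.cox.grow_policy = p_fct(c("depthwise", "lossguide")),
    # internally tuned via early stopping; aggregate the per-fold optima
    surv.xgboost.cox.nrounds = p_int(upper = 1000, tags = "internal_tuning",
      aggr = function(x) as.integer(mean(unlist(x))))
  ),
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  terminator = trm("evals", n_evals = 20, k = 0),
  tuner = tnr("random_search")
)
```

Whether the value set on the learner itself still matters once the search-space entry is tagged `internal_tuning` is something I would double-check against the chapter.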
Also, are you aware that xgboost will use the optimal model during prediction and NOT the final model? So you should be less worried about a patience parameter that is too high (except for the increased runtime, I guess).
Ah right, of course, makes sense 😅
I was banking on that -- my main concern is to avoid overfitting in the benchmark, and saving some compute would be a bonus but not a must. Thanks for the clarifications!
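For anyone reading along, here is a quick way to see which boosting round early stopping actually settled on; the field names are my assumption from the mlr3 docs, not taken from the chapter.

```r
library(mlr3)
library(mlr3learners)

learner = lrn("classif.xgboost",
  nrounds = 1000,               # upper bound; early stopping will cut this short
  early_stopping_rounds = 10,
  eval_metric = "logloss")
set_validate(learner, 0.3)      # hold out 30% of the training data for validation
learner$train(tsk("sonar"))

learner$internal_tuned_values   # the early-stopped nrounds (best, not last, round)
learner$internal_valid_scores   # validation score at that point
```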
TODOs:
* `in_tune_fn`:
  * mlr3learners: BREAKING_CHANGE(xgboost): stricter checks on eval_metric mlr3learners#306
  * mlr3extralearners: stricter metric checks when using internal tuning mlr3extralearners#376
* `set_internal_tuning()`: created an issue in mlr3
* `$divide()`: `predict_sets = NULL` when one tunes the internal valid score (see the sketch below)
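A sketch of what I read the last TODO item to mean, assuming that `msr("internal_valid_score")` with its `select`/`minimize` arguments and an empty `$predict_sets` behave as in the current mlr3 docs: when the tuning measure is the internal validation score, no test-set predictions are needed, so the learner's predict sets can be emptied to save time.

```r
library(mlr3)
library(mlr3learners)
library(mlr3tuning)

learner = lrn("classif.xgboost",
  eta = to_tune(0.001, 0.1, logscale = TRUE),
  nrounds = to_tune(upper = 500, internal = TRUE),
  early_stopping_rounds = 10,
  eval_metric = "logloss")
set_validate(learner, "test")
learner$predict_sets = NULL   # no predictions; scores come from the validation data

instance = tune(
  tuner = tnr("random_search"),
  task = tsk("sonar"),
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("internal_valid_score", select = "logloss", minimize = TRUE),
  term_evals = 10
)
```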