add chapter on validation and internal tuning #829
Conversation
Minor wording suggestions (but these can also be left out).
Otherwise, great job!
I'm trying to add early stopping to the XGBoost learner in my benchmark based on this chapter, and I'm not sure whether I just misunderstand a few things or whether the chapter could be extended in that regard. My problem is that I'm using an AutoTuner with an explicit search space around a GraphLearner. One of my naive attempts is below:

```r
library(mlr3)
library(mlr3tuning)
library(mlr3pipelines)
library(mlr3proba)
library(mlr3extralearners)

task = tsk("lung")

# base learner with early stopping and internally tuned number of boosting rounds
xgb_base = lrn("surv.xgboost.cox",
  early_stopping_rounds = 10,
  nrounds = to_tune(upper = 1000, internal = TRUE),
  tree_method = "hist", booster = "gbtree")

# preprocessing pipeline wrapped into a GraphLearner
xgb_glearn = po("fixfactors") %>>%
  po("imputesample", affect_columns = selector_type("factor")) %>>%
  po("encode", method = "treatment") %>>%
  po("removeconstants") %>>%
  xgb_base |>
  as_learner()

# use the tuning test set as validation data for early stopping
set_validate(xgb_glearn, "test")

xgb_autotuner = auto_tuner(
  learner = xgb_glearn,
  search_space = ps(
    surv.xgboost.cox.eta = p_dbl(0.001, 1, logscale = TRUE),
    surv.xgboost.cox.max_depth = p_int(1, 20),
    surv.xgboost.cox.subsample = p_dbl(0, 1),
    surv.xgboost.cox.colsample_bytree = p_dbl(0, 1),
    surv.xgboost.cox.grow_policy = p_fct(c("depthwise", "lossguide"))
  ),
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  terminator = trm("evals", n_evals = 20, k = 0),
  tuner = tnr("random_search")
)
```

This results in the not unexpected error.
I'm not sure how to indicate to my AutoTuner that `nrounds` should still be tuned internally when I pass an explicit search space.
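One way this might be resolved (a sketch only, based on my reading of the internal-tuning docs rather than a verified answer; the `tags = "internal_tuning"` and `aggr` arguments of `p_int()` are assumptions on my part): clear the `to_tune()` token on `nrounds` and declare it inside the explicit search space instead, tagged as internally tuned and with an aggregation rule for combining the early-stopped values across the CV folds.

```r
# Sketch: internal tuning of nrounds declared inside the explicit search space.
# Assumes p_int() accepts tags = "internal_tuning" and an aggr function, as
# described for internal tuning in paradox/mlr3tuning.

# replace the to_tune() token on the (prefixed) parameter with a plain upper bound,
# so that the explicit search_space no longer conflicts with a tune token
xgb_glearn$param_set$set_values(surv.xgboost.cox.nrounds = 1000)

xgb_autotuner = auto_tuner(
  learner = xgb_glearn,
  search_space = ps(
    surv.xgboost.cox.eta = p_dbl(0.001, 1, logscale = TRUE),
    surv.xgboost.cox.max_depth = p_int(1, 20),
    surv.xgboost.cox.subsample = p_dbl(0, 1),
    surv.xgboost.cox.colsample_bytree = p_dbl(0, 1),
    surv.xgboost.cox.grow_policy = p_fct(c("depthwise", "lossguide")),
    # internally tuned via early stopping; aggregate the per-fold optima
    surv.xgboost.cox.nrounds = p_int(upper = 1000, tags = "internal_tuning",
      aggr = function(x) as.integer(mean(unlist(x))))
  ),
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  terminator = trm("evals", n_evals = 20, k = 0),
  tuner = tnr("random_search")
)
```

Whether the value set on the learner itself still matters once the search-space entry is tagged `internal_tuning` is something I would double-check against the chapter.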
Also, are you aware that xgboost will use the optimal model during prediction and NOT the final model? So you should be less worried about a patience parameter that is too high (except for the increased runtime, I guess).
Ah right, of course, makes sense 😅
I was banking on that -- my main concern is to avoid overfitting in the benchmark, and saving some compute would be a bonus but not a must. Thanks for the clarifications!
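For anyone reading along, here is a quick way to see which boosting round early stopping actually settled on; the field names are my assumption from the mlr3 docs, not taken from the chapter.

```r
library(mlr3)
library(mlr3learners)

learner = lrn("classif.xgboost",
  nrounds = 1000,               # upper bound; early stopping will cut this short
  early_stopping_rounds = 10,
  eval_metric = "logloss")
set_validate(learner, 0.3)      # hold out 30% of the training data for validation
learner$train(tsk("sonar"))

learner$internal_tuned_values   # the early-stopped nrounds (best, not last, round)
learner$internal_valid_scores   # validation score at that point
```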
TODOs:
* `in_tune_fn`:
  * mlr3learners: BREAKING_CHANGE(xgboost): stricter checks on eval_metric mlr3learners#306
  * mlr3extralearners: stricter metric checks when using internal tuning mlr3extralearners#376
* `set_internal_tuning()`: created an issue in mlr3
* `$divide()`: `predict_sets = NULL` when one tunes the internal valid score (see the sketch below)
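A sketch of what I read the last TODO item to mean, assuming that `msr("internal_valid_score")` with its `select`/`minimize` arguments and an empty `$predict_sets` behave as in the current mlr3 docs: when the tuning measure is the internal validation score, no test-set predictions are needed, so the learner's predict sets can be emptied to save time.

```r
library(mlr3)
library(mlr3learners)
library(mlr3tuning)

learner = lrn("classif.xgboost",
  eta = to_tune(0.001, 0.1, logscale = TRUE),
  nrounds = to_tune(upper = 500, internal = TRUE),
  early_stopping_rounds = 10,
  eval_metric = "logloss")
set_validate(learner, "test")
learner$predict_sets = NULL   # no predictions; scores come from the validation data

instance = tune(
  tuner = tnr("random_search"),
  task = tsk("sonar"),
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("internal_valid_score", select = "logloss", minimize = TRUE),
  term_evals = 10
)
```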