
Enhance validation in Learners and Task #597

Closed
RaphaelS1 opened this issue Jan 21, 2021 · 1 comment

Comments

RaphaelS1 (Contributor) commented Jan 21, 2021

Now that I've looked at the documentation in more detail, I'd like to suggest two minor enhancements for validation in Task and Learner:

  1. There are many cases where a validation dataset may be required by a learner in order to make use of internal processes such as early stopping (e.g. GBMs, NNs). As validation is a common use-case, it may be worthwhile adding a simple public method to Task that creates arbitrary train/validation splits that can be passed to learner hyper-parameters, e.g.
split_validation = function(prob) {
  # assign a random prob-fraction of the task's rows the "validation" role
  self$row_roles$validation = sample(seq(self$nrow), self$nrow * prob)
}

Whilst this is clearly a thin wrapper around a basic function that a user could write themselves, it still requires knowledge of row_roles and of where to find the validation splits (which aren't well documented). A standalone sketch of such a helper is given after this list.

Alternatively, if you don't think this is worthwhile, I'd suggest adding an example to mlr-org/mlr3book#201, e.g.

> set.seed(1)
> t = tsk("mtcars")
> t$row_roles$validation = sample(seq(t$nrow), t$nrow * 0.3)
> t$row_roles$validation
[1] 25  4  7  1  2 23 11 14 18
  2. Add 'validation' to learner and task properties (similar to 'weights'). This would allow validation datasets to be implemented more efficiently in learners that can handle them. Currently we assume the user will pass in the validation data in the correct format, which is inconsistent with the general task interface: the user passes a Task for the training data but a raw data object for the validation data. It would also make clear to users when they can and cannot set the validation role. A short sketch of the properties analogy follows below.
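As a minimal standalone sketch of point 1 (the helper name split_validation, the floor() call, and taking the task as an argument rather than defining a Task method are illustrative choices, not existing mlr3 API):

library(mlr3)

split_validation = function(task, prob) {
  # give a random prob-fraction of the task's rows the "validation" role
  task$row_roles$validation = sample(seq(task$nrow), floor(task$nrow * prob))
  invisible(task)
}

set.seed(1)
task = split_validation(tsk("mtcars"), 0.3)
task$row_roles$validation  # should reproduce the nine rows shown in the transcript above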
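And to illustrate the 'weights' analogy in point 2: learners already advertise optional capabilities via $properties, so code can check for support before relying on it; under this proposal a learner that can consume a validation set would advertise a (currently hypothetical) 'validation' property in the same way:

library(mlr3)

learner = lrn("classif.rpart")
# rpart can use observation weights, so "weights" appears in its properties
"weights" %in% learner$properties  # TRUE

# hypothetical check under this proposal ("validation" is not a real property yet)
# "validation" %in% learner$properties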

EDIT: Fixed examples.
