Add temporary params to Learners / Pipeops #304

Open · berndbischl opened this issue Nov 9, 2018 · 16 comments
@berndbischl (Member)

Use case 1: you want to tune RF::mtry, but from [0, 1] as a percentage, not as an integer from 1..k.

Use case 2: you have created some smart heuristics to set params, like a, b, c, which execute some code.

Proposal: you can add a hyperparameter plus some piece of code to a PipeOp, which maps the setting of that new param to values of already existing ones. You can then use (or tune) the new one.

berndbischl (Member, Author) commented Nov 9, 2018

API suggestions:

PipeOp$add_hyper_par(par, map)
par = Param object from paradox
map = function(x, input), which maps the value of par to a named list of param values

NB: "input" is a bit problematic and needs to be properly defined. Currently I see this as the input which is given to the PipeOp, but we have a training and a test phase. I guess this is the input during the training phase, only and exactly...

@berndbischl (Member, Author)

@mllg @jakob-r @pfistfl
Comments?

pfistfl (Member) commented Nov 11, 2018

So, is this something we want to do in mlr3pipelines or in paradox?

@berndbischl (Member, Author)

So, is this something we want to do in mlr3pipelines or in paradox?

I think it must be in mlr3pipelines. Some code from paradox could then MAYBE be removed, and I also guess this eliminates ALL the problems we had with expressions.

But I would like some feedback from @mllg @jakob-r @ja-thomas.

jakob-r (Member) commented Nov 12, 2018

I like the idea. Some thoughts:

  • I don't like map as a name; what about trafo, plain fun, or reparametrize?
  • add_hyper_par can be added at arbitrary parts of the pipe, right?
  • If we add an add_hyper_par before, e.g., a feature selection, then the input$p is different from the input$p we would get if we add add_hyper_par after it, right?
  • We have to store the generated parameter values somewhere.
  • Can we access the complete par_set of the pipeline? add_hyper_par kind of modifies the par_set: at least it adds one param, maybe it even removes one (because it is generated within the function and therefore should not be set anymore)?
  • Can a parameter generated by add_hyper_par overwrite existing ones (only the ones coming from the right, or also some that are defined further left in the pipeline, given that I simply read a pipeline from left to right)?

@berndbischl (Member, Author)

  • I don't like map as a name; what about trafo, plain fun, or reparametrize?

Yup, my names were absolutely just an initial proposal. "fun" might be good for now?

  • add_hyper_par can be added at arbitrary parts of the pipe, right?

As it acts on a PipeOp, yes.

  • If we add an add_hyper_par before, e.g., a feature selection, then the input$p is different from the input$p we would get if we add add_hyper_par after it, right?

Sure, it takes the current input (which will be a task here) and computes on it. The "p" of the task is different before and after feature filtering.

  • We have to store the generated parameter values somewhere.

Good point! It should be stored in the par_vals after it is computed, I guess.

  • Can we access the complete par_set of the pipeline? add_hyper_par kind of modifies the par_set: at least it adds one param, maybe it even removes one (because it is generated within the function and therefore should not be set anymore)?

  1. For a graph / pipeline you can of course ask for the complete ParamSet. This is simply the union of all ParamSets of the included PipeOps, where names are made unique by prefixing the PipeOp id in front of every param name (see the sketch below).

  2. When you add a param (as discussed here) to a PipeOp, this would of course add the param to the pipeline automatically (see 1).

  3. Not sure if we want to have stuff removed / made invalid. It is a good point to consider.
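
As a concrete illustration of point 1, this prefixing is what mlr3pipelines does today for a Graph's ParamSet (real API; the exact ids depend on the PipeOps used):

library(mlr3)
library(mlr3pipelines)

gr = po("pca") %>>% po("learner", lrn("classif.rpart"))

# the Graph's ParamSet is the union of the PipeOps' ParamSets,
# with every param name prefixed by the id of its PipeOp
gr$param_set$ids()
# e.g. "pca.center", "pca.scale.", "pca.rank.", "classif.rpart.cp", ...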

  • Can a parameter generated by add_hyper_par overwrite existing ones (only the ones coming from the right, or also some that are defined further left in the pipeline, given that I simply read a pipeline from left to right)?

I don't get this question.

jakob-r (Member) commented Nov 12, 2018

I don't get this question.

... -> add_hyper_par(x, input) returns list(k = 3) -> ...
    -> add_hyper_par(x, input) returns list(k = 5) -> ... -> train(k = ?)

Question: Which k is now in effect for train?

This is just a stupid example but I could imagine that such problems can occur in more complicated pipelines. You do not even have to have two add_hyper_par steps. Just a setHyperParam (if such a thing exists) somewhere can cause conflicts.

@berndbischl (Member, Author)

I still totally don't get what you are asking... your code is incomplete. Can you please write down the complete pipeline and the PipeOps?

jakob-r (Member) commented Nov 12, 2018

It's just an imaginary example where you have defined the parameter k at two different positions of the pipeline, and the question now is how that should be handled.

@berndbischl (Member, Author)

It's just an imaginary example where you have defined the parameter k at two different positions of the pipeline, and the question now is how that should be handled.

I am not trying to be mean or obtuse; I really don't understand your example. Can you PLEASE make it more complete? You can make stuff up as you want; it does not need to run. But right now it does not make sense.

jakob-r (Member) commented Nov 12, 2018

We talked and the design looks like this:

Example:

op1 = PipeOpLearner$new("classif.rf")
op1$add_temp_hyperpar(
  par = ParamDbl$new("p.perc"),
  fun = function(x, input) list(p = round(x * input$n.features))
)
  • p.perc will be added to the ParamSet of op1 (so it will contain both: [p, p.perc])
  • the result of the fun will only be visible within op1
  • if p.perc is not set, the fun should not be triggered
  • if p is set and the fun overwrites it, we will throw an error (it should be possible to switch this to a warning)
  • a temp hyperparam can only be used for one PipeOp, so two PipeOps cannot use the same temp hyperparam
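
Hypothetical usage under this design might then look as follows (a sketch only; add_temp_hyperpar and the training-time behavior described here do not exist, and op1 is the object from the example above):

# set (or tune over) the temporary param; p is computed internally at train time
op1$param_set$values$p.perc = 0.5
# during training, fun(0.5, input) returns list(p = round(0.5 * input$n.features)),
# and that p is visible only within op1

# setting p directly as well would violate the rules above:
op1$param_set$values$p = 10
# -> training throws an error, since the fun would overwrite the user-set p
# (optionally downgradable to a warning)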

zzawadz commented Nov 13, 2018

p.perc will be added to the ParamSet of op1 (so it will contain both: [p, p.perc])

I think that p.perc should replace (or hide) the p param. Otherwise it might be confusing. What if I set both parameters?

For me, it would be nice to have something like:

op1 = PipeOpXXX$new()
op1$params$<TAB> # gives me a hint about all available parameters

# however, all elements of params are active bindings
op1$params$p.perc = ParamRemap$new(
  new = paradox::ParamReal$new("p.perc"),
  replace = op1$params$p,
  function(val, task) {
    val * length(task$feature_names)
  }
)

And after running the code above, the p will be hidden from op1$params, so the user will be able to remap the parameter just once.

Then, when the user would like to replace more parameters with one hyperparam, it will be as easy as:

op1 = PipeOpXXX$new()

# however, all elements of params are active bindings
op1$params$p.perc = ParamRemap$new(
  new = paradox::ParamReal$new("p.perc"),
  replace = list(
    op1$params$p,
    op1$params$p2
  ),
  function(val, task) {
    # return one new value per replaced param, in the same order
    list(val * length(task$feature_names), val * sqrt(length(task$feature_names)))
  }
)

And in this case both p and p2 will be hidden from params, to prevent further remapping based on them.

jakob-r (Member) commented Nov 13, 2018

This has some disadvantages:

  • The user has to correctly define which params are replaced.
  • It is error-prone, because you can mix up the order of the params.
  • It is ugly that the information is in two different places (the result of the function and the replace argument); see the above point.

Bernd proposed something like this:

op1$add_temp_hyperpar(
  ...,
  old.p = function(val, task) ...,
  old.p2 = function(val, task) ...
)

Then you directly have the information which param gets replaced, but you have to write a function for each, and it will get tiresome to set multiple params based on one single temporary one.

Although I see that it is obviously more elegant to hide the parameter that gets overwritten, I think it is too much effort. And maybe sometimes you want to set the param directly to a specific value and don't want to modify the pipeline; who knows. Simply throwing an error is the easiest solution that allows the most flexibility and needs the least coding.

pfistfl (Member) commented Nov 13, 2018

I am still not convinced that replacing parameters is something we should do in mlr3pipelines and not in paradox.

This is more or less an operation that works on ParamSets or Params.

op1$add_temp_hyperpar(
  paradox::ParamReal$new("mtry.perc", lower = 0.1, upper = 1,
    overwrites = "mtry",
    transformer = function(val, task) val * length(task$feature_names)),
  paradox::ParamReal$new("p2", lower = 0.1, upper = 1,
    overwrites = "p",
    transformer = function(val, task) val / length(task$feature_names))
)

This would IMHO be concise and sensible.

I am not sure I agree with the points raised by @jakob-r:

  • We always have to make sure that replaced params indeed exist. We could also actually check the spelling.
  • I get the order thing; we should not have this.

And maybe sometimes you want to set the param directly to a specific value and don't want to modify the pipeline.

Then you just set it?

Simply throwing an error is the easiest solution that allows the most flexibility and needs the least coding.

I get this, but we might want to be as robust as we sensibly can.

zzawadz commented Nov 13, 2018

This is more or less an operation that works on ParamSets or Params.

100% agree. We could even modify the op$par_set directly:

op$par_set$add_hyperpar(
  paradox::ParamReal$new("mtry.perc", lower = 0.1, upper = 1,
    overwrites = "mtry",
    transformer = function(val, task) val * length(task$feature_names))
)

It's a nice composition.

@berndbischl (Member, Author)

I discussed this with Martin.

The idea was this: we could simply try to handle this in the trafo of the param set.

ps = ParamSet$new(list(ParamDbl$new("mtry.perc", lower = 0, upper = 1)))
ps$trafo = function(x, param_set, task) {
  x$mtry = round(x$mtry.perc * task$p)
  x$mtry.perc = NULL
  return(x)
}

Problem is: when do we call the trafo? In a pipeline we have multiple steps, and that operation would need to be called EXACTLY when we go into the training of the op that a specific param refers to.
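
A minimal sketch of that timing requirement (hypothetical driver code, not how mlr3pipelines is implemented; train_with_values is a made-up internal call):

train_pipeop = function(op, task) {
  vals = op$param_set$values
  if (!is.null(op$param_set$trafo)) {
    # resolve temporary params exactly here, against the input this op sees
    vals = op$param_set$trafo(vals, op$param_set, task)
  }
  op$train_with_values(task, vals)  # made-up internal training call
}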

@mb706 transferred this issue from mlr-org/mlr3pipelines Aug 31, 2020