Preventing overfitting when evaluating many hyperparameters #20

dhimmel opened this issue Jul 28, 2016 · 2 comments
dhimmel commented Jul 28, 2016

In #18 I propose using a grid search to tune the classifier hyperparameters (notebook). We end up with the average performance across cross-validation folds for many hyperparameter combinations. Here's the performance visualization from the notebook:

[Figure: cross-validated performance grid]
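For reference, here is a minimal sketch of how a grid like this can be produced with the current scikit-learn API. The estimator, toy dataset, and parameter values below are illustrative stand-ins, not the notebook's actual pipeline:

```python
# Sketch of producing a cross-validated performance grid with scikit-learn.
# The classifier and parameter values are illustrative, not the real pipeline.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy dataset standing in for the project's feature matrix and labels
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {
    'C': [0.001, 0.01, 0.1, 1, 10],  # inverse regularization strength
    'penalty': ['l1', 'l2'],         # type of regularization
}

grid = GridSearchCV(
    LogisticRegression(solver='liblinear'),
    param_grid,
    scoring='roc_auc',
    cv=5,  # scores are averaged across these folds
)
grid.fit(X, y)

# One row per hyperparameter combination, with the fold-averaged test score;
# pivoting gives the kind of grid shown in the figure above.
cv_results = pd.DataFrame(grid.cv_results_)
performance_grid = cv_results.pivot(
    index='param_C', columns='param_penalty', values='mean_test_score')
print(performance_grid)
```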

So the question is: given a performance grid, how do we pick the optimal parameter combination? Picking just the highest performer can be a recipe for overfitting.

Here's an sklearn guide that doesn't answer my question but is still helpful. See also #19 (comment), where overfitting has been mentioned. I'm paging @antoine-lizee, who has dealt with this issue in the past and who can hopefully provide solutions from afar, as he lives in the Hexagon.

gwaybio commented Jul 29, 2016

The sklearn documentation isn't great at describing how they define optimal parameters... they also seem to muddle their usage of training/testing/holdout sets! See the discussion about this here.

For this type of data, I think the best way to define optimal is based on "average test-set cross-validation performance". Looking at the source code, the closest thing to this seems to be setting iid = True. It's the default setting, so I don't think we should worry too much about overfitting if we take the max here.
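In code terms, a sketch of that selection rule, assuming a fitted GridSearchCV object like the `grid` sketched in the opening comment and its cv_results_ interface:

```python
# Sketch of the "take the max of the fold-averaged test score" rule,
# assuming a fitted GridSearchCV object named `grid` as sketched above.
import pandas as pd

cv_results = pd.DataFrame(grid.cv_results_)

# Pick the combination with the best average cross-validated test score.
best_index = cv_results['mean_test_score'].idxmax()
best_row = cv_results.loc[best_index]

print('best parameters: ', best_row['params'])
print('mean CV score:   ', best_row['mean_test_score'])
print('std across folds:', best_row['std_test_score'])

# GridSearchCV applies the same max-of-mean-test-score criterion internally.
print('GridSearchCV best_params_:', grid.best_params_)
```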

dhimmel commented Aug 1, 2016

> It's the default setting, so I don't think we should worry too much about overfitting if we take the max here.

I guess we should just see whether overfitting becomes an issue from the max-cross-validated performance criterion. I'm worried that it will, especially if we want to evaluate a large number of hyperparameter settings. The example figure above required 63 combinations. So the cruel reality of max grid search is that the more extensively you evaluate the possibilities, the more overfitting you will endure.

For example, in the R glmnet package there are two built-in options for choosing the regularization strength from cross-validation:

> lambda.min is the value of λ that gives minimum mean cross-validated error. The other λ saved is lambda.1se, which gives the most regularized model such that error is within one standard error of the minimum. To use that, we only need to replace lambda.min with lambda.1se above.

In my personal experience, lambda.1se produces better models. Unfortunately, for our general grid search, many hyperparameters don't have a natural directionality toward more regularization that would allow us to use the glmnet approach.
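For a single regularization parameter, the one-standard-error rule could be approximated on the grid-search results. A rough Python analogue of glmnet's lambda.1se, assuming the `grid` object from the sketches above and approximating the standard error as the fold-to-fold standard deviation divided by the square root of the number of folds (glmnet computes its error bars somewhat differently):

```python
# Rough analogue of glmnet's lambda.1se rule, assuming a fitted GridSearchCV
# object `grid` as above. For simplicity, C (inverse regularization strength,
# smaller = more regularized) is treated as the only knob with a direction.
import numpy as np
import pandas as pd

results = pd.DataFrame(grid.cv_results_)
n_folds = grid.n_splits_

scores = results['mean_test_score']
std_err = results['std_test_score'] / np.sqrt(n_folds)

best = scores.idxmax()
threshold = scores[best] - std_err[best]

# Among combinations whose mean score is within one standard error of the
# best, pick the most regularized one (smallest C).
candidates = results[scores >= threshold]
one_se_row = candidates.loc[candidates['param_C'].astype(float).idxmin()]
print('one-standard-error choice:', one_se_row['params'])
```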
