Preventing overfitting when evaluating many hyperparameters #20

dhimmel opened this issue Jul 28, 2016 · 2 comments
dhimmel commented Jul 28, 2016

In #18 I propose using a grid search to tune the classifier hyperparameters (notebook). We end up with the average performance across cross-validation folds for many hyperparameter combinations. Here's the performance visualization from the notebook:

[Figure: cross-validated performance grid]
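For reference, here is a minimal sketch of how a grid like this can be produced with the current scikit-learn API. The estimator, toy dataset, and parameter values below are illustrative stand-ins, not the notebook's actual pipeline:

```python
# Sketch of producing a cross-validated performance grid with scikit-learn.
# The classifier and parameter values are illustrative, not the real pipeline.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy dataset standing in for the project's feature matrix and labels
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {
    'C': [0.001, 0.01, 0.1, 1, 10],  # inverse regularization strength
    'penalty': ['l1', 'l2'],         # type of regularization
}

grid = GridSearchCV(
    LogisticRegression(solver='liblinear'),
    param_grid,
    scoring='roc_auc',
    cv=5,  # scores are averaged across these folds
)
grid.fit(X, y)

# One row per hyperparameter combination, with the fold-averaged test score;
# pivoting gives the kind of grid shown in the figure above.
cv_results = pd.DataFrame(grid.cv_results_)
performance_grid = cv_results.pivot(
    index='param_C', columns='param_penalty', values='mean_test_score')
print(performance_grid)
```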

So the question is: given a performance grid, how do we pick the optimal parameter combination? Picking just the highest performer can be a recipe for overfitting.

Here's an sklearn guide that doesn't answer my question but is still helpful. See also #19 (comment), where overfitting has been mentioned. I'm paging @antoine-lizee, who has dealt with this issue in the past and who can hopefully provide solutions from afar, as he lives in the Hexagon.

gwaybio commented Jul 29, 2016

The sklearn documentation isn't great at describing how they define optimal parameters... they also seem to muddle their usage of training/testing/holdout sets! See the discussion about this here.

For this type of data, I think the best way to define optimal is based on "average test-set cross-validation performance". Looking at the source code, the closest thing to this seems to be setting iid = True. It's the default setting, so I don't think we should worry too much about overfitting if we take the max here.
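In code terms, a sketch of that selection rule, assuming a fitted GridSearchCV object like the `grid` sketched in the opening comment and its cv_results_ interface:

```python
# Sketch of the "take the max of the fold-averaged test score" rule,
# assuming a fitted GridSearchCV object named `grid` as sketched above.
import pandas as pd

cv_results = pd.DataFrame(grid.cv_results_)

# Pick the combination with the best average cross-validated test score.
best_index = cv_results['mean_test_score'].idxmax()
best_row = cv_results.loc[best_index]

print('best parameters: ', best_row['params'])
print('mean CV score:   ', best_row['mean_test_score'])
print('std across folds:', best_row['std_test_score'])

# GridSearchCV applies the same max-of-mean-test-score criterion internally.
print('GridSearchCV best_params_:', grid.best_params_)
```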

dhimmel commented Aug 1, 2016

> It's the default setting, so I don't think we should worry too much about overfitting if we take the max here.

I guess we should just see whether overfitting becomes an issue from the max-cross-validated performance criterion. I'm worried that it will, especially if we want to evaluate a large number of hyperparameter settings. The example figure above required 63 combinations. So the cruel reality of max grid search is that the more extensively you evaluate the possibilities, the more overfitting you will endure.

For example, in the R glmnet package there are two built-in options for choosing the regularization strength from cross-validation:

> lambda.min is the value of λ that gives minimum mean cross-validated error. The other λ saved is lambda.1se, which gives the most regularized model such that error is within one standard error of the minimum. To use that, we only need to replace lambda.min with lambda.1se above.

In my personal experience, lambda.1se produces better models. Unfortunately, for our general grid search, many hyperparameters don't have a natural directionality toward more regularization that would allow us to use the glmnet approach.
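For a single regularization parameter, the one-standard-error rule could be approximated on the grid-search results. A rough Python analogue of glmnet's lambda.1se, assuming the `grid` object from the sketches above and approximating the standard error as the fold-to-fold standard deviation divided by the square root of the number of folds (glmnet computes its error bars somewhat differently):

```python
# Rough analogue of glmnet's lambda.1se rule, assuming a fitted GridSearchCV
# object `grid` as above. For simplicity, C (inverse regularization strength,
# smaller = more regularized) is treated as the only knob with a direction.
import numpy as np
import pandas as pd

results = pd.DataFrame(grid.cv_results_)
n_folds = grid.n_splits_

scores = results['mean_test_score']
std_err = results['std_test_score'] / np.sqrt(n_folds)

best = scores.idxmax()
threshold = scores[best] - std_err[best]

# Among combinations whose mean score is within one standard error of the
# best, pick the most regularized one (smallest C).
candidates = results[scores >= threshold]
one_se_row = candidates.loc[candidates['param_C'].astype(float).idxmin()]
print('one-standard-error choice:', one_se_row['params'])
```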
