In #18 I propose using a grid search to fit the classifier hyperparameters (notebook). We end up with average performance across cross-validation folds for many hyperparameter combinations. Here's the performance visualization from the notebook:
So the question is: given a performance grid, how do we pick the optimal parameter combination? Picking just the highest performer can be a recipe for overfitting.
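For concreteness, here's a minimal sketch of the kind of search I mean. The estimator, parameter grid, and data below are placeholders rather than the notebook's exact setup:

```python
# Minimal sketch of a cross-validated grid search; the estimator, parameter
# grid, and data here are placeholders, not the notebook's exact setup.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {
    'alpha': [10 ** k for k in range(-3, 2)],  # regularization strength
    'l1_ratio': [0.0, 0.25, 0.5, 0.75, 1.0],   # elastic net mixing parameter
}

grid = GridSearchCV(
    SGDClassifier(penalty='elasticnet', random_state=0),
    param_grid,
    scoring='roc_auc',
    cv=5,
)
grid.fit(X, y)

# One row per hyperparameter combination, with performance averaged across
# the cross-validation folds -- this is the "performance grid" in question.
results = pd.DataFrame(grid.cv_results_)
print(results[['param_alpha', 'param_l1_ratio', 'mean_test_score', 'std_test_score']])
```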
Here's a sklearn guide that doesn't answer my question but is still helpful. See also #19 (comment), where overfitting has been mentioned. I'm paging @antoine-lizee, who has dealt with this issue in the past and who can hopefully provide solutions from afar, as he lives in the Hexagon.
The sklearn documentation isn't great at describing how they define optimal parameters... they also seem to muddle the usage of training/testing/holdout sets! See the discussion about this here.
For this type of data, I think the best way to define "optimal" is based on average test-set cross-validation performance. Looking at the source code, it looks like the closest thing to this is setting `iid=True`. It's the default setting, so I don't think we should worry too much about overfitting if we take the max here.
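To be concrete about what "take the max" means in scikit-learn terms, here's a sketch against the `cv_results_` from the grid search above. (Note that the `iid` parameter was later deprecated and removed from newer scikit-learn releases, so this just uses the plain per-fold average.)

```python
import pandas as pd

# Assumes `grid` is the fitted GridSearchCV object from the sketch above.
results = pd.DataFrame(grid.cv_results_)

# "Take the max": pick the hyperparameter combination whose score, averaged
# across the cross-validation test folds, is highest.
best_idx = results['mean_test_score'].idxmax()
print(results.loc[best_idx, ['params', 'mean_test_score', 'std_test_score']])

# GridSearchCV's own best_params_ / best_index_ use the same criterion.
print(grid.best_params_)
```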
I guess we should just see whether overfitting becomes an issue with the max-cross-validated-performance criterion. I'm worried that it will, especially if we want to evaluate a large number of hyperparameter settings. The example figure above required 63 combinations. The cruel reality of max grid search is that the more extensively you evaluate the possibilities, the more overfitting you will endure.
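To make that worry concrete, here's a toy simulation (not project code): even if every hyperparameter combination had identical true performance, the maximum of the noisy cross-validated estimates would still look better than the truth, and the optimism grows with the number of combinations evaluated.

```python
# Toy illustration of selection bias: the max of many noisy CV estimates is
# optimistically biased even when every combination is equally good.
import numpy as np

rng = np.random.default_rng(0)
true_score = 0.70   # identical true performance for every combination
noise_sd = 0.02     # assumed standard error of a cross-validated estimate

for n_combos in (1, 10, 63, 500):
    # 10,000 simulated grid searches, each picking the max over n_combos
    estimates = rng.normal(true_score, noise_sd, size=(10_000, n_combos))
    selected = estimates.max(axis=1).mean()
    print(f"{n_combos:4d} combinations -> mean selected score {selected:.3f} "
          f"(true score {true_score:.2f})")
```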
> `lambda.min` is the value of λ that gives minimum mean cross-validated error. The other λ saved is `lambda.1se`, which gives the most regularized model such that error is within one standard error of the minimum. To use that, we only need to replace `lambda.min` with `lambda.1se` above.
In my personal experience, `lambda.1se` produces better models. Unfortunately, for our general grid search, many settings don't have a natural directionality that would allow us to use the glmnet approach.
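For a setting that does have a natural directionality (a single regularization strength, say), a glmnet-style one-standard-error rule could be approximated from `cv_results_` along the lines below. This is my own adaptation under stated assumptions, not something GridSearchCV provides:

```python
import numpy as np
import pandas as pd

# Sketch of a glmnet-style one-standard-error rule over a single
# hyperparameter with a natural complexity ordering (here 'alpha', where
# larger alpha means more regularization). Assumes `grid` was fit over
# 'alpha' alone with 5 folds; sklearn has no built-in equivalent.
n_folds = 5  # fold count used in the grid search above (assumption)

results = pd.DataFrame(grid.cv_results_).sort_values('param_alpha')
sem = results['std_test_score'] / np.sqrt(n_folds)  # std. error of the mean

best = results['mean_test_score'].idxmax()
threshold = results.loc[best, 'mean_test_score'] - sem[best]

# The most regularized model whose mean CV score is within one standard
# error of the best mean CV score (analogue of lambda.1se).
within_one_se = results[results['mean_test_score'] >= threshold]
alpha_1se = within_one_se['param_alpha'].max()

print('max-performance choice (lambda.min analogue):', results.loc[best, 'param_alpha'])
print('one-standard-error choice (lambda.1se analogue):', alpha_1se)
```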