What is the new `scale_penalty_with_samples=true` doing? #124

Hi,

I have just inadvertently upgraded to 0.6 and witnessed massive undesirable changes in the output of my program. I think I could nail it down to MLJLinearModels and the new hyperparameter `scale_penalty_with_samples`. I don't exactly know what this is doing, but from the experiment below it does not seem to be a good default by any means. Could you provide more information on this hyperparameter and motivate its introduction as a default?

Comments
It's a convention on the objective function; the reason is to have the scale of the loss and the penalty on the same grounds (so that if you have twice as much data, you don't have to change the regularisation). In the case of ridge, for instance, the objective becomes $\|X\theta - y\|_2^2 + n\,\lambda\,\|\theta\|_2^2$ with $n$ the number of samples, so this is equivalent to multiplying $\lambda$ by $n$.
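To make the convention concrete, here is a minimal plain-Julia sketch (my own, not the package's internals, assuming the ridge objective written above up to constant factors): with the sample-scaled penalty, doubling the dataset leaves the solution unchanged, whereas with the unscaled penalty the same `λ` regularises only half as much.

```julia
using LinearAlgebra, Random

Random.seed!(1)
n, p = 50, 3
X = randn(n, p)
y = X * [1.0, -2.0, 0.5] + 0.1 * randn(n)
λ = 1.0

# Closed-form ridge under the two conventions (no intercept, for brevity):
ridge_scaled(X, y, λ)   = (X'X + size(X, 1) * λ * I) \ (X'y)   # penalty n·λ‖θ‖²
ridge_unscaled(X, y, λ) = (X'X + λ * I) \ (X'y)                # penalty   λ‖θ‖²

# "Twice as much data": stack two copies of the dataset.
X2, y2 = vcat(X, X), vcat(y, y)

@show ridge_scaled(X, y, λ)   ≈ ridge_scaled(X2, y2, λ)    # true:  same fit
@show ridge_unscaled(X, y, λ) ≈ ridge_unscaled(X2, y2, λ)  # false: weaker effective penalty
```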
Ah no, it's not you, it's @jbrea; maybe he can chip in if you have further questions. Note: in any case I think that parameter is best obtained via hyperparameter optimisation.
Thanks for the explanation @tlienart. I think users (like me) will expect the default behaviour of an algorithm, especially one as simple as logistic regression, to work out of the box and provide a reasonable fit with the default hyperparameters. This new hyperparameter does seem to mess things up as far as I can see; the output is almost like a random biased coin toss. It would probably make more sense to default to `false`, wouldn't it? Moreover, this would have been a non-breaking change from 0.5.7, if I have followed the history correctly.
Please have a look at #108 for the reasoning behind it, specifically the tuning. I don't think you can expect a default that is not scaled to work well across the board for users. More generally, I don't think you can expect a good default for this, full stop. These parameters must be tuned, and the tuning should not be affected by sample size.
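As a concrete illustration of that advice (not from this thread), `lambda` can be tuned on a log-spaced grid with cross-validation through MLJ; this is a rough sketch, and the exact names used below (`range`, `TunedModel`, `Grid`, the `log_loss` measure, the `@load_crabs` demo dataset) should be checked against the MLJ documentation for the installed version.

```julia
using MLJ  # assumes MLJ and MLJLinearModels are both installed

LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels verbosity=0

X, y = @load_crabs            # any small classification dataset will do
model = LogisticClassifier()

# Search lambda on a log10 scale and score with cross-validated log-loss,
# so the chosen value reflects the data rather than a fixed default.
r = range(model, :lambda, lower=1e-6, upper=1e2, scale=:log10)
tuned = TunedModel(model=model, range=r, tuning=Grid(resolution=15),
                   resampling=CV(nfolds=5, shuffle=true), measure=log_loss)

mach = machine(tuned, X, y)
fit!(mach, verbosity=0)
report(mach).best_model       # inspect the selected lambda
```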
I agree with both. I think … So, if we have good evidence that 1) (close to) standardised input is the most common case and 2) the majority of users perceive (potential) overfitting as a more reasonable fit than (potential) underfitting, I would argue for lowering the default `lambda`. If we write the solution of logistic regression as $\theta^\star(\lambda) = \arg\min_\theta \sum_{i=1}^n \log\bigl(1 + \exp(-y_i\, x_i^\top \theta)\bigr) + n\,\lambda\,\|\theta\|_2^2$, …
Thanks, I like this suggestion.
Also agree; why not go all the way to `lambda = 0`, which is vanilla logistic regression?
We could do this. I just don't like too much the fact that, in the separable case, the solution would have infinite norm.
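To make that concern concrete (a toy sketch, not from the thread): on linearly separable data the unpenalised logistic loss keeps decreasing as the coefficients are scaled up, so with `lambda = 0` there is no finite minimiser.

```julia
# 1-D toy data, perfectly separated at 0, with labels in {-1, +1}.
x = [-2.0, -1.0, 1.0, 2.0]
y = [-1.0, -1.0, 1.0, 1.0]

# Unpenalised logistic loss as a function of the single coefficient θ.
logistic_loss(θ) = sum(log1p.(exp.(-(y .* (θ .* x)))))

for θ in (1, 10, 100, 1_000)
    println("θ = $θ:  loss = ", logistic_loss(θ))   # strictly decreasing towards 0
end
```

A tiny positive `lambda` keeps the minimiser finite while barely penalising the fit on non-separable data.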
Thanks both for the discussion, default set to …
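(Not from the thread.) For anyone bitten by the upgrade, one way to make code robust to shifting defaults is to pin the relevant hyperparameters explicitly rather than relying on them. The sketch below is hedged: the keyword name is the one discussed above, and the direct `fit` call follows the package README pattern; both should be checked against the installed version's docstrings.

```julia
using MLJLinearModels

# Pin the penalty strength and the scaling convention explicitly so that a
# package upgrade changing the defaults does not silently change the fit.
lr = LogisticRegression(1.0; scale_penalty_with_samples = false)

# Direct-API sketch: X is a design matrix, y a vector of ±1 labels.
X = randn(200, 3)
y = sign.(X * [1.0, -1.0, 0.5])
θ = MLJLinearModels.fit(lr, X, y)   # coefficients (plus intercept if fitted)
```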
@tlienart This is breaking, no? I think we need a breaking (minor) release, not a patch. Or am I missing something?
Thanks @tlienart. I'm making a PR to General to yank 0.6.5 from the registry.