
What is the new scale_penalty_with_samples=true doing? #124

Closed
olivierlabayle opened this issue Aug 2, 2022 · 12 comments

@olivierlabayle (Collaborator)

Hi,

I have just inadvertently upgraded to 0.6 and witnessed massive undesirable changes in the output of my program. I think I have tracked it down to MLJLinearModels and the new hyperparameter scale_penalty_with_samples. I don't know exactly what it does, but from the experiment below it does not seem to be a good default by any means. Could you provide more information on this hyperparameter and motivate its introduction as a default?

using MLJLinearModels
using MLJBase
using StableRNGs
using Distributions

rng = StableRNG(123)
# scale_penalty_with_samples=false turns back to the 0.5.7 results
model = LogisticClassifier(fit_intercept=false)
logit(X, θ) = 1 ./ (1 .+ exp.(-X*θ))

N = 1000
P = 3
θ = [-1, 3, -10]
X = rand(rng, N, P)
μ_y = logit(X, θ)
y = zeros(N)
for index in eachindex(μ_y)
    y[index] = rand(rng, Bernoulli(μ_y[index]))
end

yc = categorical(y)
mach = machine(model, MLJBase.table(X), yc)
fit!(mach)
fitted_params(mach)
ypred = MLJBase.predict(mach, X)
mean_log_loss = mean(log_loss(ypred, yc))

## Output version 0.6.3:

# (classes = CategoricalArrays.CategoricalValue{Float64, UInt32}[0.0, 1.0],
#  coefs = [:x1 => -0.1548456157597273, :x2 => -0.13252104591451244, :x3 => -0.19765403186822833],
#  intercept = nothing,)

# logloss mean: 0.6043495080176984

# predict 5 first
# UnivariateFinite{Multiclass{2}}(0.0=>0.558, 1.0=>0.442)
# UnivariateFinite{Multiclass{2}}(0.0=>0.556, 1.0=>0.444)
# UnivariateFinite{Multiclass{2}}(0.0=>0.566, 1.0=>0.434)
# UnivariateFinite{Multiclass{2}}(0.0=>0.507, 1.0=>0.493)
# UnivariateFinite{Multiclass{2}}(0.0=>0.532, 1.0=>0.468)

## Output version 0.5.7:

# (classes = CategoricalArrays.CategoricalValue{Float64, UInt32}[0.0, 1.0],
#  coefs = [:x1 => -1.2849403317718433, :x2 => 1.978169995235155, :x3 => -6.365523473187024],
#  intercept = nothing,)

# logloss mean: 0.2351991376364725

# predict 5 first
#  UnivariateFinite{Multiclass{2}}(0.0=>0.998, 1.0=>0.00238)
#  UnivariateFinite{Multiclass{2}}(0.0=>0.9, 1.0=>0.1)
#  UnivariateFinite{Multiclass{2}}(0.0=>0.857, 1.0=>0.143)
#  UnivariateFinite{Multiclass{2}}(0.0=>0.642, 1.0=>0.358)
#  UnivariateFinite{Multiclass{2}}(0.0=>0.63, 1.0=>0.37)
@tlienart (Collaborator) commented Aug 2, 2022

It's a convention on the objective function; the point is to put the loss and the penalty on the same scale, so that if you have twice as much data you don't have to change the regularisation. In the case of ridge, for instance, the scaled objective is

1/n ||y - Xb||^2 + lambda * ||b||^2

which is equivalent to keeping the unscaled loss ||y - Xb||^2 and multiplying the penalty by n.
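For concreteness, here is a small sketch (not code from this thread) using ridge regression, where the minimiser has a closed form, showing that the per-sample convention above and the total-loss objective with the penalty scaled by n have the same minimiser:

```julia
using LinearAlgebra, Random

rng = MersenneTwister(42)
n, p = 50, 3
X = randn(rng, n, p)
y = X * [1.0, -2.0, 0.5] .+ 0.1 .* randn(rng, n)
λ = 0.1

# Minimiser of the per-sample objective (1/n)‖y - Xb‖² + λ‖b‖²:
b_scaled = (X'X ./ n + λ * I) \ (X'y ./ n)

# Minimiser of the total-loss objective ‖y - Xb‖² + (n·λ)‖b‖²:
b_total = (X'X + n * λ * I) \ (X'y)

b_scaled ≈ b_total  # true: the two conventions give the same solution
```

Multiplying the whole objective by n changes nothing about the argmin, which is why scaling the penalty with the sample size keeps a fixed lambda meaningful as n grows.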

@tlienart (Collaborator) commented Aug 2, 2022

PS: wait, I'm confused: you made those changes, didn't you? I don't think anyone has touched that logic since you did.

Ah no, it's not you, it's @jbrea; maybe he can chip in if you have further questions.

Note: in any case I think that parameter is best obtained via hyperparameter optimisation.

@olivierlabayle (Collaborator, Author)

Thanks for the explanation @tlienart. I think users (like me) will expect the default behavior of an algorithm, especially one as simple as logistic regression, to work out of the box and provide a reasonable fit with the default hyperparameters. This new hyperparameter does seem to mess things up as far as I can see; the output is almost like a biased coin toss. It would probably make more sense to default to false, wouldn't it? Moreover, that would have been a non-breaking change from 0.5.7, if I have followed the history correctly.

@tlienart (Collaborator) commented Aug 3, 2022

Please have a look at #108 for the reasoning behind it, specifically the tuning.

I don't think you can expect a default that is not scaled to work well across the board for users. More generally I don't think you can expect a good default for this full stop. These parameters must be tuned and the tuning should not be affected by sample size.

@jbrea (Collaborator) commented Aug 3, 2022

I think users (like me) will expect the default behavior of an algorithm, especially one as simple as logistic regression, to work out of the box and provide a reasonable fit with the default hyperparameters.

I don't think you can expect a default that is not scaled to work well across the board for users. More generally I don't think you can expect a good default for this full stop. These parameters must be tuned and the tuning should not be affected by sample size.

I agree with both. I think scale_penalty_with_samples = true is the better default. However, the current default lambda = 1 means rather strong regularisation when the input is (close to) standardised, which may be the most common case (see below). In this case, users usually have to lower lambda to avoid underfitting. We could also set the default to lambda = 1e-8 (or some other small value), on the grounds that it barely affects the unregularised solution in the non-separable case while still avoiding runaway solutions in the separable case. Users would then usually have to deal with overfitting instead.

So, if we have good evidence that (1) (close to) standardised input is the most common case and (2) the majority of users perceive (potential) overfitting as more reasonable than (potential) underfitting, I would argue for lowering the default lambda.


If we write the solution of logistic regression as $\theta = \beta \tilde\theta$ with $\|\tilde\theta\|_2 = 1$, assume perfect separability and uncorrelated predictors with mean 0 and variance 1 (so that roughly $y_i\tilde\theta'x_i \approx 1$), then we can find $\beta$ by minimising $\log(1 + \exp(-\beta)) + \frac{\lambda}{2}\beta^2$. For $\lambda = 1$ the solution is approximately $\beta \approx 0.4$, and therefore the prediction for the correct class is approximately $1/(1 + \exp(-\beta)) \approx 0.6$. This looks heavily regularised for a separable problem. For lambda = 1e-8 the prediction for the correct class would be essentially 1.
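As a sanity check (a sketch, not code from this thread), the one-dimensional minimisation above can be solved numerically with Newton's method in plain Julia:

```julia
# Minimise f(β) = log(1 + exp(-β)) + (λ/2)β² for λ = 1.
σ(x) = 1 / (1 + exp(-x))

# f'(β) = β - σ(-β) and f''(β) = 1 + σ(β)σ(-β) > 0, so f is strictly
# convex and Newton's method converges quickly from β = 0.
function argmin_beta(; iters = 25)
    β = 0.0
    for _ in 1:iters
        β -= (β - σ(-β)) / (1 + σ(β) * σ(-β))
    end
    β
end

β = argmin_beta()
@show β σ(β)   # β ≈ 0.40; predicted probability of the correct class σ(β) ≈ 0.60
```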

@tlienart (Collaborator) commented Aug 3, 2022

Thanks, I like this suggestion

@olivierlabayle (Collaborator, Author)

I also agree. Why not go all the way to lambda = 0, which is vanilla (unpenalised) logistic regression?

@jbrea (Collaborator) commented Aug 4, 2022

why not go all the way to lambda = 0

We could do this. I just don't much like the fact that, in the separable case, the solution would have infinite norm, $\|\theta\|_2 = \infty$, which obviously is never reached by any optimiser. Therefore I would prefer the default to be at least lambda = eps().
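To illustrate the separable-case problem (a sketch, not code from this thread): on separable data the unpenalised logistic loss has no finite minimiser, so gradient descent grows the coefficient without bound, while any positive penalty pins down a finite stationary point.

```julia
# A single observation x = 1, y = +1, which is trivially separable.
# Penalised loss: f(β) = log(1 + exp(-β)) + (λ/2)β².
σ(x) = 1 / (1 + exp(-x))

function fit_beta(λ; steps = 10_000, η = 0.5)
    β = 0.0
    for _ in 1:steps
        β -= η * (-σ(-β) + λ * β)   # gradient step on f
    end
    β
end

fit_beta(0.0; steps = 1_000)     # β keeps growing without a penalty...
fit_beta(0.0; steps = 100_000)   # ...noticeably larger still: no finite minimiser
fit_beta(1e-2)                   # λ > 0 converges to a finite stationary point (≈ 3.4)
```

With λ = 0 the coefficient grows roughly logarithmically in the number of steps, so the "solution" depends entirely on when the optimiser stops; even a tiny λ removes that pathology.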

@tlienart (Collaborator) commented Aug 4, 2022

Thanks both for the discussion, default set to eps(), patch release under way.

@tlienart tlienart closed this as completed Aug 4, 2022
@ablaom (Member) commented Aug 4, 2022

@tlienart This is breaking, no? I think we need a breaking (minor) release, not a patch. Or am I missing something?

@tlienart (Collaborator) commented Aug 5, 2022

Ok, would you mind doing it? Thanks!

Done! 5bb7c6d#commitcomment-80390857

@ablaom (Member) commented Aug 9, 2022

Thanks @tlienart. I'm making a PR to General to yank 0.6.5 from the registry.
