
should scale_penalty_with_samples = true be default? #149

Closed
alex-s-gardner opened this issue Aug 31, 2023 · 4 comments

Comments

@alex-s-gardner

alex-s-gardner commented Aug 31, 2023

Just sinking my teeth into MLJLinearModels, and I see that scale_penalty_with_samples = true is the default. Playing around with a number of toy datasets, it seems that scale_penalty_with_samples = true does not produce intuitive results while scale_penalty_with_samples = false does, e.g.:

# create data 
t = 1:0.001:10;
y = 10 .+ 10 * sin.(t) .+ 5 * t .+ randn(length(t)) * 2 .+
    rand((zeros(round(Int64, length(t) / 5))..., 6, -8, 100, -200, 178, -236, 77,
          -129, -50, -100, -45, -33, -114, -1010, -1238, -2000), length(t)) * 0.5;
X = hcat(ones(length(t)), sin.(t), t);
scatter(t, y; markerstrokecolor=:match, markerstrokewidth=0, label = "obs", ylim = (0, 70))

# Base LSQ model fit
θ = X \ y;
scatter!(t, X * θ, markerstrokewidth=0, label="Base lsq")

# pass X[:, 2:end] since fit adds an intercept by default
θ = fit(LinearRegression(), X[:, 2:end], y);
scatter!(t, hcat(X[:, 2:end], ones(length(t))) * θ, markerstrokewidth=0, label="linear")

θ = fit(HuberRegression(scale_penalty_with_samples=true), X[:, 2:end], y);
scatter!(t, hcat(X[:, 2:end], ones(length(t))) * θ, markerstrokewidth=0, label="huber")

θ = fit(HuberRegression(scale_penalty_with_samples=false), X[:, 2:end], y);
scatter!(t, hcat(X[:, 2:end], ones(length(t))) * θ, markerstrokewidth=0, label="huber-no_scale_penalty")
(Screenshot, 2023-08-31: scatter plot comparing the observations with the base LSQ, linear, huber, and huber-no_scale_penalty fits.)

Given this, should scale_penalty_with_samples = false be made the default, or is there a logical reason that it is not?
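For context on what the flag does (my reading of the option; I may be wrong): with scale_penalty_with_samples = true the penalty term is multiplied by the number of samples n, so the objective is roughly Σᵢ ρ(yᵢ − xᵢᵀθ) + n·λ·‖θ‖². Because the loss is a sum over samples, both terms then grow with n and the regularization strength stays constant as data accumulates; without the scaling, the penalty's relative weight vanishes as n grows. A minimal one-dimensional ridge sketch (written in Python for easy checking, not MLJLinearModels' code; `ridge_theta` is a name I made up):

```python
# Hypothetical 1-D ridge regression (no intercept), minimizing
#   sum_i (y_i - theta * x_i)^2 + penalty * theta^2
# Closed form: theta = sum(x*y) / (sum(x^2) + penalty).
def ridge_theta(x, y, lam, scale_with_n):
    # scale_with_n mimics scale_penalty_with_samples: penalty = lam * n
    penalty = lam * len(x) if scale_with_n else lam
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + penalty)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0 * xi for xi in x]          # true slope is 2.0

for k in (1, 100):                  # duplicate the data to grow n
    xs, ys = x * k, y * k
    print(k,
          ridge_theta(xs, ys, 1.0, True),    # scaled: shrinkage independent of n
          ridge_theta(xs, ys, 1.0, False))   # unscaled: approaches 2.0 as n grows
```

Duplicating the data 100× leaves the scaled estimate's shrinkage unchanged, while the unscaled estimate drifts toward the unpenalized slope 2.0. That is consistent with the plots above: with ~9000 samples, the scaled default penalty is large enough to visibly bias the fit, while scale_penalty_with_samples = false tracks the base LSQ fit.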

@tlienart
Collaborator

tlienart commented Aug 31, 2023

I think this tends to depend on what you use MLJLM for, and how.

Looping in @jbrea since he's the one who introduced the change and had a good rationale for it (I think there's an issue where this was discussed; I'll try to find it).

Yeah here's a previous discussion: #124

@alex-s-gardner
Author

@tlienart thanks for the link to the previous discussion. To me it seems a smaller default value for lambda probably should have been set. New users, if moving quickly, will conclude that several of the models give very bad predictions given the defaults. I realize that each use case is different, but it seems to me that a lambda would be wanted in only a minority of use cases.

@tlienart
Collaborator

tlienart commented Aug 31, 2023

Isn't that what the end of the discussion and the relevant commit did?

In any case, if you look at scikit-learn for instance, which is one of the reference implementations out there, they use scaling, and they use an L2 penalty with a nontrivial lambda by default for logistic regression.

I personally don't have a strong opinion on this; to me it's a matter of having correct docs and potentially guiding the user in what they should do (e.g. hyperparameter tuning). If you feel that the docs were unclear, please consider opening a PR for it.

@alex-s-gardner
Author

I think expanding the docs with examples of the different regression options would be a good idea. I'll open a new issue for this with the hope of supplying a PR soon.
