
Help with documentation review #136

Closed

ablaom opened this issue Jan 20, 2023 · 9 comments


ablaom commented Jan 20, 2023

I'm considering having a stab at some of #135 but could do with some help.

  1. What does "EN" mean here?

[Screenshot of the doc page in which "EN" appears.]

This appears on this doc page.

  2. The same doc page gives nice tables of model algorithms, but the corresponding MLJ model types are not listed. It would be good to have these, to save some detective work (and the user certainly wants this anyway). To make it easier, I'm copying the lists below:
| Regressors | Formulation¹ | Available solvers | Comments | Model type |
| --- | --- | --- | --- | --- |
| OLS & Ridge | L2Loss + 0/L2 | Analytical² or CG³ | | ? |
| Lasso & Elastic-Net | L2Loss + 0/L2 + L1 | (F)ISTA⁴ | | ? |
| Robust 0/L2 | RobustLoss⁵ + 0/L2 | Newton, NewtonCG, LBFGS, IWLS-CG⁶ | no scale⁷ | ? |
| Robust L1/EN | RobustLoss + 0/L2 + L1 | (F)ISTA | | ? |
| Quantile⁸ + 0/L2 | RobustLoss + 0/L2 | LBFGS, IWLS-CG | | ? |
| Quantile L1/EN | RobustLoss + 0/L2 + L1 | (F)ISTA | | ? |

| Classifiers | Formulation | Available solvers | Comments | Model type |
| --- | --- | --- | --- | --- |
| Logistic 0/L2 | LogisticLoss + 0/L2 | Newton, Newton-CG, LBFGS | yᵢ ∈ {±1} | ? |
| Logistic L1/EN | LogisticLoss + 0/L2 + L1 | (F)ISTA | yᵢ ∈ {±1} | ? |
| Multinomial 0/L2 | MultinomialLoss + 0/L2 | Newton-CG, LBFGS | yᵢ ∈ {1,...,c} | ? |
| Multinomial L1/EN | MultinomialLoss + 0/L2 + L1 | ISTA, FISTA | yᵢ ∈ {1,...,c} | ? |
  3. Also, could we have a mapping from the human name of each solver (appearing in the table) to the Julia object to set as the value in the model struct?

@tlienart @jbrea

@tlienart

  1. EN is Elastic Net, the sum of an L1 and an L2 penalty.
  2. Assuming you want input -> output types:
     • all of the regressors are continuous -> continuous
     • logistic classifiers are continuous -> binary
     • multinomial classifiers are continuous -> multiclass

All of them are deterministic; this repo is purely about finding what people would call the MLE or MAP estimator.

  3. I think what you want is to know how to set the solver field; a user could (though usually won't) indicate one of the relevant solvers defined here: https://github.com/JuliaAI/MLJLinearModels.jl/blob/dev/src/fit/solvers.jl for the appropriate model. So, for instance, if the column says Analytical or CG then
solver = CG(...)
solver = Analytical(...)

would work for that model. The full mapping (see also the sketch after the list):

  • Analytical -> Analytical(...) (analytical formula, or an iterative Krylov-style solve that would be very close to analytical)
  • CG -> CG(...) (conjugate gradient)
  • ISTA -> ISTA(...) (iterative soft thresholding = proximal descent for L1)
  • FISTA -> FISTA(...) (fast iterative soft thresholding = same but with Nesterov-style acceleration)
  • Newton -> Newton(...) (Newton method with a full Hessian solve)
  • NewtonCG -> NewtonCG(...) (same, but solving the Hessian with CG)
  • LBFGS -> LBFGS(...) (wrapper around Optim.LBFGS)
  • IWLS-CG -> IWLSCG(...) (iteratively reweighted least squares with a CG solve)
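To make that concrete, a minimal sketch of passing a solver to a couple of the MLJ model structs (assuming only the exported solver constructors listed above, with hyperparameter names like lambda as in the current docstrings):

```julia
using MLJLinearModels

# the Ridge row says "Analytical or CG", so either is a valid solver value:
ridge = RidgeRegressor(lambda = 1.0, solver = Analytical())
ridge = RidgeRegressor(lambda = 1.0, solver = CG())

# the Logistic 0/L2 row lists Newton, Newton-CG, LBFGS:
clf = LogisticClassifier(lambda = 1.0, solver = NewtonCG())
```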

hope that helps, happy to review your stab at this


ablaom commented Jan 26, 2023

@tlienart The current doc strings say something like "if solver=nothing then the default will be used" but don't say what that default is for each model. Can I get this without digging into the code? Is it always the first one in this table, with ISTA the default where it says "(F)ISTA"?

It's a bit annoying that the default isn't the default, instead of nothing if you know what I mean.

I also got confused for a while until I realised ISTA and FISTA were aliases for slow/fast ProxGrad. I was looking for ages for docstrings for ISTA and FISTA but they don't exist. Probably there are other dummies like me who didn't guess this straight away. I will try to address this in my documentation PR.

Ditto CG (alias for Analytical(iterative=true)).


tlienart commented Jan 26, 2023

Defaults

  • L2Loss, L2Penalty (linear regression, ridge regression) --> default is Analytical() (matrix solve, possibly using an iterative solver)
  • LogisticLoss, L2Penalty (logistic regression) --> LBFGS()
  • MultinomialLoss, L2Penalty (multinomial regression) --> LBFGS()
  • SmoothLoss L2+L1 Penalty (lasso, elasticnet, logistic+multinomial with elastic net) --> FISTA()
  • RobustLoss, L2Penalty (quantile regression, ...) --> LBFGS()
  • Other --> error

Alternative solvers a user can specify

  • LogisticLoss/RobustLoss + L2Penalty --> Newton, NewtonCG
  • MultinomialLoss + L2Penalty --> NewtonCG
  • RobustLoss + L2Penalty --> IWLSCG
  • SmoothLoss + mix L2/L1 --> ISTA

In general the user should not specify these alternatives, as they will be inferior to the default (there will be edge cases where this is not true, but I don't think these are very relevant for an ML practitioner).
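To illustrate (a sketch using the MLJ wrappers, where the solver field defaults to nothing):

```julia
using MLJLinearModels

clf = LogisticClassifier()                    # solver = nothing -> LBFGS() is picked
clf = LogisticClassifier(solver = LBFGS())    # same as the default, spelled out
clf = LogisticClassifier(solver = NewtonCG()) # a usually-inferior alternative
```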

Solver parameters with their defaults

Analytical

  • iterative::Bool=false whether to use a Cholesky solve or a conjugate gradient (CG) solve
  • max_inner::Int=200 default number of inner iterations for an iterative solve; this is clamped by the dimension of the problem, i.e. the effective max number of iterations is min(max_inner, p) (internally, max_cg_steps = min(solver.max_inner, p))
  • CG is sugar for Analytical(iterative=true)
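For example (a sketch; parameter names as above):

```julia
using MLJLinearModels

solver = Analytical()                                  # Cholesky solve
solver = Analytical(iterative = true)                  # CG solve
solver = Analytical(iterative = true, max_inner = 100) # cap the CG steps
solver = CG()                                          # sugar for Analytical(iterative = true)
```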

Newton

Solves the problem with a full solve of the Hessian

  • optim_options can pass an Optim.Options(...) object for things like f_tol (tolerance on the objective), see general options
  • newton_options can pass a named tuple with things like linesearch = ... (see here)

NewtonCG

Solves the problem with a CG solve of the Hessian.

Same parameters as Newton except the naming: newtoncg_options

LBFGS

LBFGS solve; optim_options and lbfgs_options as per these docs
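A sketch of passing options to these three (assuming optim_options takes an Optim.Options object and the solver-specific options take a named tuple, as described above):

```julia
import Optim
using MLJLinearModels

solver = Newton(optim_options = Optim.Options(f_tol = 1e-8))
solver = NewtonCG(optim_options = Optim.Options(f_tol = 1e-8))
solver = LBFGS(optim_options = Optim.Options(f_tol = 1e-8))

# solver-specific options, e.g. a line search for Newton:
solver = Newton(newton_options = (linesearch = Optim.LineSearches.BackTracking(),))
```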

ProxGrad

A user should not call this constructor directly; the relevant flavours are ISTA (no acceleration) and FISTA (with acceleration). For now, ProxGrad is not used for anything other than L1-penalized problems.

  • accel whether to use Nesterov-style acceleration
  • max_iter max number of descent iterations
  • tol tolerance on the relative change of the parameter
  • max_inner max number of inner iterations
  • beta shrinkage of the backtracking step

ISTA is ProxGrad for L1 with accel set to false; FISTA same story but with acceleration.

ISTA is not necessarily slower than FISTA, but generally FISTA has a better chance of being faster. A non-expert user should just use FISTA.
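For example (a sketch; field names as listed above):

```julia
using MLJLinearModels

solver = FISTA()                             # ProxGrad with accel = true
solver = ISTA()                              # ProxGrad with accel = false
solver = FISTA(max_iter = 2000, tol = 1e-6)  # more iterations, tighter tolerance
```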

IWLSCG

Iteratively reweighted least squares with a CG solve

  • max_iter number of max outer iterations (steps)
  • max_inner number of steps for the inner solves (conjugate gradient)
  • tol tolerance on the relative change of the parameter
  • damping how much to damp the iterations; should be in (0, 1], with 1 meaning no damping
  • threshold threshold for the residuals (e.g. for quantile regression)

In general users should not use this. A bit like Newton and NewtonCG above, IWLSCG will typically be more expensive, but it's an interesting tool for people who are interested in solvers, and it provides a sanity check for other methods.
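For completeness, a sketch:

```julia
using MLJLinearModels

# mainly useful as a sanity check against the default solvers:
solver = IWLSCG(max_iter = 200, damping = 0.5, tol = 1e-6)
```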

> It's a bit annoying that the default isn't the default, instead of nothing if you know what I mean.

If you have a suggestion for a cleanup, maybe open an issue? (I'm actually not sure I know what you mean)


ablaom commented Jan 26, 2023

> L2Loss, L2Penalty (linear regression, ridge regression) --> default is Analytical (matrix solve, possibly using an iterative solver)

What does "possibly" mean? I'm guessing iterative=false for linear and iterative=true for ridge? Is that right?


ablaom commented Jan 26, 2023

And I suppose we can add:

RobustLoss, with L1 + L2 Penalty (RobustRegressor, HuberRegressor) --> LBFGS

Yes?


ablaom commented Jan 26, 2023

> L2Loss, L2Penalty (linear regression, ridge regression) --> default is Analytical (matrix solve, possibly using an iterative solver)
> SmoothLoss L2+L1 Penalty (lasso, elasticnet, logistic+multinomial with elastic net) --> FISTA

Looks like you are saying that the default solver for LogisticClassifier and MultinomialClassifier depends on the value of the regularisation parameters (which would explain the nothing solver default). Is the default only Analytical(...) if L1 penalty is zero, and FISTA otherwise? But now I'm confused because (F)ISTA aren't listed as possible solvers for those models in the current docs.


ablaom commented Jan 27, 2023

I appreciate the help, but I think I must be asking the wrong questions. Here's what I want to do for each model M:

  • state clearly in the docs what values the field solver may take on, e.g., "any instance of LBFGS, ProxGrad".
  • state clearly what the default value is; if this is "dynamic", i.e. depends on the values of other parameters, then I want a concise statement of the logic needed to determine which solver will be chosen.

Likely all this information is contained in what you are telling me, but I feel I have to "reverse engineer" the answer.

Does this better clarify my needs?

@tlienart

> L2Loss, L2Penalty (linear regression, ridge regression) --> default is Analytical (matrix solve, possibly using an iterative solver)
>
> What does "possibly" mean? I'm guessing iterative=false for linear and iterative=true for ridge? Is that right?

No, both iterative=true and iterative=false can be used for either Linear or Ridge. In both cases you just have to solve a positive definite linear system of the form $Mx = b$ (in Ridge it's just perturbed by the identity to shift the spectrum away from zero); to solve such a system you can either do a full solve M \ b (using a Cholesky solve) or you can use an iterative method such as conjugate gradient or Krylov or whatever. The latter (iterative) can be good when the dimensionality of the problem is large.
In general though, users should just use iterative=false; the full backsolve will work very well most of the time.
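Outside the package, the two modes amount to something like this (a standalone sketch; IterativeSolvers.jl is assumed here purely for illustration, it is not necessarily what the package uses internally):

```julia
using LinearAlgebra
using IterativeSolvers  # provides cg

A = randn(100, 5); y = randn(100); λ = 0.1
M = A' * A + λ * I      # ridge normal equations: (AᵀA + λI)θ = Aᵀy
b = A' * y

θ_full = cholesky(Symmetric(M)) \ b  # iterative = false: full Cholesky backsolve
θ_iter = cg(M, b; maxiter = 200)     # iterative = true: conjugate gradient
```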

> RobustLoss, with L1 + L2 Penalty (RobustRegressor, HuberRegressor) --> LBFGS

RobustLoss + L2 --> LBFGS
RobustLoss + L2 + L1 --> FISTA

> Looks like you are saying that the default solver for LogisticClassifier and MultinomialClassifier depends on the value of the regularisation parameters (which would explain the nothing solver default)

As soon as you have a non-smooth penalty such as L1, we cannot use smooth solvers and have to resort to ProxGrad. So yes, as soon as there's a non-zero coefficient in front of the L1 penalty, a FISTA solver is picked.
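Concretely (a sketch; penalty and gamma are the hyperparameter names in the current MLJ wrapper, with gamma scaling the L1 term):

```julia
using MLJLinearModels

clf = LogisticClassifier()                           # no L1 term -> default LBFGS()
clf = LogisticClassifier(penalty = :en, gamma = 0.5) # non-zero L1 term -> FISTA()
```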

> But now I'm confused because (F)ISTA aren't listed as possible solvers for those models in the current docs.

[Screenshot of the solver table in the current docs.]


> state clearly in docs what values the field solver may take on, eg, "any instance of LBFGS, ProxGrad".
> state clearly what the default value is; if this is "dynamic", ie depends on values of other parameters, then I want a concise statement of the logic needed to determine what solver will be chosen.

Isn't what I quoted in my previous answer under defaults what you wanted?

Maybe to simplify (I'm aware you have limited bandwidth and that it's not helping to have a long conversation): how about we do this just for Linear+Ridge in a draft PR, get to a satisfactory point, and then progress from there?

MLJ constructors:

  • LinearRegressor, RidgeRegressor

For both, the solver can be specified to be Analytical(...). The default is Analytical(). The only departure from the default is if the user passes iterative=true, in which case they may also specify max_inner.
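So, as a sketch, the whole space of sensible values for these two models is:

```julia
using MLJLinearModels

ridge = RidgeRegressor()                      # solver = nothing -> Analytical()
ridge = RidgeRegressor(solver = Analytical()) # identical, made explicit
ridge = RidgeRegressor(solver = Analytical(iterative = true, max_inner = 100))
ridge = RidgeRegressor(solver = CG())         # shorthand for Analytical(iterative = true)
```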


ablaom commented Feb 1, 2023

@tlienart Thanks for the additional help and your patience. #138 is now ready for your review.

ablaom closed this as completed Feb 1, 2023