Help with documentation review #136
All of them are deterministic; this repo is purely about finding what people would call the MLE or MAP estimator that would work for that model.

Hope that helps, happy to review your stab at this.
@tlienart The current doc strings say something like "if …". It's a bit annoying that the default isn't the default, instead of …, if you know what I mean. I also got confused for a while until I realised …. Ditto ….
**Defaults**

**Alternative solvers a user can specify**: in general the user should not specify these alternatives, as they will be inferior to the default (there will be edge cases where this is not true, but I don't think these are very relevant for an ML practitioner).

Solver parameters with their defaults, e.g.:

`max_cg_steps = min(solver.max_inner, p)`
**CG**: sugar for `Analytical(iterative=true)`.

**Newton**: solves the problem with a full solve of the Hessian.

- `optim_options`: can pass an `Optim.Options(...)` object for things like `f_tol` (tolerance on the objective), see the general options
- `newton_options`: can pass a named tuple with things like `linesearch = ...` (see here)
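To make the "full solve of the Hessian" concrete, here is a hypothetical sketch (mine, not the package's code, and in Python rather than Julia) of a Newton solver for L2-penalized logistic regression with labels in {-1, +1} — each step solves the full Hessian system directly:

```python
import numpy as np

def newton_logistic(X, y, lam=1.0, max_iter=20, tol=1e-8):
    """Minimize sum(log(1 + exp(-y * X@b))) + lam/2 * ||b||^2 by Newton steps."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        m = y * (X @ beta)                   # margins y_i x_i'b
        s = 1.0 / (1.0 + np.exp(m))          # sigma(-m)
        grad = -X.T @ (y * s) + lam * beta
        w = s * (1.0 - s)                    # Hessian weights sigma(m)(1-sigma(m))
        H = X.T @ (X * w[:, None]) + lam * np.eye(p)
        step = np.linalg.solve(H, grad)      # full (direct) solve of the Hessian
        beta -= step
        if np.linalg.norm(step) <= tol * (1.0 + np.linalg.norm(beta)):
            break
    return beta
```

A NewtonCG-style variant would replace the `np.linalg.solve` line with a conjugate-gradient solve of the same system, trading the O(p³) factorization for Hessian-vector products.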
**NewtonCG**: solves the problem with a CG solve of the Hessian. Same parameters as Newton except the naming: `newtoncg_options`.

**LBFGS**: LBFGS solve; `optim_options` and `lbfgs_options` as per these docs.
**ProxGrad**: a user should not call this constructor; the relevant flavours are ISTA (no acceleration) and FISTA (with acceleration). ProxGrad is not used for anything other than L1-penalized problems for now.

- `accel`: whether to use Nesterov-style acceleration
- `max_iter`: max number of descent iterations
- `tol`: tolerance on the relative change of the parameter
- `max_inner`: max number of inner iterations
- `beta`: shrinkage of the backtracking step

ISTA is ProxGrad for L1 with `accel` set to `false`; FISTA is the same story but with acceleration. ISTA is not necessarily slower than FISTA, but FISTA generally has a better chance of being faster. A non-expert user should just use FISTA.
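The ISTA/FISTA relationship described above can be sketched as a single proximal-gradient routine with an `accel` switch. This is a hypothetical illustration (names and fixed-step choice are mine, not the package's), shown for the lasso objective 1/(2n)·||y − Xb||² + λ||b||₁:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad_lasso(X, y, lam, accel=True, max_iter=1000, tol=1e-10):
    """accel=False is ISTA; accel=True adds Nesterov momentum (FISTA)."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n        # Lipschitz constant of the gradient
    beta = np.zeros(p)
    z, t = beta.copy(), 1.0                  # momentum point and momentum scalar
    for _ in range(max_iter):
        grad = X.T @ (X @ z - y) / n
        beta_new = soft_threshold(z - grad / L, lam / L)
        if accel:                            # Nesterov-style acceleration
            t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            z = beta_new + (t - 1.0) / t_new * (beta_new - beta)
            t = t_new
        else:                                # plain ISTA update
            z = beta_new
        if np.linalg.norm(beta_new - beta) <= tol * (1.0 + np.linalg.norm(beta)):
            beta = beta_new
            break
        beta = beta_new
    return beta
```

Both variants converge to the same minimizer; FISTA's O(1/k²) rate versus ISTA's O(1/k) is what "a better chance of being faster" refers to.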
**IWLSCG**: iteratively reweighted least squares with a CG solve.

- `max_iter`: max number of outer iterations (steps)
- `max_inner`: number of steps for the inner solves (conjugate gradient)
- `tol`: tolerance on the relative change of the parameter
- `damping`: how much to damp iterations; should be in `(0, 1]`, with `1` meaning no damping
- `threshold`: threshold for the residuals (e.g. for quantile regression)

In general users should not use this. A bit like Newton and NewtonCG above, IWLSCG will typically be more expensive, but it's an interesting tool for people who are interested in solvers, and it provides a sanity check for other methods.
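As a rough illustration of the outer IWLS loop, here is a hypothetical sketch (again mine, not the package's code) for Huber-type robust regression. Each outer step solves a weighted least-squares system — the package would use CG for that inner solve, while this sketch uses a direct solve — with `damping` blending successive iterates and `threshold` guarding the weights against tiny residuals:

```python
import numpy as np

def iwls_huber(X, y, delta=1.0, damping=1.0, threshold=1e-8,
               max_iter=100, tol=1e-8):
    """IWLS for Huber regression: reweight residuals, re-solve, repeat."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from the OLS fit
    for _ in range(max_iter):
        r = y - X @ beta
        a = np.maximum(np.abs(r), threshold)      # guard against tiny residuals
        w = np.where(a <= delta, 1.0, delta / a)  # Huber weights: downweight outliers
        H = X.T @ (X * w[:, None])                # weighted normal equations
        g = X.T @ (w * y)
        beta_new = np.linalg.solve(H, g)          # inner solve (CG in the package)
        beta_new = damping * beta_new + (1.0 - damping) * beta
        if np.linalg.norm(beta_new - beta) <= tol * (1.0 + np.linalg.norm(beta)):
            beta = beta_new
            break
        beta = beta_new
    return beta
```

With `damping=1` each step fully accepts the new weighted solve; values below 1 slow the iteration down but can stabilize it.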
It's a bit annoying that the default isn't the default, instead of `nothing`, if you know what I mean.
If you have a suggestion for a cleanup, maybe open an issue? (I'm actually not sure I know what you mean)
What does "possibly" mean? I'm guessing `iterative=false` for linear and `iterative=true` for ridge? Is that right?
And I suppose we can add: RobustLoss with L1 + L2 penalty (RobustRegressor, HuberRegressor) --> LBFGS. Yes?
Looks like you are saying that the default solver for LogisticClassifier and MultinomialClassifier depends on the value of the regularisation parameters (which would explain the …).
I appreciate the help but I think I must be asking the wrong questions. Here's what I want to do for each model M:

Likely all this information is contained in what you are telling me, but I feel I have to "reverse engineer" the answer. Does this better clarify my needs?
no, both
RobustLoss + L2 --> LBFGS
As soon as you have a non-smooth penalty such as L1, we cannot use smooth solvers and have to resort to proxgrad. So yes as soon as there's a non-zero coefficient in front of the L1 penalty, a FISTA solver is picked.
Isn't what I quoted in my previous answer under …? Maybe to simplify (I'm aware you have limited bandwidth and that it's not helping to have a long conversation): how about we do this just for Linear + Ridge in a draft PR, we get to a satisfactory point, and then we progress from there? MLJ constructors:

for both the ….
I'm considering having a stab at some of #135 but could do with some help.
This appears in this doc page.
The MLJ model types are not listed. It would be good to have this, to save some detective work (and the user certainly wants this anyway). To make it easier, I'm copying the lists below:

- yᵢ ∈ {±1}
- yᵢ ∈ {±1}
- yᵢ ∈ {1, ..., c}
- yᵢ ∈ {1, ..., c}
@tlienart @jbrea