
Update optimisation #23

Open
Marius1311 opened this issue Mar 12, 2021 · 9 comments

@Marius1311
Collaborator

@msmdev, how difficult would it be to speed up our optimisation by using derivative information? Is there a prototype implementation somewhere?

@msmdev
Owner

msmdev commented Mar 12, 2021

@Marius1311, there is: I used a special Gauss-Newton solver, NLSCON, in my MATLAB implementation of GPCCA.
But it is tricky and would need a lot of testing to be sure that it works in all relevant cases.
Currently, I don't have spare time to invest in this, but you can take a look here yourself:

@Marius1311
Collaborator Author

Nice! What sort of speedup did you experience with this in MATLAB compared to the derivative-free approach? I would like to know this to figure out whether it's worth the effort.

@msmdev
Owner

msmdev commented Mar 12, 2021

I only used this to enable clustering into hundreds of macrostates, since Nelder-Mead scales very badly with increasing n: you simply can't get Nelder-Mead to converge in any reasonable time for, say, n = 100.
I didn't test for speed differences at small cluster numbers n.
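
To make the scaling concrete: in the usual GPCCA/PCCA+ parametrization the free variables are the (n - 1)^2 entries of the transformation matrix A, so for n = 100 Nelder-Mead has to move a simplex of roughly 10^4 vertices through a ~10^4-dimensional space. A minimal, hypothetical sketch of such a derivative-free setup (not the actual pyGPCCA code; `crispness_objective` is just a placeholder):

```python
import numpy as np
from scipy.optimize import minimize

def crispness_objective(alpha, n):
    """Placeholder for the real GPCCA objective -crispness(A(alpha));
    alpha holds the (n - 1)**2 free entries of the transformation matrix A."""
    A_free = alpha.reshape(n - 1, n - 1)
    return float(np.sum(A_free ** 2))  # dummy value, for illustration only

n = 100                                  # number of macrostates
alpha0 = np.random.rand((n - 1) ** 2)    # 9801 free parameters for n = 100
res = minimize(
    crispness_objective, alpha0, args=(n,),
    method="Nelder-Mead",                # derivative-free: simplex with 9802 vertices
    options={"maxiter": 2000, "fatol": 1e-8, "xatol": 1e-8},
)
```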

@msmdev
Owner

msmdev commented Mar 12, 2021

But beware: Gauss-Newton actually never converged in my experiments, since the routine we are using is non-differentiable. It nevertheless ended somewhere near a local minimum - this at least sufficed to get a nearly optimal solution and identify the optimal number of clusters out of a large interval of potential n. If you really want the optimal (local) solution, you would still need to run Nelder-Mead for the found "optimal" n.
In my case the benefit from a final Nelder-Mead optimization was small, so I stayed with the Gauss-Newton solution, but I would assume that this could be very dependent on the actual system you are clustering.

@Marius1311
Collaborator Author

Okay - why wouldn't the Gauss-Newton solver converge? If the routine you're using is non-differentiable, then how did you get derivatives for Gauss-Newton? Isn't the gradient given in (Röblitz and Weber, 2013) on page 163?
[image: gradient of the objective function from Röblitz and Weber (2013), p. 163]

Also, how come there are local optima here? Section 4.3.2 "Minimizing the objective function" in (Röblitz and Weber, 2013) starts with "The objective function (16) is convex. Therefore, the optimum is attained in a vertex of the feasible set F_A' (Weber 2006, Lemma 3.7)".

The arguments you outlined above are also given in point 3 of Section 4.3.2, but I don't really understand them, I must admit.

@msmdev
Owner

msmdev commented Mar 15, 2021

Hi @Marius1311
I see your problems, but I can guarantee you that you will understand it as soon as you get all the details right :)

Okay - why wouldn't the Gauss-Newton solver converge? If the routine you're using is non-differentiable, then how did you get derivatives for Gauss-Newton? Isn't the gradient given in (Röblitz and Weber, 2013) on page 163?
[image: gradient of the objective function from Röblitz and Weber (2013), p. 163]

Wow, what a HUGE formula. XD
It is a little more complex here: you can differentiate the objective function, but since you impose constraints after every step, the routine as a whole isn't differentiable. So GN works "stepwise": it doesn't need to converge (it is luck if it does), but you still get stepwise closer to a local minimum.
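
To illustrate this "stepwise" scheme (a rough sketch under my own assumptions, not the NLSCON/MATLAB implementation): take a damped Gauss-Newton step on the smooth objective, then re-impose the feasibility constraints by a projection, and repeat. Because the projection is applied after every step, the composite iteration is not smooth and need not converge; it just tends to drift towards a local minimum.

```python
import numpy as np

def stepwise_damped_gauss_newton(residual, jacobian, project, x0,
                                 damping=0.5, n_steps=50):
    """Hypothetical sketch: damped Gauss-Newton step followed by a projection
    that re-imposes the feasibility constraints after every step.

    residual(x) -> residual vector, jacobian(x) -> Jacobian matrix,
    project(x)  -> feasible point (the non-smooth part of the scheme).
    """
    x = project(np.asarray(x0, dtype=float))
    for _ in range(n_steps):
        r = residual(x)
        J = jacobian(x)
        # Gauss-Newton direction: least-squares solution of J dx = -r.
        dx, *_ = np.linalg.lstsq(J, -r, rcond=None)
        # The projection breaks differentiability -> no convergence guarantee.
        x = project(x + damping * dx)
    return x
```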

Also, how come there are local optima here? Section 4.3.2 "Minimizing the objective function" in (Röblitz and Weber, 2013) starts with "The objective function (16) is convex. Therefore, the optimum is attained in a vertex of the feasible set F_A' (Weber 2006, Lemma 3.7)".

The important thing to understand here is that the objective is convex on the feasible set and NOT on the real numbers as a whole. So, if you perform unconstrained optimization over the real numbers, you can (and most likely will) encounter local minima.

The arguments you outlined above are also given in point 3 of Section 4.3.2, but I don't really understand them, I must admit.

Does the above help to get it?

@msmdev
Owner

msmdev commented Mar 15, 2021

@Marius1311, still: damped GN optimization of the objective using NLSCON is reasonable and very useful. Although it won't "really converge" and might get trapped in local minima, Nelder-Mead can also get trapped in those, and it takes ages to converge for large n. Using GN is just tricky and you need to adapt the hyperparameters of NLSCON accordingly. Thus, one needs to do some research on different systems to get an idea of how to set them. I only did this for one system and thus wouldn't trust those settings in general.

@Marius1311
Collaborator Author

ok, thanks @msmdev, that helps!

@Marius1311
Collaborator Author

Just found this toolbox for Python optimisation: https://pypesto.readthedocs.io/en/latest/api_optimize.html

@msmdev, could this be useful for us?
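
In case pyPESTO turns out to be an option, a minimal sketch of how a gradient-based, multi-start optimization could be wired up with its Objective/Problem/optimize API (the objective, gradient, dimensions, and bounds below are hypothetical placeholders, not the GPCCA functions):

```python
import numpy as np
import pypesto
import pypesto.optimize as optimize

n = 5                                   # hypothetical number of macrostates
dim = (n - 1) ** 2                      # free entries of the transformation matrix A

def fun(alpha):                         # placeholder for -crispness(A(alpha))
    return float(np.sum(alpha ** 2))

def grad(alpha):                        # placeholder analytic gradient
    return 2.0 * alpha

objective = pypesto.Objective(fun=fun, grad=grad)
problem = pypesto.Problem(objective=objective,
                          lb=-np.ones(dim), ub=np.ones(dim))  # box bounds are placeholders
optimizer = optimize.ScipyOptimizer(method="L-BFGS-B")        # gradient-based local optimizer
result = optimize.minimize(problem=problem, optimizer=optimizer, n_starts=10)
```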

msmdev added the enhancement (New feature or request) label on Nov 7, 2022