
Update optimisation #23

Open
Marius1311 opened this issue Mar 12, 2021 · 9 comments

@Marius1311
Collaborator

@msmdev, how difficult would it be to speed up our optimisation by using derivative information? Is there a prototype implementation somewhere?

@msmdev
Owner

msmdev commented Mar 12, 2021

@Marius1311, there is: I used a special Gauss-Newton solver, NLSCON, in my MATLAB implementation of GPCCA.
But it is tricky and would need a lot of testing to be sure that it works in all relevant cases.
Currently, I don't have spare time to invest in this, but you can take a look here yourself:

@Marius1311
Collaborator Author

Nice! What sort of speedup did you experience with this in MATLAB compared to the derivative-free approach? I would like to know this to figure out whether it's worth the effort.

@msmdev
Owner

msmdev commented Mar 12, 2021

I only used this to enable clustering into hundreds of macrostates, since Nelder-Mead scales very badly with increasing n: you simply can't get Nelder-Mead to converge in any reasonable time for, say, n = 100.
I didn't test for speed differences at small cluster numbers n.
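
To make the scaling concrete: in the usual GPCCA/PCCA+ parametrization the free variables are the (n - 1)^2 entries of the transformation matrix A, so for n = 100 Nelder-Mead has to move a simplex of roughly 10^4 vertices through a ~10^4-dimensional space. A minimal, hypothetical sketch of such a derivative-free setup (not the actual pyGPCCA code; `crispness_objective` is just a placeholder):

```python
import numpy as np
from scipy.optimize import minimize

def crispness_objective(alpha, n):
    """Placeholder for the real GPCCA objective -crispness(A(alpha));
    alpha holds the (n - 1)**2 free entries of the transformation matrix A."""
    A_free = alpha.reshape(n - 1, n - 1)
    return float(np.sum(A_free ** 2))  # dummy value, for illustration only

n = 100                                  # number of macrostates
alpha0 = np.random.rand((n - 1) ** 2)    # 9801 free parameters for n = 100
res = minimize(
    crispness_objective, alpha0, args=(n,),
    method="Nelder-Mead",                # derivative-free: simplex with 9802 vertices
    options={"maxiter": 2000, "fatol": 1e-8, "xatol": 1e-8},
)
```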

@msmdev
Owner

msmdev commented Mar 12, 2021

But beware: Gauss-Newton actually never converged in my experiments, since the routine we are using is non-differentiable. It nevertheless ended somewhere near a local minimum - this at least sufficed to get a nearly optimal solution and identify the optimal number of clusters out of a large interval of potential n. If you really want the optimal (local) solution, you would still need to run Nelder-Mead for the found "optimal" n.
In my case the benefit from a final Nelder-Mead optimization was small, so I stayed with the Gauss-Newton solution, but I would assume that this could be very dependent on the actual system you are clustering.

@Marius1311
Collaborator Author

Okay - why wouldn't the Gauss-Newton solver converge? If the routine you're using is non-differentiable, then how did you get derivatives for Gauss-Newton? Isn't the gradient given in (Röblitz and Weber, 2013) on page 163?
[image: gradient of the objective function from Röblitz and Weber (2013), p. 163]

Also, how come there are local optima here? Section 4.3.2 "Minimizing the objective function" in (Röblitz and Weber, 2013) starts with "The objective function (16) is convex. Therefore, the optimum is attained in a vertex of the feasible set F_A' (Weber 2006, Lemma 3.7)".

The arguments you outlined above are also given in point 3 of Section 4.3.2, but I don't really understand them, I must admit.

@msmdev
Owner

msmdev commented Mar 15, 2021

Hi @Marius1311
I see your problems, but I can guarantee you that you will understand it as soon as you get all the details right :)

Okay - why wouldn't the Gauss-Newton solver converge? If the routine you're using is non-differentiable, then how did you get derivatives for Gauss-Newton? Isn't the gradient given in (Röblitz and Weber, 2013) on page 163?
[image: gradient of the objective function from Röblitz and Weber (2013), p. 163]

Wow, what a HUGE formula. XD
It is a little more complex here: you can differentiate the objective function, but since you impose constraints after every step, the routine as a whole isn't differentiable. So GN works "stepwise": it doesn't need to converge (it is luck if it does), but you still get stepwise closer to a local minimum.
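
To illustrate this "stepwise" scheme (a rough sketch under my own assumptions, not the NLSCON/MATLAB implementation): take a damped Gauss-Newton step on the smooth objective, then re-impose the feasibility constraints by a projection, and repeat. Because the projection is applied after every step, the composite iteration is not smooth and need not converge; it just tends to drift towards a local minimum.

```python
import numpy as np

def stepwise_damped_gauss_newton(residual, jacobian, project, x0,
                                 damping=0.5, n_steps=50):
    """Hypothetical sketch: damped Gauss-Newton step followed by a projection
    that re-imposes the feasibility constraints after every step.

    residual(x) -> residual vector, jacobian(x) -> Jacobian matrix,
    project(x)  -> feasible point (the non-smooth part of the scheme).
    """
    x = project(np.asarray(x0, dtype=float))
    for _ in range(n_steps):
        r = residual(x)
        J = jacobian(x)
        # Gauss-Newton direction: least-squares solution of J dx = -r.
        dx, *_ = np.linalg.lstsq(J, -r, rcond=None)
        # The projection breaks differentiability -> no convergence guarantee.
        x = project(x + damping * dx)
    return x
```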

Also, how come there are local optima here? Section 4.3.2 "Minimizing the objective function" in (Röblitz and Weber, 2013) starts with "The objective function (16) is convex. Therefore, the optimum is attained in a vertex of the feasible set F_A' (Weber 2006, Lemma 3.7)".

The important thing to understand here is that the objective is convex on the feasible set and NOT on the real numbers as a whole. So, if you perform unconstrained optimization over the real numbers, you can (and most likely will) encounter local minima.

The arguments you outlined above are also given in point 3 of Section 4.3.2, but I don't really understand them, I must admit.

Does the above help to get it?

@msmdev
Owner

msmdev commented Mar 15, 2021

@Marius1311, still: damped GN optimization of the objective using NLSCON is reasonable and very useful. Although it won't "really converge" and might get trapped in local minima, Nelder-Mead can also get trapped in those, and it takes ages to converge for large n. Using GN is just tricky and you need to adapt the hyperparameters of NLSCON accordingly. Thus, one needs to do some research on different systems to get an idea of how to set them. I only did this for one system and thus wouldn't trust those settings in general.

@Marius1311
Collaborator Author

ok, thanks @msmdev, that helps!

@Marius1311
Collaborator Author

Just found this toolbox for Python optimisation: https://pypesto.readthedocs.io/en/latest/api_optimize.html

@msmdev, could this be useful for us?
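
In case pyPESTO turns out to be an option, a minimal sketch of how a gradient-based, multi-start optimization could be wired up with its Objective/Problem/optimize API (the objective, gradient, dimensions, and bounds below are hypothetical placeholders, not the GPCCA functions):

```python
import numpy as np
import pypesto
import pypesto.optimize as optimize

n = 5                                   # hypothetical number of macrostates
dim = (n - 1) ** 2                      # free entries of the transformation matrix A

def fun(alpha):                         # placeholder for -crispness(A(alpha))
    return float(np.sum(alpha ** 2))

def grad(alpha):                        # placeholder analytic gradient
    return 2.0 * alpha

objective = pypesto.Objective(fun=fun, grad=grad)
problem = pypesto.Problem(objective=objective,
                          lb=-np.ones(dim), ub=np.ones(dim))  # box bounds are placeholders
optimizer = optimize.ScipyOptimizer(method="L-BFGS-B")        # gradient-based local optimizer
result = optimize.minimize(problem=problem, optimizer=optimizer, n_starts=10)
```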

msmdev added the enhancement (New feature or request) label on Nov 7, 2022