Improvements to Manifolds #448

Open · 7 tasks
antoine-levitt opened this issue Jul 23, 2017 · 18 comments

Comments
@antoine-levitt
Contributor

See #435

  • Second order algorithms
    Never used it, so I'm not the best person to do this
  • Compare performance with other libraries
    Competitors include ROPTLIB, ManOpt, https://github.com/NickMcNutt/ManifoldOptim.jl (abandoned)
  • Vector transport
    The proper way to implement algorithms like CG and BFGS is to use vector transport to carry information from one point to another. Right now this is done with projections, which might not be the most efficient approach
  • More manifolds and variants of existing manifolds (e.g. different retractions)
    See e.g. the list in http://www.math.fsu.edu/~whuang2/Indices/index_ROPTLIB.html
    Also {x : Ax = b}, or the intersection of two manifolds (just project onto both manifolds alternately and hope it converges)
  • Optimize number of retractions and projections
    I have been pretty liberal with the use of retractions and projections in the optimizers, maybe some of them are unnecessary
  • A better way to do product manifolds?
    Right now, the two components are stored in a flat 1D array, which might be suboptimal (see the tuple-based sketch after this list)
  • Arbitrary inner product
    The Sphere and Stiefel manifolds could take a more general inner product
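
As a concrete illustration of the product-manifold item above, here is a hypothetical sketch that stores a point as a tuple of arrays, one per component, instead of one flat 1D vector. The names (ToySphere, TupleProduct, retract_product!, project_product!) are illustrative only and are not part of Optim's API.

using LinearAlgebra

# A toy unit sphere stands in for the component manifolds.
struct ToySphere end
retract_component!(::ToySphere, x) = (x ./= norm(x); x)
project_component!(::ToySphere, g, x) = (g .-= x .* dot(x, g); g)

# Product manifold whose point is a tuple of arrays, one per component.
struct TupleProduct{T<:Tuple}
    components::T
end

function retract_product!(P::TupleProduct, xs::Tuple)
    foreach(retract_component!, P.components, xs)
    return xs
end

function project_product!(P::TupleProduct, gs::Tuple, xs::Tuple)
    foreach(project_component!, P.components, gs, xs)
    return gs
end

# Usage: a point on Sphere × Sphere kept as two separate vectors.
P = TupleProduct((ToySphere(), ToySphere()))
xs = (randn(3), randn(5))
retract_product!(P, xs)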
@simonbyrne

Any chance of support for positive semi-definite matrices?

@pkofod
Member

pkofod commented Sep 25, 2017

Any chance of support for positive semi-definite matrices?

According to the docs, Manopt has this, but their code is GPL-3 licensed, so I think it's best if we avoid links to their code base in this thread, as reading it could preclude readers from contributing here.

@antoine-levitt
Contributor Author

Coincidentally, @whuang08 pointed out to me just yesterday that a retraction for the manifold of SPD matrices is R_X(eta) = X + eta + 0.5*eta*X^{-1}*eta, which breaks the current API (we only have a retract(x) function, whereas this would need a retract(x, eta)). This should not be too bad to adapt, though. That retraction seems fishy to me because it's singular at zero eigenvalues, but apparently it's useful. I don't have any experience with SPD matrices; do you have a use case/paper/reference implementation?
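
For concreteness, a minimal sketch of that retraction as a standalone Julia function, in the retract(x, eta) form it would need; this is not part of Optim's current Manifold interface, and the name is illustrative.

# R_X(eta) = X + eta + 0.5*eta*X^{-1}*eta, for X SPD and eta symmetric;
# the backslash solves X \ eta instead of forming inv(X) explicitly.
function retract_spd(X, eta)
    Y = X + eta + 0.5 * (eta * (X \ eta))
    (Y + Y') / 2   # re-symmetrize against round-off
end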

@antoine-levitt
Contributor Author

Ah, good point about the licence; I'll avoid looking at ROPTLIB too, then. The algorithms are described in papers, however.

@simonbyrne

It probably doesn't answer your question directly (it's been a while since I've worked with manifolds), but the easiest option is usually to parametrise SPD matrices via a Cholesky factor constrained to have non-negative diagonal.
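
A minimal sketch of that reparametrisation (the log-Cholesky variant, storing the diagonal as logs so it stays positive; the names spd_from_theta and obj are illustrative, not from any library):

# Map an unconstrained vector theta of length n*(n+1)/2 to an SPD matrix
# X = L*L', where L is lower triangular with positive diagonal.
function spd_from_theta(theta, n)
    L = zeros(n, n)
    k = 1
    for j in 1:n, i in j:n
        L[i, j] = i == j ? exp(theta[k]) : theta[k]
        k += 1
    end
    L * L'
end

# Usage sketch: minimize an objective obj over SPD matrices without constraints,
# e.g. Optim.optimize(th -> obj(spd_from_theta(th, n)), zeros(div(n*(n+1), 2)))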

@antoine-levitt
Contributor Author

That looks substantially different from using a manifold-type algorithm, and is probably best treated by a solver with inequality constraints (JuMP & friends).

@simonbyrne

Ah, okay. Thanks.

@simonbyrne

Hmm, my problem is that I have quite a complicated objective function, which doesn't seem to be supported by JuMP. I guess I can write the reparametrisation myself, but it would be nice if there were some tools for this.

@whuang08

whuang08 commented Sep 25, 2017 via email

@antoine-levitt
Contributor Author

I guess it depends on whether you expect your minimum to hit the boundary (matrices with zero eigenvalues) or not. If you do then I would imagine it's hard to bypass an inequality constrained solver. If you don't then I guess manifold algorithms are efficient.

@antoine-levitt
Contributor Author

Just putting this here for reference: I benchmarked finding the first few eigenvalues of a large random matrix in Optim and in Manopt. The result is that the implementations are comparable and take about the same number of iterations/matvecs. Code for Manopt:

% Generate random problem data.
% rng(0);
n = 1000;
m = 10;
A = randn(n);
A = .5*(A+A.');
x0 = randn(n,m);
 
% Create the problem structure.
manifold = stiefelfactory(n,m);
problem.M = manifold;
 
% Define the problem cost function and its Euclidean gradient.
problem.cost  = @(x) -trace(x'*(A*x));
problem.egrad = @(x) -2*A*x;

% Solve.
[x, xcost, info, options] = rlbfgs(problem, x0);
 
% Display some statistics.
figure;
semilogy([info.iter], [info.gradnorm], '.-');
xlabel('Iteration number');
ylabel('Norm of the gradient of f');

and Optim:

using Optim

# Generate random problem data (Julia 0.6 syntax: srand/vecdot/copy!).
# srand(0)
const n = 1000
const m = 10
M = randn(n,n)
const A = (M+M')/2

# Cost function and its Euclidean gradient.
f(x) = -vecdot(x, A*x)
g(x) = -2*A*x
g!(stor, x) = copy!(stor, g(x))
x0 = randn(n,m)

# Solve on the Stiefel manifold.
manif = Optim.Stiefel()
Optim.optimize(f, g!, x0,
               Optim.LBFGS(m=30, manifold=manif, linesearch=Optim.HagerZhang()),
               Optim.Options(show_trace=true, allow_f_increases=true, g_tol=1e-6))
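
For reference (not in the original comment), a sketch of recovering the computed eigenvalues from Optim's result, assuming the optimize call above is assigned to res: since the cost is -tr(x'Ax), the columns of the minimizer span the dominant invariant subspace of A, so the eigenvalues of the small m×m matrix x'Ax approximate the m largest eigenvalues of A.

res = Optim.optimize(f, g!, x0,
                     Optim.LBFGS(m=30, manifold=manif, linesearch=Optim.HagerZhang()),
                     Optim.Options(g_tol=1e-6))
x = Optim.minimizer(res)
eigvals(Symmetric(x'*(A*x)))   # ≈ the m largest eigenvalues of A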

@pkofod
Member

pkofod commented Jun 19, 2018

Cool, good to have some examples like this.

I tried swapping out HagerZhang for MoreThuente. The former gives

Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [0.15317647539137902,0.42376052107832024, ...]
 * Minimizer: [-0.04295817425645339,-0.018684703787118326, ...]
 * Minimum: -4.311236e+02
 * Iterations: 136
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false 
     |x - x'| = 6.80e-08 
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = -1.45e-14 |f(x)|
   * |g(x)| ≤ 1.0e-06: true 
     |g(x)| = 9.52e-07 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 410
 * Gradient Calls: 411

while MoreThuente gives

Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [0.05351779586314921,-0.7229077835364466, ...]
 * Minimizer: [-0.04590592560787643,-0.03998550452894138, ...]
 * Minimum: -4.301612e+02
 * Iterations: 137
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false 
     |x - x'| = 6.96e-08 
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = -2.51e-14 |f(x)|
   * |g(x)| ≤ 1.0e-06: true 
     |g(x)| = 8.44e-07 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 147
 * Gradient Calls: 148

Interesting to see the significant reduction in objective calls here. I know this is known: HZ needs to be conservative, but if MT is suited to the example at hand, there's a pretty neat gain to be had in terms of calls to f(x)/g(x).

(using A_mul_B! in g! can also decrease runtime by 25%, but that obviously has nothing to do with how well the optimizer works :))
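
For reference, the two changes mentioned above are one-liners against the earlier snippet (assuming the line searches are reachable as in that snippet, i.e. Optim.MoreThuente alongside Optim.HagerZhang; A_mul_B! is the Julia 0.6 spelling, mul! from 0.7 on):

# Same call as before, MoreThuente instead of HagerZhang:
Optim.optimize(f, g!, x0,
               Optim.LBFGS(m=30, manifold=manif, linesearch=Optim.MoreThuente()),
               Optim.Options(show_trace=true, allow_f_increases=true, g_tol=1e-6))

# In-place gradient avoiding the temporary allocated by g(x) = -2*A*x:
g!(stor, x) = (A_mul_B!(stor, A, x); scale!(stor, -2.0); stor)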

@anriseth
Contributor

I tried swapping out HagerZhang for MoreThuente.

Hmm, and BackTracking stagnates

 * Algorithm: L-BFGS
 * Starting Point: [0.5494801640402754,1.2759422464582777, ...]
 * Minimizer: [-0.003418669419144482,0.00966606972837302, ...]
 * Minimum: -3.174551e+02
 * Iterations: 8
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false 
     |x - x'| = 1.05e-09 
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
     |f(x) - f(x')| = 0.00e+00 |f(x)|
   * |g(x)| ≤ 1.0e-06: false 
     |g(x)| = 8.45e+00 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 52
 * Gradient Calls: 10

@antoine-levitt
Contributor Author

@anriseth yes, I reported this on gitter. This is strange, because I used backtracking on similar problems before and it worked very well. I reduced it further here: #626

@pkofod yes, LBFGS usually produces pretty good initial steps, so HZ is counterproductive in requiring more evaluations per iteration. In my previous tests backtracking was a much better default choice; @anriseth suggested making it the default for LBFGS and I agree: HZ should really only be the default for CG, where it is more important to have a good stepsize.

@jagot
Contributor

jagot commented Sep 19, 2019

I am particularly interested in the last two items in the list. For the custom metric, I implemented the Löwdin transform, which symmetrically orthogonalizes a non-orthogonal basis so that the SVDs etc. that assume an orthogonal basis work. The results are then transformed back to the non-orthogonal basis.

Please see my implementation here:
https://github.com/JuliaAtoms/SCF.jl/blob/b25be8df29efb21ce573a4b9041709ee78573353/src/manifolds.jl
I am well aware that it is not particularly efficient and there may be a mathematically more beautiful approach, but it works (TM).
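
For readers, a minimal sketch of the idea (not taken from the linked code), assuming a symmetric positive-definite overlap matrix S:

using LinearAlgebra

# Löwdin (symmetric) orthogonalization: S^(-1/2) maps into the
# orthonormal basis, S^(+1/2) maps results back to the original basis.
function loewdin_transforms(S::AbstractMatrix)
    lam, U = eigen(Symmetric(Matrix(S)))
    forward  = U * Diagonal(lam .^ -0.5) * U'   # S^(-1/2)
    backward = U * Diagonal(lam .^ +0.5) * U'   # S^(+1/2)
    return forward, backward
end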

Would there be any interest in including these extended manifolds into Optim.jl? Any improvement suggestions?

@antoine-levitt
Contributor Author

For sure, please do put it in there. The longer-term plan is to split these things off into Manifolds.jl, see JuliaManifolds/Manifolds.jl#35, but right now it can live here.

I think you can avoid doing the Löwdin transform explicitly and just eigendecompose overlap matrices. Will take a closer look when I'm at a computer.

@jagot
Contributor

jagot commented Sep 19, 2019

Well, I implemented the Löwdin transform by eigendecomposing the overlap matrix (when it is not simply a UniformScaling) and caching the forward and backward transforms. I was wondering whether one could implement something via the SVD, but the transform matrices will end up dense anyway (with the exception of diagonal overlap matrices, of course).

@antoine-levitt
Contributor Author

I mean computing the overlap as X'SX, factoring that, and using that transform to modify X. That's more efficient when the first dimension of X is large, especially when S can be applied in a matrix-free way.
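
A minimal sketch of that suggestion, assuming X is n×m with n large and S can be applied matrix-free (the function name is illustrative):

using LinearAlgebra

# Form the small m×m overlap X'SX, take its inverse square root, and
# apply it on the right so the columns of X become S-orthonormal:
# X <- X * (X'SX)^(-1/2).
function s_orthonormalize(X, S)
    ov = Symmetric(X' * (S * X))    # m×m, cheap to factor
    lam, U = eigen(ov)
    return X * (U * Diagonal(lam .^ -0.5) * U')
end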
