
Optim.retract differs from Manifolds.retract, maybe? #920

Open
dehann opened this issue May 13, 2021 · 40 comments
@dehann

dehann commented May 13, 2021

Manifolds and Optim walk into a bar asking for Spheres. Manifolds says the norm is to project!
https://github.com/JuliaManifolds/Manifolds.jl/blob/4c6cb43b9fce3ca4b506a00a6783aed9fc06a10f/src/manifolds/Sphere.jl#L388

Optim, goes whoa dude, retract!

retract!(S::Sphere, x) = (x ./= norm(x))

thunk....


We (@Affie and I) are trying to figure out how to combine Optim with Manifolds. After many hours we are starting to think that Optim is actually always projecting, and that Optim perhaps needs to change to a more general retract about a point p given a vector x. The most difficult thing for us right now is understanding everyone's terminology, since the references to retract keep saying "projecting".

Shouldn't Optim.retract! calls always accept a manifold point p at which the tangent space is taken? See for example ManifoldsBase.retract(M,p,X), where M is a Manifold, p is a point on M, and X is a vector in the tangent space of M at p (e.g. X is a Lie algebra element).

Manopt.jl also writes retr at a point x_k of a vector s*delta.

PS, see in-place ManifoldsBase.retract!(M,q,p,X)
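To make the signature difference concrete, here is a minimal plain-Julia sketch (using only LinearAlgebra; these are hand-rolled stand-ins, not the packages' actual code) showing that Optim's "add inside the algorithm, then retract!" pattern matches a Manifolds-style projection retraction on the unit sphere:

```julia
using LinearAlgebra

# Optim.jl convention (sketch): retract! takes a single ambient point
# and maps it back onto the sphere by normalizing.
optim_retract!(x) = (x ./= norm(x))

# Manifolds.jl-style projection retraction (sketch): takes a point p on
# the sphere and a tangent vector X at p, steps in the embedding, projects.
manifolds_retract(p, X) = (p + X) / norm(p + X)

p = [1.0, 0.0, 0.0]    # point on the unit sphere
X = [0.0, 0.3, 0.4]    # tangent vector at p (orthogonal to p)

# Optim's pattern: the "+" happens inside the algorithm, then retract!
x = p + X
optim_retract!(x)

q = manifolds_retract(p, X)   # Manifolds' pattern: the retraction does both
x ≈ q                         # the two conventions agree on the sphere
```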


Optim docs add more confusion, saying:

Implementing new manifolds is as simple as adding methods project_tangent!(M::YourManifold,x) and retract!(M::YourManifold,g,x).

Yet the Optim.jl functions in code have the opposite parameter signatures:

# Manifold interface: every manifold (subtype of Manifold) defines the functions
# project_tangent!(m, g, x): project g on the tangent space to m at x
# retract!(m, x): map x back to a point on the manifold m

We're not sure how to read g (gradient / tangent vector?) and x (an ambient vector, if this is an embedding?).
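For reference, a hedged sketch of what implementing that interface for the unit sphere might look like, following the signatures in the code comment above (the type name MySphere and the free-standing functions are hypothetical illustrations, not Optim.jl source):

```julia
using LinearAlgebra

# Hypothetical embedded-manifold interface for the unit sphere, mirroring
# the signatures quoted above from Optim's code comments.
struct MySphere end

# project g onto the tangent space of the sphere at x: g - <g,x> x
project_tangent!(::MySphere, g, x) = (g .-= dot(g, x) .* x)

# map an ambient point x back onto the sphere
retract!(::MySphere, x) = (x ./= norm(x))

M = MySphere()
x = [0.6, 0.8, 0.0]        # a point on the sphere
g = [1.0, 1.0, 1.0]        # an ambient (Euclidean) gradient
project_tangent!(M, g, x)  # g now lies in the tangent space at x
abs(dot(g, x)) < 1e-12     # true: g is orthogonal to x
```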


Also:

cc @kellertuer

@dehann
Author

dehann commented May 13, 2021

also cc @david-m-rosen

@kellertuer

Wow, you tackle all the points :)

The retraction topic might be a little difficult, especially since there is not the retraction – many operations on manifolds can be phrased as retractions.

Let's start at the beginning: What is a retraction? See for example http://sma.epfl.ch/~nboumal/book/index.html Definition 3.40: A retraction is a map from the tangent bundle to the manifold such that its restriction to one tangent space, retr_p, maps the zero tangent vector to p and has the identity as its differential at zero. This can be seen as a first-order approximation to the exponential map.

The idea is that the exponential map might be too expensive. And on the sphere this is exactly what one does with projection: Walk into the embedding (p + X) and project back.

So first, in Manifolds.jl we implement retract(M,p,X,type) (and the in-place retract!(M,q,p,X,type)) similar to exp (and exp!), just that the last parameter additionally specifies which retraction you want to use.
And you see the retraction by projection on the Sphere for example here https://github.com/JuliaManifolds/Manifolds.jl/blob/4c6cb43b9fce3ca4b506a00a6783aed9fc06a10f/src/manifolds/Sphere.jl#L423.

So – what does Optim do? They always have a + in their algorithms anyway, so they surround this with manifold ideas (not completely mathematically rigorous, but + followed by projection keeps you on the manifold, if a projection is available).
Since they have the + anyway, they – with a slight misuse of notation – have retract just project (again: the plus is already done in the algorithms themselves). So yes, they accept arbitrary points in the embedding, but the idea is that these come from some operation p+X.
This does not work for all manifolds; if you want to do manifolds and optimization more rigorously, consider Manopt.jl

For CG in Manopt.jl, keep in mind that

  • x is a point on the manifold
  • s is a stepsize (scalar)
  • δ is some descent direction.
    and you can again also do exp(M, x, s*δ) (there is actually the ExponentialRetraction to do exactly this).
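The step above can be sketched in plain Julia (sphere_exp is a hand-rolled stand-in for exp(M, x, s*δ) on the unit sphere, using the closed-form great-circle formula; not Manopt.jl code):

```julia
using LinearAlgebra

# One step x_{k+1} = exp_x(s * δ) on the unit sphere, where
# exp_p(X) = cos(‖X‖) p + sin(‖X‖) X/‖X‖ follows a great circle.
function sphere_exp(p, X)
    t = norm(X)
    t == 0 && return copy(p)
    cos(t) .* p .+ sin(t) .* (X ./ t)
end

x = [1.0, 0.0, 0.0]
δ = [0.0, 1.0, 0.0]   # descent direction in the tangent space at x
s = π / 2             # step size
x1 = sphere_exp(x, s .* δ)
# a quarter of a great circle lands at [0, 1, 0]
```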

Summary (i.e. the tl;dr)

As phrased here JuliaManifolds/Manifolds.jl#35 (comment), „Manopt.jl is there for more serious manifoldization.“ – with all my bias (as a developer of Manopt.jl) and phrased carefully: If you have implemented something using Optim and want to quickly check a manifold idea (with one of their manifolds) – do it. If you want to do serious optimization on manifolds (similar to Manopt in Matlab or pymanopt in Python) – and be able to use any manifold from here – please use Manopt.jl. If you are missing an algorithm, let me know.

@antoine-levitt
Contributor

Great to see this taken up! I wanted to do it at some point but was waiting on @pkofod 's fabled rewrite :-)

Should Optim.retract! calls not rather always accept a manifold point p at which the tangent space is taken?

I made this choice because it was simpler to modify the existing code in this way. Optim's retract (x+d) is manifolds' retract(x, d), and Optim's retract(x) is manifold's project(x). This is motivated by the fact that optimization algorithms are not very sensitive to the choice of retraction, and that optim only deals with manifolds as constraints (ie embedded manifolds). As far as I know there's no standard terminology on these points.

Implementing new manifolds is as simple as adding methods project_tangent!(M::YourManifold,x) and retract!(M::YourManifold,g,x).

That's a bug, my bad! Should be the opposite signature of course.

@kellertuer

This is motivated by the fact that optimization algorithms are not very sensitive to the choice of retraction

Uh, they are. I just spent two hours today with a student on some numerical instabilities, until we exchanged the retraction for one a little less good at approximating exp but far better in stability, and the algorithm ran smoothly (a Trust Region with approximate Hessian SR1 updates will land here soon-ish).

@antoine-levitt
Contributor

Do you mean that you have two mathematically well-defined retractions and one performs repeatedly and noticeably better than the other? If so I'd be interested in that. Are you sure you don't mean that one implementation of a retraction was numerically unstable and you switched to a numerically more stable one?

@kellertuer

I actually mean two retractions, both well-defined and such. So for example on the Sphere, you have

  • exp (using sine and cosine following great arcs) – yes that is a retraction
  • summation and projection

For large steps (or if you are fine stopping early) both are equally fine, but below step sizes of about 1e-7 exp is too unstable, so one should use the ProjectionRetraction.
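A small plain-Julia experiment along these lines (hand-rolled stand-ins for the two retractions; this only shows that both maps stay on the sphere and agree to first order for a tiny step, not the instability itself):

```julia
using LinearAlgebra

# On the unit sphere both maps below are retractions; for small X they
# agree to first order, but their numerical behavior can differ.
sphere_exp(p, X)  = (t = norm(X); t == 0 ? copy(p) : cos(t)*p + sin(t)*(X/t))
sphere_proj(p, X) = (q = p + X; q / norm(q))

p = [1.0, 0.0, 0.0]
X = [0.0, 1e-7, 0.0]   # tiny tangent step

qe = sphere_exp(p, X)
qp = sphere_proj(p, X)

norm(qe - qp)          # O(‖X‖³): the two retractions agree very closely
abs(norm(qp) - 1)      # the projection output is on the sphere by construction
```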

The same holds even more if you take into account vector transports and their interplay with retractions (i.e. preferably take a vector transport by differentiated retraction of the same retraction for the best experience).

@antoine-levitt
Contributor

That's not contradictory with what I said: mathematically (in infinite precision) both are equally fine, but of course you have to take care to implement them properly (stably). I'm pretty sure you can implement the exponential retraction stably – you probably need tricks like trigonometric identities to transform cos(x)-1 so it can be computed stably for small x.

@antoine-levitt
Contributor

I.e.: you have to distinguish the mathematical function from the algorithm implementing it. For me "retraction" only refers to the former.

@kellertuer

Mathematically you should always use the exponential map, since a retraction only approximates (up to first order) the correct way to “step in direction X” on a manifold. Retractions (as far as I am aware, introduced in the Absil, Mahony, Sepulchre book) were only introduced for numerical algorithms and implementations, for example when no closed form is known for the solution of the ODE defining the exponential map.

So mathematically, with retractions you introduce an additional error (comparable to not walking along straight lines). For sure, since those are first-order approximations, for “small steps” you are “fine enough”.

Concerning the concrete example: I think we already have a quite stable form, I am not aware of a numerically more stable one at the moment.

@antoine-levitt
Contributor

I don't understand at all why you would say that. There's nothing sacred or "correct" about the exponential map. It's just a canonical (associated with a specific riemannian metric) way of moving around, but so what? For a given manifold, there may be several ways of moving around, of transporting tangent vectors, several connections, etc, and that's perfectly fine. Sure, Levi-Civita is nice, but it's not the only one.

When you do optimization you already approximate the structure of the objective function (using gradient and/or hessian information). Saying that you should do the exponential map is a bit like insisting that you should follow exactly the ODE x' = -nabla f(x) instead of doing gradient descent (because gradient descent introduces an "additional error" compared to the gradient flow). You don't care about following particular trajectories, you care about getting to the minimum at minimal cost. All convergence theorems I know are insensitive to the choice of a retraction (for the deep reason that the second-order geometry of the manifold does not influence the second-order properties of the objective function near a critical point), and I've never seen any practical difference between different retractions.

@kellertuer

Mathematically, if you are close enough to the minimiser, the retraction acts as good as the exponential map, sure. Maybe my statement was too strong there.

@antoine-levitt
Contributor

My point is that we should not think of the exponential map as being the goal and retractions as being approximations of that goal, we should think of the exponential map as one retraction among many. I'm not an expert in differential geometry, but to me the point of the Levi-Civita is not that it's the "best connection", it's just a canonical one (that you can assign in a unique way to any riemannian manifold). Just because a choice is canonical does not mean it's not still a choice.

@kellertuer

Levi-Civita – well, it is the only torsion-free connection that preserves the metric. In the sense of properties it is the best connection. So it is not just a choice; you gain a lot from it.

The properties with which retractions are introduced are such that they approximate the exponential map. It's a tradeoff: there are retractions where you can state that you stay very close to exp, and there are several that are easier, faster, or more stable to evaluate – hence there may be good reasons to take them, and as you said, locally for convergence that's fine.
I am not saying exp is always the goal, for sure not, but it is the operation a retraction “mimics”.

@antoine-levitt
Contributor

The property of being torsion free is somewhat arbitrary. The only reason we care about torsion free is because of the uniqueness of the Levi-Civita connection: it's not that it's such a fundamental property, it's just that it is canonical.

it is the operation a retraction “mimics”.

Again I don't see it that way. The goal is to have a retraction; the exponential map just happens to provide one. Looks like I won't be able to convince you, but that's OK, as long as we both agree that in practice it doesn't matter much which one you choose.

@kellertuer

In practice it does matter which one you choose, depending on numerical stability (that is, both (a) are you clever enough to implement it stably and (b) does a stable / closed-form implementation exist), and I agree that it is sometimes beneficial to stick to one retraction for these reasons. In that sense it also matters (see above, exp not being stable, for example). For small steps, in theory, it does not matter, that's right (e.g. for convergence).

If possible, I prefer exp, and I see I can't convince you of that.
For the algorithms we have in mind this is a philosophical point, I think, so it is also completely fine to see it differently.

@antoine-levitt
Contributor

antoine-levitt commented May 13, 2021

see above exp not being stable for example

Again, numerical stability is a property of the algorithm implementing the mathematical function, not of the mathematical function itself. What is a property of the mathematical function is conditioning, and exp is well-conditioned. Some implementations are (technically: backward) stable, some are not. I'm pretty sure you just used a formula like exp(t) = cos(t) p + sin(t) d, which is going to be bad when t becomes comparable with sqrt(eps) ~= 1e-8, but that's not the fault of exp, it's the fault of the implementation. You can eg do exp(t) = p + sin(t) d - 2 sin(t/2)^2 p, which computes the same mathematical function but does not have trouble for t small. edit: hmm, actually maybe it does. y = p + sin(t) d - 2 sin(t/2)^2 p; x = y / norm(y); is maybe better? To be checked, but anyway the point remains.
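A sketch of the two formulations discussed above, assuming d is a unit tangent vector orthogonal to p (plain Julia, not Manifolds.jl code). The identity cos(t) - 1 = -2 sin(t/2)^2 rewrites the exponential map without ever forming cos(t) - 1 by cancellation:

```julia
using LinearAlgebra

# naive form: exp_p(t d) = cos(t) p + sin(t) d
exp_naive(p, d, t) = cos(t)*p + sin(t)*d

# rewritten form from the comment above, with a final normalization
function exp_stable(p, d, t)
    y = p + sin(t)*d - 2*sin(t/2)^2*p
    y / norm(y)
end

p = [1.0, 0.0, 0.0]
d = [0.0, 1.0, 0.0]
t = 1e-9

norm(exp_naive(p, d, t) - exp_stable(p, d, t))   # the two agree for small t
```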

@kellertuer

I think we especially have a different understanding of "practically". In your theoretical practicality, exp is well-conditioned, sure. In my practical practicality, we use https://github.com/JuliaManifolds/Manifolds.jl/blob/4c6cb43b9fce3ca4b506a00a6783aed9fc06a10f/src/manifolds/Sphere.jl#L183-L187, but that computes sin(t)/t d, and maybe we should exchange that – you're right.

That does not mean that every exponential map can always be computed (with a different algorithm) arbitrarily exactly. See for example the logarithmic map on Stiefel, for which no closed-form solution is known to date.

So I am not sure which theoretical practicality you are referring to here. In theory, if you are close (small steps) exp/retr do the same in principle. In practice:

  1. If X is the direction you want to go into, exp_p(X) gets you there, retr_p(X) just approximately
  2. numerical stabilities
  3. computational effort

are all points to be taken into account together. Retr loses on the first point, but might win on 2 and 3.

@antoine-levitt
Contributor

Actually now that I look at it that implementation looks fine to me (as long as X is orthogonal to p to machine precision), what's wrong with it numerically?

@dehann
Author

dehann commented May 13, 2021

Hi wow, lots of info -- thanks! This already helps a lot. Think it's a case of diverse but minimal documentation coming from many different needs/sources.

I'm busy with some documentation examples for Manifolds.jl and will try to capture some of this there to help. From my side, I'm trying to make it easier for new users to ramp up quickly, and I'm cross-checking against the literature. I'm finding that even the textbooks have differing opinions about naming or identifying operations.


Perhaps I could add, as a big fan and user of all the mentioned packages: What I like most about Manifolds.jl is the serious effort on getting general abstractions right. Dealing with data types is a bit hard the first time round, though (that can be fixed via more tutorial docs, which is where I'm working now). What I like about Optim.jl is the diverse user base and history, including Flux (I already have a big dependency on Optim stretching back years); the difficult bit is a deep assumption about '+' from linear algebra vs. generalized on-manifold increments. What I like about Manopt.jl is the well-abstracted on-manifold update rules, but support for high-dimensional systems might not be as well developed yet (it's just a younger package, so all good). Over at IncrementalInference.jl we will likely support all of the above, and NLsolve too. Hence my particular care with naming conventions and operations, trying to avoid type piracy, etc.

@kellertuer

Actually now that I look at it that implementation looks fine to me (as long as X is orthogonal to p to machine precision), what's wrong with it numerically?

I will first thoroughly investigate why it happens and then propose a fix – as soon as I find time.

@kellertuer

Hi wow, lots of info -- thanks! This already helps a lot. Think it's a case of diverse but minimal documentation coming from many different needs/sources.

I will try to extend our documentation for sure :)

Perhaps i could add as a big fan and user of all the mentioned packages: What I like most about Manifolds.jl is a serious effort on getting general abstractions right.

Thanks for this nice feedback!

Dealing with data-types is a bit hard first time round though (can be fixed via more tutorial docs, where i'm working now).

For now, in Manifolds.jl data types are loosely typed; we mostly assume arrays (but want to allow for static arrays, too, for example), so we only type data where it is necessary for distinction. Let's see how we can document this better.

What I like about Manopt.jl is well abstracted on-manifolds update rules, but support for high dimension systems might not be as well developed yet (just a younger package, so all good).

Let me know when I can help somewhere; high-dimensional systems should be possible using product manifolds, for example.

@Affie

Affie commented May 14, 2021

Thanks for all the feedback and discussion.

I have 2 questions:

  1. Do I understand correctly that Optim.jl's retract!(M, x) is equivalent to Manifolds.jl's project(M, p), where x and p are points in the ambient space of the manifold M projected onto M?

  2. I'm still not following Optim.jl's project_tangent(M,g,x): is g (a gradient?) a vector in the tangent space that is mapped back (by projection) to the manifold at the point x on the manifold? That would make it equivalent to Manifolds.jl's retract(M, p, X).

@antoine-levitt
Contributor

  1. I think so yes
  2. g is a vector in the ambient space that gets projected to the tangent space at x. Nothing to do with retractions.

For instance, gradient descent is
x <- retract(M, x - alpha * project_tangent(M, g, x))
where g is the gradient of the objective function at x. Hope that helps.
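A runnable sketch of that scheme on the unit sphere, minimizing the Rayleigh quotient f(x) = x'Ax (the helper names mirror Optim's convention but are hand-rolled stand-ins, not Optim.jl's functions; the matrix A and step size are made up for illustration):

```julia
using LinearAlgebra

A = [2.0 0.0 0.0; 0.0 1.0 0.0; 0.0 0.0 0.5]

project_tangent(g, x) = g - dot(g, x) * x   # ambient gradient -> tangent space at x
retract(x) = x / norm(x)                    # ambient point -> back onto the sphere

function descend(A; α = 0.1, iters = 200)
    x = normalize(ones(3))
    for _ in 1:iters
        g = 2A * x                          # Euclidean gradient of x'Ax
        x = retract(x - α * project_tangent(g, x))
    end
    x
end

x = descend(A)   # approaches the eigenvector of the smallest eigenvalue, ±[0, 0, 1]
```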

@kellertuer

Concerning 1) I think so, too.

@Affie

Affie commented May 14, 2021

Thanks, so for 2 it stays in the tangent space. Then it looks like the relevant Manifolds.jl function is:

project(M::Manifold, p, X)

Project ambient space representation of a vector X to a tangent vector at point p on the Manifold M.

@antoine-levitt
Contributor

Ah, it's somewhat unfortunate that manifolds uses the same name (project) for two different operations. Wouldn't it make more sense to call project(M, p) retract(M, p)?

@kellertuer

kellertuer commented May 15, 2021

We use the same name (project) for two things that are projections. Let me elaborate and tell the origin of the functions.

Originally we had project_tangent(M, p, X) (mutating project_tangent!(M, Y, p, X)), which takes a (tangent) vector X in the embedding and projects it to the tangent space at p, and we had project_point(M, p) (mutating variant project_point!(M, q, p)), which takes a point p from the embedding and projects it onto the manifold.

Since both are actually projections, we simplified them all to be called project. Yes, if you write them all down, two similar signatures occur that do something different:

  • project(M, p, X) projects X onto the tangent space at p
  • project!(M, q, p) projects p onto M, computed in-place in q.

But – they are all projections, and p, q are completely different objects (points) than X (a vector).

The retraction retract(M, p, X) (or its mutating variant retract!(M, q, p, X)) on the other hand takes a point p and a tangent vector X from the tangent space at p and follows a special curve emanating from p in direction X, so it ends up at a point q on the manifold, that is roughly ||X||_p away. If you even choose retract(M, p, X, ::ExponentialRetraction) the result is exactly d(p,q) = ||X||_p away.

Mathematically there is no retraction that would work without said direction. retract(M,p) is just not something I see anywhere that would mathematically make sense, because a retraction always works at a point p with a tangent vector X. Even more, since there is not the retraction, you also always have to specify which retraction you mean. In Manifolds it defaults to the exponential map if you do not provide the type, but there are quite a few retractions on certain manifolds; the one we discuss here is the ProjectionRetraction(), for which retract(M, p, X, ::ProjectionRetraction) = project(M, p+X) (at least if you have a + in the embedding; it might be more complex in other embeddings).

Or, to answer 2) in short: No. project_tangent(M, p, X) (which is what it was called in Manifolds in the beginning) and retract(M, p, X) are not the same; the first projects an (ambient) X to give you a tangent vector at p, the second expects X to be a tangent vector and computes a point on the manifold.
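The distance remark above can be checked numerically on the sphere (plain-Julia stand-ins for the two retractions; geodesic distance on the unit sphere is the arccosine of the inner product):

```julia
using LinearAlgebra

sphere_exp(p, X)  = (t = norm(X); cos(t)*p + sin(t)*(X/t))   # exponential map
sphere_proj(p, X) = normalize(p + X)                          # projection retraction
sphere_dist(p, q) = acos(clamp(dot(p, q), -1, 1))             # geodesic distance

p = [1.0, 0.0, 0.0]
X = [0.0, 0.5, 0.0]   # tangent vector at p with ‖X‖ = 0.5

sphere_dist(p, sphere_exp(p, X))    # exactly ‖X‖ = 0.5
sphere_dist(p, sphere_proj(p, X))   # atan(0.5) ≈ 0.4636, only roughly ‖X‖
```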

@antoine-levitt
Contributor

To me projecting a point on the manifold and retractions are more alike than projecting a point on the manifold and projecting a vector on the tangent space. But naming things is always tricky!

@kellertuer

But there is an addition within the retraction before you project?

The projection itself is definitely not a retraction, since a retraction is a map from the tangent space to the manifold – and not a map from the embedding to the manifold. So they are – mathematically – something completely different.

For the two projections – they are, mathematically, projections: for example, projecting twice is the same as projecting once. They are very similar.
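The idempotence property is easy to check for both projections on the unit sphere (plain-Julia stand-ins for Manifolds.jl's two project methods):

```julia
using LinearAlgebra

project_point(p)     = p / norm(p)          # embedding -> manifold
project_vector(p, X) = X - dot(X, p) * p    # embedding -> tangent space at p

p0 = [3.0, 4.0, 0.0]
p  = project_point(p0)
project_point(p) ≈ p                        # projecting twice = projecting once

X0 = [1.0, 1.0, 1.0]
X  = project_vector(p, X0)
project_vector(p, X) ≈ X                    # same for the tangent projection
```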

@kellertuer

Oh, and even more: the retractions by QR decomposition or by SVD (see Stiefel for example) are also very, very far from projections.

@antoine-levitt
Contributor

Well they are projections in the sense that most often they take p+X and project back to the manifold. Also retract assumes p is on the manifold, but it could plausibly be made to work for p not on the manifold, in which case it would make sense to have retract(M,p) be the same as retract(M,p,0) - ie project(M,p).
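Under that reading, and for the projection retraction specifically, the observation that retract(M,p,0) reduces to project(M,p) for an ambient p can be checked directly (plain-Julia sketch):

```julia
using LinearAlgebra

# projection retraction, formally applied to an arbitrary ambient point
proj_retract(p, X) = (p + X) / norm(p + X)
project_point(p)   = p / norm(p)

pt = [2.0, 1.0, 2.0]                            # not on the unit sphere
proj_retract(pt, zeros(3)) ≈ project_point(pt)  # true
```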

It's just potentially confusing to have the same name for both projections on the manifold and projection on the tangent space, but again naming things is tricky! Your convention of using different letters for points on the manifolds and tangent vectors is definitely a big help here.

@kellertuer

But the essential thing is that the (projection-based) retraction is the addition together with the projection, not just the projection alone! Only in the very special case you mention now (X=0), and for the case that p is not on the manifold (retractions always assume it is, similar to exp), then and only then is project(M,p) the same as retract(M,p,0) and makes sense (if p is on the manifold, project(M,p)=p). So I would not like to use retractions starting from points p not on the manifold. For me, retract(M,p) is a misuse of notation, either by saying ”p is really p+X, we just don't tell you“ (then the retraction=projection you implement is only the second half of the retraction) or by saying ”maybe p is not on the manifold“ (then that's not how a retraction is defined, I am sorry).

Well, both projections are projections; that's why we called both project. The signature of a function is always part of the function. Sure, one has to be careful with naming, and we have a consistent scheme: points are p,q or p1,p2,p3; vectors are X,Y or X1,X2,X3; and we always add the math formulae to the docs (at least I hope we do not forget them often).

@antoine-levitt
Contributor

antoine-levitt commented May 15, 2021

Well, both projections are projections, that‘s why we called both project. The signature of a function is always part of a function.

Sure, but usually in julia f(x) and f(x,y) tend to be the same function (eg maybe f is defined as f(x,y=z)). That pattern is quite prevalent and so your project might be confusing. But that's life, everything is potentially confusing to somebody.

or by saying ”Maybe p is not on a manifold“ (then that's not how a retraction is defined, I am sorry).

It very much depends on your definition of a retraction. For instance the only thing wikipedia knows is https://en.wikipedia.org/wiki/Retraction_(topology)#Retract which takes a point not necessarily on the manifold and maps it back to the manifold. I've seen it used in that sense in papers (and use it in that sense in mine)

@kellertuer

Well, for project we do not follow that usual pattern, because both are projections.

For the retractions, see for example Definition 3.41 http://sma.epfl.ch/~nboumal/book/IntroOptimManifolds_Boumal_2020.pdf
which requires more, but also we are on manifolds and not just on a topological space.

@kellertuer

Maybe to summarise:

  1. In a TopologicalSpace.jl one could introduce retract(T,p) with a fallback to project as one of the retractions (though I would then do retract(T,p) = retract(T,p,ProjectionRetraction) and only dispatch this to a projection).
    On a manifold (i.e. a topological space with much more structure), however, a retraction requires more structure, so that just project is not enough to be a retraction on a manifold.
    One might argue that retraction has two different definitions here, but that is nothing I can change.

  2. For project – if one wants to be exact, project(M,p,X) is the lazy way of writing project(TangentSpaceAtPoint(M,p),X). Keeping in mind that p and X are different – for example different types, but at least conceptually different – there is no confusion when one reads the documentation. But I can for sure extend the documentation with some formulae.

Also note that – concerning notation – we created this page https://juliamanifolds.github.io/Manifolds.jl/stable/misc/notation.html to help the reader follow the docs.

@mateuszbaran

Sure, but usually in julia f(x) and f(x,y) tend to be the same function (eg maybe f is defined as f(x,y=z)). That pattern is quite prevalent and so your project might be confusing. But that's life, everything is potentially confusing to somebody.

There are some exceptions, like the 2-argument variant of atan or maximum with a function. Anyway, I wouldn't mind changing names here in Manifolds.jl. The problem we have here (discussed from this post JuliaManifolds/ManifoldsBase.jl#59 (comment) over some further posts) is that the generic variants are necessarily very verbose (such as project(TangentSpaceAtPoint(M,p),X)) or somewhat ambiguous. The PR I've linked does some naming cleanup (necessary for some generalizations). I'd be open to doing some other changes too; for example, the vector-based project could be renamed to project_vector. It also connects to the problem of what the vector projection actually is, how it is connected to differentials of point projections, representations of tangent spaces in the embedding, etc. At a deep enough level it's somewhat unclear even to me.

Regarding projection vs retraction, differential geometry isn't particularly consistent in naming conventions so we had to settle on something, and once the user sees that projection is "point in the embedding -> point on the manifold", and retraction is "point on a manifold, tangent vector -> point on a manifold", they are good to go and Manifolds.jl is quite consistent here.

@Affie

Affie commented May 15, 2021

The project definitions from Manifolds.jl took me a while to understand, but they're well documented.

The names project_tangent(M, p, X) and project_point(M, p) might have made it simpler to understand, as the return types are different, but together with the documentation, project(M, p, X) and project(M, p) were clear enough (after I looked up a bunch of definitions to understand the concepts).

Perhaps a simple figure with a description will be enough to avoid future confusion.

@kellertuer

A figure (on the sphere, for example) would not be that hard to make; I am just not sure where to actually include it. Maybe one could split projections into their own section on the interface.html page and put an image at the beginning?

@Affie

Affie commented May 16, 2021

That could work nicely. Anywhere easy to find.

For me, when I started looking at the documentation, I went to "getting started" first. Then I browsed a bit and looked at the manifolds I was interested in and the functions they listed. I only later found the functions in the "ManifoldsBase.jl" section (they were a bit clearer and helped me more) and then found the "notation" section.
I would say the notation could come earlier in the docs and could also use the same sphere figure, so it's perhaps another option to add some basic implementation concepts (that users have a hard time understanding) in this section.

@kellertuer

Well, the notation section is for notation, not for implementations, but maybe we should make the getting started section longer then.
