Optim.retract differs from Manifolds.retract, maybe? #920
also cc @david-m-rosen |
Wow, you tackle all the points :) The retraction topic might be a little difficult, since there especially is not the retraction – you can phrase many operations on manifolds to be retractions. Let's start at the beginning: what is a retraction? See for example http://sma.epfl.ch/~nboumal/book/index.html Definition 3.40: a retraction is a map from the tangent bundle to the manifold such that its restriction retr_p to one tangent space maps the zero tangent vector to p, and its differential at zero is the identity. This can be seen as a first-order approximation to the exponential map. The idea is that the exponential map might be too expensive. And on the sphere this is exactly what one does with projection: walk into the embedding (p + X) and project back. So first, in Manifolds.jl we implement […]. So – what does Optim do? They anyway always have a + in their algorithms, so they surround this by manifold ideas (not completely mathematically rigorous, but + and then projection keeps you on the manifold, if there is a projection available). For CG in Manopt.jl, keep in mind that […]
Summary (i.e. the tl;dr): As phrased here, JuliaManifolds/Manifolds.jl#35 (comment), "Manopt.jl is there for more serious manifoldization." With all my bias (as developer of Manopt.jl) and phrased carefully: if you have implemented something using Optim and want to quickly check a manifold idea (with one of their manifolds) – do it. If you want to do serious optimization on manifolds (similar to Manopt in Matlab or pymanopt in Python) – and be able to use any manifold from here – please use Manopt.jl. If you are missing an algorithm, let me know. |
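To make the sphere example above concrete, a minimal sketch of the projection-based retraction in plain Julia (`retract_project` is a name made up for this sketch):

```julia
using LinearAlgebra

# Projection-based retraction on the unit sphere: step into the embedding
# (p + X) and project back by normalizing.
retract_project(p, X) = (p + X) / norm(p + X)

p = [1.0, 0.0, 0.0]
X = [0.0, 0.3, 0.0]         # tangent at p: dot(p, X) == 0
q = retract_project(p, X)
norm(q) ≈ 1.0               # back on the sphere; retract_project(p, zero(X)) == p
```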
Great to see this taken up! I wanted to do it at some point but was waiting on @pkofod 's fabled rewrite :-)
I made this choice because it was simpler to modify the existing code in this way. Optim's retract(x+d) is Manifolds' retract(x, d), and Optim's retract(x) is Manifolds' project(x). This is motivated by the fact that optimization algorithms are not very sensitive to the choice of retraction, and that Optim only deals with manifolds as constraints (i.e. embedded manifolds). As far as I know there's no standard terminology on these points.
That's a bug, my bad! Should be the opposite signature of course. |
Uh, they are. I just spent 2 hours today with a student on some numerical instabilities, until we exchanged the retraction with one a little less good at approximating exp but far better in stability, and the algorithm ran smoothly (here there will be a Trust Region with approximate Hessian SR1 updates soon-ish). |
Do you mean that you have two mathematically well-defined retractions and one performs repeatedly and noticeably better than the other? If so I'd be interested in that. Are you sure you don't mean that one implementation of a retraction was numerically unstable and you switched to a numerically more stable one? |
I actually mean two retractions, both well-defined and such. So for example on the sphere, you have the exponential map and the projection-based retraction. For large steps (or if you are fine stopping early) both are equally fine, but below steps of size […] The same holds even more if you take into account vector transports and their interplay with retractions (i.e. preferably take a vector transport by differentiated retraction of the same retraction for the best experience). |
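A sketch of those two retractions on the unit sphere, assuming (as the context suggests) the exponential map and the projection retraction; both agree to first order for small steps:

```julia
using LinearAlgebra

# Exponential map on the unit sphere: follow the great circle from p in direction X.
function exp_sphere(p, X)
    θ = norm(X)
    return θ == 0 ? copy(p) : cos(θ) * p + (sin(θ) / θ) * X
end

# Projection retraction: step in the embedding, normalize back.
retract_proj(p, X) = (p + X) / norm(p + X)

p = [1.0, 0.0, 0.0]
X = [0.0, 1e-3, 0.0]
norm(exp_sphere(p, X) - retract_proj(p, X))  # ≈ 3e-10, i.e. O(‖X‖³) agreement
```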
That's not contradictory with what I said: mathematically (in infinite precision) both are equally fine, but of course you have to take care to implement them properly (stably). I'm pretty sure you can implement the exponential retraction stably - you probably need to use tricks like trigonometric identities to transform cos(x)-1 to be able to compute it stably for x small |
Ie: you have to distinguish the mathematical function and the algorithm implementing it. For me "retraction" only refers to the former |
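One such trigonometric identity, as a sketch: cos(x) - 1 = -2 sin²(x/2) avoids the catastrophic cancellation of the naive form for small x:

```julia
# cos(x) - 1 cancels catastrophically for small x; the identity
# cos(x) - 1 == -2*sin(x/2)^2 evaluates the same quantity stably.
naive_cosm1(x)  = cos(x) - 1
stable_cosm1(x) = -2 * sin(x / 2)^2

x = 1e-9
naive_cosm1(x)   # 0.0 – all significant digits lost
stable_cosm1(x)  # -5.0e-19 – matches -x^2/2 to machine precision
```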
Mathematically you should always use the exponential map, since the retraction only approximates (up to first order) the correct way to "step in direction X" on a manifold. Retractions (as far as I am aware, introduced in the Absil, Mahony, Sepulchre book) were only introduced for the numerical algorithms and implementations, for example when there is no closed form known for the solution of the ODE defining the exponential map. So mathematically, with retractions you introduce an additional error (comparable to not walking along straight lines). For sure, since those are first-order approximations, for "small steps" you are "fine enough". Concerning the concrete example: I think we already have a quite stable form, I am not aware of a numerically more stable one at the moment. |
I don't understand at all why you would say that. There's nothing sacred or "correct" about the exponential map. It's just a canonical (associated with a specific riemannian metric) way of moving around, but so what? For a given manifold, there may be several ways of moving around, of transporting tangent vectors, several connections, etc, and that's perfectly fine. Sure, Levi-Civita is nice, but it's not the only one. When you do optimization you already approximate the structure of the objective function (using gradient and/or hessian information). Saying that you should do the exponential map is a bit like insisting that you should follow exactly the ODE x' = -nabla f(x) instead of doing gradient descent (because gradient descent introduces an "additional error" compared to the gradient flow). You don't care about following particular trajectories, you care about getting to the minimum at minimal cost. All convergence theorems I know are insensitive to the choice of a retraction (for the deep reason that the second-order geometry of the manifold does not influence the second-order properties of the objective function near a critical point), and I've never seen any practical difference between different retractions. |
Mathematically, if you are close enough to the minimiser, the retraction acts as well as the exponential map, sure. Maybe my statement was too strong there. |
My point is that we should not think of the exponential map as being the goal and retractions as being approximations of that goal, we should think of the exponential map as one retraction among many. I'm not an expert in differential geometry, but to me the point of the Levi-Civita is not that it's the "best connection", it's just a canonical one (that you can assign in a unique way to any riemannian manifold). Just because a choice is canonical does not mean it's not still a choice. |
Levi-Civita – well, it is the only torsion-free connection that preserves the metric. In the sense of properties it is the best connection. So it is not just a choice, you gain a lot from that. The properties with which retractions are introduced are such that they approximate the exponential map. It's a tradeoff: there are retractions where you can state that you stay very close to exp, and there are several that are easier to evaluate, or faster to evaluate, or more stable – hence there might be good reasons to take them, and as you said, locally, for convergence, that's fine. |
The property of being torsion free is somewhat arbitrary. The only reason we care about torsion free is because of the uniqueness of the Levi-Civita connection: it's not that it's such a fundamental property, it's just that it is canonical.
Again I don't see it that way. The goal is to have a retraction, the exponential map just happens to provide you one. Looks like I won't be able to convince you, but that's OK, as long as we both agree that in practice it doesn't matter much which one you choose. |
In practice it does matter which you choose, depending on numerical stability (that is, both a) are you clever enough to implement it stably and b) does there exist a stable / closed-form implementation), and I agree that it is sometimes beneficial to stick to a retraction for these reasons. And in that sense it also matters (see above, exp not being stable for example). For small steps, in theory it does not matter, that's right (e.g. for convergence). If possible, I prefer exp, and I see I can't convince you on that. |
Again, numerical stability is a property of the algorithm implementing the mathematical function, not of the mathematical function itself. What is a property of the mathematical function is conditioning, and exp is well-conditioned. Some implementations are (technically: backward) stable, some are not. I'm pretty sure you just used a formula like […] |
I think we especially have a different understanding of "practically". In your theoretical practicality, exp is well-conditioned, sure. In my practical practicality, we use https://github.com/JuliaManifolds/Manifolds.jl/blob/4c6cb43b9fce3ca4b506a00a6783aed9fc06a10f/src/manifolds/Sphere.jl#L183-L187, but that's […]. That does not mean that any exponential map can always be computed (with a different algorithm) arbitrarily exactly. See for example the logarithmic map on Stiefel, which for now does not have a closed-form solution (none known until now). So I am not sure which theoretical practicality you are referring to here. In theory, if you are close (small steps), exp and retr do the same in principle. In practice: 1. closeness to exp, 2. cost of evaluation, and 3. numerical stability are all points to be taken into account together. Retr loses the first, but might win in 2 and 3. |
Actually now that I look at it that implementation looks fine to me (as long as X is orthogonal to p to machine precision), what's wrong with it numerically? |
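For readers without the link at hand, the linked Sphere.jl exponential map is along these lines (a sketch, not the verbatim source; `usinc` and `exp_sphere` are names made up here):

```julia
using LinearAlgebra

# The linked formula, roughly: q = cos(‖X‖) p + (sin(‖X‖)/‖X‖) X, with the
# sinc factor continuously extended to 1 at ‖X‖ = 0.
usinc(θ) = iszero(θ) ? one(θ) : sin(θ) / θ
exp_sphere(p, X) = (θ = norm(X); cos(θ) .* p .+ usinc(θ) .* X)

p = [1.0, 0.0, 0.0]
X = [0.0, 1e-12, 0.0]   # tiny tangent step, orthogonal to p
q = exp_sphere(p, X)
norm(q) ≈ 1.0           # stays on the sphere
q[2] ≈ 1e-12            # the step direction survives without cancellation
```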
Hi wow, lots of info -- thanks! This already helps a lot. Think it's a case of diverse but minimal documentation coming from many different needs/sources. I'm busy with some documentation examples for Manifolds.jl and will try to capture some of this there to help. From my side, I'm trying to make it easier for new users to ramp up quicker, and cross-checking against literature. Perhaps I could add, as a big fan and user of all the mentioned packages: What I like most about Manifolds.jl is a serious effort on getting general abstractions right. Dealing with data types is a bit hard the first time round, though (can be fixed via more tutorial docs, where I'm working now). What I like about Optim.jl is the diverse user base and history including Flux (I already have a big dependency on Optim stretching back years); the difficult bit is a deeper assumption about '+' from linear algebra vs. generalized on-manifold increments. What I like about Manopt.jl is well-abstracted on-manifold update rules, but support for high-dimension systems might not be as well developed yet (just a younger package, so all good). Over at IncrementalInference.jl we will likely support all of the above and NLsolve too. Hence my particular care in naming conventions and operations, trying to avoid type piracy etc. |
I will first thoroughly investigate why it happens and then propose a fix – as soon as I find time. |
I will try to extend our documentation for sure :)
Thanks for this nice feedback!
For now, in Manifolds.jl data types are loosely typed, and we mostly assume to have arrays (but want to allow for static arrays, too, for example), so we only type data if it is necessary for distinction. Let's see how we can also document this better.
Let me know when I can help somewhere; high-dimension systems should be possible using product manifolds, for example. |
Thanks for all the feedback and discussion. I have 2 questions: […] For instance, gradient descent is […] |
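For reference, a minimal sketch of what retraction-based gradient descent looks like with the ManifoldsBase.jl API (the Riemannian gradient `grad_f` is assumed given; `manifold_gd` is a made-up name):

```julia
using Manifolds

# Gradient descent where every "x + alpha*d" is replaced by a retraction,
# so iterates stay on the manifold and directions live in tangent spaces.
function manifold_gd(M, grad_f, p0; stepsize = 0.1, iters = 100)
    p = copy(p0)
    for _ in 1:iters
        X = grad_f(M, p)                   # tangent vector at p
        p = retract(M, p, -stepsize * X)   # M's default retraction
    end
    return p
end

# Example: minimize f(p) = p[3] on the sphere.
M = Sphere(2)
grad_f(M, p) = project(M, p, [0.0, 0.0, 1.0])  # Euclidean gradient projected to T_p M
manifold_gd(M, grad_f, [1.0, 0.0, 0.0])        # tends to [0, 0, -1]
```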
Concerning 1) I think so, too. |
Thanks, so for 2 it stays on the tangent space. Then it looks like the relevant […] |
Ah, it's somewhat unfortunate that manifolds uses the same name (project) for two different operations. Wouldn't it make more sense to call project(M, p) retract(M, p)? |
We use the same name (project) for two things that are projections. Let me elaborate and tell the origin of the functions. Originally we had two separate functions […]. Since both are actually projections, we simplified them all to be called `project`, distinguished by their signatures.
But – they are all projections, and the retraction is something else. Mathematically there is no retraction that would work without said direction. Or to answer 2) in short: No. |
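For illustration, the two projections side by side (a sketch with the Manifolds.jl API; the signature tells them apart):

```julia
using Manifolds, LinearAlgebra

M = Sphere(2)

# Point projection: a point of the embedding R^3 is mapped onto M.
p = project(M, [1.0, 1.0, 0.0])     # == [1, 1, 0] / √2

# Vector projection: a vector of the embedding is mapped into T_p M.
X = project(M, p, [0.2, 0.3, 0.4])
abs(dot(p, X)) < 1e-14              # X is tangent at p
```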
To me projecting a point on the manifold and retractions are more alike than projecting a point on the manifold and projecting a vector on the tangent space. But naming things is always tricky! |
But there is an addition within the retraction before you project? The projection itself is definitely not a retraction, since a retraction is a map from the tangent space to the manifold – and not a map from the embedding to the manifold. So they are – mathematically – something completely different. For the two projections – they are, mathematically, projections; for example, projecting twice is the same as projecting once. They are very similar. |
Oh and even more, the retractions by QR decomposition or by SVD or the like (see Stiefel for example) are also very, very far from projections. |
Well, they are projections in the sense that most often they take p+X and project back to the manifold. It's just potentially confusing to have the same name for both projection onto the manifold and projection onto the tangent space, but again, naming things is tricky! Your convention of using different letters for points on the manifold and tangent vectors is definitely a big help here. |
But the essential thing is that the (projection-based) retraction is the addition together with the projection, not just the projection alone! Only for the very special case that you mention now (X=0), and for the case that p is not on the manifold (retractions always assume it is, similar to exp), then and only then is project(M,p) the same as retract(M,p,0) and makes sense (if p is on the manifold, project(M,p)=p). So I would not like to use retractions starting from points p not on the manifold. So for me retract(M,p) is a misuse of notation, either by saying "p is really p+X, we just don't tell you" (then the retraction=projection you implement is only the second half of the retraction) or by saying "maybe p is not on the manifold" (then that's not how a retraction is defined, I am sorry). Well, both projections are projections, that's why we called both project. The signature of a function is always part of a function. Sure, one has to be careful in naming, and we have a consistent scheme: points are p,q or p1,p2,p3; vectors are X,Y or X1,X2,X3, and we always add the math formulae to the docs (at least I hope we do not forget them often). |
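To see that special case numerically, a sketch using the current ManifoldsBase API (where the zero tangent vector is spelled `zero_vector`):

```julia
using Manifolds

M = Sphere(2)
p = [0.0, 0.0, 1.0]        # a point that is already on M
X0 = zero_vector(M, p)     # the zero tangent vector at p

retract(M, p, X0) ≈ p      # true: retr_p(0) = p by definition
project(M, p) ≈ p          # true: projection fixes points already on M
```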
Sure, but usually in Julia f(x) and f(x,y) tend to be the same function (e.g. maybe f is defined as f(x,y=z)). That pattern is quite prevalent, and so your […]
It very much depends on your definition of a retraction. For instance the only thing wikipedia knows is https://en.wikipedia.org/wiki/Retraction_(topology)#Retract which takes a point not necessarily on the manifold and maps it back to the manifold. I've seen it used in that sense in papers (and use it in that sense in mine) |
Well, for that project we do not follow the usual way, because both are projections. For the retractions, see for example Definition 3.41 http://sma.epfl.ch/~nboumal/book/IntroOptimManifolds_Boumal_2020.pdf |
Maybe to summarise: […]
Also note that – concerning notation – we especially also created this page https://juliamanifolds.github.io/Manifolds.jl/stable/misc/notation.html – to help the reader follow the docs. |
There are some exceptions, like the 2-argument variant of […]. Regarding projection vs retraction, differential geometry isn't particularly consistent in naming conventions, so we had to settle on something, and once the user sees that projection is "point in the embedding -> point on the manifold", and retraction is "point on a manifold, tangent vector -> point on a manifold", they are good to go, and Manifolds.jl is quite consistent here. |
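That convention in use, with an explicitly named retraction (a sketch with the Manifolds.jl API):

```julia
using Manifolds

M = Sphere(2)
p = [1.0, 0.0, 0.0]    # point on M
X = [0.0, 0.5, 0.0]    # tangent vector at p

q1 = exp(M, p, X)                              # exponential map
q2 = retract(M, p, X, ProjectionRetraction())  # projection-based retraction
# Both map (point on M, tangent vector at that point) to a point on M.
```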
Perhaps a simple figure with a description will be enough to avoid future confusion. |
A figure (on the sphere for example) would not be that hard to make, I am just not sure where to actually include it. Maybe one could split projections into their own section on the interface.html page and put an image at the beginning? |
That could work nicely. Anywhere easy to find. For me, when I started looking at the documentation, I went to "getting started" first. Then browsed a bit and looked at the manifolds I was interested in and the functions they listed. I only later found the functions in the "ManifoldsBase.jl" section (they were a bit clearer and helped me more) and then found the "notation" section. |
Well, the notation section is for notation, not for implementations, but maybe we should then make the getting started longer. |
Manifolds and Optim walk into a bar asking for `Sphere`s. Manifolds says the `norm` is to `project!` (https://github.com/JuliaManifolds/Manifolds.jl/blob/4c6cb43b9fce3ca4b506a00a6783aed9fc06a10f/src/manifolds/Sphere.jl#L388); Optim goes whoa dude, `retract!` (Optim.jl/src/Manifolds.jl, line 69 in e439de4)... thunk....

We (@Affie and I) are trying to figure out how to combine Optim with Manifolds, and after many hours we are starting to think that Optim is actually always projecting; however, we now think that Optim perhaps needs to change to a more general `retract` about point `p` given vector `x`. The most difficult thing for us right now is understanding what everyone's terminology is, since all references to retract always keep saying "projecting".

Should `Optim.retract!` calls not rather always accept a manifold point `p` at which the tangent space is taken? See for example `ManifoldsBase.retract(M,p,X)`, where I understand manifold type `M`, with point `p` in `M`, and a vector `X` in the tangent space of `M` at `p` (e.g. `X` is a Lie algebra element). Manopt.jl also states `retr` at point `x_k` of vector `s*delta`.

PS, see the in-place `ManifoldsBase.retract!(M,q,p,X)`.

Optim docs add more confusion, saying: […] Yet the Optim.jl functions in code have the opposite parameter signatures (Optim.jl/src/Manifolds.jl, lines 1 to 3 in e439de4); not sure how to read `g` (gradient / tangent?) and `x` (ambient vector if this is an embedding?).

Also:
cc @kellertuer
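Putting the dictionary from this thread together, a hypothetical adapter could look like the sketch below (`ManifoldsAdapter` is a made-up name; it assumes Optim's interface `retract!(m, x)`, mapping x back to the manifold, and `project_tangent!(m, g, x)`, projecting g onto the tangent space at x):

```julia
using Optim, Manifolds

# Wrap a Manifolds.jl manifold for Optim: Optim's retract! corresponds to
# Manifolds' point projection, and Optim's project_tangent! to Manifolds'
# tangent-space projection.
struct ManifoldsAdapter{TM<:AbstractManifold} <: Optim.Manifold
    M::TM
end

Optim.retract!(m::ManifoldsAdapter, x) = copyto!(x, project(m.M, x))
Optim.project_tangent!(m::ManifoldsAdapter, g, x) = copyto!(g, project(m.M, x, g))
```

With that, something like `GradientDescent(manifold = ManifoldsAdapter(Manifolds.Sphere(2)))` could be handed to `optimize` (untested; both packages export a `Sphere`, hence the qualified name).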