An Interface and first steps towards a Riemannian Gradient and a Riemannian Hessian #27
A first possible approach is via differentials of the exponential map, the distance function, the log map, and geodesics, using Jacobi fields; see for example doi:10.3389/fams.2018.00059. Of course we did not invent that, and the computation of Jacobi fields employed there is limited to symmetric manifolds, but on those it can be given in closed form (which is actually the reason for the …). If functions are built from the above-mentioned blocks, I am sure this can be automated (differentials and their adjoints to compute the gradient), and if the manifold is not symmetric one can still solve for the Jacobi fields by ODEs. Note that in the above paper we were a little lazy and did not distinguish between tangent and cotangent vectors; I would prefer to do that in such an implementation. The Hessian is, for example, given by taking such a gradient field and taking the (Riemannian) derivative of that field along a further direction (indicated by another cotangent vector). While I am not 100% sure whether that can be computed as canonically as the above-mentioned gradients, it would be nice to know how much one can do there automatically. If the manifold is isometrically embedded, both the Riemannian gradient and the Hessian can be derived from the (Euclidean) gradient and Hessian in the embedding. That's where Manopt and pymanopt have their …
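For the sphere as a simple embedded example, this projection idea (the egrad2rgrad step mentioned above) looks roughly as follows; this is only a hedged sketch, and the point and Euclidean gradient below are made up for illustration:

```julia
using LinearAlgebra

# For the unit sphere S² ⊂ ℝ³, the Riemannian gradient is the orthogonal
# projection of the Euclidean gradient onto the tangent space T_pS²:
# grad f(p) = (I - p pᵀ) ∇f̂(p).
egrad2rgrad(p, egrad) = egrad - dot(p, egrad) * p

p = [0.0, 0.0, 1.0]        # example point on the sphere
egrad = [1.0, 2.0, 3.0]    # made-up Euclidean gradient of the extended cost at p
rgrad = egrad2rgrad(p, egrad)
dot(p, rgrad)              # ≈ 0, i.e. rgrad lies in T_pS²
```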
There is quite a lot to digest in that paper, though I'd assume that ideally one would, for example, just AD through the …
Yes, it gets a little technical in that paper. Look at Eq. 9; that's the chain rule on manifolds. If you look at (10), that's the differential of a geodesic with respect to its starting point, and finally Sec. 4.2 (the beginning is enough) covers the adjoint Jacobi fields (they put a few …
I was thinking a bit about AD and, correct me if I'm wrong, but I think at least in two simple cases we can use standard AD here: the derivative of a manifold-valued curve and function gradients. First, the derivative of … For the gradient of … Does this look right, @kellertuer?
Given f: R -> M and a point t (in R) at which we want the (covariant) derivative, as well as a step size h: we need two function evaluations, a = f(t) and b = f(t+h). Then the derivative f'(t) can be approximated by 1/h * log(M, a, b). I think you might have missed the division by the step size (that's u in your notation?).
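A minimal sketch of that rule (assuming Manifolds.jl; the curve f below is just an example, not from the discussion):

```julia
using Manifolds

M = Sphere(2)
f(t) = [cos(t), sin(t), 0.0]    # example curve on the sphere
t, h = 0.3, 1e-6
a, b = f(t), f(t + h)
X = log(M, a, b) / h            # ≈ f'(t), a tangent vector in T_aM
```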
I am not sure about this, basically for notational reasons. The gradient is the Riesz representer of the differential, and with your notation I can get Df(p)[h_1 e_1], where h contains the step sizes (in the d directions on M) and e_1, ..., e_n is the tangent ONB in T_pM. But how did you get from this differential to the Riesz representer?
Well, AD doesn't need two evaluations. That's one of the great things about it. If you apply L'Hôpital's rule to …
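A hedged reconstruction of that argument (the quotient presumably being the finite difference from above): since $\log_{f(t)} f(t) = 0$, both numerator and denominator vanish at $h = 0$, so

$$\lim_{h\to 0}\frac{1}{h}\,\log_{f(t)}\bigl(f(t+h)\bigr) \;=\; \left.\frac{\mathrm{d}}{\mathrm{d}h}\,\log_{f(t)}\bigl(f(t+h)\bigr)\right|_{h=0},$$

and the right-hand side is exactly what a single forward-mode AD evaluation at $h = 0$ returns.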
I'll try to make it more formal; in that post I've just thrown out a general idea that looks promising. It seems like …
Ah, I did not think about going via L'Hôpital – yes. And I can compute the derivative of the log (hm, until now only available in, for example, the MVIRT Matlab toolbox, but I want to transfer that anyway; that's also why I wanted the …). I hope that for several cases we can avoid …
Well, I do not need the derivative of log per se:

```julia
using Manifolds, ForwardDiff

M = Sphere(2)
f(t) = [cos(t), sin(t), 0.0]   # placeholder: some curve of points on M (example, not from the discussion)
df(t) = ForwardDiff.derivative(h -> log(M, f(t), f(t + h)), 0.0)
```

and …
I think in my mind there is something missing: either a differential of the log (but maybe I got you wrong there) or the h. But I see, that might be taken care of by ForwardDiff. Then this should work (for real-valued functions defined on manifolds), since basically the log replaces the (finite) difference.
Hmm... there might be a small semantic issue. For a function f: M -> R, some places define the gradient of f at p \in M as an element of T_pM and some as an element of T_p*M (the cotangent space), although both variants would be represented in a computer in the same form (as you said, the covector/linear form would be stored as its Riesz representer). So for the discussion here we could limit ourselves to tangent vectors. Similarly to my previous code,

```julia
using Manifolds, ForwardDiff

M = Sphere(2)
f(p) = p[1] + 2 * p[3]   # placeholder: some combination of the coordinates of p (example, not from the discussion)
df(p) = ForwardDiff.gradient(h -> f(exp(M, p, project_tangent(M, p, h))), zero_tangent_vector(M, p))
```

should, I think, return the tangent vector corresponding to the direction of steepest ascent of …
BTW, exp/log could also be replaced with a retraction and an inverse retraction. Correct behavior up to first order should be enough. I'm not quite sure how to turn it into theorems; they would probably be very boring from the geometric perspective anyway. I'm doing my best to sidestep the actual differential geometry here.
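For instance (a hedged sketch; the retraction names below are from the current Manifolds.jl API and may not match the version discussed here):

```julia
using Manifolds

M = Sphere(2)
p = [0.0, 0.0, 1.0]
X = [0.1, 0.0, 0.0]                                            # tangent vector at p
q = retract(M, p, X, ProjectionRetraction())                   # first-order stand-in for exp
Y = inverse_retract(M, p, q, ProjectionInverseRetraction())    # first-order stand-in for log
```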
If all you have is a hammer, everything looks like a nail, and here I have the boring Euclidean AD...
The gradient is always a tangent vector; the differential always yields a cotangent. I would avoid using project here, since only tangent vectors are allowed for the last argument of the exp (or retraction, you're right) anyway.
Projection is required exactly because only tangent vectors are allowed there. AD doesn't know that, and I think this is the simplest way to impose this constraint. I've used the orthonormal basis a few posts above instead of projection, so that's also an option (although a more convoluted one).
I don't feel comfortable with this approach, since you can enter that computation with anything (i.e. non-tangent vectors) and would neither notice that nor whether the computation is erroneous. For me that feels too much like you just hope everything is right and can barely check, with all the projections in between (and by the way …). Although the only alternative I see is to extend AD to …
My differential geometry is not good, but isn't the gradient in fact a cotangent vector?
Projection is performed on …
There is only one projection that isn't even necessary. It's true that there is more hope than solid evidence at the moment, but as I see it, to have AD we can either work out the details in the direction I've sketched or write an AD system from scratch.
It's a bit more complicated in the case of, for example, Lie groups that represent tangent vectors in the tangent space at the identity, or vector bundles. These are two examples that I keep in mind. BTW, in which case …
Well, I just noticed that my method of calculating gradients is just an application of the definition of the directional derivative over the whole basis of the tangent space at a point. It's just using exp_p(t*v) as the curve from the definition.
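Spelled out with an ONB $e_1,\dots,e_m$ of $T_pM$ (a small addition, not verbatim from the thread), that reads

$$\operatorname{grad} f(p) \;=\; \sum_{i=1}^{m} \mathrm{d}f_p(e_i)\, e_i \;=\; \sum_{i=1}^{m} \left(\left.\frac{\mathrm{d}}{\mathrm{d}t} f\bigl(\exp_p(t\,e_i)\bigr)\right|_{t=0}\right) e_i .$$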
Ok, also regarding the question of @sethaxen – let's make all this a little more rigorous and look at, e.g., the book of Udriste (1994): …
While I am ok with the decomposition into an ONB, I prefer not to use projection. What's the difference? Let's look at the Sphere(2): … And yes, you're right, it's then just using …
I think I know how to make this more formal and more general. I'll write it down and send it to you when it's ready.
After working on it more carefully (see attachment), it turned out that everything can be justified except the projection thing. You were right about it. Do you see any mistakes in my description?
That's a valid concern; right now I'm only considering tangent vectors for simplicity. We need to think about where and how cotangent vectors are used.
Thanks for the details; it's always good to write down precisely what you mean/intend.
Concerning the last remark – in this notation one just has to make sure that the vectors in U_M are not too long; since the dimension is already correct, it is locally (around zero in U_M) fine.
Right, I'll correct it. I only need that for very small …
Hmm, yes, that might be easier to read; I'll try it. By the way, we don't use the orthonormality assumption anywhere, right? I ask because it might sometimes be easier to obtain a non-orthonormal basis for TpM in practice.
Right, good point. I'll change that. Thanks for the feedback!
Yes, what you are doing in the end is fine; I am just criticising that it looks like (2) and (3) are equal, while in fact (3) approximates (2) – and for small …
I would not drop that, since with Gram-Schmidt you can get an ONB easily, and then the decomposition is easier than for non-orthonormal bases.
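A minimal Gram-Schmidt sketch over tangent vectors (assuming Manifolds.jl's inner; the input Xs is whatever non-orthonormal basis of T_pM one happens to have):

```julia
using Manifolds

# Orthonormalize tangent vectors Xs at p with respect to the Riemannian metric at p.
function gram_schmidt(M, p, Xs)
    onb = similar(Xs, 0)
    for X in Xs
        Y = copy(X)
        for B in onb
            Y = Y - inner(M, p, B, Y) * B
        end
        nY = sqrt(inner(M, p, Y, Y))
        nY > 1e-12 && push!(onb, Y / nY)
    end
    return onb
end
```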
I've made some edits to reflect that (3) is an approximation and to use ONBs everywhere. I've also introduced \tilde{\mathrm{d}f_p} to denote the function that will actually be implemented. I'm wondering now where I should expand on the ONB of T_pM itself. What I actually need are Eqs. (7) and (8), since I can implement these two functions on a computer and calculate their Jacobians at (0, 0, ..., 0) to get \hat{\mathrm{d}f_p}. Here is the LaTeX source in case you wanted to do some edits:

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath}
\usepackage{amssymb}
\begin{document}
Let $M$ be a Riemannian manifold embedded in $\mathbb{R}^{\hat{m}}$ by $h_M\colon M \to \mathbb{R}^{\hat{m}}$ and $N$ be a Riemannian manifold embedded in $\mathbb{R}^{\hat{n}}$ by $h_N\colon N \to \mathbb{R}^{\hat{n}}$.
Let $p$ be a point on $M$ with a neighborhood $M_{p} \subseteq M$.
Let $f \colon M_{p} \to N$ be a $C^1$ map.
The embeddings are not necessarily isometric but they must be at least $C^1$ and invertible in a neighborhood of $p$ (or, respectively, $f(p)$).
The differential $\mathrm{d} f_p \colon T_p M \to T_{f(p)} N$ is the map such that for any curve $\gamma \colon \mathbb{R} \to M$ with $\gamma(0) = p$ and $\gamma'(0) = v$ (so that $\gamma$ represents the tangent vector $v\in T_{p}M$), we have
\begin{equation}
\label{eq:differential}
\mathrm{d}f_p(\gamma'(0)) = (f \circ \gamma)'(0).
\end{equation}
For example $\gamma(t) = \exp_p(tv)$. Thus
\begin{equation}
\label{eq:differential-2}
\mathrm{d}f_p(v) = \left.\frac{\mathrm{d}}{\mathrm{d} t}f(\exp_p(tv))\right|_{t=0}.
\end{equation}
This can also be approximated
\begin{equation}
\label{eq:differential-3}
\mathrm{d}f_p(v) \approx \log_{f(p)}( f(\exp_p(v)))
\end{equation}
for small vectors $v$. For sufficiently small vectors $v$ all functions are defined.
We can represent the function $f$ as a mapping between subsets of the spaces in which $M_p$ and $N$ are embedded, $\hat{f} \colon h_M(M_p) \to \mathbb{R}^{\hat{n}}$:
\begin{equation}
\hat{f}(x) = h_N(f(h_M^{-1}(x)))
\end{equation}
for any $x\in h_M(M_p)$.
Similarly, let us assume that $T_p M$ is embedded by $h_{T_p M}$ as a linear subspace $U_M$ of $\mathbb{R}^{\hat{m}}$ and $T_{f(p)} N$ is embedded by $h_{T_{f(p)} N}$ as a linear subspace $U_N$ of $\mathbb{R}^{\hat{n}}$.
Using these, we can represent the differential $\mathrm{d}f_p$ in the embedding by $\hat{\mathrm{d}f_p} \colon U_M \to U_N$:
\begin{equation}
\hat{\mathrm{d}f_p}(v) = h_{T_{f(p)} N}(\mathrm{d}f_p (h_{T_p M}^{-1}(v)))
\end{equation}
for any $v \in U_M$.
In this setting $\hat{\mathrm{d}f_p}$ is just a linear transformation between two vector subspaces. It is thus completely determined by, for example, its values on a basis of $U_M$.
Substituting everything we get
\begin{equation}
\label{eq:lots-of-circs}
\hat{\mathrm{d}f_p}(v) \approx \tilde{\mathrm{d}f_p}(v) = (h_{T_{f(p)} N}\circ \log_{f(p)}\circ h_N^{-1} \circ \hat{f}\circ h_M \circ \exp_p \circ h_{T_p M}^{-1})(v)
\end{equation}
which might look useless until we notice that we can calculate values of $h_{T_{f(p)} N}\circ \log_{f(p)}\circ h_N^{-1}$, $\hat{f}$ and $h_M \circ \exp_p \circ h_{T_p M}^{-1}$ easily in our computer programs.
One way forward now is to plug different vectors $v$ into Eq.~\eqref{eq:lots-of-circs} and see what is returned. We could, however, take an orthonormal basis $v_1, v_2, \dots, v_m$ of $U_M$, define
\begin{equation}
g(t_1, t_2, \dots, t_m) = \tilde{\mathrm{d}f_p}\left(\sum_{i=1}^m t_i v_i\right)
\end{equation}
and calculate the Jacobian of $g$ at zero using automatic differentiation to get an easy method of computing $\hat{\mathrm{d}f_p}(v)$.
Alternatively, we could take an orthonormal basis $v_1, v_2, \dots, v_{\hat{m}}$ of our embedding of $T_p M$ in $\mathbb{R}^{\hat{m}}$, define
\begin{equation}
g_2(t_1, t_2, \dots, t_{\hat{m}}) = \tilde{\mathrm{d}f_p}\left(\sum_{i=1}^{\hat{m}} t_i v_i\right)
\end{equation}
and calculate the Jacobian of $g_2$. As long as it is given a vector from $U_M$, the expected result will be returned, although care must be taken to avoid giving $g_2$ coefficients $t_i$ that do not correspond to a vector from $U_M$.
\end{document}
```
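A rough numerical sketch of $\tilde{\mathrm{d}f_p}$ and the ONB construction in Eq. (7) (assuming Manifolds.jl and its current basis API; the map f is made up, and a small finite step stands in for the AD step):

```julia
using Manifolds

M = Sphere(2)                        # domain manifold
N = Sphere(2)                        # codomain manifold
f(p) = [p[2], p[3], p[1]]            # example map S² → S² (coordinate permutation)
p = [0.0, 0.0, 1.0]

# d̃f_p(v) = log_{f(p)}(f(exp_p(v))), cf. Eq. (3)
dftilde(v) = log(N, f(p), f(exp(M, p, v)))

# Approximate the columns of the matrix of d̂f_p in an ONB of T_pM, cf. Eq. (7)
B = get_vectors(M, p, get_basis(M, p, DefaultOrthonormalBasis()))
ε = 1e-6
cols = [dftilde(ε * X) / ε for X in B]   # i-th entry ≈ d̂f_p(v_i)
```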
I keep thinking about this problem and I hope we can find a solution that doesn't involve ONBs of tangent spaces. Computing them would be prohibitively expensive for high-dimensional manifolds. I've looked at what pymanopt does and I don't see any justification that these projections in egrad2rgrad result in actual Riemannian gradients and not just some approximations that are usually good enough for optimization.
I've made a few changes to the text that hopefully provide a good enough justification for the projection I originally proposed:

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath}
\usepackage{amssymb}
\begin{document}
Let $M$ be a Riemannian manifold embedded in $\mathbb{R}^{\hat{m}}$ by $h_M\colon M \to \mathbb{R}^{\hat{m}}$ and $N$ be a Riemannian manifold embedded in $\mathbb{R}^{\hat{n}}$ by $h_N\colon N \to \mathbb{R}^{\hat{n}}$.
Let $p$ be a point on $M$ with a neighborhood $M_{p} \subseteq M$.
Let $f \colon M_{p} \to N$ be a $C^1$ map.
The embeddings are not necessarily isometric but they must be at least $C^1$ and invertible in a neighborhood of $p$ (or, respectively, $f(p)$).
The differential $\mathrm{d} f_p \colon T_p M \to T_{f(p)} N$ is the map such that for any curve $\gamma \colon \mathbb{R} \to M$ with $\gamma(0) = p$ and $\gamma'(0) = v$ (so that $\gamma$ represents the tangent vector $v\in T_{p}M$), we have
\begin{equation}
\label{eq:differential}
\mathrm{d}f_p(\gamma'(0)) = (f \circ \gamma)'(0).
\end{equation}
For example $\gamma(t) = \exp_p(tv)$. Thus
\begin{equation}
\label{eq:differential-2}
\mathrm{d}f_p(v) = \left.\frac{\mathrm{d}}{\mathrm{d} t}f(\exp_p(tv))\right|_{t=0}.
\end{equation}
This can also be approximated
\begin{equation}
\label{eq:differential-3}
\mathrm{d}f_p(v) \approx \log_{f(p)}( f(\exp_p(v)))
\end{equation}
for small vectors $v$. For sufficiently small vectors $v$ all functions are defined.
We can represent the function $f$ as a mapping between subsets of the spaces in which $M_p$ and $N$ are embedded, $\hat{f} \colon h_M(M_p) \to \mathbb{R}^{\hat{n}}$:
\begin{equation}
\hat{f}(x) = h_N(f(h_M^{-1}(x)))
\end{equation}
for any $x\in h_M(M_p)$.
Similarly, let us assume that $T_p M$ is embedded by $h_{T_p M}$ as a linear subspace $U_M$ of $\mathbb{R}^{\hat{m}}$ and $T_{f(p)} N$ is embedded by $h_{T_{f(p)} N}$ as a linear subspace $U_N$ of $\mathbb{R}^{\hat{n}}$.
Using these, we can represent the differential $\mathrm{d}f_p$ in the embedding by $\hat{\mathrm{d}f_p} \colon U_M \to U_N$:
\begin{equation}
\hat{\mathrm{d}f_p}(v) = h_{T_{f(p)} N}(\mathrm{d}f_p (h_{T_p M}^{-1}(v)))
\end{equation}
for any $v \in U_M$.
Let $\hat{\mathrm{d}f_{p,\Pi}}$ be an extension of $\hat{\mathrm{d}f_p}$ to the entire space $\mathbb{R}^{\hat{m}}$ given by
\begin{equation}
\hat{\mathrm{d}f_{p,\Pi}}(v) = \hat{\mathrm{d}f_{p}}(\Pi_{U_M}(v))
\end{equation}
where $v\in \mathbb{R}^{\hat{m}}$ and $\Pi_{U_M}\colon \mathbb{R}^{\hat{m}} \to U_M$ is the orthogonal projection onto the subspace $U_M$.
In this setting $\hat{\mathrm{d}f_{p,\Pi}}$ is just a linear transformation between two vector spaces. It is thus completely determined by, for example, its values on a basis of $\mathbb{R}^{\hat{m}}$.
Substituting everything we get
\begin{equation}
\label{eq:lots-of-circs}
\hat{\mathrm{d}f_{p,\Pi}}(v) \approx \tilde{\mathrm{d}f_p}(v) = (h_{T_{f(p)} N}\circ \log_{f(p)}\circ h_N^{-1} \circ \hat{f}\circ h_M \circ \exp_p \circ h_{T_p M}^{-1} \circ \Pi_{U_M})(v)
\end{equation}
which might look useless until we notice that we can calculate values of $h_{T_{f(p)} N}\circ \log_{f(p)}\circ h_N^{-1}$, $\hat{f}$, $h_M \circ \exp_p \circ h_{T_p M}^{-1}$ and $\Pi_{U_M}$ easily in our computer programs.
One way forward now is to plug different vectors $v$ into Eq.~\eqref{eq:lots-of-circs} and see what is returned. We could, however, take an orthonormal basis $v_1, v_2, \dots, v_{\hat{m}}$ of $\mathbb{R}^{\hat{m}}$, define
\begin{equation}
g(t_1, t_2, \dots, t_{\hat{m}}) = \tilde{\mathrm{d}f_p}\left(\sum_{i=1}^{\hat{m}} t_i v_i\right)
\end{equation}
and calculate the Jacobian of $g$ at zero using automatic differentiation to get an easy method of computing $\hat{\mathrm{d}f_{p,\Pi}}(v)$, which is equal to $\hat{\mathrm{d}f_p}(v)$ on the subspace $U_M$.
\end{document}
```

Maybe it's not perfect and there will be some numerical issues, but it looks to me more solid than what pymanopt does.
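And a matching sketch for the projected extension $\hat{\mathrm{d}f_{p,\Pi}}$ over the standard basis of $\mathbb{R}^{\hat{m}}$ (same caveats: Manifolds.jl assumed, example map made up, finite step in place of AD; project plays the role of $\Pi_{U_M}$):

```julia
using Manifolds, LinearAlgebra

M = Sphere(2)
N = Sphere(2)
f(p) = [p[2], p[3], p[1]]            # example map S² → S²
p = [0.0, 0.0, 1.0]

# d̃f_p with the orthogonal projection Π_{U_M} applied first, cf. Eq. (7)
dftilde_Pi(v) = log(N, f(p), f(exp(M, p, project(M, p, v))))

# Approximate matrix of d̂f_{p,Π} on the standard basis e₁, e₂, e₃ of ℝ³, cf. Eq. (8)
ε = 1e-6
J = reduce(hcat, [dftilde_Pi(ε * e) / ε for e in eachcol(Matrix{Float64}(I, 3, 3))])
```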
@kellertuer Would this paper be helpful for designing the interface for Riemannian Hessians? Link: https://arxiv.org/abs/2009.10159
Looks interesting, though it starts in the embedding as well. But the quotient part looks cool for once we have quotients (which I want to tackle when our reorg is done).
This thread, as a continuation of the discussion started in JuliaManifolds/Manifolds.jl#27, shall collect ideas and approaches to the Riemannian gradient (for example employing tools like egrad2rgrad from Manopt) and the Riemannian Hessian. Both should be accompanied by thorough theoretical documentation. I am not completely sure how far this can be generalized to get gradients of arbitrary functions, but let's see how far we get in this discussion.