Symmetric/Hermitian matrix function rules #193
Changes from 76 commits
What if `Y` is `Diagonal`? Do we need to worry about that case?
The matrix functions, when called on a `LinearAlgebra.RealHermSymComplexHerm` that wraps a `StridedMatrix`, will always return `Union{Symmetric,Hermitian,Matrix}`. If someone wraps a `Diagonal`, all of the matrix functions error at `eigen!`. The only way one could get a `Diagonal` would be to implement their own diagonal matrix type, so I think we're safe here.
because something something linear operator?
No, a different reason. I don't have a name for the property, but it is a property of any function that can be written as a converging power series with real coefficients. I haven't posted the proof anywhere or seen it before, but I'm sure it could be argued from some generic property. For Hermitian matrices, it can be shown by applying the usual inner product trick in the ChainRules docs to derive the pullback.
Perhaps @antoine-levitt knows of this property.
sorry, what property are you talking about?
For a matrix function `f` that can be written in terms of a converging power series with real coefficients, its pullback `f^*` is related to its pushforward `f_*` by an adjoint. Specifically, if `Y = f(A)`, `ΔA` is a tangent of `A`, and `ΔY` is a cotangent of `Y` (adopting ChainRules'/Zygote's conventions for how a cotangent is represented), then `(f^*)_{Y} (ΔY) = (f_*)_{A'} (ΔY)`. The property means we can write the pullback for any matrix function in terms of its `frule`.
I got there through a much more complicated route (writing the power series as a recurrence and working out the corresponding pushforwards and pullbacks as recurrences yields this relation), but I'll check to see if we can get there from that property as well. That would be nice, haha.
hm no, it looks more complicated than that: it's df(A)' = df(A') or something like that. I'm not sure it follows from f(A)' = f(A') (but maybe it does, I haven't checked carefully). I use this trick in the Hermitian case because it means that the differential is self-adjoint on the space of Hermitian matrices equipped with the Frobenius metric, but I didn't know about it in the non-Hermitian case; it's cute.
Seems like it would be worth me writing this up somewhere.
OK, so let's do the proof for A^n, n ≥ 1. Then we can pass to arbitrary functions by linearity.
df(A)⋅dA = ∑_{k=0}^{n-1} A^k dA A^(n-1-k), so let's look at the adjoint of the linear operator L(A) : dA -> A^k1 dA A^k2, and the result follows again by linearity.
<L(A) dA, dB> = tr((L(A) dA)' dB) = tr(A^k2' dA' A^k1' dB) = tr(dA' A^k1' dB A^k2') so adj(L(A)) : dB -> A^k1' dB A^k2' so adj(L(A)) = L(A')
That's probably close to the derivation you had?
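As a quick numerical sanity check of the derivation above, here is a minimal pure-Python sketch. It is not code from this PR, and the helper names (`power_pushforward`, `inner`, and so on) are hypothetical. It builds the pushforward of f(A) = A^n from the sum formula and checks that `<df(A)⋅dA, dB> = <dA, df(A')⋅dB>` under the Frobenius inner product `tr(X' Y)`, using random complex matrices.

```python
import random

def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def adjoint(X):
    """Conjugate transpose, written A' in the thread."""
    return [[X[j][i].conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]

def identity(size):
    return [[complex(i == j) for j in range(size)] for i in range(size)]

def matpow(A, k):
    """A^k for integer k >= 0 by repeated multiplication."""
    P = identity(len(A))
    for _ in range(k):
        P = matmul(P, A)
    return P

def inner(X, Y):
    """Frobenius inner product <X, Y> = tr(X' Y)."""
    return sum(x.conjugate() * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def power_pushforward(A, dA, n):
    """df(A)·dA for f(A) = A^n: sum over k of A^k dA A^(n-1-k)."""
    total = [[0j] * len(A) for _ in range(len(A))]
    for k in range(n):
        term = matmul(matmul(matpow(A, k), dA), matpow(A, n - 1 - k))
        total = add(total, term)
    return total

def random_matrix(size, rng):
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
             for _ in range(size)] for _ in range(size)]

rng = random.Random(0)
A, dA, dB = (random_matrix(3, rng) for _ in range(3))
n = 4

# Adjoint relation: <L(A) dA, dB> should equal <dA, L(A') dB>.
lhs = inner(power_pushforward(A, dA, n), dB)
rhs = inner(dA, power_pushforward(adjoint(A), dB, n))
print("difference:", abs(lhs - rhs))
```

The printed difference should be at the level of floating-point roundoff. By linearity the same check carries over to any polynomial with real coefficients.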
Okay yes, but yours is much simpler and more concise than mine. Nice!
oh that's more clever than what I do and gets you eps^(2/3) (I only get eps^(1/2)), nice! Of course this is all assuming that all quantities are order 1 (or else you need to be careful about relative errors rather than absolute ones), but it's fine.
That's right, yeah. TBH, error analysis is not my thing. I plan to come back and try to improve this in the future. For the moment, this seems to be fine. I haven't been able to construct a random almost-degenerate matrix for which the pushforward/pullback disagrees with finite differences so far.
well it'll agree to eps^(2/3) if what you wrote is correct. I can check the math if you want
That'd be great if you have the time!
Yup, checks out. The approximation (f(l1) - f(l2))/(l1-l2) ~= (f'(l1) + f'(l2))/2 is accurate to order dl^2. The roundoff error when computing (f(l1) - f(l2))/(l1-l2) is order eps/dl. So it's advantageous to switch to the approximation when eps/dl ~= dl^2 => dl = cbrt(eps). The worst case accuracy is indeed eps^(2/3).
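The switch described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation in this PR: the function name `divided_difference` and its `tol` parameter are hypothetical, and the threshold is absolute, so it assumes all quantities are of order 1, as noted earlier in the thread.

```python
import math
import sys

def divided_difference(f, df, l1, l2, tol=None):
    """Approximate (f(l1) - f(l2)) / (l1 - l2) stably.

    When |l1 - l2| is below tol, the direct quotient loses accuracy to
    roundoff (error of order eps/dl), so the mean of the derivatives is
    used instead (error of order dl^2). The default tol = cbrt(eps)
    balances the two error terms.
    """
    if tol is None:
        tol = sys.float_info.epsilon ** (1 / 3)  # assumes order-1 quantities
    dl = l1 - l2
    if abs(dl) < tol:
        return (df(l1) + df(l2)) / 2
    return (f(l1) - f(l2)) / dl

# Example with f = exp (whose derivative is also exp) and nearly equal arguments.
l2 = 1.0
l1 = l2 + 1e-8
approx = divided_difference(math.exp, math.exp, l1, l2)
# Reference value computed with expm1, which avoids the cancellation.
reference = math.exp(l2) * math.expm1(l1 - l2) / (l1 - l2)
print("absolute error:", abs(approx - reference))
```

For exactly equal arguments the fallback returns `df(l1)`, which is the limit of the divided difference.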
Awesome! Thanks for checking that!
I really want a package that just knows about this kind of thing: `SpecialFunctionProperties.jl`