Implement Q-based qrjacobimatrix() in O(n) #138
Conversation
@dlfivefifty Not sure what's going on here; I think the failing tests have nothing to do with my changes. Did something recently change further upstream in the dependencies?
Since we know the true result in this case, how does the 2-norm stability of each approach compare? Some weight modifications preserve even-odd symmetries, like sqrtw(x) = 1-x^2. How does each approach fare for symmetry preservation?
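For instance, a check along these lines could quantify symmetry loss (a sketch only; it assumes the method accepts Legendre on -1..1, and uses the fact that an even weight on a symmetric interval gives a recurrence with identically zero diagonal):

julia> using LinearAlgebra
julia> P = Normalized(Legendre());              # weight even about 0 on -1..1 (assumed supported here)
julia> sqrtw(x) = 1 - x^2;                      # even modification, so even-odd symmetry should survive
julia> Jq = qr_jacobimatrix(sqrtw, P);
julia> norm(diag(Jq[1:10_000, 1:10_000]), Inf)  # any nonzero diagonal entry measures symmetry loss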
Let's see...

julia> Jclass = jacobimatrix(Normalized(jacobi(2,0,0..1)))
ℵ₀×ℵ₀ LazyBandedMatrices.SymTridiagonal{Float64, ApplyArray{Float64, 1, typeof(vcat), Tuple{Float64, BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(-), Tuple{BroadcastVector{Float64, typeof(/), Tuple{Float64, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Fill{Float64, 1, Tuple{InfiniteArrays.OneToInf{Int64}}}}}, Float64}}}}, BroadcastVector{Float64, typeof(sqrt), Tuple{BroadcastVector{Float64, typeof(*), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Float64}}, ApplyArray{Float64, 1, typeof(vcat), Tuple{Float64, BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Int64, Int64}, InfiniteArrays.InfStepRange{Float64, Float64}}}, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Float64}}}}}}}}} with indices OneToInf()×OneToInf():
0.25 0.193649 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ …
0.193649 0.416667 0.225374 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 0.225374 0.458333 0.236228 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 0.236228 0.475 0.241209 ⋅ ⋅ ⋅
⋅ ⋅ ⋅ 0.241209 0.483333 0.243904 ⋅ ⋅
⋅ ⋅ ⋅ ⋅ 0.243904 0.488095 0.245525 ⋅ …
⋅ ⋅ ⋅ ⋅ ⋅ 0.245525 0.491071 0.246576
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.246576 0.493056
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.247296
⋮ ⋮ ⋱
julia> N = 100_000
100000
julia> norm(Jchol[1:N,1:N]-Jclass[1:N,1:N],2)
4.847754503646846e-7
julia> norm(JqrQ[1:N,1:N]-Jclass[1:N,1:N],2)
7.385957369779282e-14
julia> norm(JqrR[1:N,1:N]-Jclass[1:N,1:N],2)
1.5291452704376217e-13

Maybe ever so slightly better to do Q than R in terms of stability? But given the roughly equivalent cost (a slightly larger constant in the Q method, I think due to constructing the Givens/Householder matrices) I don't see a reason not to use this. Keep in mind this comparison is unfair to Cholesky, since here it goes directly rather than step by step, where it performs a lot closer to the QR method. Here is another example (again doing the not-ideal thing of going directly instead of step by step) which shows similar behavior:

julia> wf(x) = x^2*(1-x)^2;
julia> sqrtwf(x) = x*(1-x);
julia> Jchol = cholesky_jacobimatrix(wf, P);
julia> JqrQ = qr_jacobimatrix(sqrtwf, P);
julia> JqrR = qr_jacobimatrix(sqrtwf, P, :R);
julia> Jclass = jacobimatrix(Normalized(jacobi(2,2,0..1)))
ℵ₀×ℵ₀ LazyBandedMatrices.SymTridiagonal{Float64, ApplyArray{Float64, 1, typeof(vcat), Tuple{Float64, BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(-), Tuple{BroadcastVector{Float64, typeof(/), Tuple{Float64, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Fill{Float64, 1, Tuple{InfiniteArrays.OneToInf{Int64}}}}}, Float64}}}}, BroadcastVector{Float64, typeof(sqrt), Tuple{BroadcastVector{Float64, typeof(*), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Float64}}, ApplyArray{Float64, 1, typeof(vcat), Tuple{Float64, BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Int64, Int64}, InfiniteArrays.InfStepRange{Float64, Float64}}}, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Float64}}}}}}}}} with indices OneToInf()×OneToInf():
0.5 0.188982 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ …
0.188982 0.5 0.218218 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 0.218218 0.5 0.230283 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 0.230283 0.5 0.236525 ⋅ ⋅ ⋅
⋅ ⋅ ⋅ 0.236525 0.5 0.240192 ⋅ ⋅
⋅ ⋅ ⋅ ⋅ 0.240192 0.5 0.242536 ⋅ …
⋅ ⋅ ⋅ ⋅ ⋅ 0.242536 0.5 0.244126
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.244126 0.5
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.245256
⋮ ⋮ ⋱
julia> N = 100_000
100000
julia> norm(Jchol[1:N,1:N]-Jclass[1:N,1:N],2)
8.88926155041543e-5
julia> norm(JqrQ[1:N,1:N]-Jclass[1:N,1:N],2)
7.653723227459066e-11
julia> norm(JqrR[1:N,1:N]-Jclass[1:N,1:N],2)
2.2962555778506003e-10

What would you like to see as a test of symmetry? I think currently Sheehan's
I can't decide whether to ask why the errors are so large in the second example or so small in the first 😅 Since if round-off grows like N, we expect errors on the order of:

julia> eps() * 100_000
2.220446049250313e-11
I don't know. 😅 But actually I think the first is more representative of the general behavior, not the second. In the second I'm doing the full x^2*(1-x)^2 modification in a single step rather than one weight factor at a time.
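A step-by-step version of that second example might look roughly like this (a sketch only; it assumes the intermediate step is taken through the classical basis, and that P in the second example was Legendre on 0..1):

julia> P0 = Normalized(jacobi(0,0,0..1));       # Legendre on 0..1, the assumed starting basis
julia> J1 = qr_jacobimatrix(x -> x, P0);        # raise by x^2: jacobi(0,0) -> jacobi(0,2)
julia> P1 = Normalized(jacobi(0,2,0..1));       # classical intermediate basis
julia> J2 = qr_jacobimatrix(x -> 1 - x, P1);    # raise by (1-x)^2: jacobi(0,2) -> jacobi(2,2)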
Just to back up my claim with code, here is an example of raising by the other weight term in one step, at a higher point in the Jacobi family. If we do a single step we see the errors observed in the first example.

julia> P = Normalized(jacobi(3,3,0..1));
julia> x = axes(P,1);
julia> J = jacobimatrix(P);
julia> sqrtwf(x) = x;
julia> JqrQ = qr_jacobimatrix(sqrtwf, P);
julia> JqrR = qr_jacobimatrix(sqrtwf, P, :R);
julia> N = 100_000
100000
julia> Jclass = jacobimatrix(Normalized(jacobi(3,5,0..1)))
ℵ₀×ℵ₀ LazyBandedMatrices.SymTridiagonal{Float64, ApplyArray{Float64, 1, typeof(vcat), Tuple{Float64, BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(-), Tuple{BroadcastVector{Float64, typeof(/), Tuple{Float64, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Fill{Float64, 1, Tuple{InfiniteArrays.OneToInf{Int64}}}}}, Float64}}}}, BroadcastVector{Float64, typeof(sqrt), Tuple{BroadcastVector{Float64, typeof(*), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Float64}}, ApplyArray{Float64, 1, typeof(vcat), Tuple{Float64, BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Int64, Int64}, InfiniteArrays.InfStepRange{Float64, Float64}}}, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Float64}}}}}}}}} with indices OneToInf()×OneToInf():
0.6 0.14771 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ …
0.14771 0.566667 0.184374 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 0.184374 0.547619 0.203579 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 0.203579 0.535714 0.215229 ⋅ ⋅ ⋅
⋅ ⋅ ⋅ 0.215229 0.527778 0.222909 ⋅ ⋅
⋅ ⋅ ⋅ ⋅ 0.222909 0.522222 0.228266 ⋅ …
⋅ ⋅ ⋅ ⋅ ⋅ 0.228266 0.518182 0.232161
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.232161 0.515152
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.235087
⋮ ⋮ ⋱
julia> norm(JqrQ[1:N,1:N]-Jclass[1:N,1:N],2)
6.784308695781622e-14
julia> norm(JqrR[1:N,1:N]-Jclass[1:N,1:N],2)
1.619843167289299e-13

Same behavior in terms of norms.
Ok, actually the constant between the Q and R methods is more substantial than the simple timer was suggesting, because of an implementation detail in how cached vectors are expanded. Here are two CPU timings showing the actual linear complexity of both methods with the current implementation. I prematurely stopped optimizing the Q method because I thought it was matching R, but with the difference being this big (roughly 5x the CPU time) it is worth taking another shot at bringing the cost down. I have a few implementation ideas that should do it based on profiling, so it should end up closer than it is now.
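(For reference, one way this kind of timing could be reproduced; a sketch only, not the exact benchmark behind the numbers above, and it assumes sqrtwf and P from the earlier examples are in scope.)

# Time the adaptive expansion of the bands up to N for both variants.
# A fresh operator is built for every run so previously cached entries are not reused.
function expansion_time(sqrtwf, P, N, method)
    J = qr_jacobimatrix(sqrtwf, P, method)
    return @elapsed J[1:N, 1:N]
end

for N in (25_000, 50_000, 100_000)
    tQ = expansion_time(sqrtwf, P, N, :Q)
    tR = expansion_time(sqrtwf, P, N, :R)
    println("N = $N:  Q = $(round(tQ, digits=3))s,  R = $(round(tR, digits=3))s")
end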
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #138 +/- ##
==========================================
+ Coverage 89.69% 89.91% +0.22%
==========================================
Files 17 17
Lines 1756 1795 +39
==========================================
+ Hits 1575 1614 +39
Misses 181 181
☔ View full report in Codecov by Sentry.
Let's say it's fast enough! There's probably a hard theoretical limit where the Q method is some specific constant factor slower than the R method. Are you computing both bands of the modified Jacobi matrix independently? It looks like some code specializes on :dv and :ev.
Yes, but the two bands are currently computed independently from their own cached data. The way around this would be to make a new struct that generates both bands from a single cached object. This would be easy to do if we want to tease out more efficiency, but it would "fit" less well into existing packages which expect the bands as plain vectors (e.g. for SymTridiagonal).
I should add: there are certain advantages to the current approach. Resizing a vector is better than resizing a matrix, so if we resize often at small dimensions (arguably the usual use-case) this will be faster. But to reach a high N in one go it duplicates work between the two bands.
Maybe the key then is for both bands to store the same cached QR factorization of sqrtw? This could be done, I think, if the cached qr(sqrtw) were computed in qr_jacobimatrix and then passed to both band constructors.
That would work, yes. It wouldn't affect the factor of 2 I spoke of, since that's all the Householder work we have to redo, but it should lead to a speedup nonetheless. Should be a quick change.
As a matter of fact, the Cholesky version already does this. So that's an oversight in the QR variant.
I am somewhat surprised that this doesn't appear to make a measurable performance impact, but it still makes sense to do it this way regardless (at least it should save memory). I will check tomorrow whether I missed something. But I checked by selectively expanding, and they are definitely sharing the same QR now, in the same way the Cholesky approach has been doing. I guess it wasn't a significant contribution in the high-N benchmarks since I pre-fill before the entry-expansion loop.
There are extreme amounts of allocation happening, and you should never form a dense Householder matrix; instead, only apply it to a vector/matrix. So I'd expect the timings could be significantly improved (but not urgent).
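(For reference, the standard trick is to apply the reflector H = I - 2*v*v'/(v'*v) directly to a vector instead of ever materializing H; a generic sketch, not the code in this PR.)

using LinearAlgebra

# Computes H*x in place with H = I - 2*v*v'/(v'*v), without forming the dense matrix H.
function apply_householder!(x::AbstractVector, v::AbstractVector)
    x .-= (2 * dot(v, x) / dot(v, v)) .* v
    return x
end

apply_householder!(randn(5), randn(5))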
I'll take another stab at reducing the cost in a bit.
@dlfivefifty I am happy for this to be merged if all tests pass.
Ok, I reworked it such that both bands are generated together while retaining the nice interaction with SymTridiagonal. It is a somewhat strange object, so there may still be some things to tidy further, but it all works at least. This hasn't affected the gap between R and Q, which is still about as narrow as shown above, but it has given us an approximately 1.5x speed-up (it's less than 2x because we were already resizing a joint QR / Cholesky, so the workload was already partially shared). Allocations have also been more than halved since the first working implementation.
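(For intuition, the shape of that design in a finite toy, not the adaptive machinery itself: both bands live in one array and SymTridiagonal only ever sees two vector views.)

julia> using LinearAlgebra
julia> bands = rand(2, 10);            # row 1: diagonal, row 2: off-diagonal (toy, fixed size)
julia> dv = view(bands, 1, 1:10);      # diagonal as a view into the shared storage
julia> ev = view(bands, 2, 1:9);       # off-diagonal as a view into the shared storage
julia> J = SymTridiagonal(dv, ev);     # SymTridiagonal still just receives two vectors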
@dlfivefifty Some mind-bending indexing later, this now just uses
Not sure why 1.7 specifically fails. I guess I have to download that version and check what's happening there.
Ok, so @dlfivefifty: apparently this behavior changed in 1.7. The 1.7 version in general seems to be missing functionality compared to 1.9, so I am considering just copying the 1.9 version explicitly into this code. Alternatively we could make a special case for 1.7, but I'm not a big fan of that. Your preference?
Why not just make v1.9 required? That would mean we can start using extensions (it doesn't look like anything currently here can be moved to an extension but why not?)
That's alright with me, I just didn't want to make a package-level decision like that for you. I'll change it to do that. 👍
Ok, I changed it to require 1.9, to only test on 1.9, and bumped the version number, since I guess this requirement is a significant change. Other than that, I am once again happy with the state of the PR. Ready for review or merge at your discretion, assuming the tests pass.
I think it's fine for now. There are still a few obvious allocations but it's ok.
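(A quick way to eyeball those allocations, for anyone following along; illustrative only, assuming the setup from the examples above.)

julia> J = qr_jacobimatrix(sqrtwf, P);
julia> @allocated J[1:10_000, 1:10_000]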
@ioannisPApapadopoulos @dlfivefifty
This PR adds an optional method argument to qr_jacobimatrix() which allows either :Q or :R as input and will do the QR raising using the chosen matrix, both in O(n). The default if no method is supplied is to use :Q.

The Q method more or less matches the R method in efficiency (some efficiency is lost to make sure the resulting matrix fits into an adaptive SymTridiagonal, but this is also true for the R approach); there is more optimization that could be done in principle, but I am not sure it's worth it considering we can do this:
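Usage then looks like the examples in the thread above, e.g. (the (1,1) basis is chosen purely for illustration):

julia> P = Normalized(jacobi(1,1,0..1));      # any supported basis
julia> sqrtw(x) = x;                          # square root of the weight modification
julia> JQ = qr_jacobimatrix(sqrtw, P)         # default, equivalent to passing :Q
julia> JR = qr_jacobimatrix(sqrtw, P, :R)     # use the R-based variant instead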
Once this is done I will change my PR in SemiclassicalOPs to use this approach for the hierarchy.