Implement Q-based qrjacobimatrix() in O(n) #138
Conversation
@dlfivefifty Not sure what's going on here; I think the failing tests have nothing to do with my changes. Did something recently change further upstream in the dependencies?
Since we know the true result in this case, how does the 2-norm stability of each approach compare? Some weight modifications preserve even-odd symmetries, like sqrtw(x) = 1-x^2. How does each approach fare for symmetry preservation?
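For instance, a check along these lines could quantify symmetry loss (a sketch only; it assumes the method accepts Legendre on -1..1, and uses the fact that an even weight on a symmetric interval gives a recurrence with identically zero diagonal):

julia> using LinearAlgebra
julia> P = Normalized(Legendre());              # weight even about 0 on -1..1 (assumed supported here)
julia> sqrtw(x) = 1 - x^2;                      # even modification, so even-odd symmetry should survive
julia> Jq = qr_jacobimatrix(sqrtw, P);
julia> norm(diag(Jq[1:10_000, 1:10_000]), Inf)  # any nonzero diagonal entry measures symmetry loss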
Let's see...

julia> Jclass = jacobimatrix(Normalized(jacobi(2,0,0..1)))
ℵ₀×ℵ₀ LazyBandedMatrices.SymTridiagonal{Float64, ApplyArray{Float64, 1, typeof(vcat), Tuple{Float64, BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(-), Tuple{BroadcastVector{Float64, typeof(/), Tuple{Float64, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Fill{Float64, 1, Tuple{InfiniteArrays.OneToInf{Int64}}}}}, Float64}}}}, BroadcastVector{Float64, typeof(sqrt), Tuple{BroadcastVector{Float64, typeof(*), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Float64}}, ApplyArray{Float64, 1, typeof(vcat), Tuple{Float64, BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Int64, Int64}, InfiniteArrays.InfStepRange{Float64, Float64}}}, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Float64}}}}}}}}} with indices OneToInf()×OneToInf():
0.25 0.193649 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ …
0.193649 0.416667 0.225374 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 0.225374 0.458333 0.236228 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 0.236228 0.475 0.241209 ⋅ ⋅ ⋅
⋅ ⋅ ⋅ 0.241209 0.483333 0.243904 ⋅ ⋅
⋅ ⋅ ⋅ ⋅ 0.243904 0.488095 0.245525 ⋅ …
⋅ ⋅ ⋅ ⋅ ⋅ 0.245525 0.491071 0.246576
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.246576 0.493056
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.247296
⋮ ⋮ ⋱
julia> N = 100_000
100000
julia> norm(Jchol[1:N,1:N]-Jclass[1:N,1:N],2)
4.847754503646846e-7
julia> norm(JqrQ[1:N,1:N]-Jclass[1:N,1:N],2)
7.385957369779282e-14
julia> norm(JqrR[1:N,1:N]-Jclass[1:N,1:N],2)
1.5291452704376217e-13

Maybe ever so slightly better to do Q than R in terms of stability? But given the roughly equivalent cost (a slightly larger constant in the Q method, I think due to constructing the Givens/Householder matrices) I don't see a reason not to use this. Keep in mind this comparison is unfair to Cholesky, since here it goes directly rather than step by step, where it performs a lot closer to the QR method. Here is another example (again doing the not-ideal thing of going directly instead of step by step) which shows similar behavior:

julia> wf(x) = x^2*(1-x)^2;
julia> sqrtwf(x) = x*(1-x);
julia> Jchol = cholesky_jacobimatrix(wf, P);
julia> JqrQ = qr_jacobimatrix(sqrtwf, P);
julia> JqrR = qr_jacobimatrix(sqrtwf, P, :R);
julia> Jclass = jacobimatrix(Normalized(jacobi(2,2,0..1)))
ℵ₀×ℵ₀ LazyBandedMatrices.SymTridiagonal{Float64, ApplyArray{Float64, 1, typeof(vcat), Tuple{Float64, BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(-), Tuple{BroadcastVector{Float64, typeof(/), Tuple{Float64, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Fill{Float64, 1, Tuple{InfiniteArrays.OneToInf{Int64}}}}}, Float64}}}}, BroadcastVector{Float64, typeof(sqrt), Tuple{BroadcastVector{Float64, typeof(*), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Float64}}, ApplyArray{Float64, 1, typeof(vcat), Tuple{Float64, BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Int64, Int64}, InfiniteArrays.InfStepRange{Float64, Float64}}}, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Float64}}}}}}}}} with indices OneToInf()×OneToInf():
0.5 0.188982 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ …
0.188982 0.5 0.218218 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 0.218218 0.5 0.230283 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 0.230283 0.5 0.236525 ⋅ ⋅ ⋅
⋅ ⋅ ⋅ 0.236525 0.5 0.240192 ⋅ ⋅
⋅ ⋅ ⋅ ⋅ 0.240192 0.5 0.242536 ⋅ …
⋅ ⋅ ⋅ ⋅ ⋅ 0.242536 0.5 0.244126
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.244126 0.5
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.245256
⋮ ⋮ ⋱
julia> N = 100_000
100000
julia> norm(Jchol[1:N,1:N]-Jclass[1:N,1:N],2)
8.88926155041543e-5
julia> norm(JqrQ[1:N,1:N]-Jclass[1:N,1:N],2)
7.653723227459066e-11
julia> norm(JqrR[1:N,1:N]-Jclass[1:N,1:N],2)
2.2962555778506003e-10

What would you like to see as a test of symmetry? I think currently Sheehan's
I can't decide whether to ask why the errors are so large in the second example or so small in the first 😅 Since if round-off grows like N, we expect errors on the order of:

julia> eps() * 100_000
2.220446049250313e-11
I don't know. 😅 But actually I think the first is more representative of the general behavior, not the second. In the second I'm doing the full x^2*(1-x)^2 modification in a single step rather than one weight factor at a time.
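A step-by-step version of that second example might look roughly like this (a sketch only; it assumes the intermediate step is taken through the classical basis, and that P in the second example was Legendre on 0..1):

julia> P0 = Normalized(jacobi(0,0,0..1));       # Legendre on 0..1, the assumed starting basis
julia> J1 = qr_jacobimatrix(x -> x, P0);        # raise by x^2: jacobi(0,0) -> jacobi(0,2)
julia> P1 = Normalized(jacobi(0,2,0..1));       # classical intermediate basis
julia> J2 = qr_jacobimatrix(x -> 1 - x, P1);    # raise by (1-x)^2: jacobi(0,2) -> jacobi(2,2)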
Just to back up my claim with code, here is an example of raising by the other weight term in one step, at a higher point in the Jacobi family. If we do a single step we see the errors observed in the first example.

julia> P = Normalized(jacobi(3,3,0..1));
julia> x = axes(P,1);
julia> J = jacobimatrix(P);
julia> sqrtwf(x) = x;
julia> JqrQ = qr_jacobimatrix(sqrtwf, P);
julia> JqrR = qr_jacobimatrix(sqrtwf, P, :R);
julia> N = 100_000
100000
julia> Jclass = jacobimatrix(Normalized(jacobi(3,5,0..1)))
ℵ₀×ℵ₀ LazyBandedMatrices.SymTridiagonal{Float64, ApplyArray{Float64, 1, typeof(vcat), Tuple{Float64, BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(-), Tuple{BroadcastVector{Float64, typeof(/), Tuple{Float64, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Fill{Float64, 1, Tuple{InfiniteArrays.OneToInf{Int64}}}}}, Float64}}}}, BroadcastVector{Float64, typeof(sqrt), Tuple{BroadcastVector{Float64, typeof(*), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Float64}}, ApplyArray{Float64, 1, typeof(vcat), Tuple{Float64, BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(/), Tuple{BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Int64, Int64}, InfiniteArrays.InfStepRange{Float64, Float64}}}, BroadcastVector{Float64, typeof(*), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}}}, Float64}}}}}}}}} with indices OneToInf()×OneToInf():
0.6 0.14771 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ …
0.14771 0.566667 0.184374 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 0.184374 0.547619 0.203579 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 0.203579 0.535714 0.215229 ⋅ ⋅ ⋅
⋅ ⋅ ⋅ 0.215229 0.527778 0.222909 ⋅ ⋅
⋅ ⋅ ⋅ ⋅ 0.222909 0.522222 0.228266 ⋅ …
⋅ ⋅ ⋅ ⋅ ⋅ 0.228266 0.518182 0.232161
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.232161 0.515152
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.235087
⋮ ⋮ ⋱
julia> norm(JqrQ[1:N,1:N]-Jclass[1:N,1:N],2)
6.784308695781622e-14
julia> norm(JqrR[1:N,1:N]-Jclass[1:N,1:N],2)
1.619843167289299e-13

Same behavior in terms of norms.
Ok, actually the constant between the Q and R methods is more substantial than the simple timer was suggesting, because of an implementation detail in how cached vectors are expanded. Here are two CPU timings showing the actual linear complexity of both methods with the current implementation. I prematurely stopped optimizing the Q method because I thought it was matching R, but with the difference being this big (roughly 5x the CPU time) it is worth taking another shot at bringing the cost down. I have a few implementation ideas that should do it based on profiling, so it should end up closer than it is now.
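(For reference, one way this kind of timing could be reproduced; a sketch only, not the exact benchmark behind the numbers above, and it assumes sqrtwf and P from the earlier examples are in scope.)

# Time the adaptive expansion of the bands up to N for both variants.
# A fresh operator is built for every run so previously cached entries are not reused.
function expansion_time(sqrtwf, P, N, method)
    J = qr_jacobimatrix(sqrtwf, P, method)
    return @elapsed J[1:N, 1:N]
end

for N in (25_000, 50_000, 100_000)
    tQ = expansion_time(sqrtwf, P, N, :Q)
    tR = expansion_time(sqrtwf, P, N, :R)
    println("N = $N:  Q = $(round(tQ, digits=3))s,  R = $(round(tR, digits=3))s")
end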
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #138 +/- ##
==========================================
+ Coverage 89.69% 89.91% +0.22%
==========================================
Files 17 17
Lines 1756 1795 +39
==========================================
+ Hits 1575 1614 +39
Misses 181 181
☔ View full report in Codecov by Sentry.
Let's say it's fast enough! There's probably a hard theoretical limit where the Q method is some specific constant factor slower than the R method. Are you computing both bands of the modified Jacobi matrix independently? It looks like some code specializes on :dv and :ev.
Yes, but the two bands are currently computed independently from their own cached data. The way around this would be to make a new struct that generates both bands from a single cached object. This would be easy to do if we want to tease out more efficiency, but it would "fit" less well into existing packages which expect the bands as plain vectors (e.g. for SymTridiagonal).
I should add: there are certain advantages to the current approach. Resizing a vector is better than resizing a matrix, so if we resize often at small dimensions (arguably the usual use-case) this will be faster. But to reach a high N in one go it duplicates work between the two bands.
Maybe the key then is for both bands to store the same cached QR factorization of sqrtw? This could be done, I think, if the cached qr(sqrtw) were computed in qr_jacobimatrix and then passed to both band constructors.
That would work, yes. It wouldn't affect the factor of 2 I spoke of, since that's all the Householder work we have to redo, but it should lead to a speedup nonetheless. Should be a quick change.
As a matter of fact, the Cholesky version already does this. So that's an oversight in the QR variant.
I am somewhat surprised that this doesn't appear to make a measurable performance impact, but it still makes sense to do it this way regardless (at least it should save memory). I will check tomorrow whether I missed something. But I checked by selectively expanding, and they are definitely sharing the same QR now, in the same way the Cholesky approach has been doing. I guess it wasn't a significant contribution in the high-N benchmarks since I pre-fill before the entry-expansion loop.
There are extreme amounts of allocation happening, and you should never form a dense Householder matrix; instead, only apply it to a vector/matrix. So I'd expect the timings could be significantly improved (but not urgent).
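(For reference, the standard trick is to apply the reflector H = I - 2*v*v'/(v'*v) directly to a vector instead of ever materializing H; a generic sketch, not the code in this PR.)

using LinearAlgebra

# Computes H*x in place with H = I - 2*v*v'/(v'*v), without forming the dense matrix H.
function apply_householder!(x::AbstractVector, v::AbstractVector)
    x .-= (2 * dot(v, x) / dot(v, v)) .* v
    return x
end

apply_householder!(randn(5), randn(5))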
I'll take another stab at reducing the cost in a bit.
@dlfivefifty I am happy for this to be merged if all tests pass.
Ok, I reworked it such that both bands are generated together while retaining the nice interaction with SymTridiagonal. It is a somewhat strange object, so there may still be some things to tidy further, but it all works at least. This hasn't affected the gap between R and Q, which is still about as narrow as shown above, but it has given us an approximately 1.5x speed-up (it's less than 2x because we were already resizing a joint QR / Cholesky, so the workload was already partially shared). Allocations have also been more than halved since the first working implementation.
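(For intuition, the shape of that design in a finite toy, not the adaptive machinery itself: both bands live in one array and SymTridiagonal only ever sees two vector views.)

julia> using LinearAlgebra
julia> bands = rand(2, 10);            # row 1: diagonal, row 2: off-diagonal (toy, fixed size)
julia> dv = view(bands, 1, 1:10);      # diagonal as a view into the shared storage
julia> ev = view(bands, 2, 1:9);       # off-diagonal as a view into the shared storage
julia> J = SymTridiagonal(dv, ev);     # SymTridiagonal still just receives two vectors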
@dlfivefifty Some mind-bending indexing later, this now just uses
Not sure why 1.7 specifically fails. I guess I have to download that version and check what's happening there.
Ok, so @dlfivefifty: apparently this behavior changed in 1.7. The 1.7 version in general seems to be missing functionality compared to 1.9, so I am considering just copying the 1.9 version explicitly into this code. Alternatively we could make a special case for 1.7, but I'm not a big fan of that. Your preference?
Why not just make v1.9 required? That would mean we can start using extensions (it doesn't look like anything currently here can be moved to an extension but why not?)
That's alright with me, I just didn't want to make a package-level decision like that for you. I'll change it to do that. 👍
Ok, I changed it to require 1.9, to only test on 1.9, and bumped the version number, since I guess this requirement is a significant change. Other than that, I am once again happy with the state of the PR. Ready for review or merge at your discretion, assuming the tests pass.
I think it's fine for now. There are still a few obvious allocations but it's ok.
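(A quick way to eyeball those allocations, for anyone following along; illustrative only, assuming the setup from the examples above.)

julia> J = qr_jacobimatrix(sqrtwf, P);
julia> @allocated J[1:10_000, 1:10_000]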
@ioannisPApapadopoulos @dlfivefifty
This PR adds an optional method argument to qr_jacobimatrix() which allows either :Q or :R as input and will do the QR raising using the chosen matrix, both in O(n). The default if no method is supplied is to use :Q.

The Q method more or less matches the R method in efficiency (some efficiency is lost to make sure the resulting matrix fits into an adaptive SymTridiagonal, but this is also true for the R approach); there is more optimization that could be done in principle, but I am not sure it's worth it considering we can do this:
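Usage then looks like the examples in the thread above, e.g. (the (1,1) basis is chosen purely for illustration):

julia> P = Normalized(jacobi(1,1,0..1));      # any supported basis
julia> sqrtw(x) = x;                          # square root of the weight modification
julia> JQ = qr_jacobimatrix(sqrtw, P)         # default, equivalent to passing :Q
julia> JR = qr_jacobimatrix(sqrtw, P, :R)     # use the R-based variant instead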
Once this is done I will change my PR in SemiclassicalOPs to use this approach for the hierarchy.