
stdlib: faster kronecker product between hermitian and symmetric matrices #53186

Merged (10 commits, Apr 18, 2024)
Conversation

@araujoms (Contributor) commented Feb 5, 2024:

The Kronecker product of two complex Hermitian matrices is again Hermitian, so it can be computed much faster by filling in only the upper (or lower) triangle. As @andreasnoack will surely notice, this is only true for types where conj(a*b) == conj(a)*conj(b), so I'm restricting the function to act only on real and complex numbers. In the symmetric case, however, no additional assumption is needed, so I'm letting it act on any type.
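Both claims are easy to check numerically. A hypothetical snippet (illustration only, not part of the PR):

```julia
using LinearAlgebra
using Quaternions

A = Hermitian(randn(ComplexF64, 3, 3))
B = Hermitian(randn(ComplexF64, 4, 4))
ishermitian(kron(A, B))            # true: the Kronecker product stays Hermitian

# For quaternions conj(a*b) == conj(b)*conj(a), which in general differs from
# conj(a)*conj(b), so the Hermitian fast path excludes such element types:
a, b = randn(QuaternionF64), randn(QuaternionF64)
conj(a * b) == conj(a) * conj(b)   # generally false
```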

Benchmarking showed that the code is roughly twice as fast as the vanilla Kronecker product, as expected. The fastest case was always UU, and the slowest LU. The code I used is below:

using LinearAlgebra
using BenchmarkTools
using Quaternions

randrmatrix(d, uplo = :U) = Hermitian(randn(Float64, d, d), uplo)
randcmatrix(d, uplo = :U) = Hermitian(randn(ComplexF64, d, d), uplo)
randsmatrix(d, uplo = :U) = Symmetric(randn(ComplexF64, d, d), uplo)
randqmatrix(d, uplo = :U) = Symmetric(randn(QuaternionF64, d, d), uplo)

dima = 69
dimb = 71
for randmatrix in [randrmatrix, randcmatrix, randsmatrix, randqmatrix]
    for auplo in [:U, :L]
        for buplo in [:U, :L]
            a = randmatrix(dima, auplo)
            b = randmatrix(dimb, buplo)
            c = kron(a,b)
            therm = @belapsed kron!($c, $a, $b)
            C = Matrix(c)
            A = Matrix(a)
            B = Matrix(b)
            told = @belapsed kron!($C, $A, $B)
            @show told/therm
        end
    end
end

Weirdly enough, I got the expected speedup on one of my machines, but when running the benchmark on another I got roughly the same time. I guess that's a bug in BenchmarkTools, because it's not consistent with the times I get when running the functions individually, outside the loop.

Another issue is that although I added a couple of tests, I couldn't get them to run. Perhaps someone here can tell me what's going on? I could run the tests from LinearAlgebra; it's just that editing the files made no difference to what was being run. I did get hundreds of errors from triangular.jl, but that file is untouched by my code.
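Returning to the algorithm: the half-triangle strategy described at the top can be sketched as follows. This is a simplified, hypothetical reimplementation (herm_kron is a made-up name), not the PR's actual code:

```julia
using LinearAlgebra

# Fill only the upper triangle of C = kron(A, B) and wrap the result.
# Blocks (i, j) with i < j lie strictly above the diagonal and are copied
# whole; for diagonal blocks (i == j) the upper triangle suffices.
function herm_kron(A::Hermitian{<:Union{Real,Complex}},
                   B::Hermitian{<:Union{Real,Complex}})
    m, n = size(A, 1), size(B, 1)
    C = Matrix{promote_type(eltype(A), eltype(B))}(undef, m*n, m*n)
    for j in 1:m, i in 1:j
        Aij = A[i, j]
        for l in 1:n
            klim = i < j ? n : l      # full column vs. upper triangle only
            for k in 1:klim
                C[(i-1)*n + k, (j-1)*n + l] = Aij * B[k, l]
            end
        end
    end
    return Hermitian(C, :U)
end

A = Hermitian(randn(ComplexF64, 3, 3))
B = Hermitian(randn(ComplexF64, 2, 2))
herm_kron(A, B) == kron(A, B)         # the halved work reproduces the product
```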

@barucden (Contributor) commented Feb 5, 2024:

> I could run the tests from LinearAlgebra, it's just that editing the files made no difference to what was being run.

Did you change the UUID in Project.toml (https://github.com/JuliaLang/julia/blob/master/CONTRIBUTING.md#contributing-to-the-standard-library)?

@araujoms (Contributor Author) commented Feb 5, 2024:

Yes, I did. My code runs; I've confirmed it via @edit kron(a, b). Only the tests get mysteriously ignored.

@jishnub (Contributor) commented Feb 5, 2024:

Are you running the tests from the LinearAlgebra.jl repo, given that you changed the UUID? That repo isn't being updated at present. You should build Julia with your changes and use include("stdlib/LinearAlgebra/test/symmetric.jl") to run the tests. Building Julia usually isn't necessary if you're using Revise, as you may use Revise.track(LinearAlgebra) to include your local changes, but I've encountered some issues with Revise on Julia nightly, so a fresh build might be better.

return kron!(R, A, B)
end

for (T, conj, real) in [(:Symmetric, :identity, :identity), (:(Hermitian{<:Union{Real,Complex}}), :conj, :real)]
@jishnub (Contributor) commented Feb 5, 2024:

The method should probably be restricted to Hermitian{<:Union{Real,Complex}, <:StridedMatrix} and similarly for Symmetric, and a fallback implementation of the form kron!(C.data, A.data, B.data) may be added. This will ensure that sparse arrays (or other special array types) remain performant, if they provide a kron! implementation.

(Contributor Author) replied:

Please forgive my ignorance, as I'm rather new to Julia. You are saying I should change the function declarations to

function kron(A::Hermitian{T, <:StridedMatrix}, B::Hermitian{S, <:StridedMatrix}) where {T<:Union{Real,Complex},S<:Union{Real,Complex}}
function kron(A::Symmetric{T, <:StridedMatrix}, B::Symmetric{S, <:StridedMatrix}) where {T,S}

in order to act only on dense matrices, not sparse ones. Fine, I get this. But then I should also give a fallback implementation that only acts if the sparse type doesn't provide its own kron!? How? I thought the original function declaration was already doing this, as a more specialized declaration for a specific type would take precedence.

(Contributor) replied:

I'm recommending

function kron!(C::Hermitian{TC, <:StridedMatrix}, A::Hermitian{T, <:StridedMatrix}, B::Hermitian{S, <:StridedMatrix}) where {T<:Union{Real,Complex},S<:Union{Real,Complex},TC<:Union{Real,Complex}}

as the signature of the method that is added here, and add another method

kron!(C::Hermitian{<:Union{Real,Complex}}, A::Hermitian{<:Union{Real,Complex}}, B::Hermitian{<:Union{Real,Complex}}) = Hermitian(kron!(C.data, A.data, B.data))

If the parent has an efficient kron! implementation, this would not loop over the entire array. This way, custom array types only need to specialize kron! for their own matrix types, and not for wrappers such as Hermitian{<:Union{Real,Complex}, MyArray}.

You're right that the original method already provides a more efficient implementation than the fully generic dense-array implementation that loops over the entire A and B. However, forwarding kron! to the parents allows us to access even more optimized methods that packages may provide. Ultimately, it's a tradeoff. You're right that a more specialized method kron!(::Hermitian{<:Union{Real,Complex}}, ::Hermitian{<:Union{Real,Complex}, MyArray}, ::Hermitian{<:Union{Real,Complex}, MyArray}) would take precedence, but it's better if packages don't need to define such methods for wrappers around their arrays, as that greatly increases the total number of methods that need to be defined. If we forward the operation to the parent, then the parent does not need to know about Hermitian at all.

(Contributor Author) replied:

I see, the issue is that a generic Kronecker product over sparse arrays will still be vastly more efficient for Hermitian matrices than my dense version, so we don't want to capture that. Of course, such a sparse type can also implement its own specialization for Hermitian matrices, but that's a different issue.

The best I could come up with was

for T in [:Symmetric, :(Hermitian{<:Union{Real,Complex}})]
    @eval begin
        function kron!(C::$T, A::$T, B::$T)
            wrapper = Base.typename($T).wrapper
            return wrapper(kron!(C.data, A.data, B.data))
        end
    end
end

for the fallback and

for (T, conj, real) in [(:(Symmetric{<:Any,<:StridedMatrix}), :identity, :identity), (:(Hermitian{<:Union{Real,Complex}, <:StridedMatrix}), :conj, :real)]
    @eval begin
        function kron!(C::$T, A::$T, B::$T)

for the signature of the method itself. It's rather hideous. Is there a better solution?

Also, doesn't exactly the same issue apply to dot? Is there any package actually implementing faster versions of dot and kron for sparse arrays?

@jishnub (Contributor) commented Feb 6, 2024:

I take my comment back, as kron forwarded to the parent doesn't produce the same result:

julia> H1 = Hermitian(rand(2,2));

julia> H2 = Hermitian(rand(2,2));

julia> kron(H1, H2)
4×4 Matrix{Float64}:
 0.343295  0.436362  0.106336   0.135163
 0.436362  0.717831  0.135163   0.222349
 0.106336  0.135163  0.0616077  0.0783094
 0.135163  0.222349  0.0783094  0.128822

julia> Hermitian(kron(H1.data, H2.data))
4×4 Hermitian{Float64, Matrix{Float64}}:
 0.343295  0.436362  0.106336   0.135163
 0.436362  0.717831  0.219534   0.222349
 0.106336  0.219534  0.0616077  0.0783094
 0.135163  0.222349  0.0783094  0.128822

I think the current implementation is fine.
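For context, the discrepancy arises because a Hermitian wrapper only ever reads the stored triangle, while the parent's other triangle may hold arbitrary data that kron on the parents then mixes in. A minimal hypothetical illustration:

```julia
using LinearAlgebra

M = [1.0 2.0; 99.0 3.0]     # the 99.0 below the diagonal is "stale" data
H = Hermitian(M, :U)        # the wrapper never reads it: H[2, 1] == 2.0

kron(Matrix(H), Matrix(H))[2, 3]   # 4.0   = H[1,2] * H[2,1] = 2 * 2
kron(M, M)[2, 3]                   # 198.0 = M[1,2] * M[2,1] = 2 * 99
```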

@jishnub (Contributor) commented Feb 16, 2024:

Let's just do the symmetric/hermitian for now

(Contributor Author) replied:

🤦 I wasted a lot of time doing the triangular case only because @dkarrasch asked. I have no use for it. Next time please only ask for features you actually want.

(Contributor) replied:

Oh. I'm in no way suggesting that we abandon the work if you've implemented it already. My impression was otherwise. Please continue with your work.

(Contributor Author) replied:

Ah ok. Well, it's done already, I've already pushed it.

(Contributor Author) replied:

Well?

@araujoms (Contributor Author) commented Feb 5, 2024:

Well, yes: I was following the instructions for contributing to a standard library, not to base Julia. In any case, it works now. I changed the UUID back, built Julia again, and now the tests run (and pass).

@ViralBShah (Member) commented:
@araujoms Is this PR ready from your perspective?

@araujoms (Contributor Author) replied:

Yes.

@ViralBShah (Member) replied:

I'll wait for @jishnub to chime in and hope to get this merged next week. Thanks!

@araujoms (Contributor Author) replied:

Oh, ok, thanks a lot.

@jishnub (Contributor) left a review comment:

Looks OK to me, but we would want to add tests for matrix-valued matrices (which are broken on this PR currently), and also for matrices where the parent isn't initialized in the positions corresponding to the structural zeros (which works, but doesn't seem to be tested).
E.g.:

julia> M = Matrix{BigInt}(undef,2,2);

julia> M[1,1] = M[2,2] = M[1,2] = 2;

julia> U = UpperTriangular(M);

julia> kron(U, U)
4×4 Matrix{BigInt}:
 4  4  4  4
 0  4  0  4
 0  0  4  4
 0  0  0  4

Some such tests would prevent future breakages.

stdlib/LinearAlgebra/src/triangular.jl (review thread resolved)
@araujoms requested a review from @jishnub on March 3, 2024.
@araujoms (Contributor Author) commented Mar 3, 2024:

I've added the tests you asked for and removed the recursive matrices from consideration.

@araujoms (Contributor Author) commented:

Gentle bump.

@oscardssmith added the "performance (Must go faster)" label on Mar 22, 2024.
@araujoms (Contributor Author) commented Apr 4, 2024:

Well?

@araujoms (Contributor Author) commented:

Please?

@ViralBShah (Member) commented:

@jishnub The merging is blocked. I believe your review is holding it back (and perhaps it just needed to be marked as resolved). Can this be merged?

@jishnub (Contributor) left a review:

Sorry for the delay; looks good to me.

@jishnub merged commit c741bd3 into JuliaLang:master on Apr 18, 2024 (7 checks passed).
@araujoms deleted the hermkron branch on April 18, 2024.
Labels: linear algebra, performance (Must go faster)

6 participants