Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add direct calls to BLAS to compute SVDs #1259

Merged
merged 5 commits into from
Jun 6, 2024

Conversation

ronisbr
Copy link
Contributor

@ronisbr ronisbr commented May 29, 2024

This PR implements direct calls to BLAS library for computing SVDs, reducing the allocations if the input types are SMatrix ou MMatrix with real numbers.

The approach is the convert the input matrix to MMatrix, call BLAS function, and convert it back to the input type. Since MMatrix does not leave the function scope, the compiler is clever enough to avoid allocations.

With this PR, we can call svdvals, svd, and pinv with 0 allocations.

The current implementation of svd and svdvals converts the static matrix to Matrix and call LinearAlgebra.svd. Here, we are calling the same BLAS function as in LinearAlgebra stdlib. Hence, the performance gain here seems "free" except for maintaining new code inside StaticArrays.

This PR does not add any noticeable delay when importing StaticArrays:

Before: 0.126202 seconds (253.52 k allocations: 13.714 MiB)
After: 0.128506 seconds (253.58 k allocations: 13.751 MiB)

Here are some benchmarks comparing the new and old implementation:

svd

Dimension SMatrix (old) SMatrix (new) MMatrix (old) MMatrix (new)
2 0.911 us (8 allocations) 0.441 us (0 allocations) 0.907 us (11 allocations) 0.450 us (3 allocations)
3 1.328 us (8 allocations) 0.875 us (0 allocations) 1.504 us (11 allocations) 1.042 us (3 allocations)
4 2.102 us (8 allocations) 1.562 us (0 allocations) 2.250 us (11 allocations) 1.721 us (3 allocations)
5 2.893 us (8 allocations) 2.343 us (0 allocations) 2.847 us (11 allocations) 2.292 us (3 allocations)
6 3.651 us (8 allocations) 3.026 us (0 allocations) 4.054 us (11 allocations) 3.432 us (3 allocations)

We also have a performance boost when computing pinv because it basically calls svd. For example, computing pinv in a 3 x 3 matrix takes 0.995 us in this commit versus 1.462 us in master.

Closes #1255

Copy link
Member

@fredrikekre fredrikekre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe put these these in a separate file like BLAS.jl or something?

src/svd.jl Outdated Show resolved Hide resolved
@mateuszbaran
Copy link
Collaborator

Thank you! The code looks generally fine to me but we still want to support Julia 1.6 here, which does not have libblastrampoline -- you need to call LAPACK methods without it on Julia 1.6.

@fredrikekre
Copy link
Member

fredrikekre commented May 31, 2024

I don't know how difficult it is to support both with and without libblastrampoline, but alternatively we could just wait with merging this until there is a new Julia LTS version out (assuming @ronisbr 's immediate needs are fulfilled by https://github.com/ronisbr/StaticArraysBlasInterfaces.jl).

@mateuszbaran
Copy link
Collaborator

A very simple change to merge this soon would be to just conditionally load blas.jl:

@static if VERSION >= v"1.7"
    include("blas.jl")
end

and import libblastrampoline in that file. We don't have to make SVD calls faster on Julia 1.6.

@ronisbr
Copy link
Contributor Author

ronisbr commented Jun 2, 2024

Hi! Sorry, I missed the notifications!

Yes, I am totally fine to allow this improvement only for Julia >= 1.7. I will make that change and submit a new commit.

Thanks!

@ronisbr
Copy link
Contributor Author

ronisbr commented Jun 5, 2024

Done @mateuszbaran !

Copy link
Collaborator

@mateuszbaran mateuszbaran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Could you just bump version to 1.9.5?

@ronisbr
Copy link
Contributor Author

ronisbr commented Jun 6, 2024

Sure!

@ronisbr
Copy link
Contributor Author

ronisbr commented Jun 6, 2024

@mateuszbaran Done!

@mateuszbaran mateuszbaran merged commit 609aa34 into JuliaArrays:master Jun 6, 2024
19 of 24 checks passed
@ronisbr ronisbr deleted the blas_interface branch June 6, 2024 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

svd allocates even with small matrices
3 participants