Asymmetric speed of in-place sparse*dense matrix product #29956
Comments
Actually even more speedup can be achieved for dense*sparse by proper arrangement of loops. Using the same setup:

```julia
julia> using BenchmarkTools, SparseArrays, LinearAlgebra

julia> A = sprand(100,100,0.01);

julia> B = rand(100,100);

julia> C = A*B;
```

the proposed function

```julia
import LinearAlgebra.mul!

function mul!(C::StridedMatrix, X::StridedMatrix, A::SparseMatrixCSC)
    mX, nX = size(X)
    nX == A.m || throw(DimensionMismatch())
    fill!(C, zero(eltype(C)))
    rowval = A.rowval
    nzval = A.nzval
    @inbounds for multivec_row = 1:mX, col = 1:A.n, k = A.colptr[col]:(A.colptr[col+1]-1)
        C[multivec_row, col] += X[multivec_row, rowval[k]] * nzval[k]
    end
    C
end
```

gives

```julia
julia> @btime mul!($C,$B,$A);
  21.778 μs (0 allocations: 0 bytes)
```

while

```julia
function mul_alt!(C::StridedMatrix, X::StridedMatrix, A::SparseMatrixCSC)
    mX, nX = size(X)
    nX == A.m || throw(DimensionMismatch())
    fill!(C, zero(eltype(C)))
    rowval = A.rowval
    nzval = A.nzval
    @inbounds for col = 1:A.n, k = A.colptr[col]:(A.colptr[col+1]-1)
        ki = rowval[k]
        kv = nzval[k]
        for multivec_row = 1:mX
            C[multivec_row, col] += X[multivec_row, ki] * kv
        end
    end
    C
end
```

gives

```julia
julia> @btime mul_alt!($C,$B,$A);
  4.624 μs (0 allocations: 0 bytes)
```
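Worth noting: in `mul_alt!` the innermost loop runs down a column of `C` and `X`, which matches Julia's column-major storage, and that locality is where the speedup comes from. A quick sanity check that the reordered kernel matches the generic product (a minimal sketch, assuming the setup and the `mul_alt!` definition above):

```julia
using SparseArrays, LinearAlgebra, Test

A = sprand(100, 100, 0.01);
B = rand(100, 100);
C = zeros(100, 100);

# The reordered kernel should agree with the dense fallback B * Matrix(A).
@test mul_alt!(C, B, A) ≈ B * Matrix(A)
```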
Ditto!
Basically, yes :) As you can see in my first post, the current performance is pretty bad and the suggested fix is simple and effective (~200x speedup). If your PR is going to supersede this any time soon, though, it might not be worth the effort; that's why I'm asking. It'd be great to see a revival of #24045, of course :)
@crstnbr There is a discussion in JuliaLang/LinearAlgebra.jl#473 about adding an API for "combined multiply-add", C = αAB + βC, and I wrote PR #29634 for it. If you are going to write a new method (which, BTW, sounds like a great addition!), it'd be nice if it used this API.
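To illustrate how the reordered kernel could be phrased against such a combined multiply-add, here is a sketch computing C = α*X*A + β*C in place for the dense-times-sparse case. The name `mul_add!` and the exact signature are assumptions for illustration only, not the interface from the linked PR:

```julia
using SparseArrays, LinearAlgebra

# Hypothetical helper: C = α*X*A + β*C in place, using the same
# column-ordered loops as mul_alt! above.
function mul_add!(C::StridedMatrix, X::StridedMatrix, A::SparseMatrixCSC,
                  α::Number, β::Number)
    mX, nX = size(X)
    nX == A.m || throw(DimensionMismatch())
    size(C) == (mX, A.n) || throw(DimensionMismatch())
    # Pre-scale C by β (or zero it out) before accumulating α*X*A.
    if β != 1
        β == 0 ? fill!(C, zero(eltype(C))) : rmul!(C, β)
    end
    rowval = A.rowval
    nzval = A.nzval
    @inbounds for col = 1:A.n, k = A.colptr[col]:(A.colptr[col+1]-1)
        ki = rowval[k]
        kv = α * nzval[k]
        for multivec_row = 1:mX
            C[multivec_row, col] += X[multivec_row, ki] * kv
        end
    end
    C
end
```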
(A little update: I ended up working through the holiday, so likely it'll be a few more weeks. Best! S) |
Hey @crstnbr! :) Regrettably reality has sunk in: I will not have any bandwidth to work on things Julia for the foreseeable future. That being the case, it would be fantastic to see someone else push #24045 over the line. Formerly all that was necessary was transforming method signatures to use
Has the original issue been solved?

```julia
julia> @btime $C = $A*$B;
  22.556 μs (2 allocations: 78.20 KiB)

julia> @btime $C = $B*$A;
  29.979 μs (2 allocations: 78.20 KiB)

julia> @btime mul!($C,$A,$B);
  19.144 μs (0 allocations: 0 bytes)

julia> @btime mul!($C,$B,$A);
  21.595 μs (0 allocations: 0 bytes)
```
Yes, multiplication is no longer falling back to a generic matmul method, but to the specialized method at julia/stdlib/SparseArrays/src/linalg.jl, line 131 (in d2daa81).
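One quick way to confirm which method is actually being hit (a minimal sketch; the exact file and line number depend on the Julia version):

```julia
using SparseArrays, LinearAlgebra, InteractiveUtils

A = sprand(100, 100, 0.01);
B = rand(100, 100);
C = similar(B);

# Reports the method that handles the in-place dense*sparse product;
# it should point into SparseArrays rather than LinearAlgebra's generic matmul.
@which mul!(C, B, A)
```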
First reported here.

This asymmetry of performance, which already existed on 0.6, is not (just) due to the CSC format of sparse matrices, but because mul! falls back to the most generic method in LinearAlgebra. It can be fixed (most simply) by copying the Base.:* method and adjusting it to be a mul! version, which gives a substantial speedup (see the sketch below).

Note that PR #24045 might fix this (I haven't looked into it in detail). However, since that PR has been lying around for over a year now, maybe we should add an intermediate fix?
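For reference, a sketch of what such a mul! adaptation could look like for the sparse-times-dense case, written from the CSC layout; the name `spdense_mul!` is invented for illustration and this is not necessarily the exact code from the original report:

```julia
using SparseArrays

# C = A*B with A in CSC format: walk A column by column so its nonzeros
# and the columns of C are both traversed in storage order.
function spdense_mul!(C::StridedMatrix, A::SparseMatrixCSC, B::StridedMatrix)
    A.n == size(B, 1) || throw(DimensionMismatch())
    size(C) == (A.m, size(B, 2)) || throw(DimensionMismatch())
    rowval = A.rowval
    nzval = A.nzval
    fill!(C, zero(eltype(C)))
    @inbounds for col = 1:size(C, 2), k = 1:A.n
        b = B[k, col]
        for j = A.colptr[k]:(A.colptr[k+1]-1)
            C[rowval[j], col] += nzval[j] * b
        end
    end
    C
end
```

Iterating over the columns of A keeps both its nonzeros and the columns of C in storage order, the same locality argument as for `mul_alt!` above.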