
Multiplication of sparse matrices with diagonal matrices is slow (fix suggested) #28934

Closed
Pbellive opened this issue Aug 28, 2018 · 2 comments · Fixed by #29045
Comments

@Pbellive
Contributor

Hi all,
It's great to see all the work that's gone into the linear algebra support for 1.0! I've recently noticed one small area where I think significant performance is being left on the table. Namely, multiplication of sparse matrices with diagonal matrices. This was also mentioned last month on discourse. Multiplication between diagonal matrices and sparse matrices is currently defined at:

function (*)(D::Diagonal, A::SparseMatrixCSC)
    T = Base.promote_op(*, eltype(D), eltype(A))
    mul!(LinearAlgebra.copy_oftype(A, T), D, A)
end
function (*)(A::SparseMatrixCSC, D::Diagonal)
    T = Base.promote_op(*, eltype(D), eltype(A))
    mul!(LinearAlgebra.copy_oftype(A, T), A, D)
end

These methods fall back to fairly generic matrix multiplication rather than to any of the specialized methods defined in stdlib/LinearAlgebra/src/diagonal.jl. I've written a couple of potential replacements; the versions for right-multiplying a sparse matrix by a diagonal matrix are benchmarked below. Please let me know if there is a reason something similar hasn't been implemented already. If it's just an oversight, I can prepare a PR to add the optimized methods. The benchmarking was run on a fresh clone of master from this morning.

julia> versioninfo()
Julia Version 1.1.0-DEV.129
Commit 59c7cedfc4 (2018-08-28 15:47 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libimf
  LLVM: libLLVM-6.0.1 (ORCJIT, haswell)

To benchmark, I wrote the following script and saved it to a file bnchmrkDiag.jl:

using LinearAlgebra,SparseArrays,BenchmarkTools

# Optimized matrix-matrix product. The underlying in-place multiplication is non-allocating.
function MTimesDiag(A::SparseMatrixCSC, D::Diagonal)
    T = Base.promote_op(*, eltype(D), eltype(A))
    C = LinearAlgebra.copy_oftype(A, T)
    MTimesDiag!(C, D)
end

function MTimesDiag!(A::SparseMatrixCSC, D::Diagonal)
    n = size(A, 2)
    if length(D.diag) != n
        error("Incompatible sizes")
    end

    for ir = 1:n
        j1 = A.colptr[ir]
        j2 = A.colptr[ir+1] - 1
        dgir = D.diag[ir]
        for ic = j1:j2
            A.nzval[ic] *= dgir
        end
    end
    return A
end

# Alternative implementation using rmul!. It's much better than what you currently get
# with * but doesn't compete with the above method.
function MTimesDiagRmul(A::SparseMatrixCSC, D::Diagonal)
    T = Base.promote_op(*, eltype(D), eltype(A))
    C = LinearAlgebra.copy_oftype(A, T)
    rmul!(C, D)
end

n = 1000;
B = sprandn(n,n,1e-3);
#n = 100000;
#B = sprandn(n,n,5e-4);
D = Diagonal(randn(n));

# Allocating versions
b1 = @benchmarkable $B * $D
b2 = @benchmarkable MTimesDiagRmul($B,$D)
b3 = @benchmarkable MTimesDiag($B,$D)

#in place versions
b4 = @benchmarkable rmul!(C,$D) setup=(C = copy($B))
b5 = @benchmarkable MTimesDiag!(C,$D) setup=(C = copy($B))
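As a quick sanity check (not part of the original script), the in-place diagonal scaling can be verified against the dense product using the stdlib `rmul!`, which performs the same column-by-column scaling that `MTimesDiag!` implements:

```julia
using LinearAlgebra, SparseArrays

A = sprandn(50, 50, 0.1)
D = Diagonal(randn(50))
C = copy(A)
rmul!(C, D)  # in-place right-multiplication by a Diagonal (stdlib)
@assert Matrix(C) ≈ Matrix(A) * Matrix(D)  # agrees with the dense product
```

Scaling the stored nonzeros in place leaves the sparsity structure untouched, which is why the non-allocating versions can reuse `colptr` and `rowval` directly.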

Then, from the REPL, I ran:

julia> include("bnchmrkDiag.jl")
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(b1)
BenchmarkTools.Trial: 
  memory estimate:  24.13 KiB
  allocs estimate:  10
  --------------
  minimum time:     750.135 ms (0.00% GC)
  median time:      755.622 ms (0.00% GC)
  mean time:        756.598 ms (0.00% GC)
  maximum time:     763.333 ms (0.00% GC)
  --------------
  samples:          7
  evals/sample:     1

julia> run(b2)
BenchmarkTools.Trial: 
  memory estimate:  120.09 KiB
  allocs estimate:  30
  --------------
  minimum time:     36.090 μs (0.00% GC)
  median time:      38.475 μs (0.00% GC)
  mean time:        50.252 μs (17.26% GC)
  maximum time:     41.074 ms (99.82% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> run(b3)
BenchmarkTools.Trial: 
  memory estimate:  23.80 KiB
  allocs estimate:  4
  --------------
  minimum time:     3.568 μs (0.00% GC)
  median time:      4.060 μs (0.00% GC)
  mean time:        10.303 μs (55.05% GC)
  maximum time:     41.750 ms (99.95% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> run(b4)
BenchmarkTools.Trial: 
  memory estimate:  96.30 KiB
  allocs estimate:  26
  --------------
  minimum time:     34.150 μs (0.00% GC)
  median time:      36.624 μs (0.00% GC)
  mean time:        45.483 μs (16.58% GC)
  maximum time:     41.385 ms (99.83% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> run(b5)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.841 μs (0.00% GC)
  median time:      2.016 μs (0.00% GC)
  mean time:        2.137 μs (0.00% GC)
  maximum time:     21.772 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

As you can see, the simple B * D is incredibly slow compared to the others. As you would expect, the differences become more significant for larger matrices. I re-ran the benchmarks with

n = 100000;
B = sprandn(n,n,5e-4);

For reference, that gave nnz(B)/n = 50.00124. I got impatient and killed the B * D benchmark after about a minute. The results for the other methods were

julia> run(b2)
BenchmarkTools.Trial: 
  memory estimate:  160.93 MiB
  allocs estimate:  48
  --------------
  minimum time:     90.440 ms (15.96% GC)
  median time:      123.624 ms (38.25% GC)
  mean time:        110.487 ms (29.57% GC)
  maximum time:     139.199 ms (41.40% GC)
  --------------
  samples:          46
  evals/sample:     1

julia> run(b3)
BenchmarkTools.Trial: 
  memory estimate:  77.06 MiB
  allocs estimate:  7
  --------------
  minimum time:     36.896 ms (10.34% GC)
  median time:      37.501 ms (10.50% GC)
  mean time:        38.607 ms (12.12% GC)
  maximum time:     89.270 ms (62.05% GC)
  --------------
  samples:          130
  evals/sample:     1

julia> run(b4)
BenchmarkTools.Trial: 
  memory estimate:  83.88 MiB
  allocs estimate:  41
  --------------
  minimum time:     47.182 ms (1.35% GC)
  median time:      52.857 ms (9.70% GC)
  mean time:        53.374 ms (11.09% GC)
  maximum time:     88.956 ms (46.19% GC)
  --------------
  samples:          46
  evals/sample:     1

julia> run(b5)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.496 ms (0.00% GC)
  median time:      4.910 ms (0.00% GC)
  mean time:        4.945 ms (0.00% GC)
  maximum time:     6.768 ms (0.00% GC)
  --------------
  samples:          128
  evals/sample:     1

You see about the same thing for left multiplication by a diagonal matrix.

Thanks, Patrick

@Pbellive Pbellive changed the title Multiplication of sparse matrices with diagonal matrices Multiplication of sparse matrices with diagonal matrices is slow (fix suggested) Aug 28, 2018
@GiggleLiu
Contributor

This bug is caused by incorrect method dispatch. The specialized method is declared as

function mul!(C::SparseMatrixCSC, A::SparseMatrixCSC, D::Diagonal{<:Vector})

but there is no Diagonal{<:Vector} type in play here; it should be Diagonal{T, <:Vector} where T.
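To illustrate the mismatch (a minimal check, not from the original comment): in `Diagonal{<:Vector}` the `<:Vector` constrains the *element* type, so an ordinary diagonal matrix of floats never matches the declared signature:

```julia
using LinearAlgebra

D = Diagonal(randn(3))                   # Diagonal{Float64, Vector{Float64}}
@assert !(D isa Diagonal{<:Vector})      # element type Float64 is not a Vector
@assert D isa Diagonal{<:Any, <:Vector}  # the intended constraint: Vector storage
```

Because the specialized method never matches, dispatch falls through to the generic `mul!` path seen in the profile below.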

Here are the first several lines of the profile:

1061 ./task.jl:259; (::getfield(REPL, Symbol("##28#29")){REP...
 1061 ...lib/v1.0/REPL/src/REPL.jl:117; macro expansion
  1061 ...lib/v1.0/REPL/src/REPL.jl:85; eval_user_input(::Any, ::REPL.REPLBackend)
   1061 ./boot.jl:319; eval(::Module, ::Any)
    1061 ....0/Profile/src/Profile.jl:25; top-level scope
     1061 ...arseArrays/src/linalg.jl:135; *(::SparseMatrixCSC{Float64,Int64}, :...
      1061 ...arAlgebra/src/matmul.jl:172; mul!
       1061 ...arAlgebra/src/matmul.jl:578; generic_matmatmul!(::SparseMatrixCSC...
        2   ...arAlgebra/src/matmul.jl:0; _generic_matmatmul!(::SparseMatrixCS...
        119 ...arAlgebra/src/matmul.jl:638; _generic_matmatmul!(::SparseMatrixCS...
         119 ...arAlgebra/src/matmul.jl:480; copy_transpose!
          1  ./sort.jl:0; copy_transpose!(::Array{Float64,2},...
          9  ./sort.jl:177; copy_transpose!(::Array{Float64,2},...
          1  ...gebra/src/transpose.jl:183; copy_transpose!(::Array{Float64,2},...
          4  ...gebra/src/transpose.jl:196; copy_transpose!(::Array{Float64,2},...
          84 ...gebra/src/transpose.jl:197; copy_transpose!(::Array{Float64,2},...

As you can see, the call is not dispatched to the specialized method, as @Pbellive mentioned.

@Pbellive
Contributor Author

Pbellive commented Sep 4, 2018

Thanks @GiggleLiu. I missed that the correct implementation already existed in linalg.jl; we just need to change the type signatures to fix the bug. Nice and easy. Preparing a PR now.
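A hedged sketch of what the corrected method might look like once the `Diagonal` parameterization is fixed. `rdiag_mul!` is an illustrative name, not the actual stdlib function; the real fix in linalg.jl amounts to the same signature change applied to the existing `mul!` methods:

```julia
using LinearAlgebra, SparseArrays

# Sketch only: a specialized sparse-times-diagonal kernel whose signature now
# actually matches Diagonal{Float64, Vector{Float64}} and friends.
function rdiag_mul!(C::SparseMatrixCSC, A::SparseMatrixCSC, D::Diagonal{<:Any, <:Vector})
    size(A, 2) == length(D.diag) || error("Incompatible sizes")
    d = D.diag
    @inbounds for j in 1:size(A, 2)
        dj = d[j]
        for k in A.colptr[j]:(A.colptr[j+1] - 1)
            C.nzval[k] = A.nzval[k] * dj  # scale column j of A by D[j, j]
        end
    end
    return C
end

A = sprandn(100, 100, 0.05)
D = Diagonal(randn(100))
C = copy(A)  # C shares A's sparsity structure, so only nzval needs updating
rdiag_mul!(C, A, D)
```

With a matching signature, `A * D` dispatches to this O(nnz) column-scaling loop instead of the generic dense `_generic_matmatmul!` path seen in the profile.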
