
Very Poor Performance of Diagonal + SparseMatrixCSC #31770

Closed
angeris opened this issue Apr 19, 2019 · 9 comments · Fixed by #32466
Labels
broadcast Applying a function over a collection performance Must go faster sparse Sparse arrays

Comments

@angeris

angeris commented Apr 19, 2019

There appears to be quite poor performance when adding a Diagonal matrix to a matrix of type SparseMatrixCSC. MWE:

N = 63001
@time Diagonal(ones(N)) + spdiagm(0 => ones(N));
@time spdiagm(0 => ones(N)) +  spdiagm(0 => ones(N));

gives

julia> @time Diagonal(ones(N)) + spdiagm(0 => ones(N))
 10.323352 seconds (77 allocations: 12.579 MiB)
julia> @time spdiagm(0 => ones(N)) + spdiagm(0 => ones(N))
  0.053357 seconds (71 allocations: 12.982 MiB, 80.98% gc time)

@briochemc also reports the same (see #linear-algebra in the JuliaLang Slack):

julia> v = ones(1000) ;

julia> @btime spdiagm(0 => $v) + spdiagm(0 => $v) ; # fast
  85.206 μs (40 allocations: 198.98 KiB)

julia> @btime sparse(Diagonal($v)) + spdiagm(0 => $v) ; # fastest
  55.500 μs (26 allocations: 143.19 KiB)

julia> @btime Diagonal($v) + spdiagm(0 => $v) ; # slow
  5.564 ms (45 allocations: 191.73 KiB)
@angeris angeris changed the title Poor Performance of Diagonal + SparseMatrixCSC Very Poor Performance of Diagonal + SparseMatrixCSC Apr 19, 2019
@KristofferC
Member

Just converting the Diagonal to sparse seems like a good solution? I don't think the solution here is to add a specialized version, because there is a practically infinite number of combinations of operators and wrappers of abstract arrays that would need to be considered.

@abraunst
Contributor

Just converting the Diagonal to sparse seems like a good solution? I don't think the solution here is to add a specialized version, because there is a practically infinite number of combinations of operators and wrappers of abstract arrays that would need to be considered.

Is this a case automatically covered by #31563?

@mbauman mbauman added broadcast Applying a function over a collection performance Must go faster sparse Sparse arrays labels Apr 19, 2019
@mbauman
Member

mbauman commented Apr 19, 2019

This is actually a really easy fix: we're already attempting to convert the Diagonal to sparse, but it seems we're doing it in a really poor way.

# This is the function broadcast uses to convert structured array types to sparse
julia> @time SparseArrays.HigherOrderFns._sparsifystructured(Diagonal(ones(63001)));
  8.233826 seconds (42 allocations: 2.963 MiB)

# This is what it uses to do its work — it's a naive O(N^2) algorithm!
julia> @time SparseMatrixCSC(Diagonal(ones(63001)));
  8.216935 seconds (88 allocations: 2.966 MiB)

# But we have this specialization that's not being used!
julia> @time sparse(Diagonal(ones(63001)));
  0.043160 seconds (70.57 k allocations: 5.605 MiB)
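
For comparison, a direct O(N) construction of the three CSC arrays for a Diagonal is easy to sketch. This is a hypothetical helper (`diag_to_csc` is not the actual `sparse` specialization), just to illustrate why the conversion should be cheap:

```julia
using LinearAlgebra, SparseArrays

# Hypothetical O(N) Diagonal -> SparseMatrixCSC conversion:
# every column j stores exactly one entry, at row j.
function diag_to_csc(D::Diagonal{Tv}) where {Tv}
    n = size(D, 1)
    colptr = collect(1:n+1)   # column j's entries occupy index range j:j
    rowval = collect(1:n)     # the single entry in column j sits at row j
    nzval  = copy(D.diag)     # the diagonal values are the stored values
    return SparseMatrixCSC{Tv,Int}(n, n, colptr, rowval, nzval)
end
```

Since it only allocates three length-O(N) vectors, it behaves like the fast `sparse(Diagonal(...))` path rather than the naive O(N^2) one.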

@angeris
Author

angeris commented Apr 19, 2019

For sure. I think the expected behavior here should be the same as calling sparse(Diagonal(v)) before the sum. In particular, as @mbauman mentioned, the resulting matrix is indeed of type SparseMatrixCSC (not dense), but the conversion is extremely slow, which is wholly unexpected.

@briochemc
Contributor

briochemc commented Apr 20, 2019

I agree with @mbauman's suggestion to make sure the conversion is fast.

I also thought I should point out to @angeris, to be exhaustive, that Diagonal(v) + Diagonal(v) is much faster than all the other cases:

julia> v = ones(1000) ;

julia> @btime spdiagm(0 => $v) + spdiagm(0 => $v) ; # fast
  79.443 μs (44 allocations: 199.05 KiB)

julia> @btime sparse(Diagonal($v)) + spdiagm(0 => $v) ; # fastest
  49.983 μs (26 allocations: 143.19 KiB)

julia> @btime Diagonal($v) + spdiagm(0 => $v) ; # slow
  5.556 ms (45 allocations: 191.73 KiB)

julia> @btime Diagonal($v) + Diagonal($v) ; # faster than fastest ;)
  1.310 μs (2 allocations: 7.95 KiB)

So it depends on what one plans to do with the matrix, because sometimes it is better not to convert to sparse at all. For example,

julia> v1, v2, v3 = rand(1000), rand(1000), rand(1000) ;

julia> @btime $v1' * Diagonal($v2) * $v3 ;
  1.751 μs (3 allocations: 64 bytes)

julia> @btime $v1' * sparse(Diagonal($v2)) * $v3 ;
  8.863 μs (6 allocations: 31.88 KiB)
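
The gap is presumably because the Adjoint * Diagonal * Vector product has a fused method that makes a single pass over the three vectors with no intermediate matrix. A minimal sketch of what such a loop computes (`tridot` is a hypothetical name, not the LinearAlgebra method):

```julia
using LinearAlgebra

# Hypothetical sketch of a fused triple product: one pass, no temporaries.
tridot(v1, v2, v3) = sum(i -> v1[i] * v2[i] * v3[i], eachindex(v1))

v1, v2, v3 = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
tridot(v1, v2, v3) == v1' * Diagonal(v2) * v3   # true
```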

@angeris
Author

angeris commented Apr 20, 2019

@briochemc Indeed! This was an MWE that came up because I was generating a SparseMatrixCSC from a large 2D Laplacian-type matrix (rather than a diagonal one via spdiagm) and was then adding it to a matrix of type Diagonal :)

Thanks for the latter example... it's super interesting: I didn't realize the performance would be about five times worse with the sparse conversion!

@KlausC
Contributor

KlausC commented Apr 20, 2019

What do you think about that proposal, #31563? It would simplify the solution considerably in view of the problem that

there is a practically infinite number of combinations of operators and wrappers of abstract arrays

@dkarrasch
Member

dkarrasch commented Jun 30, 2019

julia> v1, v2, v3 = rand(1000), rand(1000), rand(1000) ;

julia> @btime $v1' * Diagonal($v2) * $v3 ;
  1.751 μs (3 allocations: 64 bytes)

is fast because it has a special method which, in a single loop, goes through v1, v2, and v3, takes the products, and adds them up. That method has the signature (::Adjoint, ::Diagonal, ::(Abstract/Strided)Vector) and won't be affected by whatever we do with broadcasting. So, how about adding these:

SparseMatrixCSC(T::SymTridiagonal) = sparse(T)
SparseMatrixCSC(T::Tridiagonal) = sparse(T)
SparseMatrixCSC(B::Bidiagonal) = sparse(B)
SparseMatrixCSC(D::Diagonal) = sparse(D)

The other, equivalent option is to add, in HigherOrderFns,

_sparsifystructured(D::Diagonal{<:Number}) = sparse(D)

which would restrict its effect to the broadcasting machinery. The equivalence stems from the fact that we have

_sparsifystructured(M::AbstractMatrix) = SparseMatrixCSC(M)

and since the structured matrices are not Matrix but AbstractMatrix, calling the constructor on them does not end up in the fast conversion but in the slow one. But redirecting the constructor SparseMatrixCSC to the fast, specialized sparse seems to make sense more generally. Any opinions? I can make a PR.

Edit: As an alternative, we could move the current specialized code from sparse to the constructors SparseMatrixCSC(::StructuredMatrix) and let sparse call SparseMatrixCSC. I think this is how it's done for ordinary matrices, and we might get rid of the specialized sparse methods anyway, since the sparse(::AbstractMatrix) method would handle it and pass to the specialized constructor. Since there are many roads to Rome here, I'd appreciate some advice on which way is more conformant with the existing style conventions. In any case, we should treat all structured matrices, not only the Diagonals.
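
Whichever way is chosen, the same O(N) idea extends to the other structured types. A hedged sketch for Bidiagonal (`bidiag_to_sparse` is a hypothetical helper, not the code that eventually landed in #32466), using the COO-style sparse(I, J, V) constructor:

```julia
using LinearAlgebra, SparseArrays

# Hypothetical O(N) Bidiagonal -> sparse conversion: one main diagonal
# plus one off-diagonal, assembled as (row, col, value) triplets.
function bidiag_to_sparse(B::Bidiagonal)
    n = size(B, 1)
    I = collect(1:n)          # main diagonal rows
    J = collect(1:n)          # main diagonal cols
    if B.uplo == 'U'
        append!(I, 1:n-1)     # superdiagonal: rows 1..n-1
        append!(J, 2:n)       #                cols 2..n
    else
        append!(I, 2:n)       # subdiagonal: rows 2..n
        append!(J, 1:n-1)     #              cols 1..n-1
    end
    V = vcat(copy(B.dv), copy(B.ev))
    return sparse(I, J, V, n, n)
end
```

Tridiagonal and SymTridiagonal would follow the same pattern with one more off-diagonal each.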

@mbauman
Member

mbauman commented Jul 1, 2019

I would fully support moving the relevant sparse specializations onto SparseMatrixCSC (your "edit" alternative).
