faster A_mul_B! and * involving bidiagonal and tridiagonal matrices #15505

jw3126 · 2016-03-14T20:37:57Z

Two A_mul_B!(C, A, B) methods where B or A, B are tridiagonal are implemented. This gives faster matrix multiplication methods for some bi and tridiagonal argument combinations, which would fall back to full matrix multiplication before. See JuliaLang/LinearAlgebra.jl#263 and timings below.

begin
           import Base.LinAlg.A_mul_B!
           k = 1000;
           A = Tridiagonal(randn(k-1), randn(k), randn(k-1));
           B = Tridiagonal(randn(k-1), randn(k), randn(k-1));
           C = randn(k, k);

           Ai = Bidiagonal(randn(k), randn(k-1), rand(Bool));
           Bi = Bidiagonal(randn(k), randn(k-1), rand(Bool));
           fullAi = full(Ai);
           fullBi = full(Bi);
           Atri = Tridiagonal(randn(k-1), randn(k), randn(k-1));
           Btri = Tridiagonal(randn(k-1), randn(k), randn(k-1));
           C = randn(k, k);
       end

julia> @time A_mul_B!(C, fullAi, fullBi);
  0.035776 seconds (4 allocations: 160 bytes)

julia> @time A_mul_B!(C, fullAi, Btri);
  0.002102 seconds (4 allocations: 160 bytes)

julia> @time A_mul_B!(C, Atri, fullBi);
  0.004037 seconds (4 allocations: 160 bytes)

julia> @time A_mul_B!(C, Atri, Btri);
  0.000815 seconds (4 allocations: 160 bytes)

julia> @time A_mul_B!(C, fullAi, Bi);
  0.001518 seconds (11 allocations: 16.141 KB)

julia> @time A_mul_B!(C, Ai, fullBi);
  0.004141 seconds (11 allocations: 16.141 KB)

julia> @time A_mul_B!(C, Ai, Bi);
  0.001324 seconds (18 allocations: 32.125 KB)

julia> @time fullAi * fullBi;
  0.035390 seconds (9 allocations: 7.630 MB)

julia> @time fullAi * Btri;
  0.001393 seconds (9 allocations: 7.630 MB)

julia> @time Atri * fullBi;
  0.004084 seconds (9 allocations: 7.630 MB)

julia> @time Atri * Btri;
  0.001264 seconds (9 allocations: 7.630 MB)

julia> @time fullAi * Bi;
  0.002728 seconds (16 allocations: 7.645 MB)

julia> @time Ai * fullBi;
  0.005010 seconds (16 allocations: 7.645 MB)

julia> @time Ai * Bi;
  0.001789 seconds (23 allocations: 7.661 MB)

tkelman · 2016-03-15T07:15:20Z

base/linalg/tridiag.jl

+end
+
+function A_mul_B_td!(C::AbstractMatrix, A::Tridiagonal, B::Tridiagonal)
+


remove the unnecessary empty line

Ref: #15418 and #15439. We need to settle that issue. I think the empty line here is fine.

Concretely, the vast majority of the existing code base does not have blank lines after signature, even when the body starts with a comment. Therefore, if we allow them we lose consistency an leave room for discussions, which wastes time when reviewing PRs. A simple rule without exceptions is better.

That is an open discussion in the issues I linked to.

andreasnoack · 2016-03-15T12:50:29Z

Thanks. It's great with the specialized methods.

jw3126 · 2016-03-15T19:16:21Z

Thanks for all the feedback. Btw have you any idea, why the AppVeyor build failed?

pkofod · 2016-03-15T20:17:21Z

Thanks for all the feedback. Btw have you any idea, why the AppVeyor build failed?

Someone might have a sharper eye than I do, but I think you just lucked out. The AppVeyor plan used allows for the total run time to be no more than 60 minutes. The entire thing on ARCH=x86_64 usually runs at around 52-53 minutes, so I think you were just unlucky.

tkelman · 2016-03-19T03:44:30Z

base/linalg/bidiag.jl

+
+    check_A_mul_B!_sizes(C, A, B)
+    n = size(A,1)
+    n <= 3 && return A_mul_B!(C, full(A), full(B))


I don't think there's any advantage to using full here

There is stuff like C[2,4] = A[2,3]*B[3,4] in the code, which violates bounds of small matrices. Thats why n <= 3 is special.

You'd likely get much better performance by hoisting out and manually doing the transformation between A[2,3] vs Adu = A.du outside the loop then referring to Adu[2] within the loop.

@tkelman I think I already did this. For indexing I used the following rules:

Outside loops explicit indexing like A[1,2] is allowed for clarity. The performance overhead is negligible.

Inside loops do all indexing of BiTriSym matrices manually.

If you ever find code like A[i, j] inside a loop, its because A is not BiTriSym e.g. full or something.

jw3126 · 2016-05-01T10:58:08Z

Rebased and squashed, hopefully ready to merge now.

tkelman · 2016-05-01T11:31:10Z

base/linalg/bidiag.jl

+A_mul_B!(C::AbstractMatrix, A::BiTri, B::BiTriSym) = A_mul_B_td!(C, A, B)
+A_mul_B!(C::AbstractMatrix, A::BiTriSym, B::BiTriSym) = A_mul_B_td!(C, A, B)
+A_mul_B!(C::AbstractMatrix, A::AbstractTriangular, B::BiTriSym) = A_mul_B_td!(C, A, B)
+A_mul_B!(C::AbstractMatrix, A::AbstractMatrix, B::BiTriSym) = A_mul_B_td!(C, A, B)


these 5 lines should probably be expressed with a Union. And the A::BiTri line is redundant

Mhh I don't know a simpler way then these 5 lines. Its easy to produce ambiguous definitions. For example if I remove the BiTri line I get

WARNING: New definition A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 1}}) at linalg/bidiag.jl:236 is ambiguous with: A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}) at linalg/bidiag.jl:232. To fix, define A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}) before the new definition. WARNING: New definition A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 1}}) at linalg/bidiag.jl:236 is ambiguous with: A_mul_B!(AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}) at linalg/bidiag.jl:234. To fix, define A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}) before the new definition. WARNING: New definition A_mul_B!(Union{AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 1}}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 1}}) at linalg/bidiag.jl:237 is ambiguous with: A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}) at linalg/bidiag.jl:232. To fix, define A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}) before the new definition. WARNING: New definition A_mul_B!(Union{AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 1}}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 1}}) at linalg/bidiag.jl:237 is ambiguous with: A_mul_B!(AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}) at linalg/bidiag.jl:234. To fix, define A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal})

One big union also definitely does not work. In fact I started with AbstractMatrix (which the Union boils down to)

+A_mul_B!(C::AbstractMatrix, A::AbstractMatrix, B::BiTriSym) = A_mul_B_td!(C, A, B)

and these five lines are what I had to do to fix ambiguities.

jw3126 · 2016-05-04T19:44:06Z

@tkelman Because life is too short to be waiting for sparse arrays being filled by inefficient methods.

julia> N = 10^6
1000000

julia> sparr = spzeros(N);

julia> arr = zeros(N);

julia> @time fill!(arr, 2);
  0.001341 seconds (4 allocations: 160 bytes)

julia> @time fill!(sparr, 2);
  0.004634 seconds (6 allocations: 15.259 MB)

tkelman · 2016-05-05T02:16:12Z

base/sparse/sparsevector.jl

+function _fillnonzero!{Tv,Ti}(arr::SparseMatrixCSC{Tv, Ti}, val)
+    m, n = size(arr)
+    arr.colptr = convert(Vector{Ti}, collect(1:m:n*m+1))
+    arr.rowval = convert(Vector{Ti}, vcat([1:m for _ in 1:n]...))


this should actually be in place, not replacing these with new arrays

this and fill! also should not be in this file

You want things to be inplace for another epsilon of performance or is there another reason?

Where should this fill! be? It needs to be at a point, where SparseVector, SparseMatrixCSC type is already defined?

SparseMatrixCSC is likely to become an immutable in the future (SparseVector already is) so that would force in place anyway.

Also the array might be pointed to by other variables and this would decouple them.

Ah thanks for the hint, I changed to inplace. I already wondered why SparseVector is immutable, what is the reason to make these types immutable?

A few reasons, It for example enables various compiler optimizations like hoisting field loads out of loops. If you look at the various SparseMatrix functions now most of them manually hoist the loads.

the terminology is a bit confusing. sparsematrix and sparsevector are composite types, where you very rarely need or want to wholesale replace the arrays they contain. the fields themselves are usually mutable arrays that can be modified, but only as long as it remains a reference to the same object. though being an immutable type would make it tough to modify the dimension parameters which I have needed to do once in a while.

jw3126 · 2016-05-12T17:02:45Z

Any remaining issues about this PR? If not, I would squash the commits again.

tkelman · 2016-05-12T17:08:01Z

base/linalg/bidiag.jl

+                C[i, j] = A[i, j-1] * Bj₋1j + A[i, j]*Bjj + A[i, j+1] * Bj₊1j
+            end
+        end
+    end#inbounds


(nit) looks a little nicer with a space before and after the #

tkelman · 2016-05-12T17:14:06Z

I wish this were achievable in a bit less code, but since it probably isn't, I think this is ready. Couple minor style nits.

jw3126 · 2016-05-12T17:57:32Z

I agree there are some not so elegant places in this code, but I don't see how to make it shorter. Though I think writing a code generator, which creates A_mul_B! methods such as these automatically for lots of sparsity + memory layout patterns is in principle possible. Of course that would be a much much bigger project.

Big thanks @tkelman and the others for all that feedback!

jw3126 · 2016-05-13T17:19:38Z

Meh that travis build timed out, how can I force it to rerun, do I have to push something?

…al, symmetric tridiagonal... matrices

tkelman reviewed Mar 15, 2016
View reviewed changes

tkelman reviewed Mar 19, 2016
View reviewed changes

jw3126 force-pushed the bitrimatmul branch from 917ff54 to 97eb14c Compare May 1, 2016 08:55

tkelman reviewed May 1, 2016
View reviewed changes

jw3126 force-pushed the bitrimatmul branch from 97eb14c to f81513e Compare May 4, 2016 19:35

tkelman reviewed May 5, 2016
View reviewed changes

tkelman reviewed May 12, 2016
View reviewed changes

jw3126 force-pushed the bitrimatmul branch from cfafd1a to 21f12db Compare May 12, 2016 17:41

faster A_mul_B! and * involving combinations of bidiagonal, tridiagon…

7182b87

…al, symmetric tridiagonal... matrices

jw3126 force-pushed the bitrimatmul branch from 21f12db to 7182b87 Compare May 16, 2016 11:59

tkelman merged commit 97dc858 into JuliaLang:master May 16, 2016

andreasnoack mentioned this pull request Jul 19, 2016

* has no method matching *(::Array{Float64,2}, ::Tridiagonal{Float64}) JuliaLang/LinearAlgebra.jl#263

Closed

mcognetta mentioned this pull request Jul 29, 2018

bi/tri/symdiagonal matrix multiplication should return sparse matrices #28343

Closed

mcognetta mentioned this pull request Aug 24, 2018

[WIP] Optimizing {+,-,*} for structured matrices #28883

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

faster A_mul_B! and * involving bidiagonal and tridiagonal matrices #15505

faster A_mul_B! and * involving bidiagonal and tridiagonal matrices #15505

jw3126 commented Mar 14, 2016

tkelman Mar 15, 2016

andreasnoack Mar 15, 2016

tkelman Mar 15, 2016

andreasnoack Mar 15, 2016

andreasnoack commented Mar 15, 2016

jw3126 commented Mar 15, 2016

pkofod commented Mar 15, 2016

tkelman Mar 19, 2016

jw3126 Mar 19, 2016

tkelman Mar 19, 2016

jw3126 Mar 20, 2016

jw3126 commented May 1, 2016

tkelman May 1, 2016

jw3126 May 1, 2016

jw3126 commented May 4, 2016

tkelman May 5, 2016

tkelman May 5, 2016

jw3126 May 5, 2016

KristofferC May 5, 2016

jw3126 May 5, 2016

KristofferC May 5, 2016

tkelman May 5, 2016

jw3126 commented May 12, 2016

tkelman May 12, 2016

tkelman commented May 12, 2016

jw3126 commented May 12, 2016

jw3126 commented May 13, 2016

		end

		function A_mul_B_td!(C::AbstractMatrix, A::Tridiagonal, B::Tridiagonal)

faster A_mul_B! and * involving bidiagonal and tridiagonal matrices #15505

faster A_mul_B! and * involving bidiagonal and tridiagonal matrices #15505

Conversation

jw3126 commented Mar 14, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andreasnoack commented Mar 15, 2016

jw3126 commented Mar 15, 2016

pkofod commented Mar 15, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jw3126 commented May 1, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jw3126 commented May 4, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jw3126 commented May 12, 2016

Choose a reason for hiding this comment

tkelman commented May 12, 2016

jw3126 commented May 12, 2016

jw3126 commented May 13, 2016