Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster A_mul_B! and * involving bidiagonal and tridiagonal matrices #15505

Merged
merged 1 commit into from
May 16, 2016

Conversation

jw3126
Copy link
Contributor

@jw3126 jw3126 commented Mar 14, 2016

Two A_mul_B!(C, A, B) methods where B or A, B are tridiagonal are implemented. This gives faster matrix multiplication methods for some bi and tridiagonal argument combinations, which would fall back to full matrix multiplication before. See JuliaLang/LinearAlgebra.jl#263 and timings below.

begin
           import Base.LinAlg.A_mul_B!
           k = 1000;
           A = Tridiagonal(randn(k-1), randn(k), randn(k-1));
           B = Tridiagonal(randn(k-1), randn(k), randn(k-1));
           C = randn(k, k);

           Ai = Bidiagonal(randn(k), randn(k-1), rand(Bool));
           Bi = Bidiagonal(randn(k), randn(k-1), rand(Bool));
           fullAi = full(Ai);
           fullBi = full(Bi);
           Atri = Tridiagonal(randn(k-1), randn(k), randn(k-1));
           Btri = Tridiagonal(randn(k-1), randn(k), randn(k-1));
           C = randn(k, k);
       end
julia> @time A_mul_B!(C, fullAi, fullBi);
  0.035776 seconds (4 allocations: 160 bytes)

julia> @time A_mul_B!(C, fullAi, Btri);
  0.002102 seconds (4 allocations: 160 bytes)

julia> @time A_mul_B!(C, Atri, fullBi);
  0.004037 seconds (4 allocations: 160 bytes)

julia> @time A_mul_B!(C, Atri, Btri);
  0.000815 seconds (4 allocations: 160 bytes)
julia> @time A_mul_B!(C, fullAi, Bi);
  0.001518 seconds (11 allocations: 16.141 KB)

julia> @time A_mul_B!(C, Ai, fullBi);
  0.004141 seconds (11 allocations: 16.141 KB)

julia> @time A_mul_B!(C, Ai, Bi);
  0.001324 seconds (18 allocations: 32.125 KB)
julia> @time fullAi * fullBi;
  0.035390 seconds (9 allocations: 7.630 MB)

julia> @time fullAi * Btri;
  0.001393 seconds (9 allocations: 7.630 MB)

julia> @time Atri * fullBi;
  0.004084 seconds (9 allocations: 7.630 MB)

julia> @time Atri * Btri;
  0.001264 seconds (9 allocations: 7.630 MB)
julia> @time fullAi * Bi;
  0.002728 seconds (16 allocations: 7.645 MB)

julia> @time Ai * fullBi;
  0.005010 seconds (16 allocations: 7.645 MB)

julia> @time Ai * Bi;
  0.001789 seconds (23 allocations: 7.661 MB)

end

function A_mul_B_td!(C::AbstractMatrix, A::Tridiagonal, B::Tridiagonal)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the unnecessary empty line

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ref: #15418 and #15439. We need to settle that issue. I think the empty line here is fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concretely, the vast majority of the existing code base does not have blank lines after signature, even when the body starts with a comment. Therefore, if we allow them we lose consistency an leave room for discussions, which wastes time when reviewing PRs. A simple rule without exceptions is better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an open discussion in the issues I linked to.

@andreasnoack
Copy link
Member

Thanks. It's great with the specialized methods.

@jw3126
Copy link
Contributor Author

jw3126 commented Mar 15, 2016

Thanks for all the feedback. Btw have you any idea, why the AppVeyor build failed?

@pkofod
Copy link
Contributor

pkofod commented Mar 15, 2016

Thanks for all the feedback. Btw have you any idea, why the AppVeyor build failed?

Someone might have a sharper eye than I do, but I think you just lucked out. The AppVeyor plan used allows for the total run time to be no more than 60 minutes. The entire thing on ARCH=x86_64 usually runs at around 52-53 minutes, so I think you were just unlucky.


check_A_mul_B!_sizes(C, A, B)
n = size(A,1)
n <= 3 && return A_mul_B!(C, full(A), full(B))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's any advantage to using full here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is stuff like C[2,4] = A[2,3]*B[3,4] in the code, which violates bounds of small matrices. Thats why n <= 3 is special.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd likely get much better performance by hoisting out and manually doing the transformation between A[2,3] vs Adu = A.du outside the loop then referring to Adu[2] within the loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tkelman I think I already did this. For indexing I used the following rules:

  1. Outside loops explicit indexing like A[1,2] is allowed for clarity. The performance overhead is negligible.
  2. Inside loops do all indexing of BiTriSym matrices manually.

If you ever find code like A[i, j] inside a loop, its because A is not BiTriSym e.g. full or something.

@jw3126
Copy link
Contributor Author

jw3126 commented May 1, 2016

Rebased and squashed, hopefully ready to merge now.

A_mul_B!(C::AbstractMatrix, A::BiTri, B::BiTriSym) = A_mul_B_td!(C, A, B)
A_mul_B!(C::AbstractMatrix, A::BiTriSym, B::BiTriSym) = A_mul_B_td!(C, A, B)
A_mul_B!(C::AbstractMatrix, A::AbstractTriangular, B::BiTriSym) = A_mul_B_td!(C, A, B)
A_mul_B!(C::AbstractMatrix, A::AbstractMatrix, B::BiTriSym) = A_mul_B_td!(C, A, B)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these 5 lines should probably be expressed with a Union. And the A::BiTri line is redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhh I don't know a simpler way then these 5 lines. Its easy to produce ambiguous definitions. For example if I remove the BiTri line I get

WARNING: New definition 
    A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 1}}) at linalg/bidiag.jl:236
is ambiguous with: 
    A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}) at linalg/bidiag.jl:232.
To fix, define 
    A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal})
before the new definition.
WARNING: New definition 
    A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 1}}) at linalg/bidiag.jl:236
is ambiguous with: 
    A_mul_B!(AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}) at linalg/bidiag.jl:234.
To fix, define 
    A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal})
before the new definition.
WARNING: New definition 
    A_mul_B!(Union{AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 1}}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 1}}) at linalg/bidiag.jl:237
is ambiguous with: 
    A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}) at linalg/bidiag.jl:232.
To fix, define 
    A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal})
before the new definition.
WARNING: New definition 
    A_mul_B!(Union{AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 1}}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 1}}) at linalg/bidiag.jl:237
is ambiguous with: 
    A_mul_B!(AbstractArray{T<:Any, 2}, AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal}) at linalg/bidiag.jl:234.
To fix, define 
    A_mul_B!(AbstractArray{T<:Any, 2}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal}, Union{Base.LinAlg.Tridiagonal, Base.LinAlg.Bidiagonal, Base.LinAlg.SymTridiagonal})

One big union also definitely does not work. In fact I started with AbstractMatrix (which the Union boils down to)

 +A_mul_B!(C::AbstractMatrix, A::AbstractMatrix, B::BiTriSym) = A_mul_B_td!(C, A, B) 

and these five lines are what I had to do to fix ambiguities.

@jw3126
Copy link
Contributor Author

jw3126 commented May 4, 2016

@tkelman Because life is too short to be waiting for sparse arrays being filled by inefficient methods.

julia> N = 10^6
1000000

julia> sparr = spzeros(N);

julia> arr = zeros(N);

julia> @time fill!(arr, 2);
  0.001341 seconds (4 allocations: 160 bytes)

julia> @time fill!(sparr, 2);
  0.004634 seconds (6 allocations: 15.259 MB)

function _fillnonzero!{Tv,Ti}(arr::SparseMatrixCSC{Tv, Ti}, val)
m, n = size(arr)
arr.colptr = convert(Vector{Ti}, collect(1:m:n*m+1))
arr.rowval = convert(Vector{Ti}, vcat([1:m for _ in 1:n]...))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should actually be in place, not replacing these with new arrays

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and fill! also should not be in this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want things to be inplace for another epsilon of performance or is there another reason?

Where should this fill! be? It needs to be at a point, where SparseVector, SparseMatrixCSC type is already defined?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SparseMatrixCSC is likely to become an immutable in the future (SparseVector already is) so that would force in place anyway.

Also the array might be pointed to by other variables and this would decouple them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks for the hint, I changed to inplace. I already wondered why SparseVector is immutable, what is the reason to make these types immutable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few reasons, It for example enables various compiler optimizations like hoisting field loads out of loops. If you look at the various SparseMatrix functions now most of them manually hoist the loads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the terminology is a bit confusing. sparsematrix and sparsevector are composite types, where you very rarely need or want to wholesale replace the arrays they contain. the fields themselves are usually mutable arrays that can be modified, but only as long as it remains a reference to the same object. though being an immutable type would make it tough to modify the dimension parameters which I have needed to do once in a while.

@jw3126
Copy link
Contributor Author

jw3126 commented May 12, 2016

Any remaining issues about this PR? If not, I would squash the commits again.

C[i, j] = A[i, j-1] * Bj₋1j + A[i, j]*Bjj + A[i, j+1] * Bj₊1j
end
end
end#inbounds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) looks a little nicer with a space before and after the #

@tkelman
Copy link
Contributor

tkelman commented May 12, 2016

I wish this were achievable in a bit less code, but since it probably isn't, I think this is ready. Couple minor style nits.

@jw3126
Copy link
Contributor Author

jw3126 commented May 12, 2016

I agree there are some not so elegant places in this code, but I don't see how to make it shorter. Though I think writing a code generator, which creates A_mul_B! methods such as these automatically for lots of sparsity + memory layout patterns is in principle possible. Of course that would be a much much bigger project.

Big thanks @tkelman and the others for all that feedback!

@jw3126
Copy link
Contributor Author

jw3126 commented May 13, 2016

Meh that travis build timed out, how can I force it to rerun, do I have to push something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants