
Specialisation of mul! for left multiplication with scalars #173

Merged: 40 commits into master, Jun 2, 2022

Conversation

@krcools (Contributor) commented Mar 31, 2022

In LinearAlgebra, mul! has specialisations for the case where s in mul!(C, s, B, alpha, beta) is a scalar. This is needed to make mul! act essentially like in-place addition (add! does not exist, but add!(C, B) would be equivalent to mul!(C, 1, B, 1, 1)).
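The scalar 5-arg semantics can be checked directly with plain arrays; a minimal sketch using only the LinearAlgebra stdlib:

```julia
using LinearAlgebra

# 5-arg mul!(C, A, B, α, β) computes C = A*B*α + C*β in place.
# With a scalar first argument s, mul!(C, s, B, α, β) therefore updates
# C .= s .* B .* α .+ C .* β without allocating a temporary.
C = ones(2, 2)
B = fill(2.0, 2, 2)
mul!(C, 3, B, 1, 1)   # each entry: 3*2*1 + 1*1 = 7
```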

I suggest echoing the treatment of this special but important case in LinearMaps. As motivation: what I want is to convert a LinearCombination to a Matrix with minimal overhead:

function convert_to_dense(A::LinearMaps.LinearCombination)
    M = zeros(eltype(A), axes(A))
    for map in A.maps
        # accumulate in place: M = 1*map*1 + M*1, i.e. M .+= map, without temporaries
        mul!(M, 1, map, 1, 1)
    end
    return M
end

as opposed to

function convert_to_dense(A::LinearMaps.LinearCombination)
    M = zeros(eltype(A), axes(A))
    for map in A.maps
        M .+= Matrix(map)
    end
    return M
end

The latter does a lot of spurious copying and allocation, and it does not allow for clever specialisations when map is a WrappedMap around a sparse matrix.

Closes #181.

@codecov (bot) commented Mar 31, 2022

Codecov Report

Merging #173 (c4a9745) into master (98843dd) will increase coverage by 0.13%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #173      +/-   ##
==========================================
+ Coverage   98.40%   98.54%   +0.13%     
==========================================
  Files          15       15              
  Lines        1193     1307     +114     
==========================================
+ Hits         1174     1288     +114     
  Misses         19       19              
Impacted Files Coverage Δ
src/LinearMaps.jl 100.00% <100.00%> (ø)
src/blockmap.jl 99.27% <100.00%> (+0.21%) ⬆️
src/conversion.jl 100.00% <100.00%> (ø)
src/embeddedmap.jl 68.75% <100.00%> (+6.25%) ⬆️
src/fillmap.jl 100.00% <100.00%> (ø)
src/linearcombination.jl 100.00% <100.00%> (ø)
src/scaledmap.jl 100.00% <100.00%> (ø)
src/transpose.jl 100.00% <100.00%> (ø)
src/uniformscalingmap.jl 100.00% <100.00%> (ø)
src/wrappedmap.jl 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@dkarrasch (Member)

Sounds like a good idea. Perhaps the 1 should be one(eltype(M)) (or true?). Could you maybe rebase onto the current master? The current function Base.Matrix{T}(ΣA::LinearCombination{<:Any, <:Tuple{Vararg{VecOrMatMap}}}) where {T} uses sum on the tuple/vector of matrices. Does that create intermediate arrays while summing up, or does it do fancy in-place operations? In other words, it would be interesting to see some benchmarks.
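For context on the `true` suggestion, a quick sketch of why `true` is often preferred as a multiplicative identity in generic Julia code:

```julia
# `true` is Julia's "strong one": multiplying by it is the identity for any
# numeric type and never widens the element type, whereas an Int literal
# like 1 can promote narrow element types in some contexts.
x = 2.5f0
y = true * x            # still a Float32
z = one(eltype([1.0]))  # one(T) gives the typed multiplicative identity
```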

@dkarrasch (Member)

Also, since this is a new feature, feel free to bump the minor version right away, and leave a comment in the history.md file to briefly mention the feature.

@krcools (Contributor, Author) commented Mar 31, 2022

I am getting some interesting results. The method at

function Base.Matrix{T}(ΣA::LinearCombination{<:Any, <:Tuple{Vararg{VecOrMatMap}}}) where {T}

is only called when all coefficients in the linear combination are unity and when the linear combination is written out by hand, i.e. when the storage type is a Tuple, not a Vector. This implementation beats my code path by about 12%. My guess is that it calls BLAS.axpy! under the hood...

However, the conditions for entering this code path are very brittle, and in all other cases the fallback implementation is roughly 300× slower. In terms of memory, my method is on par.

using LinearAlgebra
using LinearMaps
using BenchmarkTools

n = 1024
A = [rand(n,n) for i in 1:10]
L = [LinearMap(a) for a in A]
L2 = [2*LinearMap(a) for a in A]

LC1 = sum(L)
LC2 = sum(L2)

LC3 = L[1] + L[2] + L[3] + L[4] + L[5] + L[6] + L[7] + L[8] + L[9] + L[10]
LC4 = 2L[1] + L[2] + L[3] + L[4] + L[5] + L[6] + L[7] + L[8] + L[9] + L[10]

function convert_to_dense(A::LinearMaps.LinearCombination)
    M = zeros(eltype(A), axes(A))
    for map in A.maps
        mul!(M, 1, map, 1, 1)
    end
    return M
end

@which Matrix{Float64}(LC1)
@which Matrix{Float64}(LC2)
@which Matrix{Float64}(LC3)
@which Matrix{Float64}(LC4)

@btime B1 = Matrix{Float64}(LC1);
@btime B2 = Matrix{Float64}(LC2);
@btime B3 = Matrix{Float64}(LC3);
@btime B4 = Matrix{Float64}(LC4);

@btime C1 = convert_to_dense(LC1);
@btime C2 = convert_to_dense(LC2);
@btime C3 = convert_to_dense(LC3);
@btime C4 = convert_to_dense(LC4);
  2.618 s (10243 allocations: 8.32 MiB)
  2.644 s (11267 allocations: 16.45 MiB)
  7.000 ms (30 allocations: 16.00 MiB)
  2.614 s (1027 allocations: 16.13 MiB)
  8.642 ms (32 allocations: 8.00 MiB)
  8.547 ms (32 allocations: 8.00 MiB)
  8.608 ms (32 allocations: 8.00 MiB)
  8.467 ms (51 allocations: 8.00 MiB)

@dkarrasch (Member)

This is getting very close to what I imagined. I think we should make the most generic Matrix(::LinearMap) fallback use generic_mapnum_mul! and put exactly the previous code in there. In particular, don't overwrite the whole vector xi with zeros in each loop iteration, but only the current entry. Then we have a fully operational fallback even for the wildest custom map types. Given that, you may then be able to simplify (or even improve) the code in a few places. For instance, statements like Y .+= Matrix(L) "hurt my eyes", because we allocate the entire matrix on the right and then copy it into Y. I would guess/hope that we can avoid that if we make it write into Y columnwise directly via the generic... code?

@krcools (Contributor, Author) commented Apr 7, 2022

Fully agree. I came to the same conclusion as I progressed. I will implement these suggestions in the next couple of days.

@krcools (Contributor, Author) commented Apr 8, 2022

There is actually one thing I can't figure out:

If a user defines a custom LinearMap, say CustomMap, in an external package, the docs say that implementing mul! suffices. However, suppose this custom map is then scaled, giving rise to a ScaledMap{T, CustomMap}. When mul! is called on that scaled map, the following stack unfolds:

mul!(y, S::ScaledMap, x)
unsafe_mul!(y, S::ScaledMap, x)
unsafe_mul!(y, C::LinearMap, x)
__generic_mapvec_mul!(y, C::CustomMap, x)

In other words, when the custom map appears at a lower level of a more complicated LinearMap, the user-supplied mul!(y, C::CustomMap, x) is not called. I encountered this issue in the context of mul! by scalars, but it is not unique to that situation.

Am I missing something here?

Currently I am working around this by supplying not only mul! but also _unsafe_mul! in LiftedMaps.jl.

@dkarrasch (Member)

Can you provide a minimal example? Because of

_unsafe_mul!(y, A::MapOrVecOrMat, x) = mul!(y, A, x)
_unsafe_mul!(y, A::AbstractVecOrMat, x, α, β) = mul!(y, A, x, α, β)

a "missing" _unsafe_mul! should be redirected back to mul!. Even if it were not, the 3-arg mul! gets called inside _generic_mapvec_mul!. OTOH, _generic_mapvec_mul! is called only in the 5-arg case here:

function _unsafe_mul!(y::AbstractVecOrMat, A::LinearMap, x::AbstractVector, α, β)
return _generic_mapvec_mul!(y, A, x, α, β)
end

so, unless I am missing some path, it should really end up with the user-defined mul!.

@krcools (Contributor, Author) commented Apr 8, 2022

I see... I didn't bother drilling down deeper after encountering __generic_mapvec_mul!.

The problem still exists for mapmat and mapnum, I think. For example, __generic_mapmat_mul! is implemented in terms of mul!(x, L, y::AbstractVector), skipping user-supplied specialisations of this BLAS-3-level operation.

MWE:

using LinearAlgebra
using LinearMaps

struct CustomMap{T} <: LinearMap{T}
    A::Matrix{T}
end

LinearMaps.MulStyle(C::CustomMap) = LinearMaps.FiveArg()
Base.size(C::CustomMap) = size(C.A)
LinearAlgebra.mul!(y::AbstractVector, C::CustomMap, x::AbstractVector, a::Number, b::Number) =
    mul!(y, C.A, x, a, b)
function LinearAlgebra.mul!(Y::AbstractMatrix, C::CustomMap, X::AbstractMatrix, a::Number, b::Number)
    println("User supplied BLAS-3")
    mul!(Y, C.A, X, a, b)
end

A = rand(3, 3)
C = CustomMap(A)
S = 2C

X = rand(3, 4)
Y = zeros(3, 4)

mul!(Y, C, X, 1, 1);  # prints "User supplied BLAS-3"
mul!(Y, S, X, 1, 1);  # does not print: the ScaledMap path bypasses the user method

The same is true for mapnum, which is also written in terms of mul!(x, L, y::AbstractVector), skipping mul!(x, L, s::Number).

It seems to me that __generic means something different for mapvec on the one hand and for mapmat and mapnum on the other. For the former, it exploits special values of alpha and beta; for the latter, it is a true fallback that breaks the operation down into primitives that always need to be supplied.
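The mapmat fallback being described can be sketched generically (illustrative only; columnwise_mapmat_mul! and apply! are hypothetical names, not the LinearMaps.jl source): the matrix product is reduced to one map-vector product per column, so any user specialisation of the matrix-level method is never reached from this point on.

```julia
using LinearAlgebra

# Hypothetical sketch of a columnwise mapmat fallback: apply! stands in for
# the 3-arg map-vector mul!. Each column of X is mapped into the matching
# column of Y, bypassing any BLAS-3-style user specialisation.
function columnwise_mapmat_mul!(Y::AbstractMatrix, apply!, X::AbstractMatrix)
    for j in axes(X, 2)
        apply!(view(Y, :, j), view(X, :, j))
    end
    return Y
end

A = [1.0 2.0; 3.0 4.0]
Y = zeros(2, 2)
columnwise_mapmat_mul!(Y, (y, x) -> mul!(y, A, x), A)  # Y == A*A
```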

@dkarrasch (Member)

Awesome, thanks for finding and sharing that. There are two things that bite each other here. The _unsafe_mul! redirection was meant to save size checks for all "submaps": thanks to checks at construction, you already know that everything is consistent. If we replaced all those calls by mul!, all this effort would be in vain. Could we then solve the issue by asking users to define _unsafe_mul! for custom map types? That is, tell them to make the important checks at construction, leave size compatibility to LinearMaps.jl, and then only define the "multiplication kernel" _unsafe_mul!? As you say, the good thing is that this currently affects only the mapmat case (and mapnum in the future).

@dkarrasch (Member)

Currently I am working around this by not only supplying mul! but also _unsafe_mul! in LiftedMaps.jl.

Yeah, perhaps only _unsafe_mul! should be required, and all the "high-level" stuff, like allocating the output in A*x or checking sizes in mul!, is left to LinearMaps.jl?

@krcools (Contributor, Author) commented Apr 8, 2022

Yes, that seems to be the way out. Just a shame _unsafe_mul! is not a particularly inviting name to import and extend...
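The layering being agreed on here can be sketched in isolation (all names below are illustrative stand-ins, not the actual LinearMaps.jl internals): the public entry point checks sizes once, and custom types extend only the unchecked kernel, so the check is not repeated when the map is nested inside a composite map.

```julia
using LinearAlgebra

abstract type SketchMap end

struct DenseSketchMap <: SketchMap
    A::Matrix{Float64}
end

Base.size(M::DenseSketchMap) = size(M.A)

# Public entry point: performs the size check once, then calls the kernel.
function checked_mul!(y::AbstractVector, M::SketchMap, x::AbstractVector)
    (length(y), length(x)) == size(M) || throw(DimensionMismatch("bad sizes"))
    return unsafe_kernel_mul!(y, M, x)
end

# The only method a custom map type needs to supply: the unchecked kernel.
unsafe_kernel_mul!(y, M::DenseSketchMap, x) = mul!(y, M.A, x)

y = zeros(2)
checked_mul!(y, DenseSketchMap([1.0 0.0; 0.0 2.0]), [3.0, 4.0])  # y == [3.0, 8.0]
```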

@dkarrasch (Member) left a comment
I left a few remarks and suggestions. This looks pretty good, and we should get it done and out soon (edit: this is referring to myself and not putting pressure on you, of course!).

@dkarrasch (Member) left a comment
A few more comments; sorry for being so picky. I'm not 100% sure, but I believe we should provide targeted methods for 3-arg multiplication, because some map types may not come with 5-arg implementations. Calling 5-arg methods (even with default values) requires much care to ensure that no intermediate arrays are allocated.

@dkarrasch (Member)

I need to fix the blockmul case, but otherwise this is really close to being finished.

It is somewhat unfortunate that, even for matrices, the blockmul code falls back to _generic_matmatmul! because of the views and the non-stridedness. It would be interesting to have a real-world benchmark case to see whether it would be beneficial to compute the individual block muls "out of place" and then copyto! the result into the target. I should perhaps file an issue and reach out to potential users.
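The "out of place, then copyto!" idea could be sketched like this (hypothetical; block_into! is an illustrative name, not the merged code): each block product is computed into a fresh dense matrix so it dispatches to fast BLAS routines, and only then copied into the possibly non-strided destination view.

```julia
using LinearAlgebra

# Hypothetical block update: computing A*X out of place keeps the operands
# dense and contiguous (BLAS-friendly); the copyto! then pays one extra pass
# over the destination view instead of a slow generic matmul into it.
function block_into!(Y::AbstractMatrix, A::AbstractMatrix, X::AbstractMatrix,
                     rows::AbstractUnitRange, cols::AbstractUnitRange)
    copyto!(view(Y, rows, cols), A * X)
    return Y
end

Y = zeros(4, 4)
block_into!(Y, [1.0 0.0; 0.0 1.0], fill(2.0, 2, 2), 1:2, 3:4)
```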

@dkarrasch (Member)

For me, all your benchmarks from #173 (comment) are now on the order of milliseconds, with very small overhead for the linear combination of the doubled maps. I used this code for benchmarking:

using LinearAlgebra, LinearMaps, BenchmarkTools

n = 1024
A = [rand(n,n) for i in 1:10]

LC1 = sum((LinearMap(a) for a in A)) # creates a LinearMapTuple
LC2 = sum((2*LinearMap(a) for a in A)) # creates a LinearMapTuple
LC3 = sum([LinearMap(a) for a in A]) # creates a LinearMapVector
LC4 = sum([2*LinearMap(a) for a in A]) # creates a LinearMapVector

@btime B1 = Matrix{Float64}(LC1);
@btime B2 = Matrix{Float64}(LC2);
@btime B3 = Matrix{Float64}(LC3);
@btime B4 = Matrix{Float64}(LC4);

  12.059 ms (29 allocations: 8.00 MiB)
  12.252 ms (29 allocations: 8.00 MiB)
  12.147 ms (39 allocations: 8.00 MiB)
  12.884 ms (39 allocations: 8.00 MiB)

I agree that the slight regression for the one special case is acceptable, given that this approach "recurses" into the _unsafe_mul!(::Matrix, ::LinearMap, ::Number) hierarchy and makes use of possible optimizations at every step of the map hierarchy; hence the very small overhead for the linear combination of scaled matrices. As you say, this will further auto-optimize for sparse matrices etc. I also like that with this approach we first allocate the target and then map into it, rather than possibly creating intermediate arrays and then stacking them together as needed. Local tests suggested that this is also advantageous for BlockMaps. And finally, we had neglected to provide a specialized Matrix constructor for our new EmbeddedMaps, so relative to the naive implementation this will speed things up, since we avoid computing all-zero columns.

@dkarrasch dkarrasch merged commit 12c13fb into JuliaLinearAlgebra:master Jun 2, 2022
Successfully merging this pull request may close these issues.

Doc Instructions for Transpose Insufficient for Full Functionality