Add `@inline` for `Diagonal`'s 2-arg `l/rdiv!` to enable auto vectorization #43171

N5N3 · 2021-11-20T14:37:43Z

On master, Diagonal's 2-arg l/rdiv! call their 3-arg versions for code reuse.
But as shown in #43153, code that is not inlined blocks LLVM's auto-vectorization, so the 2-arg l/rdiv! are expected to be slower than before.
This PR first adds @inline at the call site to fix the regression. It turned out, however, that the broadcast-based ldiv! still blocks vectorization, so I replaced it with a for-loop version.
Some benchmarks:

julia> A = randn(128, 128); D = Diagonal(ones(128));
julia> @btime rdiv!($A, $D);
  7.275 μs (0 allocations: 0 bytes) # about 14 μs on master
julia> @btime ldiv!($D, $A);
  7.350 μs (0 allocations: 0 bytes) # about 14 μs on master
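
For readers who want to see the shape of the change, here is a minimal sketch of the pattern described above: a 3-arg method written as an explicit loop, and a 2-arg wrapper that inlines it at the call site. The name `my_ldiv!` is hypothetical and this is not the code merged into diagonal.jl; the call-site `@inline` assumes Julia 1.8 or later.

```julia
using LinearAlgebra

# Hypothetical stand-in for the 3-arg method: writes D \ A into B with an explicit loop.
function my_ldiv!(B::Matrix, D::Diagonal, A::Matrix)
    d = D.diag
    size(A) == size(B) || throw(DimensionMismatch("A and B must have the same size"))
    length(d) == size(A, 1) || throw(DimensionMismatch("diagonal length must match the rows of A"))
    # Column-major order: j is the outer loop, i the inner loop.
    @inbounds for j in axes(A, 2), i in axes(A, 1)
        B[i, j] = d[i] \ A[i, j]
    end
    return B
end

# Hypothetical 2-arg wrapper; the call-site @inline asks for inlining here only.
my_ldiv!(D::Diagonal, B::Matrix) = @inline my_ldiv!(B, D, B)

D = Diagonal([1.0, 2.0, 4.0])
A = [1.0 2.0; 4.0 8.0; 16.0 32.0]
my_ldiv!(D, A)   # A is overwritten in place with D \ A
```

Whether the loop actually vectorizes depends on the element type and the Julia/LLVM version, so timings should be checked with a benchmark like the one above.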


dkarrasch · 2021-11-21T11:56:59Z

I'm not sure it makes a difference, but I haven't seen @inline being written on the rhs of function definitions. Don't we typically write this at the beginning of the line?

N5N3 · 2021-11-21T12:08:29Z

See the 1.8 release notes:
"@inline and @noinline annotations can now be applied to a function callsite or block to enforce the involved function calls to be (or not to be) inlined. (#41312)"
So ldiv!(D::Diagonal, B::AbstractVecOrMat) = @inline ldiv!(B, D, B) inlines the 3-arg ldiv! only at this call site, without forcing the compiler to inline it everywhere (we just want to inline it in the 2-arg version).

dkarrasch · 2021-11-21T12:15:02Z

> as we just want to inline it in 2-arg version

Yes, that's why I'm used to seeing the pattern

@inline ldiv!(D::Diagonal, B::AbstractVecOrMat) = ldiv!(B, D, B)

In the one-line case, this may not make a difference. I think what you're referring to makes a difference in multi-line code:

function foo(...)
    # some code involving possibly non-inlined function calls
    @inline call_a_function(...) # inline that specific function
    # some other code, possibly non-inlined function calls
    return result
end
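
A runnable variant of the multi-line pattern above may make the distinction concrete. The helper names `scale` and `shift` are made up for this sketch, and the call-site annotation assumes Julia 1.8 or later. According to the 1.8 release notes, a call-site annotation takes precedence over an annotation on the called function's definition.

```julia
# Hypothetical helpers, named only for this example.
@noinline scale(x) = 2x      # definition-site hint: do not inline by default
shift(x) = x + 1             # no hint: left to the compiler's heuristics

function foo(x)
    a = shift(x)             # ordinary call
    b = @inline scale(a)     # call-site hint: request inlining of scale here only
    return b
end

result = foo(3)              # (3 + 1) * 2
```

Other callers of `scale` are unaffected by the annotation inside `foo`.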

N5N3 · 2021-11-21T12:43:19Z

I'm a little confused by the example in #41312, but macroexpand shows that:

julia> @macroexpand ldiv!(D::Diagonal, B::AbstractVecOrMat) = @inline ldiv!(B, D, B)
:(ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
          #= REPL[2]:1 =#
          begin
              $(Expr(:inline, true))
              local var"#37#val" = ldiv!(B, D, B)
              $(Expr(:inline, false))
              var"#37#val"
          end
      end)

julia> @macroexpand @inline ldiv!(D::Diagonal, B::AbstractVecOrMat) = ldiv!(B, D, B)
:(ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
          $(Expr(:meta, :inline))
          #= REPL[3]:1 =#
          ldiv!(B, D, B)
      end)

julia> @macroexpand ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
                 @inline
                 ldiv!(B, D, B)
           end
:(ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
          #= REPL[9]:1 =#
          #= REPL[9]:2 =#
          $(Expr(:meta, :inline))
          #= REPL[9]:3 =#
          ldiv!(B, D, B)
      end)

So ldiv!(D::Diagonal, B::AbstractVecOrMat) = @inline ldiv!(B, D, B) only tells the compiler to inline the 3-arg ldiv! into the 2-arg one, and no inline hint is set for the 2-arg version itself?
(Edit: BTW, only the first pattern fixes the performance regression.)
If I have misunderstood, should we replace it with @noinline ldiv!(D::Diagonal, B::AbstractVecOrMat) = @inline ldiv!(B, D, B)?
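
The difference between the two expansions can also be checked programmatically. The sketch below assumes Julia 1.8 or later and relies on the expansion shown in the macroexpand output above, which is an internal detail that may change; `has_head`, `f`, and `g` are hypothetical names used only here.

```julia
# Recursively test whether an expression tree contains a node with the given head.
has_head(ex, head::Symbol) =
    ex isa Expr && (ex.head === head || any(a -> has_head(a, head), ex.args))

# Call-site annotation: wraps the call in Expr(:inline, true) / Expr(:inline, false).
callsite = @macroexpand f(x) = @inline g(x)

# Definition-site annotation: pushes Expr(:meta, :inline) into the body of f.
defsite = @macroexpand @inline f(x) = g(x)

callsite_marks_call = has_head(callsite, :inline) && !has_head(callsite, :meta)
defsite_marks_def   = has_head(defsite, :meta) && !has_head(defsite, :inline)
```

If both flags are true, the first form annotates only the call to `g`, while the second attaches the hint to `f` itself.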

dkarrasch · 2021-11-22T08:35:21Z

Interestingly, on current nightly I don't see the regression for the out-of-place operation:

julia> using LinearAlgebra, BenchmarkTools

julia> A = randn(128, 128); D = Diagonal(ones(128));

julia> @btime rdiv!($A, $D);
  18.315 μs (0 allocations: 0 bytes)

julia> @btime ldiv!($D, $A);
  18.437 μs (0 allocations: 0 bytes)

julia> @btime $D \ $A;
  9.939 μs (2 allocations: 128.05 KiB)

julia> @btime $A / $D;
  9.614 μs (2 allocations: 128.05 KiB)

I'm on MacOS.

N5N3 · 2021-11-22T08:39:03Z

Yes, the regression is in rdiv!(A, D) and ldiv!(D, A) (not in ldiv!(B, D, A) or _rdiv!).
Your benchmark also shows that they are about 2 times slower than A/D and D\A, which call the 3-arg API.
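
When comparing the timings of the in-place and out-of-place calls, it can help to first confirm that they compute the same result. A small sanity check, using only the public 2-arg methods and a diagonal chosen arbitrarily for illustration:

```julia
using LinearAlgebra

A = randn(4, 4)
D = Diagonal([1.0, 2.0, 4.0, 8.0])

left  = ldiv!(D, copy(A))    # in place, 2-arg: overwrites the copy with D \ A
right = rdiv!(copy(A), D)    # in place, 2-arg: overwrites the copy with A / D

same_left  = left ≈ D \ A    # out-of-place left division
same_right = right ≈ A / D   # out-of-place right division
```

The copies keep `A` intact so that the out-of-place results are computed from the original data.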

N5N3 added 2 commits November 20, 2021 21:56

b255933 · add @inline for Diagonal's 2-arg l/rdiv! to enable auto vectorization

aacdc08 · Update diagonal.jl (replace broadcast with for loop.)

dkarrasch added linear algebra Linear algebra performance Must go faster labels Nov 21, 2021

dkarrasch added the regression Regression in behavior compared to a previous version label Nov 22, 2021

dkarrasch merged commit a40d8c4 into JuliaLang:master Nov 24, 2021

N5N3 deleted the lrdivinline branch November 25, 2021 00:04

LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022

192f0c0 · Add @inline for Diagonal's 2-arg l/rdiv! to enable auto vectorization (JuliaLang#43171)

LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022

a6a194c · Add @inline for Diagonal's 2-arg l/rdiv! to enable auto vectorization (JuliaLang#43171)
