Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add @inline for Diagonal's 2-arg l/rdiv! to enable auto vectorization #43171

Merged
merged 2 commits into from
Nov 24, 2021

Conversation

N5N3
Copy link
Member

@N5N3 N5N3 commented Nov 20, 2021

On master, Diagnal's 2-arg l/rdiv! call their 3-arg version for code reuse.
But as shown in #43153, non-inlined code blocks LLVM's auto vectorization, so the 2-arg l/rdiv! should be slower than before.
This PR first add @inline at the call side to fix the regression. But it turns out that broadcast based ldiv! still blocks vectorization. So I have to replace it with a for loop version.
Some benchmark:

julia> A = randn(128, 128); D = Diagonal(ones(128));
julia> @btime rdiv!($A, $D);
  7.275 μs (0 allocations: 0 bytes) # about 14 μs on master
julia> @btime ldiv!($D, $A);
  7.350 μs (0 allocations: 0 bytes) # about 14 μs on master

@dkarrasch dkarrasch added linear algebra Linear algebra performance Must go faster labels Nov 21, 2021
@dkarrasch
Copy link
Member

I'm not sure it makes a difference, but I haven't seen @inline being written on the rhs of function definitions. Don't we typically write this at the beginning of the line?

@N5N3
Copy link
Member Author

N5N3 commented Nov 21, 2021

See 1.8 release note:
"@inline and @noinline annotations can now be applied to a function callsite or block to enforce the involved function calls to be (or not to be) inlined. (#41312)"
So ldiv!(D::Diagonal, B::AbstractVecOrMat) = @inline ldiv!(B, D, B) could make 3-arg ldiv! not be force-inlined by the compiler, (as we just want to inline it in 2-arg version)

@dkarrasch
Copy link
Member

as we just want to inline it in 2-arg version

Yes, that's why I'm used to seeing the pattern

@inline ldiv!(D::Diagonal, B::AbstractVecOrMat) = ldiv!(B, D, B)

In the one-line case, this may not make a difference. I think what you're referring to makes a difference in multi-line code:

function foo(...)
    # some code involving possibly non-inlined function calls
    @inline call_a_function(...) # inline that specific function
    # some other code, possibly non-inlined function calls
    return result
end

@N5N3
Copy link
Member Author

N5N3 commented Nov 21, 2021

I'm a little confused by the example in #41312, but macroexpand shows that:

julia> @macroexpand ldiv!(D::Diagonal, B::AbstractVecOrMat) = @inline ldiv!(B, D, B)
:(ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
          #= REPL[2]:1 =#
          begin
              $(Expr(:inline, true))
              local var"#37#val" = ldiv!(B, D, B)
              $(Expr(:inline, false))
              var"#37#val"
          end
      end)

julia> @macroexpand @inline ldiv!(D::Diagonal, B::AbstractVecOrMat) = ldiv!(B, D, B)
:(ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
          $(Expr(:meta, :inline))
          #= REPL[3]:1 =#
          ldiv!(B, D, B)
      end)
julia> @macroexpand ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
                 @inline
                 ldiv!(B, D, B)
           end
:(ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
          #= REPL[9]:1 =#
          #= REPL[9]:2 =#
          $(Expr(:meta, :inline))
          #= REPL[9]:3 =#
          ldiv!(B, D, B)
      end)

So ldiv!(D::Diagonal, B::AbstractVecOrMat) = @inline ldiv!(B, D, B) only tells the compiler to inline the 3-arg ldiv! in the 2-arg one. And no inline hint is set for 2-arg version?
(Edit: BTW, only the first pattern solves the performance regression)
If I misunderstand, should we replace it with @noinline ldiv!(D::Diagonal, B::AbstractVecOrMat) = @inline ldiv!(B, D, B)?

@dkarrasch
Copy link
Member

Interestingly, on current nightly I don't see the regression for the out-of-place operation:

julia> using LinearAlgebra, BenchmarkTools

julia> A = randn(128, 128); D = Diagonal(ones(128));

julia> @btime rdiv!($A, $D);
  18.315 μs (0 allocations: 0 bytes)

julia> @btime ldiv!($D, $A);
  18.437 μs (0 allocations: 0 bytes)

julia> @btime $D \ $A;
  9.939 μs (2 allocations: 128.05 KiB)

julia> @btime $A / $D;
  9.614 μs (2 allocations: 128.05 KiB)

I'm on MacOS.

@N5N3
Copy link
Member Author

N5N3 commented Nov 22, 2021

Yes, the regression is on rdiv!(A, D), ldiv!(D, A) (not ldiv!(B, D, A) or _rdiv!)
Your bench also shows that they are 2-times slower than A/D and D\A, which call 3-arg api.

@dkarrasch dkarrasch added the regression Regression in behavior compared to a previous version label Nov 22, 2021
@dkarrasch dkarrasch merged commit a40d8c4 into JuliaLang:master Nov 24, 2021
@N5N3 N5N3 deleted the lrdivinline branch November 25, 2021 00:04
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linear algebra Linear algebra performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants