Add `@inline` for `Diagonal`'s 2-arg `l/rdiv!` to enable auto vectorization #43171
Conversation
Replace broadcast with a for loop.
I'm not sure it makes a difference, but I haven't seen
See the 1.8 release note:
Yes, that's why I'm used to seeing the pattern

```julia
@inline ldiv!(D::Diagonal, B::AbstractVecOrMat) = ldiv!(B, D, B)
```

In the one-line case, this may not make a difference. I think what you're referring to makes a difference in multi-line code:

```julia
function foo(...)
    # some code involving possibly non-inlined function calls
    @inline call_a_function(...)  # inline that specific call
    # some other code, possibly non-inlined function calls
    return result
end
```
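To make that pattern concrete, here is a minimal runnable sketch of the call-site `@inline` form introduced in Julia 1.8; the names `twice!` and `process!` are my own, purely for illustration:

```julia
# Hypothetical helper: doubles a vector in place.
twice!(x) = (x .*= 2; x)

function process!(v::Vector{Float64})
    s = sum(v)           # ordinary call; the inliner decides as usual
    @inline twice!(v)    # force-inline only this particular call
    return s + sum(v)
end

process!([1.0, 2.0])  # returns 3.0 + 6.0 = 9.0
```

The point is that the annotation applies to one call site inside a larger body, rather than marking the callee's definition for inlining everywhere.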
I'm a little confused by the example in #41312, but `@macroexpand` shows that:

```julia
julia> @macroexpand ldiv!(D::Diagonal, B::AbstractVecOrMat) = @inline ldiv!(B, D, B)
:(ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
          #= REPL[2]:1 =#
          begin
              $(Expr(:inline, true))
              local var"#37#val" = ldiv!(B, D, B)
              $(Expr(:inline, false))
              var"#37#val"
          end
      end)

julia> @macroexpand @inline ldiv!(D::Diagonal, B::AbstractVecOrMat) = ldiv!(B, D, B)
:(ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
          $(Expr(:meta, :inline))
          #= REPL[3]:1 =#
          ldiv!(B, D, B)
      end)

julia> @macroexpand ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
           @inline
           ldiv!(B, D, B)
       end
:(ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
          #= REPL[9]:1 =#
          #= REPL[9]:2 =#
          $(Expr(:meta, :inline))
          #= REPL[9]:3 =#
          ldiv!(B, D, B)
      end)
```

So the latter two forms are equivalent (both attach a `:meta, :inline` to the method definition), while the first only marks the inner call for inlining.
Interestingly, on current nightly I don't see the regression for the out-of-place operation:

```julia
julia> using LinearAlgebra, BenchmarkTools

julia> A = randn(128, 128); D = Diagonal(ones(128));

julia> @btime rdiv!($A, $D);
  18.315 μs (0 allocations: 0 bytes)

julia> @btime ldiv!($D, $A);
  18.437 μs (0 allocations: 0 bytes)

julia> @btime $D \ $A;
  9.939 μs (2 allocations: 128.05 KiB)

julia> @btime $A / $D;
  9.614 μs (2 allocations: 128.05 KiB)
```

I'm on macOS.
Yes, the regression is on the in-place 2-arg `ldiv!`/`rdiv!`.
On master, `Diagonal`'s 2-arg `l/rdiv!` call their 3-arg versions for code reuse. But as shown in #43153, non-inlined code blocks LLVM's auto vectorization, so the 2-arg `l/rdiv!` are slower than before. This PR first adds `@inline` at the call site to fix the regression. But it turns out that the broadcast-based `ldiv!` still blocks vectorization, so I had to replace it with a for-loop version. Some benchmarks: