-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inlining-related performance regressions #6981
Comments
basing this on the last time i checked (while working on tweaking that threshold), the issue is that it takes a lot of instructions (and a lot more julia box/unbox intrinsics) to describe the behavior of a complex operator as such, i'm looking for suggestions (and evidence) on more advanced heuristics. current thoughts include:
altering this should be relatively simple, if you want to propose some smarter heuristics, since it is handled through |
@simonster it would be good to post the exact code used, and/or add a perf test for this. |
Here's the code I tested: function f(a)
b = 0.0+0.0im
for x in a
b += x*conj(x)
end
b
end
a = complex(rand(10^8), rand(10^8))
f(a)
@time f(a) On my MacBook Pro, this takes ~0.18 seconds on Julia 0.2.1 and ~0.35 seconds for current master. |
Another inlining-related regression case: function f(a)
x = 0
for v in a
x += v
end
x
end
a = randbool(1000000000)
f(a) On my system this takes ~0.44 seconds in Julia 0.2.1 and ~2.6 seconds on current master. The culprit seems to be inlining of |
Since |
I believe that julia> code_typed(f, (BitArray{1},))
1-element Array{Any,1}:
:($(Expr(:lambda, {:a}, {{:x,:#s9,:#s8,:v,:_var0,:_var1},{{:a,BitArray{1},0},{:x,Int64,2},{:#s9,Int64,2},{:#s8,(Bool,Int64),18},{:v,Bool,18},{:_var0,Bool,2},{:_var1,Int64,2}},{}}, quote # none, line 2:
x = 0 # line 3:
#s9 = 0
1:
unless top(box)(Bool,top(not_int)(top(sle_int)(top(getfield)(a::BitArray{1},:len)::Int64,#s9::Int64)::Bool))::Bool goto 2
_var0 = top(box)(Bool,top(not_int)(top(box)(Bool,top(and_int)(top(sle_int)(0,0)::Bool,===(top(box)(Uint64,top(and_int)(arrayref(top(getfield)(a::BitArray{1},:chunks)::Array{Uint64,1},top(box)(Int64,top(add_int)(top(box)(Int64,top(lshr_int)(#s9::Int64,top(box)($(Int32),top(trunc_int)($(Int32),6))::Int32))::Int64,1))::Int64)::Uint64,top(box)(Uint64,top(shl_int)(top(box)($(Uint64),1)::Uint64,top(box)($(Int32),top(trunc_int)($(Int32),top(box)(Int64,top(and_int)(#s9::Int64,63))::Int64))::Int32))::Uint64))::Uint64,top(box)($(Uint64),0)::Uint64)::Bool))::Bool))::Bool
_var1 = top(box)(Int64,top(add_int)(#s9::Int64,1))::Int64
v = _var0::Bool
#s9 = _var1::Int64 # line 4:
x = top(box)(Int64,top(add_int)(x::Int64,top(box)($(Int64),top(zext_int)($(Int64),v::Bool))::Int64))::Int64
3:
goto 1
2:
0: # line 6:
return x::Int64
end))) |
I see. Good catch. It seems small enough to be worthy of inlining even without taking into account that it happens to be called |
Yes; it used to be that any "single expression" function was inlined. While clearly not sufficient, this was actually a surprisingly good heuristic. |
Presumably an |
I had worked on tweaking the performance threshold, but forgot it wasn't on master yet: #6677 no, you don't want a heuristic that only considers SLOC. as part of my changes to inlining, I also significantly improved what is possible to inline into a single line, causing a very significant performance impact if you have it think a single expr is always worth inlining (the example code in the issue would have taken days to compile) |
fixed by #7075 |
I benchmarked a simple loop that does nothing but sum
x*conj(x)
for values in a large vector, and the slowdown is very nearly a factor of 2.This is inlined in Julia 0.2. It looks like the regression happened in e805fd6. I know that there were some tweaks to the inlining threshold in #6605 / 24011fc, but I don't see this getting inlined with that commit or master. cc @vtjnash
The text was updated successfully, but these errors were encountered: