optimizer: supports callsite annotations of inlining, fixes #18773 (#41328)
Conversation
Force-pushed from 9a3673d to fe3e8c2.
Review comment on base/compiler/optimize.jl (outdated diff):
```julia
# when the source isn't available at this moment, try to re-infer and inline it
# NOTE we could make inference keep the source if the call is going to be inlined,
# but then inlining would depend on the local state of inference and so the first entry
# and the succeeding ones may generate different code; rather, we always re-infer
# the source to avoid that problem, even though it's obviously not the most efficient approach
# HACK disable inlining for the re-inference to avoid cycles, by making sure the following inference never comes here again
interp = NativeInterpreter(get_world_counter(interp); opt_params = OptimizationParams(; inlining = false))
src, rt = typeinf_code(interp, match.method, match.spec_types, match.sparams, true)
return src
```
I tried the approach of using the uncached source when the callsite is forced to be inlined (a1b1435), but it turns out that inlining then depends on the local state of inference, so the first entry and the succeeding ones may generate different code, e.g. from a test case I added:
```julia
function multilimited(a)
    if @noinline(isType(a))
        return @inline(multilimited(a.parameters[1]))
    else
        return rand(Bool) ? rand(a) : @inline(multilimited(a))
    end
end

# output might be different if we depend on inference-local state
code_typed(m.multilimited, (Any,))
code_typed(m.multilimited, (Any,))
```
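As a sanity check (a hedged sketch reusing the `m.multilimited` call from the snippet above, not part of the PR's test suite), one can compare the printed IR of two successive queries; they should match once inlining no longer depends on inference-local state:

```julia
# Hedged sketch: the printed typed IR from two back-to-back queries should be identical
# once the result no longer depends on inference-local state.
ir1 = sprint(show, code_typed(m.multilimited, (Any,)))
ir2 = sprint(show, code_typed(m.multilimited, (Any,)))
@assert ir1 == ir2
```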
We might be able to keep the source so that it doesn't depend on the local state, but it could be tricky. Rather, I'd like to always just re-infer when a source is unavailable, at least in this first iteration.

/cc @vtjnash
I think this needs to happen during inference. It seems quite wrong to make a new NativeInterpreter here, and it can probably cause many other problems. We used to do some amount of recursion here, and we have gotten rid of it.
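For concreteness, here is a toy model of the direction being suggested, i.e. handling this while inference is running instead of spinning up a fresh `NativeInterpreter` in the optimizer. This is a self-contained illustration only; `FrameResult`, `finish_frame!`, and `LOCAL_SOURCE_CACHE` are hypothetical stand-ins, not the actual Core.Compiler API:

```julia
# Toy model (not Core.Compiler): while "inference" finalizes a frame, check whether
# some caller marked the corresponding call for callsite inlining; if so, keep the
# inferred source in a local cache so the inliner never has to re-infer it.
struct FrameResult
    src::String             # stand-in for the inferred source / CodeInfo
    callsite_inline::Bool   # stand-in for the IR-level force-inline flag
end

const LOCAL_SOURCE_CACHE = Dict{Symbol,String}()

function finish_frame!(name::Symbol, result::FrameResult)
    if result.callsite_inline
        # keep the source around for the inliner, but only in the local cache
        LOCAL_SOURCE_CACHE[name] = result.src
    end
    return result
end

finish_frame!(:callee, FrameResult("(inferred IR for callee)", true))
@show LOCAL_SOURCE_CACHE
```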
What do you think about fea41f2? We still need the fresh-optimization-without-inlining hack, because those uncached sources are not optimized.

Well, now I'm wondering whether it's really useful. I benchmarked some functions that hit this pass, and I couldn't find any case where this "single-level inlining" helps performance. Honestly, I don't think it's worth the complexity that comes with this support.
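A hedged sketch of the kind of comparison being described here (not the author's actual benchmark; `heavy` is just a stand-in for a call that hits this pass, and it requires a Julia build that includes the callsite annotations):

```julia
using BenchmarkTools

# Stand-in callee: the callsite `@inline` below takes precedence over the
# definition-level `@noinline`, so the two wrappers exercise different inlining decisions.
@noinline heavy(x) = sin(x) + cos(x)

f_default(x)  = heavy(x)           # definition annotation respected: stays a call
f_callsite(x) = @inline heavy(x)   # callsite annotation: the compiler tries to inline it

x = 0.5
@btime f_default($x)
@btime f_callsite($x)
```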
Force-pushed from fe3e8c2 to 830cd9f.
> Does callsite `@inline` force inlining completely, regardless of the cost model?
Callsite inlining will happen regardless of the cost model, but it does not "force inlining completely", since there are cases where the method body itself is unavailable (think recursive calls, even when annotated with `@inline`). I would appreciate it if you could test this PR for your target, though.
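For illustration (a hedged sketch, not from the PR's test suite; requires a Julia build that includes this PR): even with a callsite `@inline`, a self-recursive call cannot be fully inlined, because the callee's body is the caller itself:

```julia
# The callsite annotation is honored where possible, but recursion bounds how far
# inlining can actually go -- at some depth the annotated call remains a real call.
fib(n::Int) = n <= 1 ? n : @inline(fib(n - 1)) + @inline(fib(n - 2))

code_typed(fib, (Int,))  # inspect how much of the recursion was actually inlined
```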
Just tested and it works, so this also closes #40292. Thank you very much!
Force-pushed repeatedly: 0bb8e7d → ec1d492 → 147c1ea → 83b77e9 → d6942b1 → 0c338b7 → 6406503, then c251d3d → fea41f2 → 5557c2f.
After #41328, inference can observe statement flags and try to re-infer a discarded source if it's going to be inlined. The re-inferred source will only be cached into the inference-local cache, and won't be cached globally.
#42082 should fix the "serious" flaw with this PR, and now callsite inlining should work more reliably.
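To peek at the statement flags this refers to (a hedged example: `ssaflags` and its bit layout are Core.Compiler internals, not a stable API, and the element type has changed across Julia versions):

```julia
# Inspect the per-statement flags that carry the callsite annotations.
# Use optimize=false so the flags are visible before the inliner consumes them.
g(x) = @inline sin(x) + @noinline cos(x)
ci = code_typed(g, (Float64,); optimize=false)[1][1]  # a CodeInfo
ci.ssaflags                                           # one flag word per statement
```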
xrefs:
- <JuliaLang/julia#41312>
- <JuliaLang/julia#41328>

Built on top of <#752>.
@oscardssmith If you haven't tried this out yet:

```julia
julia> function vexp!(y,x)
           @inbounds for i ∈ eachindex(y,x)
               y[i] = exp(x[i])
           end
       end
vexp! (generic function with 1 method)

julia> function vexp_inline!(y,x)
           @inbounds for i ∈ eachindex(y,x)
               y[i] = @inline exp(x[i])
           end
       end
vexp_inline! (generic function with 1 method)

julia> x = randn(256); y = similar(x);

julia> @benchmark vexp!($y, $x)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.234 μs …  2.336 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.329 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.334 μs ± 49.344 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                      █▄ ▁
  ▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅██▁▇▃▁▆▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▃▁▁▁▁▁▆▃▄▄▇▆▅▆ █
  1.23 μs      Histogram: log(frequency) by time      1.49 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark vexp_inline!($y, $x)
BenchmarkTools.Trial: 10000 samples with 540 evaluations.
 Range (min … max):  212.893 ns … 316.361 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     213.315 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   214.200 ns ±   6.480 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▂▇█▇▃                        ▁▂▃▂▂▂▂                        ▂
  ▅█████▃▁▃▃▁▁▁▁▁▁▄▁▁▁▁▁▃▄▆████████▇██▇▇▇▆▅▄▃▅▄▅▃▁▄▄▃▄▄▄▄▄▅▃▄▄▅ █
  213 ns        Histogram: log(frequency) by time         220 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
…#18773 (JuliaLang#41328)

* optimizer: supports callsite annotations of inlining, fixes JuliaLang#18773

  Enable `@inline`/`@noinline` annotations on function callsites. From JuliaLang#40754.
  Now `@inline` and `@noinline` can be applied to a code block, and the compiler will then try to (not) inline calls within the block:

  ```julia
  @inline f(...)            # the compiler will try to inline `f`
  @inline f(...) + g(...)   # the compiler will try to inline `f`, `g` and `+`
  @inline f(args...) = ...  # of course, annotations on a definition are still allowed
  ```

  Here are a couple of notes on how those callsite annotations work:
  - a callsite annotation always takes precedence over the annotation applied to the definition of the called function, whichever of `@inline`/`@noinline` we use:

    ```julia
    @inline function explicit_inline(args...)
        # body
    end
    let
        @noinline explicit_inline(args...)  # this call will not be inlined
    end
    ```

  - when callsite annotations are nested, the innermost annotation takes precedence:

    ```julia
    @noinline let a0, b0 = ...
        a = @inline f(a0)   # the compiler will try to inline this call
        b = notinlined(b0)  # the compiler will NOT try to inline this call
        return a, b
    end
    ```

  They're both tested and documented.

* set ssaflags on `CodeInfo` construction
* try to keep source if it will be force-inlined
* give up inlining when source isn't available
* style nits
* Update base/compiler/ssair/inlining.jl (Co-authored-by: Jameson Nash <[email protected]>)
* Update src/method.c (Co-authored-by: Jameson Nash <[email protected]>)
* fixup:
  - remove preprocessed flags from `jl_code_info_set_ir`
  - fix duplicated definition warning
  - add and fix comments
* more clean up
* add caveat about the recursive call limitation
* update NEWS.md

Co-authored-by: Jameson Nash <[email protected]>
…liaLang#42082)

After JuliaLang#41328, inference can observe statement flags and try to re-infer a discarded source if it's going to be inlined. The re-inferred source will only be cached into the inference-local cache, and won't be cached globally.
Enable `@inline`/`@noinline` annotations on function callsites. From #40754.

Now `@inline` and `@noinline` can be applied to a code block, and the compiler will then try to (or try not to) inline calls within the block.

A couple of notes on how these callsite annotations work (see the examples below):
- a callsite annotation always takes precedence over the annotation applied to the definition of the called method, whichever of `@inline`/`@noinline` we use;
- when callsite annotations are nested, the innermost annotation takes precedence.

They're both tested and documented.
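For reference, the examples from the commit message above illustrate both points:

```julia
@inline f(...)            # the compiler will try to inline `f`
@inline f(...) + g(...)   # the compiler will try to inline `f`, `g` and `+`
@inline f(args...) = ...  # annotations on a definition are of course still allowed
```

```julia
@inline function explicit_inline(args...)
    # body
end
let
    @noinline explicit_inline(args...)  # the callsite annotation wins: this call will not be inlined
end
```

```julia
@noinline let a0, b0 = ...
    a = @inline f(a0)   # the innermost annotation wins: the compiler will try to inline this call
    b = notinlined(b0)  # the compiler will NOT try to inline this call
    return a, b
end
```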
Co-authored-by: Joseph Tan [email protected]
See also: `@inline` and `@noinline` (#40754)