Inference regression because of semi-concrete interpretation #47349
Cthulhu doesn't help here; on both sides of the semi-concrete merge, the Julia and LLVM IR it reports is identical, and always contains static calls to …
@brenhinkeller This might be what we are seeing in StaticCompiler.jl with the overlays on.
Oh! Interesting...
It looks like this issue is because of semi-concrete interpretation on overlayed methods (although I'm still debugging why that happens). Defining the following overload, I get a (seemingly) optimal LLVM IR:

```julia
function Core.Compiler.concrete_eval_eligible(interp::CustomInterpreter,
        @nospecialize(f), result::Core.Compiler.MethodCallResult, arginfo::Core.Compiler.ArgInfo)
    ret = @invoke Core.Compiler.concrete_eval_eligible(interp::AbstractInterpreter,
        f::Any, result::Core.Compiler.MethodCallResult, arginfo::Core.Compiler.ArgInfo)
    ret === false && return nothing # XXX JuliaLang/julia#47349
    return ret
end
```
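For context, the method overlays in question are defined via `Base.Experimental.@overlay` on a dedicated method table, which is how GPUCompiler substitutes device-specific methods. A minimal sketch (the table name and the overlaid method are illustrative, not taken from this report):

```julia
using Base.Experimental: @MethodTable, @overlay

# A dedicated method table, like the ones GPU back ends use
# for device-specific method substitutions.
@MethodTable SketchTable

# Overlay `Base.isnan` for lookups that go through `SketchTable`;
# normal dispatch is unaffected.
@overlay SketchTable Base.isnan(x::Float64) = x != x

# Regular calls still use the non-overlaid method:
@assert isnan(NaN) && !isnan(1.0)
```

An interpreter opts into such a table by returning it from `Core.Compiler.method_table`, which is what makes the semi-concrete path relevant here.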
Update: the root problem here seems to be a general precision issue of semi-concrete interpretation. More specifically, semi-concrete interpretation is enabled for … With this overload, we can see there are some precision issues within semi-concrete eval:

```julia
@eval Core.Compiler function abstract_call_method_with_const_args(interp::AbstractInterpreter,
    result::MethodCallResult, @nospecialize(f), arginfo::ArgInfo, si::StmtInfo, match::MethodMatch,
    sv::InferenceState, invokecall::Union{Nothing,InvokeCall}=nothing)
    if !const_prop_enabled(interp, sv, match)
        return nothing
    end
    if is_removable_if_unused(result.effects)
        if isa(result.rt, Const) || call_result_unused(si)
            add_remark!(interp, sv, "[constprop] No more information to be gained")
            return nothing
        end
    end
    res = concrete_eval_call(interp, f, result, arginfo, si, sv, invokecall)
    isa(res, ConstCallResults) && return res
    mi = maybe_get_const_prop_profitable(interp, result, f, arginfo, si, match, sv)
    mi === nothing && return nothing
    # try semi-concrete evaluation
    local semiresult = nothing
    if res::Bool && !has_conditional(arginfo)
        mi_cache = WorldView(code_cache(interp), sv.world)
        code = get(mi_cache, mi, nothing)
        if code !== nothing
            ir = codeinst_to_ir(interp, code)
            if isa(ir, IRCode)
                irsv = IRInterpretationState(interp, ir, mi, sv.world, arginfo.argtypes)
                rt = ir_abstract_constant_propagation(interp, irsv)
                if !isa(rt, Type) || typeintersect(rt, Bool) === Union{}
                    semiresult = (rt, mi, ir)
                end
            end
        end
    end
    # try constant prop'
    inf_cache = get_inference_cache(interp)
    inf_result = cache_lookup(typeinf_lattice(interp), mi, arginfo.argtypes, inf_cache)
    if inf_result === nothing
        # if there might be a cycle, check to make sure we don't end up
        # calling ourselves here.
        if result.edgecycle && (result.edgelimited ?
            is_constprop_method_recursed(match.method, sv) :
            # if the type complexity limiting didn't decide to limit the call signature (`result.edgelimited = false`)
            # we can relax the cycle detection by comparing `MethodInstance`s and allow inference to
            # propagate different constant elements if the recursion is finite over the lattice
            is_constprop_edge_recursed(mi, sv))
            add_remark!(interp, sv, "[constprop] Edge cycle encountered")
            return nothing
        end
        inf_result = InferenceResult(mi, (arginfo, sv))
        if !any(inf_result.overridden_by_const)
            add_remark!(interp, sv, "[constprop] Could not handle constant info in matching_cache_argtypes")
            return nothing
        end
        frame = InferenceState(inf_result, #=cache=#:local, interp)
        if frame === nothing
            add_remark!(interp, sv, "[constprop] Could not retrieve the source")
            return nothing # this is probably a bad generated function (unsound), but just ignore it
        end
        frame.parent = sv
        if !typeinf(interp, frame)
            add_remark!(interp, sv, "[constprop] Fresh constant inference hit a cycle")
            return nothing
        end
        @assert !isa(inf_result.result, InferenceState)
    else
        if isa(inf_result.result, InferenceState)
            add_remark!(interp, sv, "[constprop] Found cached constant inference in a cycle")
            return nothing
        end
    end
    if semiresult !== nothing
        (rt, mi, ir) = semiresult
        if inf_result.result ⊏ rt
            println("semi concrete result is worse than constprop': ", sv.linfo, mi, sv.linfo.def)
            println(rt, " vs. ", inf_result.result)
        end
    end
    return ConstCallResults(inf_result.result, ConstPropResult(inf_result), inf_result.ipo_effects, mi)
end
```
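The `⊏` check in the debug print fires when constant propagation's result is strictly more precise than the semi-concrete one on the inference lattice. As a hedged illustration of that precision ordering, using only `Core.Const` and the internal `Core.Compiler.widenconst` (internals, so this may differ across Julia versions): constant propagation can return an exact `Const`, while a regressed semi-concrete result only yields the widened type.

```julia
# `Core.Const(3)` carries the exact value; `Int` is its widened type.
c = Core.Const(3)
@assert c.val === 3

# Widening discards the constant and keeps only the type -- the kind of
# precision loss the overload above reports for semi-concrete results.
@assert Core.Compiler.widenconst(c) === Int
```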
I will try to come up with a minimal reproducer and fix the precision issue, but for the meanwhile external …
#45459 moved `:static_parameter` to statement position unconditionally, since our optimizer assumes that `:static_parameter` in value position is effect-free. But it turns out this can cause a precision issue for semi-concrete interpretation, as discovered in #47349, since the type of a `:static_parameter` in statement position is widened when the IR is converted to the compressed format for caching. This commit follows up #45459 so that we inline effect-free `:static_parameter`s during IR conversion and get more reasonable semi-concrete interpretation.
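As a hedged illustration of what a `:static_parameter` is (the function here is invented for the sketch): each use of a `where`-clause type parameter in a method body lowers to an `Expr(:static_parameter, i)` expression, and whether that expression sits in value position or on its own statement is exactly what the commit above adjusts.

```julia
# `N` inside the body lowers to `Expr(:static_parameter, 1)`. After
# #45459 that expression always occupies its own (statement-position)
# SSA slot, and the compressed, cached IR widened its type -- the
# precision loss fixed by inlining effect-free static parameters back
# into value position during IR conversion.
getN(::Val{N}) where {N} = N + 1

@assert getN(Val(41)) == 42
```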
This particular issue got fixed, but I think there are other inference precision issues within semi-concrete interpretation at this moment. So I suggest you keep JuliaGPU/GPUCompiler.jl#369 for the meanwhile; I will ping you once I'm satisfied with the incoming fixups (@maleadt).
This should let us generate smaller IR in general, as well as reduce the chance that semi-concrete interpretation hits the precision issue stemming from the current limitation that compressed IR can't propagate constant type information in statement position to the semi-concrete interpretation (xref: #47349).
#47994 fixed most cases where semi-concrete interpretation ended up with a wider result than constant propagation. It may be okay to turn semi-concrete interpretation back on in GPUCompiler after 1.10.0-DEV.203. However, there are still a few known remaining cases, so you may want to keep the overload unless you are willing to trade the risk of inference regressions for better compilation performance.
I have a fairly involved MWE, reduced from a CUDA.jl bug report, where a StaticArrays-based kernel (doing broadcast with `floor`) stops generating static code when combined with a method overlay definition of `Base.isnan`. Essentially: on 1.8, the generated IR is fully static and contains calls to the floor intrinsic. On 1.9 however, the IR is full of dynamic dispatches and no `floor` is to be seen, even though the `code_warntype` output looks similar. I've bisected this to #44803, but have yet to look into this any closer. The full MWE, reduced from GPUCompiler.jl:
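A hedged, self-contained sketch of the shape of kernel involved (the real MWE uses StaticArrays and routes `Base.isnan` through a GPUCompiler-style overlay; plain tuples stand in here, and `clamp_floor` is an invented name):

```julia
# Broadcast-like map over a fixed-size collection, mixing `isnan` (the
# overlaid method in the real report) with `floor` (the intrinsic that
# disappears from the 1.9 IR).
clamp_floor(xs::NTuple{N,Float32}) where {N} =
    map(x -> isnan(x) ? 0.0f0 : floor(x), xs)

@assert clamp_floor((1.9f0, NaN32, -0.5f0)) == (1.0f0, 0.0f0, -1.0f0)
```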
This also serves as a "minimal" example of how to generate LLVM IR using a custom interpreter and cache. It is significantly slower on 1.9, taking ~7 seconds where it only used to take about 3, so I guess it also demonstrates #47296.