fix performance issue of `@nospecialize`-d keyword func call #47059
base/compiler/tfuncs.jl
Outdated
elseif isa(appl, DataType) && appl.name === _NAMEDTUPLE_NAME && appl.parameters[1] === ()
    # if the first parameter of `NamedTuple` is known to be empty tuple,
    # the second argument should also be empty tuple type,
    # so refine it here
    return Const(NamedTuple{(),Tuple{}})
This part is unnecessary, but I added this while working on this PR, and I think this strictly improves the inference accuracy. Test cases added.
bf0ce6a to 9754b6d
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
base/compiler/ssair/passes.jl
Outdated
if is_known_call(argexpr, tuple, compact) && length(ns) == length(argexpr.args)-1
    # ok, we know this NamedTuple construction is nothrow,
    # let's mark this NamedTuple as DCE-eligible
    compact[leaf::AnySSAValue][:flag] |= IR_FLAG_EFFECT_FREE
Why don't we know this from interprocedural analysis?
Why doesn't this get inlined into
9754b6d to deb87b9
This call is from Line 618 in deb87b9
and the redirected constructor call is union split to:
And the latter contains dynamic dispatch since @nanosoldier
Can we use the regular error check pattern to switch that around? I.e.
And then throw an
Well, the problematic dynamic dispatch is not that error path but Lines 411 to 420 in deb87b9
and the latter split is confused for abstract tuple input.
Ah, ok, so the issue is that we have: NamedTuple{names}(args::Tuple) where {names} = NamedTuple{names,typeof(args)}(args) but inference loses the type constraint that the second type parameter is typeequal to
diff --git a/base/boot.jl b/base/boot.jl
index 5f3b99df1c..4e02725fc3 100644
--- a/base/boot.jl
+++ b/base/boot.jl
@@ -615,7 +615,8 @@ end
 NamedTuple() = NamedTuple{(),Tuple{}}(())
-NamedTuple{names}(args::Tuple) where {names} = NamedTuple{names,typeof(args)}(args)
+eval(Core, :(NamedTuple{names}(args::Tuple) where {names} =
+    $(Expr(:splatnew, :(NamedTuple{names,typeof(args)}), :args))))
 using .Intrinsics: sle_int, add_int
That should also save us some inference time by not having to infer through the useless unionsplit.
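For readers following along, here is a minimal sketch of the invariant this constructor is supposed to preserve: the second type parameter of the result is exactly the type of the tuple that was passed in. The names `field_names` and `vals` are hypothetical and only serve the illustration; this shows the observable behavior, not how the compiler reasons about it.

```julia
# Hypothetical inputs, chosen only for illustration.
field_names = (:stmt, :type)
vals = (nothing, Any)

# Calls the `NamedTuple{names}(args::Tuple)` constructor discussed above.
nt = NamedTuple{field_names}(vals)

# The second type parameter of the result is the type of the input tuple.
second_param = typeof(nt).parameters[2]
```

When `vals` is only known abstractly (say, as `Tuple{Any,Any}`), inference cannot rely on running the constructor and has to derive this relationship from the code, which is where the union split discussed above gets in the way.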
That sounds quite simple and better! I will work on implementing SROA for
Your benchmark job has completed - no performance regressions were detected. A full report can be found here.
deb87b9 to 44d92cd
Okay, this PR should be ready.
@nanosoldier
a86b36b to d55f00c
Your benchmark job has completed - no performance regressions were detected. A full report can be found here.
d55f00c to 5dd7a0b
The benchmark results look promising. I'm going to merge once CI passes.
@nanosoldier
This commit tries to fix and improve performance for calling keyword funcs whose argument types are not fully known but `@nospecialize`-d.

The final result would look like this (this particular example is taken from our Julia-level compiler implementation):
```julia
abstract type CallInfo end
struct NoCallInfo <: CallInfo end
struct NewInstruction
    stmt::Any
    type::Any
    info::CallInfo
    line::Union{Int32,Nothing} # if nothing, copy the line from previous statement in the insertion location
    flag::Union{UInt8,Nothing} # if nothing, IR flags will be recomputed on insertion
    function NewInstruction(@nospecialize(stmt), @nospecialize(type), @nospecialize(info::CallInfo),
                            line::Union{Int32,Nothing}, flag::Union{UInt8,Nothing})
        return new(stmt, type, info, line, flag)
    end
end
@nospecialize
function NewInstruction(newinst::NewInstruction;
    stmt=newinst.stmt,
    type=newinst.type,
    info::CallInfo=newinst.info,
    line::Union{Int32,Nothing}=newinst.line,
    flag::Union{UInt8,Nothing}=newinst.flag)
    return NewInstruction(stmt, type, info, line, flag)
end
@specialize

using BenchmarkTools
struct VirtualKwargs
    stmt::Any
    type::Any
    info::CallInfo
end
vkws = VirtualKwargs(nothing, Any, NoCallInfo())
newinst = NewInstruction(nothing, Any, NoCallInfo(), nothing, nothing)
runner(newinst, vkws) = NewInstruction(newinst; vkws.stmt, vkws.type, vkws.info)
@benchmark runner($newinst, $vkws)
```

> on master
```
BenchmarkTools.Trial: 10000 samples with 186 evaluations.
 Range (min … max):  559.898 ns …   4.173 μs  ┊ GC (min … max): 0.00% … 85.29%
 Time  (median):     605.608 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   638.170 ns ± 125.080 ns  ┊ GC (mean ± σ):  0.06% ±  0.85%

  █▇▂▆▄ ▁█▇▄▂                                                   ▂
  ██████▅██████▇▇▇██████▇▇▇▆▆▅▄▅▄▂▄▄▅▇▆▆▆▆▆▅▆▆▄▄▅▅▄▃▄▄▄▅▃▅▅▆▅▆▆ █
  560 ns        Histogram: log(frequency) by time       1.23 μs <

 Memory estimate: 32 bytes, allocs estimate: 2.
```

> on this commit
```
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  3.080 ns … 83.177 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.098 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.118 ns ±  0.885 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▅▇█▆▅▄▂
  ▂▄▆▆▇████████▆▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▁▂▂▂▁▂▂▂▂▂▂▁▁▂▁▂▂▂▂▂▂▂▂▂ ▃
  3.08 ns         Histogram: frequency by time        3.19 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

So for this particular case it achieves a roughly 200x speedup. This is because this commit allows the call to the keyword sorter to be inlined and the `NamedTuple` call to be removed.

Specifically, this commit is composed of the following improvements:

- Add early return case for `structdiff`: This change improves return type inference for the case where the compared `NamedTuple`s are type unstable but their names do not differ, e.g. given two `NamedTuple{(:a,:b),T} where T<:Tuple{Any,Any}`s. In such a case the optimizer will remove the `structdiff` and succeeding `pairs` calls, allowing the keyword sorter to be inlined.
- Tweak the core `NamedTuple{names}(args::Tuple)` constructor so that it directly forms a `:splatnew` allocation rather than redirecting to the general `NamedTuple` constructor, which could be confused by an abstract input tuple type.
- Improve `nfields_tfunc` accuracy for abstract `NamedTuple` types. This improvement lets `inline_splatnew` handle more abstract `NamedTuple`s, especially those whose names are fully known but whose fields tuple type is abstract.

Those improvements are combined to allow our SROA pass to optimize away the `NamedTuple` and `tuple` calls generated for keyword argument handling. E.g. the IR for the example `NewInstruction` constructor is now fairly optimized, like:
```julia
julia> Base.code_ircode((NewInstruction,Any,Any,CallInfo)) do newinst, stmt, type, info
           NewInstruction(newinst; stmt, type, info)
       end |> only
2 1 ── %1  = Base.getfield(_2, :line)::Union{Nothing, Int32}       │╻╷ Type##kw
  │    %2  = Base.getfield(_2, :flag)::Union{Nothing, UInt8}       ││┃ getproperty
  │    %3  = (isa)(%1, Nothing)::Bool                              ││
  │    %4  = (isa)(%2, Nothing)::Bool                              ││
  │    %5  = (Core.Intrinsics.and_int)(%3, %4)::Bool               ││
  └───       goto #3 if not %5                                     ││
  2 ── %7  = %new(Main.NewInstruction, _3, _4, _5, nothing, nothing)::NewInstruction NewInstruction
  └───       goto #10                                              ││
  3 ── %9  = (isa)(%1, Int32)::Bool                                ││
  │    %10 = (isa)(%2, Nothing)::Bool                              ││
  │    %11 = (Core.Intrinsics.and_int)(%9, %10)::Bool              ││
  └───       goto #5 if not %11                                    ││
  4 ── %13 = π (%1, Int32)                                         ││
  │    %14 = %new(Main.NewInstruction, _3, _4, _5, %13, nothing)::NewInstruction│││╻ NewInstruction
  └───       goto #10                                              ││
  5 ── %16 = (isa)(%1, Nothing)::Bool                              ││
  │    %17 = (isa)(%2, UInt8)::Bool                                ││
  │    %18 = (Core.Intrinsics.and_int)(%16, %17)::Bool             ││
  └───       goto #7 if not %18                                    ││
  6 ── %20 = π (%2, UInt8)                                         ││
  │    %21 = %new(Main.NewInstruction, _3, _4, _5, nothing, %20)::NewInstruction│││╻ NewInstruction
  └───       goto #10                                              ││
  7 ── %23 = (isa)(%1, Int32)::Bool                                ││
  │    %24 = (isa)(%2, UInt8)::Bool                                ││
  │    %25 = (Core.Intrinsics.and_int)(%23, %24)::Bool             ││
  └───       goto #9 if not %25                                    ││
  8 ── %27 = π (%1, Int32)                                         ││
  │    %28 = π (%2, UInt8)                                         ││
  │    %29 = %new(Main.NewInstruction, _3, _4, _5, %27, %28)::NewInstruction │││╻ NewInstruction
  └───       goto #10                                              ││
  9 ──       Core.throw(ErrorException("fatal error in type inference (type bound)"))::Union{}
  └───       unreachable                                           ││
  10 ┄ %33 = φ (#2 => %7, #4 => %14, #6 => %21, #8 => %29)::NewInstruction ││
  └───       goto #11                                              ││
  11 ─       return %33                                            │
   => NewInstruction
```
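The `structdiff` early-return case in the first bullet can be illustrated at the value level. The following is a minimal sketch, assuming the internal (unexported) `Base.structdiff` helper, which keyword-argument handling uses to find keywords the callee does not accept; the variable names are hypothetical and the snippet says nothing about what inference concludes.

```julia
# Keywords a hypothetical caller passes, mirroring the example above.
passed = (stmt = nothing, type = Any)

# When every passed name is accepted, nothing is left over, regardless of
# the value types: this is the "no difference in names" case.
leftover = Base.structdiff(passed, NamedTuple{(:stmt, :type)})

# An unknown keyword survives the diff and would later be reported as an error.
extra = Base.structdiff((stmt = nothing, bogus = 1), NamedTuple{(:stmt, :type)})
```

Because the result in the first case depends only on the names, it can be determined even when the value types are abstract, which is what makes the subsequent `pairs` call removable.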
5dd7a0b to b8a6b10
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
# if the first/second parameter of `NamedTuple` is known to be empty,
# the second/first argument should also be empty tuple type,
# so refine it here
return Const(NamedTuple{(),Tuple{}})
This reasoning seems faulty, since couldn't the parameter also be a TypeVar of any sort?
You're quite right:
julia> (()->NamedTuple{(), <:Any})()
NamedTuple{(), Tuple{}}
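To spell out why that result is wrong, here is a small sketch of what the two types are expected to look like when built without the refinement in question. It assumes a Julia build that does not include the faulty `Const` refinement; the variable names are hypothetical.

```julia
# A `NamedTuple` type whose second parameter is a TypeVar, not a concrete type.
abstract_nt = NamedTuple{(), <:Any}
concrete_nt = NamedTuple{(), Tuple{}}

# The first is a `UnionAll` wrapping a type variable ...
is_unionall = abstract_nt isa UnionAll
# ... of which the concrete empty named tuple type is a subtype ...
is_subtype = concrete_nt <: abstract_nt
# ... but the two are distinct type objects, so folding one into the other is unsound.
same_type = abstract_nt === concrete_nt
```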