fix performance issue of `@nospecialize`-d keyword func call #47059

aviatesk · 2022-10-05T12:52:15Z

This commit tries to fix and improve performance for calling keyword
funcs whose arguments types are not fully known but @nospecialize-d.

The final result would look like (this particular example is taken from
our Julia-level compiler implementation):

abstract type CallInfo end
struct NoCallInfo <: CallInfo end
struct NewInstruction
    stmt::Any
    type::Any
    info::CallInfo
    line::Int32
    flag::UInt8
    function NewInstruction(@nospecialize(stmt), @nospecialize(type), @nospecialize(info::CallInfo),
                            line::Int32, flag::UInt8)
        return new(stmt, type, info, line, flag)
    end
end
@nospecialize
function NewInstruction(newinst::NewInstruction;
    stmt=newinst.stmt,
    type=newinst.type,
    info::CallInfo=newinst.info,
    line::Int32=newinst.line,
    flag::UInt8=newinst.flag)
    return NewInstruction(stmt, type, info, line, flag)
end
@specialize

using BenchmarkTools
struct VirtualKwargs
    stmt::Any
    type::Any
    info::CallInfo
end
vkws = VirtualKwargs(nothing, Any, NoCallInfo())
newinst = NewInstruction(nothing, Any, NoCallInfo(), zero(Int32), zero(UInt8))
runner(newinst, vkws) = NewInstruction(newinst; vkws.stmt, vkws.type, vkws.info)
@benchmark runner($newinst, $vkws)

on master

BenchmarkTools.Trial: 10000 samples with 186 evaluations.
 Range (min … max):  559.898 ns …   4.173 μs  ┊ GC (min … max): 0.00% … 85.29%
 Time  (median):     605.608 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   638.170 ns ± 125.080 ns  ┊ GC (mean ± σ):  0.06% ±  0.85%

  █▇▂▆▄  ▁█▇▄▂                                                  ▂
  ██████▅██████▇▇▇██████▇▇▇▆▆▅▄▅▄▂▄▄▅▇▆▆▆▆▆▅▆▆▄▄▅▅▄▃▄▄▄▅▃▅▅▆▅▆▆ █
  560 ns        Histogram: log(frequency) by time       1.23 μs <

 Memory estimate: 32 bytes, allocs estimate: 2.

on this commit

BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  3.080 ns … 83.177 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.098 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.118 ns ±  0.885 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▂▅▇█▆▅▄▂
  ▂▄▆▆▇████████▆▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▁▂▂▂▁▂▂▂▂▂▂▁▁▂▁▂▂▂▂▂▂▂▂▂ ▃
  3.08 ns        Histogram: frequency by time        3.19 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

So for this particular case it achieves roughly 200x speed up.
This is because this commit allows inlining of a call to keyword sorter
as well as removal of NamedTuple call.

Especially this commit is composed of the following improvements:

Add early return case for structdiff:
This change improves the return type inference for a case when
compared NamedTuples are type unstable but there is no difference
in their names, e.g. given two NamedTuple{(:a,:b),T} where T<:Tuple{Any,Any}s.
And in such case the optimizer will remove structdiff and succeeding
pairs calls, letting the keyword sorter to be inlined.
Tweak the core NamedTuple{names}(args::Tuple) constructor so that it
directly forms :splatnew allocation rather than redirects to the
general NamedTuple constructor, that could be confused for abstract
input tuple type.
Improve nfields_tfunc accuracy as for abstract NamedTuple types.
This improvement lets inline_splatnew to handle more abstract
NamedTuples, especially whose names are fully known but its fields
tuple type is abstract.

Those improvements are combined to allow our SROA pass to optimize away
NamedTuple and tuple calls generated for keyword argument handling.
E.g. the IR for the example NewInstruction constructor is now fairly
optimized, like:

julia> Base.code_ircode((NewInstruction,Any,Any,CallInfo)) do newinst, stmt, type, info
           NewInstruction(newinst; stmt, type, info)
       end |> only
2 1 ─ %1 = Base.getfield(_2, :line)::Int32                  │╻╷  Type##kw
  │   %2 = Base.getfield(_2, :flag)::UInt8                  ││┃   getproperty
  │   %3 = %new(Main.NewInstruction, _3, _4, _5, %1, %2)::NewInstructionstruction
  └──      return %3                                        │   
 => NewInstruction

aviatesk · 2022-10-05T12:58:43Z

base/compiler/tfuncs.jl

+    elseif isa(appl, DataType) && appl.name === _NAMEDTUPLE_NAME && appl.parameters[1] === ()
+        # if the first parameter of `NamedTuple` is known to be empty tuple,
+        # the second argument should also be empty tuple type,
+        # so refine it here
+        return Const(NamedTuple{(),Tuple{}})


This part is unnecessary, but I added this while working on this PR, and I think this strictly improves the inference accuracy. Test cases added.

aviatesk · 2022-10-05T13:04:24Z

@nanosoldier runbenchmarks(!"scalar", vs=":master")

nanosoldier · 2022-10-05T18:52:52Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

Keno · 2022-10-05T19:04:12Z

base/compiler/ssair/passes.jl

+                            if is_known_call(argexpr, tuple, compact) && length(ns) == length(argexpr.args)-1
+                                # ok, we know this NamedTuple construction is nothrow,
+                                # let's mark this NamedTuple as DCE-eligible
+                                compact[leaf::AnySSAValue][:flag] |= IR_FLAG_EFFECT_FREE


Why don't we know this from interprocedural analysis?

Keno · 2022-10-05T19:11:46Z

%2 = NamedTuple{(:a, :b, :c)(%1)::NamedTuple{(:a, :b, :c), _A} where _A<:Tuple{Any, Any, Any}

Why doesn't this get inlined into splatnew? Are we just missing a case for it in the inlining cost model? We're also probably missing SROA support for splatnew, but I'd rather fix that than special-casing NamedTuple here.

aviatesk · 2022-10-07T09:00:07Z

Why doesn't this get inlined into splatnew? Are we just missing a case for it in the inlining cost model?

This call is from

julia/base/boot.jl

Line 618 in deb87b9

    
           NamedTuple{names}(args::Tuple) where {names} = NamedTuple{names,typeof(args)}(args)

and the redirected constructor call is union split to:

julia/base/boot.jl

Lines 622 to 623 in deb87b9

    
           eval(Core, :(NamedTuple{names,T}(args::T) where {names, T <: Tuple} = 
        
                        $(Expr(:splatnew, :(NamedTuple{names,T}), :args))))

julia/base/namedtuple.jl

Lines 90 to 97 in deb87b9

    
           @eval function NamedTuple{names,T}(args::Tuple) where {names, T <: Tuple} 
        
               if length(args) != length(names::Tuple) 
        
                   throw(ArgumentError("Wrong number of arguments to named tuple constructor.")) 
        
               end 
        
               # Note T(args) might not return something of type T; e.g. 
        
               # Tuple{Type{Float64}}((Float64,)) returns a Tuple{DataType} 
        
               $(Expr(:splatnew, :(NamedTuple{names,T}), :(T(args)))) 
        
           end

And the latter contains dynamic dispatch since args is abstract tuple type (and somehow the former is also judged as non-inlineable somehow).

@nanosoldier runbenchmarks("inference", vs=":master")

Keno · 2022-10-07T09:09:00Z

And the latter contains dynamic dispatch since args is abstract tuple type (and somehow the former is also judged as non-inlineable somehow).

Can we use the regular error check pattern to switch that around? I.e.

@noinline function _check_nt_length(names, args)
     if length(args) != length(names::Tuple) 
         throw(ArgumentError("Wrong number of arguments to named tuple constructor.")) 
     end
     return nothing 
end

And then throw an @inline on the function containing the splatnew? Inlining those kinds of things is usually quite valuable.

aviatesk · 2022-10-07T09:13:37Z

Can we use the regular error check pattern to switch that around?

Well, the problematic dynamic dispatch is not that error path but T(args), which is union-split to

julia/base/essentials.jl

Lines 411 to 420 in deb87b9

    
           convert(::Type{T}, x::T) where {T<:Tuple} = x 
        
           function convert(::Type{T}, x::NTuple{N,Any}) where {N, T<:Tuple} 
        
               # First see if there could be any conversion of the input type that'd be a subtype of the output. 
        
               # If not, we'll throw an explicit MethodError (otherwise, it might throw a typeassert). 
        
               if typeintersect(NTuple{N,Any}, T) === Union{} 
        
                   _tuple_error(T, x) 
        
               end 
        
               cvt1(n) = (@inline; convert(fieldtype(T, n), getfield(x, n, #=boundscheck=#false))) 
        
               return ntuple(cvt1, Val(N))::NTuple{N,Any} 
        
           end

and the latter split is confused for abstract tuple input.

Keno · 2022-10-07T09:20:57Z

Ah, ok, so the issue is that we have:

NamedTuple{names}(args::Tuple) where {names} = NamedTuple{names,typeof(args)}(args)

but inference looses the type constraint that the second type parameter is typeequal to typeof(args), so it tries to unionsplit it into that mess. How about the following change then:

diff --git a/base/boot.jl b/base/boot.jl
index 5f3b99df1c..4e02725fc3 100644
--- a/base/boot.jl
+++ b/base/boot.jl
@@ -615,7 +615,8 @@ end
 
 NamedTuple() = NamedTuple{(),Tuple{}}(())
 
-NamedTuple{names}(args::Tuple) where {names} = NamedTuple{names,typeof(args)}(args)
+eval(Core, :(NamedTuple{names}(args::Tuple) =
+             $(Expr(:splatnew, :(NamedTuple{names,typeof(args)}), :args))))
 
 using .Intrinsics: sle_int, add_int

That should also save us some inference time by not having to infer through the useless unionsplit.

aviatesk · 2022-10-07T09:26:34Z

That sounds quite simple and better! I will work on implementing SROA for :splatnew.

nanosoldier · 2022-10-07T09:55:00Z

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

aviatesk · 2022-10-08T01:43:19Z

Okay, this PR should be ready.
I found we actually have inline_splatnew! pass within the inlining that transforms :splatnew to :new expression. And the root problem turned out to be that our nfields_tfunc isn't accurate enough to handle abstract NamedTuple whose names are nevertheless fully known and thus the number of their fields could be known.
This PR improves that accuracy, and now our SROA pass optimizes away more NamedTuples constructed for keyword argument handling.

aviatesk · 2022-10-08T01:53:35Z

@nanosoldier runbenchmarks("inference", vs=":master")

base/compiler/tfuncs.jl

nanosoldier · 2022-10-08T02:48:11Z

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

aviatesk · 2022-10-08T03:59:14Z

The benchmark results look promising. Going to merge once confirm successful CI.

aviatesk · 2022-10-08T04:01:00Z

@nanosoldier runbenchmarks(!"scalar", vs=":master")

@benchmark

This commit tries to fix and improve performance for calling keyword funcs whose arguments types are not fully known but `@nospecialize`-d. The final result would look like (this particular example is taken from our Julia-level compiler implementation): ```julia abstract type CallInfo end struct NoCallInfo <: CallInfo end struct NewInstruction stmt::Any type::Any info::CallInfo line::Union{Int32,Nothing} # if nothing, copy the line from previous statement in the insertion location flag::Union{UInt8,Nothing} # if nothing, IR flags will be recomputed on insertion function NewInstruction(@nospecialize(stmt), @nospecialize(type), @nospecialize(info::CallInfo), line::Union{Int32,Nothing}, flag::Union{UInt8,Nothing}) return new(stmt, type, info, line, flag) end end @nospecialize function NewInstruction(newinst::NewInstruction; stmt=newinst.stmt, type=newinst.type, info::CallInfo=newinst.info, line::Union{Int32,Nothing}=newinst.line, flag::Union{UInt8,Nothing}=newinst.flag) return NewInstruction(stmt, type, info, line, flag) end @Specialize using BenchmarkTools struct VirtualKwargs stmt::Any type::Any info::CallInfo end vkws = VirtualKwargs(nothing, Any, NoCallInfo()) newinst = NewInstruction(nothing, Any, NoCallInfo(), nothing, nothing) runner(newinst, vkws) = NewInstruction(newinst; vkws.stmt, vkws.type, vkws.info) @benchmark runner($newinst, $vkws) ``` > on master ``` BenchmarkTools.Trial: 10000 samples with 186 evaluations. Range (min … max): 559.898 ns … 4.173 μs ┊ GC (min … max): 0.00% … 85.29% Time (median): 605.608 ns ┊ GC (median): 0.00% Time (mean ± σ): 638.170 ns ± 125.080 ns ┊ GC (mean ± σ): 0.06% ± 0.85% █▇▂▆▄ ▁█▇▄▂ ▂ ██████▅██████▇▇▇██████▇▇▇▆▆▅▄▅▄▂▄▄▅▇▆▆▆▆▆▅▆▆▄▄▅▅▄▃▄▄▄▅▃▅▅▆▅▆▆ █ 560 ns Histogram: log(frequency) by time 1.23 μs < Memory estimate: 32 bytes, allocs estimate: 2. ``` > on this commit ```julia BenchmarkTools.Trial: 10000 samples with 1000 evaluations. Range (min … max): 3.080 ns … 83.177 ns ┊ GC (min … max): 0.00% … 0.00% Time (median): 3.098 ns ┊ GC (median): 0.00% Time (mean ± σ): 3.118 ns ± 0.885 ns ┊ GC (mean ± σ): 0.00% ± 0.00% ▂▅▇█▆▅▄▂ ▂▄▆▆▇████████▆▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▁▂▂▂▁▂▂▂▂▂▂▁▁▂▁▂▂▂▂▂▂▂▂▂ ▃ 3.08 ns Histogram: frequency by time 3.19 ns < Memory estimate: 0 bytes, allocs estimate: 0. ``` So for this particular case it achieves roughly 200x speed up. This is because this commit allows inlining of a call to keyword sorter as well as removal of `NamedTuple` call. Especially this commit is composed of the following improvements: - Add early return case for `structdiff`: This change improves the return type inference for a case when compared `NamedTuple`s are type unstable but there is no difference in their names, e.g. given two `NamedTuple{(:a,:b),T} where T<:Tuple{Any,Any}`s. And in such case the optimizer will remove `structdiff` and succeeding `pairs` calls, letting the keyword sorter to be inlined. - Tweak the core `NamedTuple{names}(args::Tuple)` constructor so that it directly forms `:splatnew` allocation rather than redirects to the general `NamedTuple` constructor, that could be confused for abstract input tuple type. - Improve `nfields_tfunc` accuracy as for abstract `NamedTuple` types. This improvement lets `inline_splatnew` to handle more abstract `NamedTuple`s, especially whose names are fully known but its fields tuple type is abstract. Those improvements are combined to allow our SROA pass to optimize away `NamedTuple` and `tuple` calls generated for keyword argument handling. E.g. the IR for the example `NewInstruction` constructor is now fairly optimized, like: ```julia julia> Base.code_ircode((NewInstruction,Any,Any,CallInfo)) do newinst, stmt, type, info NewInstruction(newinst; stmt, type, info) end |> only 2 1 ── %1 = Base.getfield(_2, :line)::Union{Nothing, Int32} │╻╷ Type##kw │ %2 = Base.getfield(_2, :flag)::Union{Nothing, UInt8} ││┃ getproperty │ %3 = (isa)(%1, Nothing)::Bool ││ │ %4 = (isa)(%2, Nothing)::Bool ││ │ %5 = (Core.Intrinsics.and_int)(%3, %4)::Bool ││ └─── goto #3 if not %5 ││ 2 ── %7 = %new(Main.NewInstruction, _3, _4, _5, nothing, nothing)::NewInstruction NewInstruction └─── goto #10 ││ 3 ── %9 = (isa)(%1, Int32)::Bool ││ │ %10 = (isa)(%2, Nothing)::Bool ││ │ %11 = (Core.Intrinsics.and_int)(%9, %10)::Bool ││ └─── goto #5 if not %11 ││ 4 ── %13 = π (%1, Int32) ││ │ %14 = %new(Main.NewInstruction, _3, _4, _5, %13, nothing)::NewInstruction│││╻ NewInstruction └─── goto #10 ││ 5 ── %16 = (isa)(%1, Nothing)::Bool ││ │ %17 = (isa)(%2, UInt8)::Bool ││ │ %18 = (Core.Intrinsics.and_int)(%16, %17)::Bool ││ └─── goto #7 if not %18 ││ 6 ── %20 = π (%2, UInt8) ││ │ %21 = %new(Main.NewInstruction, _3, _4, _5, nothing, %20)::NewInstruction│││╻ NewInstruction └─── goto #10 ││ 7 ── %23 = (isa)(%1, Int32)::Bool ││ │ %24 = (isa)(%2, UInt8)::Bool ││ │ %25 = (Core.Intrinsics.and_int)(%23, %24)::Bool ││ └─── goto #9 if not %25 ││ 8 ── %27 = π (%1, Int32) ││ │ %28 = π (%2, UInt8) ││ │ %29 = %new(Main.NewInstruction, _3, _4, _5, %27, %28)::NewInstruction │││╻ NewInstruction └─── goto #10 ││ 9 ── Core.throw(ErrorException("fatal error in type inference (type bound)"))::Union{} └─── unreachable ││ 10 ┄ %33 = φ (#2 => %7, #4 => %14, #6 => %21, #8 => %29)::NewInstruction ││ └─── goto #11 ││ 11 ─ return %33 │ => NewInstruction ```

nanosoldier · 2022-10-08T09:50:50Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

vtjnash · 2022-10-26T18:39:20Z

base/compiler/tfuncs.jl

+        # if the first/second parameter of `NamedTuple` is known to be empty,
+        # the second/first argument should also be empty tuple type,
+        # so refine it here
+        return Const(NamedTuple{(),Tuple{}})


This reasoning seems faulty, since couldn't the parameter also be a TypeVar of any sort?

You're quite right:

julia> (()->NamedTuple{(), <:Any})() NamedTuple{(), Tuple{}}

…all (#47059)" This reverts commit 95cfd62.

aviatesk added keyword arguments f(x; keyword=arguments) compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) labels Oct 5, 2022

aviatesk requested review from vtjnash and JeffBezanson October 5, 2022 12:52

This comment was marked as outdated.

Sign in to view

aviatesk commented Oct 5, 2022

View reviewed changes

This comment was marked as outdated.

Sign in to view

aviatesk force-pushed the avi/nospecialized-keywordfunc-perf branch from bf0ce6a to 9754b6d Compare October 5, 2022 13:01

aviatesk changed the title ~~improve performance issue of @nospecialize-d keyword func call~~ fix performance issue of @nospecialize-d keyword func call Oct 5, 2022

Keno reviewed Oct 5, 2022

View reviewed changes

aviatesk force-pushed the avi/nospecialized-keywordfunc-perf branch from 9754b6d to deb87b9 Compare October 7, 2022 08:33

aviatesk force-pushed the avi/nospecialized-keywordfunc-perf branch from deb87b9 to 44d92cd Compare October 8, 2022 01:40

Keno reviewed Oct 8, 2022

View reviewed changes

base/compiler/tfuncs.jl Show resolved Hide resolved

aviatesk force-pushed the avi/nospecialized-keywordfunc-perf branch 2 times, most recently from a86b36b to d55f00c Compare October 8, 2022 02:36

aviatesk force-pushed the avi/nospecialized-keywordfunc-perf branch from d55f00c to 5dd7a0b Compare October 8, 2022 03:58

aviatesk force-pushed the avi/nospecialized-keywordfunc-perf branch from 5dd7a0b to b8a6b10 Compare October 8, 2022 04:22

aviatesk merged commit 95cfd62 into master Oct 8, 2022

aviatesk deleted the avi/nospecialized-keywordfunc-perf branch October 8, 2022 06:39

vtjnash reviewed Oct 26, 2022

View reviewed changes

maleadt mentioned this pull request Nov 7, 2022

Performance regression in master when splatting tuple of pairs #47479

Closed

KristofferC pushed a commit that referenced this pull request Dec 20, 2022

Revert "improve performance issue of @nospecialize-d keyword func c…

51f1df2

…all (#47059)" This reverts commit 95cfd62.

KristofferC mentioned this pull request Dec 20, 2022

incorrect inference of NamedTuple{(), <:Any} #47481

Closed

KristofferC pushed a commit that referenced this pull request Dec 20, 2022

Revert "improve performance issue of @nospecialize-d keyword func c…

5a684f0

…all (#47059)" This reverts commit 95cfd62.

aviatesk mentioned this pull request Dec 21, 2022

fix incorrect inference of NamedTuple{(), <:Any} #47947

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix performance issue of `@nospecialize`-d keyword func call #47059

fix performance issue of `@nospecialize`-d keyword func call #47059

aviatesk commented Oct 5, 2022 •

edited

Loading

This comment was marked as outdated.

aviatesk Oct 5, 2022

This comment was marked as outdated.

aviatesk commented Oct 5, 2022

nanosoldier commented Oct 5, 2022

Keno Oct 5, 2022

Keno commented Oct 5, 2022

aviatesk commented Oct 7, 2022

Keno commented Oct 7, 2022

aviatesk commented Oct 7, 2022

Keno commented Oct 7, 2022 •

edited

Loading

aviatesk commented Oct 7, 2022

nanosoldier commented Oct 7, 2022

aviatesk commented Oct 8, 2022

aviatesk commented Oct 8, 2022

nanosoldier commented Oct 8, 2022

aviatesk commented Oct 8, 2022

aviatesk commented Oct 8, 2022

nanosoldier commented Oct 8, 2022

vtjnash Oct 26, 2022

JeffBezanson Nov 7, 2022

fix performance issue of @nospecialize-d keyword func call #47059

fix performance issue of @nospecialize-d keyword func call #47059

Conversation

aviatesk commented Oct 5, 2022 • edited Loading

This comment was marked as outdated.

aviatesk Oct 5, 2022

Choose a reason for hiding this comment

This comment was marked as outdated.

aviatesk commented Oct 5, 2022

nanosoldier commented Oct 5, 2022

Keno Oct 5, 2022

Choose a reason for hiding this comment

Keno commented Oct 5, 2022

aviatesk commented Oct 7, 2022

Keno commented Oct 7, 2022

aviatesk commented Oct 7, 2022

Keno commented Oct 7, 2022 • edited Loading

aviatesk commented Oct 7, 2022

nanosoldier commented Oct 7, 2022

aviatesk commented Oct 8, 2022

aviatesk commented Oct 8, 2022

nanosoldier commented Oct 8, 2022

aviatesk commented Oct 8, 2022

aviatesk commented Oct 8, 2022

nanosoldier commented Oct 8, 2022

vtjnash Oct 26, 2022

Choose a reason for hiding this comment

JeffBezanson Nov 7, 2022

Choose a reason for hiding this comment

fix performance issue of `@nospecialize`-d keyword func call #47059

fix performance issue of `@nospecialize`-d keyword func call #47059

aviatesk commented Oct 5, 2022 •

edited

Loading

Keno commented Oct 7, 2022 •

edited

Loading