Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix performance issue of @nospecialize-d keyword func call #47059

Merged
merged 1 commit into from
Oct 8, 2022

Conversation

aviatesk
Copy link
Member

@aviatesk aviatesk commented Oct 5, 2022

This commit tries to fix and improve performance for calling keyword
funcs whose arguments types are not fully known but @nospecialize-d.

The final result would look like (this particular example is taken from
our Julia-level compiler implementation):

abstract type CallInfo end
struct NoCallInfo <: CallInfo end
struct NewInstruction
    stmt::Any
    type::Any
    info::CallInfo
    line::Int32
    flag::UInt8
    function NewInstruction(@nospecialize(stmt), @nospecialize(type), @nospecialize(info::CallInfo),
                            line::Int32, flag::UInt8)
        return new(stmt, type, info, line, flag)
    end
end
@nospecialize
function NewInstruction(newinst::NewInstruction;
    stmt=newinst.stmt,
    type=newinst.type,
    info::CallInfo=newinst.info,
    line::Int32=newinst.line,
    flag::UInt8=newinst.flag)
    return NewInstruction(stmt, type, info, line, flag)
end
@specialize

using BenchmarkTools
struct VirtualKwargs
    stmt::Any
    type::Any
    info::CallInfo
end
vkws = VirtualKwargs(nothing, Any, NoCallInfo())
newinst = NewInstruction(nothing, Any, NoCallInfo(), zero(Int32), zero(UInt8))
runner(newinst, vkws) = NewInstruction(newinst; vkws.stmt, vkws.type, vkws.info)
@benchmark runner($newinst, $vkws)

on master

BenchmarkTools.Trial: 10000 samples with 186 evaluations.
 Range (min … max):  559.898 ns …   4.173 μs  ┊ GC (min … max): 0.00% … 85.29%
 Time  (median):     605.608 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   638.170 ns ± 125.080 ns  ┊ GC (mean ± σ):  0.06% ±  0.85%

  █▇▂▆▄  ▁█▇▄▂                                                  ▂
  ██████▅██████▇▇▇██████▇▇▇▆▆▅▄▅▄▂▄▄▅▇▆▆▆▆▆▅▆▆▄▄▅▅▄▃▄▄▄▅▃▅▅▆▅▆▆ █
  560 ns        Histogram: log(frequency) by time       1.23 μs <

 Memory estimate: 32 bytes, allocs estimate: 2.

on this commit

BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  3.080 ns  83.177 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     3.098 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.118 ns ±  0.885 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▂▅▇█▆▅▄▂
  ▂▄▆▆▇████████▆▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▁▂▂▂▁▂▂▂▂▂▂▁▁▂▁▂▂▂▂▂▂▂▂▂ ▃
  3.08 ns        Histogram: frequency by time        3.19 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

So for this particular case it achieves roughly 200x speed up.
This is because this commit allows inlining of a call to keyword sorter
as well as removal of NamedTuple call.

Especially this commit is composed of the following improvements:

  • Add early return case for structdiff:
    This change improves the return type inference for a case when
    compared NamedTuples are type unstable but there is no difference
    in their names, e.g. given two NamedTuple{(:a,:b),T} where T<:Tuple{Any,Any}s.
    And in such case the optimizer will remove structdiff and succeeding
    pairs calls, letting the keyword sorter to be inlined.
  • Tweak the core NamedTuple{names}(args::Tuple) constructor so that it
    directly forms :splatnew allocation rather than redirects to the
    general NamedTuple constructor, that could be confused for abstract
    input tuple type.
  • Improve nfields_tfunc accuracy as for abstract NamedTuple types.
    This improvement lets inline_splatnew to handle more abstract
    NamedTuples, especially whose names are fully known but its fields
    tuple type is abstract.

Those improvements are combined to allow our SROA pass to optimize away
NamedTuple and tuple calls generated for keyword argument handling.
E.g. the IR for the example NewInstruction constructor is now fairly
optimized, like:

julia> Base.code_ircode((NewInstruction,Any,Any,CallInfo)) do newinst, stmt, type, info
           NewInstruction(newinst; stmt, type, info)
       end |> only
2 1%1 = Base.getfield(_2, :line)::Int32                  │╻╷  Type##kw%2 = Base.getfield(_2, :flag)::UInt8                  ││┃   getproperty
  │   %3 = %new(Main.NewInstruction, _3, _4, _5, %1, %2)::NewInstructionstruction
  └──      return %3=> NewInstruction

@aviatesk aviatesk added keyword arguments f(x; keyword=arguments) compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) labels Oct 5, 2022
@aviatesk

This comment was marked as outdated.

Comment on lines 1599 to 1610
elseif isa(appl, DataType) && appl.name === _NAMEDTUPLE_NAME && appl.parameters[1] === ()
# if the first parameter of `NamedTuple` is known to be empty tuple,
# the second argument should also be empty tuple type,
# so refine it here
return Const(NamedTuple{(),Tuple{}})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is unnecessary, but I added this while working on this PR, and I think this strictly improves the inference accuracy. Test cases added.

@nanosoldier

This comment was marked as outdated.

@aviatesk aviatesk force-pushed the avi/nospecialized-keywordfunc-perf branch from bf0ce6a to 9754b6d Compare October 5, 2022 13:01
@aviatesk
Copy link
Member Author

aviatesk commented Oct 5, 2022

@nanosoldier runbenchmarks(!"scalar", vs=":master")

@aviatesk aviatesk changed the title improve performance issue of @nospecialize-d keyword func call fix performance issue of @nospecialize-d keyword func call Oct 5, 2022
@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

if is_known_call(argexpr, tuple, compact) && length(ns) == length(argexpr.args)-1
# ok, we know this NamedTuple construction is nothrow,
# let's mark this NamedTuple as DCE-eligible
compact[leaf::AnySSAValue][:flag] |= IR_FLAG_EFFECT_FREE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we know this from interprocedural analysis?

@Keno
Copy link
Member

Keno commented Oct 5, 2022

%2 = NamedTuple{(:a, :b, :c)(%1)::NamedTuple{(:a, :b, :c), _A} where _A<:Tuple{Any, Any, Any}

Why doesn't this get inlined into splatnew? Are we just missing a case for it in the inlining cost model? We're also probably missing SROA support for splatnew, but I'd rather fix that than special-casing NamedTuple here.

@aviatesk aviatesk force-pushed the avi/nospecialized-keywordfunc-perf branch from 9754b6d to deb87b9 Compare October 7, 2022 08:33
@aviatesk
Copy link
Member Author

aviatesk commented Oct 7, 2022

Why doesn't this get inlined into splatnew? Are we just missing a case for it in the inlining cost model?

This call is from

NamedTuple{names}(args::Tuple) where {names} = NamedTuple{names,typeof(args)}(args)

and the redirected constructor call is union split to:

  • julia/base/boot.jl

    Lines 622 to 623 in deb87b9

    eval(Core, :(NamedTuple{names,T}(args::T) where {names, T <: Tuple} =
    $(Expr(:splatnew, :(NamedTuple{names,T}), :args))))
  • julia/base/namedtuple.jl

    Lines 90 to 97 in deb87b9

    @eval function NamedTuple{names,T}(args::Tuple) where {names, T <: Tuple}
    if length(args) != length(names::Tuple)
    throw(ArgumentError("Wrong number of arguments to named tuple constructor."))
    end
    # Note T(args) might not return something of type T; e.g.
    # Tuple{Type{Float64}}((Float64,)) returns a Tuple{DataType}
    $(Expr(:splatnew, :(NamedTuple{names,T}), :(T(args))))
    end

And the latter contains dynamic dispatch since args is abstract tuple type (and somehow the former is also judged as non-inlineable somehow).


@nanosoldier runbenchmarks("inference", vs=":master")

@Keno
Copy link
Member

Keno commented Oct 7, 2022

And the latter contains dynamic dispatch since args is abstract tuple type (and somehow the former is also judged as non-inlineable somehow).

Can we use the regular error check pattern to switch that around? I.e.

@noinline function _check_nt_length(names, args)
     if length(args) != length(names::Tuple) 
         throw(ArgumentError("Wrong number of arguments to named tuple constructor.")) 
     end
     return nothing 
end

And then throw an @inline on the function containing the splatnew? Inlining those kinds of things is usually quite valuable.

@aviatesk
Copy link
Member Author

aviatesk commented Oct 7, 2022

Can we use the regular error check pattern to switch that around?

Well, the problematic dynamic dispatch is not that error path but T(args), which is union-split to

julia/base/essentials.jl

Lines 411 to 420 in deb87b9

convert(::Type{T}, x::T) where {T<:Tuple} = x
function convert(::Type{T}, x::NTuple{N,Any}) where {N, T<:Tuple}
# First see if there could be any conversion of the input type that'd be a subtype of the output.
# If not, we'll throw an explicit MethodError (otherwise, it might throw a typeassert).
if typeintersect(NTuple{N,Any}, T) === Union{}
_tuple_error(T, x)
end
cvt1(n) = (@inline; convert(fieldtype(T, n), getfield(x, n, #=boundscheck=#false)))
return ntuple(cvt1, Val(N))::NTuple{N,Any}
end

and the latter split is confused for abstract tuple input.

@Keno
Copy link
Member

Keno commented Oct 7, 2022

Ah, ok, so the issue is that we have:

NamedTuple{names}(args::Tuple) where {names} = NamedTuple{names,typeof(args)}(args)

but inference looses the type constraint that the second type parameter is typeequal to typeof(args), so it tries to unionsplit it into that mess. How about the following change then:

diff --git a/base/boot.jl b/base/boot.jl
index 5f3b99df1c..4e02725fc3 100644
--- a/base/boot.jl
+++ b/base/boot.jl
@@ -615,7 +615,8 @@ end
 
 NamedTuple() = NamedTuple{(),Tuple{}}(())
 
-NamedTuple{names}(args::Tuple) where {names} = NamedTuple{names,typeof(args)}(args)
+eval(Core, :(NamedTuple{names}(args::Tuple) =
+             $(Expr(:splatnew, :(NamedTuple{names,typeof(args)}), :args))))
 
 using .Intrinsics: sle_int, add_int

That should also save us some inference time by not having to infer through the useless unionsplit.

@aviatesk
Copy link
Member Author

aviatesk commented Oct 7, 2022

That sounds quite simple and better! I will work on implementing SROA for :splatnew.

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

@aviatesk aviatesk force-pushed the avi/nospecialized-keywordfunc-perf branch from deb87b9 to 44d92cd Compare October 8, 2022 01:40
@aviatesk
Copy link
Member Author

aviatesk commented Oct 8, 2022

Okay, this PR should be ready.
I found we actually have inline_splatnew! pass within the inlining that transforms :splatnew to :new expression. And the root problem turned out to be that our nfields_tfunc isn't accurate enough to handle abstract NamedTuple whose names are nevertheless fully known and thus the number of their fields could be known.
This PR improves that accuracy, and now our SROA pass optimizes away more NamedTuples constructed for keyword argument handling.

@aviatesk
Copy link
Member Author

aviatesk commented Oct 8, 2022

@nanosoldier runbenchmarks("inference", vs=":master")

@aviatesk aviatesk force-pushed the avi/nospecialized-keywordfunc-perf branch 2 times, most recently from a86b36b to d55f00c Compare October 8, 2022 02:36
@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

@aviatesk aviatesk force-pushed the avi/nospecialized-keywordfunc-perf branch from d55f00c to 5dd7a0b Compare October 8, 2022 03:58
@aviatesk
Copy link
Member Author

aviatesk commented Oct 8, 2022

The benchmark results look promising. Going to merge once confirm successful CI.

@aviatesk
Copy link
Member Author

aviatesk commented Oct 8, 2022

@nanosoldier runbenchmarks(!"scalar", vs=":master")

This commit tries to fix and improve performance for calling keyword
funcs whose arguments types are not fully known but `@nospecialize`-d.

The final result would look like (this particular example is taken from
our Julia-level compiler implementation):
```julia
abstract type CallInfo end
struct NoCallInfo <: CallInfo end
struct NewInstruction
    stmt::Any
    type::Any
    info::CallInfo
    line::Union{Int32,Nothing} # if nothing, copy the line from previous statement in the insertion location
    flag::Union{UInt8,Nothing} # if nothing, IR flags will be recomputed on insertion
    function NewInstruction(@nospecialize(stmt), @nospecialize(type), @nospecialize(info::CallInfo),
                            line::Union{Int32,Nothing}, flag::Union{UInt8,Nothing})
        return new(stmt, type, info, line, flag)
    end
end
@nospecialize
function NewInstruction(newinst::NewInstruction;
    stmt=newinst.stmt,
    type=newinst.type,
    info::CallInfo=newinst.info,
    line::Union{Int32,Nothing}=newinst.line,
    flag::Union{UInt8,Nothing}=newinst.flag)
    return NewInstruction(stmt, type, info, line, flag)
end
@Specialize

using BenchmarkTools
struct VirtualKwargs
    stmt::Any
    type::Any
    info::CallInfo
end
vkws = VirtualKwargs(nothing, Any, NoCallInfo())
newinst = NewInstruction(nothing, Any, NoCallInfo(), nothing, nothing)
runner(newinst, vkws) = NewInstruction(newinst; vkws.stmt, vkws.type, vkws.info)
@benchmark runner($newinst, $vkws)
```

> on master
```
BenchmarkTools.Trial: 10000 samples with 186 evaluations.
 Range (min … max):  559.898 ns …   4.173 μs  ┊ GC (min … max): 0.00% … 85.29%
 Time  (median):     605.608 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   638.170 ns ± 125.080 ns  ┊ GC (mean ± σ):  0.06% ±  0.85%

  █▇▂▆▄  ▁█▇▄▂                                                  ▂
  ██████▅██████▇▇▇██████▇▇▇▆▆▅▄▅▄▂▄▄▅▇▆▆▆▆▆▅▆▆▄▄▅▅▄▃▄▄▄▅▃▅▅▆▅▆▆ █
  560 ns        Histogram: log(frequency) by time       1.23 μs <

 Memory estimate: 32 bytes, allocs estimate: 2.
```

> on this commit
```julia
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  3.080 ns … 83.177 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.098 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.118 ns ±  0.885 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▂▅▇█▆▅▄▂
  ▂▄▆▆▇████████▆▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▁▂▂▂▁▂▂▂▂▂▂▁▁▂▁▂▂▂▂▂▂▂▂▂ ▃
  3.08 ns        Histogram: frequency by time        3.19 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

So for this particular case it achieves roughly 200x speed up.
This is because this commit allows inlining of a call to keyword sorter
as well as removal of `NamedTuple` call.

Especially this commit is composed of the following improvements:
- Add early return case for `structdiff`:
  This change improves the return type inference for a case when
  compared `NamedTuple`s are type unstable but there is no difference
  in their names, e.g. given two `NamedTuple{(:a,:b),T} where T<:Tuple{Any,Any}`s.
  And in such case the optimizer will remove `structdiff` and succeeding
  `pairs` calls, letting the keyword sorter to be inlined.
- Tweak the core `NamedTuple{names}(args::Tuple)` constructor so that it
  directly forms `:splatnew` allocation rather than redirects to the
  general `NamedTuple` constructor, that could be confused for abstract
  input tuple type.
- Improve `nfields_tfunc` accuracy as for abstract `NamedTuple` types.
  This improvement lets `inline_splatnew` to handle more abstract
  `NamedTuple`s, especially whose names are fully known but its fields
  tuple type is abstract.

Those improvements are combined to allow our SROA pass to optimize away
`NamedTuple` and `tuple` calls generated for keyword argument handling.
E.g. the IR for the example `NewInstruction` constructor is now fairly
optimized, like:
```julia
julia> Base.code_ircode((NewInstruction,Any,Any,CallInfo)) do newinst, stmt, type, info
           NewInstruction(newinst; stmt, type, info)
       end |> only
2 1 ── %1  = Base.getfield(_2, :line)::Union{Nothing, Int32}                    │╻╷  Type##kw
  │    %2  = Base.getfield(_2, :flag)::Union{Nothing, UInt8}                    ││┃   getproperty
  │    %3  = (isa)(%1, Nothing)::Bool                                           ││
  │    %4  = (isa)(%2, Nothing)::Bool                                           ││
  │    %5  = (Core.Intrinsics.and_int)(%3, %4)::Bool                            ││
  └───       goto #3 if not %5                                                  ││
  2 ── %7  = %new(Main.NewInstruction, _3, _4, _5, nothing, nothing)::NewInstruction   NewInstruction
  └───       goto #10                                                           ││
  3 ── %9  = (isa)(%1, Int32)::Bool                                             ││
  │    %10 = (isa)(%2, Nothing)::Bool                                           ││
  │    %11 = (Core.Intrinsics.and_int)(%9, %10)::Bool                           ││
  └───       goto #5 if not %11                                                 ││
  4 ── %13 = π (%1, Int32)                                                      ││
  │    %14 = %new(Main.NewInstruction, _3, _4, _5, %13, nothing)::NewInstruction│││╻   NewInstruction
  └───       goto #10                                                           ││
  5 ── %16 = (isa)(%1, Nothing)::Bool                                           ││
  │    %17 = (isa)(%2, UInt8)::Bool                                             ││
  │    %18 = (Core.Intrinsics.and_int)(%16, %17)::Bool                          ││
  └───       goto #7 if not %18                                                 ││
  6 ── %20 = π (%2, UInt8)                                                      ││
  │    %21 = %new(Main.NewInstruction, _3, _4, _5, nothing, %20)::NewInstruction│││╻   NewInstruction
  └───       goto #10                                                           ││
  7 ── %23 = (isa)(%1, Int32)::Bool                                             ││
  │    %24 = (isa)(%2, UInt8)::Bool                                             ││
  │    %25 = (Core.Intrinsics.and_int)(%23, %24)::Bool                          ││
  └───       goto #9 if not %25                                                 ││
  8 ── %27 = π (%1, Int32)                                                      ││
  │    %28 = π (%2, UInt8)                                                      ││
  │    %29 = %new(Main.NewInstruction, _3, _4, _5, %27, %28)::NewInstruction    │││╻   NewInstruction
  └───       goto #10                                                           ││
  9 ──       Core.throw(ErrorException("fatal error in type inference (type bound)"))::Union{}
  └───       unreachable                                                        ││
  10 ┄ %33 = φ (#2 => %7, #4 => %14, #6 => %21, #8 => %29)::NewInstruction      ││
  └───       goto #11                                                           ││
  11 ─       return %33                                                         │
   => NewInstruction
```
@aviatesk aviatesk force-pushed the avi/nospecialized-keywordfunc-perf branch from 5dd7a0b to b8a6b10 Compare October 8, 2022 04:22
@aviatesk aviatesk merged commit 95cfd62 into master Oct 8, 2022
@aviatesk aviatesk deleted the avi/nospecialized-keywordfunc-perf branch October 8, 2022 06:39
@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

# if the first/second parameter of `NamedTuple` is known to be empty,
# the second/first argument should also be empty tuple type,
# so refine it here
return Const(NamedTuple{(),Tuple{}})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reasoning seems faulty, since couldn't the parameter also be a TypeVar of any sort?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're quite right:

julia> (()->NamedTuple{(), <:Any})()
NamedTuple{(), Tuple{}}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) keyword arguments f(x; keyword=arguments)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants