-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow @nospecialize
-d push!
to take arbitrary items
#45790
Conversation
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
test/compiler/inline.jl
Outdated
@test count(iscall((src, push!)), src.code) == 0 | ||
@test count(src.code) do @nospecialize x | ||
isa(x, Core.GotoNode) || isa(x, Core.GotoIfNot) | ||
end == 0 # the loop should be optimized away for a single item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be failing on many CI bots
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like push!(::Vector{Any}, ::Any)
produces the following IR:
julia> code_typed((Vector{Any},Any)) do xs, x
push!(xs, x)
end
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = Base.arraylen(xs)::Int64
│ $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing
└── goto #3 if not true
2 ─ %4 = Base.add_int(%1, 1)::Int64
└── Base.arrayset(true, xs, x, %4)::Vector{Any}
3 ┄ goto #4
4 ─ return xs
) => Vector{Any}
So the loop can't be fully optimized on Julia level actually, but I confirmed LLVM can further optimize it down the road and the performance doesn't regress.
EDIT: A nanosolidier result suggested that loop can be left unoptimized. For now, I'd like to specialize the single-arg case and optimize it manually.
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
Currently the `@nospecialize`-d `push!(::Vector{Any}, ...)` can only take a single item and we will end up with runtime dispatch when we try to call it with multiple items: ```julia julia> code_typed(push!, (Vector{Any}, Any)) 1-element Vector{Any}: CodeInfo( 1 ─ $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing │ %2 = Base.arraylen(a)::Int64 │ Base.arrayset(true, a, item, %2)::Vector{Any} └── return a ) => Vector{Any} julia> code_typed(push!, (Vector{Any}, Any, Any)) 1-element Vector{Any}: CodeInfo( 1 ─ %1 = Base.append!(a, iter)::Vector{Any} └── return %1 ) => Vector{Any} ``` This commit extends it so that it can take arbitrary-length items. Our compiler should still be able to optimize the single-input case as before by unrolling using the constant item length information: ```julia julia> code_typed(push!, (Vector{Any}, Any)) 1-element Vector{Any}: CodeInfo( 1 ─ $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing │ %2 = Base.arraylen(a)::Int64 │ Base.arrayset(true, a, item, %2)::Vector{Any} └── return a ) => Vector{Any} julia> code_typed(push!, (Vector{Any}, Any, Any)) 1-element Vector{Any}: CodeInfo( 1 ─ %1 = Base.arraylen(a)::Int64 │ $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000002, 0x0000000000000002))::Nothing └── goto #7 if not true 2 ┄ %4 = φ (#1 => 1, #6 => %14)::Int64 │ %5 = φ (#1 => 1, #6 => %15)::Int64 │ %6 = Base.getfield(x, %4, true)::Any │ %7 = Base.add_int(%1, %4)::Int64 │ Base.arrayset(true, a, %6, %7)::Vector{Any} │ %9 = (%5 === 2)::Bool └── goto #4 if not %9 3 ─ goto #5 4 ─ %12 = Base.add_int(%5, 1)::Int64 └── goto #5 5 ┄ %14 = φ (#4 => %12)::Int64 │ %15 = φ (#4 => %12)::Int64 │ %16 = φ (#3 => true, #4 => false)::Bool │ %17 = Base.not_int(%16)::Bool └── goto #7 if not %17 6 ─ goto #2 7 ┄ return a ) => Vector{Any} ```
@nanosoldier |
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. |
For `Vector{Any}` to avoid runtime dispatch.
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
…Lang#45790) Currently the `@nospecialize`-d `push!(::Vector{Any}, ...)` can only take a single item and we will end up with runtime dispatch when we try to call it with multiple items: ```julia julia> code_typed(push!, (Vector{Any}, Any)) 1-element Vector{Any}: CodeInfo( 1 ─ $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing │ %2 = Base.arraylen(a)::Int64 │ Base.arrayset(true, a, item, %2)::Vector{Any} └── return a ) => Vector{Any} julia> code_typed(push!, (Vector{Any}, Any, Any)) 1-element Vector{Any}: CodeInfo( 1 ─ %1 = Base.append!(a, iter)::Vector{Any} └── return %1 ) => Vector{Any} ``` This commit adds a new specialization that it can take arbitrary-length items. Our compiler should still be able to optimize the single-input case as before via the dispatch mechanism. ```julia julia> code_typed(push!, (Vector{Any}, Any)) 1-element Vector{Any}: CodeInfo( 1 ─ $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing │ %2 = Base.arraylen(a)::Int64 │ Base.arrayset(true, a, item, %2)::Vector{Any} └── return a ) => Vector{Any} julia> code_typed(push!, (Vector{Any}, Any, Any)) 1-element Vector{Any}: CodeInfo( 1 ─ %1 = Base.arraylen(a)::Int64 │ $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000002, 0x0000000000000002))::Nothing └── goto JuliaLang#7 if not true 2 ┄ %4 = φ (JuliaLang#1 => 1, JuliaLang#6 => %14)::Int64 │ %5 = φ (JuliaLang#1 => 1, JuliaLang#6 => %15)::Int64 │ %6 = Base.getfield(x, %4, true)::Any │ %7 = Base.add_int(%1, %4)::Int64 │ Base.arrayset(true, a, %6, %7)::Vector{Any} │ %9 = (%5 === 2)::Bool └── goto JuliaLang#4 if not %9 3 ─ goto JuliaLang#5 4 ─ %12 = Base.add_int(%5, 1)::Int64 └── goto JuliaLang#5 5 ┄ %14 = φ (JuliaLang#4 => %12)::Int64 │ %15 = φ (JuliaLang#4 => %12)::Int64 │ %16 = φ (JuliaLang#3 => true, JuliaLang#4 => false)::Bool │ %17 = Base.not_int(%16)::Bool └── goto JuliaLang#7 if not %17 6 ─ goto JuliaLang#2 7 ┄ return a ) => Vector{Any} ``` This commit also adds the equivalent implementations for `pushfirst!`.
Currently the
@nospecialize
-dpush!(::Vector{Any}, ...)
can onlytake a single item and we will end up with runtime dispatch when we try
to call it with multiple items:
This commit extends it so that it can take arbitrarily long items.
Our compiler should still be able to optimize the single-input case as
before by unrolling using the constant item length information:
@nanosoldier
runbenchmarks("inference", vs=":master")