Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow @nospecialize-d push! to take arbitrary items #45790

Merged
merged 2 commits into from
Jun 27, 2022
Merged

Conversation

aviatesk
Copy link
Member

@aviatesk aviatesk commented Jun 23, 2022

Currently the @nospecialize-d push!(::Vector{Any}, ...) can only
take a single item and we will end up with runtime dispatch when we try
to call it with multiple items:

julia> code_typed(push!, (Vector{Any}, Any))
1-element Vector{Any}:
 CodeInfo(
1$(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing%2 = Base.arraylen(a)::Int64
│        Base.arrayset(true, a, item, %2)::Vector{Any}
└──      return a
) => Vector{Any}

julia> code_typed(push!, (Vector{Any}, Any, Any))
1-element Vector{Any}:
 CodeInfo(
1%1 = Base.append!(a, iter)::Vector{Any}
└──      return %1
) => Vector{Any}

This commit extends it so that it can take arbitrarily long items.
Our compiler should still be able to optimize the single-input case as
before by unrolling using the constant item length information:

julia> code_typed(push!, (Vector{Any}, Any))
1-element Vector{Any}:
 CodeInfo(
1$(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing%2 = Base.arraylen(a)::Int64
│        Base.arrayset(true, a, item, %2)::Vector{Any}
└──      return a
) => Vector{Any}

julia> code_typed(push!, (Vector{Any}, Any, Any))
1-element Vector{Any}:
 CodeInfo(
1%1  = Base.arraylen(a)::Int64$(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000002, 0x0000000000000002))::Nothing
└──       goto #7 if not true
2%4  = φ (#1 => 1, #6 => %14)::Int64%5  = φ (#1 => 1, #6 => %15)::Int64%6  = Base.getfield(x, %4, true)::Any%7  = Base.add_int(%1, %4)::Int64
│         Base.arrayset(true, a, %6, %7)::Vector{Any}%9  = (%5 === 2)::Bool
└──       goto #4 if not %9
3 ─       goto #5
4%12 = Base.add_int(%5, 1)::Int64
└──       goto #5
5%14 = φ (#4 => %12)::Int64%15 = φ (#4 => %12)::Int64%16 = φ (#3 => true, #4 => false)::Bool%17 = Base.not_int(%16)::Bool
└──       goto #7 if not %17
6 ─       goto #2
7return a
) => Vector{Any}

@nanosoldier runbenchmarks("inference", vs=":master")

@aviatesk aviatesk requested a review from vtjnash June 23, 2022 09:08
@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@test count(iscall((src, push!)), src.code) == 0
@test count(src.code) do @nospecialize x
isa(x, Core.GotoNode) || isa(x, Core.GotoIfNot)
end == 0 # the loop should be optimized away for a single item
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be failing on many CI bots

Copy link
Member Author

@aviatesk aviatesk Jun 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like push!(::Vector{Any}, ::Any) produces the following IR:

julia> code_typed((Vector{Any},Any)) do xs, x
           push!(xs, x)
       end
1-element Vector{Any}:
 CodeInfo(
1%1 = Base.arraylen(xs)::Int64$(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing
└──      goto #3 if not true
2%4 = Base.add_int(%1, 1)::Int64
└──      Base.arrayset(true, xs, x, %4)::Vector{Any}
3 ┄      goto #4
4return xs
) => Vector{Any}

So the loop can't be fully optimized on Julia level actually, but I confirmed LLVM can further optimize it down the road and the performance doesn't regress.

EDIT: A nanosolidier result suggested that loop can be left unoptimized. For now, I'd like to specialize the single-arg case and optimize it manually.

@aviatesk
Copy link
Member Author

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk
Copy link
Member Author

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

Currently the `@nospecialize`-d `push!(::Vector{Any}, ...)` can only
take a single item and we will end up with runtime dispatch when we try
to call it with multiple items:
```julia
julia> code_typed(push!, (Vector{Any}, Any))
1-element Vector{Any}:
 CodeInfo(
1 ─      $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing
│   %2 = Base.arraylen(a)::Int64
│        Base.arrayset(true, a, item, %2)::Vector{Any}
└──      return a
) => Vector{Any}

julia> code_typed(push!, (Vector{Any}, Any, Any))
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = Base.append!(a, iter)::Vector{Any}
└──      return %1
) => Vector{Any}
```

This commit extends it so that it can take arbitrary-length items.
Our compiler should still be able to optimize the single-input case as
before by unrolling using the constant item length information:
```julia
julia> code_typed(push!, (Vector{Any}, Any))
1-element Vector{Any}:
 CodeInfo(
1 ─      $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing
│   %2 = Base.arraylen(a)::Int64
│        Base.arrayset(true, a, item, %2)::Vector{Any}
└──      return a
) => Vector{Any}

julia> code_typed(push!, (Vector{Any}, Any, Any))
1-element Vector{Any}:
 CodeInfo(
1 ─ %1  = Base.arraylen(a)::Int64
│         $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000002, 0x0000000000000002))::Nothing
└──       goto #7 if not true
2 ┄ %4  = φ (#1 => 1, #6 => %14)::Int64
│   %5  = φ (#1 => 1, #6 => %15)::Int64
│   %6  = Base.getfield(x, %4, true)::Any
│   %7  = Base.add_int(%1, %4)::Int64
│         Base.arrayset(true, a, %6, %7)::Vector{Any}
│   %9  = (%5 === 2)::Bool
└──       goto #4 if not %9
3 ─       goto #5
4 ─ %12 = Base.add_int(%5, 1)::Int64
└──       goto #5
5 ┄ %14 = φ (#4 => %12)::Int64
│   %15 = φ (#4 => %12)::Int64
│   %16 = φ (#3 => true, #4 => false)::Bool
│   %17 = Base.not_int(%16)::Bool
└──       goto #7 if not %17
6 ─       goto #2
7 ┄       return a
) => Vector{Any}
```
@aviatesk
Copy link
Member Author

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

For `Vector{Any}` to avoid runtime dispatch.
@aviatesk
Copy link
Member Author

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk aviatesk merged commit c686e4a into master Jun 27, 2022
@aviatesk aviatesk deleted the avi/anypush! branch June 27, 2022 00:34
pcjentsch pushed a commit to pcjentsch/julia that referenced this pull request Aug 18, 2022
…Lang#45790)

Currently the `@nospecialize`-d `push!(::Vector{Any}, ...)` can only
take a single item and we will end up with runtime dispatch when we try
to call it with multiple items:
```julia
julia> code_typed(push!, (Vector{Any}, Any))
1-element Vector{Any}:
 CodeInfo(
1 ─      $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing
│   %2 = Base.arraylen(a)::Int64
│        Base.arrayset(true, a, item, %2)::Vector{Any}
└──      return a
) => Vector{Any}

julia> code_typed(push!, (Vector{Any}, Any, Any))
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = Base.append!(a, iter)::Vector{Any}
└──      return %1
) => Vector{Any}
```

This commit adds a new specialization that it can take arbitrary-length
items. Our compiler should still be able to optimize the single-input 
case as before via the dispatch mechanism.
```julia
julia> code_typed(push!, (Vector{Any}, Any))
1-element Vector{Any}:
 CodeInfo(
1 ─      $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000001, 0x0000000000000001))::Nothing
│   %2 = Base.arraylen(a)::Int64
│        Base.arrayset(true, a, item, %2)::Vector{Any}
└──      return a
) => Vector{Any}

julia> code_typed(push!, (Vector{Any}, Any, Any))
1-element Vector{Any}:
 CodeInfo(
1 ─ %1  = Base.arraylen(a)::Int64
│         $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), Core.Argument(2), 0x0000000000000002, 0x0000000000000002))::Nothing
└──       goto JuliaLang#7 if not true
2 ┄ %4  = φ (JuliaLang#1 => 1, JuliaLang#6 => %14)::Int64
│   %5  = φ (JuliaLang#1 => 1, JuliaLang#6 => %15)::Int64
│   %6  = Base.getfield(x, %4, true)::Any
│   %7  = Base.add_int(%1, %4)::Int64
│         Base.arrayset(true, a, %6, %7)::Vector{Any}
│   %9  = (%5 === 2)::Bool
└──       goto JuliaLang#4 if not %9
3 ─       goto JuliaLang#5
4 ─ %12 = Base.add_int(%5, 1)::Int64
└──       goto JuliaLang#5
5 ┄ %14 = φ (JuliaLang#4 => %12)::Int64
│   %15 = φ (JuliaLang#4 => %12)::Int64
│   %16 = φ (JuliaLang#3 => true, JuliaLang#4 => false)::Bool
│   %17 = Base.not_int(%16)::Bool
└──       goto JuliaLang#7 if not %17
6 ─       goto JuliaLang#2
7 ┄       return a
) => Vector{Any}
```

This commit also adds the equivalent implementations for `pushfirst!`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants