
Performance regression with StaticArray broadcast #30124

Closed
mohamed82008 opened this issue Nov 23, 2018 · 5 comments · Fixed by #30420
Labels: broadcast (Applying a function over a collection), performance (Must go faster), regression (Regression in behavior compared to a previous version)
Milestone: 1.1

mohamed82008 commented Nov 23, 2018

EDIT by @maleadt: see #30124 (comment) for a non-CUDAnative-specific reproducer.

The following code works on Julia 1.0.2 but fails on Julia commit 8a4f20b (2018-11-19 01:50 UTC). The issue was first reported in JuliaGPU/CUDAnative.jl#291, and @maleadt bisected it to commit 9e98386. If you replace the multiplication line with the commented line below it, it works fine.

using CUDAnative, CuArrays, StaticArrays

Ks = [SMatrix{1, 1, Float64}(rand(1,1))] |> CuArrays.CuArray
fs = [SVector{1, Float64}(rand(1))] |> CuArrays.CuArray

function kernel(fs::AbstractVector{TV}, Ks) where {N, T, TV<:SVector{N,T}}
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    m = 2.0
    if i == 1
        fs[i] = m * (Ks[i] * fs[i])
        #fs[i] = SVector{1,T}((m,)) .* (Ks[i] * fs[i])
    end
    return
end

@cuda blocks=1 threads=1 kernel(fs, Ks)

#=
ERROR: InvalidIRError: compiling kernel(CuDeviceArray{SArray{Tuple{3},Float64,1,3},1,CUDAnative.AS.Global}, CuDeviceArray{SArray{Tuple{3,3},Float64,2,9},1,CUDAnative.AS.Global}) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_invoke)
Stacktrace:
 [1] Type at broadcast.jl:141
 [2] result_style at broadcast.jl:397
 [3] combine_styles at broadcast.jl:390
 [4] broadcasted at broadcast.jl:1162
 [5] broadcast at broadcast.jl:702
 [6] * at /home/mohd/.julia/packages/StaticArrays/WmJnA/src/linalg.jl:25
 [7] kernel at REPL[18]:5
Stacktrace:
 [1] check_ir(::CUDAnative.CompilerContext, ::LLVM.Module) at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/compiler/validation.jl:123
 [2] #compile#80(::Bool, ::Function, ::CUDAnative.CompilerContext) at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/compiler/driver.jl:74
 [3] compile at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/compiler/driver.jl:49 [inlined]
 [4] #compile#79(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::CUDAdrv.CuDevice, ::Any, ::Any) at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/compiler/driver.jl:28
 [5] compile at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/compiler/driver.jl:16 [inlined]
 [6] macro expansion at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/execution.jl:255 [inlined]
 [7] #cufunction#91(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(cufunction), ::typeof(kernel), ::Type{Tuple{CuDeviceArray{SArray{Tuple{3},Float64,1,3},1,CUDAnative.AS.Global},CuDeviceArray{SArray{Tuple{3,3},Float64,2,9},1,CUDAnative.AS.Global}}}) at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/execution.jl:230
 [8] cufunction(::Function, ::Type) at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/execution.jl:230
 [9] top-level scope at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/execution.jl:198
 [10] top-level scope at gcutils.jl:87
 [11] top-level scope at /home/mohd/.julia/packages/CUDAnative/5H3Dk/src/execution.jl:195
=#

My versioninfo() is:

Julia Version 1.1.0-DEV.681
Commit 8a4f20b887 (2018-11-19 01:50 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Good luck!

maleadt commented Nov 23, 2018

Good luck!

...

That's not what I meant with a reduced version. Try inspecting the typed IR (@device_code_warntype) on a working and failing set-up, ideally reproducing the issue without StaticArrays or CUDAnative. There's probably an inference issue at the bottom of this.
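
A minimal sketch of the kind of inspection being suggested, assuming the kernel, fs, and Ks from the reproducer above are in scope:

using CUDAnative, InteractiveUtils, StaticArrays

# GPU side: dump the typed IR that CUDAnative generates for the kernel launch
@device_code_warntype @cuda blocks=1 threads=1 kernel(fs, Ks)

# CPU side: the same scalar-times-SMatrix operation without CUDAnative,
# to check whether the runtime calls already show up in plain host code
K = SMatrix{1,1}(rand(1,1))
@code_warntype 2.0 * K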

@maleadt added the regression (Regression in behavior compared to a previous version) and gpu (Affects running Julia on a GPU) labels Nov 23, 2018
mohamed82008 commented:
That's not what I meant with a reduced version. Try inspecting the typed IR (@device_code_warntype) on a working and failing set-up, ideally reproducing the issue without StaticArrays or CUDAnative. There's probably an inference issue at the bottom of this.

Oops, sorry, that's all I have time for at the moment. Inspecting the typed IR is beyond my comfort zone at this point.

maleadt commented Nov 29, 2018

Further reduced (this had already been bisected to 9e98386, from #29843):

using StaticArrays # 0.10.0

K = SMatrix{1,1}(rand(1,1))

using InteractiveUtils
versioninfo()
@code_warntype 2.0 * K

using BenchmarkTools
@btime 2.0 * $K

Before and after the offending commit:

Julia Version 1.1.0-DEV.590
Commit 817f6fc7ee* (2018-11-01 16:31 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, znver1)
Environment:
  JULIA_DEPOT_PATH = /tmp/x
Body::SArray{Tuple{1,1},Float64,2,1}
25 1 ─ %1 = (Base.getfield)(b, :data)::Tuple{Float64}                                │╻╷╷╷╷╷╷ broadcast
   │   %2 = (Base.getfield)(%1, 1, false)::Float64                                   ││╻       materialize
   │   %3 = (Base.mul_float)(a, %2)::Float64                                         │││╻       copy
   │   %4 = (StaticArrays.tuple)(%3)::Tuple{Float64}                                 ││││┃│      _broadcast
   │   %5 = %new(SArray{Tuple{1,1},Float64,2,1}, %4)::SArray{Tuple{1,1},Float64,2,1} │││││╻       macro expansion
   └──      return %5                                                                │       
  0.020 ns (0 allocations: 0 bytes)
Julia Version 1.1.0-DEV.591
Commit 9e98386bea* (2018-11-01 16:31 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, znver1)
Environment:
  JULIA_DEPOT_PATH = /tmp/x
Body::SArray{Tuple{1,1},Float64,2,1}
25 1 ─ %1 = Base.typename::typeof(Base.typename)                                     │╻╷╷╷╷   broadcast
   │        invoke %1(Base.Broadcast.DefaultArrayStyle{0}::DataType)                 ││┃│││    broadcasted
   │   %3 = Base.typename::typeof(Base.typename)                                     │││┃││     combine_styles
   │        invoke %3(StaticArrays.StaticArrayStyle{2}::DataType)                    ││││┃│      result_style
   │   %5 = (Base.getfield)(b, :data)::Tuple{Float64}                                │││╻╷╷╷╷   copy
   │   %6 = (Base.getfield)(%5, 1, false)::Float64                                   ││││╻       _broadcast
   │   %7 = (Base.mul_float)(a, %6)::Float64                                         │││││╻       macro expansion
   │   %8 = (StaticArrays.tuple)(%7)::Tuple{Float64}                                 ││││││  
   │   %9 = %new(SArray{Tuple{1,1},Float64,2,1}, %8)::SArray{Tuple{1,1},Float64,2,1} ││││││╻       Type
   └──      return %9                                                                │       
  6.712 ns (0 allocations: 0 bytes)

So this is not limited to CUDAnative's Julia support; it is a serious broadcast/StaticArrays regression.

@maleadt added the performance (Must go faster) and broadcast (Applying a function over a collection) labels and removed the gpu (Affects running Julia on a GPU) label Nov 29, 2018
@maleadt changed the title from "Scalar multiplying static vector in CUDAnative kernel makes unsupported call to Julia runtime in Julia master but not 1.0.2" to "Performance regression with StaticArray broadcast" Nov 29, 2018
mbauman commented Nov 29, 2018

Nice work reducing and bisecting to the commit. My hunch is that the cause is the removal of the @pure here: https://github.com/JuliaLang/julia/pull/29843/files#diff-8e941cd1878722cc12a54e01537b8e18L137. It looks like Julia is just calling typename for the sake of a possible side-effect? I had hoped that throwing a @pure annotation on just the typename methods would be sufficient, but that doesn't appear to do the trick interactively.
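
For illustration, a rough sketch of the effect such a @pure annotation has (hypothetical helper, not the actual Base.Broadcast definition): when inference can constant-fold the typename comparison, no runtime invoke of typename is left in the typed IR, which matches the fast pre-9e98386 output above.

using InteractiveUtils

# Hypothetical stand-in for the Broadcast style check; Base.@pure lets inference
# evaluate the call at compile time when both arguments are constant types.
Base.@pure same_typename(::Type{A}, ::Type{B}) where {A,B} =
    Base.typename(A) === Base.typename(B)

f() = same_typename(Base.Broadcast.DefaultArrayStyle{0}, Base.Broadcast.DefaultArrayStyle{1})
@code_warntype f()   # with the annotation this should fold to `return true`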

@KristofferC added this to the 1.1 milestone Nov 30, 2018
JeffBezanson commented:
Bump. I assume this is a blocker for 1.1?
