Hi, I just tested the code from the README, but it errors with CUDA.jl:

```julia
using CUDA
using ImplicitBVH
using ImplicitBVH: BBox, BSphere

# Generate some simple bounding spheres; save them in a GPU array
bounding_spheres = CuArray([
    BSphere{Float32}([0., 0., 0.], 0.5),
    BSphere{Float32}([0., 0., 1.], 0.6),
    BSphere{Float32}([0., 0., 2.], 0.5),
    BSphere{Float32}([0., 0., 3.], 0.4),
    BSphere{Float32}([0., 0., 4.], 0.6),
])

# Build BVH
bvh = BVH(bounding_spheres, BBox{Float32}, UInt32)
```
Error info on the main branch (v0.5.0):

```
ERROR: GPU compilation of MethodInstance for AcceleratedKernels.gpu__foreachindex_global!(::KernelAbstractions.CompilerMetadata{…}, ::ImplicitBVH.var"#12#13"{…}, ::Base.OneTo{…}) failed
KernelError: passing and using non-bitstype argument
Argument 3 to your kernel function is of type ImplicitBVH.var"#12#13"{CuDeviceVector{BBox{Float32}, 1}, CuDeviceVector{BSphere{Float32}, 1}, CuDeviceVector{Int32, 1}, Int32, Int32, Type{Int32}}, which is not isbits:
  .I is of type Type{Int32} which is not isbits.
Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/validation.jl:92
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:92 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/NRdsv/src/TimerOutput.jl:253 [inlined]
  [4]
    @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:90
  [5] codegen
    @ ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:82 [inlined]
  [6] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:79
  [7] compile
    @ ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:74 [inlined]
  [8] #1145
    @ ~/.julia/packages/CUDA/2kjXI/src/compiler/compilation.jl:250 [inlined]
  [9] JuliaContext(f::CUDA.var"#1145#1148"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:34
 [10] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:25
 [11] compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/2kjXI/src/compiler/compilation.jl:249
 [12] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/execution.jl:237
 [13] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/execution.jl:151
 [14] macro expansion
    @ ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:380 [inlined]
 [15] macro expansion
    @ ./lock.jl:273 [inlined]
 [16] cufunction(f::typeof(AcceleratedKernels.gpu__foreachindex_global!), tt::Type{…}; kwargs::@Kwargs{…})
    @ CUDA ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:375
 [17] macro expansion
    @ ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:112 [inlined]
 [18] (::KernelAbstractions.Kernel{…})(::Function, ::Vararg{…}; ndrange::Int64, workgroupsize::Nothing)
    @ CUDA.CUDAKernels ~/.julia/packages/CUDA/2kjXI/src/CUDAKernels.jl:103
 [19] _foreachindex_gpu(f::Function, itr::UnitRange{Int32}, backend::CUDABackend; block_size::Int64)
    @ AcceleratedKernels ~/.julia/packages/AcceleratedKernels/gP4jS/src/foreachindex.jl:17
 [20] _foreachindex_gpu
    @ ~/.julia/packages/AcceleratedKernels/gP4jS/src/foreachindex.jl:8 [inlined]
 [21] #foreachindex#9
    @ ~/.julia/packages/AcceleratedKernels/gP4jS/src/foreachindex.jl:160 [inlined]
 [22] foreachindex
    @ ~/.julia/packages/AcceleratedKernels/gP4jS/src/foreachindex.jl:141 [inlined]
 [23] aggregate_last_level!
    @ ~/.julia/packages/ImplicitBVH/0kmLq/src/build.jl:383 [inlined]
 [24] aggregate_oibvh!(bvh_nodes::CuArray{…}, bvh_leaves::CuArray{…}, tree::ImplicitTree{…}, order::CuArray{…}, built_level::Int32, options::BVHOptions{…})
    @ ImplicitBVH ~/.julia/packages/ImplicitBVH/0kmLq/src/build.jl:310
 [25] BVH(bounding_volumes::CuArray{…}, node_type::Type{…}, morton_type::Type{…}, built_level::Int64; options::BVHOptions{…})
    @ ImplicitBVH ~/.julia/packages/ImplicitBVH/0kmLq/src/build.jl:205
 [26] BVH
    @ ~/.julia/packages/ImplicitBVH/0kmLq/src/build.jl:160 [inlined]
 [27] BVH(bounding_volumes::CuArray{BSphere{Float32}, 1, CUDA.DeviceMemory}, node_type::Type{BBox{Float32}}, morton_type::Type{UInt32})
    @ ImplicitBVH ~/.julia/packages/ImplicitBVH/0kmLq/src/build.jl:160
 [28] top-level scope
    @ REPL[5]:1
Some type information was truncated. Use `show(err)` to see complete types.
```
Error info on v0.4.1:

```
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] errorscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:155
  [3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:128
  [4] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:116
  [5] getindex
    @ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:50 [inlined]
  [6] bounding_volumes_extrema(bounding_volumes::CuArray{BSphere{Float32}, 1, CUDA.DeviceMemory})
    @ ImplicitBVH ~/.julia/packages/ImplicitBVH/00zCQ/src/morton.jl:125
  [7] #morton_encode!#5
    @ ~/.julia/packages/ImplicitBVH/00zCQ/src/morton.jl:216 [inlined]
  [8] morton_encode!
    @ ~/.julia/packages/ImplicitBVH/00zCQ/src/morton.jl:209 [inlined]
  [9] BVH(bounding_volumes::CuArray{…}, node_type::Type{…}, morton_type::Type{…}, built_level::Int64; num_threads::Int64)
    @ ImplicitBVH ~/.julia/packages/ImplicitBVH/00zCQ/src/build.jl:170
 [10] BVH
    @ ~/.julia/packages/ImplicitBVH/00zCQ/src/build.jl:134 [inlined]
 [11] BVH(bounding_volumes::CuArray{BSphere{…}, 1, CUDA.DeviceMemory}, node_type::Type{BBox{…}}, morton_type::Type{UInt32})
    @ ImplicitBVH ~/.julia/packages/ImplicitBVH/00zCQ/src/build.jl:134
 [12] top-level scope
    @ REPL[11]:1
Some type information was truncated. Use `show(err)` to see complete types.
```
Versions:
- Julia v1.11.1
- CUDA v5.5.2
- ImplicitBVH v0.5.0 (`https://github.com/StellaOrg/ImplicitBVH.jl.git#main`) or ImplicitBVH v0.4.1
Hi, thanks for the report! I finally got onto a cluster with Nvidia GPUs. Julia was doing some odd lambda capturing (it was trying to pass a `Type{Int32}` to the kernel); separating that code into another function fixed it.

I tested and benchmarked the GPU backend via CUDA for building, contact detection, and ray tracing, and it seems to work.

Would you mind testing the #main version again?

For the other GPU backends, we are waiting on atomics support for KernelAbstractions.jl kernels; it should already work on AMD GPUs (though I don't have one on hand to try). On Apple Metal it will be available in the next version of Atomix, after a pull request I made; we are also waiting on the oneAPI atomics extension to be merged, and then all backends should work :)
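For anyone hitting a similar `KernelError: passing and using non-bitstype argument`, here is a minimal, hypothetical Julia sketch (not the actual ImplicitBVH code) of why a captured type object breaks kernel arguments, and the usual fix of carrying the type as a static parameter instead of a field:

```julia
# A struct (or an anonymous-function closure, which is lowered to a
# struct) that stores a type *object* as a field is not isbits, so the
# GPU backend refuses to pass it to a kernel:
struct BadKernelArgs
    T::Type                       # stores e.g. Type{Int32} as runtime data
end

# Carrying the type as a static parameter leaves no runtime field:
struct GoodKernelArgs{T} end

isbitstype(BadKernelArgs)           # false: has a non-isbits Type field
isbitstype(GoodKernelArgs{Int32})   # true: safe as a kernel argument
```

The closure in the error above (`ImplicitBVH.var"#12#13"{…, Type{Int32}}`, with its `.I` field) was analogous to `BadKernelArgs`: it had captured `Type{Int32}` as a field, which is why moving that code into a separate function, where the type becomes a compile-time parameter, resolves the `KernelError`.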