before AK=0.2.1 is registered, separate sortperm implementations
anicusan committed Nov 23, 2024
1 parent bd1ebdd commit a5725d3
Showing 2 changed files with 6 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -236,7 +236,7 @@ vision applications, ~~we spent a stupid amount of time optimising~~ this implem
 - **Building the BVH on an Nvidia A100 takes 409.58 μs**!
 - Contact detection (`traverse`) of the same 249,882 `BSphere{Float32}` for the triangles (aggregated into `BBox{Float32}` parents) takes 107.25 ms single-threaded on an Intel IceLake 8570 and 37.25 ms with 4 threads, at 72% strong scaling.
 - **Traversing the BVH on an Nvidia A100 takes 1.14 ms**!
-- Ray-tracing of 100,000 random rays over the same 249,882 `BSphere{Float32}` for the triangles (aggregated into `BBox{Float32}` parents) takes 671.01 ms single-threaded on an Intel IceLake 8570 and 216.99 ms with 4 threads, at 77% strong scaling.
+- Ray-tracing (`traverse_rays`) of 100,000 random rays over the same 249,882 `BSphere{Float32}` for the triangles (aggregated into `BBox{Float32}` parents) takes 671.01 ms single-threaded on an Intel IceLake 8570 and 216.99 ms with 4 threads, at 77% strong scaling.
 - Ray-tracing on an Nvidia A100 takes 2.00 ms.
 
 Only fundamental Julia types are used - e.g. `struct`, `Tuple`, `UInt`, `Float64` - which can be straightforwardly inlined, unrolled and fused by the compiler. These types are also straightforward to transpile to accelerators via [`KernelAbstractions.jl`](https://github.com/JuliaGPU/KernelAbstractions.jl) such as `CUDA`, `AMDGPU`, `oneAPI`, `Apple Metal`.
6 changes: 5 additions & 1 deletion src/build.jl
@@ -191,7 +191,11 @@ function BVH(
 
     # Compute indices that sort codes along the Z-curve - closer objects have closer Morton codes
     order = similar(mortons, I)
-    AK.sortperm!(order, mortons, block_size=options.block_size)
+    if mortons isa AbstractGPUVector
+        AK.sortperm!(order, mortons, block_size=options.block_size)
+    else
+        sortperm!(order, mortons)
+    end
 
     # Pre-allocate vector of bounding volumes for the real nodes above the bottom level
     bvh_nodes = similar(bounding_volumes, N, Int(tree.real_nodes - tree.real_leaves))
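
The branch added in `src/build.jl` can be tried in isolation. The following is a minimal sketch of the same pattern, not the package's code: `MockDeviceVector`, `device_sortperm!` and `morton_order!` are hypothetical names introduced here so the example runs without a GPU or `AcceleratedKernels`. In the commit itself the check is `mortons isa AbstractGPUVector` and the accelerator path is `AK.sortperm!`. Only the CPU fallback, `Base.sortperm!`, is the real call.

```julia
# Hypothetical stand-in for a device array type (assumption for illustration);
# the commit checks `mortons isa AbstractGPUVector` instead.
struct MockDeviceVector{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::MockDeviceVector) = size(v.data)
Base.getindex(v::MockDeviceVector, i::Int) = v.data[i]
Base.setindex!(v::MockDeviceVector, x, i::Int) = (v.data[i] = x)

# Hypothetical placeholder for the accelerator path (AK.sortperm! in the commit).
# Here it simply sorts the wrapped host data.
function device_sortperm!(order::MockDeviceVector, codes::MockDeviceVector)
    sortperm!(order.data, codes.data)
    return order
end

# Choose the sortperm implementation from the array type, as the diff does.
function morton_order!(order, mortons)
    if mortons isa MockDeviceVector
        device_sortperm!(order, mortons)
    else
        # Plain CPU arrays use Base's in-place sortperm!
        sortperm!(order, mortons)
    end
    return order
end

# CPU path: indices that sort the Morton codes in ascending order
mortons = UInt32[0x2a, 0x07, 0x13, 0x01]
order = similar(mortons, Int)
morton_order!(order, mortons)

# Mock "device" path: same codes, wrapped in the stand-in type
dev_mortons = MockDeviceVector(copy(mortons))
dev_order = MockDeviceVector(zeros(Int, length(mortons)))
morton_order!(dev_order, dev_mortons)
```

Both paths should yield the same permutation, so `mortons[order]` is sorted regardless of which branch ran. A runtime `isa` check keeps the change local to one call site; dispatching on the array type through two methods would be an equally valid design.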
