Bad performance #1

maleadt · 2024-03-06T12:09:41Z

Apparently this doesn't result in the expected speed-ups. Unsure why: JuliaGPU/GPUArrays.jl#454 (comment)

Simply providing the Cartesian indices statically (as type parameters), like so, is sufficient, at least with Metal's back-end compiler:

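# Singleton type that carries the index ranges `I` as a type parameter,
# so they are compile-time constants rather than runtime values.
struct StaticCartesianIndices{N, I} end

# Lift the ranges of an existing `CartesianIndices` into the type domain.
StaticCartesianIndices(iter::CartesianIndices{N}) where {N} =
    StaticCartesianIndices{N, iter.indices}()
StaticCartesianIndices(x) = StaticCartesianIndices(CartesianIndices(x))

# Rebuild an ordinary `CartesianIndices` from the type parameters.
Base.CartesianIndices(iter::StaticCartesianIndices{N, I}) where {N, I} =
    CartesianIndices{N, typeof(I)}(I)

# Indexing and length defer to the reconstructed `CartesianIndices`.
Base.@propagate_inbounds Base.getindex(I::StaticCartesianIndices, i::Int) =
    CartesianIndices(I)[i]
Base.length(I::StaticCartesianIndices) = length(CartesianIndices(I))

function Base.show(io::IO, I::StaticCartesianIndices)
    print(io, "Static")
    show(io, CartesianIndices(I))
end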
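To illustrate the mechanism the snippet above relies on, here is a minimal sketch of lifting a size tuple into the type domain. `StaticDims` and `static_dims` are hypothetical names made up for this illustration; they are not part of the issue or of any package. The sketch only demonstrates that the value travels in the type and that the instance itself carries no data; it says nothing about the resulting GPU performance.

```julia
# Hypothetical helper (illustration only): a dimension tuple stored as a type parameter.
struct StaticDims{D} end

# Lift a runtime tuple of Ints into the type domain.
StaticDims(dims::Dims) = StaticDims{dims}()

# Recover the tuple from the type parameter.
static_dims(::StaticDims{D}) where {D} = D

sd = StaticDims(size(zeros(4, 3)))

# The instance is a zero-size singleton: all information lives in its type.
@assert sizeof(sd) == 0
@assert static_dims(sd) == (4, 3)

# Linear indexing into the reconstructed CartesianIndices is column-major.
@assert CartesianIndices(static_dims(sd))[5] == CartesianIndex(1, 2)
```

Because every distinct size yields a distinct type, code taking such an argument is specialized (and recompiled) per size, which is the trade-off of this approach.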

Relying on LLVM alone isn't sufficient; see JuliaGPU/GPUArrays.jl#454 (comment) and JuliaGPU/GPUArrays.jl#454 (comment).
