Invalid IR with StaticArrays.jl #363
Probably not something we can do much about in GPUCompiler, as this seems like a Julia 'regression' (and not even strictly so, because Julia is a dynamic language, so not all code is expected to compile statically). Especially the

If you care about this pattern, what I'd recommend you do is create a reproducer without GPUCompiler that emits static code with plain `code_llvm`.
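For illustration, a minimal sketch of what such a GPUCompiler-free reproducer might look like, reusing the StaticArrays pattern from the MWE further down in this thread; the function name and argument types are assumptions, and whether the emitted IR actually shows dynamic calls on 1.9 would need to be verified:

```julia
using StaticArrays
using InteractiveUtils

# Same broadcast pattern as the GPU MWE below, but with no GPU packages involved.
function f(width, height)
    xy = SVector{2, Float32}(0.5f0, 0.5f0)
    res = SVector{2, UInt32}(width, height)
    floor.(UInt32, max.(0f0, xy) .* res)
    return nothing
end

# Inspect the optimized LLVM IR; dynamic dispatch or allocation calls
# (e.g. jl_apply_generic, jl_alloc_*) would indicate the code no longer
# compiles statically.
code_llvm(stdout, f, Tuple{Int, Int}; optimize=true)
```

If such a standalone reproducer shows dynamic calls on 1.9 but not on 1.8, the regression can be bisected on plain Julia without any GPU involvement.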
Thanks for the suggestions!
It happens after several iterations (< 10) in my code.
It's interesting that this happens after a couple of iterations, as
Reducing the number of threads does seem to help, but I should probably look into the occupancy API as you've suggested. However, on AMDGPU reducing the number of threads (even setting it as low as 1) only seems to delay when the similar error occurs. There it progressively trims the maximum number of concurrent waves to allow scratch to fit, until the error occurs. And all of that comes with a performance hit anyway.
Just a follow-up: it happens after a few iterations because I start passing a new variable which was
Do you have a CPU-based MWE (i.e. just doing `code_llvm` with minimal dependencies)? I have some bisection infrastructure ready, so I could give that a go.
Not really... On CPU it works fine. Smallest I've got is this:

```julia
using StaticArrays
using CUDA

function f(x)
    width, height = size(x)
    xy = SVector{2, Float32}(0.5f0, 0.5f0)
    res = SVector{2, UInt32}(width, height)
    floor.(UInt32, max.(0f0, xy) .* res)
    nothing
end

function main()
    x = CUDA.ones(Float32, (8, 8))
    @cuda threads=1 f(x)
end

main()
```

Also if you replace
Hmm yes, this does seem limited to CUDA.jl (or probably, GPUCompiler.jl).
Opened an issue on GPUCompiler: #366
@maleadt, RE: how would you suggest I debug what's causing the issue?
That is a separate issue. Can you create an MWE?

To debug this, you can try introspecting the kernel: get a hold of the kernel object, and call e.g. `CUDA.registers` on it.

But again, you should be using the occupancy API to be resilient against changes like this (so that your application doesn't crash, at least), also because you want your kernels to be generic and e.g. support different element types (which may result in generated code that requires more registers).
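For reference, a minimal sketch of what the introspection plus occupancy-API approach might look like in CUDA.jl; the kernel and array here are placeholders rather than code from this thread:

```julia
using CUDA

# Placeholder kernel, just to have something to compile and query.
function my_kernel(x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(x)
        @inbounds x[i] += 1f0
    end
    return nothing
end

x = CUDA.zeros(Float32, 1024)

# Compile without launching so the kernel object can be inspected.
kernel = @cuda launch=false my_kernel(x)
@show CUDA.registers(kernel)    # registers per thread
@show CUDA.memory(kernel)       # local/shared/constant memory usage
@show CUDA.maxthreads(kernel)   # upper bound on threads per block

# Let the occupancy API pick a launch configuration that fits this kernel,
# instead of hard-coding a thread count that may stop fitting after a
# compiler or package upgrade.
config = launch_configuration(kernel.fun)
threads = min(length(x), config.threads)
blocks = cld(length(x), threads)
kernel(x; threads, blocks)
```

With this pattern the launch configuration adapts when register usage changes, rather than failing with a launch error.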
I'll try.
Output on the two setups:

```
CUDA.registers(kerr) = 157
CUDA.memory(kerr) = (local = 1224, shared = 0, constant = 0)
CUDA.maxthreads(kerr) = 384
```

vs

```
CUDA.registers(kerr) = 122
CUDA.memory(kerr) = (local = 1224, shared = 0, constant = 0)
CUDA.maxthreads(kerr) = 512
```
Yeah, that's a regression. Could you file that as an issue on CUDA.jl (i.e., the MWE calling
Opened JuliaGPU/CUDA.jl#1673. |
Hi! Not really sure where to post this issue, but the following kernel produces `InvalidIRError` on Julia 1.9 master, while on 1.8.2 it works fine. However, if I split the `xy_res` calculation into several steps, it works fine. Error:
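A hedged guess at what the split-up version might look like, based on the MWE earlier in the thread; the original code blocks were not preserved here, so the variable names and exact steps are assumptions:

```julia
using StaticArrays

# Hypothetical split version of the failing fused expression: each
# intermediate broadcast gets its own variable instead of one chained call.
function f_split(x)
    width, height = size(x)
    xy = SVector{2, Float32}(0.5f0, 0.5f0)
    res = SVector{2, UInt32}(width, height)
    clamped = max.(0f0, xy)          # clamp to non-negative
    scaled = clamped .* res          # scale by the array size
    xy_res = floor.(UInt32, scaled)  # truncate to integer indices
    return nothing
end
```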