Regression when using AMDGPU and CUDA in same session #47331
Labels: gpu, regression, types and dispatch
Loading AMDGPU.jl and then CUDA.jl (in that order) on Julia master breaks usage of AMDGPU. It appears that when AMDGPU calls into GPUCompiler's `cached_compilation` function, and various GPUCompiler interface functions are called (which have overloads defined in AMDGPU), those interface calls fall back to their non-specialized base-case methods defined in GPUCompiler. Specifically, the call `GPUCompiler.runtime_module(job)`, where `job` is a `GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.ROCCompilerParams}`, does not call the method https://github.com/JuliaGPU/AMDGPU.jl/blob/bf66efcc5541319ae44d3ff45669abebb133d421/src/compiler.jl#L59, but instead calls https://github.com/JuliaGPU/GPUCompiler.jl/blob/18163a561a169934844d493e4fcd3238439c3965/src/interface.jl#L188.
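For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the interface-plus-overload dispatch involved; the `Mini*` module and type names are stand-ins for illustration, not the real GPUCompiler/AMDGPU API:

```julia
# Stand-in for GPUCompiler's interface pattern (illustrative names only).
module MiniGPUCompiler
    abstract type AbstractCompilerParams end
    struct CompilerJob{P<:AbstractCompilerParams}
        params::P
    end
    # Non-specialized base-case method that backend packages override:
    runtime_module(job::CompilerJob) = error("runtime_module not implemented")
end

# Stand-in for the backend package (AMDGPU) providing the overload.
module MiniAMDGPU
    using ..MiniGPUCompiler
    struct ROCCompilerParams <: MiniGPUCompiler.AbstractCompilerParams end
    # The backend-specific overload that dispatch should select for ROC jobs:
    MiniGPUCompiler.runtime_module(::MiniGPUCompiler.CompilerJob{ROCCompilerParams}) =
        :rocm_runtime
end

job = MiniGPUCompiler.CompilerJob(MiniAMDGPU.ROCCompilerParams())
# Should select the overload and return :rocm_runtime; the regression is the
# analogue of this call hitting the base-case method instead.
MiniGPUCompiler.runtime_module(job)
```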
Using reflection within GPUCompiler, I can see that AMDGPU's method does in fact exist and matches the signature for `runtime_module` called with `job`.

If CUDA is not loaded, or before CUDA is loaded, the GPUCompiler overloads in AMDGPU are called correctly. Loading CUDA before AMDGPU also avoids the problem.
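A minimal sketch of that reflection, assuming both packages are loaded and using the job type quoted above:

```julia
using AMDGPU, CUDA, GPUCompiler

T = GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget,
                            AMDGPU.Compiler.ROCCompilerParams}

# A method applicable to this job type is present in the method table...
@show hasmethod(GPUCompiler.runtime_module, Tuple{T})

# ...and `which` names the method dispatch ought to select; per the report,
# this still points at AMDGPU's overload even though the actual call falls
# back to GPUCompiler's generic method.
m = which(GPUCompiler.runtime_module, Tuple{T})
println(m.file, ":", m.line)
```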
`git bisect` points to:

The commit immediately prior does not exhibit the problematic behavior.
Currently the failing AMDGPU code paths can't be exercised without access to a supported AMD GPU; I am going to implement a way to call the desired methods without hardware access, as well as an MWE, to make it easier to reproduce.
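For reference, a rough sketch of the shape such an MWE might take; the `FunctionSpec`/`CompilerJob` construction below is hypothetical (constructor signatures vary across GPUCompiler versions), so treat it as an assumption rather than the actual reproducer:

```julia
using AMDGPU, GPUCompiler   # load AMDGPU first...
using CUDA                  # ...then CUDA, which triggers the regression

kernel() = nothing

# Hypothetical job construction; exact signatures depend on the
# GPUCompiler version in the environment:
source = GPUCompiler.FunctionSpec(kernel, Tuple{})
target = GPUCompiler.GCNCompilerTarget(; dev_isa="gfx900")
params = AMDGPU.Compiler.ROCCompilerParams()
job = GPUCompiler.CompilerJob(target, source, params)

# Reflection names AMDGPU's overload, but the actual call reportedly lands
# on GPUCompiler's non-specialized fallback:
@show which(GPUCompiler.runtime_module, Tuple{typeof(job)})
@show GPUCompiler.runtime_module(job)
```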
@vtjnash
Ref #46920