Failed to compile PTX code #133

Closed · casper2002casper opened this issue Feb 16, 2022 · 3 comments
@casper2002casper

After updating to the latest version, GPU execution stopped working for me. I have not created a minimal reproducible example yet, but this is the error I'm facing.

Failed to compile PTX code (ptxas exited with code 255)
ptxas /tmp/jl_rh605v.ptx, line 313; error   : Instruction 'atom.cas.b16.global' requires .target sm_70 or higher
ptxas fatal   : Ptx assembly aborted due to errors
If you think this is a bug, please file an issue and attach /tmp/jl_rh605v.ptx
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:399
  [3] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1Ajz2/src/cache.jl:90
  [4] cufunction(f::typeof(GraphNeuralNetworks.scatter_scalar_kernel!), tt::Type{Tuple{typeof(+), CUDA.CuDeviceVector{UInt16, 1}, Int64, CUDA.CuDeviceVector{Int64, 1}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:297
  [5] cufunction(f::typeof(GraphNeuralNetworks.scatter_scalar_kernel!), tt::Type{Tuple{typeof(+), CUDA.CuDeviceVector{UInt16, 1}, Int64, CUDA.CuDeviceVector{Int64, 1}}})
    @ CUDA ~/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:291
  [6] macro expansion
    @ ~/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:102 [inlined]
  [7] scatter!(op::Function, dst::CUDA.CuArray{UInt16, 1, CUDA.Mem.DeviceBuffer}, src::Int64, idx::CUDA.CuArray{Int64, 1, CUDA.Mem.DeviceBuffer})
    @ GraphNeuralNetworks ~/.julia/packages/GraphNeuralNetworks/Hv1up/src/utils.jl:52
  [8] degree(g::GraphNeuralNetworks.GNNGraphs.GNNGraph{Tuple{CUDA.CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, Nothing}}, T::Type{UInt16}; dir::Symbol, edge_weight::Nothing)
    @ GraphNeuralNetworks.GNNGraphs ~/.julia/packages/GraphNeuralNetworks/Hv1up/src/GNNGraphs/query.jl:212
  [9] (::GraphNeuralNetworks.GCNConv{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, typeof(NNlib.relu)})(g::GraphNeuralNetworks.GNNGraphs.GNNGraph{Tuple{CUDA.CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, Nothing}}, x::CUDA.CuArray{UInt16, 2, CUDA.Mem.DeviceBuffer}, edge_weight::Nothing)
    @ GraphNeuralNetworks ~/.julia/packages/GraphNeuralNetworks/Hv1up/src/layers/conv.jl:95
 [10] (::GraphNeuralNetworks.GCNConv{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, typeof(NNlib.relu)})(g::GraphNeuralNetworks.GNNGraphs.GNNGraph{Tuple{CUDA.CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, Nothing}}, x::CUDA.CuArray{UInt16, 2, CUDA.Mem.DeviceBuffer})
    @ GraphNeuralNetworks ~/.julia/packages/GraphNeuralNetworks/Hv1up/src/layers/conv.jl:80
 [11] (::GraphNeuralNetworks.GCNConv{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, typeof(NNlib.relu)})(g::GraphNeuralNetworks.GNNGraphs.GNNGraph{Tuple{CUDA.CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, Nothing}})
    @ GraphNeuralNetworks ~/.julia/packages/GraphNeuralNetworks/Hv1up/src/layers/basic.jl:12
 [12] applylayer
    @ ~/.julia/packages/GraphNeuralNetworks/Hv1up/src/layers/basic.jl:121 [inlined]
 [13] applychain
    @ ~/.julia/packages/GraphNeuralNetworks/Hv1up/src/layers/basic.jl:133 [inlined]
 [14] (::GraphNeuralNetworks.GNNChain{Tuple{GraphNeuralNetworks.GCNConv{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, typeof(NNlib.relu)}, GraphNeuralNetworks.GCNConv{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, typeof(NNlib.relu)}, GraphNeuralNetworks.GCNConv{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, typeof(NNlib.relu)}, GraphNeuralNetworks.GCNConv{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, typeof(NNlib.relu)}, GraphNeuralNetworks.GCNConv{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, typeof(NNlib.relu)}, GraphNeuralNetworks.GCNConv{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, typeof(NNlib.relu)}, GraphNeuralNetworks.GCNConv{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, typeof(NNlib.relu)}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}})(g::GraphNeuralNetworks.GNNGraphs.GNNGraph{Tuple{CUDA.CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, Nothing}})
    @ GraphNeuralNetworks ~/.julia/packages/GraphNeuralNetworks/Hv1up/src/layers/basic.jl:140
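
The failing kernel is GraphNeuralNetworks.scatter_scalar_kernel!, which accumulates into dst with an atomic update. A minimal CUDA.jl sketch that should hit the same code path on a pre-sm_70 device (an assumption based on the PTX above: CUDA.@atomic appears to fall back to a 16-bit compare-and-swap loop for UInt16):

using CUDA

# Atomic update of a UInt16 buffer; on devices below sm_70 this should
# lower to atom.cas.b16, which ptxas rejects as in the error above.
function inc_kernel!(dst)
    CUDA.@atomic dst[1] += UInt16(1)
    return nothing
end

dst = CUDA.zeros(UInt16, 1)
@cuda threads=4 inc_kernel!(dst)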
@CarloLucibello (Member)

I cannot reproduce this error. Can you try to run the node and graph classification examples in the examples/ directory?

For me they work fine on GPU with the following package versions:

pkg> st
      Status `~/.julia/dev/GraphNeuralNetworks/examples/Project.toml`
  [052768ef] CUDA v3.8.1
  [aae7a2af] DiffEqFlux v1.45.1
  [0c46a032] DifferentialEquations v7.1.0
  [587475ba] Flux v0.12.8
  [7e08b658] GeometricFlux v0.8.0
  [cffab07f] GraphNeuralNetworks v0.3.13
  [3ebe565e] GraphSignals v0.3.9
  [86223c79] Graphs v1.6.0
  [eb30cadb] MLDatasets v0.5.15
  [872c559c] NNlib v0.7.34
  [a00861dc] NNlibCUDA v0.1.11

@casper2002casper (Author) commented Feb 17, 2022

It's caused by using UInt16 node data. I might have used a different data type the last time I tried the GPU, so I guess that's what broke it for me. Could it be that my GPU doesn't support CUDA operations on 16-bit data types?

using GraphNeuralNetworks
using Flux

nn = GNNChain(GCNConv(2 => 2, relu))
# UInt16 node features are what trigger the 16-bit atomic in scatter!
x = rand_graph(5, 10, ndata = rand(UInt16, 2, 5))
println("CPU")
println(nn(x))
println("GPU")
nn = Flux.gpu(nn)
x = Flux.gpu(x)
println(nn(x))
CPU
GNNGraph:
    num_nodes = 5
    num_edges = 10
    ndata:
        x => (2, 5)
GPU
ERROR: LoadError: Failed to compile PTX code (ptxas exited with code 255)
ptxas /tmp/jl_JXO7fH.ptx, line 314; error   : Instruction 'atom.cas.b16.global' requires .target sm_70 or higher
ptxas fatal   : Ptx assembly aborted due to errors
If you think this is a bug, please file an issue and attach /tmp/jl_JXO7fH.ptx
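
A workaround that sidesteps the 16-bit atomic (a sketch, assuming Float32 node features are acceptable here, matching what the GCNConv weights use anyway) is to build the graph with Float32 data before moving it to the GPU:

using GraphNeuralNetworks
using Flux

nn = GNNChain(GCNConv(2 => 2, relu))
# Float32 features avoid the 16-bit atomic CAS that pre-sm_70 GPUs lack
x = rand_graph(5, 10, ndata = rand(Float32, 2, 5))
nn = Flux.gpu(nn)
x = Flux.gpu(x)
println(nn(x))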

@casper2002casper (Author)

Seems like my GPU indeed doesn't support 16-bit atomic operations: based on https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#maximize-instruction-throughput and the error itself, atom.cas.b16 requires sm_70, while my GPU is only compute capability 5.0.
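
For reference, the compute capability of the active device can be queried directly with CUDA.jl:

using CUDA

# atom.cas.b16 (16-bit atomic compare-and-swap) requires sm_70,
# i.e. compute capability 7.0 or higher.
cap = CUDA.capability(CUDA.device())
println("Compute capability: ", cap)
println("16-bit atomic CAS supported: ", cap >= v"7.0")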
