CUDA Error #57
Is this on a local machine?
Yes. I have rebooted both machines and reinstalled/recompiled all my Julia packages to clear any bad versions. The two machines (one a laptop, the other a desktop) have different NVIDIA GPUs and drivers, so I do not think it is a driver issue, though that can never be ruled out. I will try messing with the drivers in the meantime.
What version of CUDA.jl are you using? Can I see the result of [...]? I have seen many different problems resulting in Code 8 errors, including out-of-memory errors (are you sure you have enough memory on your GPU to accommodate your network?) and bugs in CUDA.jl (AlphaZero.jl is a stress test for CUDA.jl).
PS: I love the idea of using AlphaZero on Tetris!
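For reference, a minimal way to gather that information from the Julia REPL (both calls are standard Pkg / CUDA.jl API):

```julia
# Report the installed CUDA.jl version plus the driver/toolkit the package sees.
using Pkg, CUDA

Pkg.status("CUDA")    # version of CUDA.jl in the active environment
CUDA.versioninfo()    # CUDA driver, toolkit, and visible GPU devices
```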
This is from my laptop, so it does not have a lot of memory, but the GPU on my desktop has 8 GB. Based on the stack trace, I'm pretty sure I am using the Flux backend. I had tried to reduce the memory usage by reducing the number of boards stored, not the network size. I can try that next.
I see nothing wrong with your [...]. There are scripts in the [...]. Note that it is also possible that what you are observing comes from a problem with CUDA.jl, as I have seen this happen in the past.
This is 100% due to memory constraints on the GPU, and I agree with the suggestion to lower the batch size. How much VRAM do you have? I'm not sure, but I'd assume the size of your vectorized states affects memory usage as well. Recently I've been trying to get the most out of both my CPU and GPU, and in my experience it's typically very much a trial-and-error balancing act.
Another thing to note is that AlphaZero.jl appears to preallocate all available GPU VRAM, so the reported VRAM usage is not a good way to measure what the network actually needs.
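Because of that pooling behaviour, nvidia-smi is not a reliable gauge; a minimal way to check actual usage from inside Julia (all four calls are part of CUDA.jl's public API):

```julia
using CUDA

CUDA.memory_status()     # live vs. cached allocations inside CUDA.jl's pool
CUDA.available_memory()  # bytes the driver can still hand out on this device
CUDA.total_memory()      # total device memory in bytes
GC.gc(); CUDA.reclaim()  # release cached pool memory back to the driver
```

If this shows the pool nearly full right before a checkpoint evaluation starts, lowering the batch size as suggested above is the natural next step.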
Thank you all for your help so far. I just need to get some simple results this week, so I'll run this on the CPU for now, but I will be back in a couple of weeks to work through this and then maybe set up a PR for Tetris.
If you want to get results on CPU, you probably need to simplify the problem somehow (for example by looking at a smaller grid). I suspect that original Tetris is too complicated for AlphaZero to learn the game in a reasonable amount of time without a GPU. That being said, I may be wrong here. In any case, you will need to use a much smaller network if you want to train your agent on CPU. |
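To make "much smaller network" concrete, here is a hedged sketch: the hyperparameter names follow the ResNet configuration used in AlphaZero.jl's connect-four example, so they may need adapting to the version used in this repo, and the numbers are only a starting point.

```julia
using AlphaZero

# A deliberately small ResNet for CPU experiments: fewer residual blocks and
# filters shrink both the parameter count and the per-batch activation memory.
netparams = AlphaZero.NetLib.ResNetHP(
  num_filters = 32,              # the connect-four example uses 128
  num_blocks = 3,                # the connect-four example uses 5
  conv_kernel_size = (3, 3),
  num_policy_head_filters = 16,
  num_value_head_filters = 16,
  batch_norm_momentum = 0.1)
```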
@Pandabear314 One thing you may also want to do is to update all dependencies using |
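Assuming the standard Pkg workflow is what was meant here (the exact command is cut off above), a minimal sketch:

```julia
# Update every dependency in the active environment, then restart Julia so the
# new versions (in particular CUDA.jl) are actually loaded.
using Pkg
Pkg.update()
Pkg.status()   # confirm the resolved versions
```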
@SheldonCurtiss was correct that the batch size was the culprit behind my running out of VRAM; everything runs correctly once I reduce it. Also, the PR may take some time, as I will have to reformulate how Tetris is run by AlphaZero: my current implementation does not learn, but I have a few ideas left to try.
While attempting to use AlphaZero for Tetris, I keep running into this error when running on the GPU. I have reproduced it on two separate machines, and it happens consistently when launching a checkpoint evaluation. I am wondering if someone has insight into what might be causing this.
Repo:
https://gitlab.com/samdickinson314/tetrisai
include("runner.jl")