Using opt_level O1 leads to RuntimeError: CUDA error: no kernel image is available for execution on the device #842

Closed · matlabninja opened this issue May 20, 2020 · 1 comment


matlabninja commented May 20, 2020

Using apex in a Docker container with CUDA 10.1, cuDNN 7.6.5.32, Ubuntu 18.04, and PyTorch 1.4.0 (derived from nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04), I'm met with the error in the title during the scaled_loss.backward() call. The container is scheduled via Kubernetes on a DGX-1. Following the advice in #528, I set the TORCH_CUDA_ARCH_LIST environment variable to include compute capability 7.0 before installing.
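
The relevant part of the script follows the standard apex AMP pattern, roughly like the following minimal sketch (placeholder model and data, not the actual segFp16.py):

import torch
from apex import amp

# Placeholder model and data; the real script uses its own network and loader.
model = torch.nn.Linear(16, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

inputs = torch.randn(8, 16, device="cuda")
targets = torch.randint(0, 4, (8,), device="cuda")
loss = torch.nn.functional.cross_entropy(model(inputs), targets)

# The error is raised while exiting this context, where amp unscales the
# gradients with its fused multi_tensor_apply CUDA kernel.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()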

From the Dockerfile:

WORKDIR /apex-master
RUN export TORCH_CUDA_ARCH_LIST="6.0;7.0"
RUN pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

I'm including 6.0 here because we also have some P100 nodes available. The job I'm running works fine with opt_level O0 on the DGX in the container, and it also ran fine on the localhost environment (non-Docker, P100 node) with both opt_level O0 and O1.
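
As a quick sanity check inside the container, a minimal snippet like this prints the GPU's compute capability and whether TORCH_CUDA_ARCH_LIST is visible in the current environment (the value that actually matters is the one set while pip was building the apex extensions):

import os
import torch

# Compute capability of the first visible GPU: (7, 0) for V100, (6, 0) for P100.
major, minor = torch.cuda.get_device_capability(0)
print("GPU:", torch.cuda.get_device_name(0))
print("Compute capability: %d.%d" % (major, minor))

# Only shows the variable in the current environment; the build-time value is
# what determines which kernels were compiled into amp_C.
print("TORCH_CUDA_ARCH_LIST:", os.environ.get("TORCH_CUDA_ARCH_LIST", "<not set>"))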

Full stack trace:
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.
Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Traceback (most recent call last):
File "segFp16.py", line 300, in
scaled_loss.backward()
File "/usr/lib/python3.6/contextlib.py", line 88, in exit
next(self.gen)
File "/usr/local/lib/python3.6/dist-packages/apex/amp/handle.py", line 123, in scale_loss
optimizer._post_amp_backward(loss_scaler)
File "/usr/local/lib/python3.6/dist-packages/apex/amp/_process_optimizer.py", line 249, in post_backward_no_master_weights
post_backward_models_are_masters(scaler, params, stashed_grads)
File "/usr/local/lib/python3.6/dist-packages/apex/amp/_process_optimizer.py", line 128, in post_backward_models_are_masters
scale_override=grads_have_scale/out_scale)
File "/usr/local/lib/python3.6/dist-packages/apex/amp/scaler.py", line 117, in unscale
1./scale)
File "/usr/local/lib/python3.6/dist-packages/apex/multi_tensor_apply/multi_tensor_apply.py", line 30, in call
*args)
RuntimeError: CUDA error: no kernel image is available for execution on the device (multi_tensor_apply at csrc/multi_tensor_apply.cuh:108)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fbff3dbe193 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: void multi_tensor_apply<2, ScaleFunctor<float, float>, float>(int, int, at::Tensor const&, std::vector<std::vector<at::Tensor, std::allocator<at::Tensor> >, std::allocator<std::vector<at::Tensor, std::allocator<at::Tensor> > > > const&, ScaleFunctor<float, float>, float) + 0xf83 (0x7fbfc7ae9fe3 in /usr/local/lib/python3.6/dist-packages/amp_C.cpython-36m-x86_64-linux-gnu.so)
frame #2: multi_tensor_scale_cuda(int, at::Tensor, std::vector<std::vector<at::Tensor, std::allocator<at::Tensor> >, std::allocator<std::vector<at::Tensor, std::allocator<at::Tensor> > > >, float) + 0xcfe (0x7fbfc7ae7a8e in /usr/local/lib/python3.6/dist-packages/amp_C.cpython-36m-x86_64-linux-gnu.so)
frame #3: + 0x20627 (0x7fbfc7adc627 in /usr/local/lib/python3.6/dist-packages/amp_C.cpython-36m-x86_64-linux-gnu.so)
frame #4: + 0x1af8c (0x7fbfc7ad6f8c in /usr/local/lib/python3.6/dist-packages/amp_C.cpython-36m-x86_64-linux-gnu.so)

frame #7: python3() [0x5081d5]
frame #9: python3() [0x5951c1]
frame #10: python3() [0x54ac01]
frame #12: python3() [0x50ab53]
frame #14: python3() [0x5081d5]
frame #15: python3() [0x50a020]
frame #16: python3() [0x50aa1d]
frame #18: python3() [0x5081d5]
frame #19: python3() [0x50a020]
frame #20: python3() [0x50aa1d]
frame #22: python3() [0x509ce8]
frame #23: python3() [0x50aa1d]
frame #25: python3() [0x58ee33]
frame #26: python3() [0x51412f]
frame #27: python3() [0x50a84f]
frame #30: python3() [0x5951c1]
frame #34: python3() [0x5081d5]
frame #36: python3() [0x635082]
frame #41: __libc_start_main + 0xe7 (0x7fbff96f2b97 in /lib/x86_64-linux-gnu/libc.so.6)

matlabninja (Author) commented:

Update: I realized my mistake. Each RUN instruction in a Dockerfile runs in its own shell, so

RUN export TORCH_CUDA_ARCH_LIST="6.0;7.0"

only sets the variable for that single build step; it was gone by the time the pip install step built apex, so the extensions were not compiled for compute capability 7.0. I have updated my Dockerfile to

ENV TORCH_CUDA_ARCH_LIST 6.0;7.0

which persists across build steps and into the running container. Now AMP works as expected.
