This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

RuntimeError: CUDA error: a PTX JIT compilation failed #519

Open
rudgerte opened this issue Mar 1, 2019 · 7 comments

Comments

@rudgerte

rudgerte commented Mar 1, 2019

❓ Questions and Help

I am trying to run the webcam.py demo and get the following error:

...
File "/home/mvp/anaconda3/envs/mrcnn/lib/python3.6/site-packages/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 27, in boxlist_nms
keep = _box_nms(boxes, score, nms_thresh)
RuntimeError: CUDA error: a PTX JIT compilation failed (launch_kernel at /opt/conda/conda-bld/pytorch_1549630534704/work/aten/src/ATen/native/cuda/Loops.cuh:62)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fb8b1b2fcf5 in /home/mvp/anaconda3/envs/mrcnn/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: void at::native::gpu_index_kernel<__nv_dl_wrapper_t<__nv_dl_tag<void ()(at::TensorIterator&, c10::ArrayRef, c10::ArrayRef), &(void at::native::index_kernel_impl<at::native::OpaqueType<8> >(at::TensorIterator&, c10::ArrayRef, c10::ArrayRef)), 1u>> >(at::TensorIterator&, c10::ArrayRef, c10::ArrayRef, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::TensorIterator&, c10::ArrayRef, c10::ArrayRef), &(void at::native::index_kernel_impl<at::native::OpaqueType<8> >(at::TensorIterator&, c10::ArrayRef, c10::ArrayRef)), 1u>> const&) + 0x339 (0x7fb8b73763f9 in /home/mvp/anaconda3/envs/mrcnn/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #2: + 0x27999b6 (0x7fb8b73719b6 in /home/mvp/anaconda3/envs/mrcnn/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: + 0x279a1e5 (0x7fb8b73721e5 in /home/mvp/anaconda3/envs/mrcnn/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #4: + 0x6847da (0x7fb8b23c57da in /home/mvp/anaconda3/envs/mrcnn/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
...

Collected environment information:
PyTorch version: 1.0.1.post2
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 18.04.2 LTS
GCC version: (Ubuntu 6.5.0-2ubuntu1~18.04) 6.5.0 20181026
CMake version: version 3.10.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.1.105
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 415.27
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.14.3
[pip] numpydoc==0.8.0
[conda] blas 1.0 mkl
[conda] mkl 2018.0.2 intel_1 intel
[conda] mkl_fft 1.0.1 np114py36_intel_0 [intel] intel
[conda] mkl_random 1.0.1 np114py36_intel_0 [intel] intel
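A quick way to read the report above is to compare the CUDA version PyTorch was built with (10.0.130) against the one listed as the CUDA runtime version (10.1.105). The sketch below parses collect_env-style text and flags a major.minor mismatch. The helper names are hypothetical, and a mismatch is only a hint worth checking, not proof that it causes the PTX JIT failure.

```python
import re


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '10.0.130'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised CUDA version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def cuda_versions_mismatch(env_report):
    """Compare build vs. runtime CUDA versions in a collect_env-style report.

    Returns (build, runtime, mismatch): build and runtime are (major, minor)
    tuples, mismatch is True when they differ.
    """
    fields = {}
    for line in env_report.splitlines():
        # Split on the first colon only; values may contain more colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    build = parse_major_minor(fields["CUDA used to build PyTorch"])
    runtime = parse_major_minor(fields["CUDA runtime version"])
    return build, runtime, build != runtime


report = """\
PyTorch version: 1.0.1.post2
CUDA used to build PyTorch: 10.0.130
CUDA runtime version: 10.1.105
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 415.27
"""

build, runtime, mismatch = cuda_versions_mismatch(report)
print(f"build={build} runtime={runtime} mismatch={mismatch}")
```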

@fmassa
Contributor

fmassa commented Mar 1, 2019

You probably have a version mismatch between CUDA and your environment. Can you remove the build folder and try compiling maskrcnn-benchmark again?
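Removing the build folder and recompiling can be scripted. This is a minimal sketch that assumes it is run from the maskrcnn-benchmark checkout containing setup.py; clean_build_dir and rebuild are hypothetical helpers, and the rebuild command is the `python setup.py build develop` step from INSTALL.md.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def clean_build_dir(repo_root):
    """Delete <repo_root>/build if present; return True if something was removed."""
    build_dir = Path(repo_root) / "build"
    if build_dir.is_dir():
        shutil.rmtree(build_dir)
        return True
    return False


def rebuild(repo_root):
    """Recompile the extension from a clean state (needs a working CUDA toolchain)."""
    clean_build_dir(repo_root)
    # Use the same interpreter as the active conda environment.
    subprocess.run(
        [sys.executable, "setup.py", "build", "develop"],
        cwd=str(repo_root),
        check=True,
    )
```

Calling rebuild(".") from the repository root removes the stale objects and raises CalledProcessError if the compilation fails, so the compiler output stays visible.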

@rudgerte
Author

rudgerte commented Mar 1, 2019

OK, I removed the build folder and ran:

python setup.py build_ext install

Here's what I got:
creating build/temp.linux-x86_64-3.6/home/mvp/anaconda3/envs/mrcnn/lib/python3.6/site-packages/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda
gcc -pthread -B /home/mvp/anaconda3/envs/mrcnn/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/mvp/anaconda3/envs/mrcnn/lib/python3.6/site-packages/maskrcnn-benchmark/maskrcnn_benchmark/csrc -I/home/mvp/anaconda3/envs/mrcnn/lib/python3.6/site-packages/torch/lib/include -I/home/mvp/anaconda3/envs/mrcnn/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -I/home/mvp/anaconda3/envs/mrcnn/lib/python3.6/site-packages/torch/lib/include/TH -I/home/mvp/anaconda3/envs/mrcnn/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/home/mvp/anaconda3/pkgs/cudnn-5.1-0/lib/libcudnn.so.5/include -I/home/mvp/anaconda3/envs/mrcnn/include/python3.6m -c /home/mvp/anaconda3/envs/mrcnn/lib/python3.6/site-packages/maskrcnn-benchmark/maskrcnn_benchmark/csrc/vision.cpp -o build/temp.linux-x86_64-3.6/home/mvp/anaconda3/envs/mrcnn/lib/python3.6/site-packages/maskrcnn-benchmark/maskrcnn_benchmark/csrc/vision.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: error: /home/mvp/anaconda3/pkgs/cudnn-5.1-0/lib/libcudnn.so.5/include: Not a directory
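The rejected flag -I/home/mvp/anaconda3/pkgs/cudnn-5.1-0/lib/libcudnn.so.5/include looks like a cuDNN location with /include appended. My assumption (worth verifying against the installed torch/utils/cpp_extension.py) is that PyTorch 1.0's extension builder adds os.path.join(CUDNN_HOME, 'include'), with CUDNN_HOME read from the CUDNN_HOME or CUDNN_PATH environment variable, so a variable pointing at the libcudnn.so.5 file instead of the cuDNN root directory would yield exactly this path. The hypothetical helper below only reports which of those variables are set to something other than a directory.

```python
import os


def bad_cudnn_env_vars(environ=None):
    """Return {name: value} for CUDNN_HOME / CUDNN_PATH values that are not directories.

    Assumption: the extension build appends '/include' to one of these
    variables, so each should name the cuDNN root directory, not a library file.
    """
    environ = os.environ if environ is None else environ
    problems = {}
    for name in ("CUDNN_HOME", "CUDNN_PATH"):
        value = environ.get(name)
        if value and not os.path.isdir(value):
            problems[name] = value
    return problems


for name, value in bad_cudnn_env_vars().items():
    print(f"{name}={value} is not a directory; "
          f"{os.path.join(value, 'include')} cannot be an include path")
```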

@fmassa
Contributor

fmassa commented Mar 1, 2019

Can you try following the installation instructions in INSTALL.md:

python setup.py build develop

@rudgerte
Author

rudgerte commented Mar 1, 2019

I created a new conda environment and followed the instructions from #509, but got the following error:
File "/home/mvp/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 27, in boxlist_nms
keep = _box_nms(boxes, score, nms_thresh)
RuntimeError: Not compiled with GPU support (nms at /home/mvp/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/maskrcnn-benchmark/maskrcnn_benchmark/csrc/nms.h:22)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fa370674cf5 in /home/mvp/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/lib/libc10.so)

setup.py imports the function _find_cuda_home() from torch/utils/cpp_extension.py, where line 38 is:

cuda_home = '/usr/local/cuda'

Should I install a local cuda-toolkit-9.0, or should Anaconda's toolkit files be enough? I didn't find nvcc in Anaconda.
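As far as I can tell, the lookup order in PyTorch 1.0's _find_cuda_home() is: the CUDA_HOME or CUDA_PATH environment variable, then the parent of the directory containing nvcc on PATH, then the hard-coded /usr/local/cuda default, which becomes None if it does not exist. The function below is a simplified, Linux-only re-implementation for seeing which branch applies on a given machine; guess_cuda_home is a hypothetical name, not the torch function, and other torch versions may differ.

```python
import os
import shutil


def guess_cuda_home(environ=None, which=shutil.which):
    """Simplified, Linux-only model of torch's CUDA home lookup.

    Returns (path_or_None, source) so you can see which guess was used.
    """
    environ = os.environ if environ is None else environ
    # Guess 1: an explicit environment variable.
    cuda_home = environ.get("CUDA_HOME") or environ.get("CUDA_PATH")
    if cuda_home:
        return cuda_home, "env"
    # Guess 2: nvcc on PATH, e.g. /usr/local/cuda-10.1/bin/nvcc -> /usr/local/cuda-10.1
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc)), "nvcc"
    # Guess 3: the hard-coded default, kept only if it exists.
    default = "/usr/local/cuda"
    if os.path.exists(default):
        return default, "default"
    return None, "not found"


if __name__ == "__main__":
    print(guess_cuda_home())
```

If this prints (None, 'not found'), neither an environment variable nor an nvcc on PATH was found, which is the situation where setup.py sees CUDA_HOME as None.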

@fmassa
Copy link
Contributor

fmassa commented Mar 5, 2019

Hey,

I would make sure that your CUDA installation (including nvcc) is visible from the default paths. If _find_cuda_home() is not working for you, you can comment out the CUDA_HOME is not None part in

if torch.cuda.is_available() and CUDA_HOME is not None:

and recompile. But this might be a sign that your CUDA install is not in a standard place, or that you don't have all the environment variables set up properly.
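The condition quoted above decides whether the CUDA sources are compiled at all. My reading of maskrcnn-benchmark's setup.py is that when it is false the package is built as a CPU-only CppExtension without the WITH_CUDA macro, and the C++ nms wrapper then raises "Not compiled with GPU support" for CUDA tensors, matching the earlier traceback. The pure-Python model below only illustrates that decision; plan_extension, its ignore_cuda_home flag and its return values are hypothetical and not part of setup.py.

```python
def plan_extension(cuda_available, cuda_home, ignore_cuda_home=False):
    """Model of the extension-type decision in maskrcnn-benchmark's setup.py.

    ignore_cuda_home=True mimics commenting out 'and CUDA_HOME is not None'.
    Returns (extension_kind, define_macros).
    """
    use_cuda = cuda_available and (ignore_cuda_home or cuda_home is not None)
    if use_cuda:
        # CUDA sources are compiled and WITH_CUDA enables the GPU code paths.
        return "CUDAExtension", [("WITH_CUDA", None)]
    # CPU-only build: GPU calls such as nms on CUDA tensors fail at runtime
    # with "Not compiled with GPU support".
    return "CppExtension", []


# The situation in this thread: a usable GPU, but no CUDA_HOME found.
print(plan_extension(cuda_available=True, cuda_home=None))
print(plan_extension(cuda_available=True, cuda_home=None, ignore_cuda_home=True))
```

Forcing the CUDA branch does not make nvcc appear, so the toolkit still has to be locatable for the .cu files to compile; that is why the install location and environment variables are the likelier root cause.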

@den-run-ai

@fmassa I had this error when CuPy was imported and some of its methods were called; PyTorch then failed. When CuPy is removed from the runtime, PyTorch executes normally.

@den-run-ai

Related: pytorch/pytorch#21004
