This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

ImportError: .../maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration #48

Closed
laonb opened this issue Oct 27, 2018 · 10 comments

Comments


laonb commented Oct 27, 2018

❓ Questions and Help

Traceback (most recent call last):
  File "webcam.py", line 6, in <module>
    from predictor import COCODemo
  File "/home/laonb/github/maskrcnn-benchmark/demo/predictor.py", line 6, in <module>
    from maskrcnn_benchmark.modeling.detector import build_detection_model
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/__init__.py", line 2, in <module>
    from .detectors import build_detection_model
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/detectors.py", line 2, in <module>
    from .generalized_rcnn import GeneralizedRCNN
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 11, in <module>
    from ..backbone import build_backbone
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/__init__.py", line 2, in <module>
    from .backbone import build_backbone
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/backbone.py", line 7, in <module>
    from . import resnet
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/resnet.py", line 19, in <module>
    from maskrcnn_benchmark.layers import FrozenBatchNorm2d
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/layers/__init__.py", line 8, in <module>
    from .nms import nms
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/layers/nms.py", line 3, in <module>
    from maskrcnn_benchmark import _C
ImportError: /home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration

Author

laonb commented Oct 27, 2018

Environment:
Ubuntu 16.04
PyTorch 1.0.0.dev20181027
CUDA 9.0
cuDNN 7.1.4.18

Contributor

fmassa commented Oct 28, 2018

Hi,
I've never seen this error before.
From looking around on the internet, some reports trace it to incorrect or conflicting installations on the system; see, e.g., horovod/horovod#274.

Can you try doing

ldd maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so

From the maskrcnn folder?
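
The ldd check above can also be scripted. The following is a minimal sketch, not part of maskrcnn-benchmark: the helper names `parse_ldd`, `cuda_libraries`, and `ldd` are hypothetical, and the list of CUDA library prefixes is an assumed heuristic. It maps each library the extension links against to the path it resolved to, which makes it easier to see whether a CUDA runtime comes from a conda environment or from a system-wide install.

```python
import re
import subprocess

# Matches lines of the form "libfoo.so.1 => /path/to/libfoo.so.1 (0x...)".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd(output):
    """Map each library name in ldd output to the path it resolved to."""
    resolved = {}
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            resolved[match.group(1)] = match.group(2)
    return resolved


def cuda_libraries(resolved):
    """Keep only entries whose name looks like a CUDA library (assumed prefixes)."""
    prefixes = ("libcudart", "libcublas", "libcudnn", "libnccl")
    return {name: path for name, path in resolved.items() if name.startswith(prefixes)}


def ldd(path):
    """Run ldd on a shared object and return its text output (Linux only)."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `cuda_libraries(parse_ldd(ldd("maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so")))` would list only the CUDA-related entries. Lines without a resolved path, such as the `linux-vdso` entry, are skipped.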

Author

laonb commented Oct 29, 2018

@fmassa I ran ldd:
laonb@LAONB-GPU:~/github/maskrcnn-benchmark/maskrcnn_benchmark$ ldd _C.cpython-36m-x86_64-linux-gnu.so
linux-vdso.so.1 => (0x00007ffde48d5000)
libcudart.so.9.0 => /home/laonb/anaconda3/lib/libcudart.so.9.0 (0x00007f2ae6743000)
libstdc++.so.6 => /home/laonb/anaconda3/lib/libstdc++.so.6 (0x00007f2ae6409000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2ae6100000)
libgcc_s.so.1 => /home/laonb/anaconda3/lib/libgcc_s.so.1 (0x00007f2ae5eee000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f2ae5cd1000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2ae5907000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2ae6c1b000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f2ae5703000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f2ae54fb000)

Then I reran python webcam.py --min-image-size 800.
The error is the same as before:
ImportError: /home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration

Contributor

fmassa commented Oct 29, 2018

Could you please copy and paste the output from the environment collection script from PyTorch (or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
  • PyTorch Version (e.g., 1.0):
  • OS (e.g., Linux):
  • How you installed PyTorch (conda, pip, source):
  • Build command you used (if compiling from source):
  • Python version:
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:

I think there might be a clash between multiple CUDA versions on your machine.

Member

soumith commented Oct 29, 2018

@laonb this happens because you have two conflicting versions of CUDA on your machine.

The output of:

nvcc --version

and the output of:

conda list |grep cuda

Together, these two outputs will determine the answer.
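
The comparison described above can be sketched in Python. This is only an illustration under assumptions: the helper names `nvcc_release`, `same_major_minor`, and `report` are hypothetical, and `report` assumes that both `nvcc` and PyTorch are available in the active environment. It reads the toolkit release from `nvcc --version` and compares it with `torch.version.cuda`, the CUDA version PyTorch was built with.

```python
import re


def nvcc_release(nvcc_output):
    """Extract the 'release X.Y' version from `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def same_major_minor(version_a, version_b):
    """Compare two CUDA version strings on their major.minor components only."""
    return version_a.split(".")[:2] == version_b.split(".")[:2]


def report():
    """Print the toolkit and PyTorch CUDA versions and flag a mismatch."""
    import subprocess
    import torch  # assumed to be installed in the active environment

    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    toolkit = nvcc_release(out)
    built_with = torch.version.cuda  # None for CPU-only builds
    print("nvcc:", toolkit, "| PyTorch built with:", built_with)
    if toolkit and built_with and not same_major_minor(toolkit, built_with):
        print("Mismatch: the extension may have been compiled with a different CUDA toolkit.")
```

Matching versions do not rule out a conflict, since a second toolkit may still be picked up at build time through a different `PATH` or `CUDA_HOME`.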

Author

laonb commented Oct 29, 2018

@fmassa
laonb@LAONB-GPU:~/github$ python collect_env.py
Collecting environment information...
PyTorch version: 1.0.0.dev20181027
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 396.44
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.1.4
/usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a

Versions of relevant libraries:
[pip] Could not collect
[conda] pytorch 0.4.0 py36hdf912b8_0 defaults
[conda] pytorch-nightly 1.0.0.dev20181027 py3.6_cuda9.0.176_cudnn7.1.2_0 pytorch
[conda] torchvision 0.2.1 py36_1 pytorch
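
One detail in the report above is that both `pytorch 0.4.0` and `pytorch-nightly` are listed in the same conda environment. Whether that caused the failure is not confirmed in this thread. As a rough sketch, a report of this shape can be scanned for more than one PyTorch package; the function names and the list of candidate package names are assumptions made for illustration.

```python
def conda_packages(report):
    """Return {name: version} for the '[conda] name version ...' lines of a report."""
    packages = {}
    for line in report.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "[conda]":
            packages[parts[1]] = parts[2]
    return packages


def pytorch_builds(packages):
    """Names of installed packages that each ship a PyTorch build (assumed name list)."""
    candidates = ("pytorch", "pytorch-nightly", "torch")
    return sorted(name for name in packages if name in candidates)
```

If `pytorch_builds` returns more than one name, the environment contains several PyTorch builds, and an extension compiled against one may be loaded alongside another.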

Author

laonb commented Oct 29, 2018

@soumith

laonb@LAONB-GPU:~/github$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
laonb@LAONB-GPU:~/github$ conda list |grep cuda
cudatoolkit               9.0                  h13b8566_0    defaults
cudnn                     7.1.2                 cuda9.0_0    defaults
nccl                      1.3.5                 cuda9.0_0    defaults
pytorch-nightly           1.0.0.dev20181027 py3.6_cuda9.0.176_cudnn7.1.2_0    pytorch

Author

laonb commented Oct 29, 2018

I reinstalled from scratch, and it now runs successfully in a newly created conda env.
The earlier failure must have been caused by a conflict.


member123456 commented Mar 23, 2019

I faced a similar problem while importing predictor.
>>> import predictor
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/demo/predictor.py", line 6, in <module>
    from maskrcnn_benchmark.modeling.detector import build_detection_model
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/__init__.py", line 2, in <module>
    from .detectors import build_detection_model
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/detectors.py", line 2, in <module>
    from .generalized_rcnn import GeneralizedRCNN
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 11, in <module>
    from ..backbone import build_backbone
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/__init__.py", line 2, in <module>
    from .backbone import build_backbone
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/backbone.py", line 7, in <module>
    from maskrcnn_benchmark.modeling.make_layers import conv_with_kaiming_uniform
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/make_layers.py", line 10, in <module>
    from maskrcnn_benchmark.layers import Conv2d
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/layers/__init__.py", line 9, in <module>
    from .nms import nms
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/layers/nms.py", line 3, in <module>
    from maskrcnn_benchmark import _C
ImportError: /home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c1019UndefinedTensorImpl10_singletonE

I am running this on a shared server and cannot install conda from scratch, although I did create a new environment as described in INSTALL.md. Please let me know what I can do here.
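
The undefined symbol in this second report is a mangled C++ name, and decoding it shows which library it belongs to. Below is a minimal sketch of a decoder for the simplest Itanium-ABI case, a nested name made only of length-prefixed identifiers. The function name `demangle_nested_name` is hypothetical, and the sketch deliberately does not handle templates or function signatures; a tool such as `c++filt` covers the general case.

```python
def demangle_nested_name(symbol):
    """Decode a simple Itanium-ABI nested name: _ZN <len><id>... E.

    Handles only plain length-prefixed identifiers (no templates, no
    function parameters) and raises ValueError on anything else.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested mangled name")
    pos = 3
    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        start = pos
        # Read the decimal length prefix of the next identifier.
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("unsupported mangling at offset %d" % pos)
        length = int(symbol[start:pos])
        name = symbol[pos:pos + length]
        if len(name) != length:
            raise ValueError("truncated identifier")
        parts.append(name)
        pos += length
    # The closing "E" must be the last character of the symbol.
    if pos != len(symbol) - 1:
        raise ValueError("unexpected trailing characters")
    return "::".join(parts)
```

Applied to `_ZN3c1019UndefinedTensorImpl10_singletonE`, this yields `c10::UndefinedTensorImpl::_singleton`. The `c10` namespace belongs to PyTorch's core library, so this error points at a mismatch between the compiled extension and the installed PyTorch rather than at CUDA. A commonly reported cause, not confirmed in this thread, is an extension built against a different PyTorch version than the one being imported.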


P-DX commented Jun 27, 2019

@member123456 I faced the same problem.
Have you solved it? How can I find out what's wrong?
Thanks a lot!

5 participants