Meet RuntimeError in mmdet3d/ops/spconv/src/indice_cuda.cu #37

happinesslz · 2020-07-22T10:03:13Z

I am hitting the same error as in #21.

Tai-Wang · 2020-07-23T01:21:11Z

Please follow the template for "error report" to describe your issue. We need more information to debug.

BTW, the referenced issue is simply an "out of memory" error caused by adding code unrelated to the codebase. Please check whether something similar could be causing this runtime error in your case.

happinesslz · 2020-07-23T02:06:08Z

@Tai-Wang Thanks for your reply.

Describe the bug
I can only run PointPillars (without spconv) successfully, so spconv may be causing the error. I also tried reducing "samples_per_gpu" to 1, but I still get the same error. I ran the code on a Titan V with CUDA 10.1, PyTorch 1.5.1/1.3.1, and mmcv-full 1.0.3/1.0.2.

Reproduction
Did you make any modifications on the code or config? Did you understand what you have modified?
No.

Environment
sys.platform: linux
Python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda-10.1
NVCC: Cuda compilation tools, release 10.1, V10.1.168
GPU 0,1: TITAN V
GCC: gcc (GCC) 5.4.0
PyTorch: 1.5.1
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 10.1
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
CuDNN 7.6.3
Magma 2.5.2
Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
TorchVision: 0.6.0a0+35d732a
OpenCV: 4.3.0
MMCV: 1.0.2
MMDetection: 2.3.0rc0+3c21dd0
MMDetection3D: 0.5.0+unknown
MMDetection3D Compiler: GCC 5.4
MMDetection3D CUDA Compiler: 10.1

Error traceback

2020-07-23 09:57:07,433 - mmdet - INFO - Start running, host: zliu@1035, work_dir: /home/zliu/mmdetection3d_1035/work_dirs/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class
2020-07-23 09:57:07,433 - mmdet - INFO - workflow: [('train', 1)], max: 40 epochs
Traceback (most recent call last):
  File "./tools/train.py", line 166, in <module>
    main()
  File "./tools/train.py", line 162, in main
    meta=meta)
  File "/home/zliu/mmdetection3d_1035/mmdetection/mmdet/apis/train.py", line 128, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/zliu/anaconda3/envs/mmdetection3d/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/zliu/anaconda3/envs/mmdetection3d/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 32, in train
    **kwargs)
  File "/home/zliu/anaconda3/envs/mmdetection3d/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 31, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/zliu/mmdetection3d_1035/mmdetection/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/home/zliu/anaconda3/envs/mmdetection3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zliu/mmdetection3d_1035/mmdet3d/models/detectors/base.py", line 60, in forward
    return self.forward_train(**kwargs)
  File "/home/zliu/mmdetection3d_1035/mmdet3d/models/detectors/mvx_two_stage.py", line 266, in forward_train
    points, img=img, img_metas=img_metas)
  File "/home/zliu/mmdetection3d_1035/mmdet3d/models/detectors/mvx_two_stage.py", line 201, in extract_feat
    pts_feats = self.extract_pts_feat(points, img_feats, img_metas)
  File "/home/zliu/mmdetection3d_1035/mmdet3d/models/detectors/mvx_faster_rcnn.py", line 54, in extract_pts_feat
    x = self.pts_middle_encoder(voxel_features, feature_coors, batch_size)
  File "/home/zliu/anaconda3/envs/mmdetection3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zliu/mmdetection3d_1035/mmdet3d/models/middle_encoders/sparse_encoder.py", line 97, in forward
    x = self.conv_input(input_sp_tensor)
  File "/home/zliu/anaconda3/envs/mmdetection3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zliu/mmdetection3d_1035/mmdet3d/ops/spconv/modules.py", line 130, in forward
    input = module(input)
  File "/home/zliu/anaconda3/envs/mmdetection3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zliu/mmdetection3d_1035/mmdet3d/ops/spconv/conv.py", line 168, in forward
    grid=input.grid)
  File "/home/zliu/mmdetection3d_1035/mmdet3d/ops/spconv/ops.py", line 94, in get_indice_pairs
    int(transpose))
RuntimeError: /home/zliu/mmdetection3d_1035/mmdet3d/ops/spconv/src/indice_cuda.cu 124
cuda execution failed with error 2
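The number in "cuda execution failed with error 2" is a CUDA runtime `cudaError_t` value. As a rough aid for reading such messages, here is a minimal sketch of a lookup table. It is deliberately partial, the helper name `describe_cuda_error` is made up for illustration, and the numbering assumes CUDA 10.1 or later (older toolkits numbered some errors differently).

```python
# Partial lookup of CUDA runtime error codes (cudaError_t values).
# Only codes relevant to this thread are listed; numbering assumes CUDA >= 10.1.
CUDA_ERRORS = {
    0: ("cudaSuccess", "no error"),
    2: ("cudaErrorMemoryAllocation", "out of memory"),
    700: ("cudaErrorIllegalAddress", "an illegal memory access was encountered"),
}


def describe_cuda_error(code):
    """Return a readable description for a numeric CUDA error code."""
    name, message = CUDA_ERRORS.get(
        code, ("unknown", "not in this partial table")
    )
    return f"{code}: {name} ({message})"


print(describe_cuda_error(2))
```

Code 2 only says that a device allocation failed; it does not by itself say why the allocation was requested or how large it was.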

Tai-Wang · 2020-07-23T02:12:23Z

I don't see any problems in the information you provided. Could you monitor the CPU/GPU memory usage while the code runs, to confirm whether this is an out-of-memory error? Also, please tell us which config you are using.
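One way to make the memory observation concrete is to poll `nvidia-smi` while training runs. The sketch below is only an illustration: the function names are made up, the sample text is a hypothetical reading rather than output from this issue, and it assumes `nvidia-smi` is on the PATH and reports plain numbers.

```python
import subprocess

# nvidia-smi query that prints one "index, used, total" row per GPU, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows (MiB) into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        rows.append({"gpu": index, "used_mib": used, "total_mib": total})
    return rows


def read_gpu_memory():
    """Run nvidia-smi once and return the parsed per-GPU memory usage."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Hypothetical sample text, used here so the parser can be tried without a GPU.
sample = "0, 6144, 12066\n1, 11, 12066\n"
for row in parse_gpu_memory(sample):
    print(row)
```

Calling `read_gpu_memory()` in a loop from a second terminal, before and after the crash, would also show whether memory is still held after the error.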

happinesslz · 2020-07-23T02:22:55Z

@Tai-Wang
The config file is dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py in configs/mvxnet.

CPU/GPU memory usage looks normal: GPU memory reaches only about 6 GB before the error occurs. I think 12 GB of GPU memory should be enough to run this config. I also reduced "samples_per_gpu" to 1, but I still get the same error.

Tai-Wang · 2020-07-23T03:02:29Z

I have just checked this config, and it runs fine on my 2080Ti, using about 8.5 GB of GPU memory.

Maybe you can first check whether the process still holds GPU memory after the error happens. Then please try other configs that use spconv, such as the hard-voxelization config hv_second_secfpn_6x8_80e_kitti-3d-3class.py in configs/second.

happinesslz · 2020-07-23T03:29:12Z

Other configs that use spconv hit similar errors. Could you share the details of your conda environment? Thanks.

Tai-Wang · 2020-07-23T04:06:27Z

> Other configs that use spconv hit similar errors. Could you share the details of your conda environment? Thanks.

You can refer to this comment for my environment. In my experience, CUDA 10.0-10.2, PyTorch 1.4.0-1.5.1, Python 3.7, and GCC 5.4/5.5 should work.
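The version ranges mentioned above can be turned into a quick sanity check. This is a minimal sketch, not a tool from the codebase: `SUPPORTED`, `version_tuple`, and `check_environment` are illustrative names, and the two environments are copied by hand from the reports in this thread.

```python
# Ranges quoted in the comment above (inclusive on both ends).
SUPPORTED = {
    "cuda": ("10.0", "10.2"),
    "pytorch": ("1.4.0", "1.5.1"),
    "python": ("3.7", "3.7"),
    "gcc": ("5.4", "5.5"),
}


def version_tuple(text):
    """Turn a plain dotted version such as '1.5.1' into (1, 5, 1)."""
    return tuple(int(part) for part in text.split("."))


def check_environment(env):
    """Return {component: True/False} for whether each version is in range."""
    report = {}
    for name, (low, high) in SUPPORTED.items():
        low_t, high_t = version_tuple(low), version_tuple(high)
        # Compare only as many components as the range itself specifies.
        found = version_tuple(env[name])[: len(low_t)]
        report[name] = low_t <= found <= high_t
    return report


# Environments as reported in this thread (local build suffixes dropped).
first_report = {"cuda": "10.1", "pytorch": "1.5.1", "python": "3.7.6", "gcc": "5.4.0"}
later_report = {"cuda": "11.1", "pytorch": "1.8.0", "python": "3.8.8", "gcc": "7.5.0"}

print(check_environment(first_report))
print(check_environment(later_report))
```

The first reported environment falls inside every range and still fails, so a check like this can rule things out but cannot confirm that a setup will work.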

happinesslz · 2020-07-23T05:06:16Z

Thanks, I will try it on an RTX 2080 Ti.

niezhongliang · 2020-07-29T08:52:24Z

> Thanks, I will try it on an RTX 2080 Ti.

Hi, I hit the same problem when running pcd_demo.py on a server. However, it runs fine on my personal desktop. Have you solved it?

WWW2323 · 2021-06-13T03:36:21Z

> Thanks, I will try it on an RTX 2080 Ti.

Hi, did you run this project on a Titan V or 1080Ti before? I want to figure out whether the error depends on the GPU type, because I can run this project on a 2080Ti, but it fails on a Titan V.
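One thing that can be checked when a GPU type is suspected is whether a binary was compiled for that GPU's compute capability. The sketch below parses the "NVCC architecture flags" string from the first environment report. Note the assumptions: that string describes the PyTorch build, not the locally compiled mmdet3d ops, and the helper names and the small capability table are illustrative.

```python
import re

# Compute capabilities of the GPUs mentioned in this thread.
COMPUTE_CAPABILITY = {
    "TITAN V": "70",
    "GTX 1080 Ti": "61",
    "RTX 2080 Ti": "75",
}

# "NVCC architecture flags" line from the first environment report.
PYTORCH_FLAGS = (
    "-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;"
    "-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;"
    "-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;"
    "-gencode;arch=compute_37,code=compute_37"
)


def compiled_archs(nvcc_flags):
    """Collect the sm_XX targets for which native GPU code was generated."""
    return set(re.findall(r"code=sm_(\d+)", nvcc_flags))


def has_native_code(gpu_name, nvcc_flags):
    """True if the flags include native code for the named GPU."""
    return COMPUTE_CAPABILITY[gpu_name] in compiled_archs(nvcc_flags)


for gpu in COMPUTE_CAPABILITY:
    print(gpu, has_native_code(gpu, PYTORCH_FLAGS))
```

This only shows that the PyTorch binary in that report targets the Titan V natively. The architecture flags used when building the spconv extension are not shown anywhere in the thread and would need to be checked separately.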

ArthDh · 2021-08-26T03:21:52Z

Hi,
Has there been a fix for this? I am facing a similar issue. I am able to train configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymoD5-3d-car.py
However, I run into:

Traceback (most recent call last):
  File "tools/train.py", line 233, in <module>
    main()
  File "tools/train.py", line 222, in main
    train_model(
  File "/home/svcl-oowl/arth/SVCL/mmdetection3d/mmdet3d/apis/train.py", line 27, in train_model
    train_detector(
  File "/opt/conda/lib/python3.8/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/opt/conda/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
    return old_func(*args, **kwargs)
  File "/home/svcl-oowl/arth/SVCL/mmdetection3d/mmdet3d/models/detectors/base.py", line 58, in forward
    return self.forward_train(**kwargs)
  File "/home/svcl-oowl/arth/SVCL/mmdetection3d/mmdet3d/models/detectors/mvx_two_stage.py", line 272, in forward_train
    img_feats, pts_feats = self.extract_feat(
  File "/home/svcl-oowl/arth/SVCL/mmdetection3d/mmdet3d/models/detectors/mvx_two_stage.py", line 207, in extract_feat
    pts_feats = self.extract_pts_feat(points, img_feats, img_metas)
  File "/home/svcl-oowl/arth/SVCL/mmdetection3d/mmdet3d/models/detectors/mvx_faster_rcnn.py", line 56, in extract_pts_feat
    x = self.pts_middle_encoder(voxel_features, feature_coors, batch_size)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
    return old_func(*args, **kwargs)
  File "/home/svcl-oowl/arth/SVCL/mmdetection3d/mmdet3d/models/middle_encoders/sparse_encoder.py", line 112, in forward
    x = self.conv_input(input_sp_tensor)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/svcl-oowl/arth/SVCL/mmdetection3d/mmdet3d/ops/spconv/modules.py", line 130, in forward
    input = module(input)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/svcl-oowl/arth/SVCL/mmdetection3d/mmdet3d/ops/spconv/conv.py", line 183, in forward
    out_features = Fsp.indice_subm_conv(features, self.weight,
  File "/home/svcl-oowl/arth/SVCL/mmdetection3d/mmdet3d/ops/spconv/functional.py", line 64, in forward
    return ops.indice_conv(features, filters, indice_pairs,
  File "/home/svcl-oowl/arth/SVCL/mmdetection3d/mmdet3d/ops/spconv/ops.py", line 116, in indice_conv
    return sparse_conv_ext.indice_conv_fp32(features, filters,
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fdca365a2f2 in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7fdca365767b in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x7fdca38b21f9 in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7fdca36423a4 in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x6e473a (0x7fdd1746673a in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x6e47d1 (0x7fdd174667d1 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #26: __libc_start_main + 0xe7 (0x7fdd28f9cbf7 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)

When trying to run:
configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py

My environment info is as follows:

2021-08-26 03:17:52,042 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0]
CUDA available: True
GPU 0,1: GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.0+cu111
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.9.0+cu111
OpenCV: 4.5.3
MMCV: 1.3.10
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.15.0
MMSegmentation: 0.16.0
MMDetection3D: 0.15.0+1b2e64c

Any help would be appreciated!
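CUDA kernels launch asynchronously, so an "illegal memory access" can surface in a later call than the kernel that actually faulted. Setting the `CUDA_LAUNCH_BLOCKING=1` environment variable before the process starts makes launches synchronous, which usually makes the Python traceback point closer to the failing op. A minimal sketch, with made-up helper names and assuming it is run from the repository root:

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Copy an environment mapping and force synchronous CUDA launches."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_training(config_path):
    """Launch tools/train.py with CUDA_LAUNCH_BLOCKING=1; return its exit code."""
    cmd = [sys.executable, "tools/train.py", config_path]
    return subprocess.run(cmd, env=blocking_env()).returncode


# Shows the variable being added without launching anything.
print(blocking_env({"PATH": "/usr/bin"}))
```

For example, `run_training("configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py")` would rerun the failing config. Training is slower in this mode, so it is meant for debugging only.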


happinesslz closed this as completed Jul 23, 2020

Tai-Wang mentioned this issue Aug 26, 2020

RuntimeError:mmdet3d/ops/spconv/src/indice_cuda.cu 124 #80

Closed

Tai-Wang mentioned this issue Sep 12, 2021

RuntimeError: mmdet3d/ops/spconv/src/indice_cuda.cu 124 #929

Closed

jacoblambert mentioned this issue Dec 9, 2021

Does Waymo use multiple sweeps? How to work with multisweep models. #1089

Open

jumptiger66 mentioned this issue Apr 25, 2023

Setting --gpu-id 3 when i train ScanNet on Votenet ，i got error #2471

Closed
