RuntimeError: CUDA error: an illegal memory access was encountered #1077

Open
tdzdog opened this issue Nov 20, 2024 · 4 comments


tdzdog commented Nov 20, 2024

I recently ran into a strange error. Gaussian splatting training fails at runtime as follows:

Optimizing
Output folder: ./output/e0d441e3-b [20/11 10:29:01]
Tensorboard not available: not logging progress [20/11 10:29:01]
Found transforms_train.json file, assuming Blender data set! [20/11 10:29:01]
Reading Training Transforms [20/11 10:29:01]
Reading Test Transforms [20/11 10:29:05]
Loading Training Cameras [20/11 10:29:13]
Loading Test Cameras [20/11 10:29:21]
Number of points at initialisation : 100000 [20/11 10:29:21]
Training progress: 0%| | 0/30000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 282, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "train.py", line 111, in training
    render_pkg = render(viewpoint_cam, gaussians, pipe, bg, use_trained_exp=dataset.train_test_exp, separate_sh=SPARSE_ADAM_AVAILABLE)
  File "/home/fengxy1/vscode-projects/research/3DGS/gaussian-splatting/gaussian_renderer/__init__.py", line 119, in render
    rendered_image = rendered_image.clamp(0, 1)
RuntimeError: CUDA error: an illegal memory access was encountered
Training progress: 0%| | 0/30000 [00:01<?, ?it/s]

The GPU is an RTX 3090 with CUDA 11.3. The packages in the conda environment are:

Package                     Version
--------------------------- ---------
Brotli                      1.0.9
certifi                     2024.8.30
charset-normalizer          3.3.2
diff_gaussian_rasterization 0.0.0
fused_ssim                  0.0.0
idna                        3.7
joblib                      1.4.2
mkl-fft                     1.3.1
mkl-random                  1.2.2
mkl-service                 2.4.0
numpy                       1.24.3
opencv-python               4.10.0.84
pillow                      10.4.0
pip                         24.2
plyfile                     0.8.1
PySocks                     1.7.1
requests                    2.32.3
setuptools                  75.1.0
simple_knn                  0.0.0
six                         1.16.0
torch                       1.12.1
torchaudio                  0.12.1
torchvision                 0.13.1
tqdm                        4.67.0
typing_extensions           4.11.0
urllib3                     2.2.3
wheel                       0.44.0

I tried creating an entirely new environment, but the error persists. Can anyone help?
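
When comparing environments for a crash like this, it may help to report only the packages that touch CUDA. Below is a minimal sketch using only the standard library. The package names are copied from the list in this issue, and the helper name `report_versions` is made up for illustration.

```python
from importlib import metadata

# Distribution names copied from the package list in this issue.
PACKAGES = ["torch", "torchvision", "diff_gaussian_rasterization", "simple_knn", "fused_ssim"]


def report_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print one line per package so the output can be pasted into a comment.
for name, version in report_versions(PACKAGES).items():
    print(f"{name:30} {version or 'not installed'}")
```

The CUDA version that PyTorch itself was built against is reported separately by `torch.version.cuda`, which can differ from the system toolkit used to compile the submodules.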


rylynchen commented Nov 22, 2024

I got the same error after switching the diff-gaussian-rasterization submodule to the 3dgs_accel branch.
Does anyone have a good idea?

xxx/lib/models/gaussian_renderers/base_gaussian_renderer.py", line 218, in _render_kernel
    xyz = torch.cat([v for v in renderer_data['xyz'].values()]).float().cuda()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
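
The hint in the message above can be followed by launching training with CUDA_LAUNCH_BLOCKING=1, so that kernel launches run synchronously and the traceback points closer to the failing call. Below is a sketch, assuming the entry point is train.py as in the traces in this thread. The helper names `blocking_env` and `run_training` are illustrative only.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Copy an environment and force synchronous CUDA kernel launches."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_training(script="train.py", extra_args=()):
    """Run the training script in a child process with the debug environment."""
    cmd = [sys.executable, script, *extra_args]
    return subprocess.run(cmd, env=blocking_env(), check=False)


# Example (not executed here): run_training("train.py", [...your usual arguments...])
```

The variable has to be set before CUDA is initialized, which is why this sketch sets it for a fresh child process instead of changing it mid-run. Expect training to be slower while it is enabled.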

When I set debug=True in raster_settings, I got more debug info:

[CUDA ERROR] in xx/submodules/diff-gaussian-rasterization-accel/cuda_rasterizer/rasterizer_impl.cu
Line 462: an illegal memory access was encountered
An error occured in forward. Please forward snapshot_fw.dump for debugging.
Traceback (most recent call last):
  File "train.py", line 464, in <module>
    training()
  File "train.py", line 168, in training
    render_pkg = gaussians_renderer.render(viewpoint_camera=viewpoint_cam, pc=gaussians_nsg,
  File "xx/lib/models/gaussian_renderers/base_gaussian_renderer.py", line 89, in render
    obj_render_pkg = self._render_kernel(
  File "xx/lib/models/gaussian_renderers/base_gaussian_renderer.py", line 266, in _render_kernel
    rendered_color, radii, rendered_depth = rasterizer(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "xx/lib/python3.8/site-packages/diff_gaussian_rasterization/__init__.py", line 228, in forward
    return rasterize_gaussians(
  File "xx/lib/python3.8/site-packages/diff_gaussian_rasterization/__init__.py", line 33, in rasterize_gaussians
    return _RasterizeGaussians.apply(
  File "xx/lib/python3.8/site-packages/diff_gaussian_rasterization/__init__.py", line 96, in forward
    raise ex
  File "xx/lib/python3.8/site-packages/diff_gaussian_rasterization/__init__.py", line 92, in forward
    num_rendered, num_buckets, color, invdepths, radii, geomBuffer, binningBuffer, imgBuffer, sampleBuffer = _C.rasterize_gaussians(*args)
RuntimeError: an illegal memory access was encountered
  6%|▌         | 3430/60000 [1:21:24<22:22:41,  1.42s/it, Exp=xx, Loss=0.1596127,, PSNR=18.9039]


unanan commented Dec 3, 2024

@rylynchen have you solved it?

@rylynchen

> @rylynchen have you solved it?

Not yet. In the end I reverted the branch to inv-depth.

@king1111sadjfoisja

> I got the same error after switching the diff-gaussian-rasterization submodule to the 3dgs_accel branch. Does anyone have a good idea?


Hi, when switching the diff-gaussian-rasterization submodule to the 3dgs_accel branch, I got the following error:

[Image]
In English, it says:
fatal: not a git repository (or any of the parent directories): .git
Have you run into this before?
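
Git prints this message when neither the current directory nor any of its parents contains a .git entry. One possible cause, which is an assumption here and not something the screenshot confirms, is that the code was downloaded as an archive and not cloned. Below is a simplified sketch of the same upward lookup, with a hypothetical helper name `find_git_root`. It ignores environment overrides such as GIT_DIR.

```python
from pathlib import Path


def find_git_root(start="."):
    """Walk upward from `start`; return the first directory holding a .git entry, else None."""
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        # .git is a directory in a normal clone and a file inside a submodule checkout.
        if (candidate / ".git").exists():
            return candidate
    return None


root = find_git_root()
print(root if root is not None else "not inside a git repository")
```

Running this from the submodule directory shows whether git has anything to work with there. If it reports no repository, re-cloning the project with `git clone --recursive` is one way to get a checkout whose submodules can switch branches.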
