OOM during training #4

Open
yejr0229 opened this issue Jul 5, 2024 · 4 comments


yejr0229 commented Jul 5, 2024

I ran into an OOM error during refining. Here is the full traceback:
Refining...: 0%| | 0/1000 [00:04<?, ?it/s]
Traceback (most recent call last):
File "/home/yejr/AIGC/Director3D-main/inference.py", line 93, in
result = system_gm_ldm.inference(sparse_cameras, text, dense_cameras=cameras, use_3d_mode_every_m_steps=args.use_3d_mode_every_m_steps, refiner=refiner)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/yejr/AIGC/Director3D-main/system_gm_ldm.py", line 112, in inference
gaussians = refiner.refine_gaussians(result['gaussians'], text, dense_cameras=dense_cameras)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/yejr/AIGC/Director3D-main/modules/refiners/sds_pp_refiner.py", line 242, in refine_gaussians
loss_latent_sds, loss_img_sds, loss_embedding = self.train_step(images_pred.squeeze(0), t, text_embeddings, uncond_text_embeddings, learnable_text_embeddings)
File "/home/yejr/AIGC/Director3D-main/modules/refiners/sds_pp_refiner.py", line 175, in train_step
images_pred = self.decode_latent(latents_pred).clamp(-1, 1)
File "/home/yejr/AIGC/Director3D-main/modules/refiners/sds_pp_refiner.py", line 126, in decode_latent
images = self.vae.decode(latents).sample
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
return method(self, *args, **kwargs)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 314, in decode
decoded = self._decode(z).sample
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 285, in _decode
dec = self.decoder(z)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/diffusers/models/autoencoders/vae.py", line 337, in forward
sample = up_block(sample, latent_embeds)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/diffusers/models/unets/unet_2d_blocks.py", line 2746, in forward
hidden_states = resnet(hidden_states, temb=temb)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/diffusers/models/resnet.py", line 327, in forward
hidden_states = self.norm1(hidden_states)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 273, in forward
return F.group_norm(
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/nn/functional.py", line 2530, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 23.69 GiB total capacity; 22.04 GiB already allocated; 168.88 MiB free; 22.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
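
For reference, the max_split_size_mb hint at the end of the message is controlled by the PYTORCH_CUDA_ALLOC_CONF environment variable, which has to be set before CUDA is initialized; a minimal sketch (128 MiB is just an arbitrary starting value):

```python
import os

# Allocator hint from the OOM message: cap the split size to reduce fragmentation.
# This must happen before CUDA is initialized, i.e. before the first torch import.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the variable so the allocator picks it up
```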

I'm running this command on a single RTX 3090:
python inference.py --export_all --text "a delicious hamburger on a wooden table."
Could you please tell me how to solve this problem?

imlixinyang (Owner) commented:

We have tested some cases on a single RTX 3090; the memory cost is very close to the limit, so here is a simple workaround: run the refining separately to avoid the OOM. For example:

  1. Generate the cameras and 3DGS without refining:
python inference.py --export_all --text '{text}' --num_refine_steps 0 --num_samples 4
  2. Check the results in exps/tmp/videos and choose a sample (filename) for separate refining:
python refine.py --ply 'exps/tmp/ply/{filename}.ply' --camera 'exps/tmp/camera/{filename}.npy' --export_all --text '{text}' --num_refine_steps 1000

This has been tested on a single T4 GPU (16 GB). Let me know if it works!
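
Since the crash above happens inside vae.decode, sliced VAE decoding is another generic diffusers knob that may help. This is only a sketch, not something wired into the repo: it assumes a diffusers AutoencoderKL like the one in the traceback, and stabilityai/sd-vae-ft-mse is just a stand-in checkpoint:

```python
import torch
from diffusers import AutoencoderKL

# enable_slicing() decodes one latent at a time within a batch, trading speed
# for peak memory. It only helps when several latents are decoded per call.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to("cuda")
vae.enable_slicing()

latents = torch.randn(4, 4, 64, 64, device="cuda")  # 4 latents -> 4 512x512 images
with torch.no_grad():
    images = vae.decode(latents / 0.18215).sample  # 0.18215 = SD VAE scaling factor
```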

yejr0229 (Author) commented Jul 5, 2024

Thanks for replying, it works just fine!

githubnameoo commented:

> You can try running the refining separately to avoid OOM. For example: … (quoting the owner's reply above)

How much GPU memory is actually needed to run the test?

imlixinyang (Owner) commented:

There isn't an exact number: the number of Gaussian points varies across scenes during refining, so the GPU memory cost varies too. Typically, 28 GB is enough to run the refining jointly with generation, and 16 GB is enough to run the refining separately.
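
If you want a concrete number for your own scenes, plain PyTorch can report the peak usage of a run; nothing in this sketch is specific to this repo:

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the generation / refining you want to profile here ...

allocated = torch.cuda.max_memory_allocated() / 1024**3  # peak tensor memory, GiB
reserved = torch.cuda.max_memory_reserved() / 1024**3    # peak incl. allocator cache
print(f"peak allocated: {allocated:.2f} GiB, peak reserved: {reserved:.2f} GiB")
```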
