OOM during training #4
We have tested some cases on a single RTX 3090. The memory cost is very close to the maximum, so here is a simple workaround: run the refining as a separate step to avoid OOM. For example (a filled-in sketch follows below):
python inference.py --export_all --text '{text}' --num_refine_steps 0 --num_samples 4
python refine.py --ply 'exps/tmp/ply/{filename}.ply' --camera 'exps/tmp/camera/{filename}.npy' --export_all --text '{text}' --num_refine_steps 1000
This has been tested on a single T4 GPU (16 GB). Let me know if it works!
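Concretely, with the prompt from the report below substituted in (a sketch; `{filename}` must stay a placeholder until the first command has actually written its outputs under exps/tmp/):
# Stage 1: generate without refining, which keeps memory low.
python inference.py --export_all --text "a delicious hamburger on a wooden table." --num_refine_steps 0 --num_samples 4
# Stage 2: refine in a fresh process once Stage 1's memory has been
# released. Replace {filename} with the file Stage 1 wrote under
# exps/tmp/ply/ and exps/tmp/camera/.
python refine.py --ply 'exps/tmp/ply/{filename}.ply' --camera 'exps/tmp/camera/{filename}.npy' --export_all --text "a delicious hamburger on a wooden table." --num_refine_steps 1000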
Thanks for replying, it works just fine!
How much GPU memory is actually needed for testing?
There is no fixed number. Since the number of Gaussian points varies across scenes during refining, the GPU memory cost also varies.
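If you want a concrete number for your own scene, one option (an illustrative sketch, not part of the repo) is to watch the card from a second terminal while refining runs:
# Poll used/total GPU memory once per second; note the peak reading.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1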
I hit the OOM problem during refining. Here is the detailed error:
Refining...: 0%| | 0/1000 [00:04<?, ?it/s]
Traceback (most recent call last):
File "/home/yejr/AIGC/Director3D-main/inference.py", line 93, in
result = system_gm_ldm.inference(sparse_cameras, text, dense_cameras=cameras, use_3d_mode_every_m_steps=args.use_3d_mode_every_m_steps, refiner=refiner)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/yejr/AIGC/Director3D-main/system_gm_ldm.py", line 112, in inference
gaussians = refiner.refine_gaussians(result['gaussians'], text, dense_cameras=dense_cameras)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/yejr/AIGC/Director3D-main/modules/refiners/sds_pp_refiner.py", line 242, in refine_gaussians
loss_latent_sds, loss_img_sds, loss_embedding = self.train_step(images_pred.squeeze(0), t, text_embeddings, uncond_text_embeddings, learnable_text_embeddings)
File "/home/yejr/AIGC/Director3D-main/modules/refiners/sds_pp_refiner.py", line 175, in train_step
images_pred = self.decode_latent(latents_pred).clamp(-1, 1)
File "/home/yejr/AIGC/Director3D-main/modules/refiners/sds_pp_refiner.py", line 126, in decode_latent
images = self.vae.decode(latents).sample
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
return method(self, *args, **kwargs)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 314, in decode
decoded = self._decode(z).sample
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 285, in _decode
dec = self.decoder(z)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/diffusers/models/autoencoders/vae.py", line 337, in forward
sample = up_block(sample, latent_embeds)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/diffusers/models/unets/unet_2d_blocks.py", line 2746, in forward
hidden_states = resnet(hidden_states, temb=temb)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/diffusers/models/resnet.py", line 327, in forward
hidden_states = self.norm1(hidden_states)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 273, in forward
return F.group_norm(
File "/media/data4/yejr/conda_env/director3d/lib/python3.9/site-packages/torch/nn/functional.py", line 2530, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 23.69 GiB total capacity; 22.04 GiB already allocated; 168.88 MiB free; 22.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I am using a single RTX 3090 to run this command:
python inference.py --export_all --text "a delicious hamburger on a wooden table."
Could you please tell me how to solve this problem?
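Note that the error message itself suggests a possible mitigation before splitting the run: setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF to reduce fragmentation. A minimal sketch (the 128 MiB value is an arbitrary starting point to tune, not a project recommendation):
# Fragmentation workaround suggested by the allocator error above;
# 128 is a guess, adjust as needed.
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python inference.py --export_all --text "a delicious hamburger on a wooden table."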