
GPU memory usage #3

Open
revisitq opened this issue Nov 5, 2021 · 10 comments
revisitq commented Nov 5, 2021

The GPU memory usage reported in your paper is about 10 GB, but on my machine it is about 18 GB when I train the model. Is there some setting in this repo that differs from your paper?
[screenshot of GPU memory usage during training]

revisitq (Author) commented Nov 5, 2021

The validation memory usage is about 7 GB, and SECOND is not loaded during validation.
[screenshot of GPU memory usage during validation]

xy-guo (Owner) commented Nov 7, 2021

Could you try running distributed training with only 1 GPU? The cause might be that the model is loaded onto a single GPU multiple times.

xy-guo (Owner) commented Nov 7, 2021

Make sure you run the code using the script given in the README.

revisitq (Author) commented Nov 8, 2021

> Make sure you run the code using the script given in the README.

Thanks for your reply. I tried training with only 1 GPU using the command CUDA_VISIBLE_DEVICES='1' ./scripts/dist_train.sh 1 dev configs/stereo/kitti_models/liga.3d-and-bev.yaml, and the GPU memory usage is still the same. Here is the log:
log_train.txt

xy-guo (Owner) commented Nov 10, 2021

If you train on multiple GPUs, is GPU memory usage roughly the same on every GPU? My model was trained on a TITAN X, which has only 12 GB of memory. Maybe you can print the real GPU memory consumption using PyTorch APIs; sometimes PyTorch allocates more GPU memory than needed.
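A minimal sketch of how the real consumption could be checked with PyTorch's memory APIs (`torch.cuda.memory_allocated` and `torch.cuda.memory_reserved` are standard PyTorch calls; the `bytes_to_gb` helper is just for formatting):

```python
def bytes_to_gb(n):
    """Convert a byte count to GiB for readable reporting."""
    return n / 1024 ** 3

try:
    import torch
    if torch.cuda.is_available():
        # memory_allocated: bytes actually occupied by live tensors
        alloc = torch.cuda.memory_allocated()
        # memory_reserved: bytes held by PyTorch's caching allocator;
        # this (plus CUDA context overhead) is roughly what nvidia-smi shows
        reserved = torch.cuda.memory_reserved()
        print(f"allocated: {bytes_to_gb(alloc):.2f} GiB")
        print(f"reserved:  {bytes_to_gb(reserved):.2f} GiB")
except ImportError:
    pass  # torch not installed; nothing to report
```

The gap between `memory_allocated` and the nvidia-smi figure would then be cached-but-unused memory plus CUDA context overhead.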

revisitq (Author) commented

> If you train on multiple GPUs, is GPU memory usage roughly the same on every GPU? My model was trained on a TITAN X, which has only 12 GB of memory. Maybe you can print the real GPU memory consumption using PyTorch APIs; sometimes PyTorch allocates more GPU memory than needed.

Actually the memory allocated is about 10 GB, but I don't know why the reported GPU memory usage is about 18 GB.

revisitq (Author) commented

When training on multiple GPUs, the GPU memory usage is the same on every GPU.
[screenshot of multi-GPU memory usage]

xy-guo (Owner) commented Nov 12, 2021

Maybe PyTorch pre-allocates GPU memory for future use (its caching allocator), and that cache is not freed automatically. Potential solutions include explicitly limiting GPU memory usage or calling torch.cuda.empty_cache() to free the cache.
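A sketch of both workarounds, assuming standard PyTorch APIs (`empty_cache` releases cached blocks back to the driver; `set_per_process_memory_fraction` caps the caching allocator; the `fraction_for_cap` helper is hypothetical, not part of PyTorch):

```python
def fraction_for_cap(cap_gb, total_gb):
    """Fraction of total device memory corresponding to a cap in GiB.
    (Hypothetical helper; not part of PyTorch.)"""
    return min(cap_gb / total_gb, 1.0)

try:
    import torch
    if torch.cuda.is_available():
        # Release cached, unused blocks so nvidia-smi reflects real usage.
        torch.cuda.empty_cache()
        # Hard-cap the caching allocator, e.g. at 12 GiB on this device.
        total_gb = torch.cuda.get_device_properties(0).total_memory / 1024 ** 3
        torch.cuda.set_per_process_memory_fraction(
            fraction_for_cap(12, total_gb), device=0)
except ImportError:
    pass  # torch not installed
```

Note that `empty_cache` only returns unused cached memory; it cannot shrink memory occupied by live tensors.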

revisitq (Author) commented

> empty_cache

Thanks for the help. I tried torch.cuda.empty_cache(), but it did not work. I am looking for another solution.

zcspike commented Apr 23, 2023

Hello, may I ask whether the GPU OOM problem has been solved?
