
GPU memory usage #3

Open
revisitq opened this issue Nov 5, 2021 · 10 comments
revisitq commented Nov 5, 2021

The GPU memory usage reported in your paper is about 10 GB, but on my machine it is about 18 GB when I train the model. Is there some setting in this repo that differs from your paper?
[screenshot of GPU memory usage during training]

revisitq (Author) commented Nov 5, 2021

The validation memory usage is about 7 GB, and SECOND is not loaded during validation.
[screenshot of GPU memory usage during validation]

xy-guo (Owner) commented Nov 7, 2021

Could you try running distributed training with only 1 GPU? The cause might be that the model is loaded onto a single GPU multiple times.

xy-guo (Owner) commented Nov 7, 2021

Make sure you run the code using the script given in the README.

revisitq (Author) commented Nov 8, 2021

> Make sure you run the code using the script given in the README.

Thanks for your reply. I tried training with only 1 GPU using the command CUDA_VISIBLE_DEVICES='1' ./scripts/dist_train.sh 1 dev configs/stereo/kitti_models/liga.3d-and-bev.yaml, and the GPU memory usage is still the same. Here is the log:
log_train.txt

xy-guo (Owner) commented Nov 10, 2021

If you train on multiple GPUs, is GPU memory usage roughly the same on every GPU? My model was trained on a TITAN X, which has only 12 GB of memory. Maybe you can print the real GPU memory consumption using PyTorch APIs; sometimes PyTorch allocates more GPU memory than needed.
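A minimal sketch of how the real consumption could be checked with PyTorch's memory APIs (`torch.cuda.memory_allocated` and `torch.cuda.memory_reserved` are standard PyTorch calls; the `bytes_to_gb` helper is just for formatting):

```python
def bytes_to_gb(n):
    """Convert a byte count to GiB for readable reporting."""
    return n / 1024 ** 3

try:
    import torch
    if torch.cuda.is_available():
        # memory_allocated: bytes actually occupied by live tensors
        alloc = torch.cuda.memory_allocated()
        # memory_reserved: bytes held by PyTorch's caching allocator;
        # this (plus CUDA context overhead) is roughly what nvidia-smi shows
        reserved = torch.cuda.memory_reserved()
        print(f"allocated: {bytes_to_gb(alloc):.2f} GiB")
        print(f"reserved:  {bytes_to_gb(reserved):.2f} GiB")
except ImportError:
    pass  # torch not installed; nothing to report
```

The gap between `memory_allocated` and the nvidia-smi figure would then be cached-but-unused memory plus CUDA context overhead.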

revisitq (Author) commented

> If you train on multiple GPUs, is GPU memory usage roughly the same on every GPU? My model was trained on a TITAN X, which has only 12 GB of memory. Maybe you can print the real GPU memory consumption using PyTorch APIs; sometimes PyTorch allocates more GPU memory than needed.

Actually the memory allocated is about 10 GB, but I don't know why the reported GPU memory usage is about 18 GB.

revisitq (Author) commented

When training on multiple GPUs, the GPU memory usage is the same on every GPU.
[screenshot of multi-GPU memory usage]

xy-guo (Owner) commented Nov 12, 2021

Maybe PyTorch pre-allocates GPU memory for future use (its caching allocator), and that cache is not freed automatically. Potential solutions include explicitly limiting GPU memory usage or calling torch.cuda.empty_cache() to free the cache.
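A sketch of both workarounds, assuming standard PyTorch APIs (`empty_cache` releases cached blocks back to the driver; `set_per_process_memory_fraction` caps the caching allocator; the `fraction_for_cap` helper is hypothetical, not part of PyTorch):

```python
def fraction_for_cap(cap_gb, total_gb):
    """Fraction of total device memory corresponding to a cap in GiB.
    (Hypothetical helper; not part of PyTorch.)"""
    return min(cap_gb / total_gb, 1.0)

try:
    import torch
    if torch.cuda.is_available():
        # Release cached, unused blocks so nvidia-smi reflects real usage.
        torch.cuda.empty_cache()
        # Hard-cap the caching allocator, e.g. at 12 GiB on this device.
        total_gb = torch.cuda.get_device_properties(0).total_memory / 1024 ** 3
        torch.cuda.set_per_process_memory_fraction(
            fraction_for_cap(12, total_gb), device=0)
except ImportError:
    pass  # torch not installed
```

Note that `empty_cache` only returns unused cached memory; it cannot shrink memory occupied by live tensors.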

revisitq (Author) commented

> empty_cache

Thanks for the help. I tried torch.cuda.empty_cache(), but it did not work. I am looking for another solution.

zcspike commented Apr 23, 2023

Hello, may I ask whether the GPU OOM problem has been solved?
