
Training error on the invoice dataset: Out of memory error on GPU 0. Cannot allocate 14.406982MB memory on GPU 0, 10.746094GB memory has been allocated and available memory is only 15.562500MB. #10247

Closed
dizhenx opened this issue Jun 27, 2023 · 9 comments
Labels: expneeded (need extra experiment to fix issue) · good first issue (Good for newcomers) · status/close

@dizhenx commented Jun 27, 2023

I downloaded the official invoice dataset for training. Running `python tools/train.py -c ./fapiao/train_data/ser_vi_layoutxlm.yml -o Global.save_model_dir=./output/kie/` fails with the error below. I have already set both batch_size and num_workers to 1 and it still fails. The single GPU has 11 GB of memory and no other process is using it.

```
Error Message Summary:

ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 14.406982MB memory on GPU 0, 10.746094GB memory has been allocated and available memory is only 15.562500MB.

Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is export FLAGS_use_cuda_managed_memory=false.
(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:95)
```

Also, why can only one GPU be used? Is there no way to set up multi-GPU training?
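
For reference, a minimal sketch of the two suggestions from the error message, assuming the standard PaddleOCR config keys Train.loader.batch_size_per_card and Eval.loader.batch_size_per_card (the exact key names in this particular yml are an assumption):

```bash
# 1. Check whether another process is holding memory on GPU 0
nvidia-smi

# 2. Re-run training with the loader batch sizes forced to 1 via -o overrides
#    (key names assume the standard PaddleOCR config layout)
python tools/train.py -c ./fapiao/train_data/ser_vi_layoutxlm.yml \
    -o Global.save_model_dir=./output/kie/ \
       Train.loader.batch_size_per_card=1 \
       Eval.loader.batch_size_per_card=1
```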
@dizhenx changed the title to "Training error on the invoice dataset: Out of memory error on GPU 0. Cannot allocate 14.406982MB memory on GPU 0, 10.746094GB memory has been allocated and available memory is only 15.562500MB." on Jun 27, 2023
@shiyutang added the good first issue (Good for newcomers) label on Jun 29, 2023
@livingbody (Contributor) commented

  • Multi-GPU training is possible; use paddle.distributed.launch --gpus '0,1,2,3' ...
  • For reference:

```bash
# Single-machine multi-GPU training; set the GPU IDs to use via the --gpus flag
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```
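
Applied to the command from this issue, it would look roughly like the following (a sketch only; the GPU IDs '0,1' are an assumption about the machine):

```bash
# Multi-GPU launch of the SER training from this issue; GPU IDs are illustrative
python3 -m paddle.distributed.launch --gpus '0,1' tools/train.py \
    -c ./fapiao/train_data/ser_vi_layoutxlm.yml \
    -o Global.save_model_dir=./output/kie/
```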

@shiyutang added and then removed the expneeded (need extra experiment to fix issue) label on Jun 30, 2023
@shiyutang (Collaborator) commented

@livingbody A further experiment is needed to check whether GPU memory still exceeds 11 GB even with bs=1.

@shiyutang added the expneeded (need extra experiment to fix issue) label on Jun 30, 2023
@dizhenx (Author) commented Jul 3, 2023

Yes.

@livingbody (Contributor) commented

> (quoting the original issue report above)

Could you share the project with me on AI Studio so I can look into the details?

@livingbody (Contributor) commented

> Yes.

Contact me on WeChat: livingbody

@adamzhg commented Oct 16, 2023

@livingbody @dizhenx Has this problem been solved?
My GPU is a single card with only 8 GB of memory, and I hit a similar error. I followed https://aistudio.baidu.com/projectdetail/4823162 for the invoice data training.

@ericyeyeye commented Dec 21, 2023

> @livingbody @dizhenx Has this problem been solved? My GPU is a single card with only 8 GB of memory, and I hit a similar error. I followed https://aistudio.baidu.com/projectdetail/4823162 for the invoice data training.

Please provide your training yml file so we can understand your settings. In the det model, Eval batch_size_per_card can only be set to 1.
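
For example, a quick way to confirm what the loader batch sizes in the config are actually set to (the path below is the one from the original issue and is only illustrative):

```bash
# Print every batch_size_per_card entry in the config, with line numbers
grep -n "batch_size_per_card" ./fapiao/train_data/ser_vi_layoutxlm.yml
```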

@papersuper commented
Was this solved in the end?

@UserWangZz (Collaborator) commented
This issue has not been updated for a long time, so it is being closed for now. It can be reopened if needed.
