
No response when running models in benchmark/fluid using multiple GPUs #11360

Closed
sneaxiy opened this issue Jun 11, 2018 · 2 comments
@sneaxiy
Collaborator

sneaxiy commented Jun 11, 2018

When running models in benchmark/fluid using multiple GPUs, the process produces no output, and the job is eventually killed after a long time.

The full logs are as follows (the example here uses the mnist model, but the other models behave the same way when using multiple GPUs):

$ python fluid_benchmark.py --model mnist --device GPU --gpus 2
----------- Configuration Arguments -----------
batch_size: 32
cpus: 1
data_format: NCHW
data_path: 
data_set: flowers
device: GPU
gpus: 2
infer_only: False
iterations: 80
learning_rate: 0.001
memory_optimize: False
model: mnist
no_test: False
pass_num: 100
profile: False
skip_batch_num: 5
update_method: local
use_cprof: False
use_fake_data: False
use_nvprof: False
use_reader_op: False
------------------------------------------------
Unexpected end of /proc/mounts line `overlay / overlay rw,relatime,lowerdir=/home/docker/runtime/overlay2/l/E37ZPPONYOSMCAEWBUTECLE7XH:/home/docker/runtime/overlay2/l/J44XLEFYM66NOFIC5IPPYM3K4B:/home/docker/runtime/overlay2/l/TD5AIZOAV4HDVDBHIYLYA5MRUL:/home/docker/runtime/overlay2/l/UYA3MLQG6SXOENF2VLWCELNMDP:/home/docker/runtime/overlay2/l/KLJNLEIE7ROJMKKQ47RAMGYCSN:/home/docker/runtime/overlay2/l/IZWN5DWNX4XJFYXEWLIXIFIKRZ:/home/docker/runtime/overlay2/l/26FH2HFFZ3E4KCBZ3LVABHDWMJ:/home/docker/runtime/overlay2/l/2MYKEYWTMFTEVD3VQGTHHGBQFX:'
Unexpected end of /proc/mounts line `/home/docker/runtime/overlay2/l/B3HS2GRKDXV2S54B77Y6OSRQQT:/home/docker/runtime/overlay2/l/RY7PSMDPDYS3Z2E6WGZXPT3PDA:/home/docker/runtime/overlay2/l/52PISTXM4OEKVDASJATIGRYKM6:/home/docker/runtime/overlay2/l/NVN7MSVHOTD46R6UB25AEAQYTH:/home/docker/runtime/overlay2/l/OEBXDOGRX6SV7AM5C6X6O3KZFA:/home/docker/runtime/overlay2/l/4RX22CUHDFVPR5BSJBMBCCXUPA:/home/docker/runtime/overlay2/l/UMY2SDMX3YOD4QCKGP7YV6M3XY:/home/docker/runtime/overlay2/l/LPAI2GCE2P6RKBPM6EMIOVQJQP:/home/docker/runtime/overlay2/l/T2DEZFB'
Unexpected end of /proc/mounts line `EAEYYE42XHYJPDEWUY2:/home/docker/runtime/overlay2/l/QUPTGODCA3UK265SVJDLOMHEA6:/home/docker/runtime/overlay2/l/A4PCMPPJRVCTFSKBRTQFFISCWN:/home/docker/runtime/overlay2/l/4UYJNH3ECSDCBKLBBLQPSGZES7:/home/docker/runtime/overlay2/l/FBHGT3GWMQ662T7M4GVHVGX6WC:/home/docker/runtime/overlay2/l/E3774UASMYNWEP56UJBTWIOQU3:/home/docker/runtime/overlay2/l/NKKTOWHYC5Q33FMISWOG2MXL76:/home/docker/runtime/overlay2/l/UPENBO6KPQAN36JVVJFJK26F5D:/home/docker/runtime/overlay2/l/JVOKLXJMTKGL3XFQAQ72QNFCFX:/home/docker/runtim'
Unexpected end of /proc/mounts line `e/overlay2/l/GGT2RDYNJYE2O44ZK4UXAUML4D:/home/docker/runtime/overlay2/l/ILDUOQZ4IBTPDC4GSE4XM52WAJ:/home/docker/runtime/overlay2/l/PANZPZDC65B7QHH4DLJVCJCXRF:/home/docker/runtime/overlay2/l/PEA7W6TUXBKYBTBBRWRUMA5SLL:/home/docker/runtime/overlay2/l/WVM37NIKDKQSYRICDKVWF24XRC:/home/docker/runtime/overlay2/l/SXQLH7XIGNOV4B4GZDU2TEXY6Q:/home/docker/runtime/overlay2/l/3PP46YBKQS2WYGYKDQJ66CIJ3J:/home/docker/runtime/overlay2/l/6VG4GBX4DQKY43QNUESKGZNETD:/home/docker/runtime/overlay2/l/I5M2XMBTVKBZZIVQVDQ2AANHLU'

After a long time, the job is killed automatically.

However, the models work well when using the CPU or a single GPU. The tests are run inside a Docker container.
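(Not from the original report.) To narrow down where such a hang occurs, a generic watchdog based on Python's standard `faulthandler` module can dump every thread's stack if a guarded section runs too long. A minimal sketch, with illustrative function names that are not part of fluid_benchmark.py:

```python
import faulthandler
import sys

def install_watchdog(timeout, file=sys.stderr):
    """Dump every thread's Python stack to `file` if the process is
    still running `timeout` seconds from now (i.e. it appears hung)."""
    faulthandler.dump_traceback_later(timeout, file=file, exit=False)

def cancel_watchdog():
    """Call this once the guarded section has finished normally."""
    faulthandler.cancel_dump_traceback_later()
```

Wrapping the training loop between `install_watchdog(...)` and `cancel_watchdog()` would show which Python frames are blocked when the run stalls; it will not show native (e.g. CUDA/NCCL) stacks, for which a tool like `gdb` or `py-spy` would be needed.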

@typhoonzero
Contributor

This may be fixed by #11377. Please reopen this issue if it is still not resolved.

@shanyi15
Collaborator

Hello, this issue has not been updated in the past month, so we will close it today. If you still need to follow up after it is closed, feel free to reopen it, and we will get back to you within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!
