
out of memory #6

Open
DL-ljw opened this issue May 12, 2018 · 6 comments

@DL-ljw

DL-ljw commented May 12, 2018

Sorry for bothering you again. When I train on a single 1080 GPU with a batch size of 1, I get the following errors. How can I solve this?

2018-05-10 13:42:49: step247692 image_name:000624.jpg |
rpn_loc_loss:0.189756244421 | rpn_cla_loss:0.214562356472 | rpn_total_loss:0.404318600893 |
fast_rcnn_loc_loss:0.0 | fast_rcnn_cla_loss:0.00815858319402 | fast_rcnn_total_loss:0.00815858319402 |
total_loss:1.17546725273 | per_cost_time:0.65540599823s
out of memory
invalid argument
2018-05-10 13:42:53.349625: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:639] failed to record completion event; therefore, failed to create inter-stream dependency
2018-05-10 13:42:53.349637: I tensorflow/stream_executor/stream.cc:4138] stream 0x55cd063dc880 did not memcpy host-to-device; source: 0x7fa30b0da010
2018-05-10 13:42:53.349641: E tensorflow/stream_executor/stream.cc:289] Error recording event in stream: error recording CUDA event on stream 0x55cd063dc950: CUDA_ERROR_ILLEGAL_ADDRESS; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2018-05-10 13:42:53.349647: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
2018-05-10 13:42:53.349650: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1
an illegal memory access was encountered
an illegal memory access was encountered

@powermano

Same problem here; it occurs after 5000 steps.

@powermano

powermano commented May 20, 2018

What is your cudnn version?

@DL-ljw
Author

DL-ljw commented Jul 16, 2018

CUDA 8.0, cuDNN 5.0
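
(For reference, a quick way to double-check the locally installed versions. This is only a sketch: it assumes `nvcc` is on the PATH and that the cuDNN header lives at the common default location, which may differ on your machine.)

```python
# Sketch: print the installed CUDA and cuDNN versions.
import subprocess

# CUDA toolkit version, assuming nvcc is on the PATH.
print(subprocess.check_output(["nvcc", "--version"]).decode())

# cuDNN version from its header; the path below is a common default
# (deb installs may use /usr/include/cudnn.h instead).
with open("/usr/local/cuda/include/cudnn.h") as f:
    for line in f:
        if line.startswith(("#define CUDNN_MAJOR",
                            "#define CUDNN_MINOR",
                            "#define CUDNN_PATCHLEVEL")):
            print(line.strip())
```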

@liqi-lizezhong

I have met the same problem. Have you solved it, and if so, how? Thanks.

@powermano

powermano commented Jan 14, 2019 via email

@clw5180

clw5180 commented Aug 5, 2019

I found that reducing the anchors can somewhat alleviate this problem; you can drop some angles or ratios in R-DFPN_FPN_Tensorflow/libs/configs/cfgs.py, i.e. ANCHOR_ANGLES = [-90, -75, -60, -45, -30, -15] and ANCHOR_RATIOS = [1/5., 5., 1/7., 7., 1/9., 9.]. I ran into the CUDA_ERROR_ILLEGAL_ADDRESS error during training when the objects are densely located, so limiting the objects in your own dataset (removing some objects that really exist) can also alleviate the problem. It works, but not all the time.
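
For illustration, a minimal sketch of the kind of edit being suggested in libs/configs/cfgs.py. The original lists are the ones quoted above; the reduced values below are just an example, and which angles/ratios to drop depends on your dataset and GPU memory, not on anything in this repo.

```python
# libs/configs/cfgs.py -- illustrative sketch only.

# Defaults as quoted in this thread:
# ANCHOR_ANGLES = [-90, -75, -60, -45, -30, -15]
# ANCHOR_RATIOS = [1/5., 5., 1/7., 7., 1/9., 9.]

# Reduced variant: fewer angles x ratios means fewer anchors per feature-map
# location, which lowers GPU memory pressure in the rotated-anchor ops.
ANCHOR_ANGLES = [-90, -60, -30]
ANCHOR_RATIOS = [1/5., 5., 1/7., 7.]
```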


Thanks a lot! It works.
