
out of memory #6

Open
DL-ljw opened this issue May 12, 2018 · 6 comments

@DL-ljw

DL-ljw commented May 12, 2018

Sorry for bothering you again. When I train on a single 1080 GPU with a batch size of 1, I get the following errors. How can I solve this?

2018-05-10 13:42:49: step247692 image_name:000624.jpg |
rpn_loc_loss:0.189756244421 | rpn_cla_loss:0.214562356472 | rpn_total_loss:0.404318600893 |
fast_rcnn_loc_loss:0.0 | fast_rcnn_cla_loss:0.00815858319402 | fast_rcnn_total_loss:0.00815858319402 |
total_loss:1.17546725273 | per_cost_time:0.65540599823s
out of memory
invalid argument
2018-05-10 13:42:53.349625: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:639] failed to record completion event; therefore, failed to create inter-stream dependency
2018-05-10 13:42:53.349637: I tensorflow/stream_executor/stream.cc:4138] stream 0x55cd063dc880 did not memcpy host-to-device; source: 0x7fa30b0da010
2018-05-10 13:42:53.349641: E tensorflow/stream_executor/stream.cc:289] Error recording event in stream: error recording CUDA event on stream 0x55cd063dc950: CUDA_ERROR_ILLEGAL_ADDRESS; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2018-05-10 13:42:53.349647: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
2018-05-10 13:42:53.349650: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1
an illegal memory access was encountered
an illegal memory access was encountered

@powermano

Same problem here; it occurs after 5000 steps.

@powermano

powermano commented May 20, 2018

What is your cudnn version?

@DL-ljw
Author

DL-ljw commented Jul 16, 2018

CUDA 8.0, cuDNN 5.0
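
(For reference, a quick way to double-check the locally installed versions. This is only a sketch: it assumes `nvcc` is on the PATH and that the cuDNN header lives at the common default location, which may differ on your machine.)

```python
# Sketch: print the installed CUDA and cuDNN versions.
import subprocess

# CUDA toolkit version, assuming nvcc is on the PATH.
print(subprocess.check_output(["nvcc", "--version"]).decode())

# cuDNN version from its header; the path below is a common default
# (deb installs may use /usr/include/cudnn.h instead).
with open("/usr/local/cuda/include/cudnn.h") as f:
    for line in f:
        if line.startswith(("#define CUDNN_MAJOR",
                            "#define CUDNN_MINOR",
                            "#define CUDNN_PATCHLEVEL")):
            print(line.strip())
```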

@liqi-lizezhong

I have met the same problem. Have you solved it, and if so, how? Thanks.

@powermano

powermano commented Jan 14, 2019 via email

@clw5180

clw5180 commented Aug 5, 2019

I found that reducing the anchors can somewhat alleviate this problem; you can drop some angles or ratios in R-DFPN_FPN_Tensorflow/libs/configs/cfgs.py, i.e. ANCHOR_ANGLES = [-90, -75, -60, -45, -30, -15] and ANCHOR_RATIOS = [1/5., 5., 1/7., 7., 1/9., 9.]. I ran into the CUDA_ERROR_ILLEGAL_ADDRESS error during training when the objects are densely located, so limiting the objects in your own dataset (removing some objects that really exist) can also alleviate the problem. It works, but not all the time.
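
For illustration, a minimal sketch of the kind of edit being suggested in libs/configs/cfgs.py. The original lists are the ones quoted above; the reduced values below are just an example, and which angles/ratios to drop depends on your dataset and GPU memory, not on anything in this repo.

```python
# libs/configs/cfgs.py -- illustrative sketch only.

# Defaults as quoted in this thread:
# ANCHOR_ANGLES = [-90, -75, -60, -45, -30, -15]
# ANCHOR_RATIOS = [1/5., 5., 1/7., 7., 1/9., 9.]

# Reduced variant: fewer angles x ratios means fewer anchors per feature-map
# location, which lowers GPU memory pressure in the rotated-anchor ops.
ANCHOR_ANGLES = [-90, -60, -30]
ANCHOR_RATIOS = [1/5., 5., 1/7., 7.]
```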


Thanks a lot! It works.
