This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

RuntimeError: CUDA error: out of memory #120

Closed
zimenglan-sysu-512 opened this issue Nov 6, 2018 · 23 comments
Labels
awaiting response, question (Further information is requested)

Comments

@zimenglan-sysu-512
Contributor

❓ Questions and Help

when training on my own dataset with a ResNet-101 backbone, after 27k iterations it always encounters this problem:

File "maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 75, in do_train
    losses.backward()
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: out of memory

btw, the input size is set to be (800, 1333).

@fmassa
Contributor

fmassa commented Nov 6, 2018

It's difficult to say where the problem comes from.

If your dataset might contain a large number of boxes in the same image, then I'd say that your issue might be related to #18, where we propose a few workaround solutions.

Apart from that, without further information it's difficult to say what else could be causing the OOM.

@fmassa added the question (Further information is requested) and awaiting response labels on Nov 6, 2018
@zimenglan-sysu-512
Contributor Author

thanks @fmassa

@zimenglan-sysu-512
Contributor Author

zimenglan-sysu-512 commented Nov 13, 2018

hi @fmassa

when i reduce IMS_PER_BATCH to 8 for 8 GPUs and use ResNet-50 as the backbone to train my own dataset, it hits the problem below:

File "maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 84, in boxlist_iou
    wh = (rb - lt + TO_REMOVE).clamp(min=0)  # [N,M,2]
RuntimeError: CUDA error: out of memory

do you have any suggestions to solve this problem?
thanks!

@fmassa
Contributor

fmassa commented Nov 13, 2018

Do you have a large number of boxes per image in your dataset?
If that's the case, then your problem might be related to #18, and a possible solution is to move the IoU computation to the CPU until we add custom CUDA kernels for box IoU.

@zimenglan-sysu-512
Contributor Author

hi @fmassa
the maximum number of gt boxes in my dataset is 60. i have no idea how to deal with it.

@fmassa
Contributor

fmassa commented Nov 14, 2018

This is the maximum number of boxes in a single image?
Can you try making the box IoU computation run on the CPU, as I explained just before, and see if you still run out of memory?

@zimenglan-sysu-512
Contributor Author

hi @fmassa
yes, it's in a single image. I have tried what you suggested, but met other problems. i will report my results in cpu mode after i fix them.

@zimenglan-sysu-512
Contributor Author

zimenglan-sysu-512 commented Nov 14, 2018

hi @fmassa
i added the code below after this line:

    USE_CPU_MODE = True
    if USE_CPU_MODE and N >= 20:
        device = box1.device
        box1 = box1.cpu() # ground-truths
        box2 = box2.cpu() # predictions
        lt = torch.max(box1[:, None, :2], box2[:, :2]).cpu()  # [N,M,2]
        rb = torch.min(box1[:, None, 2:], box2[:, 2:]).cpu()  # [N,M,2]

        TO_REMOVE = 1

        wh = (rb - lt + TO_REMOVE).clamp(min=0).cpu()  # [N,M,2]
        inter = wh[:, :, 0] * wh[:, :, 1]  # [N,M]

        iou = inter.cpu() / (area1[:, None].cpu() + area2.cpu() - inter.cpu())
        iou = iou.to(device)
        return iou

if the number of gt boxes is larger than or equal to 20, it uses the cpu to compute IoU, otherwise it stays in gpu mode. besides, i use multi-scale training (= (700, 800, 900)), set MAX_SIZE_TRAIN to 1440 and use a single image per gpu. finally it works, but the speed slows down a lot (about 16% more time than gpu mode, and the gpu memory of one or two of the gpus reaches 9489MiB).
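
For reference, here is a rough sketch (not my exact config) of those settings via maskrcnn-benchmark's yacs config object; setting INPUT.MIN_SIZE_TRAIN to a tuple assumes a build that supports multi-scale training:

from maskrcnn_benchmark.config import cfg

cfg.INPUT.MIN_SIZE_TRAIN = (700, 800, 900)  # shorter-side sizes for multi-scale training
cfg.INPUT.MAX_SIZE_TRAIN = 1440             # cap on the longer image side
cfg.SOLVER.IMS_PER_BATCH = 8                # 8 GPUs x 1 image per GPU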

thanks for your help @fmassa

@fmassa
Contributor

fmassa commented Nov 14, 2018

Here is a simplified implementation:

device = box1.device
if USE_CPU_MODE and N >= 20:
    box1 = box1.cpu()
    box2 = box2.cpu()
...
# as before, no need to cast
# to .cpu() all the time
return iou.to(device)

So, just to see if I understand it properly, now your OOM error is gone, is that right?

This issue will be better fixed once we add a box iou implementation which is entirely in cuda. This will save a lot of memory I think.

@zimenglan-sysu-512
Contributor Author

hi @fmassa
it is still OOM as below:

Traceback (most recent call last):
  File "tools/train_net.py", line 170, in <module>
    main()
  File "tools/train_net.py", line 163, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 66, in do_train
    loss_dict = model(images, targets)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/deprecated/distributed.py", line 222, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 100, in forward
    return self._forward_train(anchors, objectness, rpn_box_regression, targets)
  File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 119, in _forward_train
    anchors, objectness, rpn_box_regression, targets
  File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/loss.py", line 91, in __call__
    labels, regression_targets = self.prepare_targets(anchors, targets)
  File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/loss.py", line 55, in prepare_targets
    anchors_per_image, targets_per_image
  File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/loss.py", line 38, in match_targets_to_anchors
    matched_idxs = self.proposal_matcher(match_quality_matrix)
  File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/matcher.py", line 85, in __call__
    self.set_low_quality_matches_(matches, all_matches, match_quality_matrix)
  File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/matcher.py", line 101, in set_low_quality_matches_
    match_quality_matrix == highest_quality_foreach_gt[:, None]
RuntimeError: CUDA error: out of memory

@zimenglan-sysu-512
Contributor Author

hi @fmassa
if i reduce the input size, it solves the OOM. but another problem is that if i use a GTX Titan instead of a 1080 Ti, the training procedure hangs and gets stuck. it is weird.

@fmassa
Contributor

fmassa commented Nov 15, 2018

About the OOM, it might be due to many reasons, and I might need more information on the particularities of your dataset to be able to help you more.

About the hang, are you still using the same machine or different machines?
If you are using different machines, maybe your nvidia drivers are not up-to-date and you are facing deadlocks similarly to #58 ?

@zimenglan-sysu-512
Contributor Author

hi @fmassa
my own dataset has 17 categories, and the maximum number of gt boxes in one image is 26. the images are not too large; the max size of these images is less than 1200. btw, my driver version is as below:

NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.130  Wed Mar 21 03:37:26 PDT 2018

thanks

@zimenglan-sysu-512
Contributor Author

zimenglan-sysu-512 commented Nov 16, 2018

hi @fmassa
i updated the driver from 384 to 390 and the training procedure still hangs. i use cuda 8.0.61 and GTX Titan (12G) cards. by the way, i use the cpu to compute the IoU, and the memory usage looks a little strange, as below:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     13630      C   /usr/bin/python3.6                          5411MiB |
|    1     13631      C   /usr/bin/python3.6                          5325MiB |
|    2     13632      C   /usr/bin/python3.6                          5009MiB |
|    3     13633      C   /usr/bin/python3.6                          4339MiB |
|    4     13634      C   /usr/bin/python3.6                          5097MiB |
|    5     13635      C   /usr/bin/python3.6                          4873MiB |
|    6     13637      C   /usr/bin/python3.6                         11099MiB |
|    7     13638      C   /usr/bin/python3.6                          4231MiB |
+-----------------------------------------------------------------------------+

OOM as below:
Tried to allocate 7.09 GiB (GPU 6; 10.92 GiB total capacity; 3.73 GiB already allocated; 6.09 GiB free; 50.97 MiB cached)

@fmassa
Contributor

fmassa commented Nov 16, 2018

I think there might be some incompatibilities with your driver and your CUDA version.

So, by checking your previous driver version (384.130), you can see from here that it was before the bugfix, and thus the hang.

Can you update to CUDA 9.2 and install driver >=396.26 ? This will definitely fix your problems.

@zimenglan-sysu-512
Contributor Author

thanks @fmassa.
after updating ubuntu 14.04 to 16.04, i will try what you suggest and then report my results here.
thanks again.

@zimenglan-sysu-512
Contributor Author

zimenglan-sysu-512 commented Nov 19, 2018

hi @fmassa,

The OOM problem has been solved. it was because i duplicated the ground-truths several times, pushing the number of gt bboxes to 2k (very sorry for that). btw, if using the cpu to compute the IoUs between predictions and gt, you not only need to modify these lines, but also need to pay attention to a few other lines, so that it can handle a large amount of gt bboxes, at the cost of slowing the training speed (maybe training time is doubled).

about the hanging: since i upgraded ubuntu 14.04 to 16.04 and installed cuda 9.0 (or cuda 9.2) with different nvidia-drivers (390, 396, 410), it still sometimes happens. as @chengyangfu said, when using nvidia-driver 410, the frequency is much lower.

thanks!

@fmassa
Contributor

fmassa commented Nov 19, 2018

Cool, great that it's working now.

About the modifications, I'd say that you could move the data back to the GPU at the end of boxlist_iou if you have enough memory to hold it.

Let us know if you have further questions.

@yuchenrao-bg

I also have the same problem. I noticed that this error shows up when N > 200 (maybe even smaller than 200). I didn't change the calculation to cpu; I just call torch.cuda.empty_cache() for each batch, which seems okay for my situation.

@hetolin

hetolin commented Jul 13, 2020

(quoting @zimenglan-sysu-512's earlier comment and traceback, which ends in RuntimeError: CUDA error: out of memory)

hi @zimenglan-sysu-512 , I tried making the box iou computation run on the CPU as you did:

    USE_CPU_MODE = True
    if USE_CPU_MODE and N >= 20:
        device = box1.device
        ...
        iou = iou.to(device)
        return iou

but I meet the same error as above. how did you solve that? Is it necessary to modify something in maskrcnn_benchmark/modeling/matcher.py?

@hetolin

hetolin commented Jul 13, 2020

I also have the same problem. I noticed that this error shows up when N > 200 (maybe even smaller than 200). I didn't change the calculation to cpu; I just call torch.cuda.empty_cache() for each batch, which seems okay for my situation.

hi @yuchenrao-bg, could you please tell me where you added torch.cuda.empty_cache()? in which file? I met the same problem.

@yuchenrao-bg

Sorry for the late reply. I don't remember it clearly, but I think you can add it in the training code.
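
For anyone looking for a concrete starting point, here is a minimal sketch of calling torch.cuda.empty_cache() once per batch; the loop structure and names below are illustrative only, not the actual code in maskrcnn_benchmark/engine/trainer.py:

import torch

def do_train(model, optimizer, data_loader, device):
    # Illustrative training loop; the real one lives in
    # maskrcnn_benchmark/engine/trainer.py.
    model.train()
    for images, targets, _ in data_loader:
        images = images.to(device)
        targets = [target.to(device) for target in targets]

        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())

        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

        # Return cached, unused blocks to the driver after every batch, so an
        # occasional image with many boxes is less likely to push the process
        # over the GPU's capacity.
        torch.cuda.empty_cache()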
