I got this error when training CBNetV2, but training other models works normally #57

wzr0108 · 2022-06-21T15:11:42Z

Traceback (most recent call last):
  File "./tools/train.py", line 234, in <module>
    main()
  File "./tools/train.py", line 221, in main
    meta=meta)
  File "/disk/sde/wzr/mmm/mmdet/apis/train.py", line 208, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 59, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/disk/sde/wzr/mmm/mmdet/models/detectors/base.py", line 248, in train_step
    losses = self(**data)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 140, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/disk/sde/wzr/mmm/mmdet/models/detectors/base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/disk/sde/wzr/mmm/mmdet/models/detectors/two_stage.py", line 142, in forward_train
    **kwargs)
  File "/disk/sde/wzr/mmm/mmdet/models/dense_heads/base_dense_head.py", line 330, in forward_train
    outs = self(x)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/disk/sde/wzr/mmm/mmdet/models/dense_heads/anchor_head.py", line 169, in forward
    return multi_apply(self.forward_single, feats)
  File "/disk/sde/wzr/mmm/mmdet/core/utils/misc.py", line 30, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/disk/sde/wzr/mmm/mmdet/models/dense_heads/rpn_head.py", line 64, in forward_single
    x = self.rpn_conv(x)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
    self.padding, self.dilation, self.groups)
TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not tuple
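
One possible reading of this traceback, offered as an assumption rather than something the thread confirms: during training the CBNetV2 neck seems to return one feature pyramid per backbone, so a detector from a stock mmdet passes a tuple of tuples to the RPN head, and `multi_apply` then hands `forward_single` a tuple instead of a Tensor. The sketch below reproduces only that shape mismatch, without PyTorch. `FakeTensor`, the level names, and the two-backbone layout are hypothetical stand-ins; `multi_apply` is re-implemented here to mirror the mmdet helper named in the traceback.

```python
from functools import partial


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor so the sketch runs without PyTorch."""

    def __init__(self, name):
        self.name = name


def multi_apply(func, *args, **kwargs):
    # Mirrors the helper in mmdet/core/utils/misc.py: apply func to each level.
    pfunc = partial(func, **kwargs) if kwargs else func
    map_results = map(pfunc, *args)
    return tuple(map(list, zip(*map_results)))


def forward_single(x):
    # Stand-in for RPNHead.forward_single; imitates conv2d's input type check.
    if not isinstance(x, FakeTensor):
        raise TypeError(
            f"argument 'input' must be Tensor, not {type(x).__name__}")
    return x.name + "_cls", x.name + "_reg"


# What a head normally receives: one tensor per pyramid level.
single_output = tuple(FakeTensor(f"p{i}") for i in range(2, 7))

# Assumed training-time output of a composite neck: one pyramid per backbone.
nested_output = (single_output, single_output)

cls_scores, bbox_preds = multi_apply(forward_single, single_output)
print(cls_scores)

try:
    multi_apply(forward_single, nested_output)
except TypeError as err:
    print("nested input fails:", err)
```

If this reading is right, the detector code has to unpack the per-backbone outputs before calling the RPN head, which would explain why the error appears only with this model.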

wzr0108 · 2022-06-21T15:14:09Z

The config is as follows:

model = dict(
    type='CascadeRCNN',
    init_cfg=dict(
        type='Pretrained',
        checkpoint="checkpoints/htc_cbv2_swin_base22k_patch4_window7_mstrain_400-1400_adamw_20e_coco_swa.pth"
    ),
    backbone=dict(
        type='CBSwinTransformer',
        embed_dim=128,
        depths=[2, 2, 18, 2],
        num_heads=[4, 8, 16, 32],
        window_size=7,
        mlp_ratio=4,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.3,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
    ),
    neck=dict(
        type='CBFPN',
        in_channels=[128, 256, 512, 1024],
        out_channels=256,
        num_outs=5,
    ),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],  # 8
            ratios=[0.25, 0.5, 1.0, 2.0, 4.0],  # increase to 7
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)
    ),
    roi_head=dict(
        type='CascadeRoIHead',
        num_stages=3,  # 3->4
        stage_loss_weights=[1, 0.5, 0.25],
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            gc_context=True,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=8,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),

                loss_cls=dict(type='EQLv2'),
                loss_bbox=dict(type='CIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=8,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.05, 0.05, 0.1, 0.1]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),

                loss_cls=dict(type='EQLv2'),
                loss_bbox=dict(type='CIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=8,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.033, 0.033, 0.067, 0.067]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),

                loss_cls=dict(type='EQLv2'),
                loss_bbox=dict(type='CIoULoss', loss_weight=10.0))
        ]),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,  # 0.7->0.5
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_across_levels=False,
            nms_pre=2000,
            nms_post=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=[
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.6,  # 0.5->0.3
                    neg_iou_thr=0.6,
                    min_pos_iou=0.6,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    # type='OHEMSampler',
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,  # 0.6->0.4
                    neg_iou_thr=0.7,
                    min_pos_iou=0.7,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.8,  # 0.7->0.5
                    neg_iou_thr=0.8,
                    min_pos_iou=0.8,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False)
        ]),
    test_cfg=dict(
        rpn=dict(
            nms_across_levels=False,
            nms_pre=5000,
            nms_post=5000,
            max_per_img=5000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.1,  # 0.0001
            nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.0001),
            # nms=dict(type='nms', iou_threshold=0.99),
            max_per_img=300)))

wszhengjx · 2022-07-06T16:04:24Z

Hello, have you solved this problem? I'm running into it too.

wzr0108 · 2022-07-06T16:08:48Z

Just replace mmdet's models folder directly. The files in this repo are probably slightly different from the mmdet I had installed; replacing mmdet/models in the environment fixed it.
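
Before overwriting any files, it may help to confirm which copy of mmdet Python actually imports, since the traceback shows mmdet loaded from a project directory rather than from site-packages. The following is a minimal sketch using only the standard library; `package_location` is a hypothetical helper name, not part of mmdet.

```python
import importlib.util


def package_location(name):
    """Return the directory a top-level package would be imported from.

    Returns None if the package cannot be found or is not a package.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


# Prints None if mmdet is not importable from the current environment.
print("mmdet resolves to:", package_location("mmdet"))
```

Run it from the same working directory and environment as the training command, because the current directory can shadow an installed package.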

wszhengjx · 2022-07-06T16:31:06Z

Thank you so much! This had been bothering me for a long time and it's finally solved!

lulu867 · 2022-11-12T16:27:35Z

May I ask which versions of PyTorch and CUDA you are using?

wzr0108 · 2022-11-12T16:30:51Z

torch 1.9.1 cuda 11.1

lulu867 · 2022-11-14T07:40:00Z

Could I also ask which version of mmcv-full you are using?

wzr0108 · 2022-11-14T07:43:09Z

1.6.1

lulu867 · 2022-11-14T08:16:33Z

Thanks!! I just ran it in the same environment as yours and the loss became NaN. Have you run into this?

wzr0108 · 2022-11-14T08:18:23Z

Adjust the learning rate, or debug to see where the NaN appears.
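
For the debugging route, one simple starting point is to check which loss term goes non-finite first. This is a minimal sketch, assuming the losses are available as a dict mapping names to scalars or lists of scalars (anything `float()` accepts); `find_nonfinite` and the example loss names are hypothetical.

```python
import math


def find_nonfinite(losses):
    """Return the names of loss terms that contain NaN or infinity."""
    bad = []
    for name, value in losses.items():
        # A loss term may be a single scalar or a list with one entry per level.
        items = value if isinstance(value, (list, tuple)) else [value]
        if any(not math.isfinite(float(v)) for v in items):
            bad.append(name)
    return bad


example = {
    "loss_cls": 0.52,
    "loss_bbox": float("nan"),
    "loss_rpn_cls": [0.1, float("inf")],
}
print(find_nonfinite(example))
```

PyTorch also provides `torch.autograd.set_detect_anomaly(True)`, which can point to the backward operation that produced a NaN, at the cost of slower training.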

lulu867 · 2022-11-14T08:19:02Z

OK, got it, thanks!
