I got this error when training CBNetV2, but other models train normally #57

Open
wzr0108 opened this issue Jun 21, 2022 · 11 comments

Comments

@wzr0108

wzr0108 commented Jun 21, 2022

Traceback (most recent call last):
  File "./tools/train.py", line 234, in <module>
    main()
  File "./tools/train.py", line 221, in main
    meta=meta)
  File "/disk/sde/wzr/mmm/mmdet/apis/train.py", line 208, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 59, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/disk/sde/wzr/mmm/mmdet/models/detectors/base.py", line 248, in train_step
    losses = self(**data)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 140, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/disk/sde/wzr/mmm/mmdet/models/detectors/base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/disk/sde/wzr/mmm/mmdet/models/detectors/two_stage.py", line 142, in forward_train
    **kwargs)
  File "/disk/sde/wzr/mmm/mmdet/models/dense_heads/base_dense_head.py", line 330, in forward_train
    outs = self(x)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/disk/sde/wzr/mmm/mmdet/models/dense_heads/anchor_head.py", line 169, in forward
    return multi_apply(self.forward_single, feats)
  File "/disk/sde/wzr/mmm/mmdet/core/utils/misc.py", line 30, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/disk/sde/wzr/mmm/mmdet/models/dense_heads/rpn_head.py", line 64, in forward_single
    x = self.rpn_conv(x)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/disk/sdb/wzr/.conda/envs/wzr_env2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
    self.padding, self.dilation, self.groups)
TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not tuple
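
The last frames show `conv2d` receiving a tuple, so each element of the `feats` argument passed to the RPN head was itself a tuple rather than a single feature map. One plausible cause, which is an assumption and is not confirmed anywhere in this thread, is that the CBNetV2 neck returns one tuple of pyramid levels per backbone during training, while an unmodified mmdet `RPNHead` expects a flat tuple of levels. The sketch below reproduces that mechanism in plain Python. `FakeTensor` and `fake_conv2d` are hypothetical stand-ins for `torch.Tensor` and the convolution call, and `multi_apply` is a rewrite of the helper named in the traceback.

```python
from functools import partial


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; it only carries a name."""

    def __init__(self, name):
        self.name = name


def fake_conv2d(x):
    """Stand-in for the convolution: rejects anything that is not a FakeTensor."""
    if not isinstance(x, FakeTensor):
        raise TypeError(
            "conv2d(): argument 'input' (position 1) must be Tensor, not "
            + type(x).__name__)
    return FakeTensor("conv(" + x.name + ")")


def multi_apply(func, *args, **kwargs):
    """Apply func to every feature level, then transpose the per-level results."""
    pfunc = partial(func, **kwargs) if kwargs else func
    map_results = map(pfunc, *args)
    return tuple(map(list, zip(*map_results)))


def forward_single(x):
    """Per-level head: one conv, then two outputs (class score, bbox delta)."""
    feat = fake_conv2d(x)
    return feat, feat


# Flat structure: one tensor per pyramid level. This works.
flat_feats = tuple(FakeTensor("p%d" % i) for i in range(2, 7))
cls_scores, bbox_preds = multi_apply(forward_single, flat_feats)

# Nested structure (assumed): one tuple of levels per backbone. This fails,
# because forward_single is handed a whole tuple instead of one tensor.
nested_feats = [flat_feats, flat_feats]
try:
    multi_apply(forward_single, nested_feats)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If this is what happens, the detector and head code must agree with the neck on the feature structure, which would be consistent with a version mismatch between two copies of `mmdet/models`.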

@wzr0108
Author

wzr0108 commented Jun 21, 2022

The config is as follows:

model = dict(
    type='CascadeRCNN',
    init_cfg=dict(
        type='Pretrained',
        checkpoint="checkpoints/htc_cbv2_swin_base22k_patch4_window7_mstrain_400-1400_adamw_20e_coco_swa.pth"
    ),
    backbone=dict(
        type='CBSwinTransformer',
        embed_dim=128,
        depths=[2, 2, 18, 2],
        num_heads=[4, 8, 16, 32],
        window_size=7,
        mlp_ratio=4,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.3,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
    ),
    neck=dict(
        type='CBFPN',
        in_channels=[128, 256, 512, 1024],
        out_channels=256,
        num_outs=5,
    ),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],  # 8
            ratios=[0.25, 0.5, 1.0, 2.0, 4.0],  # increased to 7
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)
    ),
    roi_head=dict(
        type='CascadeRoIHead',
        num_stages=3,  # 3->4
        stage_loss_weights=[1, 0.5, 0.25],
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            gc_context=True,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=8,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),

                loss_cls=dict(type='EQLv2'),
                loss_bbox=dict(type='CIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=8,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.05, 0.05, 0.1, 0.1]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),

                loss_cls=dict(type='EQLv2'),
                loss_bbox=dict(type='CIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=8,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.033, 0.033, 0.067, 0.067]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),

                loss_cls=dict(type='EQLv2'),
                loss_bbox=dict(type='CIoULoss', loss_weight=10.0))
        ]),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,  # 0.7->0.5
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_across_levels=False,
            nms_pre=2000,
            nms_post=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=[
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.6,  # 0.5->0.3
                    neg_iou_thr=0.6,
                    min_pos_iou=0.6,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    # type='OHEMSampler',
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,  # 0.6->0.4
                    neg_iou_thr=0.7,
                    min_pos_iou=0.7,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.8,  # 0.7->0.5
                    neg_iou_thr=0.8,
                    min_pos_iou=0.8,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False)
        ]),
    test_cfg=dict(
        rpn=dict(
            nms_across_levels=False,
            nms_pre=5000,
            nms_post=5000,
            max_per_img=5000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.1,  # 0.0001
            nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.0001),
            # nms=dict(type='nms', iou_threshold=0.99),
            max_per_img=300)))
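
As a sanity check on the config (not a diagnosis of the error), the sketch below verifies two things using values copied from it: that the neck's `in_channels` match the backbone stage widths, assuming the usual Swin layout where the width doubles at each stage, and that every per-stage list has `num_stages` entries, which a cascade head needs as far as I understand. The helper names `swin_stage_channels` and `check_config` are hypothetical.

```python
# Values copied from the config above.
EMBED_DIM = 128
NECK_IN_CHANNELS = [128, 256, 512, 1024]
NUM_STAGES = 3
PER_STAGE_LENGTHS = {
    "stage_loss_weights": 3,  # [1, 0.5, 0.25]
    "bbox_head": 3,           # three ConvFCBBoxHead dicts
    "train_cfg.rcnn": 3,      # three assigner/sampler dicts
}


def swin_stage_channels(embed_dim, num_stages=4):
    """Channel width of each backbone stage, doubling at every stage."""
    return [embed_dim * 2 ** i for i in range(num_stages)]


def check_config():
    """Return a list of readable problems; an empty list means consistent."""
    problems = []
    if swin_stage_channels(EMBED_DIM) != NECK_IN_CHANNELS:
        problems.append("neck in_channels do not match backbone stage widths")
    for name, length in PER_STAGE_LENGTHS.items():
        if length != NUM_STAGES:
            problems.append("%s has %d entries, expected %d"
                            % (name, length, NUM_STAGES))
    return problems


problems = check_config()
print(problems or "channels and per-stage lengths are consistent")
```

Both checks pass for this config, so the shapes declared in it do not by themselves explain the `TypeError`.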

@wszhengjx

Hello, have you solved this problem? I ran into it too.

@wzr0108
Author

wzr0108 commented Jul 6, 2022

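> Hello, have you solved this problem? I ran into it too.

Just replace mmdet's models folder. The files in this repo are probably slightly different from the mmdet I had installed; replacing mmdet/models in the environment with the repo's version fixes it.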
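
Since the fix depends on which copy of mmdet Python actually imports, it can help to check that first. In the traceback above, mmdet is loaded from /disk/sde/wzr/mmm/mmdet while mmcv and torch come from the conda environment. A minimal sketch using only the standard library follows; `package_location` is a hypothetical helper name, and the check does not import the package.

```python
import importlib.util


def package_location(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the package is not importable in this environment
    return spec.origin


location = package_location("mmdet")
print("mmdet would be imported from:", location)
```

If the printed path is not the copy whose `models` folder was replaced, the change has no effect on training.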

@wszhengjx

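> > Hello, have you solved this problem? I ran into it too.
>
> Just replace mmdet's models folder. The files in this repo are probably slightly different from the mmdet I had installed; replacing mmdet/models in the environment with the repo's version fixes it.

Thank you so much! This had been bothering me for a long time, and it's finally solved!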

@lulu867

lulu867 commented Nov 12, 2022

May I ask which versions of PyTorch and CUDA you are using?

@wzr0108
Author

wzr0108 commented Nov 12, 2022

torch 1.9.1 cuda 11.1

@lulu867

lulu867 commented Nov 14, 2022

Could I also ask which version of mmcv-full you are using?

@wzr0108
Author

wzr0108 commented Nov 14, 2022

1.6.1
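
To compare an environment against the versions reported here, the installed distributions can be queried directly. This is a minimal sketch, assuming the pip distribution names `torch`, `mmcv-full`, and `mmdet`. It uses `importlib.metadata`, which needs Python 3.8 or later; the traceback above shows Python 3.7, where the `importlib_metadata` backport would be needed instead. `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions(["torch", "mmcv-full", "mmdet"])
for name, version in versions.items():
    print(name, version or "not installed")
```

The CUDA version that PyTorch was built against is not a separate distribution; it is exposed as `torch.version.cuda`.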

@lulu867

lulu867 commented Nov 14, 2022

Thanks!! I just ran it with the same environment as yours and got a NaN loss. Have you ever run into that?

@wzr0108
Author

wzr0108 commented Nov 14, 2022

Try adjusting the learning rate, or debug to see where the NaN appears.
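
One way to see where the NaN appears is to check each entry of the per-iteration loss dictionary and report the first ones that are not finite. The sketch below works on plain floats; the key names and values are illustrative only and are not taken from this thread, and `non_finite_losses` is a hypothetical helper name.

```python
import math


def non_finite_losses(losses):
    """Return the names of loss entries that contain NaN or infinity."""
    bad = []
    for name, value in losses.items():
        # A loss entry may be a single number or a list with one value per level.
        values = value if isinstance(value, (list, tuple)) else [value]
        if any(not math.isfinite(float(v)) for v in values):
            bad.append(name)
    return bad


# Illustrative values only; these are not numbers from the thread.
example_losses = {
    "loss_rpn_cls": 0.31,
    "loss_rpn_bbox": [0.02, 0.04],
    "s0.loss_cls": float("nan"),
    "s0.loss_bbox": 1.7,
}
bad = non_finite_losses(example_losses)
print(bad)
```

Knowing which loss term goes non-finite first narrows the search to one head or one loss function before the learning rate is changed.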

@lulu867

lulu867 commented Nov 14, 2022

OK, got it. Thanks!
