
[WIP]: Add DINO on MMDetection 2.x #8362

Closed
wants to merge 33 commits into from

Conversation

Li-Qingyun
Copy link
Contributor

@Li-Qingyun Li-Qingyun commented Jul 13, 2022

Motivation

Implement the results in the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection" with mmdetection. The original code has been released.

Remark

This is my first time submitting a PR that reproduces an algorithm, and I still lack experience. I hope that developers who follow this PR can point out problems in my implementation and share some results. I will also follow up with some of my own experimental results and the schedule of the work.

@CLAassistant
Copy link

CLAassistant commented Jul 13, 2022

CLA assistant check
All committers have signed the CLA.

@czczup
Copy link
Contributor

czczup commented Jul 25, 2022

Hi, thanks very much for your code.
May I ask if you have met this problem?

[image]

@Li-Qingyun
Copy link
Contributor Author

Hi, thanks very much for your code. May I ask if you have met this problem?

[image]

When did this happen? At the beginning, or after a few epochs of training?

@Li-Qingyun
Copy link
Contributor Author

@czczup I haven't fully trained yet, but I have tested 1000 iters of training (2-GPU dist_train). I didn't encounter it. Could you please provide more information? Thanks for your attention.

@czczup
Copy link
Contributor

czczup commented Jul 25, 2022

Hi, thanks very much for your code. May I ask if you have met this problem?
[image]

When did this happen? At the beginning, or after a few epochs of training?

Thanks for your reply. I used a single GPU to train (debug) with the config you provided. It happened at the beginning of training (50 iters).
As you can see in this log:
[image]

It seems that bbox_head.label_embedding.weight did not receive a gradient during some iterations.
I could use find_unused_parameters=True to avoid this problem, but it would affect training efficiency.
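For reference, in MMDetection 2.x this flag is set at the top level of the experiment config; it is a workaround that makes DDP tolerate parameters without gradients, at a small per-iteration cost:

```python
# top level of the experiment config (MMDetection 2.x)
# Makes MMDistributedDataParallel traverse the autograd graph each
# iteration looking for parameters that received no gradient; this
# avoids the error above but slows training slightly.
find_unused_parameters = True
```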

@Li-Qingyun
Copy link
Contributor Author

Li-Qingyun commented Jul 25, 2022

@czczup Hi, the newest config file contains other incompatibility problems.

norm_cfg=dict(type='FrozenBatchNorm2d'), # should use BN instead
conv_cfg=dict(type='Conv2d', bias=True), # should be deleted

I made these modifications on my local workstation but haven't committed them yet.
Did you meet these problems too?

@jshilong jshilong self-requested a review July 25, 2022 06:42
@Li-Qingyun
Copy link
Contributor Author

@czczup I'm working on experiments to align the training results; there are still a few modifications which I have not committed.

You can comment out the FrozenBatchNorm2d and use BN instead. I modified the bn file of mmcv to register FrozenBatchNorm2d, but I plan to delete it in the end.
In the original DINO, the convs of the neck have bias, which is not recommended in mmdetection. To align the inference accuracy, I had to modify the conv building function in mmcv. You can comment out this line.
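For reference, a minimal sketch of what the corrected neck section of the config might look like. The module name, channel numbers, and other fields here are assumptions for illustration; the point is plain BN in norm_cfg and no forced conv bias:

```python
# hypothetical corrected neck snippet (type/channels are assumptions)
neck=dict(
    type='ChannelMapper',
    in_channels=[512, 1024, 2048],   # ResNet-50 stages 2-4
    kernel_size=1,
    out_channels=256,
    act_cfg=None,
    norm_cfg=dict(type='BN'),        # plain BN instead of FrozenBatchNorm2d
    num_outs=4)                      # extra level by down-sampling stage 4
# no conv_cfg=dict(type='Conv2d', bias=True): the default conv builder is used
```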

@Li-Qingyun
Copy link
Contributor Author

@czczup The label_embedding is used in DINOHead.forward_train() and DnQueryGenerator.forward(). I haven't found the bug yet; it may take a few days.
I'm pushing full speed ahead with the training, and more details will be committed. Thanks for your attention and the discussion.

@czczup
Copy link
Contributor

czczup commented Jul 25, 2022

@Li-Qingyun
When the latest config is used, the label_embedding problem disappears. Thanks for your help.
I checked against the paper and the source code published by the authors, and it seems that most of the settings are aligned, except for the learning rate (1e-4 in the paper but 2e-4 in this config) and the noise_scale (0.5 & 0.4 in the paper but 0.5 & 1.0 in this config).
I'm not sure if there's any detail I missed. I plan to train DINO-R50 for 12 epochs with 8 GPUs today.

@Li-Qingyun
Copy link
Contributor Author

@czczup Hi, the original repo has a sh file to launch training, in which box_noise_scale is set to 1.

The description is in the last paragraph of D.3 (Supplementary Materials) of the paper:

Multi-scale setting. For our 4-scale models, we extract features from stages 2, 3, and 4 of the backbone and add an extra feature by down-sampling the output of the stage 4. An additional feature map of the backbone stage 1 is used for our
5-scale models. For hyper-parameters, we set λ1 = 1.0 and λ2 = 2.0 and use 100 CDN pairs which contain 100 positive queries and 100 negative queries.

The learning rate was indeed misaligned, thank you very much!

@czczup
Copy link
Contributor

czczup commented Jul 25, 2022

I met a new problem; it happens in the middle of training, e.g., around 2000 iterations. I'm trying to fix it.
I notice that it happens when dn_cls_scores is None, so the keys in loss_dict differ across GPUs.

AssertionError: loss log variables are different across GPUs! 
rank 2 len(log_vars): 21 keys: interm_loss_cls,interm_loss_bbox,interm_loss_iou,loss_cls,loss_bbox,loss_iou,d0.loss_cls,d0.loss_bbox,d0.loss_iou,d1.loss_cls,d1.loss_bbox,d1.loss_iou,d2.loss_cls,d2.loss_bbox,d2.loss_iou,d3.loss_cls,d3.loss_bbox,d3.loss_iou,d4.loss_cls,d4.loss_bbox,d4.loss_iou

@Li-Qingyun
Copy link
Contributor Author

@czczup Thanks! Maybe it's the special case where the current sample has no targets. I'll check.

@Li-Qingyun
Copy link
Contributor Author

@czczup Hi, I'm waiting in the queue of the lab's Slurm cluster, so I can't run experiments right away. Could you test setting filter_empty_gt=True?

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(filter_empty_gt=True, pipeline=train_pipeline),  # Modify here
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))

@Li-Qingyun
Copy link
Contributor Author

@zhaoguoqing12 Hi, this problem has mostly been fixed in 390b25a.

Did you hit it again on the latest version? Please check your local repository's version; the latest is d380deb. If yours is not up to date, you can update it with git pull https://github.com/Li-Qingyun/mmdetection.git add-dino. If the problem appears on the latest version, please provide your environment versions, local modifications, run command, and so on.

@zhaoguoqing12
Copy link

@zhaoguoqing12 Hi, this problem has mostly been fixed in 390b25a. Please check your local repository's version; the latest is d380deb. If yours is not up to date, you can update it with git pull https://github.com/Li-Qingyun/mmdetection.git add-dino. If the problem appears on the latest version, please provide your environment versions, local modifications, run command, and so on.

I am indeed not on the latest version, but it was not while training DINO that I encountered this problem.

@Li-Qingyun
Copy link
Contributor Author

@zhaoguoqing12 If the same problem appears in your own repository, you need to guarantee that loss_dict.keys() stays fixed. For example, in the current version of DINO, loss_dict should have 39 entries. My way of checking, placed right before loss_dict is reduced:

if len(loss_dict) != 39:
    import ipdb; ipdb.set_trace()

Stepping through the stack frames showed that the dn-related losses were missing from loss_dict.keys(), so I added a no-target branch to the function that computes the dn losses, guaranteeing that loss_dict always consists of 39 entries in every case. This workaround solves the problem for now; it may be replaced with a better approach during the review stage. (For example, when there are no targets, loss_giou and loss_bbox are naturally 0; in that case the padded loss_dn is 0 as well, but it carries no computation graph.)

I found that the original DINO code also guards against a similar error, but the loss part here was almost completely rewritten, so I did not notice it at the time.

So my suggestion is to check how the loss_dict in your error message is produced. You can locate the failing case with the check above and then pad the missing entries in the special cases.
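A minimal sketch of the padding idea described above. The key names, layer count, and helper name here are assumptions for illustration (the actual keys in this PR differ, and there are 39 of them in total); the point is that every rank ends up with an identical key set:

```python
import torch

def pad_dn_losses(loss_dict, device, num_dec_layers=6):
    """Hypothetical sketch: when a batch has no targets, fill in the
    missing dn-* entries with plain zeros (no computation graph) so
    that loss_dict has the same keys on every GPU and the cross-rank
    log_vars check passes."""
    keys = ['dn_loss_cls', 'dn_loss_bbox', 'dn_loss_iou']
    # auxiliary decoder layers d0 .. d{num_dec_layers - 2}
    for i in range(num_dec_layers - 1):
        keys += [f'd{i}.dn_loss_cls', f'd{i}.dn_loss_bbox', f'd{i}.dn_loss_iou']
    for k in keys:
        # only pad keys that the dn branch failed to produce
        loss_dict.setdefault(k, torch.zeros(1, device=device))
    return loss_dict
```

Calling this right before the losses are collected keeps len(loss_dict) constant whether or not the denoising branch ran.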

@zhaoguoqing12
Copy link

@Li-Qingyun OK, thanks. I will try it.

@Li-Qingyun
Copy link
Contributor Author

Li-Qingyun commented Aug 25, 2022

We have started to refactor the modules of DETR-like models to enhance the usability and readability of our codebase. To avoid affecting the progress of experiments in the current version and the work of those following this PR, I created a new PR for the refactor:

#9149

I'll merge the refactor PR into this PR when it's finished. Followers can continue to conduct experiments based on the current PR; the relevant bugs will continue to be fixed, and the experimental results will continue to be released.

Thank you for your attention!

@Li-Qingyun Li-Qingyun changed the title [WIP]: Add DINO [WIP]: Add DINO on MMDetection 2.x Feb 19, 2023
@Li-Qingyun
Copy link
Contributor Author

Li-Qingyun commented Feb 19, 2023

We have decided to complete the development of DINO on MMDetection 2.x in this PR. The code keeps the old style (it has been refactored into a new style in MMDetection 3.x). Users are recommended to use #9149 on MMDetection 3.x; this support is mainly for users who have to stay on old versions.

TO-DO List

  • refactor CdnQueryGenerator
  • Add DocString
    • detector class
    • Dino Transformer
    • Dino Transformer Decoder
    • DinoHead
    • CdnQueryGenerator
  • Add unit test
  • Add readme
  • Add model to zoo
  • Add metafiles

@Li-Qingyun Li-Qingyun closed this Oct 29, 2024
@Li-Qingyun Li-Qingyun mentioned this pull request Nov 8, 2024

7 participants