[WIP]: Add DINO on MMDetection 2.x #8362
@czczup I haven't run a full training yet, but I have tested 1000 training iterations (2 GPUs with dist_train) and did not run into it. Could you please provide more information? Thanks for your attention. |
@czczup Hi, the newest config file contains other incompatibility problems:
norm_cfg=dict(type='FrozenBatchNorm2d'),  # should use BN instead
conv_cfg=dict(type='Conv2d', bias=True),  # should be deleted
I have made these modifications on my local workstation but haven't committed them yet. |
@czczup I'm running experiments to align the training results, and there are still a few modifications that I have not committed. You can comment out FrozenBatchNorm2d and use BN instead. I modified the bn file of mmcv to register FrozenBatchNorm2d, but I plan to remove that change in the end. |
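For readers following this exchange, the substitution described above might look like the backbone fragment below. This is a sketch, not the config committed in the PR: only the norm_cfg and conv_cfg changes come from the thread, and the surrounding ResNet-50 settings (depth, out_indices, frozen_stages, style) are assumptions modeled on common MMDetection 2.x configs. Using requires_grad=False together with norm_eval=True is the usual way to approximate a frozen BN with the built-in 'BN' layer.

```python
# Hypothetical backbone fragment; values other than norm_cfg are assumed.
backbone = dict(
    type='ResNet',
    depth=50,
    num_stages=4,
    out_indices=(1, 2, 3),  # assumed multi-scale outputs
    frozen_stages=1,
    # Built-in BN with frozen affine parameters, replacing 'FrozenBatchNorm2d'.
    norm_cfg=dict(type='BN', requires_grad=False),
    # Keep BN layers in eval mode so running statistics are not updated.
    norm_eval=True,
    style='pytorch',
    # No conv_cfg entry: the default convolution layer is used.
)
```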
@czczup The label_embedding is used in DINOHead.forward_train() and DnQueryGenerator.forward(). I haven't found the bug yet; it may take a few days to track down. |
@Li-Qingyun |
@czczup Hi, the original repo has a shell script to launch training, in which box_noise_scale is set to 1. The description is in the last paragraph of Section D.3 (Supplementary Materials) of the paper.
The learning rate was indeed misaligned, thank you very much! |
I ran into a new problem that occurs in the middle of training, for example around iteration 2000. I'm trying to fix it.
|
@czczup Thanks! Maybe it's the special case where the current sample has no targets. I'll check. |
@czczup Hi, I'm waiting in the queue of my lab's Slurm cluster, so I can't run the experiment right away. Could you test the following setting?
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(filter_empty_gt=True, pipeline=train_pipeline),  # Modify here
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline)) |
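@zhaoguoqing12 Hi, we have largely resolved this issue in 390b25a. Did the problem appear again when running the latest version? You can check the version of your local repository; the latest version is currently d380deb. If your local copy is not up to date, you can update it with git pull https://github.com/Li-Qingyun/mmdetection.git add-dino. If the problem occurs on the latest version, please provide your environment versions, local modifications, the command you ran, and similar information. |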
Indeed, I am not on the latest version, but I did not run into this problem while training DINO. |
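@zhaoguoqing12 If the same problem occurs in your own repository, you need to make sure that loss_dict.keys() stays fixed. For example, in the current version of DINO, loss_dict should have 39 entries. My way of checking was to add the following before loss_dict is reduced:
if len(loss_dict) != 39:
    import ipdb; ipdb.set_trace()
By moving through the stack frames, I found that loss_dict.keys() was missing the dn-related losses, so I added a branch for the no-target case in the function that computes the dn losses, to guarantee that loss_dict consistently consists of 39 entries in every case. This works around the problem for now and may be replaced by a better approach during review. (For example, when there are no targets, loss_giou and loss_bbox are naturally 0; in that case the supplemented loss_dn is 0 but has no computation graph.) I found that the DINO source code actually also has an operation that prevents similar errors, but the loss part was almost completely refactored, so I did not notice it at the time. So my suggestion is to check how the loss_dict in your error message is obtained. You can use the method above to locate the failing case and then fill in the missing entries for that special case. |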
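The padding idea described above can be sketched in a few lines. This is a minimal illustration, not the code in the PR: pad_loss_dict and the six key names are hypothetical, and plain floats stand in for loss tensors so the example runs without PyTorch.

```python
def pad_loss_dict(loss_dict, expected_keys, zero=0.0):
    """Return a copy of loss_dict that contains exactly expected_keys.

    Missing keys are filled with `zero`; unexpected keys raise KeyError.
    """
    unexpected = set(loss_dict) - set(expected_keys)
    if unexpected:
        raise KeyError(f'unexpected loss keys: {sorted(unexpected)}')
    # Iterate over expected_keys so the key order is also stable.
    return {key: loss_dict.get(key, zero) for key in expected_keys}


# Hypothetical key names; the thread states the real loss_dict has 39 entries.
EXPECTED_KEYS = ['loss_cls', 'loss_bbox', 'loss_iou',
                 'dn_loss_cls', 'dn_loss_bbox', 'dn_loss_iou']

# A batch without targets for which no dn losses were produced.
partial = {'loss_cls': 0.7, 'loss_bbox': 0.0, 'loss_iou': 0.0}
padded = pad_loss_dict(partial, EXPECTED_KEYS)
assert list(padded) == EXPECTED_KEYS
```

In real training the `zero` argument would presumably be a zero tensor on the same device as the other losses. As noted in the comment above, such a padded value carries no computation graph, which is why this is described as a temporary workaround.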
@Li-Qingyun OK, thanks. I'll give it a try. |
We have started refactoring the modules of DETR-like models to improve the usability and readability of our codebase. To avoid disrupting the experiments on the current version and the work of those who follow this PR, I have created a separate PR for the refactor, which I'll merge into this PR when it's finished. Followers can continue to run experiments based on the current PR, and the relevant bugs will continue to be fixed. The experimental results will also continue to be released. Thank you for your attention! |
We have decided to complete the development of DINO on MMDetection 2.x in this PR. The code keeps the old style (it has been refactored into a new style in MMDetection 3.x). Users are recommended to use #9149 in MMDetection 3.x. This support is mainly for users who have to stay on old versions.
TO-DO List
|
Motivation
Reproduce the results of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection" with MMDetection. The original code has been released.
Remark
This is my first time submitting a PR that reproduces an algorithm, and I still lack experience. I hope that developers who follow this PR will point out problems in my implementation and share their results. I will also follow up and share my own experimental results and progress plans.