This is the official implementation of the paper "Language-aware Multiple Datasets Detection Pretraining for DETRs".
Authors: Jing Hao*, Song Chen*
We use the same environment as DINO to run METR. If you have already run DN-DETR, DAB-DETR, or DINO, you can skip this step.
We tested our models with python=3.7.3, pytorch=1.9.0, cuda=11.1. Other versions may work as well.
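For a clean setup, a minimal sketch of creating a matching conda environment (the environment name metr is arbitrary):
# create and activate a fresh environment with the tested Python version
conda create -n metr python=3.7.3 -y
conda activate metr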
- Clone this repo
git clone https://github.com/isbrycee/METR.git
cd METR
- Install Pytorch and torchvision
Follow the instructions at https://pytorch.org/get-started/locally/.
# an example:
conda install -c pytorch pytorch torchvision
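If you want to match the tested versions exactly, a pinned install sketch (verify against https://pytorch.org/get-started/previous-versions/ for your CUDA setup):
# pinned to the versions we tested (pytorch 1.9.0 + cuda 11.1); newer combinations may also work
conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia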
- Install other needed packages
pip install -r requirements.txt
- Compile CUDA operators
cd models/metr/ops
python setup.py build install
# unit test (all checks should print True)
python test.py
cd ../../..
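Optionally, a quick import check after installing the operators; the extension name MultiScaleDeformableAttention follows the layout of DINO/Deformable-DETR and is an assumption here:
# should print the path of the compiled extension if the build succeeded (module name assumed from DINO's ops)
python -c "import MultiScaleDeformableAttention; print(MultiScaleDeformableAttention.__file__)"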
Please download the COCO 2017 dataset and organize it as follows:
COCODIR/
├── train2017/
├── val2017/
└── annotations/
├── instances_train2017.json
└── instances_val2017.json
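For reference, one way to fetch and lay out COCO 2017 using the official download links (COCODIR is whatever path you choose):
# download images and annotations, then unzip into the layout shown above
mkdir -p COCODIR && cd COCODIR
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip -q train2017.zip && unzip -q val2017.zip && unzip -q annotations_trainval2017.zip
cd ..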
We use the METR 4-scale model trained for 12 epochs as the default experimental setting.
bash scripts_METR/METR_train_dist_4scale_r50_coco.sh
bash scripts_METR/METR_eval_dist_4scale_r50_coco.sh
Notes:
- You should change the dataset path in the scripts before running (a hypothetical example follows these notes).
- This implementation also supports ViT backbones.
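For example, a hypothetical one-liner to point the training script at your data; the coco_path variable name is an assumption, so check the actual scripts under scripts_METR/ for the real variable or argument:
# hypothetical: assumes the script defines coco_path=...; verify the variable name before running
sed -i 's#coco_path=.*#coco_path=/path/to/COCODIR#' scripts_METR/METR_train_dist_4scale_r50_coco.sh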
Our model is based on DINO.
If you find our work helpful for your research, please consider citing the following BibTeX entries.
@article{hao2024language,
title={Language-aware Multiple Datasets Detection Pretraining for DETRs},
author={Hao, Jing and Chen, Song},
journal={Neural Networks},
pages={106506},
year={2024},
publisher={Elsevier}
}
@misc{zhang2022dino,
title={DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection},
author={Hao Zhang and Feng Li and Shilong Liu and Lei Zhang and Hang Su and Jun Zhu and Lionel M. Ni and Heung-Yeung Shum},
year={2022},
eprint={2203.03605},
archivePrefix={arXiv},
primaryClass={cs.CV}
}