Guoting Wei1,4,*,
Xia Yuan1,*,
Yu Liu3,📧,
Zhenhao Shang2,
Kelu Yao3,
Chao Li3
Qingsen Yan2,
Chunxia Zhao1,
Haokui Zhang2,4,📧,
Rong Xiao4
* Equal contribution 📧 Corresponding author
1 Nanjing University of Science and Technology, 3 Zhejiang Lab
2 Northwestern Polytechnical University, 4 Intellifusion
Figure 1: Compared OVA-DETR with recently advanced open-vocabulary detectors in terms of speed and recall. All methods are evaluated on DIOR dataset under zero shot detection. The inference speeds were measured on a 3090 GPU by default, except that DescReg was measured on a 4090 GPU
Figure 2: Overall architecture of OVA-DETR.The improvements of OVA-DETR can be summarized into two main components: the Image-Text Alignment and the Bidirectional Vision-Language Fusion.
Figure 5:Qualitative results for zero-shot detection on the xView,DIOR,and DOTA datasets, focusing on novel classes.The green rectangles represent predicted bounding boxes, while red rectangles denote ground truth bounding boxes.
- Clone the OVA-DETR repository.
git clone https://github.com/GT-Wei/OVA-DETR.git
- Clone the mmdetection repository (include RT-DETR cfw)
git clone https://github.com/flytocc/mmdetection.git
cp -r OVA-DETR/* ./mmdetection/
- OVA-DETR is developed based on
torch==1.11.0+cu11.3
andmmdetection==3.3.0
conda create -n OVA-DETR python==3.8 -y
conda activate OVA-DETR
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
pip install -U openmim
mim install mmengine
mim install "mmcv==2.0.0"
pip install transformers open_clip_torch
pip install git+https://github.com/openai/CLIP.git
cd mmdetection
pip install -v -e .
mkdir pretrain_model
wget https://github.com/flytocc/mmdetection/releases/download/model_zoo/rtdetr_r50vd_8xb2-72e_coco_ff87da1a.pth
wget https://github.com/GT-Wei/OVA-DETR/releases/download/v1.0.0/epoch_30.pth
wget https://github.com/GT-Wei/OVA-DETR/releases/download/v1.0.0/epoch_45.pth
- Training
eg: CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/OVA_DETR/OVA_DETR_4xb4-80e_dior_dota_xview.py 4
- Evaluation
eg: CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/OVA_DETR/OVA_DETR_4xb4-80e_dior_dota_xview.py ./pretrain_model/epoch30.pt 4
We are grateful to the contributors for their crucial integration of RT-DETR into the mmdetection framework. We implemented OVA-DETR based on their shared resources available at mmdetection.
@article{wei2024ova,
title={OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion},
author={Wei, Guoting and Yuan, Xia and Liu, Yu and Shang, Zhenhao and Yao, Kelu and Li, Chao and Yan, Qingsen and Zhao, Chunxia and Zhang, Haokui and Xiao, Rong},
journal={arXiv preprint arXiv:2408.12246},
year={2024}
}