GT-Wei/OVA-DETR

Guoting Wei1,4,*, Xia Yuan1,*, Yu Liu3,📧, Zhenhao Shang2, Kelu Yao3, Chao Li3, Qingsen Yan2,
Chunxia Zhao1, Haokui Zhang2,4,📧, Rong Xiao4

* Equal contribution 📧 Corresponding author

1 Nanjing University of Science and Technology, 2 Northwestern Polytechnical University,
3 Zhejiang Lab, 4 Intellifusion

Partial results

Figure 1: Comparison of OVA-DETR with recent open-vocabulary detectors in terms of speed and recall. All methods are evaluated on the DIOR dataset under the zero-shot detection setting. Inference speeds were measured on a 3090 GPU, except for DescReg, which was measured on a 4090 GPU.


Figure 2: Overall architecture of OVA-DETR. The improvements of OVA-DETR fall into two main components: Image-Text Alignment and Bidirectional Vision-Language Fusion.


Figure 5: Qualitative results for zero-shot detection on the xView, DIOR, and DOTA datasets, focusing on novel classes. Green rectangles are predicted bounding boxes; red rectangles are ground-truth bounding boxes.


Installation

  1. Clone the OVA-DETR repository.
git clone https://github.com/GT-Wei/OVA-DETR.git
  2. Clone the mmdetection repository (includes RT-DETR cfw) and copy the OVA-DETR files into it.
git clone https://github.com/flytocc/mmdetection.git
cp -r OVA-DETR/* ./mmdetection/
  3. Set up the environment. OVA-DETR is developed with torch==1.11.0+cu113 and mmdetection==3.3.0.
conda create -n OVA-DETR python==3.8 -y
conda activate OVA-DETR

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch

pip install -U openmim
mim install mmengine
mim install "mmcv==2.0.0"
pip install transformers open_clip_torch
pip install git+https://github.com/openai/CLIP.git

# Install mmdetection (with the OVA-DETR files copied in) in editable mode
cd mmdetection
pip install -v -e .

# Download the RT-DETR pretrained weights and the OVA-DETR checkpoints into pretrain_model/
mkdir pretrain_model
wget -P pretrain_model https://github.com/flytocc/mmdetection/releases/download/model_zoo/rtdetr_r50vd_8xb2-72e_coco_ff87da1a.pth
wget -P pretrain_model https://github.com/GT-Wei/OVA-DETR/releases/download/v1.0.0/epoch_30.pth
wget -P pretrain_model https://github.com/GT-Wei/OVA-DETR/releases/download/v1.0.0/epoch_45.pth
  4. Training, for example on 4 GPUs:
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/OVA_DETR/OVA_DETR_4xb4-80e_dior_dota_xview.py 4
  5. Evaluation, for example with the epoch_30.pth checkpoint on 4 GPUs:
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/OVA_DETR/OVA_DETR_4xb4-80e_dior_dota_xview.py ./pretrain_model/epoch_30.pth 4
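
Before launching a multi-GPU run, it can help to confirm that the config and checkpoint files exist and that the GPU count passed as the last argument matches CUDA_VISIBLE_DEVICES. The following is a minimal sketch, not part of the OVA-DETR repository: the helper names count_visible_gpus and require_files are hypothetical, and the checkpoint path in the commented usage is an assumption about where the downloaded file was saved.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; they are NOT shipped with OVA-DETR or mmdetection.

# Print how many device ids a comma-separated CUDA_VISIBLE_DEVICES value lists.
count_visible_gpus() {
    local devices="$1"
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    local IFS=','
    # Splitting on commas yields one positional parameter per device id.
    set -- $devices
    echo "$#"
}

# Return 0 only if every given path is an existing, non-empty file.
require_files() {
    local path
    for path in "$@"; do
        if [ ! -s "$path" ]; then
            echo "missing or empty: $path" >&2
            return 1
        fi
    done
}

# Usage sketch (run from the mmdetection directory; adjust CKPT to your own path):
#   CONFIG=configs/OVA_DETR/OVA_DETR_4xb4-80e_dior_dota_xview.py
#   CKPT=./pretrain_model/epoch_30.pth
#   export CUDA_VISIBLE_DEVICES=0,1,2,3
#   require_files "$CONFIG" "$CKPT" && \
#       ./tools/dist_test.sh "$CONFIG" "$CKPT" "$(count_visible_gpus "$CUDA_VISIBLE_DEVICES")"
```

Deriving the GPU count from CUDA_VISIBLE_DEVICES keeps the two values from drifting apart when the device list is edited.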

Acknowledgement

We are grateful to the contributors for their crucial integration of RT-DETR into the mmdetection framework. We implemented OVA-DETR based on their shared resources available at mmdetection.

@article{wei2024ova,
  title={OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion},
  author={Wei, Guoting and Yuan, Xia and Liu, Yu and Shang, Zhenhao and Yao, Kelu and Li, Chao and Yan, Qingsen and Zhao, Chunxia and Zhang, Haokui and Xiao, Rong},
  journal={arXiv preprint arXiv:2408.12246},
  year={2024}
}
