GitHub - GT-Wei/OVA-DETR

OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion

Guoting Wei^1,4,*, Xia Yuan^1,*, Yu Liu^3,📧, Zhenhao Shang^2, Kelu Yao^3, Chao Li^3, Qingsen Yan^2,
Chunxia Zhao^1, Haokui Zhang^2,4,📧, Rong Xiao^4

* Equal contribution; 📧 Corresponding author

¹ Nanjing University of Science and Technology, ² Northwestern Polytechnical University
³ Zhejiang Lab, ⁴ Intellifusion

Partial results

Figure 1: Comparison of OVA-DETR with recent open-vocabulary detectors in terms of speed and recall. All methods are evaluated on the DIOR dataset under zero-shot detection. Inference speeds were measured on an RTX 3090 GPU by default, except for DescReg, which was measured on an RTX 4090 GPU.

Figure 2: Overall architecture of OVA-DETR. Its improvements can be summarized in two main components: Image-Text Alignment and Bidirectional Vision-Language Fusion.

Figure 5: Qualitative results for zero-shot detection on the xView, DIOR, and DOTA datasets, focusing on novel classes. Green rectangles are predicted bounding boxes; red rectangles are ground-truth bounding boxes.

Installation

Clone the OVA-DETR repository.

git clone https://github.com/GT-Wei/OVA-DETR.git

Clone the mmdetection fork that includes the RT-DETR implementation, then copy the OVA-DETR files into it.

git clone https://github.com/flytocc/mmdetection.git
cp -r OVA-DETR/* ./mmdetection/

OVA-DETR is developed with torch==1.11.0+cu113 (CUDA 11.3) and mmdetection==3.3.0.

conda create -n OVA-DETR python==3.8 -y
conda activate OVA-DETR

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch

pip install -U openmim
mim install mmengine
mim install "mmcv==2.0.0"
pip install transformers open_clip_torch
pip install git+https://github.com/openai/CLIP.git

cd mmdetection 
pip install -v -e .

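Download the RT-DETR COCO-pretrained weights and the released OVA-DETR checkpoints into pretrain_model:

mkdir pretrain_model
wget -P pretrain_model https://github.com/flytocc/mmdetection/releases/download/model_zoo/rtdetr_r50vd_8xb2-72e_coco_ff87da1a.pth
wget -P pretrain_model https://github.com/GT-Wei/OVA-DETR/releases/download/v1.0.0/epoch_30.pth
wget -P pretrain_model https://github.com/GT-Wei/OVA-DETR/releases/download/v1.0.0/epoch_45.pth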
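Before training or evaluating, it can help to confirm that the downloads completed. The shell function below is a minimal sketch and not part of the repository; its name, check_checkpoints, is hypothetical, and it assumes the three weight files listed above should sit in pretrain_model, the directory the evaluation example reads checkpoints from.

```shell
# Hypothetical helper (not part of the OVA-DETR repository):
# report any of the three downloaded weight files that are missing or empty.
check_checkpoints() {
  local dir="${1:-pretrain_model}" missing=0 f
  for f in rtdetr_r50vd_8xb2-72e_coco_ff87da1a.pth epoch_30.pth epoch_45.pth; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  # exit status 0 = all present, 1 = at least one file missing or empty
  return "$missing"
}
```

Run it from the mmdetection directory, e.g. check_checkpoints pretrain_model; a non-zero exit status means at least one file needs to be downloaded again.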

Training

Example (4 GPUs; the last argument is the number of GPUs):
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/OVA_DETR/OVA_DETR_4xb4-80e_dior_dota_xview.py 4
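The GPU count passed to dist_train.sh should match the devices listed in CUDA_VISIBLE_DEVICES. The helper below is a hypothetical sketch (not part of the repository) that derives that count from the device list. Following the mmdetection config naming convention, the 4xb4 in the config name presumably denotes 4 GPUs with 4 images each, so training on a different number of GPUs likely changes the effective batch size unless the config is adjusted.

```shell
# Hypothetical helper (not part of the OVA-DETR repository):
# count the comma-separated device IDs in a CUDA_VISIBLE_DEVICES-style string.
count_devices() {
  # split on commas, one ID per line, then count the non-blank lines
  printf '%s\n' "$1" | tr ',' '\n' | grep -c '[^[:space:]]'
}
```

For example, after export CUDA_VISIBLE_DEVICES=0,1,2,3, pass "$(count_devices "$CUDA_VISIBLE_DEVICES")" as the last argument of ./tools/dist_train.sh. The variable has to be set before that line: a prefix assignment on the same command line is not visible to the command substitution.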

Evaluation

Example (4 GPUs; the arguments are the config, the checkpoint, and the number of GPUs):
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/OVA_DETR/OVA_DETR_4xb4-80e_dior_dota_xview.py ./pretrain_model/epoch_30.pth 4
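A minimal sketch of an evaluation wrapper, assuming the standard mmdetection entry points: tools/test.py for a single GPU and tools/dist_test.sh for several. The function name run_eval and the DRY_RUN switch are hypothetical and not part of the repository. The wrapper stops early when the checkpoint path does not exist, which catches a mistyped checkpoint name before any GPU work starts.

```shell
# Hypothetical wrapper (not part of the OVA-DETR repository).
# Usage: run_eval CONFIG CHECKPOINT [NUM_GPUS]
run_eval() {
  local config="$1" ckpt="$2" gpus="${3:-1}"
  # fail fast if the checkpoint file is not where we expect it
  if [ ! -f "$ckpt" ]; then
    echo "checkpoint not found: $ckpt" >&2
    return 1
  fi
  # choose mmdetection's single-GPU or distributed test script
  if [ "$gpus" -le 1 ]; then
    set -- python tools/test.py "$config" "$ckpt"
  else
    set -- ./tools/dist_test.sh "$config" "$ckpt" "$gpus"
  fi
  # with DRY_RUN=1, print the command instead of executing it
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example: run_eval configs/OVA_DETR/OVA_DETR_4xb4-80e_dior_dota_xview.py ./pretrain_model/epoch_30.pth 4. Prefix the call with DRY_RUN=1 to check the command before launching it.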

Acknowledgement

We are grateful to the contributors who integrated RT-DETR into the mmdetection framework. OVA-DETR is implemented on top of the resources they shared in their mmdetection repository.

@article{wei2024ova,
  title={OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion},
  author={Wei, Guoting and Yuan, Xia and Liu, Yu and Shang, Zhenhao and Yao, Kelu and Li, Chao and Yan, Qingsen and Zhao, Chunxia and Zhang, Haokui and Xiao, Rong},
  journal={arXiv preprint arXiv:2408.12246},
  year={2024}
}
