This repository provides the test code for our paper Global to Local: Clip-LSTM based Object Detection from Remote Sensing Images.
Object detection from Remote Sensing Images (RSIs) is a basic topic in aviation and satellite image processing, with important applications in geological disaster detection, agricultural planning, and land utilization. However, it faces several severe difficulties. For instance, target scale spans a very wide range, and the gap between target size and image size is huge: some targets occupy only a dozen pixels in a remote sensing image at the megapixel level. In this work, an innovative object detection network (GLNet) integrating clip-LSTM is proposed for remote sensing imagery. Our approach integrates global context clues extracted by a multi-scale perception subnetwork (MSPNet) and local spatial contextual correlations encoded by the clip-LSTM. These rich semantic features are further exploited to design a self-adapted anchor subnetwork (SANet) that alleviates the scale variations in RSIs. Extensive experiments are conducted on several publicly accessible benchmarks, including DOTA, NWPU VHR-10, and DIOR. Experimental results demonstrate that our GLNet outperforms numerous recent methods.
Note: We added the three subnetworks proposed in this paper to Faster RCNN trained on oriented bounding boxes. The results show that the GLNet proposed in this paper can improve the accuracy of OBB object detection. Got trained model.
- Clone and enter this repository: `https://github.com/Zhu1Teng/GLNet.git`
- Download dataset: please download NWPU VHR-10 and DIOR and unzip them in the `data` folder.
- Download model: please download all models and put them in the `model` folder.
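
The folder layout described in the steps above can be checked before running anything. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and it only assumes that the `data` and `model` folders sit directly under the repository root, as the list above states.

```python
from pathlib import Path


def missing_paths(repo_root, required=("data", "model")):
    """Return the required sub-folders that do not exist under repo_root.

    The folder names come from the setup steps above; the helper itself
    is a hypothetical convenience and not part of the repository.
    """
    root = Path(repo_root)
    return [name for name in required if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Folder layout looks complete.")
```

Run it from the repository root; an empty result means both folders are in place.
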
Before the evaluation, change the configuration in `config.py` to match the dataset you want to evaluate.
The following is the configuration file content for NWPU VHR-10:

    dataset_type = 'VOCDataset'
    data_root = 'data/NWPU VHR-10 datasetVOC2007/'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    data = dict(
        imgs_per_gpu=1,
        workers_per_gpu=2,
        test=dict(
            type=dataset_type,
            ann_file=data_root + 'test.txt',
            img_prefix=data_root + 'positive image set test/',
            img_scale=(800, 800),
            img_norm_cfg=img_norm_cfg,
            size_divisor=32,
            flip_ratio=0,
            with_mask=False,
            with_label=False,
            test_mode=True))

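
To make the `img_norm_cfg` and `size_divisor` settings concrete, here is a small sketch of the arithmetic they usually imply. It is an illustration under assumptions, not the repository's preprocessing code: it assumes pixels arrive in BGR channel order (as OpenCV loads them), that `to_rgb=True` reverses the channels before the mean and standard deviation are applied, and that `size_divisor=32` pads each side up to the next multiple of 32. The function names `normalize_pixel` and `padded_size` are hypothetical.

```python
# Values copied from the configuration above (RGB order).
MEAN = (123.675, 116.28, 103.53)
STD = (58.395, 57.12, 57.375)


def normalize_pixel(bgr, mean=MEAN, std=STD, to_rgb=True):
    """Standardize one pixel: optional BGR -> RGB swap, then (x - mean) / std.

    Assumption: the input pixel is in BGR order, as OpenCV would load it.
    """
    channels = tuple(reversed(bgr)) if to_rgb else tuple(bgr)
    return tuple((c - m) / s for c, m, s in zip(channels, mean, std))


def padded_size(height, width, divisor=32):
    """Round height and width up to the nearest multiple of divisor."""
    def round_up(value):
        return -(-value // divisor) * divisor  # ceiling division
    return round_up(height), round_up(width)


# A BGR pixel equal to the channel means maps to zero after normalization.
print(normalize_pixel((103.53, 116.28, 123.675)))  # (0.0, 0.0, 0.0)
# 800 is already a multiple of 32, so an 800x800 image needs no padding.
print(padded_size(800, 800))  # (800, 800)
print(padded_size(801, 650))  # (832, 672)
```
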
Run the evaluation:

    python voc_eval.py models/NW-model.pkl config.py

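
For readers unfamiliar with VOC-style evaluation, the sketch below shows how an average precision (AP) value for one class is commonly computed from ranked detections. It is an assumption-laden illustration: this README does not state which AP variant `voc_eval.py` uses, so the code shows the area-under-curve form with a monotone precision envelope. The function `average_precision` is hypothetical, and the matching of detections to ground-truth boxes (for example by an IoU threshold) is assumed to have been done already.

```python
def average_precision(scores, is_true_positive, num_gt):
    """Area under the interpolated precision-recall curve for one class.

    scores           -- confidence of each detection
    is_true_positive -- whether each detection matched a ground-truth box
                        (the matching itself is assumed to be done elsewhere)
    num_gt           -- number of ground-truth boxes of this class
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")

    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))

    # Make precision non-increasing from right to left (the envelope).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Three detections, two ground-truth boxes, the second detection is a miss.
print(round(average_precision([0.9, 0.8, 0.7], [True, False, True], 2), 4))  # 0.8333
```

The mAP reported for a dataset is then the mean of these per-class AP values.
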
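Our performance on the DOTA dataset (P: plane, S: ship, ST: storage tank, BD: baseball diamond, TC: tennis court, BC: basketball court, GTF: ground track field, H: harbor, B: bridge, SV: small vehicle, LV: large vehicle, SBF: soccer-ball field, RA: roundabout, SP: swimming pool, HC: helicopter)
Methods | P | S | ST | BD | TC | BC | GTF | H | B | SV | LV | SBF | RA | SP | HC | mAP |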
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GLNet-HBB | 89.4 | 53.9 | 77.2 | 71.5 | 79.9 | 66.9 | 68.4 | 73.1 | 43.1 | 31.9 | 52.5 | 74.1 | 64.6 | 59.2 | 74.0 | 65.3 |
GLNet-OBB | 88.3 | 76.5 | 85.2 | 79.5 | 89.4 | 83.5 | 64.7 | 64.5 | 44.2 | 71.0 | 70.8 | 53.4 | 66.0 | 67.0 | 57.1 | 70.8 |
Our performance on the NWPU VHR-10 dataset
Dataset | airplane | storage tank | baseball diamond | tennis court | basketball court | ground track field | harbor | bridge | vehicle | ship | mAP |
---|---|---|---|---|---|---|---|---|---|---|---|
NWPU VHR-10 | 100 | 84.4 | 98.5 | 81.6 | 88.2 | 100 | 97.2 | 88.4 | 90.9 | 88.7 | 91.8 |
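
The mAP column is the mean of the per-class AP values. As a quick arithmetic check, the sketch below recomputes it from the NWPU VHR-10 row above. The helper `mean_ap` is a hypothetical name, and the last class key is read as "ship".

```python
# Per-class AP values copied from the NWPU VHR-10 row above.
nwpu_ap = {
    "airplane": 100.0,
    "storage tank": 84.4,
    "baseball diamond": 98.5,
    "tennis court": 81.6,
    "basketball court": 88.2,
    "ground track field": 100.0,
    "harbor": 97.2,
    "bridge": 88.4,
    "vehicle": 90.9,
    "ship": 88.7,  # last column of the table, read as "ship"
}


def mean_ap(per_class_ap):
    """Mean of the per-class average precision values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


print(round(mean_ap(nwpu_ap), 1))  # 91.8
```
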
Our performance on the DIOR dataset
Dataset | airplane | airport | stadium | ship | bridge | dam | chimney | harbor | overpass | vehicle | mAP |
---|---|---|---|---|---|---|---|---|---|---|---|
DIOR | 62.9 | 83.2 | 75.3 | 72.0 | 50.5 | 67.4 | 79.3 | 51.8 | 62.6 | 43.4 | 70.7 |

ground track field | express way service area | express way toll station | basketball court | baseball field | storage tank | tennis court | train station | golf course | wind mill | ||
83.0 | 86.2 | 70.9 | 81.1 | 72.0 | 53.7 | 81.3 | 65.5 | 81.8 |