Code produced for the academic abstract

"Investigating the capability of UAV imagery for AI-assisted mapping of Refugee Camps in East Africa"

This GitHub repository is the code base for the Master's thesis submitted for the Master of Science in Applied Earth Observation and Geoanalysis of the Living Environment (EAGLE) at the Julius-Maximilians-Universität Würzburg. For the full Master's thesis, please click here

This thesis was conducted in partnership with the Humanitarian OpenStreetMap Team (HOTOSM) and supported by the German Aerospace Center (Deutsches Zentrum für Luft- und Raumfahrt, DLR).

Introduction

HOTOSM would like to develop a solution for assisted mapping that can predict buildings in refugee camps from the drone imagery provided by the associated organisation OpenAerialMap. Refugee camps and informal settlements house some of the most vulnerable populations, the majority of which are located in Sub-Saharan East Africa (UNHCR, 2016). Many of these settlements lack the up-to-date maps that we take for granted in developed cities. Up-to-date maps are important for assisting administration (e.g. population estimates, infrastructure development) in data-impoverished environments and thereby encourage economic productivity (Herfort et al., 2021). The data inequality between developed and developing areas can be reduced using assisted-mapping technology.

To extract geospatial and imagery characteristics of dense urban environments, a combination of VHR satellite imagery and Machine Learning (ML) is commonly used. Recent advances in Computer Vision (CV) based Deep Learning might be able to address these issues. Convolutional Neural Networks (CNNs) are a subtype of the Deep Learning (DL) family used in CV tasks. Past studies using CNNs have shown high accuracy and transferability in small geographical settings (Kuffer et al., 2022).

The datasets provided for this project consist of both highly structured, zoned newer refugee camps and chaotic, highly complex older camps. In addition, roofing materials are highly heterogeneous, especially in older sites where thatched roofs are often mixed with litter. This, coupled with the complex spatial autocorrelation arising from the lack of zoning in older sites, hinders rule-based and conventional ML-based approaches. A CNN-based approach might therefore simplify the task of selecting and testing parameters, taking advantage of VHR textural information while also learning contextual relations (Lang et al., 2022; Lehner & Blaschke, 2022). This study is connected to a pilot project testing the capabilities of building segmentation.

Research Questions and Answers

RQ1. Do state-of-the-art models allow for accurate detection of buildings from UAV data in refugee camps?

RQ2. What is the optimal mixture of accurate and less accurate labels, and how does it affect the segmentation output?

RQ2(a). How does the introduction of complex environments, such as heterogeneous urban morphologies, roofing materials, and UAV drone artefacts, affect the result?

U-Nets with shallow EfficientNet encoders performed slightly better in Precision, Dice Score, and IoU on the less complex, accurately labelled Kalobeyei dataset,

BUT they suffered a larger performance loss than classical U-Nets when complex data were introduced.

RQ3. How do existing models pre-trained on classical CV datasets and/or building datasets respond when applied to the setting of refugee camps?

Further training of the EfficientNet B1 U-Net (OCC-initialised) yielded the largest improvement in Recall.

Architectures with ImageNet initialisation saw improvement only with the EfficientNet B1 encoder, not with B2.

Inconclusive.

Experimental setup

Pre-processing pipeline

Before running any of these steps, please ensure you can run shell scripts and have GDAL and PyTorch installed.

*1. Download, extract, reproject, and resample the OpenAerialMap WMS raster using curl_warp.sh
*2. Rasterise the available vector labels using rasterise_LBL.sh
3. Apply the 2-step normalisation (z-score --> linear scale) using labelmaker.ipynb
*4. Split the RGB raster into separate tifs using RGB_split.sh
**5. Create a virtual raster with 4 bands (R, G, B, Labels) using gdalbuildvrt
**6. Make a permanent raster tif from the VRT using gdal_translate (steps 5-6 are sketched after this list)
7. Return to labelmaker.ipynb and crop the stacked raster
8. Clean the stacked and cropped rasters for missing labels and non-conformity using KBY_clean.ipynb
*9. Convert the tiffs to pngs and delete the tiffs using tiff2png.sh

*.sh scripts are Unix shell scripts. Run them using ./NAME_OF_SHELL_SCRIPT.sh in your Linux terminal. If you are on a Windows machine, you can run them using Cygwin or WSL.
*Take extra care: many of the shell scripts have the target reprojection automatically set to the map projection EPSG:3857. You may want to change this depending on your usage.
**GDAL is an open-source geospatial processing library containing many shell and Python utilities that run these processes with good memory efficiency.
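
For steps 5 and 6, the same stacking can also be done through the GDAL Python bindings; a minimal sketch, where the file names are placeholders rather than the names produced by the scripts in this repository:

```python
# Minimal sketch of steps 5-6 via the GDAL Python bindings.
# "R.tif", "G.tif", "B.tif", "labels.tif" and the output names
# are placeholders, not repository file names.
from osgeo import gdal

# Stack the three single-band RGB tifs and the rasterised labels
# into one 4-band virtual raster (R, G, B, Labels).
vrt = gdal.BuildVRT(
    "stack.vrt",
    ["R.tif", "G.tif", "B.tif", "labels.tif"],
    separate=True,  # one band per input file
)

# Materialise the VRT as a permanent GeoTIFF.
gdal.Translate("stack.tif", vrt)
vrt = None  # close the dataset so everything is flushed to disk
```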

Training pipeline

  1. Dataloader: dataloader.py
  2. Training loop: Train_loop.ipynb
  3. Some classical U-Nets are available as class objects through Networks.py
  4. The remaining CNNs are constructed using the higher-level API segmentation-models-pytorch (see the sketch below)
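
For item 4, a minimal sketch of constructing one of these networks with segmentation-models-pytorch; the exact arguments used for the thesis runs live in Train_loop.ipynb, so treat the values here as assumptions:

```python
import segmentation_models_pytorch as smp

# EfficientNet-B1 U-Net, in the spirit of the EB1-UNet runs below.
model = smp.Unet(
    encoder_name="efficientnet-b1",
    encoder_weights="imagenet",  # or None for the NoIMN variants
    in_channels=3,               # RGB input
    classes=1,                   # single building/background mask
)
```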

Exploratory Data Analysis

See example:

  1. EDA_BASEruns.ipynb
  2. PR_delta.ipynb

Testing and Prediction

  1. For single-image testing on the various networks, see ALLtest_model.ipynb
  2. For a custom function that parses each camp and predicts using a trained network, see PredSeg_Camp.ipynb (a minimal sketch follows below)
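
For orientation, a minimal inference sketch, assuming weights stored as a PyTorch state dict and the normalised 256x256 tiles from the pre-processing pipeline; the actual per-camp logic lives in PredSeg_Camp.ipynb:

```python
import torch
import segmentation_models_pytorch as smp

# Rebuild the architecture, then load trained weights
# ("best_weights.pth" is a placeholder file name).
model = smp.Unet("efficientnet-b1", encoder_weights=None, in_channels=3, classes=1)
model.load_state_dict(torch.load("best_weights.pth", map_location="cpu"))
model.eval()

# One normalised 256x256 RGB tile; replace with real data from the pipeline.
tile = torch.rand(1, 3, 256, 256)
with torch.no_grad():
    mask = torch.sigmoid(model(tile)) > 0.5  # binary building mask
```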

Baseline training results for Kalobeyei, Kakuma (perfect dataset)

  • Dataset: 256x256 px tiles at 0.15 m/px
  • Training data with augmentation: 5719
  • Validation data with augmentation: 1224
  • Testing data: 272
  • Optimiser: Adam
  • Learning rate: 1e-3
  • Weight decay: 1e-5
  • Batch size: 32; 16 (OCC, 5-layer EB1-UNet)
  • Scheduler: Reduce Learning Rate on Plateau (min 1e-8) [patience: 20 epochs, factor: 0.1] (sketched below)
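
This optimiser/scheduler configuration maps directly onto the PyTorch built-ins; a sketch, with a trivial placeholder model standing in for the U-Nets:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, 1)  # placeholder; any of the U-Nets fits here

optimiser = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimiser, factor=0.1, patience=20, min_lr=1e-8
)

# Inside the training loop, step on the validation loss once per epoch:
# scheduler.step(val_loss)
```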

*(Figures: 256KBYFour-UnetBASE, 256KBYFive-UnetBASE, 256KBYEB1-UNet-NoIMNBASE, 256KBYEB1-UNet-IMNBASE, 256KBYEB1-UNet-OCCUNTRAINEDBASE, 256KBYEB1-UNet-OCCBASE)*

Baseline training results for Kalobeyei + Dzaleka + Dzaleka North (full dataset)

  • Dataset: 256x256 px tiles at 0.15 m/px
  • Training data with augmentation: 18242
  • Validation data with augmentation: 3909
  • Testing data: 435
  • Optimiser: Adam
  • Learning rate: 1e-3
  • Weight decay: 1e-5
  • Batch size: 32; 16 (OCC, 5-layer EB1-UNet)
  • Scheduler: Reduce Learning Rate on Plateau (min 1e-8) [patience: 20 epochs, factor: 0.1]

*(Figures: 256ALLFour-UnetBASE, 256ALLFive-UnetBASE, 256ALLEB1-UNet-NoIMNBASE, 256ALLEB1-UNet-IMNBASE, 256ALLEB1-UNet-OCCUNTRAINEDBASE, 256ALLEB1-UNet-OCCBASE)*

Class-based accuracy assessments

*(Figure: cat_CAA)*

EfficientNet B2 encoder performance

  • Dataset: 256x256 px tiles at 0.15 m/px
  • Training data with augmentation: 18242
  • Validation data with augmentation: 3909
  • Testing data: 435
  • Optimiser: Adam
  • Learning rate: 1e-3
  • Weight decay: 1e-5
  • Batch size: 32
  • Scheduler: Reduce Learning Rate on Plateau (min 1e-8) [patience: 20 epochs, factor: 0.1]

EfficientNet B2 encoder U-Net with ImageNet vs. no ImageNet (vanilla) weights, where red = ImageNet and blue = no ImageNet

*(Figures: 256KBYEB2-UNet-NoIMNBASE, 256KBYEB2-UNet-IMNBASE, 256ALLEB2-UNet-NoIMN, 256ALLEB2-UNet-IMN)*

Best run logs and weights .pth files

The best run logs for both the [KBY] & [KBY + DZK + DZKN] datasets for each architecture can be found in this folder
The best weights for inference can be found in this folder
The testing dataset can be found in this folder

The naming scheme is as follows:

18242:3909_256oc_EB1-Unet-qubvel_lr1e-3_wd1e-5_b16_ep500_BCE_RLRonPlateau(min1e-8)_iter_548820.pth.csv
Train:Val_dimension_architecture_learningrate_weightdecay_batchsize_maxEpochs_lossfunction_lrScheduler_iter_bestiteration.pth.csv
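
A hypothetical helper (not part of this repository) showing how a run name under this scheme splits into its fields; it assumes the architecture field itself contains no underscores:

```python
def parse_run_name(name: str) -> dict:
    """Split a run-log file name into its naming-scheme fields."""
    fields = name.removesuffix(".pth.csv").split("_")
    train, val = fields[0].split(":")
    return {
        "train_size": int(train),
        "val_size": int(val),
        "dimension": fields[1],
        "architecture": fields[2],
        "learning_rate": fields[3],
        "weight_decay": fields[4],
        "batch_size": fields[5],
        "max_epochs": fields[6],
        "loss_function": fields[7],
        "lr_scheduler": fields[8],
        "best_iteration": int(fields[-1]),
    }

parse_run_name(
    "18242:3909_256oc_EB1-Unet-qubvel_lr1e-3_wd1e-5_b16_ep500_"
    "BCE_RLRonPlateau(min1e-8)_iter_548820.pth.csv"
)
```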

Depth-wise Precision and Recall change

Dataset-wise Precision and Recall change

Weight-wise Precision and Recall change

Key takeaways

  1. Deeper networks tend to reduce False Positive classifications
  2. Architectures trained with weights initialised from ImageNet tend to reduce False Negative classifications
  3. The transferability of competition-winning networks is limited
  4. Models might have better precision than calculated, due to ambiguity in human labelling