Run statistics Harvey dirty data

Annie edited this page Aug 10, 2018 · 1 revision

First Run (Deprecated)

  • geojson file: just_buildings_w_uid_cleaned.geojson
  • total # of bboxes: 10866 (all damaged buildings)
  • unique 2048 chips in geojson: 862

Second Run (ssd-inceptionV2)

  • geojson file: just_buildings_w_uid_second_round.geojson

  • total # of bboxes: 10770 (all damaged buildings) = (8999 training + 1771 test)

  • unique 2048 chips in geojson: 854 (680 for training + 174 in test)

  • note*: this run contains invalid bboxes that are either all zeros or out of bounds.

  • actual chips included in train image folder: 730, test: 183

  • for training data: created val / train = 0.33

    • INFO:root:Max chips per resolution: 2827
    • INFO:root:Tot Box: 78463
    • INFO:root:Chips: 1885 (train + val 512 x 512)
    • for class: 1
    • augmentation applied: 15296
    • num of black small chips removed: 274
    • num of small chips containing clouds: 16
    • INFO:root:saved: 16585 train chips
    • INFO:root:saved: 596 test chips
  • output: harvey_test_second.record (val) and harvey_train_second.record (train)

  • **note: harvey_test_second.record is actually for validation.**

  • Create a toy image set for visualizing model inferences: 146 chips from training data. This toy set contains randomly selected training and validation images

  • augmentation applied: one training image was augmented into 15 images.

Third Run (faster-rcnn-inceptionV2 / SSD-inceptionV2)

  • Same big tifs and geojson files as the second run, no manual inspection applied
  • chipped into 256 * 256 small chips
  • In script, removed invalid bboxes (all zeros, out of bounds, contain NaNs)
  • faster-rcnn-inceptionV2: SSD-Xviews / ssd-inception-V2: ssd-Yan
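The invalid-bbox filter mentioned above (all zeros, out of bounds, contain NaNs) could look roughly like the sketch below. This is not the actual chipping script: the function names, the `[xmin, ymin, xmax, ymax]` pixel layout, and the 2048 x 2048 default image size are assumptions made for illustration.

```python
import math


def is_valid_bbox(bbox, width, height):
    """Return True if [xmin, ymin, xmax, ymax] is usable in a width x height image.

    The coordinate layout is an assumption; adapt it to the real geojson schema.
    """
    if any(math.isnan(v) for v in bbox):
        return False  # contains NaNs
    if all(v == 0 for v in bbox):
        return False  # all zeros
    xmin, ymin, xmax, ymax = bbox
    if xmin < 0 or ymin < 0 or xmax > width or ymax > height:
        return False  # out of bounds
    return True


def filter_bboxes(bboxes, width=2048, height=2048):
    """Keep only the bboxes that pass is_valid_bbox."""
    return [b for b in bboxes if is_valid_bbox(b, width, height)]


if __name__ == "__main__":
    sample = [
        [0, 0, 0, 0],                # all zeros
        [10, 10, 50, 50],            # valid
        [-5, 0, 20, 20],             # out of bounds
        [float("nan"), 0, 10, 10],   # NaN
    ]
    print(filter_bboxes(sample))
```

Counting how many boxes are dropped per big tiff is a cheap way to spot tiles whose annotations are systematically off.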

With augmentation

  • INFO:root:Max chips per resolution: 4288
  • INFO:root:Tot Box: 62842
  • INFO:root:Chips: 2859
  • for class: 1
  • augmentation applied: 17418
  • num of black small chips removed: 978
  • num of small chips containing clouds: 19
  • INFO:root:saved: 19333 train chips
  • INFO:root:saved: 944 test chips

No augmentation for training and val

  • INFO:root:Max chips per resolution: 4288
  • INFO:root:Tot Box: 8999
  • INFO:root:Chips: 2859
  • num of black small chips removed: 978
  • num of small chips containing clouds: 19
  • INFO:root:saved: 1997 train chips
  • INFO:root:saved: 862 test chips

Test data:

  • INFO:root:Max chips per resolution: 931
  • INFO:root:Tot Box: 1723
  • INFO:root:Chips: 621
  • num of black small chips removed: 248
  • num of small chips containing clouds: 5
  • INFO:root:saved: 427 train chips
  • INFO:root:saved: 194 test chips
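A quick sanity check on the logged counts for the second and third runs: in each block above, "Chips" equals the saved train chips minus the augmented chips, plus the saved test chips. This relationship is read off the numbers on this page, not taken from the chipping script, so treat it as an observation. The dictionary keys below are made-up names for the log fields.

```python
# Figures copied from the log blocks above; key names are illustrative only.
runs = {
    "second run": dict(chips=1885, augmented=15296, saved_train=16585, saved_test=596),
    "third run, with augmentation": dict(chips=2859, augmented=17418, saved_train=19333, saved_test=944),
    "third run, no augmentation": dict(chips=2859, augmented=0, saved_train=1997, saved_test=862),
    "third run, test data": dict(chips=621, augmented=0, saved_train=427, saved_test=194),
}


def unaugmented_total(run):
    """Saved chips with the augmented copies subtracted out."""
    return (run["saved_train"] - run["augmented"]) + run["saved_test"]


for name, run in runs.items():
    total = unaugmented_total(run)
    held_out = run["saved_test"] / run["chips"]
    print(f"{name}: unaugmented total {total}, logged Chips {run['chips']}, "
          f"match={total == run['chips']}, held-out fraction={held_out:.2f}")
```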

note

Changes needed for the test data: remove black big tiffs and bboxes in folder (harvey_test_second), and recreate geojson (harvey_test_second.geojson)

original:

  • geojson file: just_buildings_w_uid_second_round.geojson
  • total # of bboxes: 10770 (all damaged buildings) = (8999 training + 1771 test)
  • actual chips included in train image folder: 730, test: 183

Manually removed 7 big tiffs with black regions, recreated geojson:

  • actual files in test image folder: 176
  • The total number of bboxes for test: 1702
  • The total number of 2048 chips for test: 167

Tomnod + MS, 2 classes (SSD-inception V2)

  1. on SSD_Xview
  • in initial geojson (train + test, no cleaning, no chipping, clipped by damaged buildings only, retain only 1 class: damaged buildings):
    • The total number of bboxes for training + test: 126937
    • The total number of bboxes for damaged buildings training + test: 17152
    • The total number of bboxes for non-damaged buildings training + test: 109785
    • The total number of 2048 chips for training + test: 1014

No cleaning data:

  • INFO:root:Max chips per resolution: 5511

  • INFO:root:Tot Box: 95009

  • INFO:root:Chips: 3674

  • for class: 1

  • augmentation applied: 22707

  • num of black small chips removed: 1343

  • num of small chips containing clouds: 998

  • INFO:root:saved: 25161 train chips

  • INFO:root:saved: 1220 test chips

  • folder: harvey_ssd_inceptionv2_ms_noclean_2class

  • note: training/val performed the best so far. mAP = 0.6 on validation set

  2. on SSD_Xview
  • in initial geojson (train + test, no cleaning, no chipping, clipped by ALL buildings, retain 2 classes):

    • INFO:root:Max chips per resolution: 27313
    • INFO:root:Tot Box: 208958
    • INFO:root:Chips: 18209
    • for class: 1
    • augmentation applied: 26884
    • num of black small chips removed: 1343
    • num of small chips containing clouds: 998
    • INFO:root:saved: 39097 train chips
    • INFO:root:saved: 5996 test chips
  3. on SSD_Yan (harvey_ssd_inceptionv2_ms_noclean_1class)
  • in initial geojson (train + test, no cleaning, no chipping, clipped by ALL buildings, retain ONE class):

    • INFO:root:Max chips per resolution: 32496
    • INFO:root:Tot Box: 102164
    • INFO:root:Chips: 21664
    • for class: 1
    • augmentation applied: 25566
    • num of black small chips removed: 1343
    • num of small chips containing clouds: 998
    • INFO:root:saved: 40037 train chips
    • INFO:root:saved: 7193 test chips

Tomnod + MS, 2 classes (Data cleaning and run SSD-Inception V2 / Faster-rcnn on SSD-Xview)

  • use bboxes_tomnod_2class_noclean.geojson (did not change or clean this file)

  • removed train chips / test chips that do not appear in geojson

  • resulted in 712 training chips (train + val) in harvey_train_bigtiff_v3 folder = # of chips in harvey_train_second_ms_noclean.geojson

  • The 174 test images in harvey_test_bigtiff_v3 included black chips. After removing 6 black chips, 168 test 2048 chips remain.

  • TODO: need to regenerate train/test geojson based on these two folders

  • Split train folder further into train + val (created training images (harvey_train_train_bigtiff_v3): 569; created val images (harvey_train_test_bigtiff_v3): 143)

  • note: so far ssd on this data works well (mAP = 0.48 on test dataset).

    • harvey_test_ms_noclean_2class_v2.record
    • harvey_train_ms_noclean_2class_v2.record
    • harvey_val_ms_noclean_2class_v2.record
  • name: harvey_ssd_inceptionv2_ms_noclean_2class_v2

  • training data

    • INFO:root:Tot Box: 213386
    • INFO:root:Chips: 14660
    • for class: 1
    • augmentation applied: 32123
    • num of black small chips removed: 985
    • num of small chips containing clouds: 800
    • INFO:root:saved: 46783 train chips
  • validation data:

    • INFO:root:Max chips per resolution: 5323
    • INFO:root:Tot Box: 17206
    • INFO:root:Chips: 3549
    • num of black small chips removed: 318
    • num of small chips containing clouds: 198
    • INFO:root:saved: 3549 test chips
  • test data

    • INFO:root:Max chips per resolution: 5397
    • INFO:root:Tot Box: 15548
    • INFO:root:Chips: 3598
    • num of black small chips removed: 0
    • num of small chips containing clouds: 265
    • INFO:root:saved: 3598 test chips

Tomnod + MS, 2 classes (Chip data from bbox center and run SSD-Inception V2 on SSD-Xview)

  • Create train/val/test small chips by centering each bbox at the center of the chip

  • Do aggressive augmentation (one image augmented to 15) for damaged buildings

  • Apply a random shift to chips that contain ONLY non-damaged buildings.

  • Val data:

harvey_val_noclean_2class_cropcenter.record

    • INFO:root:Max chips per resolution: 26256
    • INFO:root:Tot Box: 160932
    • INFO:root:Chips: 17504
    • num of black small chips removed: 558
    • num of small chips containing clouds: 827
    • INFO:root:saved: 17504 test chips
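The center-crop chipping described at the top of this section (each bbox placed at the center of its chip) can be sketched as a window computation. The function names, the 256-pixel chip size, the 2048-pixel tile size, and the clamping at the tile border are all assumptions for illustration; the real script may pad with black pixels at the border instead.

```python
def center_crop_window(bbox, chip_size=256, image_size=2048):
    """Return (x0, y0, x1, y1) of a chip_size window centered on bbox.

    bbox is [xmin, ymin, xmax, ymax] in pixel coordinates of the big tile.
    The window is clamped so it stays fully inside the tile (an assumption).
    """
    xmin, ymin, xmax, ymax = bbox
    cx = (xmin + xmax) / 2.0
    cy = (ymin + ymax) / 2.0
    x0 = int(round(cx - chip_size / 2.0))
    y0 = int(round(cy - chip_size / 2.0))
    x0 = min(max(x0, 0), image_size - chip_size)
    y0 = min(max(y0, 0), image_size - chip_size)
    return x0, y0, x0 + chip_size, y0 + chip_size


def bbox_in_chip(bbox, window):
    """Translate a bbox from tile coordinates into chip coordinates."""
    x0, y0, _, _ = window
    xmin, ymin, xmax, ymax = bbox
    return xmin - x0, ymin - y0, xmax - x0, ymax - y0


if __name__ == "__main__":
    box = (1000, 1000, 1040, 1020)
    win = center_crop_window(box)
    print(win, bbox_in_chip(box, win))
```

Because every bbox produces its own chip, neighbouring buildings end up in many overlapping chips, which may explain why the box and chip counts in this section are much larger than with regular tiling.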

Caveat

  • Two ways of creating test data: 1) crop from the center; 2) regular tiling

  • Inference can be run on both test datasets.

  • test data (crop from center)

harvey_test_noclean_2class_cropcenter.record

    • INFO:root:Max chips per resolution: 23521
    • INFO:root:Tot Box: 136103
    • INFO:root:Chips: 15681
    • num of black small chips removed: 0
    • num of small chips containing clouds: 922
    • INFO:root:saved: 15681 test chips
  • test data (regular chipping) harvey_test_ms_noclean_2class_v2.record

  • train data (deprecated due to bad training performance)

    • INFO:root:Max chips per resolution: 105009
    • INFO:root:Tot Box: 2319324
    • INFO:root:Chips: 70006
    • for class: 1
    • augmentation applied: 176706
    • num of black small chips removed: 949
    • num of small chips containing clouds: 2760
    • INFO:root:saved: 300347 train chips
  • training data (deprecated due to bad training performance) (changed augmentation strategy: only augment chips that have class 1 > class 2, add random shift to some images that have class 1 < class 2)

    • INFO:root:Max chips per resolution: 104811
    • INFO:root:Tot Box: 1786213
    • INFO:root:Chips: 69874
    • for class: 1
    • augmentation applied: 113308
    • num of black small chips removed: 949
    • num of small chips containing clouds: 2897
    • num of original class 1 bboxes: 93190
    • num of original class 2 bboxes: 515873
    • num of class 1 augmented: 717058
    • num of class 2 augmented: 460092
    • num of class 1 in total: 810248
    • num of class 2 in total: 975965
    • INFO:root:saved: 242258 train chips

Tomnod + MS, 2 classes (Chip data from bbox center with random SHIFT and run SSD-Inception V2 on SSD-Xview)

  • Cropped from center but added random shift in x and y

  • harvey_ssd_inceptionv2_ms_noclean_2class_cropcenter_shift

  • val data

harvey_val_ms_noclean_cropcenter_shift.record

  • INFO:root:Max chips per resolution: 26176

  • INFO:root:Tot Box: 156268

  • INFO:root:Chips: 17451

  • num of black small chips removed: 562

  • num of small chips containing clouds: 855

  • INFO:root:saved: 17451 test chips

  • test data

    • INFO:root:Max chips per resolution: 23461
    • INFO:root:Tot Box: 133689
    • INFO:root:Chips: 15641
    • num of black small chips removed: 0
    • num of small chips containing clouds: 984
    • INFO:root:saved: 15641 test chips
  • training data (deprecated due to bad training performance)

    • INFO:root:Max chips per resolution: 104772
    • INFO:root:Tot Box: 1364847
    • INFO:root:Chips: 69848
    • for class: 1
    • augmentation applied: 113039
    • num of black small chips removed: 954
    • num of small chips containing clouds: 2857
    • num of original class 1 bboxes: 90557
    • num of original class 2 bboxes: 500502
    • num of class 1 augmented: 693152
    • num of class 2 augmented: 80636
    • num of class 1 in total: 783709
    • num of class 2 in total: 581138
    • INFO:root:saved: 182887 train chips
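The "cropped from center but added random shift in x and y" idea from this section can be sketched as below. The shift range (`max_shift`), chip size, tile size, and the clamping at the tile border are assumptions; the values actually used in this run are not recorded on this page.

```python
import random


def shifted_crop_window(bbox, chip_size=256, image_size=2048, max_shift=64, rng=random):
    """Return a chip window centered on bbox, then offset by a random shift.

    max_shift is a hypothetical parameter; keeping it below chip_size / 2
    guarantees that the bbox center stays inside the chip.
    """
    xmin, ymin, xmax, ymax = bbox
    cx = (xmin + xmax) / 2.0 + rng.randint(-max_shift, max_shift)
    cy = (ymin + ymax) / 2.0 + rng.randint(-max_shift, max_shift)
    x0 = int(round(cx - chip_size / 2.0))
    y0 = int(round(cy - chip_size / 2.0))
    # Clamp to the tile so the shift never pulls in pixels outside the image.
    x0 = min(max(x0, 0), image_size - chip_size)
    y0 = min(max(y0, 0), image_size - chip_size)
    return x0, y0, x0 + chip_size, y0 + chip_size


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the chips are reproducible
    for _ in range(3):
        print(shifted_crop_window((1000, 1000, 1040, 1020), rng=rng))
```

Passing a seeded `random.Random` instance keeps the generated chips reproducible between runs, which makes the saved-chip counts easier to compare.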

Tomnod + MS, 2 classes, Subsample non-damaged buildings, modified shifting (SSD-Inception V2)

  • Chip sequentially like before

  • Get stats on the number of chips containing class 1 and class 2, respectively

  • Remove some chips (10%) that contain only class 2

  • Change the shift method, remove the black pixels resulting from it, and add 10 ~ 20% chips that contain class 1

  • augment chips that contain at least one class 1 bbox

  • test and val data still use: harvey_test_ms_noclean_2class_v2.record / harvey_val_ms_noclean_2class_v2.record

  • fixed bbox script: harvey_ms_noclean_2class_fixedprecision.geojson (NO CLEAN)

  • On SSDXview

    • The total number of bboxes for training + test: 126937
    • The total number of bboxes for damaged buildings training + test: 12407
    • The total number of bboxes for non-damaged buildings training + test: 114530
    • The total number of 2048 chips for training + test: 1014
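The per-class chip statistics and the 10% subsampling of class-2-only chips listed above might be implemented along these lines. This is a sketch under assumptions: class 1 is taken to be damaged and class 2 non-damaged (inferred from the section titles), and `chip_classes`, the function names, and the seed are made up for illustration.

```python
import random

DAMAGED, NON_DAMAGED = 1, 2  # class ids as inferred from this page


def chip_class_stats(chip_classes):
    """Count chips containing at least one class 1 bbox and at least one class 2 bbox.

    chip_classes maps a chip id to the list of class ids of its bboxes.
    """
    with_c1 = sum(1 for classes in chip_classes.values() if DAMAGED in classes)
    with_c2 = sum(1 for classes in chip_classes.values() if NON_DAMAGED in classes)
    return with_c1, with_c2


def drop_class2_only(chip_classes, drop_fraction=0.10, seed=0):
    """Randomly drop a fraction of the chips that contain only class 2 bboxes."""
    rng = random.Random(seed)
    only_c2 = sorted(cid for cid, classes in chip_classes.items()
                     if classes and set(classes) == {NON_DAMAGED})
    n_drop = int(len(only_c2) * drop_fraction)
    dropped = set(rng.sample(only_c2, n_drop))
    return {cid: classes for cid, classes in chip_classes.items() if cid not in dropped}


if __name__ == "__main__":
    chips = {f"c{i}": [NON_DAMAGED] for i in range(20)}
    chips.update({"d0": [DAMAGED], "d1": [DAMAGED, NON_DAMAGED]})
    print(chip_class_stats(chips), len(drop_class2_only(chips)))
```

Chips that contain any class 1 bbox are never dropped, so the subsampling only shifts the balance toward damaged buildings.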

Train data (DEPRECATED)

  • INFO:root:Max chips per resolution: 27411
  • INFO:root:Tot Box: 603360
  • INFO:root:Chips: 18274
  • for class: 1
  • augmentation applied: 80515
  • num of black small chips removed: 1000
  • num of small chips containing clouds: 822
  • num of original class 1 bboxes: 42547
  • num of original class 2 bboxes: 65005
  • num of class 1 bbox augmented: 401311
  • num of class 2 bbox augmented: 94497
  • num of class 1 bbox in total: 443858
  • num of class 2 bbox in total: 159502
  • num of original chips that contain class 1: 7885
  • num of original chips that contain class 2 bboxes: 13877
  • num of class 1 chips augmented: 79511
  • num of class 2 chips augmented: 34025
  • num of chips that contain class 1 bbox in total: 87396
  • num of chips that contain class 2 bbox in total: 47902
  • INFO:root:saved: 98789 train chips
  • training data
  • INFO:root:Max chips per resolution: 25515
  • INFO:root:Tot Box: 491050
  • INFO:root:Chips: 17010
  • for class: 1
  • augmentation applied: 63432
  • num of black small chips removed: 1001
  • num of small chips containing clouds: 815
  • num of original class 1 bboxes: 29344
  • num of original class 2 bboxes: 69439
  • num of class 1 bbox augmented: 275847
  • num of class 2 bbox augmented: 116420
  • num of class 1 bbox in total: 305191
  • num of class 2 bbox in total: 185859
  • num of original chips that contain class 1: 6237
  • num of original chips that contain class 2 bboxes: 14767
  • num of class 1 chips augmented: 62342
  • num of class 2 chips augmented: 39757
  • num of chips that contain class 1 bbox in total: 68579
  • num of chips that contain class 2 bbox in total: 54524
  • INFO:root:saved: 80442 train chips
  • val data

    • INFO:root:Max chips per resolution: 5323
    • INFO:root:Tot Box: 17206
    • INFO:root:Chips: 3549
    • num of black small chips removed: 318
    • num of small chips containing clouds: 198
    • INFO:root:saved: 3549 test chips
  • test

    • INFO:root:Max chips per resolution: 5397
    • INFO:root:Tot Box: 15548
    • INFO:root:Chips: 3598
    • num of black small chips removed: 0
    • num of small chips containing clouds: 265
    • INFO:root:saved: 3598 test chips

Tomnod + MS, 2 classes, modified shifting (SSD-Inception V2)

  • sequential chip

  • do not discard class2 images

  • shift method: crop from the center, but do not augment the shifted chips

  • harvey_train_ms_noclean_2class_fixedprecision_v2.record

  • harvey_ssd_inceptionv2_ms_noclean_2class_fixedprecision_v2

  • on spot instances: ssd_spot_0808

train data

  • INFO:root:Max chips per resolution: 26916
  • INFO:root:Tot Box: 218974
  • INFO:root:Chips: 17944
  • for class: 1
  • augmentation applied: 26645
  • num of black small chips removed: 1001
  • num of small chips containing clouds: 817
  • num of original class 1 bboxes: 26715
  • num of original class 2 bboxes: 68778
  • num of class 1 bbox augmented: 72266
  • num of class 2 bbox augmented: 51215
  • num of class 1 bbox in total: 98981
  • num of class 2 bbox in total: 119993
  • num of original chips that contain class 1: 6021
  • num of original chips that contain class 2 bboxes: 15719
  • num of class 1 chips augmented: 25806
  • num of class 2 chips augmented: 17135
  • number of chips added by shift: 3336
  • num of chips that contain class 1 bbox in total: 31827
  • num of chips that contain class 2 bbox in total: 32854
  • INFO:root:saved: 44589 train chips
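The per-class figures in the train data block above are internally consistent, which can be confirmed with a few additions. The check below only restates numbers from that block; the dictionary keys are illustrative names, not fields from the script.

```python
# Figures copied from the train data block above.
stats = {
    "class1_original": 26715, "class1_augmented": 72266, "class1_total": 98981,
    "class2_original": 68778, "class2_augmented": 51215, "class2_total": 119993,
    "tot_box": 218974,
    "chips": 17944, "augmentation_applied": 26645, "saved_train": 44589,
}

checks = {
    "class 1 total = original + augmented":
        stats["class1_original"] + stats["class1_augmented"] == stats["class1_total"],
    "class 2 total = original + augmented":
        stats["class2_original"] + stats["class2_augmented"] == stats["class2_total"],
    "Tot Box = class 1 total + class 2 total":
        stats["class1_total"] + stats["class2_total"] == stats["tot_box"],
    "saved train chips = Chips + augmentation applied":
        stats["chips"] + stats["augmentation_applied"] == stats["saved_train"],
}

for label, ok in checks.items():
    print(f"{label}: {'OK' if ok else 'MISMATCH'}")
```

Running the same checks on future log output is a cheap way to catch a chipping or augmentation bug before training starts.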

Annotation Cleaning

  • Automatic cloud removal using var/mean thresholding; this resulted in: 3163 (concatenated_rm_cloud_bboxid.txt)

  • Then manual inspection removed:

    • 2 entire chips in the test dataset, resulting in 166 in test
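The automatic var/mean cloud thresholding mentioned above is not spelled out on this page, so the following is only one plausible reading: a cloudy chip is bright and nearly textureless, meaning a high mean and a low variance-to-mean ratio. The function names and all three thresholds are assumptions and would need tuning on real chips. A companion check for black chips is included because the logs above count them separately.

```python
import numpy as np


def is_black_chip(chip, max_black_fraction=0.5):
    """Flag a chip whose share of all-zero pixels exceeds max_black_fraction (assumed value)."""
    chip = np.asarray(chip)
    black = (chip == 0).all(axis=-1) if chip.ndim == 3 else (chip == 0)
    return bool(black.mean() > max_black_fraction)


def looks_cloudy(chip, min_mean=200.0, max_var_over_mean=1.0):
    """Flag a bright, low-texture chip using a variance-to-mean ratio.

    Both thresholds are placeholders, not the values used for this dataset.
    """
    gray = np.asarray(chip, dtype=float)
    if gray.ndim == 3:
        gray = gray.mean(axis=-1)  # collapse the colour bands
    mean = gray.mean()
    if mean == 0:
        return False  # fully black chip, handled by is_black_chip
    return bool(mean >= min_mean and gray.var() / mean <= max_var_over_mean)


if __name__ == "__main__":
    cloud = np.full((64, 64, 3), 240, dtype=np.uint8)
    textured = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3))
    black = np.zeros((64, 64, 3), dtype=np.uint8)
    print(looks_cloudy(cloud), looks_cloudy(textured), is_black_chip(black))
```

A threshold-based filter like this produces false positives on bright roofs and pavement, which is consistent with the manual inspection pass that follows it.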