Run statistics Harvey dirty data
- geojson file: just_buildings_w_uid_cleaned.geojson
- total # of bboxes: 10866 (all damaged buildings)
- unique 2048 chips in geojson: 862
- geojson file: just_buildings_w_uid_second_round.geojson
- total # of bboxes: 10770 (all damaged buildings) = 8999 training + 1771 test
- unique 2048 chips in geojson: 854 (680 training + 174 test)
- note: this run contains invalid bboxes that are either all zeros or out of bounds.
- actual chips included in train image folder: 730; test: 183
- for training data: created a val / train split (val / train = 0.33)
- INFO:root:Max chips per resolution: 2827
- INFO:root:Tot Box: 78463
- INFO:root:Chips: 1885 (train + val 512 x 512)
- for class: 1
- augmentation applied: 15296
- num of black small chips removed: 274
- num of small chips containing clouds: 16
- INFO:root:saved: 16585 train chips
- INFO:root:saved: 596 test chips
- output: harvey_test_second.record (val) and harvey_train_second.record (train)
- **note: harvey_test_second.record is actually for validation.**
- Created a toy image set for visualizing model inferences: 146 chips from the training data, randomly drawn from both the training and validation images.
- augmentation applied: each training image was augmented into 15 images.
- Same big tifs and geojson files as the second run; no manual inspection applied
- chipped into 256 x 256 small chips
- In the script, removed invalid bboxes (all zeros, out of bounds, or containing NaNs)
- faster-rcnn-inceptionV2: SSD-Xviews / ssd-inception-V2: ssd-Yan
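A minimal sketch of the invalid-bbox filter described above, assuming each bbox is an `(xmin, ymin, xmax, ymax)` tuple in pixel coordinates of its 2048 x 2048 chip. The names `is_valid_bbox` / `filter_bboxes` and the exact bounds test are hypothetical illustrations, not the project's script.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def is_valid_bbox(box: Sequence[float], width: int, height: int) -> bool:
    """Return True if box = (xmin, ymin, xmax, ymax) is usable for training.

    Rejects boxes that contain NaNs, are all zeros, have non-positive
    area, or extend outside an image of the given width and height.
    """
    if len(box) != 4:
        return False
    if any(math.isnan(v) for v in box):
        return False
    if all(v == 0 for v in box):
        return False
    xmin, ymin, xmax, ymax = box
    if xmax <= xmin or ymax <= ymin:
        return False
    return 0 <= xmin and 0 <= ymin and xmax <= width and ymax <= height


def filter_bboxes(boxes: List[Box], width: int = 2048, height: int = 2048) -> List[Box]:
    """Keep only the valid boxes, preserving their order."""
    return [b for b in boxes if is_valid_bbox(b, width, height)]


if __name__ == "__main__":
    boxes = [
        (10, 20, 50, 60),           # valid
        (0, 0, 0, 0),               # all zeros
        (2000, 10, 2100, 40),       # out of bounds (xmax > 2048)
        (5, float("nan"), 30, 40),  # contains a NaN
    ]
    print(filter_bboxes(boxes))  # [(10, 20, 50, 60)]
```

Zero-area boxes are rejected as well here, since they carry no usable signal for a detector; the original script may not have made that check.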
With augmentation
- INFO:root:Max chips per resolution: 4288
- INFO:root:Tot Box: 62842
- INFO:root:Chips: 2859
- for class: 1
- augmentation applied: 17418
- num of black small chips removed: 978
- num of small chips containing clouds: 19
- INFO:root:saved: 19333 train chips
- INFO:root:saved: 944 test chips
No augmentation for training and val
- INFO:root:Max chips per resolution: 4288
- INFO:root:Tot Box: 8999
- INFO:root:Chips: 2859
- num of black small chips removed: 978
- num of small chips containing clouds: 19
- INFO:root:saved: 1997 train chips
- INFO:root:saved: 862 test chips
Test data:
- INFO:root:Max chips per resolution: 931
- INFO:root:Tot Box: 1723
- INFO:root:Chips: 621
- num of black small chips removed: 248
- num of small chips containing clouds: 5
- INFO:root:saved: 427 train chips
- INFO:root:saved: 194 test chips
Changes needed for the test data: remove the black big tiffs and their bboxes from the folder (harvey_test_second), and recreate the geojson (harvey_test_second.geojson)
original:
- geojson file: just_buildings_w_uid_second_round.geojson
- total # of bboxes: 10770 (all damaged buildings) = (8999 training + 1771 test)
- actual chips included in train image folder: 730, test: 183
Manually removed 7 big tiffs with black regions and recreated the geojson:
- actual files in test image folder: 176
- The total number of bboxes for test: 1702
- The total number of 2048 chips for test: 167
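The big tiffs above were removed by hand, while the runs also report `num of black small chips removed`, which suggests an automatic check. A minimal sketch of such a check, assuming a chip is a 2-D grid of grayscale values and counts as black when at least a hypothetical `max_black_frac` of its pixels are exactly zero. The threshold and function names are illustrations, not the pipeline's actual values.

```python
from typing import List, Sequence

Chip = Sequence[Sequence[int]]


def black_fraction(chip: Chip) -> float:
    """Fraction of pixels whose value is exactly 0 (black / no-data).

    An empty chip is treated as fully black.
    """
    total = 0
    black = 0
    for row in chip:
        for value in row:
            total += 1
            if value == 0:
                black += 1
    return black / total if total else 1.0


def is_black_chip(chip: Chip, max_black_frac: float = 0.9) -> bool:
    """Treat a chip as black if at least max_black_frac of it is zero."""
    return black_fraction(chip) >= max_black_frac


def drop_black_chips(chips: List[Chip], max_black_frac: float = 0.9) -> List[Chip]:
    """Return only the chips that are not considered black."""
    return [c for c in chips if not is_black_chip(c, max_black_frac)]
```

For multi-band imagery, the same test can be applied to a per-pixel band sum so that a pixel only counts as black when every band is zero.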
- on SSD_Xview
- in initial geojson (train + test, no cleaning, no chipping, clipped by damaged buildings only, retain only 1 class: damaged buildings):
- The total number of bboxes for training + test: 126937
- The total number of bboxes for damaged buildings training + test: 17152
- The total number of bboxes for non-damaged buildings training + test: 109785
- The total number of 2048 chips for training + test: 1014
No cleaning data:
- INFO:root:Max chips per resolution: 5511
- INFO:root:Tot Box: 95009
- INFO:root:Chips: 3674
- for class: 1
- augmentation applied: 22707
- num of black small chips removed: 1343
- num of small chips containing clouds: 998
- INFO:root:saved: 25161 train chips
- INFO:root:saved: 1220 test chips
- folder: harvey_ssd_inceptionv2_ms_noclean_2class
- note: training/val performed the best so far (mAP = 0.6 on the validation set)
- on SSD_Xview
- in initial geojson (train + test, no cleaning, no chipping, clipped by ALL buildings, retain 2 classes):
- INFO:root:Max chips per resolution: 27313
- INFO:root:Tot Box: 208958
- INFO:root:Chips: 18209
- for class: 1
- augmentation applied: 26884
- num of black small chips removed: 1343
- num of small chips containing clouds: 998
- INFO:root:saved: 39097 train chips
- INFO:root:saved: 5996 test chips
- on SSD_Yan (harvey_ssd_inceptionv2_ms_noclean_1class)
- in initial geojson (train + test, no cleaning, no chipping, clipped by ALL buildings, retain ONE class):
- INFO:root:Max chips per resolution: 32496
- INFO:root:Tot Box: 102164
- INFO:root:Chips: 21664
- for class: 1
- augmentation applied: 25566
- num of black small chips removed: 1343
- num of small chips containing clouds: 998
- INFO:root:saved: 40037 train chips
- INFO:root:saved: 7193 test chips
- use bboxes_tomnod_2class_noclean.geojson (did not change or clean this file)
- removed train chips / test chips that do not appear in the geojson
- resulted in 712 training chips (train + val) in the harvey_train_bigtiff_v3 folder = # of chips in harvey_train_second_ms_noclean.geojson
- harvey_test_bigtiff_v3 holds 174 test images, some of which are black chips. After removing 6 black chips, 168 test 2048 chips remain.
- TODO: need to regenerate train/test geojson based on these two folders
- Split the train folder further into train + val: 569 training images (harvey_train_train_bigtiff_v3) and 143 val images (harvey_train_test_bigtiff_v3)
- note: so far SSD on this data works well (mAP = 0.48 on the test dataset).
- harvey_test_ms_noclean_2class_v2.record
- harvey_train_ms_noclean_2class_v2.record
- harvey_val_ms_noclean_2class_v2.record
- name: harvey_ssd_inceptionv2_ms_noclean_2class_v2
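The 569 / 143 split of the 712 training chips is consistent with an 80/20 split that truncates the training count: int(712 * 0.8) = 569 and 712 - 569 = 143. A minimal sketch of a reproducible split under that assumption; the 0.8 fraction, the seed, and the name `split_train_val` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_train_val(chip_names: Sequence[str], train_frac: float = 0.8,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle chip names deterministically and split them into train/val.

    The training count is truncated with int(), so any remainder goes to val.
    """
    names = sorted(chip_names)          # sort first so the input order does not matter
    random.Random(seed).shuffle(names)  # seeded RNG -> identical split on every run
    n_train = int(len(names) * train_frac)
    return names[:n_train], names[n_train:]


if __name__ == "__main__":
    chips = [f"chip_{i:04d}.tif" for i in range(712)]
    train, val = split_train_val(chips)
    print(len(train), len(val))  # 569 143
```

Splitting at the level of the 2048 chips, before they are cut into small chips, keeps small chips from the same big chip from landing in both train and val.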
training data
- INFO:root:Tot Box: 213386
- INFO:root:Chips: 14660
- for class: 1
- augmentation applied: 32123
- num of black small chips removed: 985
- num of small chips containing clouds: 800
- INFO:root:saved: 46783 train chips
validation data:
- INFO:root:Max chips per resolution: 5323
- INFO:root:Tot Box: 17206
- INFO:root:Chips: 3549
- num of black small chips removed: 318
- num of small chips containing clouds: 198
- INFO:root:saved: 3549 test chips
test data
- INFO:root:Max chips per resolution: 5397
- INFO:root:Tot Box: 15548
- INFO:root:Chips: 3598
- num of black small chips removed: 0
- num of small chips containing clouds: 265
- INFO:root:saved: 3598 test chips
- Create train/val/test small chips by centering each bbox at the center of the chip
- Do aggressive augmentation (one image augmented to 15) for damaged buildings
- Randomly shift chips that contain ONLY non-damaged buildings.
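Centering a chip on each bbox can be sketched as below. Assumptions (all hypothetical): pixel coordinates, a square chip of size `chip_size` (512 here), and a window that is clamped to stay inside the 2048 x 2048 image rather than padded, so no black border pixels are introduced. This is an illustration, not the project's chipping script.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def center_crop_window(bbox: Box, img_w: int, img_h: int,
                       chip_size: int = 512) -> Tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1) of a chip_size x chip_size window centered
    on the bbox center, moved as needed to lie fully inside the image."""
    if chip_size > img_w or chip_size > img_h:
        raise ValueError("chip larger than image")
    xmin, ymin, xmax, ymax = bbox
    cx = (xmin + xmax) / 2
    cy = (ymin + ymax) / 2
    x0 = int(round(cx - chip_size / 2))
    y0 = int(round(cy - chip_size / 2))
    # Clamp so the window never extends past the image edges.
    x0 = min(max(x0, 0), img_w - chip_size)
    y0 = min(max(y0, 0), img_h - chip_size)
    return x0, y0, x0 + chip_size, y0 + chip_size


if __name__ == "__main__":
    print(center_crop_window((1000, 1000, 1040, 1020), 2048, 2048))  # (764, 754, 1276, 1266)
    print(center_crop_window((10, 10, 30, 30), 2048, 2048))          # (0, 0, 512, 512)
```

Near the image border the bbox is no longer exactly centered; clamping trades that for chips without black padding.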
Val data:
harvey_val_noclean_2class_cropcenter.record
- INFO:root:Max chips per resolution: 26256
- INFO:root:Tot Box: 160932
- INFO:root:Chips: 17504
- num of black small chips removed: 558
- num of small chips containing clouds: 827
- INFO:root:saved: 17504 test chips
Caveat
- Two ways of creating test data: 1) crop from the center; 2) regular tiling
- Inference can be run on either of these two test datasets.
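Regular tiling, the second way of creating test data, can be sketched as a grid of non-overlapping windows. This assumes square tiles of a hypothetical size `tile` and skips any partial tile at the right/bottom edge when the image size is not a multiple of the tile (the real script may pad instead). Boxes that cross a tile boundary are dropped here, which is one visible difference from crop-from-center chipping.

```python
from typing import Iterator, List, Sequence, Tuple

Window = Tuple[int, int, int, int]


def tile_windows(img_w: int, img_h: int, tile: int = 512) -> Iterator[Window]:
    """Yield (x0, y0, x1, y1) for non-overlapping tile x tile windows,
    row by row, skipping partial tiles at the right/bottom edges."""
    for y0 in range(0, img_h - tile + 1, tile):
        for x0 in range(0, img_w - tile + 1, tile):
            yield x0, y0, x0 + tile, y0 + tile


def boxes_in_window(boxes: Sequence[Window], window: Window) -> List[Window]:
    """Return boxes (xmin, ymin, xmax, ymax) fully inside window,
    translated into the tile's local coordinates."""
    x0, y0, x1, y1 = window
    return [(b[0] - x0, b[1] - y0, b[2] - x0, b[3] - y0)
            for b in boxes
            if b[0] >= x0 and b[1] >= y0 and b[2] <= x1 and b[3] <= y1]


if __name__ == "__main__":
    windows = list(tile_windows(2048, 2048, 512))
    print(len(windows), windows[0])  # 16 (0, 0, 512, 512)
```

With 256 x 256 tiles, one 2048 chip yields 64 small chips.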
test data (crop from center)
harvey_test_noclean_2class_cropcenter.record
- INFO:root:Max chips per resolution: 23521
- INFO:root:Tot Box: 136103
- INFO:root:Chips: 15681
- num of black small chips removed: 0
- num of small chips containing clouds: 922
- INFO:root:saved: 15681 test chips
test data (regular chipping): harvey_test_ms_noclean_2class_v2.record
train data (deprecated due to bad training performance)
- INFO:root:Max chips per resolution: 105009
- INFO:root:Tot Box: 2319324
- INFO:root:Chips: 70006
- for class: 1
- augmentation applied: 176706
- num of black small chips removed: 949
- num of small chips containing clouds: 2760
- INFO:root:saved: 300347 train chips
training data (deprecated due to bad training performance) (changed augmentation strategy: only augment chips with # class 1 > # class 2; add random shift to some chips with # class 1 < # class 2)
- INFO:root:Max chips per resolution: 104811
- INFO:root:Tot Box: 1786213
- INFO:root:Chips: 69874
- for class: 1
- augmentation applied: 113308
- num of black small chips removed: 949
- num of small chips containing clouds: 2897
- num of original class 1 bboxes: 93190
- num of original class 2 bboxes: 515873
- num of class 1 augmented: 717058
- num of class 2 augmented: 460092
- num of class 1 in total: 810248
- num of class 2 in total: 975965
- INFO:root:saved: 242258 train chips
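The changed augmentation strategy of this run (augment chips where class 1 outnumbers class 2, randomly shift some chips where class 1 is outnumbered) can be sketched as a per-chip decision. The shift probability `p_shift` and the returned labels are hypothetical; chips with equal counts are left untouched here because the note does not say how ties were handled.

```python
import random
from typing import Optional


def augmentation_action(n_class1: int, n_class2: int, p_shift: float = 0.3,
                        rng: Optional[random.Random] = None) -> str:
    """Decide what to do with one chip given its bbox counts per class.

    Returns "augment" if damaged buildings (class 1) outnumber
    non-damaged ones (class 2), "shift" for a random subset of chips
    where class 1 is outnumbered, and "keep" otherwise.
    """
    if rng is None:
        rng = random.Random()
    if n_class1 > n_class2:
        return "augment"
    if n_class1 < n_class2 and rng.random() < p_shift:
        return "shift"
    return "keep"


if __name__ == "__main__":
    print(augmentation_action(3, 1))  # augment
    print(augmentation_action(2, 2))  # keep
```

In the log above this rule raised class 1 from 93190 original bboxes to 810248 in total, while class 2 went from 515873 to 975965.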
Tomnod + MS, 2 classes (chip data from bbox center with random SHIFT and run SSD-Inception V2 on SSD-Xview)
- Cropped from the center but added a random shift in x and y
- harvey_ssd_inceptionv2_ms_noclean_2class_cropcenter_shift
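Adding a random shift in x and y to the centered crop can be sketched as follows. Assumptions (all hypothetical): the shift is drawn uniformly from [-max_shift, max_shift] per axis, the shift is limited so the target bbox stays fully inside the chip, and the window is clamped to the image so it never needs black padding.

```python
import random
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]


def shifted_crop_window(bbox: Box, img_w: int, img_h: int,
                        chip_size: int = 512, max_shift: int = 128,
                        rng: Optional[random.Random] = None) -> Box:
    """Window of chip_size centered on bbox, randomly shifted by up to
    max_shift pixels per axis, kept around the bbox and inside the image."""
    if rng is None:
        rng = random.Random()
    xmin, ymin, xmax, ymax = bbox
    if xmax - xmin > chip_size or ymax - ymin > chip_size:
        raise ValueError("bbox larger than chip")
    cx, cy = (xmin + xmax) // 2, (ymin + ymax) // 2
    x0 = cx - chip_size // 2 + rng.randint(-max_shift, max_shift)
    y0 = cy - chip_size // 2 + rng.randint(-max_shift, max_shift)
    # Keep the bbox inside the window: x0 <= xmin and x0 + chip_size >= xmax.
    x0 = min(max(x0, xmax - chip_size), xmin)
    y0 = min(max(y0, ymax - chip_size), ymin)
    # Keep the window inside the image (no black padding).
    x0 = min(max(x0, 0), img_w - chip_size)
    y0 = min(max(y0, 0), img_h - chip_size)
    return x0, y0, x0 + chip_size, y0 + chip_size
```

Because the bbox lies inside the image and fits in the chip, both clamps can always be satisfied at once, so the shifted window still contains the bbox after the image clamp.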
val data
harvey_val_ms_noclean_cropcenter_shift.record
- INFO:root:Max chips per resolution: 26176
- INFO:root:Tot Box: 156268
- INFO:root:Chips: 17451
- num of black small chips removed: 562
- num of small chips containing clouds: 855
- INFO:root:saved: 17451 test chips
test data
- INFO:root:Max chips per resolution: 23461
- INFO:root:Tot Box: 133689
- INFO:root:Chips: 15641
- num of black small chips removed: 0
- num of small chips containing clouds: 984
- INFO:root:saved: 15641 test chips
training data (deprecated due to bad training performance)
- INFO:root:Max chips per resolution: 104772
- INFO:root:Tot Box: 1364847
- INFO:root:Chips: 69848
- for class: 1
- augmentation applied: 113039
- num of black small chips removed: 954
- num of small chips containing clouds: 2857
- num of original class 1 bboxes: 90557
- num of original class 2 bboxes: 500502
- num of class 1 augmented: 693152
- num of class 2 augmented: 80636
- num of class 1 in total: 783709
- num of class 2 in total: 581138
- INFO:root:saved: 182887 train chips
- Chip sequentially like before
- Get stats about the # of chips containing class 1 and class 2, respectively
- Remove some (10%) of the chips that contain only class 2
- Change the shift method and remove the black pixels it introduces; add 10 ~ 20% chips that contain class 1
- Augment chips that contain at least one class 1 bbox
- Test and val data still use: harvey_test_ms_noclean_2class_v2.record / harvey_val_ms_noclean_2class_v2.record
- Fixed bbox script: harvey_ms_noclean_2class_fixedprecision.geojson (NO CLEAN)
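Steps 2 and 3 of this plan (count the chips containing each class, then drop about 10% of the chips that contain only class 2) can be sketched as below. Assumptions: each chip is represented by the list of its bbox class labels (1 = damaged, 2 = non-damaged), and the dropped chips are chosen at random with a fixed seed. The function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def class_chip_stats(chips: Sequence[Sequence[int]]) -> Dict[str, int]:
    """Count chips containing class 1, containing class 2, and only class 2."""
    return {
        "with_class1": sum(1 for c in chips if 1 in c),
        "with_class2": sum(1 for c in chips if 2 in c),
        "only_class2": sum(1 for c in chips if c and set(c) == {2}),
    }


def drop_class2_only(chips: List[List[int]], frac: float = 0.1,
                     seed: int = 0) -> List[List[int]]:
    """Randomly remove `frac` of the chips whose labels are all class 2."""
    only2 = [i for i, c in enumerate(chips) if c and set(c) == {2}]
    n_drop = int(len(only2) * frac)
    dropped = set(random.Random(seed).sample(only2, n_drop))
    return [c for i, c in enumerate(chips) if i not in dropped]
```

Chips with both classes count toward both `with_class1` and `with_class2`, so those two numbers do not add up to the chip total.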
On SSDXview
- The total number of bboxes for training + test: 126937
- The total number of bboxes for damaged buildings training + test: 12407
- The total number of bboxes for non-damaged buildings training + test: 114530
- The total number of 2048 chips for training + test: 1014
Train data (DEPRECATED)
- INFO:root:Max chips per resolution: 27411
- INFO:root:Tot Box: 603360
- INFO:root:Chips: 18274
- for class: 1
- augmentation applied: 80515
- num of black small chips removed: 1000
- num of small chips containing clouds: 822
- num of original class 1 bboxes: 42547
- num of original class 2 bboxes: 65005
- num of class 1 bbox augmented: 401311
- num of class 2 bbox augmented: 94497
- num of class 1 bbox in total: 443858
- num of class 2 bbox in total: 159502
- num of original chips that contain class 1: 7885
- num of original chips that contain class 2 bboxes: 13877
- num of class 1 chips augmented: 79511
- num of class 2 chips augmented: 34025
- num of chips that contain class 1 bbox in total: 87396
- num of chips that contain class 2 bbox in total: 47902
- INFO:root:saved: 98789 train chips
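The per-class counts in these logs are internally consistent: for each class, original + augmented = total, and the two bbox totals add up to `Tot Box`. A quick check using the numbers from the deprecated run above:

```python
# Counts copied from the "Train data (DEPRECATED)" log above.
orig = {1: 42547, 2: 65005}
augmented = {1: 401311, 2: 94497}
total = {1: 443858, 2: 159502}
tot_box = 603360

orig_chips = {1: 7885, 2: 13877}
aug_chips = {1: 79511, 2: 34025}
total_chips = {1: 87396, 2: 47902}

for cls in (1, 2):
    assert orig[cls] + augmented[cls] == total[cls], cls
    assert orig_chips[cls] + aug_chips[cls] == total_chips[cls], cls
assert total[1] + total[2] == tot_box

# Augmentation multiplies class 1 boxes far more than class 2 boxes.
print(f"class 1 x{total[1] / orig[1]:.1f}, class 2 x{total[2] / orig[2]:.1f}")
# class 1 x10.4, class 2 x2.5
```

The chip totals are not additive in the same way, because a chip that contains both classes is counted once for each class.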
- training data
- INFO:root:Max chips per resolution: 25515
- INFO:root:Tot Box: 491050
- INFO:root:Chips: 17010
- for class: 1
- augmentation applied: 63432
- num of black small chips removed: 1001
- num of small chips containing clouds: 815
- num of original class 1 bboxes: 29344
- num of original class 2 bboxes: 69439
- num of class 1 bbox augmented: 275847
- num of class 2 bbox augmented: 116420
- num of class 1 bbox in total: 305191
- num of class 2 bbox in total: 185859
- num of original chips that contain class 1: 6237
- num of original chips that contain class 2 bboxes: 14767
- num of class 1 chips augmented: 62342
- num of class 2 chips augmented: 39757
- num of chips that contain class 1 bbox in total: 68579
- num of chips that contain class 2 bbox in total: 54524
- INFO:root:saved: 80442 train chips
val data
- INFO:root:Max chips per resolution: 5323
- INFO:root:Tot Box: 17206
- INFO:root:Chips: 3549
- num of black small chips removed: 318
- num of small chips containing clouds: 198
- INFO:root:saved: 3549 test chips
test
- INFO:root:Max chips per resolution: 5397
- INFO:root:Tot Box: 15548
- INFO:root:Chips: 3598
- num of black small chips removed: 0
- num of small chips containing clouds: 265
- INFO:root:saved: 3598 test chips
- sequential chipping
- do not discard class 2 images
- shift method: crop from the center, but do not augment the shifted chips
- harvey_train_ms_noclean_2class_fixedprecision_v2.record
- harvey_ssd_inceptionv2_ms_noclean_2class_fixedprecision_v2
- on spot instances: ssd_spot_0808
train data
- INFO:root:Max chips per resolution: 26916
- INFO:root:Tot Box: 218974
- INFO:root:Chips: 17944
- for class: 1
- augmentation applied: 26645
- num of black small chips removed: 1001
- num of small chips containing clouds: 817
- num of original class 1 bboxes: 26715
- num of original class 2 bboxes: 68778
- num of class 1 bbox augmented: 72266
- num of class 2 bbox augmented: 51215
- num of class 1 bbox in total: 98981
- num of class 2 bbox in total: 119993
- num of original chips that contain class 1: 6021
- num of original chips that contain class 2 bboxes: 15719
- num of class 1 chips augmented: 25806
- num of class 2 chips augmented: 17135
- number of chips added by shift: 3336
- num of chips that contain class 1 bbox in total: 31827
- num of chips that contain class 2 bbox in total: 32854
- INFO:root:saved: 44589 train chips
- Automatic cloud removal using var/mean thresholding; this resulted in 3163 bbox IDs (concatenated_rm_cloud_bboxid.txt)
Then manual inspection removed:
- 2 entire chips from the test dataset, leaving 166 in test
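A minimal sketch of the var/mean thresholding idea, assuming clouds show up as bright, low-texture patches: a bbox crop is flagged when its mean brightness is high and its variance is low. The flat grayscale pixel list, both thresholds, and the function names are hypothetical placeholders, not the values or code used to produce concatenated_rm_cloud_bboxid.txt.

```python
from statistics import fmean, pvariance
from typing import Dict, List, Sequence


def looks_cloudy(pixels: Sequence[float], min_mean: float = 200.0,
                 max_var: float = 100.0) -> bool:
    """Flag a patch as cloud if it is bright (high mean) and flat (low variance)."""
    if not pixels:
        return False
    return fmean(pixels) >= min_mean and pvariance(pixels) <= max_var


def cloudy_bbox_ids(patches: Dict[str, Sequence[float]]) -> List[str]:
    """Return the sorted IDs of bbox crops that look like cloud."""
    return sorted(bid for bid, px in patches.items() if looks_cloudy(px))


if __name__ == "__main__":
    print(looks_cloudy([250, 252, 248, 251]))  # True: bright and uniform
    print(looks_cloudy([255, 150, 255, 150]))  # False: bright but textured
```

Requiring both conditions keeps bright but textured patches such as white roofs with visible structure from being flagged; thresholds like these usually need tuning per sensor, hence the manual inspection afterwards.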