Commit abcf534: initial commit
noelshin committed Sep 22, 2022
Showing 59 changed files with 6,734 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .gitignore
.DS_Store
.idea/
_video/

204 changes: 204 additions & 0 deletions README.md
## NamedMask: Distilling Segmenters from Complementary Foundation Models
Official PyTorch implementation for NamedMask. Details can be found in the paper.
[[Paper]](https://arxiv.org/pdf/2206.07045.pdf) [[Project page]](https://www.robots.ox.ac.uk/~vgg/research/namedmask)

![Alt Text](project_page/images/out_no_loop.gif)

### Contents
* [Preparation](#preparation)
* [NamedMask training/inference](#namedmask-traininginference)
* [Pre-trained weights](#pre-trained-weights)
* [Citation](#citation)
* [Acknowledgements](#acknowledgements)

### Preparation
#### 1. Download datasets
Please download datasets of interest first by visiting the following links:
* [Cityscapes](https://www.cityscapes-dataset.com/login)
* [CoCA](http://zhaozhang.net/coca.html)
* [COCO2017](https://cocodataset.org/#download)
* [VOC2012](https://github.com/mhamilton723/STEGO#install)
* [(Optional) ImageNet2012](https://image-net.org/download.php) (for an index dataset used in training)

Note that Cityscapes and ImageNet2012 require you to sign up for an account.
In addition, ImageNet2012 is only needed if you want to train NamedMask yourself.

We advise you to organise the downloaded dataset(s) into the following directory structure for ease of use:
```bash
{your_dataset_directory}
├──cityscapes
│ ├──gtFine
│ ├──leftImg8bit
├──coca
│ ├──binary
│ ├──image
├──coco2017
│ ├──annotations
│ ├──train2017
│ ├──val2017
├──ImageNet2012
│ ├──train
│ ├──val
├──ImageNet-S
│ ├──ImageNetS50
│ ├──ImageNetS300
│ ├──ImageNetS919
├──VOCdevkit
├──VOC2012
```
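As a sanity check before training, the layout above can be verified with a short script. This is only a sketch: it covers a subset of the tree, and `your_dataset_directory` is whatever root you chose.

```python
# Sketch: check that the datasets sit where the training scripts expect them.
# Only part of the tree from the README is listed; extend EXPECTED as needed.
from pathlib import Path

EXPECTED = {
    "cityscapes": ["gtFine", "leftImg8bit"],
    "coca": ["binary", "image"],
    "coco2017": ["annotations", "train2017", "val2017"],
    "ImageNet2012": ["train", "val"],
    "VOCdevkit": ["VOC2012"],
}

def missing_dirs(root: str) -> list:
    """Return the expected sub-directories that are absent under root."""
    absent = []
    for dataset, subdirs in EXPECTED.items():
        for sub in subdirs:
            if not (Path(root) / dataset / sub).is_dir():
                absent.append(f"{dataset}/{sub}")
    return absent
```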

#### 2. Install the required Python packages
```shell
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
conda install -c conda-forge tqdm
conda install -c conda-forge matplotlib
conda install -c anaconda ujson
conda install -c conda-forge pyyaml
conda install -c conda-forge pycocotools
conda install -c anaconda scipy
pip install opencv-python
pip install git+https://github.com/openai/CLIP.git
```
Please note that the required version of each package may vary depending on your local environment.
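A few of the packages above import under a different name than they install under (e.g. `pyyaml` imports as `yaml`, `opencv-python` as `cv2`), so a quick import check can save a failed run later. A minimal sketch:

```python
# Sketch: verify that the packages installed above are importable.
# Module names differ from package names: pyyaml -> yaml, opencv-python -> cv2.
import importlib.util

REQUIRED = ["torch", "torchvision", "tqdm", "matplotlib", "ujson",
            "yaml", "pycocotools", "scipy", "cv2", "clip"]

def missing_packages(names=REQUIRED):
    """Return the module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]
```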

### NamedMask training/inference
NamedMask is trained with pseudo-labels from either an unsupervised saliency detector (e.g., SelfMask) or category experts, which refine the predictions made by the saliency network.
For this reason, we need to generate pseudo-labels before training NamedMask. You can skip this part if you only want to do inference with pre-trained weights provided [below](#pre-trained-weights).

#### 1. Generate pseudo-labels
To compute pseudo-masks for images of the categories in Cityscapes, COCO2017, CoCA, or VOC2012,
we provide for each benchmark a dictionary file (in .json format) that maps each category to a list of 500 ImageNet2012 image paths retrieved by CLIP (with the ViT-L/14@336px architecture).
This file has the following structure:
```python
{
"category_a": ["{your_imagenet_dir}/train/xxx.JPEG", ..., "{your_imagenet_dir}/train/xxx.JPEG"],
"category_b": ["{your_imagenet_dir}/train/xxx.JPEG", ..., "{your_imagenet_dir}/train/xxx.JPEG"],
...
}
```
You need to change `{your_imagenet_dir}` before loading this file for the following steps (by default, it's set to `/home/cs-shin1/datasets/ImageNet2012`).
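The prefix swap can be scripted rather than done by hand. A sketch, where the default prefix is the one quoted above and `new_root` is your local ImageNet2012 directory:

```python
# Sketch: rewrite the ImageNet2012 prefix in a category-to-image-paths file.
# DEFAULT_PREFIX is the default path quoted in the README.
import json

DEFAULT_PREFIX = "/home/cs-shin1/datasets/ImageNet2012"

def retarget_image_paths(category_to_paths: dict, new_root: str) -> dict:
    """Replace DEFAULT_PREFIX with new_root in every image path."""
    return {
        category: [p.replace(DEFAULT_PREFIX, new_root, 1) for p in paths]
        for category, paths in category_to_paths.items()
    }

# Usage, assuming the VOC2012 dictionary file:
# with open("voc2012_category_to_p_images_n500.json") as f:
#     mapping = json.load(f)
# with open("voc2012_category_to_p_images_n500.json", "w") as f:
#     json.dump(retarget_image_paths(mapping, "/data/ImageNet2012"), f)
```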

Please download a dictionary file for a benchmark on which you want to evaluate and put it in the `ImageNet2012` directory:
* [Cityscapes (object)](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/cityscapes/cityscapes_object_category_to_p_images_n500.json)
* [CoCA](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/coca/coca_category_to_p_images_n500.json)
* [COCO2017](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/coco2017/coco2017_category_to_p_images_n500.json)
* [VOC2012](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/voc2012_category_to_p_images_n500.json)

Then, open `selfmask.sh` in the `scripts` directory and change
```shell
DIR_ROOT={your_working_directory}
DIR_DATASET={your_ImageNet2012_directory}
CATEGORY_TO_P_IMAGES_FP={your_category_to_p_images_fp} # this should point to a json file you downloaded above
```

Run:
```shell
bash selfmask.sh
```
This will generate pseudo-masks for images retrieved by CLIP (with ViT-L/14@336px architecture) from the ImageNet2012 training set.
The pseudo-masks will be saved at `{your_ImageNet2012_directory}/train_pseudo_masks_selfmask`.

If you want to skip this process, please download the pre-computed pseudo-masks and uncompress them in `{your_ImageNet2012_directory}/train_pseudo_masks_selfmask`:
* [pseudo-masks from SelfMask](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/imagenet2012/selfmask.zip) (~89 MB)
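After generation (or after uncompressing the archive), it is worth checking that every category actually received its masks. A sketch, assuming one sub-directory per category under `train_pseudo_masks_selfmask`; verify this against how the files unpack on your machine:

```python
# Sketch: count pseudo-mask files per category sub-directory.
# Assumes train_pseudo_masks_selfmask/<category>/<mask files>; adjust the
# traversal if the archive unpacks differently.
from pathlib import Path

def count_masks(mask_root: str) -> dict:
    """Map each category directory name to its number of files."""
    return {
        sub.name: sum(1 for f in sub.iterdir() if f.is_file())
        for sub in sorted(Path(mask_root).iterdir())
        if sub.is_dir()
    }
```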

Optionally, if you want to refine the pseudo-masks with a category expert (after finishing the above step), open the corresponding
`expert_${DATASET_NAME}_category.sh` file and configure `DIR_ROOT`, `DIR_DATASET`, and `CATEGORY_TO_P_IMAGES_FP` as appropriate. Then,
```shell
bash expert_${DATASET_NAME}_category.sh
```
Currently, we only provide code for training experts of the VOC2012 categories.
The pseudo-masks will be saved at `{your_ImageNet2012_directory}/train_pseudo_masks_experts`.

If you want to skip this process, please download the pre-computed pseudo-masks:
* [Cityscapes pseudo-masks from category experts](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/cityscapes/cityscapes_object.zip) (~ 6.5 MB)
* [CoCA pseudo-masks from category experts](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/coca/coca.zip) (~ 36 MB)
* [COCO2017 pseudo-masks from category experts](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/coco2017/coco2017.zip) (~ 36 MB)
* [VOC2012 pseudo-masks from category experts](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/voc2012.zip) (~ 11 MB)

Please uncompress the `.zip` file in `{your_ImageNet2012_directory}/train_pseudo_masks_experts`.

#### 2. Training
Once the pseudo-masks are created (or downloaded and uncompressed), set the path to the directory containing them in a configuration file.
For example, to train a model with pseudo-masks from experts for the VOC2012 categories, open the
`voc_val_n500_cp2_ex.yaml` file and change

```yaml
category_to_p_images_fp: {your_category_to_p_images_fp} # this should point to a json file you downloaded above
dir_ckpt: {your_dir_ckpt} # this should point to a checkpoint directory
dir_train_dataset: {your_dir_train_dataset} # this should point to ImageNet2012 directory (as an index dataset)
dir_val_dataset: {your_dir_val_dataset} # this should point to a benchmark directory
```
arguments as appropriate.
Then, run
```shell
bash voc_val_n500_cp2_sr10100_ex.sh
```

Note that an evaluation is run every fixed number of iterations during training, and
the final weights are saved in your checkpoint directory.
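If you train on several benchmarks, filling in these placeholders by hand gets tedious. The top-level keys of a config can be patched with a small helper; this is a sketch (the keys match the four arguments above, and the example paths are illustrative, not official defaults):

```python
# Sketch: fill in placeholder arguments in a NamedMask config file
# (e.g. voc_val_n500_cp2_ex.yaml) by rewriting top-level `key: value` lines.
def patch_config(text: str, overrides: dict) -> str:
    """Replace top-level `key: value` lines whose key appears in overrides.

    Indented (nested) lines and comments are left untouched.
    """
    out = []
    for line in text.splitlines():
        key = line.split(":", 1)[0].strip()
        if not line.startswith((" ", "#")) and key in overrides:
            out.append(f"{key}: {overrides[key]}")
        else:
            out.append(line)
    return "\n".join(out)

# Usage with illustrative paths:
# with open("voc_val_n500_cp2_ex.yaml") as f:
#     text = f.read()
# text = patch_config(text, {"dir_ckpt": '"/data/ckpt"'})
```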

#### 3. Inference
To evaluate a model with pre-trained weights on a benchmark, e.g., VOC2012, please run
(after customising the four arguments above)
```shell
bash voc_val_n500_cp2_sr10100_ex.sh $PATH_TO_WEIGHTS
```


### Pre-trained weights
We provide the pre-trained weights of NamedMask:

benchmark|split|IoU (%)|pixel accuracy (%)|link|
:---:|:---:|:---:|:---:|:---:|
Cityscapes (object) | val | 18.2 | 93.0 |[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/cityscapes/namedmask_cityscapes_object.pt) (~102 MB)
CoCA | - | 27.4 | 82.0 |[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/coca/namedmask_coca.pt) (~102 MB)
COCO2017 | val | 27.7 | 76.4 |[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/coco2017/namedmask_coco2017.pt) (~102 MB)
ImageNet-S50 | test | 47.5 | - |[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/imagenet-s/namedmask_imagenet_s50.pt) (~102 MB)
ImageNet-S300 | test | 33.1 | - |[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/imagenet-s/namedmask_imagenet_s300.pt) (~103 MB)
ImageNet-S919 | test | 23.1 | - |[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/imagenet-s/namedmask_imagenet_s919.pt) (~103 MB)
VOC2012 | val | 59.3 | 89.2 |[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/namedmask_voc2012.pt) (~102 MB)

We additionally offer the pre-trained weights of the category experts for 20 classes in VOC2012:

category|link|
:---:|:---:|
aeroplane|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_aeroplane.pt) (~102 MB)
bicycle|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_bicycle.pt) (~102 MB)
bird|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_bird.pt) (~102 MB)
boat|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_boat.pt) (~102 MB)
bottle|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_bottle.pt) (~102 MB)
bus|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_bus.pt) (~102 MB)
car|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_car.pt) (~102 MB)
cat|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_cat.pt) (~102 MB)
chair|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_chair.pt) (~102 MB)
cow|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_cow.pt) (~102 MB)
dining table|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_dining_table.pt) (~102 MB)
dog|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_dog.pt) (~102 MB)
horse|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_horse.pt) (~102 MB)
motorbike|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_motorbike.pt) (~102 MB)
person|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_person.pt) (~102 MB)
potted plant|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_potted_plant.pt) (~102 MB)
sheep|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_sheep.pt) (~102 MB)
sofa|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_sofa.pt) (~102 MB)
train|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_train.pt) (~102 MB)
tv/monitor|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_tv_monitor.pt) (~102 MB)

### Citation
```
@article{shin2022namedmask,
  author  = {Shin, Gyungin and Xie, Weidi and Albanie, Samuel},
  title   = {NamedMask: Distilling Segmenters from Complementary Foundation Models},
  journal = {arXiv preprint arXiv:2206.07045},
  year    = {2022}
}
```

### Acknowledgements
We borrowed the code for SelfMask and DeepLabv3+ from
* [SelfMask](https://github.com/NoelShin/selfmask)
* [DeepLabv3+](https://github.com/VainF/DeepLabV3Plus-Pytorch)

If you have any questions about our code/implementation, please contact us at gyungin [at] robots [dot] ox [dot] ac [dot] uk.
49 changes: 49 additions & 0 deletions configs/cityscapes_category.yaml
# base directories
category_to_p_images_fp: "/home/cs-shin1/datasets/ImageNet2012/cityscapes_object_category_to_p_images_n500.json"
dir_ckpt: "/home/cs-shin1/namedmask/ckpt"
dir_train_dataset: "/home/cs-shin1/datasets/ImageNet2012"
dir_val_dataset: "/home/cs-shin1/datasets/cityscapes"

# augmentations
max_n_masks: 1
scale_range: [ 0.1, 1.0 ]

use_specialist_pseudo_masks: false
category_agnostic: true # no-op in this case

n_categories: 2
categories: null
n_images: 500

# dataset
dataset_name: "cityscapes_object"
split: "train"
train_image_size: 384

# dataloader:
train_dataloader_kwargs:
  batch_size: 8
  num_workers: 16
  pin_memory: true
  shuffle: true
  # drop_last: true # this is to prevent an error from a batch norm during training, i.e., ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])

val_dataloader_kwargs:
  batch_size: 1
  num_workers: 0
  pin_memory: true

# Segmenter configuration
# ["deeplabv3plus_resnet101", "deeplabv3plus_resnet50", "deeplabv3plus_mobilenet"]
segmenter_name: "deeplabv3plus_resnet50"

# optimiser
lr: 0.0005
momentum: 0.9
weight_decay: 0.0002
betas: [0.9, 0.999]
n_iters: 5000 # a very light schedule

iter_eval: 1000
iter_log: 100
iter_curriculum: null
51 changes: 51 additions & 0 deletions configs/cityscapes_val_n500_cp2_ex.yaml
# base directories
category_to_p_images_fp: "/home/cs-shin1/datasets/ImageNet2012/cityscapes_object_category_to_p_images_n500.json"
dir_ckpt: "/home/cs-shin1/namedmask/ckpt"
dir_train_dataset: "/home/cs-shin1/datasets/ImageNet2012"
dir_val_dataset: "/home/cs-shin1/datasets/cityscapes"

# augmentations
max_n_masks: 2
scale_range: [ 0.1, 1.0 ]

use_expert_pseudo_masks: true
category_agnostic: false

n_categories: 15
categories: [
  "pole", "polegroup", "traffic light", "traffic sign", "person", "rider", "car", "truck", "bus", "caravan", "trailer",
  "train", "motorcycle", "bicycle"
]

n_images: 500

# dataset
dataset_name: "cityscapes_object"
split: "val"
train_image_size: 384

# dataloader:
train_dataloader_kwargs:
  batch_size: 16
  num_workers: 16
  pin_memory: true
  shuffle: true

val_dataloader_kwargs:
  batch_size: 1
  num_workers: 4
  pin_memory: true

# Segmenter configuration
# ["deeplabv3plus_resnet101", "deeplabv3plus_resnet50", "deeplabv3plus_mobilenet"]
segmenter_name: "deeplabv3plus_resnet50"

# optimiser
lr: 0.0005
momentum: 0.9
weight_decay: 0.0002
betas: [0.9, 0.999]
n_iters: 20000

iter_eval: 1000
iter_log: 100
49 changes: 49 additions & 0 deletions configs/coca_category.yaml
# base directories
category_to_p_images_fp: "/home/cs-shin1/datasets/ImageNet2012/coca_category_to_p_images_n500.json"
dir_ckpt: "/home/cs-shin1/namedmask/ckpt"
dir_train_dataset: "/home/cs-shin1/datasets/ImageNet2012"
dir_val_dataset: "/home/cs-shin1/datasets/coca"

# augmentations
max_n_masks: 1
scale_range: [ 0.1, 1.0 ]

use_specialist_pseudo_masks: false
category_agnostic: true # no-op in this case

n_categories: 2
categories: null
n_images: 500

# dataset
dataset_name: "coca"
split: "train"
train_image_size: 384

# dataloader:
train_dataloader_kwargs:
  batch_size: 8
  num_workers: 16
  pin_memory: true
  shuffle: true
  # drop_last: true # this is to prevent an error from a batch norm during training, i.e., ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])

val_dataloader_kwargs:
  batch_size: 1
  num_workers: 0
  pin_memory: true

# Segmenter configuration
# ["deeplabv3plus_resnet101", "deeplabv3plus_resnet50", "deeplabv3plus_mobilenet"]
segmenter_name: "deeplabv3plus_resnet50"

# optimiser
lr: 0.0005
momentum: 0.9
weight_decay: 0.0002
betas: [0.9, 0.999]
n_iters: 5000 # a very light schedule

iter_eval: 1000
iter_log: 100
iter_curriculum: null