Commit abcf534: initial commit
noelshin committed Sep 22, 2022
Showing 59 changed files with 6,734 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .gitignore
.DS_Store
.idea/
_video/

204 changes: 204 additions & 0 deletions README.md
## NamedMask: Distilling Segmenters from Complementary Foundation Models
Official PyTorch implementation for NamedMask. Details can be found in the paper.
[[Paper]](https://arxiv.org/pdf/2206.07045.pdf) [[Project page]](https://www.robots.ox.ac.uk/~vgg/research/namedmask)

![Alt Text](project_page/images/out_no_loop.gif)

### Contents
* [Preparation](#preparation)
* [NamedMask training/inference](#namedmask-traininginference)
* [Pre-trained weights](#pre-trained-weights)
* [Citation](#citation)
* [Acknowledgements](#acknowledgements)

### Preparation
#### 1. Download datasets
Please download datasets of interest first by visiting the following links:
* [Cityscapes](https://www.cityscapes-dataset.com/login)
* [CoCA](http://zhaozhang.net/coca.html)
* [COCO2017](https://cocodataset.org/#download)
* [VOC2012](https://github.com/mhamilton723/STEGO#install)
* [(Optional) ImageNet2012](https://image-net.org/download.php) (for an index dataset used in training)

Note that Cityscapes and ImageNet2012 require you to sign up for an account.
In addition, ImageNet2012 is only needed if you want to train NamedMask yourself.

We advise you to organise the downloaded dataset(s) into the following directory structure for ease of use:
```bash
{your_dataset_directory}
├──cityscapes
│ ├──gtFine
│ ├──leftImg8bit
├──coca
│ ├──binary
│ ├──image
├──coco2017
│ ├──annotations
│ ├──train2017
│ ├──val2017
├──ImageNet2012
│ ├──train
│ ├──val
├──ImageNet-S
│ ├──ImageNetS50
│ ├──ImageNetS300
│ ├──ImageNetS919
├──VOCdevkit
├──VOC2012
```
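As a sanity check before training, the layout above can be verified with a short script. This is only a sketch: it covers a subset of the tree, and `your_dataset_directory` is whatever root you chose.

```python
# Sketch: check that the datasets sit where the training scripts expect them.
# Only part of the tree from the README is listed; extend EXPECTED as needed.
from pathlib import Path

EXPECTED = {
    "cityscapes": ["gtFine", "leftImg8bit"],
    "coca": ["binary", "image"],
    "coco2017": ["annotations", "train2017", "val2017"],
    "ImageNet2012": ["train", "val"],
    "VOCdevkit": ["VOC2012"],
}

def missing_dirs(root: str) -> list:
    """Return the expected sub-directories that are absent under root."""
    absent = []
    for dataset, subdirs in EXPECTED.items():
        for sub in subdirs:
            if not (Path(root) / dataset / sub).is_dir():
                absent.append(f"{dataset}/{sub}")
    return absent
```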

#### 2. Install the required Python packages
```shell
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
conda install -c conda-forge tqdm
conda install -c conda-forge matplotlib
conda install -c anaconda ujson
conda install -c conda-forge pyyaml
conda install -c conda-forge pycocotools
conda install -c anaconda scipy
pip install opencv-python
pip install git+https://github.com/openai/CLIP.git
```
Please note that the required version of each package may vary depending on your local environment.
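A few of the packages above import under a different name than they install under (e.g. `pyyaml` imports as `yaml`, `opencv-python` as `cv2`), so a quick import check can save a failed run later. A minimal sketch:

```python
# Sketch: verify that the packages installed above are importable.
# Module names differ from package names: pyyaml -> yaml, opencv-python -> cv2.
import importlib.util

REQUIRED = ["torch", "torchvision", "tqdm", "matplotlib", "ujson",
            "yaml", "pycocotools", "scipy", "cv2", "clip"]

def missing_packages(names=REQUIRED):
    """Return the module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]
```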

### NamedMask training/inference
NamedMask is trained with pseudo-labels from either an unsupervised saliency detector (e.g., SelfMask) or category experts, which refine the predictions made by the saliency network.
For this reason, we need to generate pseudo-labels before training NamedMask. You can skip this part if you only want to do inference with pre-trained weights provided [below](#pre-trained-weights).

#### 1. Generate pseudo-labels
To compute pseudo-masks for images of the categories in Cityscapes, COCO2017, CoCA, or VOC2012,
we provide for each benchmark a dictionary file (in .json format) that maps each category to a list of 500 ImageNet2012 image paths retrieved by CLIP (with the ViT-L/14@336px architecture).
This file has the following structure:
```python
{
"category_a": ["{your_imagenet_dir}/train/xxx.JPEG", ..., "{your_imagenet_dir}/train/xxx.JPEG"],
"category_b": ["{your_imagenet_dir}/train/xxx.JPEG", ..., "{your_imagenet_dir}/train/xxx.JPEG"],
...
}
```
You need to change `{your_imagenet_dir}` before loading this file for the following steps (by default, it's set to `/home/cs-shin1/datasets/ImageNet2012`).
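The prefix swap can be scripted rather than done by hand. A sketch, where the default prefix is the one quoted above and `new_root` is your local ImageNet2012 directory:

```python
# Sketch: rewrite the ImageNet2012 prefix in a category-to-image-paths file.
# DEFAULT_PREFIX is the default path quoted in the README.
import json

DEFAULT_PREFIX = "/home/cs-shin1/datasets/ImageNet2012"

def retarget_image_paths(category_to_paths: dict, new_root: str) -> dict:
    """Replace DEFAULT_PREFIX with new_root in every image path."""
    return {
        category: [p.replace(DEFAULT_PREFIX, new_root, 1) for p in paths]
        for category, paths in category_to_paths.items()
    }

# Usage, assuming the VOC2012 dictionary file:
# with open("voc2012_category_to_p_images_n500.json") as f:
#     mapping = json.load(f)
# with open("voc2012_category_to_p_images_n500.json", "w") as f:
#     json.dump(retarget_image_paths(mapping, "/data/ImageNet2012"), f)
```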

Please download a dictionary file for a benchmark on which you want to evaluate and put it in the `ImageNet2012` directory:
* [Cityscapes (object)](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/cityscapes/cityscapes_object_category_to_p_images_n500.json)
* [CoCA](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/coca/coca_category_to_p_images_n500.json)
* [COCO2017](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/coco2017/coco2017_category_to_p_images_n500.json)
* [VOC2012](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/voc2012_category_to_p_images_n500.json)

Then, open `selfmask.sh` in the `scripts` directory and change
```shell
DIR_ROOT={your_working_directory}
DIR_DATASET={your_ImageNet2012_directory}
CATEGORY_TO_P_IMAGES_FP={your_category_to_p_images_fp} # this should point to a json file you downloaded above
```

Run:
```shell
bash selfmask.sh
```
This will generate pseudo-masks for images retrieved by CLIP (with ViT-L/14@336px architecture) from the ImageNet2012 training set.
The pseudo-masks will be saved at `{your_ImageNet2012_directory}/train_pseudo_masks_selfmask`.

If you want to skip this process, please download the pre-computed pseudo-masks and uncompress them in `{your_ImageNet2012_directory}/train_pseudo_masks_selfmask`:
* [pseudo-masks from SelfMask](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/imagenet2012/selfmask.zip) (~89 MB)
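After generation (or after uncompressing the archive), it is worth checking that every category actually received its masks. A sketch, assuming one sub-directory per category under `train_pseudo_masks_selfmask`; verify this against how the files unpack on your machine:

```python
# Sketch: count pseudo-mask files per category sub-directory.
# Assumes train_pseudo_masks_selfmask/<category>/<mask files>; adjust the
# traversal if the archive unpacks differently.
from pathlib import Path

def count_masks(mask_root: str) -> dict:
    """Map each category directory name to its number of files."""
    return {
        sub.name: sum(1 for f in sub.iterdir() if f.is_file())
        for sub in sorted(Path(mask_root).iterdir())
        if sub.is_dir()
    }
```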

Optionally, if you want to refine the pseudo-masks with a category expert (after finishing the above step), open the corresponding
`expert_${DATASET_NAME}_category.sh` file and configure `DIR_ROOT`, `DIR_DATASET`, and `CATEGORY_TO_P_IMAGES_FP` as appropriate. Then,
```shell
bash expert_${DATASET_NAME}_category.sh
```
Currently, we only provide code for training experts of the VOC2012 categories.
The pseudo-masks will be saved at `{your_ImageNet2012_directory}/train_pseudo_masks_experts`.

If you want to skip this process, please download the pre-computed pseudo-masks:
* [Cityscapes pseudo-masks from category experts](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/cityscapes/cityscapes_object.zip) (~ 6.5 MB)
* [CoCA pseudo-masks from category experts](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/coca/coca.zip) (~ 36 MB)
* [COCO2017 pseudo-masks from category experts](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/coco2017/coco2017.zip) (~ 36 MB)
* [VOC2012 pseudo-masks from category experts](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/voc2012.zip) (~ 11 MB)

Please uncompress the `.zip` file in `{your_ImageNet2012_directory}/train_pseudo_masks_experts`.

#### 2. Training
Once the pseudo-masks are created (or downloaded and uncompressed), set the path to the directory containing them in a configuration file.
For example, to train a model with pseudo-masks from experts for the VOC2012 categories, open the
`voc_val_n500_cp2_ex.yaml` file and change

```yaml
category_to_p_images_fp: {your_category_to_p_images_fp} # this should point to a json file you downloaded above
dir_ckpt: {your_dir_ckpt} # this should point to a checkpoint directory
dir_train_dataset: {your_dir_train_dataset} # this should point to ImageNet2012 directory (as an index dataset)
dir_val_dataset: {your_dir_val_dataset} # this should point to a benchmark directory
```
arguments as appropriate.
Then, run
```shell
bash voc_val_n500_cp2_sr10100_ex.sh
```

Note that an evaluation is run every fixed number of iterations during training, and
the final weights are saved in your checkpoint directory.
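If you train on several benchmarks, filling in these placeholders by hand gets tedious. The top-level keys of a config can be patched with a small helper; this is a sketch (the keys match the four arguments above, and the example paths are illustrative, not official defaults):

```python
# Sketch: fill in placeholder arguments in a NamedMask config file
# (e.g. voc_val_n500_cp2_ex.yaml) by rewriting top-level `key: value` lines.
def patch_config(text: str, overrides: dict) -> str:
    """Replace top-level `key: value` lines whose key appears in overrides.

    Indented (nested) lines and comments are left untouched.
    """
    out = []
    for line in text.splitlines():
        key = line.split(":", 1)[0].strip()
        if not line.startswith((" ", "#")) and key in overrides:
            out.append(f"{key}: {overrides[key]}")
        else:
            out.append(line)
    return "\n".join(out)

# Usage with illustrative paths:
# with open("voc_val_n500_cp2_ex.yaml") as f:
#     text = f.read()
# text = patch_config(text, {"dir_ckpt": '"/data/ckpt"'})
```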

#### 3. Inference
To evaluate a model with pre-trained weights on a benchmark, e.g., VOC2012, please run
(after customising the four arguments above)
```shell
bash voc_val_n500_cp2_sr10100_ex.sh $PATH_TO_WEIGHTS
```


### Pre-trained weights
We provide the pre-trained weights of NamedMask:

benchmark|split|IoU (%)|pixel accuracy (%)|link|
:---:|:---:|:---:|:---:|:---:|
Cityscapes (object) | val | 18.2 | 93.0 |[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/cityscapes/namedmask_cityscapes_object.pt) (~102 MB)
CoCA | - | 27.4 | 82.0 |[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/coca/namedmask_coca.pt) (~102 MB)
COCO2017 | val | 27.7 | 76.4 |[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/coco2017/namedmask_coco2017.pt) (~102 MB)
ImageNet-S50 | test | 47.5 | - |[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/imagenet-s/namedmask_imagenet_s50.pt) (~102 MB)
ImageNet-S300 | test | 33.1 | - |[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/imagenet-s/namedmask_imagenet_s300.pt) (~103 MB)
ImageNet-S919 | test | 23.1 | - |[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/imagenet-s/namedmask_imagenet_s919.pt) (~103 MB)
VOC2012 | val | 59.3 | 89.2 |[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/namedmask_voc2012.pt) (~102 MB)

We additionally offer the pre-trained weights of the category experts for 20 classes in VOC2012:

category|link|
:---:|:---:|
aeroplane|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_aeroplane.pt) (~102 MB)
bicycle|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_bicycle.pt) (~102 MB)
bird|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_bird.pt) (~102 MB)
boat|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_boat.pt) (~102 MB)
bottle|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_bottle.pt) (~102 MB)
bus|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_bus.pt) (~102 MB)
car|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_car.pt) (~102 MB)
cat|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_cat.pt) (~102 MB)
chair|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_chair.pt) (~102 MB)
cow|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_cow.pt) (~102 MB)
dining table|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_dining_table.pt) (~102 MB)
dog|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_dog.pt) (~102 MB)
horse|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_horse.pt) (~102 MB)
motorbike|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_motorbike.pt) (~102 MB)
person|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_person.pt) (~102 MB)
potted plant|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_potted_plant.pt) (~102 MB)
sheep|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_sheep.pt) (~102 MB)
sofa|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_sofa.pt) (~102 MB)
train|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_train.pt) (~102 MB)
tv/monitor|[weights](https://www.robots.ox.ac.uk/~vgg/research/namedmask/shared_files/voc2012/experts/expert_tv_monitor.pt) (~102 MB)

### Citation
```
@article{shin2022namedmask,
  author  = {Shin, Gyungin and Xie, Weidi and Albanie, Samuel},
  title   = {NamedMask: Distilling Segmenters from Complementary Foundation Models},
  journal = {arXiv preprint arXiv:2206.07045},
  year    = {2022}
}
```

### Acknowledgements
We borrowed the code for SelfMask and DeepLabv3+ from
* [SelfMask](https://github.com/NoelShin/selfmask)
* [DeepLabv3+](https://github.com/VainF/DeepLabV3Plus-Pytorch)

If you have any questions about our code/implementation, please contact us at gyungin [at] robots [dot] ox [dot] ac [dot] uk.
49 changes: 49 additions & 0 deletions configs/cityscapes_category.yaml
# base directories
category_to_p_images_fp: "/home/cs-shin1/datasets/ImageNet2012/cityscapes_object_category_to_p_images_n500.json"
dir_ckpt: "/home/cs-shin1/namedmask/ckpt"
dir_train_dataset: "/home/cs-shin1/datasets/ImageNet2012"
dir_val_dataset: "/home/cs-shin1/datasets/cityscapes"

# augmentations
max_n_masks: 1
scale_range: [ 0.1, 1.0 ]

use_specialist_pseudo_masks: false
category_agnostic: true # no-op in this case

n_categories: 2
categories: null
n_images: 500

# dataset
dataset_name: "cityscapes_object"
split: "train"
train_image_size: 384

# dataloader:
train_dataloader_kwargs:
  batch_size: 8
  num_workers: 16
  pin_memory: true
  shuffle: true
  # drop_last: true # this is to prevent an error from a batch norm during training, i.e., ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])

val_dataloader_kwargs:
  batch_size: 1
  num_workers: 0
  pin_memory: true

# Segmenter configuration
# ["deeplabv3plus_resnet101", "deeplabv3plus_resnet50", "deeplabv3plus_mobilenet"]
segmenter_name: "deeplabv3plus_resnet50"

# optimiser
lr: 0.0005
momentum: 0.9
weight_decay: 0.0002
betas: [0.9, 0.999]
n_iters: 5000 # a very light schedule

iter_eval: 1000
iter_log: 100
iter_curriculum: null
51 changes: 51 additions & 0 deletions configs/cityscapes_val_n500_cp2_ex.yaml
# base directories
category_to_p_images_fp: "/home/cs-shin1/datasets/ImageNet2012/cityscapes_object_category_to_p_images_n500.json"
dir_ckpt: "/home/cs-shin1/namedmask/ckpt"
dir_train_dataset: "/home/cs-shin1/datasets/ImageNet2012"
dir_val_dataset: "/home/cs-shin1/datasets/cityscapes"

# augmentations
max_n_masks: 2
scale_range: [ 0.1, 1.0 ]

use_expert_pseudo_masks: true
category_agnostic: false

n_categories: 15
categories: [
  "pole", "polegroup", "traffic light", "traffic sign", "person", "rider", "car", "truck", "bus", "caravan", "trailer",
  "train", "motorcycle", "bicycle"
]

n_images: 500

# dataset
dataset_name: "cityscapes_object"
split: "val"
train_image_size: 384

# dataloader:
train_dataloader_kwargs:
  batch_size: 16
  num_workers: 16
  pin_memory: true
  shuffle: true

val_dataloader_kwargs:
  batch_size: 1
  num_workers: 4
  pin_memory: true

# Segmenter configuration
# ["deeplabv3plus_resnet101", "deeplabv3plus_resnet50", "deeplabv3plus_mobilenet"]
segmenter_name: "deeplabv3plus_resnet50"

# optimiser
lr: 0.0005
momentum: 0.9
weight_decay: 0.0002
betas: [0.9, 0.999]
n_iters: 20000

iter_eval: 1000
iter_log: 100
49 changes: 49 additions & 0 deletions configs/coca_category.yaml
# base directories
category_to_p_images_fp: "/home/cs-shin1/datasets/ImageNet2012/coca_category_to_p_images_n500.json"
dir_ckpt: "/home/cs-shin1/namedmask/ckpt"
dir_train_dataset: "/home/cs-shin1/datasets/ImageNet2012"
dir_val_dataset: "/home/cs-shin1/datasets/coca"

# augmentations
max_n_masks: 1
scale_range: [ 0.1, 1.0 ]

use_specialist_pseudo_masks: false
category_agnostic: true # no-op in this case

n_categories: 2
categories: null
n_images: 500

# dataset
dataset_name: "coca"
split: "train"
train_image_size: 384

# dataloader:
train_dataloader_kwargs:
  batch_size: 8
  num_workers: 16
  pin_memory: true
  shuffle: true
  # drop_last: true # this is to prevent an error from a batch norm during training, i.e., ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])

val_dataloader_kwargs:
  batch_size: 1
  num_workers: 0
  pin_memory: true

# Segmenter configuration
# ["deeplabv3plus_resnet101", "deeplabv3plus_resnet50", "deeplabv3plus_mobilenet"]
segmenter_name: "deeplabv3plus_resnet50"

# optimiser
lr: 0.0005
momentum: 0.9
weight_decay: 0.0002
betas: [0.9, 0.999]
n_iters: 5000 # a very light schedule

iter_eval: 1000
iter_log: 100
iter_curriculum: null