This is an (re-)implementation of DeepLab v2 (ResNet-101) in TensorFlow for semantic image segmentation on the PASCAL VOC 2012 dataset. We refer to DrSleep's implementation (Many thanks!). We do not use tf-to-caffe packages like kaffe so you only need TensorFlow 1.3.0+ to run this code.
The deeplab pre-trained ResNet-101 ckpt files (pre-trained on MSCOCO) are provided by DrSleep -- here. Thanks again!
Created by Zhengyang Wang and Shuiwang Ji at Texas A&M University.
05/08/2018:
Our work based on this implementation has led to a paper accepted for long presentation in KDD2018. You may find the code of the work in this branch.
If using this code , please cite our paper.
@inproceedings{wang2018smoothed,
title={Smoothed Dilated Convolutions for Improved Dense Prediction},
author={Wang, Zhengyang and Ji, Shuiwang},
booktitle={Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
pages={2486--2495},
year={2018},
organization={ACM}
}
02/02/2018:
- A clarification:
As reported, ResNet pre-trained models (NOT deeplab) from Tensorflow were trained using the channel order RGB instead BGR (https://github.com/tensorflow/models/blob/master/research/slim/preprocessing/vgg_preprocessing.py).
Thus, the most correct way to apply them is to use the same order RGB. The original code is for pre-trained models from Caffe and uses BGR. To correct this, when you use res101 and res50, you need to delete line 116 and line 117 in utils/image_reader.py to remove the RGB to BGR step when reading images. Then, modify line 77 in utils/label_utils.py to remove the BGR to RGB step in the inverse process for image visualization. At last, you need to change the IMAGE_MEAN by swapping the first and the third values in line 26 and line 26 for non_msc and msc training, respectively.
However, this change actually does not affect the performance a lot, proved by discussion in issue 30. In this task, the size of training patches is different from that in ImageNet. And the set of images is different. The IMAGE_MEAN is never accurate. I guess that simply using IMAGE_MEAN=[127.5, 127.5, 127.5] will work as well.
12/13/2017:
- Now the test code will output the mIoU as well as the IoU for each class.
12/12/2017:
-
Add 'predict' function, you can use '--option=predict' to save your outputs now (both the true prediction where each pixel is between 0 and 20 and the visual one where each class has its own color).
-
Add multi-scale training, testing and predicting. Check main_msc.py and model_msc.py and use them just as main.py and model.py.
-
Add plot_training_curve.py to use the log.txt to make plots of training curve.
-
Now this is a 'full' (re-)implementation of DeepLab v2 (ResNet-101) in TensorFlow. Thank you for the support. You are welcome to report your settings and results as well as any bug!
11/09/2017:
-
The new version enables using original ImageNet pre-trained ResNet models (without pre-training on MSCOCO). You may change arguments ('encoder_name' and 'pretrain_file') in main.py to use corresponding pre-trained models. The original pre-trained ResNet-101 ckpt files are provided by tensorflow officially -- res101 and res50.
-
To help those who want to use this model on the CityScapes dataset, I shared the corresponding txt files and the python file which generates them. Note that you need to use tools here to generate labels with trainID first. Hope it would be helpful. Do not forget to change IMG_MEAN in model.py and other settings in main.py.
-
'is_training' argument is removed and 'self._batch_norm' changes. Basically, for a small batch size, it is better to keep the statistics of the BN layers (running means and variances) frozen, and to not update the values provided by the pre-trained model by setting 'is_training=False'. Note that is_training=False still updates BN parameters gamma (scale) and beta (offset) if they are presented in var_list of the optimiser definition. Set 'trainable=False' in BN fuctions to remove them from trainable_variables.
-
Add 'phase' argument in network.py for future development. 'phase=True' means training. It is mainly for controlling batch normalization (if any) in the non-pre-trained part.
Example: If you have a batch normalization layer in the decoder, you should use
outputs = self._batch_norm(inputs, name='g_bn1', is_training=self.phase, activation_fn=tf.nn.relu, trainable=True)
-
Some changes to make the code more readable and easy to modify for future research.
-
I plan to add 'predict' function to enable saving predicted results for offline evaluation, post-processing, etc.
Python 3.5
tensorflow-gpu 1.3.0
All network hyperparameters are configured in main.py.
num_steps: how many iterations to train
save_interval: how many steps to save the model
random_seed: random seed for tensorflow
weight_decay: l2 regularization parameter
learning_rate: initial learning rate
power: parameter for poly learning rate
momentum: momentum
encoder_name: name of pre-trained model, res101, res50 or deeplab
pretrain_file: the initial pre-trained model file for transfer learning
data_list: training data list file
grad_update_every (msc only): accumulate the gradients for how many steps before updating weights. Note that in the msc case, this is actually the true training batch size.
valid_step: checkpoint number for testing/validation
valid_num_steps: = number of testing/validation samples
valid_data_list: testing/validation data list file
out_dir: directory for saving prediction outputs
test_step: checkpoint number for prediction
test_num_steps: = number of prediction samples
test_data_list: prediction data list filename
visual: whether to save visualizable prediction outputs
data_dir: data directory
batch_size: training batch size
input height: height of input image
input width: width of input image
num_classes: number of classes
ignore_label: label pixel value that should be ignored
random_scale: whether to perform random scaling data-augmentation
random_mirror: whether to perform random left-right flipping data-augmentation
modeldir: where to store saved models
logfile: where to store training log
logdir: where to store log for tensorboard
After configuring the network, we can start to train. Run
python main.py
The training of Deeplab v2 ResNet will start.
We employ tensorboard for visualization.
tensorboard --logdir=log --port=6006
You may visualize the graph of the model and (training images + groud truth labels + predicted labels).
To visualize the training loss curve, write your own script to make use of the training log.
Select a checkpoint to test/validate your model in terms of pixel accuracy and mean IoU.
Fill the valid_step in main.py with the checkpoint you want to test. Change valid_num_steps and valid_data_list accordingly. Run
python main.py --option=test
The final output includes pixel accuracy and mean IoU.
Run
python main.py --option=predict
The outputs will be saved in the 'output' folder.