Skip to content

Latest commit

 

History

History
94 lines (72 loc) · 5.48 KB

README.md

File metadata and controls

94 lines (72 loc) · 5.48 KB

Multibox

This is an implementation of the Multibox detection system proposed by Szegedy et al. in Scalable High Quality Object Detection. Currently this repository uses the Inception-Reset-v2 network as the base network. The post classification network is not currently incorporated in this repository, but any classification network can be used.

This repo supports Python 2.7. Checkout the requirements file to make sure you have the necessary packages. TensorFlow r0.11 is required.

The input functions to the model require a specific dataset format. You can create the dataset using the utility functions found here. You'll also need to genertate the priors for the bounding boxes. In the priors.py file you will find convenience functions for generating the priors. For example, assuming you are in a python terminal (in the project directory):

import cPickle as pickle
import priors

aspect_ratios = [1, 2, 3, 1./2, 1./3]
p = priors.generate_priors(aspect_ratios, min_scale=0.1, max_scale=0.95, restrict_to_image_bounds=True)
with open('priors.pkl', 'w') as f:
  pickle.dump(p, f)

Instead of hand defining the aspect ratios, you can use your training dataset to cluster the aspect ratios of the bounding boxes. There is a utility function to do this in the priors.py file.

Next, you'll need to create a configuration file. Checkout the example to see the different settings. Some especially important settings include:

Key Value
NUM_BBOXES_PER_CELL This needs to be set the number of aspect ratios you used to generate the priors. The output layers of the model depend on this number.
MAX_NUM_BBOXES In order to properly pad all the image annotation data in a batch, the maximum number of boxes in a single image must be known.
BATCH_SIZE Depending on your hardware setup, you will need to adjust this parameter so that the network fits in memory.
NUM_TRAIN_EXAMPLES This, along with BATCH_SIZE, is used to compute the number of iterations in an epoch.
NUM_TRAIN_ITERATIONS This is how many iterations to run before stopping.

You'll definitely want to go through the other configuration parameters, but make sure you have the above parameters set correctly.

Now that you have your dataset in tfrecords, you generated priors, and you set up your configuration file, you'll be able to train the detection model. First, you can debug your image augmentation setting by visualizing the inputs to the network:

python visualize_inputs.py \
--tfrecords /Volumes/Untitled/tensorflow_datasets/coco_people/kps/val2000/* \
--config /Volumes/Untitled/models/coco_person_detection/9/config_train.yaml

Once you are ready for training, you should download the pretrained inception-resnet-v2 network and use it as a starting point. Before training the whole network, you need to warmup the detection heads. I call this finetuning. Therefore the training procedure consists of 2 calls to the train.py script. First you finetune:

python train.py \
--tfrecords /Volumes/Untitled/tensorflow_datasets/coco_people/kps/val2000/* \
--priors /Volumes/Untitled/models/coco_person_detection/9/coco_person_priors_7.pkl \
--logdir /Users/GVH/Desktop/multibox_train/finetune \
--config /Users/GVH/Desktop/multibox_train/config_train.yaml \
--pretrained_model /Users/GVH/Desktop/Inception_Models/inception-resnet-v2/inception_resnet_v2_2016_08_30.ckpt \
--fine_tune

Once the detection heads have warmed up, you can train the whole model:

python train.py \
--tfrecords /Volumes/Untitled/tensorflow_datasets/coco_people/kps/val2000/* \
--priors /Volumes/Untitled/models/coco_person_detection/9/coco_person_priors_7.pkl \
--logdir /Users/GVH/Desktop/multibox_train/ \
--config /Users/GVH/Desktop/multibox_train/config_train.yaml \
--pretrained_model /Users/GVH/Desktop/multibox_train/finetune

If you have a validation set, you can visualize the ground truth boxes and the predicted boxes:

python visualize_val.py \
--tfrecords /Volumes/Untitled/tensorflow_datasets/coco_people/kps/val2000/* \
--priors  /Volumes/Untitled/models/coco_person_detection/9/coco_person_priors_7.pkl \
--checkpoint_path /Volumes/Untitled/models/coco_person_detection/9/model.ckpt-300000 \
--config /Users/GVH/Desktop/multibox_train/config_train.yaml

At "application time" you can run the detect script to generate predicted boxes on new images. You can debug your detection setting by using another visualization script:

python visualize_detect.py \
--tfrecords /Volumes/Untitled/tensorflow_datasets/coco_people/kps/val2000/* \
--priors /Volumes/Untitled/models/coco_person_detection/9/coco_person_priors_7.pkl \
--checkpoint_path /Volumes/Untitled/models/coco_person_detection/9/model.ckpt-300000 \
--config /Users/GVH/Desktop/multibox_train/config_detect.yaml
python detect.py \
--tfrecords /Volumes/Untitled/tensorflow_datasets/coco_people/kps/val2000/* \
--priors /Volumes/Untitled/models/coco_person_detection/9/coco_person_priors_7.pkl \
--checkpoint_path /Volumes/Untitled/models/coco_person_detection/9/model.ckpt-300000 \
--save_dir /Users/GVH/Desktop/multibox_train/ \
--config /Users/GVH/Desktop/multibox_train/config_detect.yaml \
--max_iterations 20