This folder contains building code for MobileNetV2, based on the paper [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381).
This is the timing of MobileNetV1 vs. MobileNetV2 using TF-Lite on the large core of a Pixel 1 phone.
MACs, also sometimes known as MADDs, are the number of multiply-accumulate operations needed to compute an inference on a single image, and are a common metric for measuring model efficiency.
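As a rough way to check MAC counts yourself, the sketch below uses the TF 1.x profiler to count floating-point operations for a single 224x224 image and divides by two, since one MAC is conventionally counted as two FLOPs (a multiply plus an add). It assumes this repo's `nets/mobilenet` module is on your `PYTHONPATH`; treat it as a sketch, not the exact methodology behind the table below.

```python
import tensorflow as tf  # TF 1.x
from nets.mobilenet import mobilenet_v2

# Build the full-size model on a fully-defined single-image input so the
# profiler can compute per-op FLOPs.
g = tf.Graph()
with g.as_default():
  images = tf.placeholder(tf.float32, (1, 224, 224, 3))
  with tf.contrib.slim.arg_scope(mobilenet_v2.training_scope(is_training=False)):
    logits, _ = mobilenet_v2.mobilenet(images)

  # Count total float operations, then convert FLOPs -> MACs (2 FLOPs per MAC).
  opts = tf.profiler.ProfileOptionBuilder.float_operation()
  flops = tf.profiler.profile(g, options=opts)
  print('Approx. MACs (M): %.0f' % (flops.total_float_ops / 2 / 1e6))
```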
Below is a graph comparing V2 against a few selected networks. The size of each blob represents the number of parameters. Note that for ShuffleNet there are no published size numbers; we estimate them to be comparable to MobileNetV2's.
Classification Checkpoint | MACs (M) | Parameters (M) | Top-1 Accuracy (%) | Top-5 Accuracy (%) | Pixel 1 CPU (ms) |
---|---|---|---|---|---|
mobilenet_v2_1.4_224 | 582 | 6.06 | 75.0 | 92.5 | 138.0 |
mobilenet_v2_1.3_224 | 509 | 5.34 | 74.4 | 92.1 | 123.0 |
mobilenet_v2_1.0_224 | 300 | 3.47 | 71.8 | 91.0 | 73.8 |
mobilenet_v2_1.0_192 | 221 | 3.47 | 70.7 | 90.1 | 55.1 |
mobilenet_v2_1.0_160 | 154 | 3.47 | 68.8 | 89.0 | 40.2 |
mobilenet_v2_1.0_128 | 99 | 3.47 | 65.3 | 86.9 | 27.6 |
mobilenet_v2_1.0_96 | 56 | 3.47 | 60.3 | 83.2 | 17.6 |
mobilenet_v2_0.75_224 | 209 | 2.61 | 69.8 | 89.6 | 55.8 |
mobilenet_v2_0.75_192 | 153 | 2.61 | 68.7 | 88.9 | 41.6 |
mobilenet_v2_0.75_160 | 107 | 2.61 | 66.4 | 87.3 | 30.4 |
mobilenet_v2_0.75_128 | 69 | 2.61 | 63.2 | 85.3 | 21.9 |
mobilenet_v2_0.75_96 | 39 | 2.61 | 58.8 | 81.6 | 14.2 |
mobilenet_v2_0.5_224 | 97 | 1.95 | 65.4 | 86.4 | 28.7 |
mobilenet_v2_0.5_192 | 71 | 1.95 | 63.9 | 85.4 | 21.1 |
mobilenet_v2_0.5_160 | 50 | 1.95 | 61.0 | 83.2 | 14.9 |
mobilenet_v2_0.5_128 | 32 | 1.95 | 57.7 | 80.8 | 9.9 |
mobilenet_v2_0.5_96 | 18 | 1.95 | 51.2 | 75.8 | 6.4 |
mobilenet_v2_0.35_224 | 59 | 1.66 | 60.3 | 82.9 | 19.7 |
mobilenet_v2_0.35_192 | 43 | 1.66 | 58.2 | 81.2 | 14.6 |
mobilenet_v2_0.35_160 | 30 | 1.66 | 55.7 | 79.1 | 10.5 |
mobilenet_v2_0.35_128 | 20 | 1.66 | 50.8 | 75.0 | 6.9 |
mobilenet_v2_0.35_96 | 11 | 1.66 | 45.5 | 70.4 | 4.5 |
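As a usage sketch, any checkpoint from the table can be restored into the model built by this folder's `mobilenet_v2` module. The published checkpoints store exponential-moving-average versions of the weights, so it is the EMA shadow variables that should be restored; the checkpoint path below is a placeholder assumption for wherever you downloaded and unpacked the files.

```python
import tensorflow as tf  # TF 1.x
from nets.mobilenet import mobilenet_v2

tf.reset_default_graph()

# Build the full-size (depth multiplier 1.0, 224x224) model in inference mode.
images = tf.placeholder(tf.float32, (None, 224, 224, 3))
with tf.contrib.slim.arg_scope(mobilenet_v2.training_scope(is_training=False)):
  logits, endpoints = mobilenet_v2.mobilenet(images)

# Restore the exponential-moving-average weights stored in the checkpoint.
ema = tf.train.ExponentialMovingAverage(0.999)
saver = tf.train.Saver(ema.variables_to_restore())

with tf.Session() as sess:
  saver.restore(sess, 'mobilenet_v2_1.0_224.ckpt')  # path is an assumption
```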
The numbers above can be reproduced using slim's `train_image_classifier`.
Below is the set of parameters that achieves 72.0% top-1 accuracy for the full-size MobileNetV2 after about 700K steps when trained on 8 GPUs. If trained on a single GPU, full convergence takes about 5.5M steps. Also note that both the learning rate and `num_epochs_per_decay` need to be adjusted for the number of GPUs in use, due to slim's internal averaging over clones (see the helper sketch after the flags).
```
--model_name="mobilenet_v2"
--learning_rate=0.045 * NUM_GPUS  # slim internally averages clones so we compensate
--preprocessing_name="inception_v2"
--label_smoothing=0.1
--moving_average_decay=0.9999
--batch_size=96
--num_clones=NUM_GPUS  # any number between 1 and 8, depending on your hardware setup
--learning_rate_decay_factor=0.98
--num_epochs_per_decay=2.5 / NUM_GPUS  # train_image_classifier does per-clone epochs
```
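To make the `NUM_GPUS` arithmetic concrete, here is a small hypothetical Python helper (not part of slim) that computes the adjusted flag values for a given GPU count:

```python
# Hypothetical helper (not part of slim): compute the flag values above
# for a given number of GPUs, compensating for slim's clone averaging.
def training_flags(num_gpus):
  assert 1 <= num_gpus <= 8
  return {
      "model_name": "mobilenet_v2",
      "learning_rate": 0.045 * num_gpus,       # slim averages gradients across clones
      "preprocessing_name": "inception_v2",
      "label_smoothing": 0.1,
      "moving_average_decay": 0.9999,
      "batch_size": 96,
      "num_clones": num_gpus,
      "learning_rate_decay_factor": 0.98,
      "num_epochs_per_decay": 2.5 / num_gpus,  # epochs are counted per clone
  }

print(training_flags(8))  # e.g. learning_rate=0.36, num_epochs_per_decay=0.3125
```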
See this IPython notebook, or open and run the network directly in Colaboratory.