accuracy #2
Comments
Hi, this is my test log; please use it for reference.
Hello, thanks for sharing. I haven't been able to get your results either. Could you please describe the training environment, e.g. how many GPUs you used and how long the training took?
MobileNetV2's top-1 accuracy has been updated to 72.0% in the latest version of the paper.
I achieved 66% top-1 and 88% top-5 with SGD.
@blueardour, @ericsun99: same here, I can't get past 66% top-1 with SGD.
@Coderx7 Hi, I gave it another try to see if I'd have better luck, and got 68% top-1 and 88% top-5. The only hyperparameter that differed from my last run was the mini-batch size, which was 256. It is still lower than the 72% top-1 accuracy. During training, the accuracy became stable after 3 days (reaching 68% top-1); I kept training, and even after 10 days the accuracy did not improve further. I employed a plateau learning rate decay method: the initial lr was 0.01 and it eventually decreased to 1e-8. I used RandomResizedCrop and RandomHorizontalFlip for data augmentation. What's interesting is that every image classification network I trained, for example resnet18, resnet50, and xception, hit the same problem: the accuracy I got was always about 2 percent lower than reported in the papers. Sigh~~~ I guess it might not be caused by the gradient descent method or the weight initialization, but probably by the data augmentation. Another important factor is the mini-batch size. Any tricks to recover the last few percent of accuracy are welcome.
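For anyone trying this, here is a minimal PyTorch sketch of the recipe described above (RandomResizedCrop/RandomHorizontalFlip augmentation, SGD with initial lr 0.01, plateau schedule decaying toward 1e-8). The patience value, weight decay, epoch count, and the `train_one_epoch`/`validate` helpers are placeholders, not the commenter's actual settings:

```python
# Sketch of the recipe described above; patience, weight decay, epoch count, and
# the train_one_epoch/validate helpers are assumptions, not the original script.
import torch
import torchvision
import torchvision.transforms as transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = torchvision.models.mobilenet_v2()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=4e-5)
# Decay the lr when validation top-1 stops improving, down to the 1e-8 floor.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.1, patience=5, min_lr=1e-8)

for epoch in range(150):
    train_one_epoch(model, optimizer)   # hypothetical training helper
    val_top1 = validate(model)          # hypothetical validation helper
    scheduler.step(val_top1)
```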
A fixed-step learning rate policy helps. Decaying the lr by a factor of 0.98 each epoch improved the accuracy by 1~2 points.
Did you reproduce the model with 72% accuracy? If yes, could you share the hyper-parameter settings? |
No, I didn't reach 72%, but I got close. Since the training was several months ago, I don't remember the exact precision; as far as I can recall it was about 71%, less than 1% below the authors'. The key point is to decay the lr slowly, by a factor of 0.98 each epoch, and to wait a long time. I quickly reached 68% in the first three days, but getting to 71% took another three days.
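The per-epoch 0.98 decay described here maps onto PyTorch's `ExponentialLR`; a minimal sketch, where the initial lr, momentum, weight decay, and epoch count are assumed values rather than the commenter's exact settings:

```python
# Sketch of a "multiply the lr by 0.98 every epoch" policy via ExponentialLR;
# the initial lr, momentum, weight decay, and epoch count are assumptions.
import torch
import torchvision

model = torchvision.models.mobilenet_v2()
optimizer = torch.optim.SGD(model.parameters(), lr=0.045,
                            momentum=0.9, weight_decay=4e-5)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)

for epoch in range(400):                # long schedule: 0.98**400 ≈ 3e-4 of the initial lr
    train_one_epoch(model, optimizer)   # hypothetical training helper
    scheduler.step()                    # lr <- lr * 0.98 after each epoch
```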
Hello @blueardour, your training skill is excellent. I trained with a 1080Ti, batch size 96, lr=0.045, weight decay=0.00004, decaying the lr by 0.98 each epoch; after 2 days I only got 67%. Could you tell me your GPU, batch size, and initial lr?
I tried both 128 and 256 for the batch size, and 5e-4 and 5e-5 for the weight decay; neither seemed to benefit the accuracy. I trained on a P100. As I mentioned, reaching 68% accuracy was easy; the final 2~3% of precision cost several more days. To squeeze out more precision, I'd advise spending more time on training if you think it is worthwhile.
Thank you very much! @blueardour
@CF2220160244: hey, would you do us a favor and keep us updated on how your attempt turns out?
mobilenet_v2 1.0, top-1: 0.716. 80 1080Ti GPUs, label smoothing, inception_preprocessing, 120 epochs, cosine lr.
@itsliupeng What were your learning rate and batch size?
Sorry, I don't use PyTorch; I use Horovod + TensorFlow, with 8 machines and 8 1080Ti GPUs per machine. MobileNet_v1 can also reach top-1 0.7323 this way. But I cannot reproduce the top-1 reported in the ShuffleNet V2 paper.
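Since this issue is about a PyTorch repo, a rough PyTorch approximation of the recipe above (label smoothing + cosine lr over 120 epochs) is sketched below. It is not the commenter's Horovod/TensorFlow script; the lr, smoothing factor, weight decay, and training helper are assumptions:

```python
# Rough PyTorch approximation of the label-smoothing + cosine-lr recipe above.
# NOT the commenter's Horovod/TensorFlow script; lr, smoothing=0.1, and weight
# decay are assumptions. label_smoothing requires PyTorch >= 1.10.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.mobilenet_v2()
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.4,   # assumed large-batch lr
                            momentum=0.9, weight_decay=4e-5)
epochs = 120
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    train_one_epoch(model, criterion, optimizer)   # hypothetical training helper
    scheduler.step()                               # cosine decay per epoch
```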
@itsliupeng Thanks a lot for the further clarification, it helps a lot. By the way, could you share your TensorFlow training script? It would be greatly appreciated.
@Coderx7 |
@itsliupeng : Thanks a lot :) I really appreciate your kind and helpful response. |
@itsliupeng Thank you for sharing this information. Being able to train MobileNets in 120 epochs is a wonderful thing.
Thanks. I have read about the use of high learning rates (0.5 or 0.6) in the ShuffleNetV2 and Squeeze-and-Excitation papers, but this is even higher. It motivates me to try it. Maybe warmup is the key to using such high rates.
@itsliupeng I have one doubt. In the MobileNetV2 paper, the learning rate is kept at 0.045 even though there are 16 (asynchronous) GPUs, each with a batch size of 96. My question is: why isn't the learning rate scaled up like you have done? "MobileNetV2: Inverted Residuals and Linear Bottlenecks", https://arxiv.org/pdf/1801.04381.pdf
Never mind, I read about asynchronous update here: |
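For context, a common way to make such high learning rates stable on synchronous multi-GPU setups is a short linear warmup before the main schedule. A minimal sketch follows; the base lr, 5-epoch warmup, and 120-epoch cosine decay are assumed values, not anything confirmed in this thread:

```python
# Sketch of linear warmup followed by cosine decay, a common trick for
# high-lr / large-batch training; base_lr, warmup_epochs, and epochs are assumptions.
import math
import torch
import torchvision

base_lr = 0.4          # assumed scaled lr for a large global batch
warmup_epochs = 5
epochs = 120

model = torchvision.models.mobilenet_v2()
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0.9, weight_decay=4e-5)

def lr_lambda(epoch):
    # Linear ramp from ~0 to 1 over the warmup epochs, then cosine decay to 0.
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```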
I used the default parameters in your code, but I did not get your results. The results are as follows:
Test: [0/196] Time 5.409 (5.409) Loss 0.8694 (0.8694) Prec@1 79.297 (79.297) Prec@5 92.578 (92.578)
Test: [10/196] Time 0.603 (1.098) Loss 1.4834 (1.1204) Prec@1 60.938 (71.804) Prec@5 85.156 (89.666)
Test: [20/196] Time 2.040 (1.014) Loss 1.2063 (1.1246) Prec@1 76.953 (72.024) Prec@5 87.500 (89.844)
Test: [30/196] Time 0.090 (0.925) Loss 1.1544 (1.1004) Prec@1 67.578 (72.228) Prec@5 91.406 (90.373)
Test: [40/196] Time 0.090 (0.895) Loss 1.1766 (1.1700) Prec@1 69.141 (69.769) Prec@5 91.406 (90.139)
Test: [50/196] Time 0.139 (0.864) Loss 0.8123 (1.1642) Prec@1 79.297 (69.447) Prec@5 95.703 (90.640)
Test: [60/196] Time 0.145 (0.877) Loss 1.4835 (1.1626) Prec@1 60.938 (69.454) Prec@5 90.234 (90.843)
Test: [70/196] Time 0.501 (0.863) Loss 1.1081 (1.1471) Prec@1 72.266 (70.054) Prec@5 91.797 (91.065)
Test: [80/196] Time 1.244 (0.866) Loss 2.0545 (1.1744) Prec@1 50.000 (69.517) Prec@5 79.297 (90.615)
Test: [90/196] Time 2.430 (0.871) Loss 2.7312 (1.2613) Prec@1 37.891 (67.801) Prec@5 69.141 (89.423)
Test: [100/196] Time 0.107 (0.851) Loss 2.3366 (1.3372) Prec@1 42.969 (66.286) Prec@5 73.047 (88.285)
Test: [110/196] Time 0.100 (0.852) Loss 1.3854 (1.3681) Prec@1 67.578 (65.819) Prec@5 86.328 (87.767)
Test: [120/196] Time 0.099 (0.847) Loss 2.1421 (1.3998) Prec@1 53.516 (65.357) Prec@5 75.391 (87.206)
Test: [130/196] Time 0.653 (0.844) Loss 1.3761 (1.4418) Prec@1 67.188 (64.474) Prec@5 87.891 (86.650)
Test: [140/196] Time 0.102 (0.834) Loss 1.7194 (1.4745) Prec@1 58.984 (63.860) Prec@5 82.031 (86.212)
Test: [150/196] Time 0.096 (0.832) Loss 1.7810 (1.5061) Prec@1 66.016 (63.351) Prec@5 81.250 (85.741)
Test: [160/196] Time 0.468 (0.830) Loss 1.4580 (1.5287) Prec@1 69.141 (62.963) Prec@5 85.156 (85.355)
Test: [170/196] Time 1.068 (0.833) Loss 1.2060 (1.5562) Prec@1 69.922 (62.358) Prec@5 90.234 (84.937)
Test: [180/196] Time 0.259 (0.826) Loss 1.4454 (1.5751) Prec@1 59.766 (61.991) Prec@5 90.234 (84.647)
Test: [190/196] Time 0.212 (0.827) Loss 1.5322 (1.5684) Prec@1 57.812 (62.089) Prec@5 88.281 (84.757)