accuracy #2
Comments
Hi, this is my test log; please use it for reference.
Hello, thanks for sharing. I haven't been able to get your results either. Could you please describe the training environment, e.g. how many GPUs you used and how long the training took?
MobileNetV2's top-1 accuracy has been updated to 72.0% in the latest version of the paper.
I achieved 66% top-1 and 88% top-5 with SGD.
@blueardour, @ericsun99: same here, I can't get past 66% top-1 with SGD.
@Coderx7 Hi, I gave it another try to see if I'd have better luck, and got 68% top-1 and 88% top-5. The only hyperparameter that differed from my last run was the mini-batch size, which was 256. It is still lower than the 72% top-1 accuracy. During training, the accuracy became stable after 3 days (reaching 68% top-1); I kept training, and even after 10 days the accuracy did not improve further. I employed a plateau learning rate decay method: the initial lr was 0.01 and it eventually decreased to 1e-8. I used RandomResizedCrop and RandomHorizontalFlip for data augmentation. What's interesting is that every image classification network I trained, for example resnet18, resnet50, and xception, hit the same problem: the accuracy I got was always about 2 percent lower than reported in the papers. Sigh~~~ I guess it might not be caused by the gradient descent method or the weight initialization, but probably by the data augmentation. Another important factor is the mini-batch size. Any tricks to recover the last few percent of accuracy are welcome.
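For anyone trying this, here is a minimal PyTorch sketch of the recipe described above (RandomResizedCrop/RandomHorizontalFlip augmentation, SGD with initial lr 0.01, plateau schedule decaying toward 1e-8). The patience value, weight decay, epoch count, and the `train_one_epoch`/`validate` helpers are placeholders, not the commenter's actual settings:

```python
# Sketch of the recipe described above; patience, weight decay, epoch count, and
# the train_one_epoch/validate helpers are assumptions, not the original script.
import torch
import torchvision
import torchvision.transforms as transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = torchvision.models.mobilenet_v2()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=4e-5)
# Decay the lr when validation top-1 stops improving, down to the 1e-8 floor.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.1, patience=5, min_lr=1e-8)

for epoch in range(150):
    train_one_epoch(model, optimizer)   # hypothetical training helper
    val_top1 = validate(model)          # hypothetical validation helper
    scheduler.step(val_top1)
```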
A fixed-step learning rate policy helps. Decaying the lr by a factor of 0.98 each epoch improved the accuracy by 1~2 points.
Did you reproduce the model with 72% accuracy? If yes, could you share the hyper-parameter settings? |
No, I didn't reach 72%, but I got close. Since the training was several months ago, I don't remember the exact precision; as far as I can recall it was about 71%, less than 1% below the authors'. The key point is to decay the lr slowly, by a factor of 0.98 each epoch, and to wait a long time. I quickly reached 68% in the first three days, but getting to 71% took another three days.
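The per-epoch 0.98 decay described here maps onto PyTorch's `ExponentialLR`; a minimal sketch, where the initial lr, momentum, weight decay, and epoch count are assumed values rather than the commenter's exact settings:

```python
# Sketch of a "multiply the lr by 0.98 every epoch" policy via ExponentialLR;
# the initial lr, momentum, weight decay, and epoch count are assumptions.
import torch
import torchvision

model = torchvision.models.mobilenet_v2()
optimizer = torch.optim.SGD(model.parameters(), lr=0.045,
                            momentum=0.9, weight_decay=4e-5)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)

for epoch in range(400):                # long schedule: 0.98**400 ≈ 3e-4 of the initial lr
    train_one_epoch(model, optimizer)   # hypothetical training helper
    scheduler.step()                    # lr <- lr * 0.98 after each epoch
```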
Hello @blueardour, your training skill is excellent. I trained with a 1080Ti, batch size 96, lr=0.045, weight decay=0.00004, decaying the lr by 0.98 each epoch; after 2 days I only got 67%. Could you tell me your GPU, batch size, and initial lr?
I tried both 128 and 256 for the batch size, and 5e-4 and 5e-5 for the weight decay; neither seemed to benefit the accuracy. I trained on a P100. As I mentioned, reaching 68% accuracy was easy; the final 2~3% of precision cost several more days. To squeeze out more precision, I'd advise spending more time on training if you think it is worthwhile.
Thank you very much! @blueardour
@CF2220160244: hey, would you do us a favor and keep us updated on how your attempt turns out?
mobilenet_v2 1.0, top-1: 0.716. 80 1080Ti GPUs, label smoothing, inception_preprocessing, 120 epochs, cosine lr.
@itsliupeng What were your learning rate and batch size?
Sorry, I don't use PyTorch; I use Horovod + TensorFlow, with 8 machines and 8 1080Ti GPUs per machine. MobileNet_v1 can also reach top-1 0.7323 this way. But I cannot reproduce the top-1 reported in the ShuffleNet V2 paper.
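Since this issue is about a PyTorch repo, a rough PyTorch approximation of the recipe above (label smoothing + cosine lr over 120 epochs) is sketched below. It is not the commenter's Horovod/TensorFlow script; the lr, smoothing factor, weight decay, and training helper are assumptions:

```python
# Rough PyTorch approximation of the label-smoothing + cosine-lr recipe above.
# NOT the commenter's Horovod/TensorFlow script; lr, smoothing=0.1, and weight
# decay are assumptions. label_smoothing requires PyTorch >= 1.10.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.mobilenet_v2()
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.4,   # assumed large-batch lr
                            momentum=0.9, weight_decay=4e-5)
epochs = 120
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    train_one_epoch(model, criterion, optimizer)   # hypothetical training helper
    scheduler.step()                               # cosine decay per epoch
```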
@itsliupeng Thanks a lot for the further clarification, it helps a lot. By the way, could you share your TensorFlow training script? It would be greatly appreciated.
@Coderx7 |
@itsliupeng : Thanks a lot :) I really appreciate your kind and helpful response. |
@itsliupeng Thank you for sharing this information. Being able to train MobileNets in 120 epochs is a wonderful thing.
Thanks. I have read about the use of high learning rates (0.5 or 0.6) in the ShuffleNetV2 and Squeeze-and-Excitation papers, but this is even higher. It motivates me to try it. Maybe warmup is the key to using such high rates.
@itsliupeng I have one doubt. In the MobileNetV2 paper, the learning rate is kept at 0.045 even though there are 16 (asynchronous) GPUs, each with a batch size of 96. My question is: why isn't the learning rate scaled up like you have done? "MobileNetV2: Inverted Residuals and Linear Bottlenecks", https://arxiv.org/pdf/1801.04381.pdf
Never mind, I read about asynchronous update here: |
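For context, a common way to make such high learning rates stable on synchronous multi-GPU setups is a short linear warmup before the main schedule. A minimal sketch follows; the base lr, 5-epoch warmup, and 120-epoch cosine decay are assumed values, not anything confirmed in this thread:

```python
# Sketch of linear warmup followed by cosine decay, a common trick for
# high-lr / large-batch training; base_lr, warmup_epochs, and epochs are assumptions.
import math
import torch
import torchvision

base_lr = 0.4          # assumed scaled lr for a large global batch
warmup_epochs = 5
epochs = 120

model = torchvision.models.mobilenet_v2()
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0.9, weight_decay=4e-5)

def lr_lambda(epoch):
    # Linear ramp from ~0 to 1 over the warmup epochs, then cosine decay to 0.
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```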
I used the default parameters in your code, but I did not get your results. The results are as follows:
Test: [0/196] Time 5.409 (5.409) Loss 0.8694 (0.8694) Prec@1 79.297 (79.297) Prec@5 92.578 (92.578)
Test: [10/196] Time 0.603 (1.098) Loss 1.4834 (1.1204) Prec@1 60.938 (71.804) Prec@5 85.156 (89.666)
Test: [20/196] Time 2.040 (1.014) Loss 1.2063 (1.1246) Prec@1 76.953 (72.024) Prec@5 87.500 (89.844)
Test: [30/196] Time 0.090 (0.925) Loss 1.1544 (1.1004) Prec@1 67.578 (72.228) Prec@5 91.406 (90.373)
Test: [40/196] Time 0.090 (0.895) Loss 1.1766 (1.1700) Prec@1 69.141 (69.769) Prec@5 91.406 (90.139)
Test: [50/196] Time 0.139 (0.864) Loss 0.8123 (1.1642) Prec@1 79.297 (69.447) Prec@5 95.703 (90.640)
Test: [60/196] Time 0.145 (0.877) Loss 1.4835 (1.1626) Prec@1 60.938 (69.454) Prec@5 90.234 (90.843)
Test: [70/196] Time 0.501 (0.863) Loss 1.1081 (1.1471) Prec@1 72.266 (70.054) Prec@5 91.797 (91.065)
Test: [80/196] Time 1.244 (0.866) Loss 2.0545 (1.1744) Prec@1 50.000 (69.517) Prec@5 79.297 (90.615)
Test: [90/196] Time 2.430 (0.871) Loss 2.7312 (1.2613) Prec@1 37.891 (67.801) Prec@5 69.141 (89.423)
Test: [100/196] Time 0.107 (0.851) Loss 2.3366 (1.3372) Prec@1 42.969 (66.286) Prec@5 73.047 (88.285)
Test: [110/196] Time 0.100 (0.852) Loss 1.3854 (1.3681) Prec@1 67.578 (65.819) Prec@5 86.328 (87.767)
Test: [120/196] Time 0.099 (0.847) Loss 2.1421 (1.3998) Prec@1 53.516 (65.357) Prec@5 75.391 (87.206)
Test: [130/196] Time 0.653 (0.844) Loss 1.3761 (1.4418) Prec@1 67.188 (64.474) Prec@5 87.891 (86.650)
Test: [140/196] Time 0.102 (0.834) Loss 1.7194 (1.4745) Prec@1 58.984 (63.860) Prec@5 82.031 (86.212)
Test: [150/196] Time 0.096 (0.832) Loss 1.7810 (1.5061) Prec@1 66.016 (63.351) Prec@5 81.250 (85.741)
Test: [160/196] Time 0.468 (0.830) Loss 1.4580 (1.5287) Prec@1 69.141 (62.963) Prec@5 85.156 (85.355)
Test: [170/196] Time 1.068 (0.833) Loss 1.2060 (1.5562) Prec@1 69.922 (62.358) Prec@5 90.234 (84.937)
Test: [180/196] Time 0.259 (0.826) Loss 1.4454 (1.5751) Prec@1 59.766 (61.991) Prec@5 90.234 (84.647)
Test: [190/196] Time 0.212 (0.827) Loss 1.5322 (1.5684) Prec@1 57.812 (62.089) Prec@5 88.281 (84.757)