I set batchsize=256 and lr=0.1, but the training result (top-1 acc: 77.64) is 0.6 points lower than the one reported in the paper (top-1 acc: 78.24). More details about the hyperparameters are listed below. The epoch settings are converted from the iteration counts given in the paper. With a batch size of 256, one epoch is about 5k iterations. According to the paper, the learning rate should therefore decay at epochs 200k/5k=40, 400k/5k=80, and 500k/5k=100, and training should terminate at epoch 530k/5k=106.
The learning rate is divided by 10 at 200k, 400k, 500k iterations. We terminate training at 530k iterations.
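To make the conversion easy to check, here is a minimal sketch of the arithmetic in plain Python. It assumes roughly 5k iterations per epoch at batch size 256, as stated above; the names `iterations_to_epochs` and `lr_at_epoch` are hypothetical helpers written for illustration and are not taken from the paper or from this repository's code.

```python
# Assumption: ~5k iterations per epoch at batch size 256 (rounded, as in the issue text).
ITERS_PER_EPOCH = 5_000


def iterations_to_epochs(iteration, iters_per_epoch=ITERS_PER_EPOCH):
    """Convert an iteration count from the paper into an epoch index."""
    return iteration // iters_per_epoch


# Schedule quoted from the paper, expressed in iterations.
decay_iterations = [200_000, 400_000, 500_000]
final_iteration = 530_000

milestone_epochs = [iterations_to_epochs(i) for i in decay_iterations]
total_epochs = iterations_to_epochs(final_iteration)


def lr_at_epoch(epoch, base_lr=0.1, milestones=milestone_epochs, gamma=0.1):
    """Step schedule: multiply the learning rate by gamma at each milestone reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


if __name__ == "__main__":
    print("decay epochs:", milestone_epochs)
    print("total epochs:", total_epochs)
    for epoch in (0, 40, 80, 100):
        print(f"epoch {epoch}: lr = {lr_at_epoch(epoch):.5f}")
```

If the actual number of iterations per epoch differs from 5k (for example because the dataset size is not an exact multiple of the batch size), the milestone epochs shift slightly, so it may be worth comparing the schedule in iterations rather than epochs.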