
Questions about learning rate decay #20

Open

Jiaolong opened this issue Dec 13, 2019 · 0 comments

@Jiaolong

Section 4.1 of the paper says:

η_p = η_0 · (1 + α·p)^(−β), where p is the training progress changing linearly from 0 to 1.

I understand this to mean p = iter_num / max_iters.

However, the following PyTorch code:

lr = lr * (1 + gamma * iter_num) ** (-power)

and the TensorFlow code as well:
https://github.com/thuml/CDAN/blob/master/tensorflow/train.py#L13

both use p = iter_num instead of p = iter_num / max_iters.

Can you explain this? Thanks!
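
For context, here is a minimal sketch of how the two parameterizations relate. The helper names (lr_paper, lr_code) and the constants (alpha = 10, beta = 0.75, gamma = 0.001, max_iters = 10000) are my own assumptions for illustration, not values taken from the repo's configs; the point is only that the two schedules coincide whenever gamma = alpha / max_iters:

```python
# Sketch only: contrasts the paper's progress-based schedule with the
# iteration-based one in the code. alpha/beta are the DANN-style values
# often quoted with this formula, and gamma/power mirror the snippet
# above; all four are assumptions, not the repo's actual configs.

def lr_paper(lr0, iter_num, max_iters, alpha=10.0, beta=0.75):
    """Paper's schedule: eta_p = eta_0 * (1 + alpha * p) ** (-beta), p in [0, 1]."""
    p = iter_num / max_iters
    return lr0 * (1 + alpha * p) ** (-beta)

def lr_code(lr0, iter_num, gamma=0.001, power=0.75):
    """Code's schedule: lr = lr0 * (1 + gamma * iter_num) ** (-power)."""
    return lr0 * (1 + gamma * iter_num) ** (-power)

# The two coincide whenever gamma == alpha / max_iters, because
# gamma * iter_num == alpha * (iter_num / max_iters) == alpha * p.
# E.g. with max_iters = 10000 and alpha = 10, gamma = 0.001 matches the paper.
for it in (0, 2500, 5000, 10000):
    assert abs(lr_paper(0.01, it, 10000) - lr_code(0.01, it)) < 1e-12
    print(it, lr_code(0.01, it))
```

So the iteration-based form can reproduce the paper's schedule, but only if gamma is rescaled by max_iters; with a fixed gamma the decay no longer adapts to the total training length.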
