Thank you for sharing this wonderful work! I have some questions about the training details.
In the paper, the entire model is trained on a single A6000 GPU for 40,000 iterations with a batch size of 16. What exactly does "iteration" mean here? Was it expressed as 40,000 iterations with a batch size of 16 because there are approximately 600,000 scenes? Since there are about 8 million images across those 600,000 scenes, a batch size of 16 would imply far more than 40,000 iterations per epoch, which is what confuses me about this figure. Also, the number of iterations does not appear anywhere in the config file.
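For context, here is the rough arithmetic behind my confusion, assuming one iteration corresponds to one optimizer step over a single batch and that the ~8 million images form the per-epoch sample pool (both are my assumptions, not figures confirmed by the paper):

```python
# Back-of-the-envelope check (assumption: one "iteration" = one optimizer step
# on one batch of 16 images, with ~8M images across ~600k scenes as the epoch pool).
num_images = 8_000_000   # approximate total images
batch_size = 16

iters_per_epoch = num_images // batch_size
print(iters_per_epoch)   # 500,000 -- far more than the 40,000 iterations reported
```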
In addition, is the model trained for 20 epochs? I ask because the paper only discusses the number of iterations and does not mention epochs.
My other question concerns training time. When I train the model on 4 RTX 4090s with a batch size of 8, it takes much longer than the time reported in the paper; in my case, one epoch takes almost 3 days.
Finally, when running the official code without modifying the training scheme, the model converges very quickly. Starting from the first validation at 500 steps, I obtain the following metric results. Is this normal?