
What is 'Scale loss by nominal batch_size of 64'? #507

Closed
maxmx911 opened this issue Sep 19, 2019 · 9 comments

@maxmx911

Hi, may I ask what the purpose of this 'scale loss by nominal batch_size' is, and where the value '64' comes from?

It is from train.py, lines 273 and 274:

# Scale loss by nominal batch_size of 64
loss *= batch_size / 64
@glenn-jocher
Member

64 is the nominal batch size of darknet. This way you can use different --batch-size and --accumulate combinations to maintain an effective 64-image batch size even on smaller graphics cards.
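
As a rough sketch of how this plays out in a training loop (a minimal illustration with stand-in model, data, and variable names, not the actual train.py code):

import torch
from torch import nn, optim

# --- Illustrative stand-ins for the real model/data ---
model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

batch_size = 2            # images per forward/backward pass (--batch-size)
accumulate = 32           # mini-batches to accumulate before an optimizer step (--accumulate)
nominal_batch_size = 64   # darknet's default batch size

optimizer.zero_grad()
for i in range(accumulate * 4):                      # a few optimizer steps' worth of data
    x, y = torch.randn(batch_size, 10), torch.randn(batch_size, 1)
    loss = criterion(model(x), y)

    # Scale the loss so the gradient summed over 'accumulate' mini-batches
    # matches the magnitude of one nominal 64-image batch
    loss *= batch_size / nominal_batch_size
    loss.backward()                                   # gradients accumulate in .grad

    if (i + 1) % accumulate == 0:                     # step only every 'accumulate' mini-batches
        optimizer.step()
        optimizer.zero_grad()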

@maxmx911
Author

Ah, I see. Does that mean I should only use a combination of --batch-size and --accumulate whose product is 64? I was going through the CUSTOM TRAINING EXAMPLE and SINGLE-CLASS TRAINING EXAMPLE, and the --batch-size and --accumulate values you specified there don't multiply to 64.

@glenn-jocher
Member

@maxmx911 you can experiment with different batch sizes also of course.

If you comment out that line, then smaller batch sizes will produce faster training, but with a worse plateau. Larger batch-sizes produce slower but less noisy training and tend to plateau to better results.

The tutorials do use smaller batch sizes because their datasets are very tiny, i.e. only 16 or 64 images in the dataset.

@maxmx911
Author

@glenn-jocher I'm sorry I don't really understand, I'm still very new to this.

Will the result be the same if I keep loss *= batch_size / 64 as it is (uncommented), while using

--batch-size 64 --accumulate 1 

and also

--batch-size 2 --accumulate 32

@glenn-jocher
Member

glenn-jocher commented Sep 20, 2019

@maxmx911 yes, with the code as it is now, your two settings are equivalent. The second will train more slowly but use less GPU memory, allowing you to train on larger images, for example.
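
To make the equivalence concrete, here is a tiny illustrative calculation (the numbers are just the two settings above, the variable names are made up for this sketch):

batch_size_a, accumulate_a = 64, 1
batch_size_b, accumulate_b = 2, 32

# Images contributing to each optimizer step
print(batch_size_a * accumulate_a)        # 64
print(batch_size_b * accumulate_b)        # 64

# Total loss scaling accumulated per optimizer step with loss *= batch_size / 64
print(accumulate_a * batch_size_a / 64)   # 1.0
print(accumulate_b * batch_size_b / 64)   # 1.0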

@maxmx911
Author

Okay, so suppose I change the effective batch size to 12 (4 * 3 = 12) by using

--batch-size 4 --accumulate 3

since I'm training with a small dataset of 100+ images as well. Do I need to change anything in the line loss *= batch_size / 64, like changing 64 to 12, given that my batch size is no longer 64 but 12?

@glenn-jocher
Member

@maxmx911 just leave it.

@maxmx911
Author

@glenn-jocher Is the purpose of dividing by 64 that the original darknet is configured with a batch size of 64, so if I use any batch size other than 64, dividing by 64 makes the training behave as if it were run with a batch size of 64?

If that's not the case, could you explain a little about why it is divided by 64?

Again, sorry for the questions; I'm new to this and trying to make sense of it.

@glenn-jocher
Member

@maxmx911 it's simply the darknet default. Take it or leave it, it really depends on your own custom situation. Try it out both ways and use what works best.
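
As an illustration of why the 64 can stay as-is (made-up numbers, not from the thread): with --batch-size 4 --accumulate 3, the scaling simply makes each optimizer step contribute in proportion to the 12 images it actually saw, relative to the nominal 64:

batch_size, accumulate, nominal = 4, 3, 64

# Images contributing to one optimizer step
print(batch_size * accumulate)            # 12

# Total loss scaling accumulated per optimizer step with loss *= batch_size / 64
print(accumulate * batch_size / nominal)  # 0.1875, i.e. 12/64 of a nominal step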
