
What is 'Scale loss by nominal batch_size of 64'? #507

Closed
maxmx911 opened this issue Sep 19, 2019 · 9 comments

@maxmx911

Hi, may I ask what the purpose of this 'scale loss by nominal batch_size' is, and where the value '64' comes from?

It is from train.py, lines 273 and 274:

# Scale loss by nominal batch_size of 64
loss *= batch_size / 64
@glenn-jocher
Member

64 is the nominal batch size of darknet. This way you can use different --batch-size and --accumulate combinations to maintain an effective 64-image batch size even on smaller graphics cards.
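
As a rough sketch of how this plays out in a training loop (a minimal illustration with stand-in model, data, and variable names, not the actual train.py code):

import torch
from torch import nn, optim

# --- Illustrative stand-ins for the real model/data ---
model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

batch_size = 2            # images per forward/backward pass (--batch-size)
accumulate = 32           # mini-batches to accumulate before an optimizer step (--accumulate)
nominal_batch_size = 64   # darknet's default batch size

optimizer.zero_grad()
for i in range(accumulate * 4):                      # a few optimizer steps' worth of data
    x, y = torch.randn(batch_size, 10), torch.randn(batch_size, 1)
    loss = criterion(model(x), y)

    # Scale the loss so the gradient summed over 'accumulate' mini-batches
    # matches the magnitude of one nominal 64-image batch
    loss *= batch_size / nominal_batch_size
    loss.backward()                                   # gradients accumulate in .grad

    if (i + 1) % accumulate == 0:                     # step only every 'accumulate' mini-batches
        optimizer.step()
        optimizer.zero_grad()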

@maxmx911
Author

Ah, I see. Does that mean I should only use a combination of --batch-size and --accumulate whose product is 64? I was going through the CUSTOM TRAINING EXAMPLE and SINGLE-CLASS TRAINING EXAMPLE, and the --batch-size and --accumulate values you specified there don't multiply to 64.

@glenn-jocher
Member

@maxmx911 you can experiment with different batch sizes also of course.

If you comment out that line, then smaller batch sizes will produce faster training, but with a worse plateau. Larger batch-sizes produce slower but less noisy training and tend to plateau to better results.

The tutorials do use smaller batch sizes because their datasets are very tiny, i.e. only 16 or 64 images in the dataset.

@maxmx911
Author

@glenn-jocher I'm sorry I don't really understand, I'm still very new to this.

Will the result be the same if I keep loss *= batch_size / 64 as it is (uncommented), while using

--batch-size 64 --accumulate 1 

and also

--batch-size 2 --accumulate 32

@glenn-jocher
Member

glenn-jocher commented Sep 20, 2019

@maxmx911 yes, with the code as it is now, your two settings are equivalent. The second will train more slowly but use less GPU memory, allowing you to train on larger images, for example.
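
To make the equivalence concrete, here is a tiny illustrative calculation (the numbers are just the two settings above, the variable names are made up for this sketch):

batch_size_a, accumulate_a = 64, 1
batch_size_b, accumulate_b = 2, 32

# Images contributing to each optimizer step
print(batch_size_a * accumulate_a)        # 64
print(batch_size_b * accumulate_b)        # 64

# Total loss scaling accumulated per optimizer step with loss *= batch_size / 64
print(accumulate_a * batch_size_a / 64)   # 1.0
print(accumulate_b * batch_size_b / 64)   # 1.0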

@maxmx911
Author

Okay, so suppose I change the effective batch size to 12 (4 * 3 = 12) by using

--batch-size 4 --accumulate 3

since I'm training with a small dataset of 100+ images as well. Do I need to change anything in the line loss *= batch_size / 64, like changing 64 to 12, given that my batch size is no longer 64 but 12?

@glenn-jocher
Member

@maxmx911 just leave it.

@maxmx911
Author

@glenn-jocher Is the purpose of dividing by 64 that the original darknet is configured with a batch size of 64, so if I use any batch size other than 64, dividing by 64 makes the training behave as if it were run with a batch size of 64?

If that's not the case, could you explain a little about why it is divided by 64?

Again, sorry for the questions; I'm new to this and trying to make sense of it.

@glenn-jocher
Member

@maxmx911 it's simply the darknet default. Take it or leave it, it really depends on your own custom situation. Try it out both ways and use what works best.
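
As an illustration of why the 64 can stay as-is (made-up numbers, not from the thread): with --batch-size 4 --accumulate 3, the scaling simply makes each optimizer step contribute in proportion to the 12 images it actually saw, relative to the nominal 64:

batch_size, accumulate, nominal = 4, 3, 64

# Images contributing to one optimizer step
print(batch_size * accumulate)            # 12

# Total loss scaling accumulated per optimizer step with loss *= batch_size / 64
print(accumulate * batch_size / nominal)  # 0.1875, i.e. 12/64 of a nominal step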
