Hi,

I am trying to use ggml to train a small-scale model (MNIST size) with the ADAM optimizer. I have a question about where to scale the loss function by 1/minibatch_size. In general, should minibatch_size always be accounted for in the loss function calculation (i.e., should the gradient for each parameter be the average over all minibatch_size samples)? Sorry if this question sounds naive. Thanks!

Edit: the batch size does not matter, since it is the loss function's own gradient (which is 1) that gets propagated back. One can always divide by the batch size to get the loss per data point.
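For what it's worth, here is a minimal NumPy sketch (not ggml code, and only an illustration under toy assumptions: a linear model with a squared-error loss) of the two relevant facts: using the mean loss instead of the sum only rescales every gradient by 1/minibatch_size, and Adam's per-parameter normalization makes the resulting update almost invariant to that constant factor, up to the eps term.

```python
import numpy as np

rng = np.random.default_rng(0)
B, D = 32, 10                      # minibatch size, number of parameters
w = rng.normal(size=D)             # toy linear model: y_hat = X @ w
X = rng.normal(size=(B, D))
y = rng.normal(size=B)

def grad_squared_error(w, reduce):
    """Gradient of the squared error, either summed or averaged over the batch."""
    err = X @ w - y                          # residuals, shape (B,)
    g = 2.0 * X.T @ err                      # gradient of the *sum* of squared errors
    return g / B if reduce == "mean" else g

g_sum  = grad_squared_error(w, "sum")
g_mean = grad_squared_error(w, "mean")
print(np.allclose(g_mean, g_sum / B))        # True: the mean loss just divides by B

def adam_step(g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; returns the parameter delta and the updated moments."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return -lr * m_hat / (np.sqrt(v_hat) + eps), m, v

step_sum, _, _  = adam_step(g_sum,  np.zeros(D), np.zeros(D), t=1)
step_mean, _, _ = adam_step(g_mean, np.zeros(D), np.zeros(D), t=1)
print(np.max(np.abs(step_sum - step_mean)))  # tiny: Adam mostly absorbs the 1/B factor
```

The practical reason to average is that the effective learning rate then does not implicitly depend on the batch size; with plain SGD (no per-parameter normalization) the 1/B factor would directly rescale the step size.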