Hi,

I am trying to use ggml to train a small-scale model (MNIST size) with the ADAM optimizer. I have a question about where to scale the loss function by 1/minibatch_size. In general, should minibatch_size always be accounted for in the loss function calculation (i.e., should the gradient for each parameter be the average over all minibatch_size samples)? Sorry if this question sounds naive. Thanks!

Edit: the batch size does not matter, since it is the loss function's own gradient (which is 1) that gets propagated back. One can always divide by the batch size to get the loss per data point.
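For what it's worth, here is a minimal NumPy sketch (not ggml code, and only an illustration under toy assumptions: a linear model with a squared-error loss) of the two relevant facts: using the mean loss instead of the sum only rescales every gradient by 1/minibatch_size, and Adam's per-parameter normalization makes the resulting update almost invariant to that constant factor, up to the eps term.

```python
import numpy as np

rng = np.random.default_rng(0)
B, D = 32, 10                      # minibatch size, number of parameters
w = rng.normal(size=D)             # toy linear model: y_hat = X @ w
X = rng.normal(size=(B, D))
y = rng.normal(size=B)

def grad_squared_error(w, reduce):
    """Gradient of the squared error, either summed or averaged over the batch."""
    err = X @ w - y                          # residuals, shape (B,)
    g = 2.0 * X.T @ err                      # gradient of the *sum* of squared errors
    return g / B if reduce == "mean" else g

g_sum  = grad_squared_error(w, "sum")
g_mean = grad_squared_error(w, "mean")
print(np.allclose(g_mean, g_sum / B))        # True: the mean loss just divides by B

def adam_step(g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; returns the parameter delta and the updated moments."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return -lr * m_hat / (np.sqrt(v_hat) + eps), m, v

step_sum, _, _  = adam_step(g_sum,  np.zeros(D), np.zeros(D), t=1)
step_mean, _, _ = adam_step(g_mean, np.zeros(D), np.zeros(D), t=1)
print(np.max(np.abs(step_sum - step_mean)))  # tiny: Adam mostly absorbs the 1/B factor
```

The practical reason to average is that the effective learning rate then does not implicitly depend on the batch size; with plain SGD (no per-parameter normalization) the 1/B factor would directly rescale the step size.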