
Reduce the training loss #84

Closed

Jiayuan-Gu opened this issue Nov 1, 2018 · 1 comment

Comments

Jiayuan-Gu commented Nov 1, 2018

❓ Questions and Help

Is it necessary to reduce the training losses across multiple processes in trainer.py?

I think the current reduction of the losses only works for logging, not for the backward pass.

Could anyone explain which parts of the code average the gradients from all processes during distributed training?

fmassa (Contributor) commented Nov 1, 2018

The reduction in the losses is indeed only needed for logging purposes, and not for training.
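For concreteness, here is a minimal sketch of that kind of logging-only reduction (the function name `reduce_loss_dict_for_logging` is illustrative, not the exact helper in trainer.py): each rank sums the loss values over all processes and divides by the world size, and the result is detached from the graph so it never affects backward.

```python
import torch
import torch.distributed as dist

def reduce_loss_dict_for_logging(loss_dict):
    """Average each loss over all processes so the logged values are
    comparable across ranks; the returned tensors are NOT used for backward."""
    world_size = dist.get_world_size()
    if world_size < 2:
        return loss_dict
    with torch.no_grad():  # detach from the graph: logging only
        names = sorted(loss_dict.keys())
        values = torch.stack([loss_dict[name] for name in names])
        dist.all_reduce(values)      # sum across processes (default op is SUM)
        values /= world_size         # turn the sum into an average
        return {name: value for name, value in zip(names, values)}
```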

The gradient averaging is handled by DistributedDataParallel, and is done automatically for us when we call .backward.
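As a rough sketch of how that looks in a training loop (`build_model` and `data_loader` are placeholders, not the repository's actual code), the model is wrapped in `DistributedDataParallel`, each process computes only its local loss, and the gradient all-reduce happens inside `.backward()`:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

dist.init_process_group(backend="nccl")
device = torch.device("cuda", torch.cuda.current_device())

model = build_model().to(device)      # hypothetical model constructor
model = DistributedDataParallel(model, device_ids=[device.index])
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for images, targets in data_loader:   # hypothetical per-rank data loader
    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())  # local, per-process loss
    optimizer.zero_grad()
    losses.backward()   # DDP all-reduces and averages gradients across ranks here
    optimizer.step()    # every rank then applies the same averaged update
```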

I'm closing this issue as I believe I have answered your question, but let me know if something isn't clear.

fmassa closed this as completed Nov 1, 2018