
Reduce the training loss #84

Closed

Jiayuan-Gu opened this issue Nov 1, 2018 · 1 comment

Comments

Jiayuan-Gu commented Nov 1, 2018

❓ Questions and Help

Is it necessary to reduce the training losses across multiple processes in trainer.py?

I think the current reduction of the losses only works for logging, not for the backward pass.

Could anyone explain which parts of the code average the gradients from all processes during distributed training?

fmassa (Contributor) commented Nov 1, 2018

The reduction in the losses is indeed only needed for logging purposes, and not for training.
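For concreteness, here is a minimal sketch of that kind of logging-only reduction (the function name `reduce_loss_dict_for_logging` is illustrative, not the exact helper in trainer.py): each rank sums the loss values over all processes and divides by the world size, and the result is detached from the graph so it never affects backward.

```python
import torch
import torch.distributed as dist

def reduce_loss_dict_for_logging(loss_dict):
    """Average each loss over all processes so the logged values are
    comparable across ranks; the returned tensors are NOT used for backward."""
    world_size = dist.get_world_size()
    if world_size < 2:
        return loss_dict
    with torch.no_grad():  # detach from the graph: logging only
        names = sorted(loss_dict.keys())
        values = torch.stack([loss_dict[name] for name in names])
        dist.all_reduce(values)      # sum across processes (default op is SUM)
        values /= world_size         # turn the sum into an average
        return {name: value for name, value in zip(names, values)}
```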

The gradient averaging is handled by DistributedDataParallel, and is done automatically for us when we call .backward.
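As a rough sketch of how that looks in a training loop (`build_model` and `data_loader` are placeholders, not the repository's actual code), the model is wrapped in `DistributedDataParallel`, each process computes only its local loss, and the gradient all-reduce happens inside `.backward()`:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

dist.init_process_group(backend="nccl")
device = torch.device("cuda", torch.cuda.current_device())

model = build_model().to(device)      # hypothetical model constructor
model = DistributedDataParallel(model, device_ids=[device.index])
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for images, targets in data_loader:   # hypothetical per-rank data loader
    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())  # local, per-process loss
    optimizer.zero_grad()
    losses.backward()   # DDP all-reduces and averages gradients across ranks here
    optimizer.step()    # every rank then applies the same averaged update
```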

I'm closing this issue as I believe I have answered your question, but let me know if something isn't clear.

fmassa closed this as completed Nov 1, 2018