
Empty grad fix #291

Merged: 4 commits, Jul 15, 2020
Conversation

@jeffra (Collaborator) commented Jul 15, 2020

This fixes a case where empty gradients are imbalanced across ranks. Example: rank 0 has 2 parameters with gradients, while rank 1 has 1 parameter with gradients and 1 parameter whose grad is None. Because we were previously ignoring all grads that were None, the all-reduce was imbalanced in size. Instead, we now pad all None gradients with zero tensors. Unfortunately, this imbalance did not cause a crash; it caused a hang in the first process that tried to access the reduced gradient data.
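The padding logic described above can be sketched as follows. This is an illustrative sketch only, not the actual code from this PR: `Param`, `zeros_like`, and `collect_grads_for_allreduce` are hypothetical stand-ins (the real implementation works with `torch.Tensor` gradients, `torch.zeros_like`, and a distributed all-reduce).

```python
def zeros_like(shape):
    """Stand-in for torch.zeros_like: a zero-filled flat buffer."""
    n = 1
    for d in shape:
        n *= d
    return [0.0] * n

class Param:
    """Minimal stand-in for a parameter with an optional .grad."""
    def __init__(self, shape, grad=None):
        self.shape = shape
        self.grad = grad

def collect_grads_for_allreduce(params):
    """Return one gradient buffer per parameter.

    Before the fix, parameters whose .grad was None were skipped, so ranks
    with different numbers of None grads contributed different-sized inputs
    to the all-reduce, and the collective hung. With the fix, a None grad
    is replaced by a zero tensor of the parameter's shape, so every rank
    contributes the same number (and sizes) of buffers.
    """
    return [p.grad if p.grad is not None
            else zeros_like(p.shape) for p in params]

# rank 0: both parameters have gradients
rank0 = [Param((2, 2), grad=[1.0] * 4), Param((3,), grad=[1.0] * 3)]
# rank 1: the second parameter's grad is None (e.g. unused this step)
rank1 = [Param((2, 2), grad=[2.0] * 4), Param((3,), grad=None)]

bufs0 = collect_grads_for_allreduce(rank0)
bufs1 = collect_grads_for_allreduce(rank1)

# Both ranks now produce matching buffer counts and sizes,
# and the zero padding leaves the reduced sum unchanged.
assert [len(b) for b in bufs0] == [len(b) for b in bufs1] == [4, 3]
```

Padding with zeros is safe for a sum/average all-reduce because a zero contribution leaves the reduced result for that parameter unchanged on the ranks that do have gradients.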

@jeffra jeffra requested review from tjruwase and samyam July 15, 2020 19:19
Review comment on tests/unit/test_fp16.py (outdated, resolved)
@jeffra jeffra merged commit 376818e into master Jul 15, 2020
@jeffra jeffra deleted the jeffra/empty-grad branch July 15, 2020 21:15
4 participants