Bug fix for norm calculation in absence of model parallel group (#551)
In the absence of a model parallel group, model_parallel_allreduce should not perform any reduction. This commit fixes a bug where the all-reduce was incorrectly performed across the world group when the model parallel group is None.
samyam authored Nov 23, 2020
1 parent bcd56f9 commit 00c3a25
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion deepspeed/runtime/zero/stage2.py
@@ -1198,7 +1198,7 @@ def _model_parallel_all_reduce(self, tensor, op):
         """ Perform all reduce within model parallel group, if any.
         """
         if self.model_parallel_group is None:
-            torch.distributed.all_reduce(tensor=tensor, op=op)
+            pass
         else:
             torch.distributed.all_reduce(tensor=tensor,
                                          op=op,
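For context, here is a minimal sketch of what the fixed helper looks like after this change. This is not a verbatim copy of stage2.py: the surrounding class is a stand-in, and passing `group=self.model_parallel_group` in the else branch is assumed from the truncated diff continuation.

```python
import torch.distributed as dist


class _NormHelperSketch:
    """Sketch only, not the actual DeepSpeed stage2 class: holds an
    optional model parallel process group used when reducing norms."""

    def __init__(self, model_parallel_group=None):
        self.model_parallel_group = model_parallel_group

    def _model_parallel_all_reduce(self, tensor, op):
        """Perform all-reduce within the model parallel group, if any.

        With no model parallel group, the tensor is left untouched; before
        this fix, the reduction incorrectly spanned the entire world group.
        """
        if self.model_parallel_group is None:
            pass  # no model parallelism: nothing to reduce across
        else:
            dist.all_reduce(tensor=tensor,
                            op=op,
                            group=self.model_parallel_group)
```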
