Bug fix for norm calculation in absence of model parallel group (#551)
In the absence of a model parallel group, model_parallel_allreduce should not perform any reduction. This commit fixes a bug where the all-reduce was incorrectly performed across the world group when the model parallel group is None.
samyam authored Nov 23, 2020
1 parent bcd56f9 commit 00c3a25
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion deepspeed/runtime/zero/stage2.py
@@ -1198,7 +1198,7 @@ def _model_parallel_all_reduce(self, tensor, op):
         """ Perform all reduce within model parallel group, if any.
         """
         if self.model_parallel_group is None:
-            torch.distributed.all_reduce(tensor=tensor, op=op)
+            pass
         else:
             torch.distributed.all_reduce(tensor=tensor,
                                          op=op,
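For context, here is a minimal sketch of what the fixed helper looks like after this change. This is not a verbatim copy of stage2.py: the surrounding class is a stand-in, and passing `group=self.model_parallel_group` in the else branch is assumed from the truncated diff continuation.

```python
import torch.distributed as dist


class _NormHelperSketch:
    """Sketch only, not the actual DeepSpeed stage2 class: holds an
    optional model parallel process group used when reducing norms."""

    def __init__(self, model_parallel_group=None):
        self.model_parallel_group = model_parallel_group

    def _model_parallel_all_reduce(self, tensor, op):
        """Perform all-reduce within the model parallel group, if any.

        With no model parallel group, the tensor is left untouched; before
        this fix, the reduction incorrectly spanned the entire world group.
        """
        if self.model_parallel_group is None:
            pass  # no model parallelism: nothing to reduce across
        else:
            dist.all_reduce(tensor=tensor,
                            op=op,
                            group=self.model_parallel_group)
```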
