With ZeRO Stage 3, is it recommended not to use intra-layer model parallelism (Megatron-LM)? #948
SantoshGuptaML started this conversation in General

Since ZeRO at Stage 3 also partitions the model parameters, my intuition is that intra-layer model parallelism would not increase memory efficiency, and might even interfere with ZeRO's efficiency, since every GPU would then be tied to a very specific set of operations while the model states are already evenly distributed (memory-wise) among the GPUs.
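For concreteness, here is a minimal sketch of what Stage 3 alone does, assuming a plain PyTorch model with no Megatron-style tensor slicing (the model shape, batch size, and learning rate below are placeholders, not values from any real setup):

```python
import torch
import deepspeed

# Minimal ZeRO Stage 3 sketch: parameters, gradients, and optimizer
# states are each partitioned across the data-parallel ranks, so every
# GPU persistently holds roughly a 1/N shard of the model states --
# without splitting the layer computation itself the way Megatron does.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,  # placeholder value
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},  # placeholder value
    "zero_optimization": {"stage": 3},
}

# Placeholder model standing in for a real transformer.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
)

# deepspeed.initialize wraps the model in the ZeRO engine; when launched
# with the `deepspeed` CLI, each rank keeps only its own parameter shard
# and gathers full layers on the fly during forward/backward.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

As I understand it, the usual argument for still layering tensor parallelism on top of this concerns activation memory and layers too large to compute on one GPU, not model-state memory, which Stage 3 already spreads evenly.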
Replies: 1 comment
-
Hi @SantoshGuptaML! Could you please point me to an example where ZeRO is used with Megatron-LM (model parallelism)? Does the Megatron-Deepspeed repository use ZeRO? I didn't see a ZeRO stage being set in the examples in the Megatron-Deepspeed repo.