Details about pipeline parallelism implementation in DeepSpeed #1110
Comments
Hi @ParamsRaman,
Please re-open if not resolved.
@jeffra @ShadenSmith
It would be really helpful if you could summarize the latest status on this. Thanks!
@ParamsRaman I think PR #980 says pipeline parallelism is incompatible with ZeRO-2 and ZeRO-3. That PR (#980) was merged later than the one you mentioned (#677).
@hyunwoongko I'm still a bit confused. Do you mean that, since this PR was merged later, PP + ZeRO-2/3 now works in DeepSpeed? Or is it still an open problem?
Nope. PP + ZeRO-2/3 is not possible. PP needs to accumulate gradients across micro-batches, but ZeRO-2 needs to partition (chunk) the gradients, so the two are not compatible. Even if it could be implemented, there would be no real performance improvement.
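One way to read the argument above is as a communication-count problem. The toy sketch below is not DeepSpeed code: plain Python lists stand in for gradient tensors, and the function names and the contiguous shard layout are hypothetical. It compares accumulating a full gradient buffer over all micro-batches and reducing once, against reducing and sharding after every micro-batch. Both give the same shard, but the second needs one reduction per micro-batch.

```python
# Toy illustration only; no DeepSpeed or torch.distributed calls are used.
# grads[r][m] is the gradient that data-parallel rank r computes for micro-batch m.

def accumulate_full(micro_grads):
    """Pipeline-style accumulation: keep one full-size buffer across micro-batches."""
    total = [0.0] * len(micro_grads[0])
    for g in micro_grads:
        total = [t + x for t, x in zip(total, g)]
    return total

def shard(vec, rank, world_size):
    """Contiguous slice of vec owned by `rank` (hypothetical shard layout)."""
    chunk = len(vec) // world_size
    return vec[rank * chunk:(rank + 1) * chunk]

def reduce_once_then_shard(grads, rank):
    """Accumulate locally on every rank, then do a single cross-rank reduction."""
    world_size = len(grads)
    local = [accumulate_full(grads[r]) for r in range(world_size)]
    summed = [sum(vals) for vals in zip(*local)]
    return shard(summed, rank, world_size), 1  # (owned shard, number of reductions)

def reduce_every_micro_batch(grads, rank):
    """Keep only the owned shard, which forces a reduction after each micro-batch."""
    world_size = len(grads)
    num_micro = len(grads[0])
    owned = [0.0] * (len(grads[0][0]) // world_size)
    reductions = 0
    for m in range(num_micro):
        summed = [sum(vals) for vals in zip(*(grads[r][m] for r in range(world_size)))]
        reductions += 1
        owned = [o + s for o, s in zip(owned, shard(summed, rank, world_size))]
    return owned, reductions

# Two ranks, two micro-batches, four gradient elements.
grads = [
    [[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]],  # rank 0
    [[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]],  # rank 1
]
print(reduce_once_then_shard(grads, rank=0))
print(reduce_every_micro_batch(grads, rank=0))
```

In this toy model the sharded variant saves buffer memory only by paying for a reduction on every micro-batch, which is consistent with the claim that there would be no real performance improvement.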
Hi,
I have some questions about the pipeline parallelism implementation in DeepSpeed. Can someone help shed some light on the following?
1. Which of the following pipeline schedules does DeepSpeed implement in its code?
(a) The schedule in Figure 2 of the PipeDream-2BW paper (https://arxiv.org/pdf/2006.09503.pdf)
(b) The PipeDream-Flush (1F1B) schedule in Figure 4 (top) of the Megatron 3D paper (https://arxiv.org/pdf/2104.04473.pdf)
(c) The interleaved 1F1B schedule in Figure 4 (bottom) of the Megatron 3D paper (https://arxiv.org/pdf/2104.04473.pdf)
2. Which communication collective primitives are used to implement pipeline parallelism?
3. runtime/pipe/engine.py contains the following comment:
Note: ZeRO-2 and ZeRO-3 are incompatible with pipeline parallelism.
Is this still true in the most recent version of DeepSpeed?
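For readers unfamiliar with option (b), here is a minimal sketch of how a non-interleaved 1F1B (PipeDream-Flush) schedule orders work on one pipeline stage. It is only an illustration of that schedule as commonly described, not a statement about which schedule DeepSpeed implements; the function name `one_f_one_b` and its parameters are hypothetical.

```python
def one_f_one_b(stage, num_stages, num_micro):
    """Return the ordered list of ("F" | "B", micro_batch_id) steps for one stage.

    Assumes a non-interleaved 1F1B schedule: earlier stages run a few warm-up
    forwards, then alternate one forward with one backward, then drain the
    remaining backwards before the flush.
    """
    # Earlier stages need more warm-up forwards before their first backward arrives.
    warmup = min(num_stages - stage - 1, num_micro)
    steps = [("F", i) for i in range(warmup)]
    fwd, bwd = warmup, 0

    # Steady state: one forward followed by one backward.
    for _ in range(num_micro - warmup):
        steps.append(("F", fwd))
        fwd += 1
        steps.append(("B", bwd))
        bwd += 1

    # Cool-down: drain the backwards still outstanding.
    while bwd < num_micro:
        steps.append(("B", bwd))
        bwd += 1
    return steps

# Print the schedule of every stage for 4 stages and 8 micro-batches.
for s in range(4):
    print(s, " ".join(f"{kind}{i}" for kind, i in one_f_one_b(s, 4, 8)))
```

In this sketch the first stage holds at most `num_stages` micro-batches in flight at once, which is the memory property that distinguishes 1F1B from running all forwards before all backwards.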