Is ZeRO-3 compatible with tensor parallelism? #4877
Answered by tjruwase. skyshine102 asked this question in Q&A.
According to recent blog posts from AWS and NVIDIA NeMo, AWS has integrated Transformer Engine's tensor parallelism with FSDP, which is a "similar" algorithm to DeepSpeed ZeRO-3.
(Really looking forward to seeing these hybrid sharding technologies become easily accessible.)
Answered by tjruwase on Jan 2, 2024
@skyshine102, ZeRO-3 can be combined with tensor parallelism, and we validated this combination in our Megatron-DeepSpeed repo a while ago. This tutorial might be helpful: https://www.deepspeed.ai/tutorials/zero/
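For readers who want a concrete starting point, here is a minimal sketch of a DeepSpeed-style config with ZeRO stage 3 enabled. It is not taken from the Megatron-DeepSpeed repo. It assumes a Megatron-style setup in which tensor parallelism splits each layer across `tensor_parallel_size` GPUs and ZeRO-3 shards parameters, gradients, and optimizer states across the remaining data-parallel ranks. The helper name `build_zero3_config` and its arguments are hypothetical; only the config keys are standard DeepSpeed settings.

```python
import json


def build_zero3_config(world_size, tensor_parallel_size,
                       micro_batch_per_gpu=1, grad_accum_steps=1):
    """Return a DeepSpeed-style config dict with ZeRO stage 3 enabled.

    Hypothetical helper: assumes ZeRO shards over the data-parallel group,
    whose size is world_size // tensor_parallel_size.
    """
    if world_size % tensor_parallel_size != 0:
        raise ValueError("world_size must be divisible by tensor_parallel_size")

    # GPUs not used for tensor parallelism form the data-parallel dimension.
    data_parallel_size = world_size // tensor_parallel_size

    return {
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        # Global batch = micro batch * accumulation steps * data-parallel ranks.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * data_parallel_size,
        "zero_optimization": {"stage": 3},
        "bf16": {"enabled": True},
    }


if __name__ == "__main__":
    # Example: 16 GPUs, 4-way tensor parallel, so 4 data-parallel ranks.
    print(json.dumps(build_zero3_config(16, 4, 2, 4), indent=2))
```

The resulting dict could be written to a JSON file and passed to the training launcher. How the tensor-parallel groups are communicated to DeepSpeed depends on the training framework, so check the Megatron-DeepSpeed examples for the exact wiring.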
Answer selected by skyshine102