Is it true that zero_optimization stage 2 can't work with pipeline? #568
Comments
Have you tried dropping "reduce_bucket_size": 5000000?
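(For reference, the suggestion amounts to removing that key so DeepSpeed falls back to its default reduce bucket size. A minimal sketch of the resulting zero_optimization block, with every other value assumed rather than taken from the poster's actual config:)

```json
{
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": false
  }
}
```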
Thank you very much, I will try it.
No problem! You can find my code here.
Thanks, your gpt-neox is a wonderful project.
Maybe fixed by #677
Why thank you! I'm quite excited about it :)
Hi StellaAthena, according to my experiments, ZeRO stage 2 is not compatible with pipeline. Can you verify that your example really uses ZeRO stage 2 successfully?
Did you use my code or did you use the official DS code? I believe that the GPT-NeoX + DeeperSpeed codebase has some necessary bug fixes that are yet to be integrated into the master branch of DeepSpeed.
Hi, I used this example: https://github.com/microsoft/DeepSpeedExamples/tree/master/Megatron-LM-v1.1.5-3D_parallelism
Could you please link your DeepSpeed bug-fix pull requests here? Do you mean this one: https://github.com/microsoft/DeepSpeed/pull/677/files ? Thanks a lot.
This is my config:
Is it because the pipeline already partitions the optimizer state and gradients, so there is no need for zero_optimization partitioning? (After checking the code, the answer is no: pipeline parallelism partitions the whole model across stages, including layers, activations, gradients, and optimizer state, while zero_optimization can further partition gradients and optimizer state within each pipeline stage.) But if they fail to work together, then how can cpu_offload be used?
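(As an illustration only, since the original config did not survive in this thread: a DeepSpeed 0.3.x-style config that tries to combine pipeline parallelism with ZeRO stage 2 and cpu_offload would look roughly like the sketch below. All values here are assumptions, not the poster's settings.)

```json
{
  "train_batch_size": 32,
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 2,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "contiguous_gradients": true,
    "overlap_comm": true,
    "reduce_bucket_size": 5000000,
    "cpu_offload": true
  }
}
```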
The error message is:
When contiguous_gradients is true, the error message is:
When the ZeRO stage is 1, the pipeline works fine.
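(For comparison, the stage 1 setting that reportedly works alongside the pipeline engine; again only a sketch with assumed values:)

```json
{
  "zero_optimization": {
    "stage": 1,
    "reduce_bucket_size": 5000000
  }
}
```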
Environment:
python 3.6
torch 1.6.0
deepspeed 0.3.7