When strategy is deepspeed, ZeRO raises a key error and training crashes #36
Comments
Could you provide more information about your configuration, hardware, and environment?
Some details of the Python environment are as follows:
Can you provide more information on how to reproduce this issue?
deepspeed config:
This is an error I encountered when using deepspeed.
OK, thanks for the report.
lightning:
  accelerator: gpu
  devices: -1
  strategy: common.deepspeed._sdxl_strategy
  precision: bf16
For the full config, please refer to config/train_sdxl_deepspeed.yaml
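For context, here is a minimal sketch of what that YAML corresponds to at the Trainer level. It uses Lightning's built-in DeepSpeedStrategy as a stand-in for this repo's common.deepspeed._sdxl_strategy, and the ZeRO stage and offload flags are illustrative assumptions, not values taken from this project.

```python
# Sketch only: DeepSpeedStrategy stands in for common.deepspeed._sdxl_strategy,
# and the stage/offload values are illustrative assumptions.
import lightning.pytorch as pl
from lightning.pytorch.strategies import DeepSpeedStrategy

strategy = DeepSpeedStrategy(
    stage=2,                  # ZeRO stage; adjust to match the project's strategy
    offload_optimizer=False,  # CPU offload of optimizer state can be enabled if memory is tight
)

trainer = pl.Trainer(
    accelerator="gpu",
    devices=-1,               # use all visible GPUs
    strategy=strategy,
    precision="bf16-mixed",   # bf16 mixed precision, matching `precision: bf16`
)
# trainer.fit(model, datamodule=dm)  # model/datamodule come from the project
```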
When we perform multi-machine, multi-GPU training, we hit a GPU out-of-memory error. After troubleshooting, we traced it to this issue. If you can fix it, we would be very grateful.
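Not a fix for the key error itself, but for anyone blocked by the OOM in the meantime, here is a sketch of a more memory-conservative ZeRO setup passed directly to DeepSpeedStrategy. The keys are standard DeepSpeed config options; the stage, offload, and batch-size values are assumptions and need tuning for the actual cluster.

```python
# Assumed workaround sketch: ZeRO stage 3 with CPU offload to reduce per-GPU memory.
from lightning.pytorch.strategies import DeepSpeedStrategy

ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                               # partition optimizer state, gradients, and params
        "offload_optimizer": {"device": "cpu"},   # move optimizer state off the GPU
        "offload_param": {"device": "cpu"},       # move partitioned params off the GPU (stage 3 only)
    },
    "train_micro_batch_size_per_gpu": 1,          # illustrative; set to the real per-GPU batch size
}

strategy = DeepSpeedStrategy(config=ds_config)    # a dict can be passed instead of a JSON path
```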