[BUG] exits with return code = -11 #3989
Comments
@KeepAndWin, unfortunately I cannot reproduce this error because I do not have access to those specific GPU types. Most likely something on that second machine is incompatible with CUDA or with the built-in ops. I suggest trying basic PyTorch + CUDA first to make sure that works.
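One way to run the basic PyTorch + CUDA check suggested above, outside of DeepSpeed, is a small script like the following. This is a minimal sketch, not an official diagnostic: the function name `check_cuda` and the report layout are made up for illustration, and the matrix size is arbitrary. It only uses standard `torch.cuda` calls.

```python
def check_cuda():
    """Return a small report on whether PyTorch can see and use each GPU.

    Hypothetical helper for illustration; not part of DeepSpeed or PyTorch.
    """
    report = {"torch_installed": False, "cuda_available": False, "devices": []}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA version PyTorch was built against; None on CPU-only builds.
    report["built_for_cuda"] = torch.version.cuda

    if not torch.cuda.is_available():
        return report
    report["cuda_available"] = True

    for i in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(i)
        try:
            # Run a small matmul on the device; .item() forces the GPU work
            # to finish so that kernel errors surface here.
            x = torch.randn(1024, 1024, device=f"cuda:{i}")
            (x @ x).sum().item()
            ok = True
        except RuntimeError:
            ok = False
        report["devices"].append({"index": i, "name": name, "matmul_ok": ok})
    return report


if __name__ == "__main__":
    print(check_cuda())
```

If this script itself crashes with a segmentation fault, or reports `matmul_ok: False` on the 3090s, the problem is likely in the PyTorch/CUDA/driver combination rather than in DeepSpeed. Note that a hard crash cannot be caught by the `try` block.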
@KeepAndWin, please see this thread for the latest discussion on this: #4002
It seems both -7 and -11 are related to shared memory issues with Docker. Please see this reply, which has fixed other people's recent issues: #4002 (comment)
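For readers wondering what the numbers mean: assuming the launcher is reporting a Python `subprocess` return code, a negative value -N means the child process was killed by signal N. On x86 Linux, 7 is SIGBUS and 11 is SIGSEGV. The sketch below decodes such a code and reports the size of `/dev/shm`, which is small by default inside Docker containers (64 MB unless raised with `docker run --shm-size=...` or shared via `--ipc=host`). The helper names are hypothetical, and this is not taken from the linked reply.

```python
import shutil
import signal


def describe_return_code(code):
    """Translate a subprocess-style return code into a readable message."""
    if code >= 0:
        return f"exited normally with status {code}"
    try:
        # Negative codes are the negated signal number.
        name = signal.Signals(-code).name
    except ValueError:
        name = f"unknown signal {-code}"
    return f"killed by {name}"


def shm_usage_mib(path="/dev/shm"):
    """Return (total, free) size of the shared-memory mount in MiB.

    Returns None if the path does not exist (for example, on non-Linux hosts).
    """
    try:
        usage = shutil.disk_usage(path)
    except OSError:
        return None
    mib = 1024 * 1024
    return usage.total // mib, usage.free // mib


if __name__ == "__main__":
    print(describe_return_code(-11))
    print(shm_usage_mib())
```

If the reported total is only around 64 MiB inside a container, restarting the container with a larger shared-memory size is a reasonable thing to try before digging further.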
Closing for now.
So how do you solve the issue?
Have you solved this problem? I have run into it as well.
Describe the bug
I have run the code successfully on a machine with 4x 1080 Ti GPUs. However, when I ran the same code on a machine with 2x 3090 GPUs, DeepSpeed reported `Kill subprocess` after `[real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)`. In the end, `exits with return code = -11` was printed.
ds_report output

Screenshots
System info (please complete the following information):