pytorch_lightning, DDP, GPU stucked at 100%, training stopped #29

Open
anonymoussky opened this issue Oct 27, 2023 · 0 comments

@anonymoussky

Have you encountered this issue? Any suggestions?
After the first epoch finishes, somewhere in the middle of the second epoch, all GPUs get stuck at 100% utilization without raising any error, and training stops making progress. This seems to be a common bug when using pytorch_lightning with DDP, but I have not found a solution yet.

Lightning-AI/pytorch-lightning#11242

@anonymoussky anonymoussky changed the title pytorch_lightning, DDP, GPU stucked at 100% pytorch_lightning, DDP, GPU stucked at 100%, training stopped Oct 27, 2023