-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip reconciliate_processes if used within a cluster environment that creates processes externally #9389
Skip reconciliate_processes if used within a cluster environment that creates processes externally #9389
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
req. review from @tchaton
Codecov Report
@@ Coverage Diff @@
## master #9389 +/- ##
=======================================
- Coverage 93% 89% -4%
=======================================
Files 180 180
Lines 14965 15065 +100
=======================================
- Hits 13904 13370 -534
- Misses 1061 1695 +634 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM !
@tchaton @awaelchli I'm not sure why this test is failing:
are special tests run with |
Hey @ananthsub. Interesting. Mind taking |
ah @tchaton @awaelchli - I believe it's due to the check in the LightningEnvironment here: https://github.com/PyTorchLightning/pytorch-lightning/blob/b294c5760eee30a995fcd400127be209b12c4d7c/pytorch_lightning/plugins/environments/lightning_environment.py#L43-L49
So while This was added in #7480 |
pytorch_lightning/plugins/environments/lightning_environment.py
Outdated
Show resolved
Hide resolved
You are right, I should have captured the value like you do in this PR |
pytorch_lightning/plugins/environments/lightning_environment.py
Outdated
Show resolved
Hide resolved
Hey @ananthsub, Do you see blockers to introduce an env variable to enable Lightning to kill the processes even with a cluster env. Something like I believe this would be useful for Grid.ai. Best, |
@tchaton @awaelchli could you take another look to confirm the opt-in deadlock detection looks reasonable? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM !
… creates processes externally (#9389) * [RFC] Skip reconciliate_processes if used within a cluster environment that creates processes externally
… creates processes externally (#9389) * [RFC] Skip reconciliate_processes if used within a cluster environment that creates processes externally
… creates processes externally (#9389) * [RFC] Skip reconciliate_processes if used within a cluster environment that creates processes externally
What does this PR do?
Fixes #9388
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃