Fix: Set OMP_NUM_THREADS by default in Elastic #2569
Signed-off-by: Dennis Keck <[email protected]>
Although this changes the behavior of current code, I think it's a better default.
Out of curiosity, how are you coming across this issue with PyTorch? (PyTorch's DataLoader also sets the number of threads to one here so it does not oversubscribe with num_workers.)
Can we add to Elastic's docstring how a user should set OMP_NUM_THREADS? Do you see users doing this:
# my_workflow.py
import os
from flytekit import task
from flytekitplugins.kfpytorch import Elastic
os.environ["OMP_NUM_THREADS"] = "2"
@task(task_config=Elastic(...))
def my_task():
    ...
@fellhorn I like @thomasjpfan's question. @thomasjpfan it should be set using |
Good point, I added some information now in 95a10f9:
|
@thomasjpfan regarding your other question: OpenMP is e.g. also used for |
Signed-off-by: Dennis Keck <[email protected]> Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: Dennis Keck <[email protected]> Signed-off-by: mao3267 <[email protected]>
Why are the changes needed?
torchrun sets the environment variable OMP_NUM_THREADS automatically if not specified (code). If it is not set, we saw some special cases in which our experiments were stuck for several minutes with high CPU load. Also see https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html#utilize-openmp for more details.
To make Elastic tasks behave the same as executions started with torchrun, I would propose copying this behavior over to flytekit.
What changes were proposed in this pull request?
See above
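For illustration, a minimal sketch of the torchrun-like default described above, assuming the variable is only set when the user has not provided it and more than one worker process runs per node. The helper name `apply_default_omp_num_threads` and its `env` parameter are hypothetical and not part of flytekit or PyTorch; the actual change in this PR may differ in detail.

```python
import os


def apply_default_omp_num_threads(nproc_per_node, env=None):
    """Default OMP_NUM_THREADS to 1 when it is unset (hypothetical helper).

    Returns True if the default was applied, False if nothing was changed.
    """
    # Use the real process environment unless a mapping is passed in (e.g. for tests).
    env = os.environ if env is None else env

    # Respect an explicit user setting, and only apply the default when
    # several worker processes share one node and could oversubscribe the CPU.
    if "OMP_NUM_THREADS" not in env and nproc_per_node > 1:
        env["OMP_NUM_THREADS"] = "1"
        return True
    return False
```

With this shape, a value the user already exported is left untouched, so the default only applies when nothing was configured.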
How was this patch tested?
OMP_NUM_THREADS.
Check all the applicable boxes
Related PRs
Docs link