
Fix: Set OMP_NUM_THREADS by default in Elastic #2569

Merged 3 commits into flyteorg:master on Jul 8, 2024

Conversation

@fellhorn (Contributor) commented Jul 7, 2024

Why are the changes needed?

torchrun automatically sets the environment variable OMP_NUM_THREADS if it is not specified (code). When it was not set, we saw some special cases in which our experiments were stuck for several minutes at high CPU load.

Also see https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html#utilize-openmp for more details.

To make Elastic tasks behave the same as executions started with torchrun, I would propose copying this behavior over to flytekit.
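A minimal sketch of the proposed default (the function name is illustrative, not the actual patch; torchrun applies the same rule):

import os

def _default_omp_num_threads(nproc_per_node: int) -> None:
    # Mirror torchrun: if the user has not set OMP_NUM_THREADS and more
    # than one worker process runs per node, default it to 1 to avoid
    # CPU oversubscription from competing OpenMP thread pools.
    if "OMP_NUM_THREADS" not in os.environ and nproc_per_node > 1:
        os.environ["OMP_NUM_THREADS"] = "1"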

What changes were proposed in this pull request?

See above

How was this patch tested?

  1. Added unit tests
  2. Tested with internal workflows to check the behavior of OMP_NUM_THREADS.

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

@thomasjpfan (Member) left a comment

Although this changes the behavior of the current code, I think it's a better default.

Out of curiosity, how are you coming across this issue with PyTorch? (PyTorch's DataLoader also sets the number of threads to one here so it does not oversubscribe with num_workers.)

Can we add to Elastic's docstring how a user should set OMP_NUM_THREADS? Do you see users doing this:

# my_workflow.py
import os

# Set before importing anything that loads torch/OpenMP
# (e.g. via the kfpytorch plugin):
os.environ["OMP_NUM_THREADS"] = "2"

from flytekit import task
from flytekitplugins.kfpytorch import Elastic

@task(task_config=Elastic(...))
def my_task():
    ...
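(For context, the DataLoader behavior mentioned above amounts to roughly the following at the start of each worker process; a sketch of PyTorch's internals, not part of this PR:)

import torch

def _worker_init() -> None:
    # Cap intra-op parallelism at one thread per DataLoader worker so
    # num_workers processes do not each spawn a full OpenMP thread pool.
    torch.set_num_threads(1)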

kumare3 previously approved these changes Jul 7, 2024
@kumare3 (Contributor) commented Jul 7, 2024

@fellhorn I like @thomasjpfan's question. @thomasjpfan it should be set using 'pyflyte run --env'.

@fellhorn (Contributor, Author) commented Jul 8, 2024

> @fellhorn I like @thomasjpfan's question. @thomasjpfan it should be set using 'pyflyte run --env'.

Good point, I added some information now in 95a10f9:

> Like torchrun, this plugin sets the environment variable OMP_NUM_THREADS to 1 if it is not set.
> Please see https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html for potential performance improvements.
> To change OMP_NUM_THREADS, specify it in the environment dict of the flytekit task decorator or via pyflyte run --env.
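For example (a hedged sketch; the Elastic parameters, workflow file, and task name are illustrative, not from the PR):

from flytekit import task
from flytekitplugins.kfpytorch import Elastic

@task(
    task_config=Elastic(nnodes=1, nproc_per_node=4),
    environment={"OMP_NUM_THREADS": "4"},  # overrides the plugin's default of 1
)
def train() -> None:
    ...

or, when launching from the CLI:

pyflyte run --env OMP_NUM_THREADS=4 my_workflow.py train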

thomasjpfan previously approved these changes Jul 8, 2024
pingsutw previously approved these changes Jul 8, 2024
@fellhorn dismissed stale reviews from pingsutw and thomasjpfan via ec68536 on July 8, 2024 18:50
@fellhorn (Contributor, Author) commented Jul 8, 2024

> Out of curiosity, how are you coming across this issue with PyTorch? (PyTorch's DataLoader also sets the number of threads to one so it does not oversubscribe with num_workers.)
@thomasjpfan regarding your other question: OpenMP is also used for torch.jit.script inference, for example, where no default value for the number of threads is set.
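A quick way to check the effective intra-op thread count (an illustration, not from the PR; torch.get_num_threads reflects the OpenMP pool used by, e.g., TorchScript inference):

import torch

# Without OMP_NUM_THREADS set this typically reports the machine's core
# count; with the new default it should report 1 inside Elastic tasks.
print(torch.get_num_threads())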

@pingsutw merged commit 89e5461 into flyteorg:master on Jul 8, 2024
46 checks passed
fiedlerNr9 pushed a commit that referenced this pull request Jul 25, 2024
mao3267 pushed a commit to mao3267/flytekit that referenced this pull request Jul 29, 2024