Disable thread pool creation when enabled OpenMP #2485

Merged: 7 commits, merged on Nov 27, 2019
Changes from 3 commits

onnxruntime/core/framework/session_options.h (1 change: 1 addition, 0 deletions)
@@ -55,6 +55,7 @@ struct SessionOptions {
TransformerLevel graph_optimization_level = TransformerLevel::Level1;

// controls the size of the thread pool used to parallelize the execution of tasks within individual nodes (ops)
// if OpenMP is enabled, this configuration will be ignored
int intra_op_num_threads = 0;

// controls the size of the thread pool used to parallelize the execution of nodes (ops)
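To make the new comment concrete, here is a minimal sketch of how the intra-op setting could resolve to an actual thread count. It is not onnxruntime code: SessionOptionsSketch and EffectiveIntraOpThreads are hypothetical names, the OpenMP build is modeled as a runtime flag so both paths can be exercised, and using std::thread::hardware_concurrency() for the 0 ("auto") case is an assumption, not ORT's documented default.

```cpp
#include <cassert>
#include <thread>

// Hypothetical mirror of the two SessionOptions fields in this hunk
// (not the real onnxruntime struct).
struct SessionOptionsSketch {
  int intra_op_num_threads = 0;  // 0 = let the runtime pick a default
  int inter_op_num_threads = 0;
};

// Hypothetical helper: the size the intra-op pool would get.
// Returns 0 in an OpenMP build, meaning "no pool; OMP_NUM_THREADS decides instead".
int EffectiveIntraOpThreads(const SessionOptionsSketch& opts, bool openmp_build) {
  assert(opts.intra_op_num_threads >= 0);
  if (openmp_build) {
    return 0;  // the option is ignored when OpenMP is enabled
  }
  if (opts.intra_op_num_threads > 0) {
    return opts.intra_op_num_threads;  // an explicit request is honored as-is
  }
  // Assumed default for 0: one thread per hardware thread.
  // hardware_concurrency() may report 0 when unknown, so fall back to 1.
  unsigned hw = std::thread::hardware_concurrency();
  return hw == 0 ? 1 : static_cast<int>(hw);
}
```

For example, with intra_op_num_threads = 8 the sketch returns 8 in a non-OpenMP build and 0 in an OpenMP build, where the thread count comes from OMP_NUM_THREADS instead.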

onnxruntime/core/session/inference_session.cc (2 changes: 2 additions, 0 deletions)
@@ -102,8 +102,10 @@ InferenceSession::InferenceSession(const SessionOptions& session_options,
: session_options_(session_options),
graph_transformation_mgr_(session_options.max_num_graph_transformation_steps),
logging_manager_(logging_manager),
#ifdef USE_OPENMP
fs-eire marked this conversation as resolved.
thread_pool_(concurrency::CreateThreadPool("intra_op_thread_pool",
session_options.intra_op_num_threads)),
#endif
inter_op_thread_pool_(session_options.execution_mode == ExecutionMode::ORT_PARALLEL

ke1337 (Contributor), Nov 26, 2019, commenting on inter_op_thread_pool_:

It seems this would disable the parallel executor in OpenMP builds? For example, on a machine with 8 hyper-threads, is running onnxruntime_perf_test with OMP_NUM_THREADS=4 and -y 2 a valid setting? #Resolved

fs-eire (Contributor Author), Nov 26, 2019:

Good point. Do you think it's better to keep the previous behavior for the inter-op thread pool (-y)? #Resolved

Contributor:

The parallel executor is off by default, so keeping this logic should not cause any performance difference in your case. It also lets us try enabling the parallel executor together with OpenMP, in case some models benefit from the combination.


In reply to: 350977021
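The configuration asked about in this thread can be checked with simple arithmetic. This is a rough sketch under an assumption of ours, not something stated in the thread: every inter-op worker may be inside its own OpenMP parallel region at the same time, so the peak is the product of the two counts. The function names are hypothetical.

```cpp
#include <cassert>

// Rough peak number of busy threads when the parallel executor runs
// inter_op_threads nodes at once and each node's kernel opens an OpenMP
// team of omp_threads threads (the inter-op worker acting as the team's master).
// Assumption for illustration: all teams can be active at the same time.
int PeakBusyThreads(int omp_threads, int inter_op_threads) {
  assert(omp_threads >= 1 && inter_op_threads >= 1);
  return omp_threads * inter_op_threads;
}

// Hypothetical check: true if the peak does not exceed the hardware threads.
bool FitsHardwareThreads(int omp_threads, int inter_op_threads, int hw_threads) {
  return PeakBusyThreads(omp_threads, inter_op_threads) <= hw_threads;
}
```

Under that assumption, OMP_NUM_THREADS=4 with -y 2 peaks at 4 * 2 = 8 threads, which exactly fills an 8-hyper-thread machine, while -y 3 (12 threads) would oversubscribe it.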

Member:

If USE_OPENMP is not defined, please initialize the variable to nullptr here, because the next few lines reference it.

? concurrency::CreateThreadPool("inter_op_thread_pool",
session_options.inter_op_num_threads)
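The pattern the Member's comment asks for (create the intra-op pool in only one build configuration, but always give the member a defined value) can be sketched as follows. All names are hypothetical stand-ins, not onnxruntime's ThreadPool or concurrency::CreateThreadPool. The sketch follows the PR title, so the pool is skipped when USE_OPENMP is defined, and the default of 4 threads for a value of 0 is an arbitrary placeholder.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins; not onnxruntime's real thread pool types.
struct ThreadPoolSketch {
  std::string name;
  int num_threads;
};

std::unique_ptr<ThreadPoolSketch> CreateThreadPoolSketch(const std::string& name, int num_threads) {
  assert(num_threads >= 0);
  // 0 means "pick a default"; 4 is an arbitrary placeholder for this sketch.
  return std::make_unique<ThreadPoolSketch>(ThreadPoolSketch{name, num_threads == 0 ? 4 : num_threads});
}

enum class ExecutionModeSketch { kSequential, kParallel };

struct OptionsSketch {
  int intra_op_num_threads = 0;
  int inter_op_num_threads = 0;
  ExecutionModeSketch execution_mode = ExecutionModeSketch::kSequential;
};

class SessionSketch {
 public:
  explicit SessionSketch(const OptionsSketch& opts)
#ifndef USE_OPENMP
      // Non-OpenMP build: size the intra-op pool from the option.
      : thread_pool_(CreateThreadPoolSketch("intra_op_thread_pool", opts.intra_op_num_threads)),
#else
      // OpenMP build: no intra-op pool, but the member still gets a defined value.
      : thread_pool_(nullptr),
#endif
        // The inter-op pool does not depend on OpenMP; it exists only in parallel mode.
        inter_op_thread_pool_(opts.execution_mode == ExecutionModeSketch::kParallel
                                  ? CreateThreadPoolSketch("inter_op_thread_pool", opts.inter_op_num_threads)
                                  : nullptr) {}

  const ThreadPoolSketch* intra_op_pool() const { return thread_pool_.get(); }
  const ThreadPoolSketch* inter_op_pool() const { return inter_op_thread_pool_.get(); }

 private:
  std::unique_ptr<ThreadPoolSketch> thread_pool_;           // declared (and initialized) first
  std::unique_ptr<ThreadPoolSketch> inter_op_thread_pool_;  // initialized second
};
```

Because thread_pool_ is a std::unique_ptr here, it would already default to nullptr; the explicit initializer mainly documents intent, and becomes essential if the member is a raw pointer. The inter-op pool mirrors the ExecutionMode::ORT_PARALLEL check in the hunk and is independent of OpenMP.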

onnxruntime/test/perftest/README.md (2 changes: 1 addition, 1 deletion)
@@ -32,7 +32,7 @@ Options:

-v: Show verbose information.

-x: [intra_op_num_threads]: Sets the number of threads used to parallelize the execution within nodes. A value of 0 means the test will auto-select a default. Must >=0.
-x: [intra_op_num_threads]: Sets the number of threads used to parallelize the execution within nodes. A value of 0 means the test will auto-select a default. Must >=0. If OpenMP is enabled, this configuration will be ignored.

-y: [inter_op_num_threads]: Sets the number of threads used to parallelize the execution of the graph (across nodes), A value of 0 means the test will auto-select a default. Must >=0.

onnxruntime/test/perftest/command_args_parser.cc (2 changes: 1 addition, 1 deletion)
@@ -41,7 +41,7 @@ namespace perftest {
"\t-p [profile_file]: Specifies the profile name to enable profiling and dump the profile data to the file.\n"
"\t-s: Show statistics result, like P75, P90.\n"
"\t-v: Show verbose information.\n"
"\t-x [intra_op_num_threads]: Sets the number of threads used to parallelize the execution within nodes, A value of 0 means ORT will pick a default. Must >=0.\n"
"\t-x [intra_op_num_threads]: Sets the number of threads used to parallelize the execution within nodes, A value of 0 means ORT will pick a default. Must >=0. If OpenMP is enabled, this configuration will be ignored.\n"

Contributor, commenting on -x [intra_op_num_threads]:

Can we change -x to call omp_set_num_threads() when building with OpenMP?

Contributor Author:

Currently this is controlled by the environment variable OMP_NUM_THREADS. If -x has a different value from OMP_NUM_THREADS, what is the expected behavior? I think this may cause confusion.

Contributor:

I believe omp_set_num_threads() would then override the environment variable, which is what users would expect if OpenMP is mainly used for intra-node or data parallelism. Otherwise, we would need to ask users to specify the number of data-parallel threads some other way.


In reply to: 350977758

Contributor Author:

There is one tricky part of the current behavior:

If OpenMP is not enabled and you set -x 8, you get 9 threads (the main thread + 8 workers).
If OpenMP is enabled and you set OMP_NUM_THREADS=8, you get 8 threads in total (i.e., the main thread + 7 workers).

This causes a problem: if we extend -x to OpenMP builds, its behavior will be inconsistent.

Contributor Author:

Unless we unify the definition of "numThreads" (the number of worker threads, or the total number of threads?), I would prefer to keep the current behavior.
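The off-by-one between the two conventions is easy to state in code. A minimal sketch with hypothetical helper names, taking the counts described in this thread as given: a worker-count setting adds the main thread, while an OpenMP team size already includes it.

```cpp
#include <cassert>

// Worker-count convention (non-OpenMP -x): N workers plus the main thread.
int TotalFromWorkers(int workers) {
  assert(workers >= 0);
  return workers + 1;  // main thread + workers
}

// Total-count convention (OMP_NUM_THREADS): the team of N includes the main thread.
int WorkersFromTotal(int total) {
  assert(total >= 1);
  return total - 1;  // everyone except the main thread
}
```

With these helpers, -x 8 corresponds to TotalFromWorkers(8) == 9 threads, and OMP_NUM_THREADS=8 to WorkersFromTotal(8) == 7 additional workers, which is exactly the inconsistency a shared -x flag would have to paper over.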

ke1337 (Contributor), Nov 26, 2019:

Even if -x means different things with and without OpenMP, I'd prefer to have -x set the OpenMP thread count, because that is easier to control than setting an environment variable and prevents accidentally picking up a previously set one, especially on Windows. This way we don't change any existing behavior; we only make it easier to control the number of OpenMP threads.


In reply to: 351000042

Contributor Author:

You cannot expect everyone to fully understand this tricky behavior, and they shouldn't need to. It would be all too easy to make a mistake, such as creating a thread pool with the wrong number of threads without being aware of it.

I would even prefer to add a new flag, for example --omp-num-threads (or whatever name you like), rather than overloading -x with different behaviors.

Contributor:

Sounds good to me, so we'll just disable -x in OpenMP builds.


In reply to: 351019314
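A sketch of where this thread landed, expressed as perf-test flag handling: -x is applied only in non-OpenMP builds, and an optional separate flag forwards a total thread count to omp_set_num_threads(). This is hypothetical, not the actual command_args_parser.cc change. PerfTestOptions, ApplyIntraOpFlag and ApplyOmpNumThreadsFlag are illustrative names, --omp-num-threads is only the name floated in the discussion, and the sketch keys on the standard _OPENMP compiler macro rather than ORT's USE_OPENMP so it compiles with or without OpenMP.

```cpp
#include <cassert>
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical subset of the perf-test options.
struct PerfTestOptions {
  int intra_op_num_threads = 0;
};

// -x handler as agreed: honored only when not building with OpenMP.
// Returns true if the value was applied.
bool ApplyIntraOpFlag(PerfTestOptions& opts, int value) {
  assert(value >= 0);  // README: "Must >=0"
#ifdef _OPENMP
  (void)opts;
  (void)value;
  std::fprintf(stderr, "-x is ignored in OpenMP builds; set OMP_NUM_THREADS instead\n");
  return false;
#else
  opts.intra_op_num_threads = value;
  return true;
#endif
}

// Hypothetical separate flag (e.g. --omp-num-threads): a TOTAL team size,
// main thread included, matching OpenMP's own convention.
bool ApplyOmpNumThreadsFlag(int total_threads) {
#ifdef _OPENMP
  if (total_threads > 0) {
    omp_set_num_threads(total_threads);  // overrides OMP_NUM_THREADS for later parallel regions
  }
  return true;
#else
  (void)total_threads;
  return false;  // no OpenMP runtime to configure
#endif
}
```

Keeping the two flags separate means each one has a single, documented meaning for "number of threads", which is the concern raised about overloading -x.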

"\t-y [inter_op_num_threads]: Sets the number of threads used to parallelize the execution of the graph (across nodes), A value of 0 means ORT will pick a default. Must >=0.\n"
"\t-P: Use parallel executor instead of sequential executor.\n"
"\t-o [optimization level]: Default is 1. Valid values are 0 (disable), 1 (basic), 2 (extended), 99 (all).\n"