-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallelize ManyToMany plugin #4454
Conversation
@@ -125,6 +127,7 @@ template <typename Algorithm> class Engine final : public EngineInterface | |||
} | |||
std::unique_ptr<DataFacadeProvider<Algorithm>> facade_provider; | |||
mutable SearchEngineData<Algorithm> heaps; | |||
tbb::task_scheduler_init task_scheduler; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be constructed with num threads otherwise default ctor will use available cpus to infer number - which will be wrong e.g. in a Docker container:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@daniel-j-h i added use_threads_number
parameter that can be set in node bindings and in command line arguments of osrm-routed
. osrm-routed
now has two separate thread pools: server and internal tbb, so number of threads is actually doubled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It it? I thought the benefit of using the dynamically linked libtbb was to avoid exactly these issues?
feada3e
to
19c8fc1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! However this needs to be verified under load and I would like to see the slowdown with just using one thread over the old version. I checked our node bindings and it seems we already link against libtbb for other stuff so this is not breaking any dependencies.
This is somewhat of an paradigm shift with how we do parallelization (external vs. internal thread pool), if this goes well we might consider parallizing other algorithms as well.
@@ -90,6 +90,7 @@ struct EngineConfig final | |||
int max_alternatives = 3; // set an arbitrary upper bound; can be adjusted by user | |||
bool use_shared_memory = true; | |||
Algorithm algorithm = Algorithm::CH; | |||
int use_threads_number = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: use_threads_number
is kind of a weird phrasing. number_of_threads
or just threads
.
We should also document a value that would make TBB use the default number of threads (I think -1
?).
int &max_locations_map_matching, | ||
int &max_results_nearest, | ||
int &max_alternatives) | ||
EngineConfig &config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! 👍
const auto bucket_iterator = search_space_with_buckets.find(node); | ||
// iterate bucket if there exists one | ||
if (bucket_iterator != search_space_with_buckets.end()) | ||
const auto &bucket_list = std::equal_range(search_space_with_buckets.begin(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you check the performance impact of this over using a unordered_map? 2log(N)
might be fine though.
c74e038
to
d132cf7
Compare
@daniel-j-h let me show what i mean by "double number of threads". In my local run when i stop in
The first thread is the main one and waits for a signal at osrm-backend/src/tools/routed.cpp Line 303 in b361225
The second thread is a server thread that starts at osrm-backend/src/tools/routed.cpp Line 288 in b361225
osrm-backend/include/server/server.hpp Line 82 in b361225
Threads 3-6 are asio services started at osrm-backend/include/server/server.hpp Line 76 in b361225
Threads 7-8 are started by TBB during the first call of
Without the PR Also as we checked yesterday there is no static TBB distributions, so it should be safe to assume a unique TBB data singleton. Also changing a map of vectors to a ordered vector leads to change of average query time for germany latest from 568.7661ms to 496.6084ms |
Issue
Backward and forward searches in manyToMany plugin are independent and can be easily parallelized.
Here is timing results in milliseconds with 4 cores (2 real + 2 HT) for 25x25 random table requests for DE-sized extract:
Tasklist
Requirements / Relations
Link any requirements here. Other pull requests this PR is based on?