Parallelize ManyToMany plugin #4454

oxidase · 2017-08-29T18:55:13Z

Issue

The backward and forward searches in the manyToMany plugin are independent and can easily be parallelized.
Here are timing results in milliseconds with 4 cores (2 physical + 2 hyper-threaded) for 25x25 random table requests on a DE-sized extract:

Tasklist

review
adjust for comments

Requirements / Relations

Link any requirements here. Other pull requests this PR is based on?

daniel-j-h · 2017-08-29T22:27:12Z

include/engine/engine.hpp

@@ -125,6 +127,7 @@ template <typename Algorithm> class Engine final : public EngineInterface
    }
    std::unique_ptr<DataFacadeProvider<Algorithm>> facade_provider;
    mutable SearchEngineData<Algorithm> heaps;
+    tbb::task_scheduler_init task_scheduler;


This needs to be constructed with an explicit number of threads; otherwise the default constructor infers the number from the available CPUs, which will be wrong in, for example, a Docker container:

https://www.threadingbuildingblocks.org/docs/help/hh_goto.htm?index.htm#reference/task_scheduler/task_scheduler_init_cls.html

@daniel-j-h I added a use_threads_number parameter that can be set in the node bindings and in the command-line arguments of osrm-routed. osrm-routed now has two separate thread pools, the server pool and the internal TBB pool, so the number of threads is actually doubled.

Is it? I thought the benefit of using the dynamically linked libtbb was to avoid exactly these issues?

TheMarex

Looks great! However, this needs to be verified under load, and I would like to see the slowdown of running with just one thread compared to the old version. I checked our node bindings and it seems we already link against libtbb for other things, so this does not break any dependencies.

This is somewhat of a paradigm shift in how we do parallelization (external vs. internal thread pool). If this goes well, we might consider parallelizing other algorithms as well.

TheMarex · 2017-08-30T16:39:42Z

include/engine/engine_config.hpp

@@ -90,6 +90,7 @@ struct EngineConfig final
    int max_alternatives = 3; // set an arbitrary upper bound; can be adjusted by user
    bool use_shared_memory = true;
    Algorithm algorithm = Algorithm::CH;
+    int use_threads_number = 1;


Nitpick: use_threads_number is somewhat odd phrasing. I would suggest number_of_threads or just threads.

We should also document a value that would make TBB use the default number of threads (I think -1?).
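A minimal sketch of how a configured thread count could be mapped to an effective one. The helper name `effective_thread_count` and the rule that any non-positive value means "let the runtime decide" are assumptions made for illustration; they are not OSRM's or TBB's API. TBB expresses the same idea with `tbb::task_scheduler_init::automatic`, which is -1.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper, not part of OSRM or TBB.
//   requested > 0  : use exactly that many threads
//   requested <= 0 : fall back to the hardware concurrency hint
inline int effective_thread_count(int requested)
{
    if (requested > 0)
        return requested;

    // hardware_concurrency() may return 0 when the value cannot be determined,
    // so clamp the result to at least one thread.
    const unsigned detected = std::thread::hardware_concurrency();
    return static_cast<int>(std::max(1u, detected));
}
```

Note that the detected value may still reflect the host machine rather than a container's CPU quota, which is why an explicit setting is the safer default.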

TheMarex · 2017-08-30T16:47:31Z

src/tools/routed.cpp

-                                             int &max_locations_map_matching,
-                                             int &max_results_nearest,
-                                             int &max_alternatives)
+                                             EngineConfig &config)


TheMarex · 2017-08-30T16:53:18Z

src/engine/routing_algorithms/many_to_many.cpp

-    const auto bucket_iterator = search_space_with_buckets.find(node);
-    // iterate bucket if there exists one
-    if (bucket_iterator != search_space_with_buckets.end())
+    const auto &bucket_list = std::equal_range(search_space_with_buckets.begin(),


Did you check the performance impact of this compared to using an unordered_map? 2 log(N) might be fine though.
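For readers comparing the two lookups, here is a rough sketch of the sorted-vector approach using `std::equal_range`. The `Bucket` struct is a simplified stand-in for OSRM's `NodeBucket`, and the names `ByNode` and `find_buckets` are hypothetical; the real fields and comparator in the PR may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified stand-in for NodeBucket; the fields are illustrative only.
struct Bucket
{
    std::uint32_t node;   // node settled by a backward search
    std::uint32_t column; // index of the target that produced this entry
    std::int32_t weight;  // weight from the node to that target
};

// Comparator usable both for sorting buckets and for looking up a node id.
struct ByNode
{
    bool operator()(const Bucket &lhs, const Bucket &rhs) const { return lhs.node < rhs.node; }
    bool operator()(const Bucket &lhs, std::uint32_t rhs) const { return lhs.node < rhs; }
    bool operator()(std::uint32_t lhs, const Bucket &rhs) const { return lhs < rhs.node; }
};

using BucketRange =
    std::pair<std::vector<Bucket>::const_iterator, std::vector<Bucket>::const_iterator>;

// Precondition: buckets is sorted by node, e.g. with std::sort(..., ByNode{}).
// Returns the (possibly empty) range of entries stored for the given node.
inline BucketRange find_buckets(const std::vector<Bucket> &buckets, std::uint32_t node)
{
    return std::equal_range(buckets.begin(), buckets.end(), node, ByNode{});
}
```

An empty range (`first == second`) plays the role of the former `find(node) == end()` check. The lookup costs two binary searches, but all entries for a node sit contiguously in one allocation.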

oxidase · 2017-09-15T07:29:29Z

@daniel-j-h let me show what I mean by "double the number of threads". In my local run, when I stop in manyToManySearch, osrm-routed has the following threads:

 Id   Target Id         Frame 
  1    Thread 0x7ffff7f96740 (LWP 11822) "osrm-routed" do_sigwait (sig=0x7fffffffc430, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:64
  2    Thread 0x7ffff070e700 (LWP 12263) "osrm-routed" 0x00007ffff6ecd9dd in pthread_join (threadid=140737217296128, thread_return=0x0) at pthread_join.c:90
  3    Thread 0x7fffefd7f700 (LWP 12264) "osrm-routed" 0x00007ffff5f1a373 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
* 4    Thread 0x7fffef3f0700 (LWP 12265) "osrm-routed" osrm::engine::routing_algorithms::manyToManySearch<osrm::engine::routing_algorithms::ch::Algorithm> (engine_working_data=..., facade=..., phantom_nodes=std::vector of length 25, capacity 25 = {...}, source_indices=std::vector of length 0, capacity 0, target_indices=std::vector of length 0, capacity 0) at /home/miha/mapbox/osrm-backend/src/engine/routing_algorithms/many_to_many.cpp:322
  5    Thread 0x7fffeea61700 (LWP 12266) "osrm-routed" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  6    Thread 0x7fffee0d2700 (LWP 12267) "osrm-routed" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  7    Thread 0x7fffed743700 (LWP 12773) "osrm-routed" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  8    Thread 0x7fffecf41700 (LWP 12774) "osrm-routed" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  9    Thread 0x7fffed342700 (LWP 12775) "osrm-routed" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38

The first thread is the main one and waits for a signal at

osrm-backend/src/tools/routed.cpp

Line 303 in b361225

sigwait(&wait_mask, &sig);

The second thread is a server thread that starts at

osrm-backend/src/tools/routed.cpp

Line 288 in b361225

std::thread server_thread(std::move(server_task));

and waits at

osrm-backend/include/server/server.hpp

Line 82 in b361225

thread->join();

Threads 3-6 are asio services started at

osrm-backend/include/server/server.hpp

Line 76 in b361225

std::shared_ptr<std::thread> thread = std::make_shared<std::thread>(

Threads 7-9 are started by TBB during the first call of

osrm-backend/src/engine/routing_algorithms/many_to_many.cpp

Line 344 in b361225

tbb::parallel_for(

Without the PR, osrm-routed uses 6 threads in my particular case; with the PR, it uses those 6 plus 3 threads in the TBB pool.
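To illustrate the kind of loop that `tbb::parallel_for` runs on the internal pool, here is a rough sketch built on `std::thread` instead of TBB. `parallel_for_index` and `run_independent_searches` are hypothetical names, and the per-source "search" is a placeholder computation; none of this is the PR's actual code.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for tbb::parallel_for: calls body(i) for every i in
// [0, count), with the indices split in a strided fashion across the workers.
inline void parallel_for_index(std::size_t count,
                               std::size_t num_threads,
                               const std::function<void(std::size_t)> &body)
{
    if (num_threads == 0)
        num_threads = 1;

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t)
    {
        workers.emplace_back([=, &body] {
            for (std::size_t i = t; i < count; i += num_threads)
                body(i);
        });
    }
    for (auto &worker : workers)
        worker.join();
}

// Every iteration writes only to its own result slot, so no locking is needed.
inline std::vector<int> run_independent_searches(const std::vector<int> &sources,
                                                 std::size_t num_threads)
{
    std::vector<int> results(sources.size());
    parallel_for_index(sources.size(), num_threads, [&](std::size_t i) {
        results[i] = sources[i] * 2; // placeholder for a real per-source search
    });
    return results;
}
```

Unlike this sketch, which creates and joins its threads on every call, a TBB pool keeps its workers alive between calls, which matches the observation above that the extra threads appear on the first call and then remain.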

Also, as we checked yesterday, there are no static TBB distributions, so it should be safe to assume a unique TBB data singleton.

Also, changing the map of vectors to an ordered vector changes the average query time for germany latest from 568.7661 ms to 496.6084 ms.

oxidase requested a review from TheMarex August 29, 2017 18:55

oxidase force-pushed the parallel/m2m branch from de931cf to 89d6b7e Compare August 29, 2017 20:46

daniel-j-h requested changes Aug 29, 2017

oxidase force-pushed the parallel/m2m branch 2 times, most recently from feada3e to 19c8fc1 Compare August 30, 2017 12:18

TheMarex approved these changes Aug 30, 2017

TheMarex added the Review - In feedback label Aug 30, 2017

TheMarex modified the milestones: 5.11.0, 5.12.0 Aug 30, 2017

danpat modified the milestones: 5.13.0, 5.12.0 Sep 5, 2017

oxidase added 4 commits September 14, 2017 20:02

Restructure manyToManySearch for parallelization

c01d275

Remove std::unordered_map<NodeID, std::vector<NodeBucket>>

6715e7d

Parallelize ManyToMany plugin

f6da1d0

Link TBB task_scheduler lifetime with Engine scope

d132cf7

oxidase force-pushed the parallel/m2m branch 2 times, most recently from c74e038 to d132cf7 Compare September 14, 2017 20:32

Adjust number of threads in osrm-routed

b361225

daniel-j-h approved these changes Sep 15, 2017

oxidase merged commit 966139c into master Sep 15, 2017

oxidase deleted the parallel/m2m branch September 15, 2017 08:55

danpat mentioned this pull request Oct 20, 2023

Table request time complexity #6721

Closed
