Multi-threaded server message endpoints #147
Conversation
One high-level observation: why are we using a thread pool here, rather than assigning a different inproc address to each thread in our already existing thread pool?
I am concerned that ZeroMQ's scheduling is out of our control: we rely on setting an arbitrary (hopefully adequate) number of worker threads, and we could always find a corner case that breaks it (as happened with the MPI async layer).
As far as I can tell, the current execution flow is (correct me if I am wrong):
- A local executor thread blocks on a distributed coordination operation.
- The underlying transport thread (randomly assigned to this executor) blocks.
- A remote executor thread unblocks the local executor thread by sending a notification message.
- A different transport thread logically unblocks the local transport thread that was blocked.
My main concern with this design is that it depends strongly on the availability of a free transport thread, and on ZeroMQ actually scheduling work to threads that are not blocked, which again is out of our control.
My alternative would be to give each executor in faabric its own inproc socket, or to use a PUB/SUB scheme over the fan-out in-proc fabric, with a topic along the lines of appid/<function/snapshot/state>/exec-id.
Then, implement the distributed coordination operations as transport primitives. That is, use ZeroMQ's blocking functionality to implement distributed barriers.
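To make that a bit more concrete, here is a minimal sketch of what the per-executor subscription could look like with cppzmq; the inproc address, topic format, and function names are all hypothetical:

```cpp
#include <string>
#include <zmq.hpp>

// Hypothetical in-proc fan-out address; topics would follow something like
// appid/<function|snapshot|state>/exec-id.
constexpr auto COORD_FANOUT_ADDR = "inproc://coord-fanout";

// Each executor subscribes to its own topic; the blocking recv is the
// coordination primitive (the executor sleeps until it is notified).
void waitForNotification(zmq::context_t& ctx, const std::string& topic)
{
    zmq::socket_t sub(ctx, zmq::socket_type::sub);
    sub.connect(COORD_FANOUT_ADDR);
    sub.set(zmq::sockopt::subscribe, topic);

    zmq::message_t msg;
    (void)sub.recv(msg);
}

// Whoever owns the fan-out PUB socket wakes a specific executor by
// publishing on its topic (the subscriber must already be connected,
// given PUB/SUB's slow-joiner behaviour).
void notifyExecutor(zmq::socket_t& pub, const std::string& topic)
{
    pub.send(zmq::buffer(topic), zmq::send_flags::none);
}
```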
I have only drafted the idea here; happy to dismiss it or discuss offline, as I am sure the reasoning needs a lot of polishing.
Currently our server message endpoints (FunctionCallServer, StateServer, and SnapshotServer) are backed by a single worker in a single thread, which makes them a potential bottleneck. This is particularly likely in a multi-tenant scenario where we might have a few applications running, all modifying state, chaining functions, or pushing snapshots simultaneously.

This PR converts both our "sync" and "async" server endpoints to being multi-threaded via a fan-in/fan-out approach (i.e. all incoming requests still go to the same socket as before, but under the hood they're shared between several worker threads).
The flow of messages through sockets is:
- A request is sent by the client (PUSH/REQ from the remote host).
- It is received on the server's single external socket, the fan-in (PULL/ROUTER socket).
- It is passed to the fan-out (PUSH/DEALER connected to an inproc:// socket).
- It is handled by one of the worker threads (PULL/REPs connected to the inproc:// socket of the fan-out).
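For reference, the general shape of this pattern in cppzmq is roughly the classic ZeroMQ multithreaded server shown below; this is an illustrative sketch rather than the faabric code, and the addresses, port number, and worker count are made up:

```cpp
#include <thread>
#include <vector>
#include <zmq.hpp>

int main()
{
    zmq::context_t ctx(1);

    // Fan-in: remote clients still connect to a single external socket.
    zmq::socket_t frontend(ctx, zmq::socket_type::router);
    frontend.bind("tcp://*:8005");

    // Fan-out: DEALER bound to an inproc:// endpoint shared with the workers.
    zmq::socket_t backend(ctx, zmq::socket_type::dealer);
    backend.bind("inproc://workers");

    // Worker threads: each one is a REP socket connected to the fan-out.
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; i++) {
        workers.emplace_back([&ctx] {
            zmq::socket_t rep(ctx, zmq::socket_type::rep);
            rep.connect("inproc://workers");
            while (true) {
                zmq::message_t req;
                (void)rep.recv(req);
                // Handle the request, then reply; the response travels back
                // through the proxy to the original client.
                rep.send(zmq::buffer("ack"), zmq::send_flags::none);
            }
        });
    }

    // Shovel messages between fan-in and fan-out in both directions.
    // This blocks until the context is shut down.
    zmq::proxy(frontend, backend);

    for (auto& t : workers) {
        t.join();
    }
    return 0;
}
```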
Changes:
- Use ROUTER/DEALER sockets for fanning out incoming REQ requests to multiple downstream REPs (example in docs).
- Use a PULL/PUSH pair of sockets for fanning out incoming PUSH requests to multiple downstream PULLs (info here).
- Run the proxy with proxy_steerable. This allows us to stop the proxy programmatically without a signal.
- Shut down by sending a TERMINATE message to the proxy, followed by shutdown messages to each worker one-by-one (a rough sketch follows below).
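A rough sketch of what the programmatic shutdown could look like with cppzmq's proxy_steerable; the control socket type, function names, and inproc address here are assumptions for illustration, not necessarily what this PR does:

```cpp
#include <zmq.hpp>

// Proxy thread: runs until a TERMINATE command arrives on the control
// socket (the empty socket_ref means no capture socket).
void runProxy(zmq::context_t& ctx,
              zmq::socket_ref frontend,
              zmq::socket_ref backend)
{
    zmq::socket_t control(ctx, zmq::socket_type::pair);
    control.bind("inproc://proxy-control");
    zmq::proxy_steerable(frontend, backend, zmq::socket_ref(), control);
}

// Shutdown path: stop the proxy first, then send shutdown messages to each
// worker one by one.
void stopProxy(zmq::context_t& ctx)
{
    zmq::socket_t control(ctx, zmq::socket_type::pair);
    control.connect("inproc://proxy-control");

    // Send the TERMINATE command as the 9-byte string, without a trailing null.
    zmq::message_t cmd("TERMINATE", 9);
    control.send(cmd, zmq::send_flags::none);
}
```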