Event loop run-time system #233

symbiont-stevan-andjelkovic commented Apr 22, 2021

So far the design of this project has been very top-down -- from the
specification in our heads, down to the software under test (SUT). In short,
we've introduced just enough structure to allow for deterministic testing,
e.g. the Reactor interface.

It's time to start thinking bottom-up -- from the hardware and OS level, up to
the SUT -- trying to answer questions like:

  • convenience, i.e. what structure do we need in order to conveniently implement the SUT? Adding
    the ability to broadcast is an example of this kind of thinking;
  • "real" implementation rather than "simulation" implementation;
  • deployment;
  • operator experience (application specific observability);
  • upgrades;
  • performance;
  • security;
  • high-availability;
  • scalability.

Fortunately, a lot more people care about most of these topics, so we have a
lot more inspiration to draw from than we did when tackling the
top-down/simulation-testing topic.

Let's address these topics in turn.

Solution sketch for convenience, "real" implementation and performance

Most of these ideas come from the E programming language, Goblins, CapTP and
Cap'n Proto.

  • One event loop per OS process / CPU core

  • Several event loops can run on a single computer

  • Event loops can also run on remote computers

  • Actors are spawned on event loops

  • Actors can send synchronous messages to actors in the same event loop and
    get replies immediately

  • Actors can communicate across event loops, but have to do so asynchronously.
    In order to not block the event loop, when an async message is sent a
    continuation/callback must be provided which will be invoked by the event
    loop once the reply comes back from the remote actor.

  • Similarly filesystem I/O can be implemented to not block the event loop.

  • An event loop maintains the following data (sketched in code after this
    list):

    1. a heap of actors
    2. a stack of local messages (the stack grows as local actors call other
       local actors, and shrinks as the replies come back in)
    3. a queue of remote messages (these get dequeued onto the stack)
  • Advantages over the current situation:

    • Non-blocking event loop
    • Calls to local actors are synchronous, so we can avoid manually composing
      two actors just to avoid passing messages between them asynchronously
    • Related to the above, we get replies, so we can avoid maintaining
      "sessions" inside the actors
  • Possible golang event loop libraries we could build on top of:
    panjf2000/gnet and tidwall/evio; also see the architecture of nginx for
    inspiration.
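
To make the above more concrete, here is a minimal event loop sketch in Go.
All of the names (EventLoop, Call, SendAsync, Continuation) are made up for
illustration and are not a proposal for the final API.

```go
// Sketch of the event loop data described above: a heap of actors, a stack of
// local (synchronous) calls and a queue of incoming remote messages, plus the
// continuations pending on replies from remote actors.
package eventloop

type ActorRef struct{ id int }

type Message struct {
	To      ActorRef
	Payload []byte
}

// Continuation is invoked by the event loop once the reply to an asynchronous
// (cross-event-loop) message comes back.
type Continuation func(reply []byte)

type Actor interface {
	// Receive handles a message synchronously and returns a reply.
	Receive(msg []byte) []byte
}

type EventLoop struct {
	heap    map[ActorRef]Actor   // 1. heap of actors
	stack   []Message            // 2. stack of local messages
	queue   chan Message         // 3. queue of remote messages
	pending map[int]Continuation // callbacks waiting for remote replies
	nextID  int
}

// Call sends a synchronous message to a local actor and returns the reply
// immediately; the stack grows and shrinks as local actors call each other.
func (el *EventLoop) Call(to ActorRef, msg []byte) []byte {
	el.stack = append(el.stack, Message{To: to, Payload: msg})
	reply := el.heap[to].Receive(msg)
	el.stack = el.stack[:len(el.stack)-1]
	return reply
}

// SendAsync sends a message to an actor on another event loop. It never
// blocks: the continuation k is stored and invoked when the reply arrives.
func (el *EventLoop) SendAsync(remote ActorRef, msg []byte, k Continuation) {
	id := el.nextID
	el.nextID++
	el.pending[id] = k
	// ... hand the message, tagged with id, to the transport layer ...
}
```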

Solution sketch for operator experience

In addition to what we sketched out to store inside the event loop, we could
also store received messages and state diffs in a ring buffer of some fixed
size (in order to use a bounded amount of memory/disk). If we can make the
ring buffer remotely dumpable, we could use it to build a debugger similar to
the one we currently have for tests, but for live networks.
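
A hedged sketch of such a ring buffer (RingBuffer, Event and Dump are
hypothetical names, not existing code):

```go
// A fixed-size ring buffer of recent events (received messages or state
// diffs). Old entries are overwritten once the buffer is full, so memory use
// stays bounded.
package observability

type Event struct {
	Kind    string // e.g. "message" or "state-diff"
	Payload []byte
}

type RingBuffer struct {
	buf  []Event
	next int  // index of the slot that will be written next
	full bool // has the buffer wrapped around at least once?
}

func NewRingBuffer(size int) *RingBuffer {
	return &RingBuffer{buf: make([]Event, size)}
}

func (r *RingBuffer) Append(e Event) {
	r.buf[r.next] = e
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

// Dump returns the retained events in oldest-to-newest order; this is what a
// remote debugger would fetch from a live node.
func (r *RingBuffer) Dump() []Event {
	if !r.full {
		return append([]Event(nil), r.buf[:r.next]...)
	}
	return append(append([]Event(nil), r.buf[r.next:]...), r.buf[:r.next]...)
}
```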

We could also make it possible to connect to an event loop and print things like:

  • current actors' state;
  • supervisor logs about the last crashes;
  • whatever interesting statistics we'd like to keep track of.

One could also imagine being able to patch the state of an actor that's in some way stuck.

Solution sketch for deployment and high-availability (of actors)

Most of the following ideas come from Erlang.

According to Wikipedia:

In 1998 Ericsson announced the AXD301 switch, containing over a million lines
of Erlang and reported to achieve a high availability of nine "9"s

One of the key ingredients behind this achievement is something called
supervisor trees. Before explaining the tree aspect, let's first explain what
a supervisor is. A supervisor is an actor whose sole purpose is to make sure
that the actors below/underneath it are healthy and working. This is
implemented by having the supervisor spawn its children in such a way that if
a child throws an exception, the supervisor receives that exception. Each
supervisor has a restart strategy associated with it: for example, if one of
its children dies, restart only that child; another strategy would be to
restart all children, etc. The supervisor also has a max restart intensity: if
more than X restarts happen within some period of time, then the supervisor
terminates all of its children and then terminates itself.

This is where supervisor trees come in. A supervisor might have other
supervisors as its children and together they form a tree with workers at the
leaves. So if a supervisor terminates itself, the supervisor above it in the
tree will get notified and can restart things according to its strategy etc.
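
As a rough illustration of the mechanics described above, here is a minimal
supervisor sketch in Go with a one-for-one restart strategy and a max restart
intensity. None of these names or signatures exist in the codebase; it's just
a sketch of the idea.

```go
// A minimal supervisor sketch: it restarts a crashed child (one-for-one
// strategy) unless more than maxRestarts happen within the window, in which
// case it stops all children and crashes itself so its own supervisor can act.
package supervisor

import (
	"errors"
	"time"
)

type Child struct {
	Name  string
	Start func() (stop func(), crashed <-chan error)
}

type Supervisor struct {
	children    []Child
	maxRestarts int           // max restart intensity...
	window      time.Duration // ...within this period of time
}

var ErrTooManyRestarts = errors.New("supervisor: max restart intensity reached")

func (s *Supervisor) Run() error {
	crashes := make(chan int) // index of the crashed child
	stops := make([]func(), len(s.children))
	var restarts []time.Time

	start := func(i int) {
		stop, crashed := s.children[i].Start()
		stops[i] = stop
		go func() { <-crashed; crashes <- i }()
	}
	for i := range s.children {
		start(i)
	}

	for i := range crashes {
		now := time.Now()
		restarts = append(restarts, now)
		// Forget restarts that fall outside the intensity window.
		for len(restarts) > 0 && now.Sub(restarts[0]) > s.window {
			restarts = restarts[1:]
		}
		if len(restarts) > s.maxRestarts {
			for _, stop := range stops {
				stop()
			}
			return ErrTooManyRestarts // let the supervisor above us decide
		}
		start(i) // one-for-one: restart only the crashed child
	}
	return nil
}
```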

A correctly organised supervisor tree spanning multiple machines can be
extremely resilient. Because the tree is hierarchical and ordered (from left
to right), we also get graceful degradation: if one subtree fails, the
components in the rest of the tree can still provide partial service to client
requests. There's also another, more subtle, consequence: crashing is cheap,
because restarts are cheap (fine-grained and merely recreating an object on
the heap, rather than coarse-grained and restarting a whole Docker container
as in the Kubernetes case).
We know that:

almost all (92%) of the catastrophic system failures are the result of
incorrect handling of non-fatal errors explicitly signaled in software.

and

in 58% of the catastrophic failures, the underlying faults could easily have
been detected through simple testing of error handling code.

so by avoiding writing error handling code (and just crashing instead) we
actually avoid writing a lot of bugs. If the specification isn't clear about
some edge case, simply crash the program there instead of trying to be clever
about it: the supervisor will do the restarts, the client won't notice any
downtime (perhaps having to do a retry), and if the edge case is rare enough
it will work after the restart (clearing any junk from the state). The
developers will get notified that restarts have happened and can then choose
to fix the problem in a principled way (or perhaps not, if it's rare enough).

Supervisor trees can also be used as units of deployment: instead of spawning
an actor on an event loop, we could spawn a supervisor on an event loop and
have it spawn its children and make sure they stay alive.

We could imagine giving the root supervisor special treatment, where we let
systemd, Kubernetes or similar make sure that the event loop stays alive, and
whenever it dies it gets restarted and the root supervisor spawned on it.

There are some existing implementations of supervisors in golang, e.g. go-sup
(seems unmaintained) and suture, which we might be able to use, but it's also
fairly easy to implement from scratch (~250 lines).

Solution sketch for upgrades

This idea comes from Erlang.

If actors were serialisable (this is a BIG if in golang, but trivial in e.g.
Clojure), we could send a new version over the wire to the event loop, which
could swap out the old version (after serving any outstanding requests) for
the new one with zero downtime. One could even imagine automatic rollback to
the previous version if too many failures are observed.
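
A very small sketch of what the swap could look like, assuming actors expose
their state as a serialisable snapshot (the Actor interface and Swap function
below are hypothetical):

```go
// Sketch of a zero-downtime actor upgrade: the event loop finishes serving
// outstanding requests, snapshots the old actor's state, and hands the
// snapshot to the new version before routing messages to it.
package upgrade

type Actor interface {
	Receive(msg []byte) []byte
	Snapshot() []byte        // serialise the actor's current state
	Restore(snapshot []byte) // load state produced by an older version
}

// Swap carries the state across from the old version to the new one. In a
// real system the snapshot would travel over the wire, and observing too many
// failures after the swap would trigger a rollback to the previous version.
func Swap(old, next Actor) Actor {
	next.Restore(old.Snapshot())
	return next
}
```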

Solution sketch for security

The E family has a nice story for security based on capabilities, which can
be implemented using scope alone. The basic idea is that in order to, for
example, communicate with an actor you need a reference to it, and you can
only get a reference if you spawned the actor or were sent a message
containing the reference. This solves the authorisation problem. The idea can
be extended to any other resource, e.g. "you can only request files from some
particular path of my filesystem if you got a token for that, and only I can
create such tokens".
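
To make this concrete, here is a hedged Go sketch (illustrative names only):
the capability is a value that can only be minted by its owner, and holding
it is what authorises the access.

```go
// Capabilities via scope: a FileCap for a directory can only be handed out by
// whoever calls GrantReadAccess; holding the value is the authorisation to
// read below that path, with no ambient access-control list lookup.
package capability

import (
	"os"
	"path/filepath"
)

type FileCap struct {
	// root is unexported, so code outside this package cannot construct a
	// FileCap pointing at an arbitrary path; it can only be granted one
	// (e.g. inside a message).
	root string
}

// GrantReadAccess mints a capability; only the owner of this package/actor
// gets to decide which paths it hands out.
func GrantReadAccess(root string) FileCap {
	return FileCap{root: root}
}

// Read exercises the capability: the token itself carries the permission.
func Read(c FileCap, relative string) ([]byte, error) {
	return os.ReadFile(filepath.Join(c.root, relative))
}
```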

Capability-based security is different from the more commonly used "access
control list" security, e.g. in the filesystem example above, access control
lists correspond to UNIX style filesystem permissions, i.e. this group has
access to this directory. For a longer comparison between the two approaches see
the following paper.

Solution sketch for automatic scaling

One could imagine having the root supervisor monitor the resource consumption
of its event loop, and if it goes above or below some threshold, automatically
provision or shut down machines and spawn additional workers or kill idle ones
from some load-balanced pool.
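
A hedged sketch of the decision loop such a root supervisor could run (the
Pool and Metrics interfaces and all thresholds are made-up placeholders):

```go
// Sketch of an autoscaling loop: periodically sample the event loop's
// resource usage and grow or shrink a load-balanced worker pool accordingly.
package autoscale

import "time"

type Pool interface {
	SpawnWorker()    // provision a machine and/or spawn an extra worker
	KillIdleWorker() // shut an idle worker down
	Size() int
}

type Metrics interface {
	CPUUtilisation() float64 // 0.0-1.0, averaged over the sample window
}

func Run(pool Pool, metrics Metrics, minWorkers int) {
	for range time.Tick(30 * time.Second) {
		switch load := metrics.CPUUtilisation(); {
		case load > 0.8:
			pool.SpawnWorker()
		case load < 0.2 && pool.Size() > minWorkers:
			pool.KillIdleWorker()
		}
	}
}
```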

The middle

Once we are done with this bottom-up approach, we'll also have to consider the
middle -- the gap that's left after top-down and bottom-up. It's tempting to
avoid thinking too much about this now, as it might constrain our bottom-up
thinking unnecessarily, but we will need to reconcile the tension between the
two eventually.
