Worker-based Async API with await/Promise support (or plain callbacks) #6
Conversation
src/async.js
Outdated
```js
 * @param msTimeOut
 */
epollUWait(epid, msTimeOut, callback) {
  return this._createAsyncWorkPromise("epollAddUsock", [epid, msTimeOut], callback);
```
"epollUWait"
Thank you for this PR. Lots of thought and work went into this one. Will have a look, but I think your approach sounds like the right one for now.
Awesome contribution!
Worker-based Async API with await/Promise support (or plain callbacks) (Eyevinn#6)

* fix lint errors in eslint config :)
* index.d.ts: add /* eslint-disable @typescript-eslint/triple-slash-reference */
* improve eslint calling script
* add async-worker version of the SRT JS API with await/Promise support + potentially out-of-order RPC-back/result-dequeuing is possible with a type of call-ID generator and callback-map that we prototyped (but not used atm since the worker is only doing sync/blocking internally with the current SRT lib binding).
* add various examples for async API ("classic" callbacks / Promise / await)
* add .eslintignore (should go with commit where we just call "eslint .")
* index.js: add missing semi
* async.js: fix method name literal in epollUWait
* async.js: allow for accept method to use a timeout opt (defaults to false), and a custom timeout value option (defaults to the default timeout), which can potentially be set differently than the general default timeout. + make the timeout value a static class property so that it can be user-defined module-load wide. defaults to constant in module top scope.
* async.js: fix for a rejected promise, make sure we don't resolve anymore + add a custom timeout value argument to _createAsyncWorkPromise
* async.js: rm an experiment
* fix a lint error in example
* add types for async api and fix some details on binding API types
* export async API to index
* async.js: add some missing docs for private methods
* include async types in index
I would like to explain how I weighed the pros and cons of doing it this way versus another, why I think this is necessary, and what I envision in terms of usage possibilities and trade-offs for this Node.js binding to libSRT in the future.
Problem
In the current N-API binding layer to libSRT, every native call performs blocking I/O and runs synchronously within the wrapping JS function call. This design of the bindings certainly has advantages, but it causes usage issues on the JS side, because these functions are called from the Node.js process main thread / event loop.
As we are trying to achieve higher throughputs, reading/writing to SRT file descriptors takes a while. For example, if we read/write 16 kB from a socket per JS main-thread tick every 20 ms, that is only about 6.4 Mbit/s of throughput. It also means we only have that 20 ms window between calls to do other tasks, and the window shrinks further because the read/write operation itself takes time.
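For reference, the back-of-the-envelope math behind that figure (taking 16 kB as 16,000 bytes):

```
16,000 bytes / 20 ms  =  16,000 bytes × 50 ticks/s
                      =  800,000 bytes/s
                      =  6,400,000 bit/s  ≈  6.4 Mbit/s
```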
Awaiting incoming connections in `accept` mode to implement a server is "tricky" with this sort of blocking API from a JS application point of view (see the current ReadableStream and Server implementations). Any other blocking SRT operation will also impact application performance in an unpredictable way, as other tasks our Node.js service may run in production will back up while waiting for the synchronous binding to return. Generally, running blocking I/O directly on the JS main loop is a source of issues and considered bad practice, for reasons better explained in the "theory" section I wrote up below (feel free to skip it if you agree so far).
Solution
To solve this problem, we run the native calls into libSRT, i.e. the blocking binding methods, on a JS Worker thread. The architecture used here does generic RPC'ing from a "proxy" class. To do this, the proxy class reflects the method names & signatures of the bindings' JS API that we are calling in the Worker message-queue iterations. The proxy class calls a generic function that passes all arguments as an array, and the method name itself as a literal string value, in a message to the worker (like in an RPC, where the remote end here is the other thread running the actual SRT API bindings).
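For illustration, the worker-side end of that RPC looks conceptually like this (a sketch only, not the PR's actual worker implementation; the module path, binding export, and message shape are assumptions):

```js
// worker.js (conceptual sketch)
const { parentPort } = require('worker_threads');
const { SRT } = require('../index'); // the existing synchronous binding; path/export assumed

const srt = new SRT();

// Each message is one RPC: the method name as a string plus its arguments.
// The calls below block, which is fine here: we are off the main thread, and
// processing them strictly in order keeps main-thread dequeuing FIFO-simple.
parentPort.on('message', ({ method, args }) => {
  let result;
  try {
    result = srt[method](...args); // blocking call into libSRT
  } catch (err) {
    result = { error: err.message };
  }
  parentPort.postMessage(result);
});
```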
The herewith introduced `AsyncSRT` class, which implements this proxy, has the same API as the current version, with a callback function argument added on every method, and each method returns a Promise of the result value that the sync API would return. Each proxy class instance owns its own worker thread instance and therefore allows managing queued RPCs and their respective results/callbacks against one "SRT thread" each. Concurrent access to the underlying native sockets/file descriptors via libSRT is thus possible in a managed way (for example, one thread may accept new connections while another worker performs read/write ops on FDs).
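Usage then looks roughly like the sketch below (the exact signatures of `accept`/`read` shown here are assumptions mirroring the sync binding; the PR ships its own examples for classic callbacks / Promise / await):

```js
const { AsyncSRT } = require('./src/async'); // export/path as introduced by this PR, assumed here

async function acceptAndReadOnce(listenSocketFd) {
  const asyncSrt = new AsyncSRT(); // owns its own worker thread / "SRT thread"

  // Same method names as the sync API, but each call returns a Promise...
  const connFd = await asyncSrt.accept(listenSocketFd);

  // ...and optionally also takes a plain callback as an extra last argument.
  asyncSrt.read(connFd, 16 * 1024, (chunk) => {
    console.log('received', chunk.length, 'bytes');
  });
}
```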
This is relatively simple to implement here, as you can see: since the SRT calls are all blocking and no other thread does anything event-loop-ish (it is all polling etc.), the queuing/dequeuing order of RPCs/callbacks between the main and worker threads is fully maintained.
You can also run the SRT async API, i.e. spawn the worker thread, from another parent worker btw; that's all the same to it.
Potentially out-of-order RPC responses / result dequeuing would be possible with a type of call-ID generator and callback map that we prototyped (but not used atm, since the worker only does sync/blocking work internally with the current SRT lib binding).
Any user-passed callback is wrapped in the `new Promise` executor, in which the actual worker message-posting is called. The Promise is resolved by the corresponding result message from the worker, which will call both the Promise resolver and the plain callback, if any was passed at all (it is optional). By design choice the callback gets called after the resolve function, but that only plays a role if you want to use both callbacks and Promises, which is a rather rare case.

Every promise returned has a timeout which defaults to 3000 ms. We could make this configurable at runtime for the whole bindings module, or in principle even allow passing an optional timeout value on each async API method. If the timeout is exceeded, the promise gets rejected.
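To make that concrete, here is a minimal sketch of the pattern described above (a simplified stand-in, not the PR's actual `_createAsyncWorkPromise` implementation; the class name, field names, and message shape are illustrative):

```js
const { Worker } = require('worker_threads');

// Module-scope default for the promise timeout; the PR makes this overridable
// module-wide via a static class property. Names here are illustrative.
const DEFAULT_PROMISE_TIMEOUT_MS = 3000;

class AsyncProxySketch {
  constructor(workerScriptPath) {
    this._worker = new Worker(workerScriptPath);
    // FIFO of pending jobs: ordering is guaranteed because the worker performs
    // one blocking call at a time and answers strictly in order.
    this._pending = [];
    this._worker.on('message', (result) => {
      const job = this._pending.shift();
      if (job) job.onResult(result);
    });
  }

  _createAsyncWorkPromise(method, args = [], callback = null,
                          useTimeout = true, timeoutMs = DEFAULT_PROMISE_TIMEOUT_MS) {
    return new Promise((resolve, reject) => {
      let settled = false;
      let timer = null;
      if (useTimeout) {
        timer = setTimeout(() => {
          settled = true; // once rejected, never resolve anymore
          reject(new Error(`Timeout (${timeoutMs} ms) on async call: ${method}`));
        }, timeoutMs);
      }
      this._pending.push({
        onResult: (result) => {
          if (timer) clearTimeout(timer);
          if (settled) return;
          settled = true;
          resolve(result);
          // The optional plain callback runs after the resolver, by design choice.
          if (callback) callback(result);
        }
      });
      // The RPC message: method name as a string literal plus the arguments array.
      this._worker.postMessage({ method, args });
    });
  }
}
```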
The `accept` method is the only one that does not use/need a timeout, for obvious reasons. Important to note here: see "Worker Termination" in the issues section below.

Issues (new or remaining)
Zero-copy
Atm we are doing a copy of `Buffer` objects across (for the read/write methods), as sharing or message-passing buffer refs is otherwise non-trivial or illegal with `Worker`s. This is far from an ideal way to handle it. In principle, there are several possible solutions to make this zero-copy (see `SharedArrayBuffer` or `Transferable` objects), but these put some constraints onto the application using them. So we want to make copying the default mode, and allow enabling the others by configuration (in further patches).
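For reference, this is roughly what the `Transferable` option could look like (a sketch using the `worker_threads` transfer-list mechanism; not something this PR implements, and the message shape is illustrative):

```js
const { Worker } = require('worker_threads');

// Sketch of the Transferable option: instead of letting postMessage copy the
// Buffer, hand over ownership of its underlying ArrayBuffer via the transfer
// list. Zero-copy, but with constraints on the application: the buffer must
// own its ArrayBuffer exclusively (i.e. not come from Node's Buffer pool),
// and the sender must not touch it after posting (it gets detached).
function postChunkTransferred(worker, chunk /* Buffer backed by its own ArrayBuffer */) {
  const arrayBuf = chunk.buffer;
  worker.postMessage({ method: 'write', args: [arrayBuf] }, [arrayBuf]);
}
```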
Worker overhead

The JS `Worker` as a light-weight thread abstraction makes this solution very easy to implement. However, it will certainly have a different performance overhead (and other constraints in handling data flow, see zero-copy) due to being in JS land as opposed to C/N-API land.
& Worker TerminationThere is atm no way to "cancel" a once posted "accept job". This SRT function call effectively blocks the worker thread until any client connects on a listened socket. Addings such a helper as a feature might be interesting, as one would want to terminate a worker thread properly. It would obviously imply closing the socket from any other concurrent thread (since the thread running
accept()
can't do anything else)- and that can be the main thread too in principle. This issue here is of course as much a relevant one with the plain sync bindings run in any thread. It only remains as much, even with async execution support.Challenging ideas
"Binding layer should do this"
We could have certainly solved this with napi_create_async_work.
But that would have meant basically rewriting the whole binding layer. The solution here is interesting because of how little it adds to enable powerful usage.
Providing a natively asynchronous I/O N-API binding layer for libSRT is certainly a larger task we should envision for the future.
It would in that case still be interesting to keep both native bindings (sync vs. async at the N-API layer) around, to see if there are pros/cons to implementing async I/O either the way we do here via `Worker`, or via `napi_create_async_work`.

"It would be simpler in main thread"
We can (and actually should, imo) keep the current sync'd binding layer around, with all the currently existing JS classes written directly on top of it that do scheduling tricks for the blocking I/O on the main thread (like the SRT Readable/Writable classes here). It would be an interesting topic to see whether we can optimize these for other cases too. It can also be an interesting challenge to get the scheduling of these blocking tasks onto main-loop execution timeframes right, in order to keep the loop unblocked as much as possible. Blocking and async-running bindings can exist side by side in this codebase and present different topics of research.
A bit of JavaScript Theory
A bit about the theory of why this is needed (I mostly wrote this up to get a clear head about it myself):
Conceptually, for one, running any blocking I/O on the main thread is simply not a viable concept for integrating with the JS ecosystem, where the environment specifically expects native/system-specific I/O calls to happen in background threads and to leverage all the advantages of that "async I/O".
- On the other hand, JS does not allow constructing any main-loop-like concept (as in `while(true) { /* process some I/O event/message queue here */ }`), for the simple reason that it does not expose OS-level threads (as Java or Python or other high-level (dynamic) runtimes do). In JS there is just no way to do that without congesting the runtime's internal event queuing (mostly because individual execution ticks would take too long for other upcoming events to be processed). The very concept of an event loop is built into the language, or rather the runtime this language specifies and thus expects.
- So much for the theory :)