What are Web Assembly threads? #104
It sounds like there is a version of emscripten-fastcomp which uses WebWorkers to implement threads:
As threads are currently listed as a post-v1 feature, do we need to sort this out before the public announcement?
Agreed, moving to no milestone. Related to this issue, in #103 I suggest we look at forward progress guarantees as being defined in the C++ standards committee.
This is an important question. In general, I've taken the stance that we should work hard to avoid duplicating other parts of the Web platform by eventually allowing direct Web API access from WebAssembly. That being said, threads are a core part of execution semantics, making them more like SIMD (an obvious builtin) than WebGL (an obvious external API).

During an initial discussion with @ncbray about workers vs. a pure-WebAssembly version of threads, I viewed the situation as either-or and sided with workers. More recently, though, I realized that the two can be quite complementary. First let me describe both independently before describing how they compose.

With worker-based threads, to create a thread, a wasm app would create a worker (in v.1, by calling out to JS; after WebIDL integration, by importing the Worker API and creating the worker directly). To share a wasm heap/global-state between workers, the wasm module object itself would be

Pros:
Cons:
So, an alternative is pure-WebAssembly-defined threads. Basically, the spec would define how you create, destroy, join, etc. threads. Threads would be logically "inside" a module, and all threads would have access to the same globals, imports, etc.

What happens when a non-main thread calls an import that (in a browser) is JS? (This whole idea is from @ncbray.) The calling thread would block and the call would be executed on the main thread of the module (the one on which the module was loaded, which has a well-defined

Pros:
Cons:
Not viewing these as mutually exclusive suggests an exciting hybrid:
From a Web platform POV, I think neither of these fundamentally changes the model as long as we carefully define things like the state of the stack of script settings objects (again, though, analogous to

From a non-Web environment POV: only pure-wasm threads would exist according to the spec (though the spec would specify what happens when some "other" (read: worker) thread calls a function out of the blue). This situation would be symmetric with module imports, where the spec only talks about the semantics of a single module, leaving what can be imported up to the host environment. In particular, this means that, with pure-wasm threads, you'd be able to easily write a threaded module w/o any host dependencies.

What I especially like is that the hybrid model avoids the requirement for all browsers to slim down their workers (which may take years and require cross-org collaboration). I am, however, quite interested to hear if other browser vendors think that this isn't a big deal.
We can also evolve the web platform in two ways:
We also want to allow some form of lightweight user-mode threading. Other languages, such as Go, will perform better with this, and C++ itself will gain such features in C++17 or soon after. Let's make sure our approach makes this possible.
Since we talked I’ve been trying to unravel an unclear, hairy ball of interacting issues and unarticulated starting points (re: API surface + FFI + threads) and parcel them into smaller chunks we can talk through. Hopefully I’ll start posting some issues soon; consider this a hand-waving, ill-supported stub until then. In general, I think we’re starting to see things from a similar perspective, but there are a lot of related issues that need to be worked through before the details for this issue click into place.

Having read through the Web Worker spec, it’s very JS-centric. It doesn’t buy you a whole lot unless a thread has an implicit (JS-style) event loop and a thread-local JS isolate. In that case, it may make sense to treat it as a worker. (But these kinds of threads may not exist, depending on other design choices.) I have some other concerns about how worker lifetimes are specified and the fact that the app “origin” could differ per thread, but I think those issues can be deferred for the moment.

I don’t believe we want workers to become “threads out of nowhere” by calling into whatever WASM code they please. What pthreads ID do they get? How does TLS work? Unless thread management is hermetic to WASM, there are going to be some tough questions to answer. I do like WASM code interacting with arbitrary workers, however. Something along the lines of message ports + bulk data copies, if nothing else? (Yes, this seems like a step back; I’ll try to justify it elsewhere.)

Re-importing the associated ES6 modules on postMessage sort of scares me. It solves a few nasty issues but also seems like a big hammer that will inevitably smash something else. I’ll need to think through the consequences.

Note: at least in Chrome, I know that many worker APIs are implemented by bouncing through the main thread. So explicitly bottlenecking through the main thread may not hurt performance much, in the short term?

Note: even with SAB, the only real way to enqueue a task on a thread with an implicit (JS-style) event loop is postMessage. Alternatively, we could create some sort of “event on futex wake” type functionality, but that might be better suited for an explicit (native-style, reentrant, pumped by the program) event loop.

Note: the implicit storage mutex seems like a deadlock waiting to happen, although I cannot find any APIs that actually acquire it in a worker...
On Tue, Jun 9, 2015 at 2:05 AM, Nick Bray [email protected] wrote:
Event loops might make sense outside of JS, as well. E.g. there are a
Agree with @titzer that we may want to support async-style programming by allowing wasm to directly participate in the event loop (which isn't even a stretch of the imagination). At a high level, I think we're going to see:
and I think both use cases will matter a lot for the foreseeable future.
From a C++ perspective we could map the 2 POVs @lukewagner proposes into:
An interpreter (lua, python, ...) could fit nicely in 2, and implementations could mux JS event loop processing with processing of the wasm module's events, including some file-descriptor and pipe handling that JS typically can't do but a wasm module could. To be clear, I'm not saying we expose
IIUC,
If I may offer a few thoughts from the server-side of things (specifically node). There are three main use cases for threads (there are probably "proper" names for these, but you'll get the idea):
These have variations on how developers want them implemented. For example: does an immediately exiting thread create a new JS stack every time, or reuse an existing one? Is the thread closed when the operation is complete, or reused? And is a thread, sync or async, joined, causing the main thread to halt until the spawned thread is complete? Sorry if this was too many examples. What I'm getting at is that it seems like wasm's ability to work with threads will either not be low-level enough or not extensive enough to fit these needs (I don't expect it could, concerning the first example), leaving server applications needing to use their own bindings. Is this correct?
I don't have all the answers, but one thing I can comment on is that WebAssembly threads are going to have access to a pthreads-level API, so applications will have quite a lot of control. Decisions like when and how to use pthread_join are largely determined by the application (or by a library linked into the application). Also, WebAssembly itself is being designed to be independent of JS, and it will be possible to have WebAssembly threads with no JS code on the stack.
Thank you. That's excellent to hear.
What things here do we need to decide on for the MVP, and what things can wait until we actually introduce shared memory?
I don't see it directly blocking anything in the MVP. That being said, for something so central, it seems like we should have a pretty clear idea about the feature (and some experimental experience) before the MVP is actually finalized/released. I don't see "blocks_binary", though.
A few issues on WebWorkers are detailed here: https://github.com/lars-t-hansen/ecmascript_sharedmem/issues/2
FWIW, both of those are impl issues, and ones that we're expecting to address in FF.
@lukewagner, they are not entirely implementation issues. The worker startup semantics are allowed by the worker spec. The limitation on the number of threads is a consequence of wanting to prevent DoS attacks, but it's very crude; something better, with observability (i.e. an exception thrown on failure to create a worker), would be welcomed by many applications, but probably requires a spec change too. Additionally, so far as I can tell there's no way to detect if a worker has gone away, which is a bizarre spec hole.
@lars-t-hansen As for workers not starting before returning to the event loop, when you say "allowed by the spec", do you just mean by the spec not specifying when progress is made, or does it specifically mention this case? As for the limitation on number of threads, you're right: what is needed (in addition to a higher quota) is some visible error to indicate the quota is exceeded.
I think a forward-progress guarantee is what we want here, in line with Torvald Riegel's N4439 paper to the C++ standards committee.
@lukewagner, The service worker spec (https://slightlyoff.github.io/ServiceWorker/spec/service_worker/) provides minimal justification for the "kill a worker" behavior (indeed, lack of UI for a slow script dialog; see the section "Lifetime") but no actual guidance on how to avoid being gunned down. For a computational worker in a SAB context that license to kill is particularly troublesome, as the default mode for such a worker will be that it waits on a condition variable for more work, not that it returns to its event loop.
Bug filed against the WHATWG spec here: https://www.w3.org/Bugs/Public/show_bug.cgi?id=29039. |
@slightlyoff can probably chime in on web worker + service worker and the "license to kill".
I guess running webasm in strictly single-process environments would be doable even if the spec required the presence of a pthreads-level API? As parallelism, as opposed to concurrency, cannot be guaranteed, an implementation would be free to simply treat created "pthreads" as blocks to schedule arbitrarily? Will the spec guarantee anything stricter than soft realtime?
@emanuelpalm agreed, a valid implementation could emulate a single-processor system. Hard realtime isn't something I think we can guarantee in general, because WebAssembly doesn't know a priori which machine it'll execute on. I see this limitation as similar to promising constant-time algorithms: the
I'm really excited about this: Any thoughts if implementing a What could the pipeline look like as a concrete way to experiment? Seems more convenient to have something like:
@jbondc I think your thinking is closer to the C++ parallelism TS, which requires runtime support. It would be possible, but WebAssembly currently doesn't have ongoing work for GPUs. It's important, but not the primary focus for the MVP. The work by Robert's team is pretty different from what WebAssembly would do (though it could benefit from that work).
Yes, this looks right: But I'm more interested in writing my own language that compiles down to Web Assembly (and has parallelism support).
I think the best thing wasm can and should do is to provide the raw hardware primitives for parallelism (threads and SIMD) within the general-purpose CPU model assumed by WebAssembly to the language/library/tool authors, leaving existing and future Web APIs to access hardware outside the general-purpose CPU (like WebGL for the GPU). Then the language/library/tool authors can build up abstractions that target a specific model of parallelism. (This is basically just an Extensible Web argument.) Based on this, I would think the libraries in the C++ parallelism TS would be provided as libraries on top of the abovementioned primitives of the web platform, not as builtin functions (at least, not in the near term).
Thoughts on something like this: Is this a possible story?
@jbondc It's a higher-level abstraction and a framework that I expect could be implemented in wasm with good performance. If someone tried porting this and found some showstoppers, that might suggest extra low-level threading support to be considered for wasm. My understanding of the reasons for avoiding such high-level abstractions is that they generally have niche uses, and they have inherent assumptions and limitations that do not map well to the hardware, so there might not be general interest to warrant direct support; but perhaps someone can rally support for some other parallel computation models.
It's not a higher-level abstraction. The "Actor Model" (https://en.wikipedia.org/wiki/Actor_model) is a different theoretical model for thinking about computation (vs. a finite Turing machine). Based on hardware (e.g. a GPU by Nvidia), you can implement in a VM some message-passing actor-like thing. The way I see it, if either Chrome, Chakra, Firefox, or WebKit would implement pattern matching + an actor-like thing, then we'd get C++ parallelism for free, memory sharing + threads, and pretty much any other concurrent model.
Related, in case someone wants to hack something together:
@jbondc It's not SMP, and I doubt it could implement SMP code with any level of efficiency. Consider how it would implement atomically updated stacks and queues and caches, etc., in code written for shared memory. The planned wasm support should allow some of these 'Actor Model' programming languages to be implemented in wasm, and the atomic operations might be used to implement fast message queues and memory management to keep the actors apart, etc.
@JSStats I'm more trying to look at good building blocks. Better hardware is already here. There's a good discussion about hardware here: The threads / shared memory design is bad. As a concrete example, this is good work: Page 34. Threads bad. Page 59-60. A ~'binary jit' could apply here: Distributed BIP generator. I'll try some experiments in the coming months; hopefully others will join.
Moving this milestone to Post-MVP, so that the MVP milestone focuses on actionable tasks for the MVP. It's obviously important to plan for threads, but it's my impression that there's sufficient awareness already, so we don't need this issue to remind us. If people disagree, I'm fine moving it back. @jbondc If you are planning to do some experiments, please open a new issue to report your findings. Thanks!
@sunfishcode Will do, but I largely gave up. For anyone interested in BIP: Is the thinking, then, with WASM threads that code like this would compile into atomic load/store ops?
@jbondc isn't the idea to provide threads and compare-and-swap etc so that one can build a lock-free actor implementation on top of it? |
@malthe Ya, after some research that's what ponylang does: There's also some interesting patterns of parallelism with HPX: And then GPU programming:
@binji will address in threads proposal. Closing.
Are Web Assembly threads specified in terms of WebWorkers, or are they different? What are the differences, or are WebWorkers just an implementation detail?
Maybe the polyfill is OK with WebWorkers, and Web Assembly does its own thing that's closer to pthreads.
We need to ensure that Web Assembly can work outside of a browser.