-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to cancel a pending Atomics.waitAsync? #176
Comments
I don't think there's a way of cancelling a waitAsync. ISTR there's generally some concern around promises not having a cancellation facility. May be worthwhile pinging TC39 around that. (I don't know the job semantics well enough to say what happens with pending promises at page shutdown, which is probably the only sense of "deinitialization" that is known to the web.) |
Running the following test <html><body><script>
function test() {
var heap = new Int32Array(new SharedArrayBuffer(128*1024*1024));
var waitIndex = 0;
Atomics.waitAsync(heap, waitIndex, 0, Infinity).value.then((value) => {
console.log(heap[waitIndex]);
});
}
for(var i = 0; i < 10; ++i) {
test();
}
</script></body></html> in Chrome does indeed result in the created SABs being pinned down, and never freed. In e.g. Unity use, it is somewhat common for pages to need to unload a Unity engine build from the page, and maybe open another Wasm build. An example usage is in a 3D asset store web store page, with a preview carousel, where the preview carousel may be a photo, a video, or a live 3D page (written in Unity game engine to wasm) of the product in question. When one navigates the carousel, the page will dynamically populate to show the active selected preview item index. An example page: https://assetstore.unity.com/packages/3d/environments/landscapes/polydesert-107196 Another example usage is a game portal, where game instances are loaded into a child element. This limitation of being unable to gracefully deinitialize does effectively mean that one is mandated to use iframes in order to get JavaScript garbage collection of Wasm heaps to work. |
Trying out a similar test in Chrome with iframes does suggest that Chrome is able to reclaim the memory if one does use iframes:
<html><body><script>
var numLoaded = 0;
function open() {
if (numLoaded++ >= 20) return;
console.log(numLoaded);
var iframe = document.getElementById('container');
if (iframe) {
console.log('unload');
iframe.parentElement.removeChild(iframe);
}
else {
console.log('load');
iframe = document.createElement('iframe');
iframe.id = 'container';
document.body.appendChild(iframe);
iframe.src = 'iframe_child.html';
}
console.log(iframe.src);
}
setInterval(open, 1000);
</script></body></html>
<html><body><script>
var heap = new Int32Array(new SharedArrayBuffer(128*1024*1024));
var waitIndex = 0;
console.log('waitAsync');
Atomics.waitAsync(heap, waitIndex, 0, Infinity).value.then((value) => {
console.log(heap[waitIndex]);
});
</script></body></html> However not sure if this is adequate - requiring wrapping content in an iframe for garbage collection to work can be quite a footgun, so certainly in search of a better way here. |
… emscripten_atomic_cancel_wait_async() and emscripten_atomic_cancel_all_wait_asyncs() functions to allow canceling waitAsync operations. This does have a drawback of spurious wakeups, but necessary in order to work around GC problem WebAssembly/threads#176 .
/cc @syg |
Interesting example, Jukka, thanks! As @lars-t-hansen says, yeah, there's not really a way to cancel, and the question has baggage around cancellation of Promises. Before I pull on threads to see about a Promise cancellation mechanism again, which, given For the Unity use case, would either of the following be sufficient? And do you have a preference?
Also cc @marjakh |
The resolution proposal 1. would likely be sufficient for the GC reclaim scenario - such The resolution proposal 2. may be tricky, in particular the question is whether SAB.cancelAllAsyncWaiters() would cancel all .waitAsync()s in all Workers waiting on any address on the given SAB, or just all .waitAsync()s that were issued in the Worker that calls SAB.cancelAllAsyncWaiters(). If it canceled all Workers, then I think it would work. However, not needing to implement any code would certainly be preferable here - then the GC would "just work" without a danger that JS code could write any piece of code that generates essentially unreachable objects that can never free (in the page lifetime). There is also another scenario that becomes complicated from the lack of cancellation: deleting locks. To make the example concrete, let's say your wasm application has some kind of database of records that are stored in the wasm heap. The records are large enough that SAB/pthread-based Workers are used to download them in the background (think e.g. image files or .json data or something else), and/or maybe on the main thread also. Whenever a download finishes (e.g. via a XHR or Fetch Promise), the code will issue an So essentially we have N Workers that are each Then at some point, e.g. based on user input, it is decided that the database should close (but the wasm application still lives on, maybe opening a new database, or deciding to do something else). If implemented in common OOP fashion, the database lock address will likely have been dynamically allocated uint32 on the heap. In order to deinitialize the lock, safely At first I thought all of this could simply be avoided by using a helper uint32 next to the wait address as a field to denote whether the uint32 lock is valid, and when deinitializing, mark the lock invalid first, then However that does not quite work, but it has a problem that freeing a lock becomes an asynchronous operation! That is, the "is the lock valid" uint32 helper field needs to be kept alive until the last I can't think of a great way to resolve this issue, except to maybe start making some kind of refcounting things in wasm heap, where whenever a Worker issues a Atomics.waitAsync() on a memory address, it increments a refcount, and when a lock is deleted, it is marked deleted, and all pending .waitAsync()s finally resolve, then the last one to resolve will actually do the free(). However that will still have a big issue if one calls Worker.terminate(), since that would leak refcounts from that Worker, unless one keeps a global registry in wasm heap of all addresses that each Worker has .waitAsync()ed. I thought about imposing a restriction to Emscripten that pthread Workers are not allowed to do any .waitAsync()s, but only main thread is, and then only main thread would be allowed to delete any locks. Then it would be able to locally track in JS side state the set of alive locks, and clear out the .waitAsync()s. But this kind of deinitialization restriction seems a bit limiting from a C programmers viewpoint. If there existed a function SAB.cancelAllAsyncWaitersAtAddress(address) that cleared all waiters on all Workers on a given address, then I think this kind of lock deinitialization would be possible to implement synchronously without complications, and one could delete a lock by first issueing a So I wonder, would we be looking at two functions, |
Moi Jukka (&others), I'm mostly worrying about this from the "how do we release memory" point of view. In the example in OP, if you never call notify(), the code in the then-func will never be executed, so that's fine. But if you create waitAsyncs and never notify() them, in the current impl in V8, memory will grow without bound. (Even if the SAB is GCd!) Relevant doc where this use case is discussed: https://docs.google.com/document/d/1aeEGDm1XSqoJkQQKz9F75WqnuAa2caktxGy_O_KpO9Y/edit#heading=h.2cmggae83oqv (that link should link to the "Open question: Problem with location based lists" section). Re: SAB not being released (your example in here: #176 (comment) ). I think what happens here is that the closure you pass to then() keeps "heap" alive which keeps the SAB alive :) So it's not the waitAsync that keeps stuff alive. If you didn't refer to "heap" in your function, the SAB should get GCd just fine. (But we still keep internal data structures alive and leak memory that way.) Re: SAB.cancelAllAsyncWaiters APIs, just to confirm, the intent would be that it's a synchronous operation which does not call any callbacks (e.g., promise rejected callbacks), right? Otherwise it wouldn't help the "freeing a lock must be sync" use case. |
👋
That is exactly correct. It is the closure that keeps the heap alive. If the closure did not reference the heap, then the heap will be freed (as one can test by removing The rationale I now realize I missed explaining in that code is that it is extremely common to need to reference the heap index that one just waited on from within the Promise, so having to access
I presume it could work either by resolving with a special |
I think there are (at least) 2 separate problems here:
When you want to free memory, could you actually notify() everything at that point or is that cumbersome? You can make the closure no-op in that case, so it wouldn't access memory, so it shouldn't matter that you first free memory and only after that notify.
(I hope I didn't mess up that example :) )
That'd require you let the SAB die on your side. I might be grossly oversimplifying things here, feel free to point out what exactly :) Not adding an API would be simpler in a sense; we should just do the right thing anyway :-P But it requires the user to jump through hoops to let the SAB die (as it's indeed canonical use case to refer to it in the closure). So... not sure which is better. |
That would certainly help break the link. One difficulty maintaining such an indirection is due to WebAssembly Memory.grow. When memory growth occurs, one must reinitialize all views to the newly grown buffer, so if there existed multiple copies to these views, then it would have to be coordinated with the heap memory growth code to make sure they don't fall out of sync. This would practically limit the shape of any given codebase to be able to have at most one place where they
A difficulty here is that we are dealing with multiple Workers - so they each would have their own copy of the
It would probably be best if this could automatically be detected, also like what @syg pondered above. If there is a manual procedure one must follow, it can probably appear arcane for developers who are not experts on SAB+Atomics development, and they might not immediately realize there is an extra procedure to follow (and thus cause leaks). |
Ah, I see where I was oversimplifying this :) Thanks for the added info. It might be that coordinating workers wrt "letting the SAB die" (what must die is the backing store shared by all workers) is equally complex as what the cancelAllAsyncWaiters would do internally. The coordination would require some kind of a "drop all your references to the SAB" command sent to the workers. So maybe we want to add the explicit API to save the user from this complexity. I'll need to think about this a bit more. |
This is my current understanding as well after the discussion. My two initial proposed solutions and @marjakh's two problem statements in #176 (comment) are in a cycle of sorts. Ideally, as @juj says, we want cancellation of The ideal, "no user intervention needed" solution then seems to me that I wonder if the canonical use of |
I don't think WeakRefs would help here. Either the worker has an additional strong pointer to the SAB or not (in addition to the closure ptr which would now be weak). The worker's SAB is a normal object on the worker's side, independent of the SAB in the main thread, except that they refer to the same backing store. Has a strong pointer -> making the pointer in the closure weak doesn't change anything; the worker still needs to be told to nullify the strong pointer. Doesn't have a strong pointer -> the SAB will die too early and the worker won't be able to access it when notify() is called. Having a SAB in a different thread, referring to the same backing store, doesn't keep the worker's SAB alive. If that was the case, then maybe a weak pointer would help. Was that the scenario you were thinking about? |
Trying out WeakRefs with the original example, GC does occur properly:
That certainly is workable to solve the first problem of wasm program deinitialization (to my understanding), but it does have two awkward concerns:
The only way I can think of resolving this is to capture the underlying SAB itself, rather than the view, but that has the awkward result that then each .waitAsync resolve will require allocating a new SAB view to process the resolve. I.e. the code would have to look as follows to be correct (compare to the above):
I wonder if it would be feasible&sensible to have a new API |
One issue I note is that Safari does not support WeakRefs at all (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WeakRef), so it would require Apple to add support for that to Safari in tandem for the SAB support, if that was made a required part of the spec. |
In your latest example, if you force a GC, and then try to notify(), will the WeakRef be valid or invalid? I'd guess invalid. If valid, what is keeping the SAB alive? |
The WeakRef will be invalid, but that would be the intent in this example scenario. In the real application/runtime, the code structure looks something like this: function Module() {
var heap = new SharedArrayBuffer(128*1024*1024);
var weakheap = new WeakRef(heap);
// ...more global state
function waitAsync(waitIndex) {
Atomics.waitAsync(heap, waitIndex, 0, Infinity).value.then((value) => {
var heap = weakheap.deref();
if (heap) {
var view = new Int32Array(heap);
console.log(view[waitIndex]);
}
});
}
// ... more global functions
// main app entry point
function main() {
// ...start executing application, sets up all sorts of event handlers that can call Atomics.waitAsync()
}
// return exports
return {
main: main,
// ...other exports
}
}
// Creating an app
var app = Module();
// Deinitializing an app, this should cause the whole app contents to GC
app = null; In this case, what keeps the SAB alive is the |
Shall I make a proposal? |
The API seems useful (though it's quite subtle why that is :) ). The thought process I used for convincing myself: I wonder whether Is |
I suppose our semantics are already deeply tied into having a critical region for the wait/notify system inside which all ambiguities about racing wait/cancel pairs boil away, so this seems no worse than what we're already doing. My gut feeling says to restrict this to only async waiters, and to call it cancelAsync. Sync waiters will always have access to the memory and can always read a dedicated location to look for eg a cancellation code. I think there would need to be additional justification for making cancel work on these waiters. |
Sorry for probably missing it but: Can someone explain to me why Is it because the canceller needs to be able to do this independently from whomever calls |
Could you spell out some more what you had in mind for the extra parameter? |
async function doSomething({ signal } = {}) {
const { value } = await Atomics.waitAsync(view, waitIndex, 0, Infinity, { signal })
// do more stuff
}
const ac = new AbortController();
doSomething({ signal: ac.signal }); //
ac.abort(); // stops waiting, throws AbortError, just like fetch and other web APIs |
Thank you for the illustrative example. The uncomfortable but real political answer here is that TC39, where JS is developed, is a different standards body than WHATWG, where On technical grounds, the JS spec features also strive to be cross-platform, and AbortController is web platform-specific. |
Some of the non-browser platforms like Node.js and Deno have added support for I am wondering if the spec can be phrased in such a way that ECMAScript will support cancellation with Alternatively if that is impossible to propose a simpler signal-like API that Node can implement on its |
If it were only down to this "political"/layering issue, I would be motivated to work out some kind of general solution. However, when discussing it earlier with @marjakh , I got the impression that it's important for the cancellation to come from a different Worker (compared to where |
Yes the cancellation can certainly come from any Worker or the main thread. I wasn't aware there wasn't a way to share an AbortController across workers, thanks for pointing this out. |
Thinking about this a little more: I suspect we could define AbortController and AbortSignal as serializable objects (assuming they are restricted to only in-memory), but they aren't defined that way right now. But maybe such an interface would be overkill for this application, and index-based cancellation is more manageable and efficient. |
Why? Can you elaborate on this perhaps with a use case where one thread needs the capability of aborting a
I suspect that's the case - though we can also provide a method on atomics to obtain an AbortController for a given index. That kind of bypasses the "the capability to abort something and the capability to listen to cancellation are separate" thing.
I don't understand the problem domain well enough to know if that's a real limitation though |
@juj lays out the use case in the first half of this thread: deleting mutexes. Taking a step back, the primary use case for Since these are C/C++ pthreads programs compiled via emscripten (perhaps interacting with wasm), these waiters, async or otherwise, are always going to be notified from another thread. The fact that same-thread notifications are possible is a result of JS asynchrony, and indeed have use cases in JS itself. But given how low-level the API is (it's futexes), I imagine the majority use case will remain as an implementation building block for multithreaded synchronization mechanisms. |
Thanks @syg, that is very informative I also went back and read juj's comments and did some googling. I think Chrome's behaviour (of the first example) is probably a bug that's worth reporting if it wasn't already (like it does in iframe). I think I think Thanks for explaining it and convincing me. |
To be clear: the reason |
Heya, sorry for the long delay on this thread. I took it upon myself to come up with a test case that would polyfill simulate waitAsync() and cancelAsync() behavior to confirm whether such a But unfortunately I was not able to craft a test case that would not be a bit silly, so let me try to explain the race concern. The following sequence of events seems plausible that it could occur:
If we just have a main thread and no Workers, this race probably cannot happen, since naturally the But what if this scenario is colored with other Workers in the mix? It could be the case that right when main thread calls So it seems that such a |
@juj Indeed that scenario points out that That is indeed going to be harder to implement, nor am I sure it even makes sense. What would you consider to be a All of this makes me feel like that |
That is a good question. I do think if such thing exists, that it should probably not cover any transitively enqueued tasks.
(You probably meant to wrote "cancelAsync(); free();`?)
Indeed, and it seems that this coordination needs to also become asynchronous no matter what. The issue is that the mechanisms that one would use in regular native code to achieve such a guarantee do not seem to work for web/waitAsync code, but that achieving this guarantee fundamentally seems to turn the detaching/joining/parking code async as well. If I step away from porting existing C/C++ code and think of the canonical "webby" ways to use waitAsync instead via JavaScript, let's say for implementing a main thread producer - N Workers consumers message queue:
Say, for example a multithreaded image decoder could work like this, which seems to be a popular feature for web games. In this kind of scenario, when the work queue is idle, all N Workers have an active pending waitAsync active, waiting for C to become nonzero. Now, if one wanted to tear down this setup from the main thread, what would be the preferred way?
I wonder if we should have a Now, it looks like it is possible to emulate such a setTimeout-style token in JS by creating an auxiliary JS object that tracks "alive" waitAsyncs, and a JS-implemented However, that kind of method could potentially leave hundreds or thousands of such stale waitAsync()s alive in a program, since these types of waits could be performed at a high cadence, maybe even once or more per So it seems that there are two problems even on the native web/JS side operation of |
Good idea about the cancellation tokens, I'll need to think more about that. (Also, TC39 is coming up next week and I unfortunately won't really have time to fully re-engage here until afterwards.) |
Paging this back in. I like the cancellation token idea a lot, but ISTM the Of course, that can't be done synchronously since it'd be tantamount to blocking, so Would that work for you? |
Yikes :/ Yeah, I do follow that argument, but reality is that that would not work at all. I cannot defer calling Not sure I have ideas on what to propose next. Pragmatically I am starting to lean towards not including Atomics.waitAsync based access to synchronization primitives at all in the wasm workers API. Also commented in #177 (comment) , which relates to this somewhat. |
Or actually, are you thinking that the tokens would be integers that one can store to the SAB? Because if the model was that |
Ah indeed, that might work! |
In several scenarios, after having issued a
Atomics.waitAsync(...).value.then((value) => { /* do something */ });
operation, it can happen that some kind of resource/application/page deinitialization type of activity occurs, which deinitializes parts of the Wasm app that contain e.g. the data structures for, say, a lock, semaphore or other multithreading synchronization primitive that resided at that address.How would one ensure that if such application deinitialization does take place before the
waitAsync.then()
handler has been invoked, that the promise handler would never be fired (and/* do something */
should never get executed)?At application level, I can use a separate JavaScript variable
to ensure that
/* do something */
will never erroneously execute in case the wasm app deinitializes in between, but what happens if I want to deinitialize and letmyInt32Array
or other JS variables garbage collect altogether? I.e. the promise callback function(value) => { ... }
will capture a scope, that would surely pin downmyInt32Array
and other JS vars in the scope to remain alive indefinitely (since the.then()
will never resolve)?Is there a way to cancel an async wait and/or guarantee that JS garbage collection will safely occur?
The text was updated successfully, but these errors were encountered: