Make decode() and encodeInto() accept SharedArrayBuffer-backed views #172
Comments
Note that this style of (ArrayBuffer, startIndex, length) is exactly a typed array, and the consensus has been to use typed arrays for this API design in the past. See #69 and https://esdiscuss.org/topic/idiomatic-representation-of-buffer-bytesread . So that aspect of your before/after code samples would stay the same; you'd still use subarray() with any new APIs we add.

Separately, I don't understand how you find the null byte in your second code example. Are you suggesting decodeRange treat null bytes specially?
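For illustration only, here is how a (buffer, startIndex, length) triple maps onto a typed-array view with today's decode(); the variable names are made up for the example:

```js
// Equivalent ways to hand decode() a byte range: a fresh view over the
// buffer, or a subarray() of an existing heap view. Both describe
// (buffer, startIndex, length) without copying the bytes.
const decoder = new TextDecoder();
const heap = new Uint8Array(64);
heap.set([104, 101, 108, 108, 111], 8); // "hello" at offset 8

const viaView = decoder.decode(new Uint8Array(heap.buffer, 8, 5));
const viaSubarray = decoder.decode(heap.subarray(8, 8 + 5));
console.log(viaView === viaSubarray, viaView); // true "hello"
```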
An option to stop decoding on null was discussed very early on in the API design. I think we rejected it due to a lack of use cases at the time. We'd need to make sure the decoding algorithms play nicely with it, and that there is no ambiguity about a null in the middle of a multibyte sequence. (I'd assume this is designed into the encodings, but I don't actually know.)
For SAB compatibility: is that as simple as adding [AllowShared] to decode?

Regarding the garbage created by temporary views, avoiding it is a distinct goal of the Host Bindings proposal (explainer quite outdated; refresh coming soon), which we've re-started focusing on recently, including others from the Google Emscripten team like @jgravelle-google. Basically, wasm should be able to pass two integers (pointer and length) without materializing a temporary view.

For the null-terminating use case: yes, I can definitely see the utility in specifying a terminating code point (like null) to avoid an otherwise superfluous scan through the string. Maybe we can focus on that use case?
Welcome @juj and others. https://whatwg.org/working-mode and https://whatwg.org/faq#adding-new-features might help provide some additional context, though I think we have enough to move forward here.

For SAB, is the copy that https://encoding.spec.whatwg.org/#dom-textdecoder-decode makes acceptable? If not, we'd somehow have to make decoding robust against the input changing, right? It's not clear that's desirable. Either way, that would require a fair number of tests.

For scanning, we could add something to https://encoding.spec.whatwg.org/#textdecodeoptions API-wise. I'm not sure encoding libraries expose such primitives, though; if not, they could be added, I suppose. cc @hsivonen
On the first look, being able to use
For non-UTF-8 encodings, I'm definitely not going to be baking optional zero termination into the conversion loop, so there'd be a DOM-side prescan if the API got an option to look for a zero terminator.

For UTF-8, I'm very reluctant to add yet another UTF-8 to UTF-16 converter that has zero termination baked into the conversion loop, but it might be possible to persuade me to add a different UTF-8 to UTF-16 converter that interleaves zero termination into the conversion if there's data to show that it would be a big win for Firefox's wasm story for wasm code that uses C strings. It won't be baked in as a flag on the primary UTF-8 to UTF-16 converter, though. (For both SSE2 and aarch64, checking a 16-byte vector for ASCII is one vector instruction followed by one ALU comparison that can be branched on. Adding a check for a zero byte in the vector would complicate the hottest loop, which works fine for C++ and Rust callers.)

I think C in wasm should run
UTF-16BE and UTF-16LE are not compatible with C strings. The other encodings defined in the Encoding Standard are.
So basically all DOM code operating on shared memory is affected, then?

Are we really relying on the compilers we use to write DOM-side code not optimizing too hard on the assumption that memory doesn't change from underneath us, in cases where it would be UB for it to change underneath us?
Yes and no. Currently our DOM code can't extract the buffer pointer without extracting a sharedness flag at the same time, and in some(*) cases where it does that, we assert that the sharedness flag is false and don't touch the memory if it is shared. We are aware that if it did touch that memory it would be UB.

The plan is to remedy this by using UB-safe primitives for accessing shared memory; see https://bugzilla.mozilla.org/show_bug.cgi?id=1225033. In this case, DOM would have (as the JS engine already has) two code paths, one for known-unshared memory and one for possibly-shared memory, where in non-perf-critical cases we can just use the latter. Every DOM API that can handle shared memory will have to be adapted to a greater or lesser extent to deal with that reality. Also see the long comment in dom/bindings/TypedArray.h.

(*) When the SAB code landed I believe I handled all the cases in DOM at the time by either disallowing shared memory (erroring out), by using UB-prone code, or by pretending the memory is zero-length, but I may have missed some spots and/or DOM may have grown in the meantime and/or code I wrote may have since been rewritten.

(Edited for clarity)
OK, so from C++ a safely-racy load is a call through a function pointer to a JIT-generated function that the C++ compiler can never see into. I wonder if just using the safely-racy memcpy would be good enough.

Is there any plan to provide a Rust crate that provides access to these operations?

Is there any plan to provide safely-racy loads and stores of 128-bit SIMD vectors?

Is there a reliable way (inline asm with the right amount of
Almost certainly. The safely-racy memcpy isn't as fast as a native memcpy but we can improve it to approach native speed; that is not true for a call-per-byte.
There have been no plans for this since plans are driven by need and with SAB being disabled the need hasn't arisen, but I don't think there's any reason in principle we could not do so.
I don't know, and until we reached the one-compiler-to-rule-them-all reality that we have now, it wasn't really an interesting question, as we have had other compilers, and indeed compilers that don't do inline asm at all. My gut feeling was that it is easier to trust the C++ compiler with a call to a function that it couldn't possibly know anything about than with inline asm that it might try to figure out the meaning of. Patches welcome, along with proofs that they are adequate to avoid UB :)
I don't think I know enough about Rust to have an opinion. Even in the case of C++ it's not completely certain that we had to go as far as we have with the UB-safe primitives, but we now have every reason to believe that they are safe and that UB will not bite us (when they are used properly). My gut feeling is that Rust will have exactly the same problems as C++, especially given that rustc uses LLVM.
If the most obvious path to a for-sure non-UB implementation is to make the DOM code use a presently-slower-than-memcpy
Thinking about this more, being able to use C++11/C11 atomic operations for this would be nice (and Rust-compatible). |
C++11/C11 atomic operations are not usable, I believe: they assume race-free programs. This also means they may be optimized in ways that run counter to the JS/wasm memory model. gcc is quite explicit about this in its documentation, saying that its __atomic intrinsics may not execute fences if gcc thinks they're not required. (This is getting us into the weeds; probably better handled in email.)
Off-issue discussion and investigation has convinced me that even if the first round of host implementation wasn't any faster than having the site-provided code make a copy, there's enough opportunity for subsequent host-side optimization for this spec change to make sense.
I think from a specification perspective the only thing that's needed here is adding [AllowShared]. I suspect we might want to add a note about the races that shared memory introduces, and I'd appreciate help drafting that. (And we'll need tests.)

@juj, if null-terminated strings continue to be a compelling enough problem to address with built-in functionality, could you please file a separate issue on that? I think it's largely separable from the shared memory use cases and I'd like not to do too much in one place.
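As a rough illustration of what the [AllowShared] change permits in engines that ship it (assuming SharedArrayBuffer is available in the environment), along with the kind of race such a note would warn about:

```js
// With [AllowShared] on the input, decode() accepts views over shared memory.
// Another thread may write to the same bytes while decoding runs, so the
// result reflects whatever bytes the decoder happened to observe.
const sab = new SharedArrayBuffer(5);
const view = new Uint8Array(sab);
view.set([104, 101, 108, 108, 111]); // "hello"

const decoder = new TextDecoder();
console.log(decoder.decode(view)); // "hello" (engines without the change throw here)
```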
PR for this is at #182. Still needs web-platform-tests coverage. Anyone willing to volunteer?
@AnneK can you post a link to the existing tests?
Heya, https://github.com/web-platform-tests/wpt/tree/master/encoding has the tests. https://web-platform-tests.org/writing-tests/testharness.html#auto-generated-test-boilerplate and https://web-platform-tests.org/writing-tests/testharness-api.html are the relevant parts of the documentation for these tests. I think for these tests we mainly want to ensure that decode() and encodeInto() accept SharedArrayBuffer-backed views and give the same results as with ordinary ArrayBuffer-backed views.
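A hedged sketch of the kind of testharness.js test this could translate into; the file name and assertions are illustrative, not the tests that eventually landed:

```js
// textdecoder-shared.any.js (illustrative sketch)
// Checks that decode() gives the same answer for shared and non-shared input.
test(() => {
  const bytes = [104, 105]; // "hi"
  const plain = new Uint8Array(bytes);
  const shared = new Uint8Array(new SharedArrayBuffer(bytes.length));
  shared.set(bytes);
  const decoder = new TextDecoder();
  assert_equals(decoder.decode(shared), decoder.decode(plain));
}, "TextDecoder.decode() accepts SharedArrayBuffer-backed views");
```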
To make sure I'm going in the right direction,
That looks like a great start to me. I generally prefer abstracting/templating as you have done there to ensure coverage stays consistent over time.
I've opened a PR: I've skipped the
Tests: web-platform-tests/wpt#19531. Fixes #172.
It might take a while for the standard to be updated with this change due to speced/bikeshed#1537. Tests have landed, though, and implementation bugs have been filed as well (linked from #182).
It's updated now (search for AllowShared).
It is nice that AllowShared was added to the TextDecoder spec; however, the issue that I posted in the original comment remains: by changing the existing function, there is no good way to discover whether a browser supports the updated form.
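For illustration, one way to probe for support, assuming the only signal is whether decode() throws on a shared view (a sketch, not the snippet from the original comment):

```js
// Returns true if decode() accepts SharedArrayBuffer-backed views.
// Engines without the change throw a TypeError instead.
function supportsSharedDecode() {
  try {
    new TextDecoder().decode(new Uint8Array(new SharedArrayBuffer(1)));
    return true;
  } catch (e) {
    return false;
  }
}
```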
The change has fixed point 1 of the three points in the original comment, but points 2 and 3 (the generated garbage, and the need to manually scan a string in O(n) time to find its null termination point before being able to call .decode()) still persist in Wasm applications. Would it be possible to revisit and fix the remaining problems 2 and 3?
Points 2 and 3 were discussed above. Point 2 can be optimized by engines, and we'd have to see some compelling evidence to overturn that decision. See #172 (comment) for point 3.
Converting large strings from linear memory to JS is a lot faster with TextDecoder, but that does not work on SharedArrayBuffers: whatwg/encoding#172. So we avoid using TextDecoder then, and fall back to the path that creates a string one character at a time. That path can be quite pathological, however, incurring quadratic time in the worst case. Instead, with this PR we still use TextDecoder, by copying the data to a normal ArrayBuffer first. The extra copy adds some cost, but it is at least linear and predictable, and benchmarks show it is much faster on large strings.
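The approach that PR describes boils down to something like the following simplified sketch (the real Emscripten code has more details; names here are illustrative):

```js
const decoder = new TextDecoder();

// Decode [start, end) from a heap view that may be backed by a
// SharedArrayBuffer: copy the bytes into a normal ArrayBuffer first,
// then let TextDecoder do the conversion. The copy is O(n) but avoids
// the pathological character-by-character fallback.
function decodeFromHeap(heapU8, start, end) {
  let bytes = heapU8.subarray(start, end);
  if (typeof SharedArrayBuffer !== "undefined" &&
      heapU8.buffer instanceof SharedArrayBuffer) {
    bytes = new Uint8Array(bytes); // copies into a non-shared buffer
  }
  return decoder.decode(bytes);
}
```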
It looks like neither Firefox nor Chrome implements this yet. For example, Firefox throws an error when decode() is given a SharedArrayBuffer-backed view.
Node.js v14.15.5 does implement this, and it allows decoding text from shared views. |
It's tracked in https://bugzilla.mozilla.org/show_bug.cgi?id=1561594 and might have fallen off the radar. @hsivonen, would it be a lot of work to tackle that?
The Chromium bug is https://crbug.com/1012656 (currently unassigned). |
Should be quite easy using |
As far as I can tell, Node.js's implementation of both decode() and encodeInto() accepts SharedArrayBuffer-backed views. Deno doesn't support them.
I see this is still the case, but for what it's worth, this is all it takes in my code to convert: I just use an intermediate copy into a non-shared Uint8Array before decoding.
In conjunction with WebAssembly and multithreading, there is a need for a new TextDecoder.decode()-style API for converting e.g. UTF-8 encoded strings in a typed array to a JavaScript string.

Currently, to convert a string in the WebAssembly heap to a JS string, one can do something along the lines of the sketch below.
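A minimal sketch of that current pattern, where wasmHeap is assumed to be a Uint8Array view over the wasm memory and ptr points at a null-terminated UTF-8 string (the names are illustrative):

```js
const textDecoder = new TextDecoder();

function wasmCStringToJs(wasmHeap, ptr) {
  // Scan for the null terminator to learn the string's length...
  let end = ptr;
  while (wasmHeap[end] !== 0) ++end;
  // ...then decode a temporary subarray view covering just those bytes.
  return textDecoder.decode(wasmHeap.subarray(ptr, end));
}
```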
There are three shortcomings with this API that are bugging Emscripten/asm.js/WebAssembly use cases:

1. TextDecoder.decode() does not work with SharedArrayBuffer, so the above fails if wasmHeap views a SharedArrayBuffer in a multithreaded WebAssembly program.

2. TextDecoder.decode() takes a typed array view and always converts the whole view. As a result, one has to call wasmHeap.subarray() on the large wasm heap to generate a small view that only encompasses the portion of memory that contains the string. This generates temporary garbage that would be needless with a more appropriate API.

3. The semantics of TextDecoder.decode() are to always convert the whole input view that is passed to the function. This means that if there is a null byte \0 in the middle of the view, the generated JS string will have a null UTF-16 code point in the middle; decoding does not stop at the first null byte, but continues past it. As a consequence, to use the API from a WebAssembly program that deals with null-terminated C strings, JavaScript or wasm code must first scan the whole string to find the first null byte, which is harmful for performance when dealing with long strings. It would be better to have an API where the specified decode size is a maxBytesToRead style of size instead of an exact size, so that JS/WebAssembly would not need to pre-scan through each string to find out how long it actually is.

This kind of code often occurs in compiled C programs, which already provide maximum sizes for their buffers to guard against buffer overflows in C code. Having to do an O(n) scan just to figure out a buffer overflow guard bound would not be ideal.
It would be good to have a new function on TextDecoder, e.g. a decodeRange() along the lines of the sketch below, which would allow reading from SharedArrayBuffers, take a startIdx into the array, and optionally the maximum number of elements to read.

This is parallel to what was done in WebGL 2 with the advent of WebAssembly and multithreading: all entry points in WebGL 2 dealing with typed arrays gained a new variant that takes SharedArrayBuffers and does not produce temporary garbage: https://www.khronos.org/registry/webgl/specs/latest/2.0/#3.7
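The exact shape of the proposed function is not specified anywhere, so the following is only an illustration of the semantics being asked for, expressed as a helper on top of today's decode(); a real decodeRange() would avoid both the scan and the copy:

```js
// Hypothetical decodeRange(heap, startIdx, maxBytesToRead) semantics:
// decode at most maxBytesToRead bytes starting at startIdx, stopping early
// at a null byte, and accept views backed by a SharedArrayBuffer.
function decodeRange(decoder, heap, startIdx, maxBytesToRead) {
  const limit = Math.min(heap.length, startIdx + maxBytesToRead);
  let end = startIdx;
  while (end < limit && heap[end] !== 0) ++end;
  let bytes = heap.subarray(startIdx, end);
  if (typeof SharedArrayBuffer !== "undefined" &&
      heap.buffer instanceof SharedArrayBuffer) {
    bytes = new Uint8Array(bytes); // copy out of shared memory for today's decode()
  }
  return decoder.decode(bytes);
}

// Call-site shape the issue is after, e.g. for a C string in the wasm heap:
// const str = decodeRange(textDecoder, wasmHeap, ptr, maxBytesToRead);
```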
Also, in the case of WebGL, all API entry points were retroactively re-specced to allow SharedArrayBuffers and shared views in addition to regular typed arrays and views. That would be nice to happen with decode() as well, although if so, it should probably happen at exactly the same time as a new decodeRange() function is added, so that code can feature test via the presence of decodeRange() whether the old decode() function has been improved or not.

With such a new decodeRange() function, the JS code at the very top would transform into a single decodeRange() call in the spirit of the sketch above, which would work with multithreading, improve performance, and be smaller in code size.
The reason why this is somewhat important is that with Emscripten and WebAssembly, marshalling strings across the wasm/JS language barrier is really common; it is something most Emscripten-compiled applications do. In the absence of multithreading-capable text marshalling, applications that need string marshalling have to resort to a manual JS-side implementation that loops and appends String.fromCharCode() output character by character: https://github.com/emscripten-core/emscripten/blob/c2b3c49f71ab98fbd9ff829d6cbd30445b56a93e/src/runtime_strings.js#L98

It would be good for that code to be able to go away.
CC @kripken, @dschuff, @lars-t-hansen, @binji, @lukewagner, @titzer, @bnjbvr, @aheejin, who have been working on WebAssembly multithreading.