Support TextDecoder in pthreads mode #14399

kripken · 2021-06-07T21:01:51Z

Converting large strings from linear memory to JS is a lot faster with TextDecoder,
but that does not work on SharedArrayBuffers:

whatwg/encoding#172

So in pthreads mode we currently avoid TextDecoder and fall back to the path that
builds the string one character at a time. That path can be quite pathological,
however, taking quadratic time in the worst case. With this PR we still use
TextDecoder, by first copying the data to a normal (non-shared) ArrayBuffer. The
extra copy adds some cost, but it is linear and predictable, and benchmarks show
it is much faster on large strings.
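A minimal sketch of the copy-then-decode idea. It assumes a `Uint8Array` view `heap` over a `SharedArrayBuffer` and a byte range `[ptr, ptr + len)`; the helper name `decodeFromSharedHeap` is hypothetical, and this is not the code the PR landed. Constructing a new `Uint8Array` from an existing view copies its bytes into a fresh, non-shared `ArrayBuffer`, which `TextDecoder` accepts.

```javascript
// Hypothetical helper: decode UTF-8 bytes that live in shared memory.
// TextDecoder.decode() may reject views onto a SharedArrayBuffer
// (whatwg/encoding#172), so copy the bytes into a regular ArrayBuffer first.
var utf8Decoder = new TextDecoder('utf-8');

function decodeFromSharedHeap(heap, ptr, len) {
  // subarray() makes a view without copying; new Uint8Array(view)
  // allocates a fresh, non-shared ArrayBuffer and copies the bytes into it.
  var copy = new Uint8Array(heap.subarray(ptr, ptr + len));
  return utf8Decoder.decode(copy);
}

// Usage: write some UTF-8 into a shared "heap" and read it back.
var heap = new Uint8Array(new SharedArrayBuffer(64));
var bytes = new TextEncoder().encode('héllo, wörld');
heap.set(bytes, 8);
console.log(decodeFromSharedHeap(heap, 8, bytes.length)); // héllo, wörld
```

The copy costs one linear pass over the bytes, which is the trade-off described above.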

sbc100 · 2021-06-08T18:15:05Z

src/library_strings.js

+// in the generated code. The minimal runtime logic here actually runs the
+// library code at compile time (as a way to create a library*.js file around
+// non-library JS), and so we must define it here as well.
+var TextDecoderWrapper = TextDecoder;


I don't really grok this... but not a blocker on landing :)

sbc100 · 2021-06-08T18:15:42Z

tests/benchmark_utf8.cpp

+    // Create strings of lengths 1-32, because the internals of text decoding
+    // have a cutoff of 16 for when to use TextDecoder, and we wish to test both
+    // (see UTF8ArrayToString).
+    char *str = randomString((rand() % 32) + 1);


Can we avoid using rand() in tests? How about i % 32 instead?

There is already another rand() usage in this file, on line 40. There is an srand so it should be deterministic. Is that ok with you? If not I can look into a larger refactor here for both of them (i is not accessible in the other place, so it's not trivial).

Yeah, I did notice the other one... I don't think we need to fix that now, but can we just avoid introducing another one, since it's easy to use i here?
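The benchmark comment refers to a length cutoff inside `UTF8ArrayToString`. Below is a simplified sketch of that kind of dispatch, assuming a cutoff of 16 bytes as the comment states. `utf8ToStringSketch` and `decodeByHand` are hypothetical names, the hand-written loop assumes well-formed UTF-8, and this is not Emscripten's actual implementation. Lengths are picked deterministically as `(i % 32) + 1`, in the spirit of the review suggestion, so both paths run without `rand()`.

```javascript
var TEXT_DECODER_CUTOFF = 16; // assumed cutoff, per the benchmark comment
var decoder = new TextDecoder('utf-8');

// Builds the string one code point at a time (assumes well-formed UTF-8).
function decodeByHand(bytes) {
  var str = '';
  var i = 0;
  while (i < bytes.length) {
    var c = bytes[i++];
    if (c >= 0x80) {
      if ((c & 0xe0) === 0xc0) { // 2-byte sequence
        c = ((c & 0x1f) << 6) | (bytes[i++] & 0x3f);
      } else if ((c & 0xf0) === 0xe0) { // 3-byte sequence
        c = ((c & 0x0f) << 12) | ((bytes[i++] & 0x3f) << 6) | (bytes[i++] & 0x3f);
      } else { // 4-byte sequence
        c = ((c & 0x07) << 18) | ((bytes[i++] & 0x3f) << 12) |
            ((bytes[i++] & 0x3f) << 6) | (bytes[i++] & 0x3f);
      }
    }
    if (c < 0x10000) {
      str += String.fromCharCode(c);
    } else {
      // Code points above U+FFFF become a UTF-16 surrogate pair.
      var v = c - 0x10000;
      str += String.fromCharCode(0xd800 | (v >> 10), 0xdc00 | (v & 0x3ff));
    }
  }
  return str;
}

function utf8ToStringSketch(bytes) {
  // Short inputs use the per-character loop (assumed rationale: TextDecoder
  // has a fixed per-call overhead); longer inputs go to TextDecoder.
  if (bytes.length <= TEXT_DECODER_CUTOFF) return decodeByHand(bytes);
  return decoder.decode(bytes);
}

// Deterministic lengths 1..32, so both sides of the cutoff are exercised.
var shortPath = 0, longPath = 0;
for (var i = 0; i < 64; i++) {
  var len = (i % 32) + 1;
  var input = new TextEncoder().encode('a'.repeat(len));
  if (input.length <= TEXT_DECODER_CUTOFF) shortPath++; else longPath++;
  console.assert(utf8ToStringSketch(input) === 'a'.repeat(len));
}
console.log(shortPath, longPath); // 32 32
```

Note that the per-character loop appends to `str` repeatedly; that is the path the PR description says can degrade badly on large strings.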

RReverser · 2021-06-09T11:36:36Z

src/runtime_strings.js

+// character.
+function TextDecoderWrapper(encoding) {
+  var textDecoder = new TextDecoder(encoding);
+  this.decode = function(data) {


Note that ideally we should use functions on the prototype instead of creating one per instance, but it's only possible with ES6 classes and I guess we currently expect output to be ES5-compatible?

Correct. Also, we just create up to 2 instances of this.
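For comparison, a sketch of the two styles in ES5 syntax; `ClosureWrapper` and `ProtoWrapper` are hypothetical names, not code from this PR. Attaching `decode` to the constructor's `prototype` is also valid ES5 and shares one function across instances, at the cost of storing the decoder on an ordinary property instead of in a closure. With at most two instances, the per-instance closure costs very little.

```javascript
// Closure style: each instance gets its own decode function, and the
// underlying TextDecoder stays private to the closure.
function ClosureWrapper(encoding) {
  var textDecoder = new TextDecoder(encoding);
  this.decode = function (data) {
    return textDecoder.decode(data);
  };
}

// Prototype style (also ES5): one shared decode function, with the
// decoder stored on a property of the instance.
function ProtoWrapper(encoding) {
  this.textDecoder = new TextDecoder(encoding);
}
ProtoWrapper.prototype.decode = function (data) {
  return this.textDecoder.decode(data);
};

var a = new ClosureWrapper('utf-8');
var b = new ClosureWrapper('utf-8');
var c = new ProtoWrapper('utf-8');
var d = new ProtoWrapper('utf-8');
console.log(a.decode === b.decode); // false: one function per instance
console.log(c.decode === d.decode); // true: shared via the prototype
```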


RReverser · 2021-06-09T17:46:34Z

src/runtime_strings.js

+#if ASSERTIONS
+    assert(data instanceof Uint8Array);
+#endif
+    if (data.buffer instanceof SharedArrayBuffer) {


Btw, don't we know statically that this wrapper is only used on shared memory?

Hmm, you may be right... I thought there was some possibility, but I think it was some mode we removed in the past.

Changed to assert on it, good idea.

Ah, this can be called on side buffers too, like filesystem buffers. Reverted the assertion and added a comment.
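Because the wrapper can receive both heap views and side buffers such as filesystem buffers, the check has to happen at runtime. A sketch of that check, using a hypothetical helper name `toDecodable`: copy only when the view is backed by a `SharedArrayBuffer`, and pass other views through untouched. The `typeof` guard is an extra assumption for environments where `SharedArrayBuffer` is not defined, since a bare `instanceof SharedArrayBuffer` would throw a `ReferenceError` there.

```javascript
// Returns a view TextDecoder can accept: shared-memory views are copied into
// a fresh non-shared ArrayBuffer; other views are returned unchanged.
function toDecodable(data) {
  var isShared = typeof SharedArrayBuffer !== 'undefined' &&
                 data.buffer instanceof SharedArrayBuffer;
  return isShared ? new Uint8Array(data) : data;
}

var sharedView = new Uint8Array(new SharedArrayBuffer(4));
var plainView = new Uint8Array([0x6f, 0x6b]); // stands in for a side buffer
console.log(toDecodable(sharedView) === sharedView); // false: copied
console.log(toDecodable(plainView) === plainView);   // true: no copy
console.log(new TextDecoder('utf-8').decode(toDecodable(plainView))); // prints "ok"
```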


sbc100 · 2021-06-10T15:53:22Z

This change caused asan.test_utf8_textdecoder to start failing: https://ci.chromium.org/ui/p/emscripten-releases/builders/ci/linux-test-suites/b8844807836878211441/overview

The test has some loops that try to remove certain UTF-8 characters from the end (whitespace? I don't know UTF-8 well enough). The loops were missing a check that the string size is not 0. This bug existed before #14399, but before that PR we always used strings of size 8, and were apparently lucky enough never to hit a run of 8 characters that we want to remove. With that PR the bug surfaced, since we now try various lengths, even 1. The asan test suite will be green again after this.

kripken · 2021-06-10T17:56:19Z

Asan error has been fixed in #14428
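The fix amounts to checking the length before looking at the last element. A sketch of such a trailing-trim loop, written in JavaScript like the runtime; `trimTrailing` and the `isSpace` predicate are hypothetical, since the actual test is C++ and which characters it removes is not stated here.

```javascript
// Removes trailing bytes that match shouldRemove. The `len > 0` check is the
// kind of guard the test was missing: without it, an input made entirely of
// removable bytes leads to inspecting index -1. In JS that is just undefined,
// but in C++ the equivalent read is out of bounds, which ASan reports.
function trimTrailing(bytes, shouldRemove) {
  var len = bytes.length;
  while (len > 0 && shouldRemove(bytes[len - 1])) {
    len--;
  }
  return bytes.subarray(0, len);
}

function isSpace(b) { return b === 0x20; } // hypothetical predicate
console.log(trimTrailing(new Uint8Array([0x61, 0x20, 0x20]), isSpace).length); // 1
console.log(trimTrailing(new Uint8Array([0x20, 0x20]), isSpace).length); // 0
```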

kripken added 4 commits June 7, 2021 11:50

wor

f82d090

fix

7fe8c59

better

8e5fc8c

comment [ci skip]

7b99cae

kripken requested review from juj and sbc100 June 7, 2021 21:01

sbc100 reviewed Jun 8, 2021


sbc100 approved these changes Jun 8, 2021


sbc100 reviewed Jun 8, 2021


RReverser reviewed Jun 9, 2021



kripken added 2 commits June 9, 2021 10:12

add assert

0f76328

Merge remote-tracking branch 'origin/main' into pth-td

ca3c156

RReverser reviewed Jun 9, 2021


kripken and others added 4 commits June 9, 2021 10:53

Avoid adding rand() usage

b4c9645

assert statically on SAB usage

6e80c1e

Revert "assert statically on SAB usage"

78424fb

This reverts commit 6e80c1e.

comment

3fe0d97

kripken merged commit 9af077b into main Jun 9, 2021

kripken deleted the pth-td branch June 9, 2021 22:19

sbc100 mentioned this pull request Jun 10, 2021

2.0.24 emscripten-core/emsdk#839

Merged

kripken mentioned this pull request Jun 10, 2021

Fix asan.test_utf8_textdecoder #14428

Merged
