Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support TextDecoder in pthreads mode #14399

Merged
merged 10 commits into from
Jun 9, 2021
Merged

Support TextDecoder in pthreads mode #14399

merged 10 commits into from
Jun 9, 2021

Conversation

kripken
Copy link
Member

@kripken kripken commented Jun 7, 2021

Converting large strings from linear memory to JS is a lot faster with TextDecoder,
but that does not work on SharedArrayBuffers:

whatwg/encoding#172

So we avoid using TextDecoder then, and fall back to the path that creates
a string one character at a time. That path can be quite pathological, however,
incurring quadratic times in the worst case. Instead, with this PR we still use
TextDecoder, by copying the data to a normal ArrayBuffer first. The extra copy
adds some cost, but it is at least linear and predictable, and benchmarks show
it is much faster on large strings.

@kripken kripken requested review from juj and sbc100 June 7, 2021 21:01
// in the generated code. The minimal runtime logic here actually runs the
// library code at compile time (as a way to create a library*.js file around
// non-library JS), and so we must define it here as well.
var TextDecoderWrapper = TextDecoder;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really grok this.. but not a blocker on landing :)

// Create strings of lengths 1-32, because the internals of text decoding
// have a cutoff of 16 for when to use TextDecoder, and we wish to test both
// (see UTF8ArrayToString).
char *str = randomString((rand() % 32) + 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid using rand() in tests? How about i % 32 instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already another rand() usage in this file, on line 40. There is an srand so it should be deterministic. Is that ok with you? If not I can look into a larger refactor here for both of them (i is not accessible in the other place, so it's not trivial).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I did notice the other one... I don't think we need to fix that now, but can we just avoid the introduction of another one.. since its easy to use i here?

// character.
function TextDecoderWrapper(encoding) {
var textDecoder = new TextDecoder(encoding);
this.decode = function(data) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that ideally we should use functions on the prototype instead of creating one per instance, but it's only possible with ES6 classes and I guess we currently expect output to be ES5-compatible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Also, we just create up to 2 instances of this.

#if ASSERTIONS
assert(data instanceof Uint8Array);
#endif
if (data.buffer instanceof SharedArrayBuffer) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, don't we know statically that this wrapper is only used on shared memory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you may be right... I thought there was some possibility, but I think it was some mode we removed in the past.

Changed to assert on it, good idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this can be called on side buffers too, like filesystem buffers. Reverted the assertion and added a comment.

@kripken kripken merged commit 9af077b into main Jun 9, 2021
@kripken kripken deleted the pth-td branch June 9, 2021 22:19
@sbc100
Copy link
Collaborator

sbc100 commented Jun 10, 2021

This change caused asan.test_utf8_textdecoder to start failing : https://ci.chromium.org/ui/p/emscripten-releases/builders/ci/linux-test-suites/b8844807836878211441/overview

kripken added a commit that referenced this pull request Jun 10, 2021
The test has some loops that try to remove certain utf8 characters
from the end (whitespace? I don't know enough utf8). The loops were
missing a check on the string size not being 0.

This bug existed before #14399, but before that PR we always used
strings of size 8, and apparently were lucky enough to not run into
a run of 8 characters that we want to remove. With that PR, the bug
unsurfaced as we try various lengths, even 1.

The asan test suite will be green again after this.
@kripken
Copy link
Member Author

kripken commented Jun 10, 2021

Asan error has been fixed in #14428

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants