-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support TextDecoder in pthreads mode #14399
Conversation
// in the generated code. The minimal runtime logic here actually runs the | ||
// library code at compile time (as a way to create a library*.js file around | ||
// non-library JS), and so we must define it here as well. | ||
var TextDecoderWrapper = TextDecoder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really grok this.. but not a blocker on landing :)
tests/benchmark_utf8.cpp
Outdated
// Create strings of lengths 1-32, because the internals of text decoding | ||
// have a cutoff of 16 for when to use TextDecoder, and we wish to test both | ||
// (see UTF8ArrayToString). | ||
char *str = randomString((rand() % 32) + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid using rand() in tests? How about i % 32
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already another rand()
usage in this file, on line 40. There is an srand
so it should be deterministic. Is that ok with you? If not I can look into a larger refactor here for both of them (i
is not accessible in the other place, so it's not trivial).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I did notice the other one... I don't think we need to fix that now, but can we just avoid the introduction of another one.. since its easy to use i here?
// character. | ||
function TextDecoderWrapper(encoding) { | ||
var textDecoder = new TextDecoder(encoding); | ||
this.decode = function(data) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that ideally we should use functions on the prototype instead of creating one per instance, but it's only possible with ES6 classes and I guess we currently expect output to be ES5-compatible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. Also, we just create up to 2 instances of this.
#if ASSERTIONS | ||
assert(data instanceof Uint8Array); | ||
#endif | ||
if (data.buffer instanceof SharedArrayBuffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, don't we know statically that this wrapper is only used on shared memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, you may be right... I thought there was some possibility, but I think it was some mode we removed in the past.
Changed to assert on it, good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this can be called on side buffers too, like filesystem buffers. Reverted the assertion and added a comment.
This change caused |
The test has some loops that try to remove certain utf8 characters from the end (whitespace? I don't know enough utf8). The loops were missing a check on the string size not being 0. This bug existed before #14399, but before that PR we always used strings of size 8, and apparently were lucky enough to not run into a run of 8 characters that we want to remove. With that PR, the bug unsurfaced as we try various lengths, even 1. The asan test suite will be green again after this.
Asan error has been fixed in #14428 |
Converting large strings from linear memory to JS is a lot faster with TextDecoder,
but that does not work on SharedArrayBuffers:
whatwg/encoding#172
So we avoid using TextDecoder then, and fall back to the path that creates
a string one character at a time. That path can be quite pathological, however,
incurring quadratic times in the worst case. Instead, with this PR we still use
TextDecoder, by copying the data to a normal ArrayBuffer first. The extra copy
adds some cost, but it is at least linear and predictable, and benchmarks show
it is much faster on large strings.