buffer: port byteLengthUtf8 to JavaScript #18356
Prior to this change, most of the time spent in a `Buffer.byteLength` call went to crossing the JS->C++ boundary. This change moves the function to JavaScript, making it much faster.
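For readers who want to see the idea, here is a minimal sketch of computing a UTF-8 byte length in plain JavaScript with `charCodeAt`. It is not the code from this PR: the function name `utf8ByteLength` is hypothetical, and counting a lone surrogate as 3 bytes (the size of U+FFFD) is an assumption of this sketch.

```javascript
// Hypothetical helper for illustration; not the implementation in this PR.
function utf8ByteLength(string) {
  let len = 0;
  for (let i = 0; i < string.length; i++) {
    // charCodeAt returns a UTF-16 code unit, so 0 <= code <= 0xffff.
    const code = string.charCodeAt(i);
    if (code <= 0x7f) {
      len += 1; // ASCII
    } else if (code <= 0x7ff) {
      len += 2; // two-byte sequence
    } else if (code >= 0xd800 && code <= 0xdbff && i + 1 < string.length) {
      const next = string.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        len += 4; // valid surrogate pair encodes one four-byte code point
        i++;      // skip the low surrogate
      } else {
        len += 3; // assumption: lone high surrogate counted as U+FFFD
      }
    } else {
      len += 3; // rest of the BMP, plus lone surrogates (assumption as above)
    }
  }
  return len;
}

// Compare against the built-in for a well-formed string.
const sample = 'h\u00e9llo \u20ac \u{1F609}';
console.log(utf8ByteLength(sample), Buffer.byteLength(sample, 'utf8'));
```

The loop never leaves JavaScript, which is where the saving on short strings would come from.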
2c35736 to de2287e
Performance boost I assume?
@jasnell forgot to write about it. It is 2x faster on small strings, but slower on large ones. This is the reason for waiting for the V8 update.
If the perf loss on the large ones is significant, perhaps split the difference and use the JS method for short ones and fall back on longer ones?
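A rough sketch of that split, assuming a length cutoff: short strings are measured in JavaScript and longer ones go to the existing native path. The names `SHORT_STRING_LIMIT`, `jsByteLength` and `hybridByteLength` are hypothetical, the cutoff of 64 is a placeholder that would have to come from benchmarks, and `Buffer.byteLength` stands in for the C++ binding here.

```javascript
// Placeholder cutoff; a real value would have to be chosen by benchmarking.
const SHORT_STRING_LIMIT = 64;

// Pure-JS UTF-8 length (illustrative, not the code from this PR).
function jsByteLength(str) {
  let len = 0;
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code <= 0x7f) {
      len += 1;
    } else if (code <= 0x7ff) {
      len += 2;
    } else if (code >= 0xd800 && code <= 0xdbff &&
               (str.charCodeAt(i + 1) & 0xfc00) === 0xdc00) {
      len += 4; // surrogate pair
      i++;
    } else {
      len += 3;
    }
  }
  return len;
}

// Short strings stay in JS; longer ones use the native implementation.
function hybridByteLength(str) {
  return str.length <= SHORT_STRING_LIMIT
    ? jsByteLength(str)
    : Buffer.byteLength(str, 'utf8');
}

console.log(hybridByteLength('short'), hybridByteLength('x'.repeat(1000)));
```

The branch costs one length comparison per call, so it only pays off if the JS path is clearly faster below the cutoff.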
Perhaps, but I'd rather wait for the V8 update. We've been living with the C++ code for a while now.
Added expected V8 version.
// NOTE: 0 <= code <= 0xffff
var code = string.charCodeAt(i);
if (code <= 0x7f) {
  len++;
`len += 1;` for consistency with line 348.
Nitpicking 😉
Looks like this might be ready after #19201 lands.
Hm... it is still slow even on V8 6.6. cc @bmeurer Here's the benchmark that I used: https://gist.github.com/indutny/0fe9a3ed3b93d558a60deacefc37e378 . The results are:
You need to pass
@bmeurer tbh, I don't see a reason for it to be slow. The performance seems to depend on string flattening, which apparently doesn't happen in the JS case. Could it potentially happen at all?
@indutny The reason is that String#charCodeAt() has to dispatch on string type all the time (potentially 8 different types for the "fast case"), whereas the C++ code just accesses the character data. We could try to make TurboFan a bit smarter here if we find a way to do the flattening (which is indeed happening here) in the peeled loop iteration only and then have only sequential one-byte string access inside the actual loop. But that's work that needs to be done, and I'm a bit worried that we would over-optimize for this particular example.
BTW V8 tip-of-tree should be faster already thanks to the recent work by @sigurdschneider.
Not saying it's not worth it, but it's definitely not high priority. Did you check with tip-of-tree? And
@bmeurer I checked
seems ok (did not check the math in detail)… might also compare to http://source.icu-project.org/repos/icu/trunk/icu4c/source/common/unicode/utf8.h which has been highly optimized
Should this stay open? It seems like the JS implementation cannot get as fast as the C++ implementation. Ping @indutny
Closing as this does not seem to be the right approach in JS. @indutny please reopen in case you disagree and want to continue working on this.
I think we could potentially apply it to short strings. Anyone up to do some benchmarks?
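As a starting point for such benchmarks, here is a hedged micro-benchmark sketch that times a pure-JS length loop against `Buffer.byteLength` over a few string sizes. Everything in it is an assumption for illustration: the helper names `timeNs`, `jsByteLength` and `compare`, the sizes, and the iteration count. It uses ASCII-only input, and the numbers will vary with the V8 version and with warm-up.

```javascript
// Time `fn(input)` over `iterations` calls and return the mean in nanoseconds.
function timeNs(fn, input, iterations) {
  let sink = 0; // accumulate results so the calls are not trivially dead code
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) sink += fn(input);
  const elapsed = process.hrtime.bigint() - start;
  return { nsPerCall: Number(elapsed) / iterations, sink };
}

// Pure-JS UTF-8 length (illustrative, not the code from this PR).
function jsByteLength(str) {
  let len = 0;
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code <= 0x7f) {
      len += 1;
    } else if (code <= 0x7ff) {
      len += 2;
    } else if (code >= 0xd800 && code <= 0xdbff &&
               (str.charCodeAt(i + 1) & 0xfc00) === 0xdc00) {
      len += 4;
      i++;
    } else {
      len += 3;
    }
  }
  return len;
}

// One row per string size, with mean time per call for each implementation.
function compare(sizes, iterations) {
  return sizes.map((size) => {
    const input = 'a'.repeat(size);
    return {
      size,
      js: timeNs(jsByteLength, input, iterations).nsPerCall,
      native: timeNs((s) => Buffer.byteLength(s, 'utf8'), input, iterations).nsPerCall,
    };
  });
}

console.table(compare([8, 64, 1024, 16384], 10000));
```

The size at which the `native` column starts to win would be a candidate cutoff for a short-string fast path.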
Please don't land this PR yet. There is a missing V8 optimization that is going to be available soon in V8 6.6.
Prior to this change, most of the time spent in a `Buffer.byteLength` call went to crossing the JS->C++ boundary. This change moves the function to JavaScript, making it much faster.
Checklist
`make -j4 test` (UNIX), or `vcbuild test` (Windows) passes
Affected core subsystem(s)
buffer