buffer: port `byteLengthUtf8` to JavaScript #18356

indutny · 2018-01-24T19:24:36Z

Please don't land this PR yet. It depends on a V8 optimization that is still missing and is expected
to become available soon in V8 6.6.

Prior to this change, the majority of the time spent when
calling `Buffer.byteLength` went to crossing the JS->C++ boundary. This
change moves the function to JavaScript, making it much faster.

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

Affected core subsystem(s)

buffer

jasnell · 2018-01-24T22:08:36Z

Performance boost I assume?

indutny · 2018-01-24T22:11:36Z

@jasnell I forgot to write about it. It is 2x faster on small strings but slower on large ones, which is why I'm waiting for the V8 update.

jasnell · 2018-01-24T22:16:16Z

If the perf loss on the large ones is significant, perhaps split the difference: use the JS method for short strings and fall back to C++ for longer ones?
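
To make the suggestion concrete, here is a minimal sketch of such a split. It is not code from this PR: the function name and the `SHORT_STRING_THRESHOLD` cutoff are hypothetical, and a usable cutoff would have to come from benchmarks. The existing `Buffer.byteLength` stands in for the C++ path.

```javascript
'use strict';

// Hypothetical cutoff, chosen only for illustration; it is not a measured value.
const SHORT_STRING_THRESHOLD = 64;

// Hypothetical helper: count UTF-8 bytes in JS for short strings and
// defer to the native implementation for longer ones.
function byteLengthUtf8Hybrid(string) {
  if (string.length > SHORT_STRING_THRESHOLD)
    return Buffer.byteLength(string, 'utf8');

  let len = 0;
  for (let i = 0; i < string.length; i++) {
    const code = string.charCodeAt(i);
    if (code <= 0x7f) {
      len += 1;
    } else if (code <= 0x7ff) {
      len += 2;
    } else if (code >= 0xd800 && code <= 0xdbff &&
               (string.charCodeAt(i + 1) & 0xfc00) === 0xdc00) {
      // Valid surrogate pair: one code point above U+FFFF, 4 bytes.
      len += 4;
      i++;
    } else {
      // Rest of the BMP, or an unpaired surrogate counted as 3 bytes.
      len += 3;
    }
  }
  return len;
}
```

Both branches should return the same value for any input, so the cutoff would only affect speed, not results.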

indutny · 2018-01-24T22:19:10Z

Perhaps, but I'd rather wait for the V8 update. We've been living with the C++ code for a while now.

indutny · 2018-01-24T23:09:41Z

Added expected V8 version.

bnoordhuis · 2018-01-30T19:54:09Z

lib/buffer.js

+    // NOTE: 0 <= code <= 0xffff
+    var code = string.charCodeAt(i);
+    if (code <= 0x7f) {
+      len++;


`len += 1;` for consistency with line 348.

Nitpicking 😉
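
For context, the hunk above is one branch of a loop over UTF-16 code units. The following is a minimal sketch of how such a loop can compute the UTF-8 byte length. It is an assumption about the general technique, not the PR's actual code, and the name `byteLengthUtf8Sketch` is hypothetical.

```javascript
'use strict';

// Hypothetical name; not the identifier used in lib/buffer.js.
function byteLengthUtf8Sketch(string) {
  let len = 0;
  for (let i = 0; i < string.length; i++) {
    // NOTE: 0 <= code <= 0xffff
    const code = string.charCodeAt(i);
    if (code <= 0x7f) {
      len += 1; // ASCII
    } else if (code <= 0x7ff) {
      len += 2; // two-byte sequence
    } else if (code >= 0xd800 && code <= 0xdbff && i + 1 < string.length) {
      const next = string.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        len += 4; // surrogate pair encodes one code point above U+FFFF
        i++;      // skip the low surrogate
      } else {
        len += 3; // unpaired high surrogate, counted as 3 bytes
      }
    } else {
      len += 3; // rest of the BMP, including unpaired surrogates
    }
  }
  return len;
}
```

As a sanity check, the result can be compared against `Buffer.byteLength(string, 'utf8')` for the same input.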

indutny · 2018-03-07T19:31:25Z

Looks like this might be ready after #19201 lands.

indutny · 2018-03-07T19:35:33Z

Hm... it is still slow even on V8 6.6. cc @bmeurer

Here's the benchmark that I used: https://gist.github.com/indutny/0fe9a3ed3b93d558a60deacefc37e378 . The results are:

c++: 7.052ms
js: 6.731ms
c++: 235.835ms
js: 5739.299ms
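
For readers who cannot open the gist, a comparison of this kind can be sketched as below. This is not the gist's code: the inputs, iteration counts, and the `jsByteLength` stand-in are all assumptions made for illustration, and the timings it prints will not match the figures above.

```javascript
'use strict';

// Stand-in for the JavaScript port; not the code from the PR or the gist.
function jsByteLength(string) {
  let len = 0;
  for (let i = 0; i < string.length; i++) {
    const code = string.charCodeAt(i);
    if (code <= 0x7f) {
      len += 1;
    } else if (code <= 0x7ff) {
      len += 2;
    } else if (code >= 0xd800 && code <= 0xdbff &&
               (string.charCodeAt(i + 1) & 0xfc00) === 0xdc00) {
      len += 4;
      i++;
    } else {
      len += 3;
    }
  }
  return len;
}

// The existing native path, reached through the public API.
function cppByteLength(string) {
  return Buffer.byteLength(string, 'utf8');
}

// Time `iterations` calls of `fn(input)` and return the summed lengths,
// so the work cannot be optimized away as dead code.
function bench(label, fn, input, iterations) {
  let total = 0;
  console.time(label);
  for (let i = 0; i < iterations; i++)
    total += fn(input);
  console.timeEnd(label);
  return total;
}

// Assumed inputs: one short string and one long string.
const small = 'hello world';
const large = 'hello world'.repeat(1 << 14);

for (const [input, iterations] of [[small, 1e5], [large, 50]]) {
  bench('c++', cppByteLength, input, iterations);
  bench('js', jsByteLength, input, iterations);
}
```

Running it with and without `--nountrusted-code-mitigations` on a V8 version that supports the flag would show how much of the gap comes from the mitigations.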

bmeurer · 2018-03-08T08:32:47Z

You need to pass --nountrusted-code-mitigations to disable the Spectre mitigations, which also affect String#charCodeAt. But even then the C++ code is going to be significantly faster. I'm not sure we can fix the difference here.

indutny · 2018-03-08T17:17:48Z

@bmeurer tbh, I don't see a reason for it to be slow. The performance seems to depend on string flattening, which apparently doesn't happen in the JS case. Could it potentially happen at all?

bmeurer · 2018-03-08T17:54:38Z

@indutny The reason is that String#charCodeAt() has to dispatch on string type all the time (potentially 8 different types for the "fast case"), whereas the C++ code just accesses the character data. We could try to make TurboFan a bit smarter here if we find a way to do the flattening (which is indeed happening here) in the peeled loop iteration only and then have only sequential one-byte string access inside the actual loop. But that's work that needs to be done and I'm a bit worried that we over-optimize for the particular example.

bmeurer · 2018-03-08T17:55:19Z

BTW V8 tip-of-tree should be faster already thanks to the recent work by @sigurdschneider.

indutny · 2018-03-08T18:24:31Z

@bmeurer thank you!

@jasnell and @nodejs/collaborators looks like it'd be best to close this.

bmeurer · 2018-03-08T19:15:23Z

Not saying it's not worth it, but it's definitely not high priority.

Did you check with tip-of-tree? And --nountrusted-code-mitigations?

indutny · 2018-03-08T19:41:01Z

@bmeurer I checked --nountrusted-code-mitigations, and although it was faster, it was still significantly slower than C++. Didn't check tip-of-tree yet.

srl295

seems ok (did not check the math in detail)… might also compare to http://source.icu-project.org/repos/icu/trunk/icu4c/source/common/unicode/utf8.h which has been highly optimized

BridgeAR · 2018-04-10T00:05:28Z

Should this stay open? It seems like the JS implementation cannot get as fast as the C++ implementation. Ping @indutny

BridgeAR · 2018-04-16T03:17:33Z

Closing as this does not seem like it is the right approach in JS.

@indutny please reopen in case you disagree and want to continue working on this.

indutny · 2018-04-16T03:26:45Z

I think we could potentially apply it to short strings. Anyone up for running some benchmarks?

nodejs-github-bot added the buffer and c++ labels Jan 24, 2018

buffer: port byteLengthUtf8 to JavaScript

de2287e

indutny force-pushed the feature/js-buffer-length-utf8 branch from 2c35736 to de2287e Compare January 24, 2018 19:28

joyeecheung added the wip label Jan 24, 2018

addaleax approved these changes Jan 24, 2018

maclover7 force-pushed the master branch from bb5575a to 993b716 Compare January 26, 2018 22:03

cjihrig force-pushed the master branch from 993b716 to 082f952 Compare January 26, 2018 22:36

bnoordhuis approved these changes Jan 30, 2018

srl295 approved these changes Mar 8, 2018

BridgeAR closed this Apr 16, 2018
