
buffer: speed up Buffer.isEncoding() method #5256

Closed · wants to merge 1 commit

Conversation

JacksonTian
Contributor

Use an automaton to avoid toLowerCase(); faster, but dirtier.
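The "automaton" here replaces a toLowerCase() call with branches on character codes, so both cases of a letter match without allocating a lowercased string. As a simplified illustration of the idea (not the PR's actual code), a matcher for just 'hex' might look like:

```javascript
// Illustrative sketch only: accept 'hex' in any letter case by
// branching on each character code instead of calling toLowerCase().
function isHex(s) {
  if (s.length !== 3) return false;
  switch (s.charCodeAt(0)) {
    case 104: case 72: break;   // 'h', 'H'
    default: return false;
  }
  switch (s.charCodeAt(1)) {
    case 101: case 69: break;   // 'e', 'E'
    default: return false;
  }
  switch (s.charCodeAt(2)) {
    case 120: case 88: return true; // 'x', 'X'
    default: return false;
  }
}
```

The real patch covers every supported encoding this way, which is exactly where the maintainability concern below comes from.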

@JacksonTian
Contributor Author

Here is the benchmark result:

$ ./node benchmark/compare.js ./node ~/.tnvm/versions/node/v5.5.0/bin/node -- buffers buffer-is-encoding.js
running ./node
buffers/buffer-is-encoding.js
running /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node
buffers/buffer-is-encoding.js

buffers/buffer-is-encoding.js encoding=hex: ./node: 37865000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 53626000 ......... -29.39%
buffers/buffer-is-encoding.js encoding=utf8: ./node: 32338000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 46584000 ........ -30.58%
buffers/buffer-is-encoding.js encoding=utf-8: ./node: 29117000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 29546000 ........ -1.45%
buffers/buffer-is-encoding.js encoding=ascii: ./node: 32724000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 26424000 ........ 23.84%
buffers/buffer-is-encoding.js encoding=binary: ./node: 26486000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 22335000 ....... 18.59%
buffers/buffer-is-encoding.js encoding=base64: ./node: 29564000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 17563000 ....... 68.33%
buffers/buffer-is-encoding.js encoding=ucs2: ./node: 36896000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 17559000 ........ 110.12%
buffers/buffer-is-encoding.js encoding=ucs-2: ./node: 31430000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 15970000 ........ 96.81%
buffers/buffer-is-encoding.js encoding=utf16le: ./node: 25419000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 14396000 ...... 76.57%
buffers/buffer-is-encoding.js encoding=utf-16le: ./node: 25545000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 12160000 .... 110.07%
buffers/buffer-is-encoding.js encoding=HEX: ./node: 38260000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 6280700 .......... 509.18%
buffers/buffer-is-encoding.js encoding=UTF8: ./node: 33291000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 6278300 ......... 430.25%
buffers/buffer-is-encoding.js encoding=UTF-8: ./node: 28115000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 5482000 ........ 412.87%
buffers/buffer-is-encoding.js encoding=ASCII: ./node: 33288000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4861600 ........ 584.71%
buffers/buffer-is-encoding.js encoding=BINARY: ./node: 25741000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 5067200 ....... 408.00%
buffers/buffer-is-encoding.js encoding=BASE64: ./node: 32574000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 5095100 ....... 539.33%
buffers/buffer-is-encoding.js encoding=UCS2: ./node: 32299000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 5052700 ......... 539.24%
buffers/buffer-is-encoding.js encoding=UCS-2: ./node: 32000000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4389800 ........ 628.97%
buffers/buffer-is-encoding.js encoding=UTF16LE: ./node: 23653000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4427300 ...... 434.26%
buffers/buffer-is-encoding.js encoding=UTF-16LE: ./node: 25788000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4756400 ..... 442.17%
buffers/buffer-is-encoding.js encoding=utf9: ./node: 33145000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4541200 ......... 629.87%
buffers/buffer-is-encoding.js encoding=utf-7: ./node: 31347000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4464600 ........ 602.13%
buffers/buffer-is-encoding.js encoding=utf17le: ./node: 31211000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4309600 ...... 624.23%
buffers/buffer-is-encoding.js encoding=utf-17le: ./node: 21892000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 5136700 ..... 326.19%
buffers/buffer-is-encoding.js encoding=Unicode-FTW: ./node: 61616000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4847600 . 1171.06%
buffers/buffer-is-encoding.js encoding=new gnu gun: ./node: 68657000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 4702100 . 1360.13%

Use automata to avoid toLowerCase(), faster, but more dirty.
@ChALkeR ChALkeR added the buffer Issues and PRs related to the buffer subsystem. label Feb 16, 2016
@ChALkeR
Member

ChALkeR commented Feb 16, 2016

Ah, ignore my previous comment, I misread your results =). I removed it.

@ronkorving
Contributor

Is this not getting out of hand? I appreciate all the performance work being done, and being quite the performance geek myself, I really enjoy this. But how is this maintainable?

@JacksonTian
Contributor Author

: )

@ronkorving
Contributor

If I may make a suggestion that would keep maintainability at a higher level: why not write a generator that produces the very code you wrote here (as a string) from an array of encoding names you feed into it, then pass that into new Function and reap the benefits? See also acorn.
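A minimal sketch of that suggestion (hypothetical helper names, not code from this PR): build the comparison code as a string, compile it once with new Function, and reuse the compiled matcher. This assumes the known names are lowercase ASCII; letters are matched case-insensitively via `| 0x20`, which maps 'A'-'Z' onto 'a'-'z'.

```javascript
// Sketch: generate per-name length + character-code checks as source
// text, then compile once. Non-letters (digits, '-') compare directly.
function compileMatcher(names) {
  const branches = names.map((name) => {
    const checks = Array.from(name).map((ch, i) =>
      /[a-z]/.test(ch)
        ? `(s.charCodeAt(${i}) | 0x20) === ${ch.charCodeAt(0)}`
        : `s.charCodeAt(${i}) === ${ch.charCodeAt(0)}`
    );
    return `if (s.length === ${name.length} && ${checks.join(' && ')}) return true;`;
  });
  return new Function('s', branches.join('\n') + '\nreturn false;');
}

// Illustrative subset of encoding names:
const isEncoding = compileMatcher(['hex', 'utf8', 'utf-8', 'ascii']);
```

The generator stays readable while the generated function keeps the branchy, allocation-free shape of the hand-written automaton.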

@ChALkeR
Member

ChALkeR commented Feb 16, 2016

I'm not exactly sure that this is worth it.
By selecting only the popular encodings, we get:

buffers/buffer-is-encoding.js encoding=hex: ./node: 37865000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 53626000 ......... -29.39%
buffers/buffer-is-encoding.js encoding=utf8: ./node: 32338000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 46584000 ........ -30.58%
buffers/buffer-is-encoding.js encoding=utf-8: ./node: 29117000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 29546000 ........ -1.45%
buffers/buffer-is-encoding.js encoding=ascii: ./node: 32724000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 26424000 ........ 23.84%
buffers/buffer-is-encoding.js encoding=binary: ./node: 26486000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 22335000 ....... 18.59%
buffers/buffer-is-encoding.js encoding=base64: ./node: 29564000 /Users/jacksontian/.tnvm/versions/node/v5.5.0/bin/node: 17563000 ....... 68.33%

Also, I doubt that this is the bottleneck in any actual code.

@JacksonTian
Contributor Author

@ronkorving

So we have to trade performance off against maintainability.

Feel free to discuss it.

@mscdex
Contributor

mscdex commented Feb 16, 2016

I think I'd be more in favor of what @ronkorving is suggesting. It shouldn't be that hard to do either. Also, the generated function and the generator could be reused in other places where case-insensitive (static) string matching is done.

@ChALkeR
Member

ChALkeR commented Feb 16, 2016

My results, alternative approach (this is compared with your PR, not with the current master):

buffers/buffer-is-encoding.js encoding=hex: ./node: 45905000 ./out/Release/node-pr5256: 35969000 .......... 27.62%
buffers/buffer-is-encoding.js encoding=utf8: ./node: 43540000 ./out/Release/node-pr5256: 30779000 ......... 41.46%
buffers/buffer-is-encoding.js encoding=utf-8: ./node: 42315000 ./out/Release/node-pr5256: 28372000 ........ 49.15%
buffers/buffer-is-encoding.js encoding=ascii: ./node: 32530000 ./out/Release/node-pr5256: 31952000 ......... 1.81%
buffers/buffer-is-encoding.js encoding=binary: ./node: 37935000 ./out/Release/node-pr5256: 25886000 ....... 46.55%
buffers/buffer-is-encoding.js encoding=base64: ./node: 29954000 ./out/Release/node-pr5256: 27233000 ........ 9.99%
buffers/buffer-is-encoding.js encoding=ucs2: ./node: 33028000 ./out/Release/node-pr5256: 29890000 ......... 10.50%
buffers/buffer-is-encoding.js encoding=ucs-2: ./node: 25978000 ./out/Release/node-pr5256: 27828000 ........ -6.65%
buffers/buffer-is-encoding.js encoding=utf16le: ./node: 38147000 ./out/Release/node-pr5256: 27242000 ...... 40.03%
buffers/buffer-is-encoding.js encoding=utf-16le: ./node: 37962000 ./out/Release/node-pr5256: 23881000 ..... 58.96%
buffers/buffer-is-encoding.js encoding=HEX: ./node: 11735000 ./out/Release/node-pr5256: 34252000 ......... -65.74%
buffers/buffer-is-encoding.js encoding=UTF8: ./node: 10640000 ./out/Release/node-pr5256: 27859000 ........ -61.81%
buffers/buffer-is-encoding.js encoding=UTF-8: ./node: 9671900 ./out/Release/node-pr5256: 26836000 ........ -63.96%
buffers/buffer-is-encoding.js encoding=ASCII: ./node: 9061500 ./out/Release/node-pr5256: 24353000 ........ -62.79%
buffers/buffer-is-encoding.js encoding=BINARY: ./node: 9730300 ./out/Release/node-pr5256: 22144000 ....... -56.06%
buffers/buffer-is-encoding.js encoding=BASE64: ./node: 9383700 ./out/Release/node-pr5256: 23397000 ....... -59.89%
buffers/buffer-is-encoding.js encoding=UCS2: ./node: 9742700 ./out/Release/node-pr5256: 28104000 ......... -65.33%
buffers/buffer-is-encoding.js encoding=UCS-2: ./node: 8520900 ./out/Release/node-pr5256: 26971000 ........ -68.41%
buffers/buffer-is-encoding.js encoding=UTF16LE: ./node: 10237000 ./out/Release/node-pr5256: 24840000 ..... -58.79%
buffers/buffer-is-encoding.js encoding=UTF-16LE: ./node: 10102000 ./out/Release/node-pr5256: 20327000 .... -50.30%
buffers/buffer-is-encoding.js encoding=utf9: ./node: 9463400 ./out/Release/node-pr5256: 32503000 ......... -70.88%
buffers/buffer-is-encoding.js encoding=utf-7: ./node: 7779000 ./out/Release/node-pr5256: 28436000 ........ -72.64%
buffers/buffer-is-encoding.js encoding=utf17le: ./node: 10116000 ./out/Release/node-pr5256: 26293000 ..... -61.53%
buffers/buffer-is-encoding.js encoding=utf-17le: ./node: 10322000 ./out/Release/node-pr5256: 23956000 .... -56.91%
buffers/buffer-is-encoding.js encoding=Unicode-FTW: ./node: 12716000 ./out/Release/node-pr5256: 50317000 . -74.73%
buffers/buffer-is-encoding.js encoding=new gnu gun: ./node: 13034000 ./out/Release/node-pr5256: 47377000 . -72.49%

Code:

Buffer.isEncoding = function(encoding) {
  if (!encoding.length)
    encoding = '' + encoding; // stringify values without a usable .length

  var loweredCase = false;
  for (;;) {
    switch (encoding.length) {
      case 3:
        switch (encoding) {
          case 'hex':
            return true;
        }
        break;
      case 4:
        switch (encoding) {
          case 'utf8':
          case 'ucs2':
            return true;
        }
        break;
      case 5:
        switch (encoding) {
          case 'utf-8':
          case 'ascii':
          case 'ucs-2':
            return true;
        }
        break;
      case 6:
        switch (encoding) {
          case 'binary':
          case 'base64':
            return true;
        }
        break;
      case 7:
        switch (encoding) {
          case 'utf16le':
            return true;
        }
        break;
      case 8:
        switch (encoding) {
          case 'utf-16le':
            return true;
        }
        break;
    }

    if (loweredCase)
      return false;
    encoding = ('' + encoding).toLowerCase();
    loweredCase = true;
  }
};

Note that all the actively used encodings are actually faster this way, and that the code is a bit cleaner.

Results compared with master:

buffers/buffer-is-encoding.js encoding=hex: ./node: 48271000 ./out/Release/node-master: 67270000 ........ -28.24%
buffers/buffer-is-encoding.js encoding=utf8: ./node: 43292000 ./out/Release/node-master: 43090000 ......... 0.47%
buffers/buffer-is-encoding.js encoding=utf-8: ./node: 42688000 ./out/Release/node-master: 29407000 ....... 45.17%
buffers/buffer-is-encoding.js encoding=ascii: ./node: 27191000 ./out/Release/node-master: 26880000 ........ 1.15%
buffers/buffer-is-encoding.js encoding=binary: ./node: 38716000 ./out/Release/node-master: 22406000 ...... 72.79%
buffers/buffer-is-encoding.js encoding=base64: ./node: 29703000 ./out/Release/node-master: 19464000 ...... 52.61%
buffers/buffer-is-encoding.js encoding=ucs2: ./node: 32250000 ./out/Release/node-master: 18487000 ........ 74.44%
buffers/buffer-is-encoding.js encoding=ucs-2: ./node: 25865000 ./out/Release/node-master: 15130000 ....... 70.95%
buffers/buffer-is-encoding.js encoding=utf16le: ./node: 38206000 ./out/Release/node-master: 12219000 .... 212.69%
buffers/buffer-is-encoding.js encoding=utf-16le: ./node: 37975000 ./out/Release/node-master: 9336700 .... 306.72%
buffers/buffer-is-encoding.js encoding=HEX: ./node: 11703000 ./out/Release/node-master: 7395300 .......... 58.24%
buffers/buffer-is-encoding.js encoding=UTF8: ./node: 10564000 ./out/Release/node-master: 6881200 ......... 53.52%
buffers/buffer-is-encoding.js encoding=UTF-8: ./node: 9603500 ./out/Release/node-master: 6410600 ......... 49.81%
buffers/buffer-is-encoding.js encoding=ASCII: ./node: 9010300 ./out/Release/node-master: 6396200 ......... 40.87%
buffers/buffer-is-encoding.js encoding=BINARY: ./node: 9724800 ./out/Release/node-master: 5432600 ........ 79.01%
buffers/buffer-is-encoding.js encoding=BASE64: ./node: 9248300 ./out/Release/node-master: 5398700 ........ 71.30%
buffers/buffer-is-encoding.js encoding=UCS2: ./node: 9863800 ./out/Release/node-master: 5778700 .......... 70.69%
buffers/buffer-is-encoding.js encoding=UCS-2: ./node: 8581300 ./out/Release/node-master: 5278000 ......... 62.59%
buffers/buffer-is-encoding.js encoding=UTF16LE: ./node: 10150000 ./out/Release/node-master: 4933200 ..... 105.75%
buffers/buffer-is-encoding.js encoding=UTF-16LE: ./node: 10312000 ./out/Release/node-master: 4579800 .... 125.16%
buffers/buffer-is-encoding.js encoding=utf9: ./node: 9374900 ./out/Release/node-master: 4416800 ......... 112.26%
buffers/buffer-is-encoding.js encoding=utf-7: ./node: 7697000 ./out/Release/node-master: 4292900 ......... 79.30%
buffers/buffer-is-encoding.js encoding=utf17le: ./node: 9994000 ./out/Release/node-master: 4237500 ...... 135.85%
buffers/buffer-is-encoding.js encoding=utf-17le: ./node: 10442000 ./out/Release/node-master: 4244100 .... 146.04%
buffers/buffer-is-encoding.js encoding=Unicode-FTW: ./node: 12759000 ./out/Release/node-master: 4875500 . 161.70%
buffers/buffer-is-encoding.js encoding=new gnu gun: ./node: 13005000 ./out/Release/node-master: 4993200 . 160.46%

I still doubt that this is worth it, because hex is still slower, and utf8 is sometimes slower, sometimes faster (tending to be slower, actually).

For reference, your PR against master on my PC:

buffers/buffer-is-encoding.js encoding=hex: ./out/Release/node-pr5256: 33124000 ./out/Release/node-master: 67186000 ......... -50.70%
buffers/buffer-is-encoding.js encoding=utf8: ./out/Release/node-pr5256: 32003000 ./out/Release/node-master: 43807000 ........ -26.95%
buffers/buffer-is-encoding.js encoding=utf-8: ./out/Release/node-pr5256: 27405000 ./out/Release/node-master: 29570000 ........ -7.32%
buffers/buffer-is-encoding.js encoding=ascii: ./out/Release/node-pr5256: 30777000 ./out/Release/node-master: 27544000 ........ 11.74%
buffers/buffer-is-encoding.js encoding=binary: ./out/Release/node-pr5256: 27291000 ./out/Release/node-master: 22253000 ....... 22.64%
buffers/buffer-is-encoding.js encoding=base64: ./out/Release/node-pr5256: 29385000 ./out/Release/node-master: 19128000 ....... 53.62%
buffers/buffer-is-encoding.js encoding=ucs2: ./out/Release/node-pr5256: 32516000 ./out/Release/node-master: 18361000 ......... 77.09%
buffers/buffer-is-encoding.js encoding=ucs-2: ./out/Release/node-pr5256: 28762000 ./out/Release/node-master: 14970000 ........ 92.13%
buffers/buffer-is-encoding.js encoding=utf16le: ./out/Release/node-pr5256: 24719000 ./out/Release/node-master: 12204000 ..... 102.55%
buffers/buffer-is-encoding.js encoding=utf-16le: ./out/Release/node-pr5256: 22317000 ./out/Release/node-master: 9641300 ..... 131.47%
buffers/buffer-is-encoding.js encoding=HEX: ./out/Release/node-pr5256: 35672000 ./out/Release/node-master: 7263700 .......... 391.09%
buffers/buffer-is-encoding.js encoding=UTF8: ./out/Release/node-pr5256: 28220000 ./out/Release/node-master: 6985000 ......... 304.01%
buffers/buffer-is-encoding.js encoding=UTF-8: ./out/Release/node-pr5256: 25079000 ./out/Release/node-master: 6383700 ........ 292.87%
buffers/buffer-is-encoding.js encoding=ASCII: ./out/Release/node-pr5256: 22696000 ./out/Release/node-master: 6173500 ........ 267.64%
buffers/buffer-is-encoding.js encoding=BINARY: ./out/Release/node-pr5256: 22915000 ./out/Release/node-master: 5689200 ....... 302.78%
buffers/buffer-is-encoding.js encoding=BASE64: ./out/Release/node-pr5256: 23843000 ./out/Release/node-master: 5414300 ....... 340.37%
buffers/buffer-is-encoding.js encoding=UCS2: ./out/Release/node-pr5256: 29847000 ./out/Release/node-master: 5532300 ......... 439.51%
buffers/buffer-is-encoding.js encoding=UCS-2: ./out/Release/node-pr5256: 27359000 ./out/Release/node-master: 5274800 ........ 418.66%
buffers/buffer-is-encoding.js encoding=UTF16LE: ./out/Release/node-pr5256: 25104000 ./out/Release/node-master: 4864700 ...... 416.03%
buffers/buffer-is-encoding.js encoding=UTF-16LE: ./out/Release/node-pr5256: 22025000 ./out/Release/node-master: 4567800 ..... 382.19%
buffers/buffer-is-encoding.js encoding=utf9: ./out/Release/node-pr5256: 29698000 ./out/Release/node-master: 4326000 ......... 586.50%
buffers/buffer-is-encoding.js encoding=utf-7: ./out/Release/node-pr5256: 26996000 ./out/Release/node-master: 4328400 ........ 523.69%
buffers/buffer-is-encoding.js encoding=utf17le: ./out/Release/node-pr5256: 27189000 ./out/Release/node-master: 4212900 ...... 545.37%
buffers/buffer-is-encoding.js encoding=utf-17le: ./out/Release/node-pr5256: 24378000 ./out/Release/node-master: 4246500 ..... 474.08%
buffers/buffer-is-encoding.js encoding=Unicode-FTW: ./out/Release/node-pr5256: 52608000 ./out/Release/node-master: 4990900 .. 954.07%
buffers/buffer-is-encoding.js encoding=new gnu gun: ./out/Release/node-pr5256: 55166000 ./out/Release/node-master: 4982400 . 1007.20%

Edit: code and results updated.

@ChALkeR
Member

ChALkeR commented Feb 16, 2016

Also note that the results should probably be rechecked on V8 4.9; this looks like something V8 should optimize on its own.

@ChALkeR ChALkeR added the performance Issues and PRs related to the performance of Node.js. label Feb 16, 2016
@jasnell
Member

jasnell commented Feb 16, 2016

A generator function for this kind of parsing is a very good idea. I have deep concerns over making these kinds of micro-optimizations, as they make the code much more opaque and difficult to manage. If we can put this behind a generator function that optimizes either at compile time or at module-load time, then we can balance those needs quite well.

@trevnorris
Contributor

I'm not -1 on performance improvements, but we need to recognize what we're actually optimizing. Look in lib/buffer.js and you'll notice that there are several locations that do an encoding-type lookup. May I suggest something like:

function normalizeEncoding(encoding) {
  // place all the black magic wizardry you want here
}

Then use it like so:

Buffer.isEncoding = function isEncoding(encoding) {
  return normalizeEncoding(encoding) !== null;
};

Which would also be useful for cases like Buffer#write(). e.g.:

switch(normalizeEncoding(encoding)) {
  case 'hex':
    return this.hexWrite(string, offset, length);
...

Make sense?
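A minimal sketch of what such a helper could look like (the mapping below is illustrative only, not necessarily what later landed in core): return a canonical lowercase name for known encodings and null otherwise, so every call site shares one lookup.

```javascript
// Hypothetical normalizeEncoding(): canonical name or null.
function normalizeEncoding(encoding) {
  const enc = String(encoding).toLowerCase();
  switch (enc) {
    case 'utf8': case 'utf-8':
      return 'utf8';
    case 'ucs2': case 'ucs-2': case 'utf16le': case 'utf-16le':
      return 'utf16le';
    case 'hex': case 'ascii': case 'binary': case 'base64':
      return enc;
    default:
      return null; // unknown encoding
  }
}

// isEncoding() then becomes a one-liner on top of it:
function isEncoding(encoding) {
  return normalizeEncoding(encoding) !== null;
}
```

Because the helper returns the canonical spelling, switch statements in write paths only need one case per encoding instead of one per alias.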

@brendanashworth
Contributor

Which would also be useful for cases like Buffer#write(). e.g.:

switch(normalizeEncoding(encoding)) {
  case 'hex':
    return this.hexWrite(string, offset, length);
...

There are a few spots in core where encoding names like 'UTF-8' are accepted and some where they aren't (e.g. when turned into a buffer), so this benefits us in more than one way.

@mscdex
Contributor

mscdex commented Feb 17, 2016

I had a little extra time today, so I wrote a case-insensitive string-comparison function generator. It seems you pretty much need to keep the existing switch that checks for the lowercase names in order to keep performance from regressing for those (common) cases, and just use the custom comparison function in place of the .toLowerCase() code.

I've also added a pre-optimize step in the benchmark in this PR before calling bench.start() to make things as fair as possible (you can see how to do this in the path and other benchmarks).
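The shape being described, sketched with a hand-written comparator rather than mscdex's actual generated code: keep the exact-match switch as the fast path for already-lowercase input, and fall back to a direct case-insensitive compare that never allocates a lowercased copy.

```javascript
// Compare s against a known lowercase name without toLowerCase():
// ASCII uppercase letters are folded on the fly with `| 0x20`.
function ciEquals(s, lower) {
  if (s.length !== lower.length) return false;
  for (let i = 0; i < s.length; i++) {
    let c = s.charCodeAt(i);
    if (c >= 65 && c <= 90) c |= 0x20; // 'A'-'Z' -> 'a'-'z'
    if (c !== lower.charCodeAt(i)) return false;
  }
  return true;
}

const names = ['hex', 'utf8', 'utf-8', 'ascii', 'binary',
               'base64', 'ucs2', 'ucs-2', 'utf16le', 'utf-16le'];

function isEncoding(s) {
  s = String(s);
  switch (s) { // fast path: input already lowercase
    case 'hex': case 'utf8': case 'utf-8': case 'ascii':
    case 'binary': case 'base64': case 'ucs2': case 'ucs-2':
    case 'utf16le': case 'utf-16le':
      return true;
  }
  return names.some((n) => ciEquals(s, n)); // slow path
}
```

The fast path preserves the common-case numbers; the fallback only pays the per-character cost for mixed-case or invalid input.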

FWIW here are the results I get with my changes:

buffers/buffer-isencoding.js encoding=hex n=100000000: ./node: 42032000 ./node-before: 39292000 .......... 6.97%
buffers/buffer-isencoding.js encoding=utf8 n=100000000: ./node: 37179000 ./node-before: 35457000 ......... 4.86%
buffers/buffer-isencoding.js encoding=utf-8 n=100000000: ./node: 30648000 ./node-before: 29186000 ........ 5.01%
buffers/buffer-isencoding.js encoding=ascii n=100000000: ./node: 29270000 ./node-before: 26816000 ........ 9.15%
buffers/buffer-isencoding.js encoding=binary n=100000000: ./node: 26409000 ./node-before: 24497000 ....... 7.80%
buffers/buffer-isencoding.js encoding=base64 n=100000000: ./node: 23973000 ./node-before: 21901000 ....... 9.46%
buffers/buffer-isencoding.js encoding=ucs2 n=100000000: ./node: 21117000 ./node-before: 19959000 ......... 5.80%
buffers/buffer-isencoding.js encoding=ucs-2 n=100000000: ./node: 19334000 ./node-before: 18743000 ........ 3.16%
buffers/buffer-isencoding.js encoding=utf16le n=100000000: ./node: 17433000 ./node-before: 16605000 ...... 4.99%
buffers/buffer-isencoding.js encoding=utf-16le n=100000000: ./node: 15808000 ./node-before: 15745000 ..... 0.40%
buffers/buffer-isencoding.js encoding=HEX n=100000000: ./node: 14810000 ./node-before: 9373600 .......... 58.00%
buffers/buffer-isencoding.js encoding=UTF8 n=100000000: ./node: 14805000 ./node-before: 8956500 ......... 65.30%
buffers/buffer-isencoding.js encoding=UTF-8 n=100000000: ./node: 14587000 ./node-before: 8565300 ........ 70.30%
buffers/buffer-isencoding.js encoding=ASCII n=100000000: ./node: 14361000 ./node-before: 8739700 ........ 64.31%
buffers/buffer-isencoding.js encoding=BINARY n=100000000: ./node: 14236000 ./node-before: 8297500 ....... 71.56%
buffers/buffer-isencoding.js encoding=BASE64 n=100000000: ./node: 14234000 ./node-before: 8109800 ....... 75.52%
buffers/buffer-isencoding.js encoding=UCS2 n=100000000: ./node: 14652000 ./node-before: 7721100 ......... 89.76%
buffers/buffer-isencoding.js encoding=UCS-2 n=100000000: ./node: 14613000 ./node-before: 7525100 ........ 94.19%
buffers/buffer-isencoding.js encoding=UTF16LE n=100000000: ./node: 13613000 ./node-before: 7112700 ...... 91.40%
buffers/buffer-isencoding.js encoding=UTF-16LE n=100000000: ./node: 13367000 ./node-before: 7211300 ..... 85.37%
buffers/buffer-isencoding.js encoding=utf9 n=100000000: ./node: 12552000 ./node-before: 6386300 ......... 96.54%
buffers/buffer-isencoding.js encoding=utf-7 n=100000000: ./node: 12450000 ./node-before: 6395700 ........ 94.66%
buffers/buffer-isencoding.js encoding=utf17le n=100000000: ./node: 12738000 ./node-before: 6399700 ...... 99.05%
buffers/buffer-isencoding.js encoding=utf-17le n=100000000: ./node: 12333000 ./node-before: 6793400 ..... 81.54%
buffers/buffer-isencoding.js encoding=Unicode-FTW n=100000000: ./node: 17596000 ./node-before: 7299400 . 141.06%
buffers/buffer-isencoding.js encoding=new gnu gun n=100000000: ./node: 17258000 ./node-before: 7583100 . 127.58%

@ChALkeR
Member

ChALkeR commented Feb 17, 2016

@mscdex Have you tried to split the existing switch by the length of the input string?

@mscdex
Contributor

mscdex commented Feb 17, 2016

@ChALkeR I just tested that additional change and it does improve things:

buffers/buffer-isencoding.js encoding=hex n=100000000: ./node: 40356000 ./node-pre-string: 39489000 .......... 2.20%
buffers/buffer-isencoding.js encoding=utf8 n=100000000: ./node: 39834000 ./node-pre-string: 36809000 ......... 8.22%
buffers/buffer-isencoding.js encoding=utf-8 n=100000000: ./node: 38004000 ./node-pre-string: 29532000 ....... 28.69%
buffers/buffer-isencoding.js encoding=ascii n=100000000: ./node: 33636000 ./node-pre-string: 28023000 ....... 20.03%
buffers/buffer-isencoding.js encoding=binary n=100000000: ./node: 35756000 ./node-pre-string: 24973000 ...... 43.18%
buffers/buffer-isencoding.js encoding=base64 n=100000000: ./node: 31033000 ./node-pre-string: 22689000 ...... 36.77%
buffers/buffer-isencoding.js encoding=ucs2 n=100000000: ./node: 34393000 ./node-pre-string: 19936000 ........ 72.52%
buffers/buffer-isencoding.js encoding=ucs-2 n=100000000: ./node: 29396000 ./node-pre-string: 18719000 ....... 57.04%
buffers/buffer-isencoding.js encoding=utf16le n=100000000: ./node: 36172000 ./node-pre-string: 16855000 .... 114.60%
buffers/buffer-isencoding.js encoding=utf-16le n=100000000: ./node: 34160000 ./node-pre-string: 15633000 ... 118.52%
buffers/buffer-isencoding.js encoding=HEX n=100000000: ./node: 30110000 ./node-pre-string: 9410000 ......... 219.98%
buffers/buffer-isencoding.js encoding=UTF8 n=100000000: ./node: 26229000 ./node-pre-string: 9099000 ........ 188.26%
buffers/buffer-isencoding.js encoding=UTF-8 n=100000000: ./node: 22442000 ./node-pre-string: 8706300 ....... 157.77%
buffers/buffer-isencoding.js encoding=ASCII n=100000000: ./node: 22335000 ./node-pre-string: 8691300 ....... 156.98%
buffers/buffer-isencoding.js encoding=BINARY n=100000000: ./node: 22724000 ./node-pre-string: 8394200 ...... 170.70%
buffers/buffer-isencoding.js encoding=BASE64 n=100000000: ./node: 22631000 ./node-pre-string: 7945700 ...... 184.82%
buffers/buffer-isencoding.js encoding=UCS2 n=100000000: ./node: 25254000 ./node-pre-string: 7735400 ........ 226.47%
buffers/buffer-isencoding.js encoding=UCS-2 n=100000000: ./node: 22591000 ./node-pre-string: 7618800 ....... 196.51%
buffers/buffer-isencoding.js encoding=UTF16LE n=100000000: ./node: 23471000 ./node-pre-string: 7152400 ..... 228.16%
buffers/buffer-isencoding.js encoding=UTF-16LE n=100000000: ./node: 22622000 ./node-pre-string: 7259100 .... 211.64%
buffers/buffer-isencoding.js encoding=utf9 n=100000000: ./node: 22312000 ./node-pre-string: 6422100 ........ 247.42%
buffers/buffer-isencoding.js encoding=utf-7 n=100000000: ./node: 20268000 ./node-pre-string: 6468300 ....... 213.35%
buffers/buffer-isencoding.js encoding=utf17le n=100000000: ./node: 23559000 ./node-pre-string: 6395500 ..... 268.37%
buffers/buffer-isencoding.js encoding=utf-17le n=100000000: ./node: 21861000 ./node-pre-string: 6777700 .... 222.54%
buffers/buffer-isencoding.js encoding=Unicode-FTW n=100000000: ./node: 39908000 ./node-pre-string: 7257100 . 449.92%
buffers/buffer-isencoding.js encoding=new gnu gun n=100000000: ./node: 39264000 ./node-pre-string: 7596700 . 416.86%

EDIT: updated results, there was a typo previously which affected things a bit

@ChALkeR
Member

ChALkeR commented Feb 17, 2016

@mscdex Can I take a look at the code? ;-)

@mscdex
Contributor

mscdex commented Feb 17, 2016

@ChALkeR I just pushed it here.

@JacksonTian
Contributor Author

I am confused about why my first version is faster than master, but looks slower after forcing optimization with the following code:

// Force optimization before starting the benchmark
const v8 = require('v8'); // needed for setFlagsFromString
Buffer.isEncoding(encoding);
v8.setFlagsFromString('--allow_natives_syntax');
eval('%OptimizeFunctionOnNextCall(Buffer.isEncoding)');
Buffer.isEncoding(encoding);

@JacksonTian
Contributor Author

Here is another implementation. It uses a template to generate the isEncoding method.

@mscdex
Contributor

mscdex commented Feb 18, 2016

@JacksonTian Did you check the output when adding at least --trace-opt --trace-deopt to the command line to see if there is anything obvious happening (e.g. permanent deopt, constant re-optimizations, etc.)? Otherwise you can view/compare the resulting instructions in IRHydra2.

@trevnorris
Contributor

Buffer.isEncoding() is not a common code path, though the pattern itself is common throughout Buffer. If this can be generalized into its own function for other similar cases and used everywhere applicable, then I'll +1 this. But while I respect the performance tuning done here, the added complexity is significant enough on an uncommon code path that I'd say it's not worth it.

@jasnell
Member

jasnell commented Feb 19, 2016

+1 to what @trevnorris said.

@ronkorving
Contributor

Is there something to be said for the Node community playing a bigger role in V8 development directly? It seems to me that this type of optimization should be completely unnecessary if V8 were able to do it itself (in fact, I expect it would be much faster). Now, I'm no V8 expert in the slightest, so I don't see myself leading such an effort, but the Node community has a lot of working groups. Does it not make sense to have one dedicated to the JS engine that Node so deeply depends on?

@bnoordhuis
Member

Does it not make sense to have a group dedicated to the JS engine that Node so deeply depends upon?

#3741 - which, admittedly, I dropped the ball on. I wasn't (and still am not) quite sure how to flesh it out properly. It doesn't help that I currently don't have much time to dedicate to it.

@ronkorving
Contributor

Ah, I didn't know that was already ongoing. Great to see so much interest. I hope that process can resume at some point.

@jasnell
Member

jasnell commented Mar 21, 2016

@JacksonTian ... ping ... what's the status on this one?

@RReverser
Member

RReverser commented Jun 8, 2016

Is this going to live on in light of #7207?

@brendanashworth
Contributor

I'm going to go ahead and close this: after #7207 the PR would need to be entirely redone if it is even still relevant, and the discussion about a generator function should probably continue in a new PR once code lands. Thanks!

@JacksonTian JacksonTian deleted the faster_is_encoding branch May 4, 2017 08:09
9 participants