-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
buffer: optimize decoding wrapped base64 data #12146
Conversation
I haven't run the whole benchmark suite yet, only symlinked the exiting and the new benchmarks into a new directory locally:
|
// eslint-disable-next-line no-unescaped-regexp-dot | ||
data.match(/./); // Flatten the string | ||
const buffer = Buffer.allocUnsafe(bytesCount); | ||
buffer.write(data, 0, bytesCount, 'base64'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit but this can be simplified just a bit by doing...
const line = 'abcd'.repeat(charsPerLine / 4) + '\n';
const buffer = Buffer.alloc(bytesCount, line);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks for the suggestion!
The fast base64 decoder used to switch to the slow one permanently when it saw a whitespace or other garbage character. Since the most common situation such characters may be encountered in is line-wrapped base64 data, a more profitable strategy is to decode a single 24-bit group with the slow decoder and then continue running the fast algorithm. Refs: nodejs#12114
1fb578c
to
ff77c72
Compare
Rebased to incorporate changes from #11995. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Can I get a fresh CI run? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I'm qualified to sign off on this but LGTM
/cc @bnoordhuis (based on Git history) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM but it would be really nice to have some cctests for this header file
(to be clear, the cctest can be added separately :-) ...) |
The fast base64 decoder used to switch to the slow one permanently when it saw a whitespace or other garbage character. Since the most common situation such characters may be encountered in is line-wrapped base64 data, a more profitable strategy is to decode a single 24-bit group with the slow decoder and then continue running the fast algorithm. PR-URL: #12146 Ref: #12114 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Trevor Norris <[email protected]> Reviewed-By: James M Snell <[email protected]>
Landed in e77a83f |
This commit adds C++ tests for `base64_encode()` and `base64_decode()` functions defined in `base64.h`. The functionality is already being tested indirectly in JavaScript tests for Buffer, but it won't hurt to test the low-level functions too, especially given that they aren't only used in the internal Buffer implementation, Chrome inspector protocol support relies upon them too. Refs: nodejs#12146 (comment)
This commit adds C++ tests for `base64_encode()` and `base64_decode()` functions defined in `base64.h`. The functionality is already being tested indirectly in JavaScript tests for Buffer, but it won't hurt to test the low-level functions too, especially given that they aren't only used in the internal Buffer implementation, Chrome inspector protocol support relies upon them too. PR-URL: #12238 Refs: #12146 (comment) Reviewed-By: James M Snell <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Daniel Bevenius <[email protected]>
The fast base64 decoder used to switch to the slow one permanently when it saw a whitespace or other garbage character. Since the most common situation such characters may be encountered in is line-wrapped base64 data, a more profitable strategy is to decode a single 24-bit group with the slow decoder and then continue running the fast algorithm. PR-URL: nodejs#12146 Ref: nodejs#12114 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Trevor Norris <[email protected]> Reviewed-By: James M Snell <[email protected]>
should we backport? Assuming if so we should wait a bit |
It seems to be the root cause of #13657, so no. |
This commit adds C++ tests for `base64_encode()` and `base64_decode()` functions defined in `base64.h`. The functionality is already being tested indirectly in JavaScript tests for Buffer, but it won't hurt to test the low-level functions too, especially given that they aren't only used in the internal Buffer implementation, Chrome inspector protocol support relies upon them too. PR-URL: #12238 Refs: #12146 (comment) Reviewed-By: James M Snell <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Daniel Bevenius <[email protected]>
This commit adds C++ tests for `base64_encode()` and `base64_decode()` functions defined in `base64.h`. The functionality is already being tested indirectly in JavaScript tests for Buffer, but it won't hurt to test the low-level functions too, especially given that they aren't only used in the internal Buffer implementation, Chrome inspector protocol support relies upon them too. PR-URL: #12238 Refs: #12146 (comment) Reviewed-By: James M Snell <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Daniel Bevenius <[email protected]>
This commit adds C++ tests for `base64_encode()` and `base64_decode()` functions defined in `base64.h`. The functionality is already being tested indirectly in JavaScript tests for Buffer, but it won't hurt to test the low-level functions too, especially given that they aren't only used in the internal Buffer implementation, Chrome inspector protocol support relies upon them too. PR-URL: #12238 Refs: #12146 (comment) Reviewed-By: James M Snell <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Daniel Bevenius <[email protected]>
The fast base64 decoder used to switch to the slow one permanently when
it saw a whitespace or other garbage character. Since the most common
situation such characters may be encountered in is line-wrapped base64
data, a more profitable strategy is to decode a single 24-bit group with
the slow decoder and then continue running the fast algorithm.
Refs: #12114
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passesAffected core subsystem(s)
buffer