Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer#toString unexpected / incorrect results with encoding #455

Closed
SheetJSDev opened this issue Jul 8, 2022 · 5 comments
Closed

Buffer#toString unexpected / incorrect results with encoding #455

SheetJSDev opened this issue Jul 8, 2022 · 5 comments
Labels
bug Something isn't working node.js Compatibility with Node.js APIs

Comments

@SheetJSDev
Copy link
Contributor

Digging into https://github.com/Jarred-Sumner/bun/issues/406 , one issue pertained to Buffer semantics. SheetJS assumes NodeJS documented Buffer behavior in toString. Bun diverges in a few ways:

A) utf16le spurious data. The UTF16LE representation of the ASCII character A is [ 0x41, 0 ].

// node prints [ 65 ] while bun prints [ 65, 209 ]
console.log(Buffer.from([65, 0]).toString("utf16le").split("").map(x => x.charCodeAt(0)));

NodeJS prints the expected [ 65 ] (one character, 0x41 == 65) but Bun generates two characters [ 65, 209 ]

B) base64 encoding does not include padding:

// bun and node both print QQA
console.log(Buffer.from([65, 0]).toString("base64url"))
// node prints QQA= with the padding while bun omits the padding
console.log(Buffer.from([65, 0]).toString("base64"))

As noted in the NodeJS docs, base64url omits padding while base64 includes padding.

@nullhook
Copy link
Contributor

when parsing the encoding type for utf16le the returned enum is set to ucs2, a trivial change would be to return the proper enum ::utf16le here - This isn't the actual fix because both encodings (utf16le and ucs2) go to the same codepath.

interestingly, there's something happening in the allocation function. i'm not sure how Zig allocators work yet but the documentation states that items are set to undefined. however, in this case i see it spits with actual values.

even for a simple test case you'll see an output of arbitrary value:

Buffer.from([0,0]).toString("utf16le") // [ 170, 170 ]
(unsigned char *) $0 = 0x00000573043400b8 "\xaa\xaa4\U00000004s\U00000005" {
  (unsigned char) [0] = '\xaa'
  (unsigned char) [1] = '\xaa'
}

@binary-koan
Copy link

Maybe related, something else I discovered investigating #1184

Calling toString with binary/latin1 returns wildly different results than Node:

In Node:

Buffer.from('{"alg":"RS256","typ":"JWT"}', 'latin1').toString('latin1') // '{"alg":"RS256","typ":"JWT"}'

In Bun:

Buffer.from('{"alg":"RS256","typ":"JWT"}', 'latin1').toString('latin1') // '≻污≧∺卒㔲∶∬祴≰∺坊≔}'

It looks like the Buffer's Uint8Array is the same in both Bun and Node, so the problem is just in the conversion to string.

@SheetJSDev
Copy link
Contributor Author

@binary-koan you are describing #1016 . It works in 0.1.5 but regressed afterwards. Using 0.1.5 on Intel Mac:

% curl -LO https://github.com/oven-sh/bun/releases/download/bun-v0.1.5/bun-darwin-x64.zip
% unzip bun-darwin-x64.zip
% cat <<EOF > t.js 
console.log(Buffer.from('{"alg":"RS256","typ":"JWT"}', 'latin1').toString('latin1'))
EOF
% ./bun-darwin-x64/bun --version; ./bun-darwin-x64/bun t.js
0.1.5
{"alg":"RS256","typ":"JWT"}

Trying the same with 0.1.6:

% curl -LO https://github.com/oven-sh/bun/releases/download/bun-v0.1.6/bun-darwin-x64.zip
% unzip bun-darwin-x64.zip
% cat <<EOF > t.js 
console.log(Buffer.from('{"alg":"RS256","typ":"JWT"}', 'latin1').toString('latin1'))
EOF
% ./bun-darwin-x64/bun --version; ./bun-darwin-x64/bun t.js
0.1.6
≻污≧∺卒㔲∶∬祴≰∺坊≔}Р찜˵

@nullhook
Copy link
Contributor

nullhook commented Sep 1, 2022

looks like Zig in safe mode intentionally sets values to undefined / 0xaa in allocators .alloc and not in-Drelease-fast version

An ideal solution would be to zero out or set custom type before sending it to JSC so it ignores it.

@rosshadden
Copy link

Can you try again with the latest Bun, @SheetJSDev ? Both of your examples work for me in both Bun 0.1.13 (stable) and the latest built from source. If you still have the problem I'm wondering whether this is a platform-specific problem, which would help for debugging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working node.js Compatibility with Node.js APIs
Projects
None yet
Development

No branches or pull requests

5 participants