Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start moving to Uint8Array in new APIs? #41588

Open
benjamingr opened this issue Jan 19, 2022 · 32 comments
Open

Start moving to Uint8Array in new APIs? #41588

benjamingr opened this issue Jan 19, 2022 · 32 comments
Labels
buffer Issues and PRs related to the buffer subsystem. discuss Issues opened for discussions and feedbacks.

Comments

@benjamingr
Copy link
Member

benjamingr commented Jan 19, 2022

There was a suggestion by @jasnell to use Uint8Arrays in new APIs over Buffers as well as a weigh-in by @sindresorhus saying it is easier to author cross-platform APIs when using Uint8Arrays.

Here is a context #41553 (comment)

That is, the ask here is that Node.js should prefer Uint8Arrays over Buffers in new APIs.

What does everyone think? Should we stick to Buffer (which is a subclass of Uint8Array as a reminder) or prefer Uint8Arrays over buffers when possible in new APIs?

cc @nodejs/buffer @nodejs/streams

@benjamingr benjamingr added feature request Issues that request new features to be added to Node.js. buffer Issues and PRs related to the buffer subsystem. discuss Issues opened for discussions and feedbacks. and removed feature request Issues that request new features to be added to Node.js. labels Jan 19, 2022
@ronag
Copy link
Member

ronag commented Jan 19, 2022

That will cause soo much confusion. I believe there are several methods/props that Buffer override and which act differently than Uint8Array.

@ronag
Copy link
Member

ronag commented Jan 19, 2022

In particular .slice work differently. Not sure if there are others.

@ronag
Copy link
Member

ronag commented Jan 19, 2022

I'm fine with using Uint8Array for Web apis which define the type. But for node api's I think we should stick with Buffer.

@mcollina
Copy link
Member

It really depends on the subsystems we are targeting. It's impossible to make a generic call.

@paulmillr
Copy link

paulmillr commented Jan 19, 2022

I have a big problem with Buffers.

  1. buffer.slice() is mutable copy, uint8array.slice() is immutable. This is very bad. Recently got hit with a bug report. I was checking for the input to be instanceof Uint8Array and did array.slice(), then operated on the copy. But that doesn't work with buffers! The buffers were mutated even after copy. Why are they instance of Uint8Arrays if the behavior is different? Regression in 1.5.0 paulmillr/noble-ed25519#45
  2. They are not supported in browsers and will never be. Adding an additional in-browser shim for buffers is bad.
  3. They expose private information to a global variable. Imagine you're developing some secure software. You reason about zeroing data etc. Buffer.from hex creates 8Kb buffers instead of 32 bytes #41467
// Somewhere in your code
const privateBuf = Buffer.from(privateKey, 'hex');

// Rogue package can access
Buffer.from('1').buffer
// Which will of course show the contents of `privateBuf`
// No need in complex memory dumps!

This happens because there is 8KB shared buffer reused for all Buffer.from calls! There is zero need in making this as subtle as it is right now. Buffer.allocUnsafe seems like a good name for "using a part of global shared buffer", Buffer.from is not. Search GitHub code for the snippet and tell me how many people know about the "feature".

@benjamingr
Copy link
Member Author

buffer.slice() is mutable copy, uint8array.slice() is immutable. This is very bad. Recently got hit with a bug report. I was checking for the input to be instanceof Uint8Array and did array.slice(), then operated on the copy. But that doesn't work with buffers! The buffers were mutated even after copy. Why are they instance of Uint8Arrays if the behavior is different? Regression in 1.5.0 paulmillr/noble-ed25519#45

FWIW: I agree having an API that behaves like a subclass but "lies" about keeping the same API structure is super-confusing.

@benjamingr
Copy link
Member Author

What about the following suggestion:

  • Old modules that use Buffer will use Buffer (for example readable streams that build heavily on them).
  • New modules and APIs will prefer Uint8Arrays whenever possible for example the new HTTP API (if that's fetch that's easy).
  • Every API that takes a Buffer will accept a Uint8Array (that might already be the case).
  • Every API that returns a Buffer will continue returning a Buffer to not break the ecosystem (stuff like .slice).

Less likely to reach consensus but I'd still like that:

  • "Soft deprecate" buffer.slice and recommend .subarray() much more strongly.

@jasnell
Copy link
Member

jasnell commented Jan 19, 2022

We will never be able to get rid of Buffer. The generic rules of thumb I have in mind are:

  1. unless an API specifically calls for the unique features/differences of Buffer, it should return Uint8Array instead. ( Also keep in mind that it's trivial to create a Buffer from a Uint8Array. e.g. Buffer.from(u8.buffer).
    )

  2. Any API that accepts a Buffer should also accept Uint8Array (that is, Buffer should never be required as the only option) and docs should reflect that Uint8Array is accepted.

Examples where Buffer may be needed:

  1. The API needs to work with hex or base64 encoded data.
  2. The API needs Buffer's idea of slice
  3. The code path is particularly performance sensitive and Buffer's pooling/uninitialized memory is needed

@addaleax
Copy link
Member

"Soft deprecate" buffer.slice and recommend .subarray() much more strongly.

Yes please :)

  1. Also keep in mind that it's trivial to create a Buffer from a Uint8Array. e.g. Buffer.from(u8.buffer).

This is a great example to show that it’s not actually trivial, because Buffer.from(u8.buffer) works 95 % of the time, and will do the absolute wrong thing the other 5 % of the time. And unfortunately, this has become a fairly common bug to encounter. 😕 Maybe we should add something like Buffer.fromView(abv: ArrayBufferView) as a shorthand for Buffer.from(abv.buffer, abv.byteOffset, abv.byteLength)?

Any API that accepts a Buffer should also accept Uint8Array (that is, Buffer should never be required as the only option) and docs should reflect that Uint8Array is accepted.

That’s also the current state of things – even Buffer.prototype’s own methods work when called on Uint8Arrays.

@benjamingr
Copy link
Member Author

This is a great example to show that it’s not actually trivial, because Buffer.from(u8.buffer) works 95 % of the time, and will do the absolute wrong thing the other 5 % of the time.

I ran into that as well (even though I knew about the bug before) so I'd like to echo it's a problem.

That’s also the current state of things – even Buffer.prototype’s own methods work when called on Uint8Arrays.

That's neat I didn't know that.

@jasnell
Copy link
Member

jasnell commented Jan 19, 2022

...even Buffer.prototype's own methods work when called on Uint8Arrays.

Sadly not in all cases...
image

@addaleax
Copy link
Member

@jasnell That’s a bug then :)

@paulmillr
Copy link

Examples where Buffer may be needed:

The API needs to work with hex or base64 encoded data.

That needs to be implemented as a hex/base64 decoder in stdlib, no need in using Buffers for this. The new stdlib decoders will work with Uint8arrays etc.

The API needs Buffer's idea of slice

Which is the same as .subarray()

The code path is particularly performance sensitive and Buffer's pooling/uninitialized memory is needed

Pooling can be easily done with Uint8Arrays. And it could be implemented in a way which does not expose private contents to a global variable. I'm doing this in https://github.com/paulmillr/noble-hashes (see src/sha256.ts).

@benjamingr
Copy link
Member Author

@paulmillr to address some of the comment status:

buffer.slice() is mutable copy, uint8array.slice() is immutable. This is very bad. Recently got hit with a bug report. I was checking for the input to be instanceof Uint8Array and did array.slice(), then operated on the copy. But that doesn't work with buffers! The buffers were mutated even after copy. Why are they instance of Uint8Arrays if the behavior is different? Regression in 1.5.0 paulmillr/noble-ed25519#45

We are (likely) going to docs-depercate (and --pending-deprecations probably) buffer.slice() (which effectively means everyone using TypeScript type definitions will see get warnings and suggestions to not use it). It's not great but it's a start. Progress here: #41596

They are not supported in browsers and will never be. Adding an additional in-browser shim for buffers is bad.

The ask here is that any API that aims to be compatible with web APIs (for example fetch or future modules) should strongly prefer typed arrays to buffers right? I think that's also a good point.

They expose private information to a global variable. Imagine you're developing some secure software. You reason about zeroing data etc. Buffer.from hex creates 8Kb buffers instead of 32 bytes #41467

I can see why this is done but this is very unfortunate and I did not realize this before. I'm wondering if this can be fixed while keeping the pooling behaviour? (By overriding .buffer)

@benjamingr
Copy link
Member Author

(and similarly - this is very much an area Node could use more people involved and more contributions in!)

@peq42
Copy link

peq42 commented Jan 20, 2022

I'm agaisnt the idea. It will not only cause confusion and compatibility issues, but also decrease in (micro) performance

@jimmywarting
Copy link

jimmywarting commented Mar 8, 2022

I'm a bit against using node:buffer myself for cross compatible reasons so I'm 👍 for this!
I think it have always been bad that .slice() have been overridden to do what you don't expect from an Uint8Array-like object, there are cases where you would expect the array to truly be immutable. The Blob class for example demands that you copy the data so that it can't be changed later.

.slice() is suppose to copy the data, period. currently the only rellyable way to do it is directly on the buffer.buffer.slice which is a bit annoying to use
If the user really intend to not copy it then they should actually be using subarray!

I'm comparing this a bit to this article: https://jakearchibald.com/2021/function-callback-risks/

At some point somebody might just say: screw it: we are going to switch to using Uint8Array instead (cuz we already treat it as Uint8Array-like) so it can change the hole underlying infrastructure when it uses slice


I don't think that extending any built in types are such a great idea, what if buffer added a custom .at() function in the early days when we did not have native at?

I think Buffer is bloated with stuff that we don't actually need (specially for browser)
TextEncoder/TextDecoder should be used instead, it also runs much faster in browser (up to 10x faster - using this comparison) on some random web page they also handle BOM
DataView can replace many things to read/edit the array.

I even wonder why Blob exist on the buffer object at all...
should the buffer package on npm have to add a Blob class and whatwg streams as well?

Maybe it would be a good idea to fleshed out all of the utility method like to/from hex and base64 got added into a own sperate util module you could import from import { toHex } from node:binary-utility
We also have this binary-encoding proposal and some even argue against this proposal and string to/from binary encoding (worth reading)


honestly, i just wish to see Buffer being deprecated and gone all together. but that is likely never going to happen

@slavafomin
Copy link

I'm having a somewhat related problem with Buffer. I need to convert Uint8Array to a Base64 string, but this functionality is tied to the Buffer class. So I need to first convert Uint8Array to Buffer and then to call Buffer.toString('base64'). This extra operation just destroys all the performance benefits of using the native implementation of the Base64 encoding. We have to use our own function for encoding that operates on Uint8Array directly, which is kinda ridiculous :)

@sindresorhus
Copy link

I wrote a blog post on my intent to move away from Buffer in all my packages: https://sindresorhus.com/blog/goodbye-nodejs-buffer

@sindresorhus
Copy link

Examples where Buffer may be needed:
The API needs to work with hex or base64 encoded data.

It looks like it's being worked on: https://github.com/tc39/proposal-arraybuffer-base64

@sindresorhus
Copy link

I think the first step would be to add a note here about preferring Uint8Array for new code. Would that be acceptable?

@paulmillr

This comment was marked as off-topic.

@jimmywarting
Copy link

I wish that buffer never got added to the global scope in the first place. It had led to many bundler polyfilling it when it is seen as optional.

I can only hope that a new spec'ed web worker dose not include globalThis.buffer globalThis.process and globalThis.global by default. It should be explicit imported when needed

@benjamingr
Copy link
Member Author

I think the first step would be to add a note here about preferring Uint8Array for new code. Would that be acceptable?

I don't believe we have consensus on that at all.

@targos
Copy link
Member

targos commented Oct 24, 2023

It would be quite a low effort to create a PR with that note, and probably the best way to seek consensus.

@ronag
Copy link
Member

ronag commented Oct 24, 2023

I don't think there is a good alternative for Buffer at the moment. Uint8Array doesn't provide all the functionality. That being said, in cases where it's not necessary and the instance is only accessible by node internals I do think it would be good to move to Uint8Array.

https://github.com/sindresorhus/uint8array-extras does not provide helpers with the same performance.

@targos
Copy link
Member

targos commented Oct 24, 2023

TBH I'm not sure I understand why this issue is still open. Uint8Array is already supposed to be supported everywhere Buffer is (in core).

@benjamin when you say "the ask here is that Node.js should prefer Uint8Arrays over Buffers in new APIs." what do you mean by "prefer"? Is it about the type of the value returned by new APIs?

@ronag it's easy to create a Buffer from an Uint8Array if necessary for a certain use case.

@jimmywarting
Copy link

jimmywarting commented Oct 24, 2023

I don't think there is a good alternative for Buffer at the moment. Uint8Array doesn't provide all the functionality. That being said, in cases where it's not necessary and the instance is only accessible by node internals I do think it would be good to move to Uint8Array.

https://github.com/sindresorhus/uint8array-extras does not provide helpers with the same performance.

I would be in favor of this package, if it meant that i didn't need to ship node:buffer to browser and other runtimes. even if it meant a perf loss.

TBH I'm not sure I understand why this issue is still open. Uint8Array is already supposed to be supported everywhere Buffer is (in core).

Maybe stands as a reminder that newly developed api's should prefer to return Uint8Array instead?

@benjamin when you say "the ask here is that Node.js should prefer Uint8Arrays over Buffers in new APIs." what do you mean by "prefer"? Is it about the type of the value returned by new APIs?

Can only speak for myself, but i guess he means that ppl should avoid spending time on reading up on the Buffer class and all stuff in it and resort to using Uint8Array/DataView/TextEncoder/Decoder instead as it's better cross compatible with other env. not about changing the return type of what new api's should return.


it's quite funny how atob/btoa and this contradicts each other.
this says avoid buffer and atob/btoa suggest that you should use Buffer -

i think this statment should be removed or changed:
Use buf.toString('base64') instead
Use Buffer.from(data, 'base64') instead

@tniessen
Copy link
Member

@jimmywarting I don't think it's appropriate to open the atob/btoa can of worms yet again. We already discussed this at length, see #40754 (comment) for a summary. Buffer was added because there was no appropriate web API at the time, and while there is Uint8Array now, there's still no good standardized API for base64 conversion.

@panva
Copy link
Member

panva commented Oct 25, 2023

@jimmywarting I don't think it's appropriate to open the atob/btoa can of worms yet again. We already discussed this at length, see #40754 (comment) for a summary. Buffer was added because there was no appropriate web API at the time, and while there is Uint8Array now, there's still no good standardized API for base64 conversion.

+ hex and url-safe base64

@bnoordhuis
Copy link
Member

newly developed api's should prefer to return Uint8Array instead?

No, because that creates bifurcation. Better to be consistently wrong than inconsistently right.

@bakkot
Copy link
Contributor

bakkot commented Oct 25, 2023

Once my proposal for native Uint8Array <-> base64 is shipping, which I am hoping will be within a year or two, I think it would be reasonable to suggest to readers of the Buffer documentation that the native Uint8Array class should be able to fulfill their needs.


  • hex and url-safe base64

Those are also part of the JS proposal for base64 on Uint8Arrays, incidentally. I just haven't been emphasizing hex because it's a lot easier (and simpler) than base64.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. discuss Issues opened for discussions and feedbacks.
Projects
None yet
Development

No branches or pull requests