This repository has been archived by the owner on Jul 31, 2018. It is now read-only.

Buffer.from/Buffer.alloc/Buffer.zalloc/Buffer() soft-deprecate #4

Closed


jasnell
Member

@jasnell jasnell commented Jan 16, 2016

@indutny
Member

indutny commented Jan 16, 2016

You have my +1 on this

@ChALkeR
Member

ChALkeR commented Jan 16, 2016

+1 for the primary goal, except for the new method names. That is not a nitpick, I consider those to be very important.
I mentioned that a few times already, but making sure that this discussion does not miss it.

With the currently included method names, this proposal fixes only problem 2, but does not fix problem 1 from my note: inexperienced users will still use an unsafe method in a dangerous way without reading the docs first, and there will be more bugs in the community modules. See also q/a question 18.

Also, I am a bit concerned about the secondary goal: even if that change is at some point pushed to v0.12.n+1, users still on v0.12.n will be endangered. That does not change over time; replace «n» with the latest released version at that time (and «v0.12» with any branch). This point remains valid until Buffer(number) is hard-deprecated. Also, the --unitialized-buffers command line option would work as a kill switch if any author of any lib assumes that Buffer(number) will be zero-filled, and they would assume that once this change is brought to the public. I can already imagine «Node.js zero-fills Buffer(number) now» news and tweets.

@rwaldron

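Is the intention to define Buffer.from such that it overrides the built-in class-side from method that Buffer inherits (https://github.com/nodejs/node/blob/66b9c0d8bdb7f9fb542f53c2c18bacd5bbca272d/lib/buffer.js#L47-L68) from Uint8Array? Take a look: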

function Buffer() {} 

console.log(Buffer.from === undefined); // true

// See buffer.js#L47-L71, nodejs/node @ 66b9c0d (linked above) 
Object.setPrototypeOf(Buffer.prototype, Uint8Array.prototype);
Object.setPrototypeOf(Buffer, Uint8Array);

console.log(Buffer.from === undefined); // false
console.log(Buffer.from === Uint8Array.from); // true

See also:

@jasnell
Member Author

jasnell commented Jan 16, 2016

Yes, it overrides the built in from that is inherited from Uint8Array.

@rwaldron

Yes, it overrides the built in from that is inherited from Uint8Array.

Thank you for clarifying these semantics. If this proposal is accepted, the docs should mention that Buffer.from is an override and does not accept the mapping function argument.
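A short sketch of that point, run against a current Node.js release rather than the 2016 PR branch (the runtime is an assumption): the class-side Buffer.from shadows the inherited Uint8Array.from, and its second parameter is not a mapping function.

```javascript
// Buffer.from is Node's own static method; it shadows the Uint8Array.from
// that Buffer would otherwise inherit through its prototype chain.
console.log(Buffer.from === Uint8Array.from); // false

// The inherited TypedArray form accepts an optional mapping function...
const mapped = Uint8Array.from([1, 2, 3], (x) => x * 2);
console.log(Array.from(mapped)); // [ 2, 4, 6 ]

// ...but Buffer.from's second parameter means something else (an encoding
// for strings, a byte offset for ArrayBuffers), so for an array argument
// the function is simply ignored and no mapping happens.
const notMapped = Buffer.from([1, 2, 3], (x) => x * 2);
console.log(Array.from(notMapped)); // [ 1, 2, 3 ]
```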

@glenjamin

I would also like to add that I think it should be made marginally "harder" to use alloc than zalloc.

That is: If a developer only glances at the API without properly reading docs, it would be beneficial if they picked the method which is safest. Those who need / want the extra performance can find it by reading the documentation more closely. Someone who wants to allocate some memory may not read every buffer method - they'll likely stop when they find one which fits the bill.

Something like zalloc -> alloc + alloc -> allocRaw would achieve this, or a prominent, scary warning in the docs for the alloc variant that doesn't zero-fill.

@mafintosh
Member

Looks good! This would have solved the security problems we've found in bittorrent-dht and ws. Thanks for turning this into an EPS @jasnell.

@joepie91

@glenjamin: I would also like to add that I think it should be made marginally "harder" to use alloc than zalloc.

That is: If a developer only glances at the API without properly reading docs, it would be beneficial if they picked the method which is safest. Those who need / want the extra performance can find it by reading the documentation more closely. Someone who wants to allocate some memory may not read every buffer method - they'll likely stop when they find one which fits the bill.

+1 on this. Again, I feel that 'unsafe' is the most accurate and beginner-friendly way to convey this, rather than 'raw' or other more 'neutral' terms.

@wbl

wbl commented Jan 17, 2016

This proposal does not actually fix the bug. new Buffer(variable) will continue to result in information disclosure vulnerabilities until existing code is changed. I still don't understand why this behavior change isn't being made.

@alfiepates

@joepie91: I feel that 'unsafe' is the most accurate and beginner-friendly way to convey this, rather than 'raw' or other more 'neutral' terms.

+1. buffer.alloc() and buffer.unsafealloc() would most definitely keep people away from the second option unless they knew what they were doing.

@feross

feross commented Jan 17, 2016

Nice work. This solves the core problem that affected ws and bittorrent-dht, which is Buffer(variable) getting tricked into taking a number argument.

@rvagg
Member

rvagg commented Jan 18, 2016

@nodejs/ctc this is probably a good place to move technical discussion for now.

@jasnell can we have clarification on what "soft-deprecation" means? Is it doc-deprecation? With "hard-deprecation" meaning util.deprecate()?
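Under that reading, "soft" deprecation is a docs-only change, while "hard" deprecation wraps the function with util.deprecate(). A minimal sketch of the latter, using a hypothetical legacy helper (createBuffer is illustrative only, not a Node.js API):

```javascript
const util = require('util');

// Hypothetical legacy helper that a library wants to retire.
function createBuffer(size) {
  return Buffer.alloc(size);
}

// Hard deprecation: the wrapper emits a DeprecationWarning (printed to
// stderr) the first time it is called, then behaves exactly like the
// original function. Soft deprecation would change only the docs.
const createBufferDeprecated = util.deprecate(
  createBuffer,
  'createBuffer() is deprecated. Use Buffer.alloc() instead.'
);

const a = createBufferDeprecated(8); // warns once
const b = createBufferDeprecated(8); // no second warning
```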

(apologies for the long prose to follow, but this seems like the appropriate place to finally lay out my current thoughts)

Backporting to v0.10, v0.12, and v4 could actually make this more painful than it should be, because we'll then have a pretty big feature difference between patch versions. I'm guessing the intention is to make the new factory functions available to everyone using Node.js, but that assumes the majority of users are on the latest release in each line, which is not the case at all. I've seen companies justify the ongoing use of older v0.10 releases in production with typical risk-aversion, combined with claims that being behind load balancers and TLS terminators protects them from most of the security flaws that would normally force people to upgrade. There are also situations where Node is used completely disconnected and is therefore less exposed to security vulnerabilities. Even beyond commercial use, node is now so widely used as part of utility toolchains that many node users either don't care or don't know what version they are using (if they know they are using it at all). I don't endorse any of the reasons people give for not upgrading when we have patch releases, but it happens all the time and will continue to happen.

If we end up with the npm ecosystem shifting to the new factory methods when they are only available in the latest release of each line, we're going to have a significant amount of pain. We've had a couple of historical instances like this that we should learn from. The last, most painful one that I recall is the introduction of ^ in npm halfway through v0.10, leaving older v0.10 and v0.8 installs without it. When it was turned on as the default for npm install -S we had so much breakage. The number of bug reports that many of us had to deal with in our various projects, related to users' npm being unable to parse the ^ in "dependencies", was phenomenal. And consider that the solution for many people was just to npm install npm -g to upgrade (except when npm itself broke old npm for the same reason!) or simply to install the latest v0.10.
That episode showed clearly that we can't use "it's in the latest release" as a mechanism to solve anything.

I'd really prefer we stick to semver as much as possible as it solves a lot of these kinds of problems. At this stage I think my preference, if we are determined to act, would be something like:

  1. introduce new factory methods in v5 and in a v4 semver-minor
    • aside: I don't particularly like zalloc and alloc, I also don't like the word "safe" in these, but that seems like a minor discussion when we move past the big issues
  2. document deprecation of the constructor in all cases, make it super-clear in the Buffer docs what the issues are
  3. (maybe!) introduce zero-fill-by-default for the constructor and a cmdline flag to turn it off in all active release lines, including v0.10 and v0.12
  4. at the end of 2016, do a util.deprecate() on the constructor, v0.10 and v0.12 will be EOL by then
  5. (much later) remove the constructor

I don't think the accidental-number problem is something we can hit straight away except through education. There's too much breakage in taking forceful action while v0.10 and v0.12 are still active. It may be safe enough to backport a warning on new Buffer(number, enc) to both v0.10 and v0.12, although this doesn't exactly solve the problem.

To re-state what I said above, introducing factory methods to v0.10 and v0.12 will not solve the version discrepancies, because it assumes people use the latest patch release. By not adding the factory methods to those versions, we make people much more likely to write and use feature-detection mechanisms, which is good, and we could even encourage this in the docs (var buf = Buffer.zalloc ? Buffer.zalloc(1024) : new Buffer(1024).fill(0)). This all goes away at the end of 2016, like saying goodbye to IE6.
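The feature-detection pattern suggested above can be wrapped in a small helper. This sketch assumes the names as they later shipped in Node.js, where Buffer.alloc() zero-fills (during this discussion the zero-filling variant was still called zalloc); zeroedBuffer is a hypothetical name.

```javascript
// Return a zero-filled buffer of `size` bytes on old and new runtimes alike.
function zeroedBuffer(size) {
  if (typeof Buffer.alloc === 'function') {
    // Newer runtimes: the factory zero-fills for us.
    return Buffer.alloc(size);
  }
  // Older runtimes: fall back to the constructor and zero-fill explicitly,
  // since new Buffer(size) hands back uninitialized memory.
  var buf = new Buffer(size);
  buf.fill(0);
  return buf;
}

var buf = zeroedBuffer(1024);
```

Using `var` and a separate fill() statement keeps the fallback branch within what the older release lines could run.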

While I accept that we have a usability problem here and that core has to take some responsibility for the way people use Node, I don't accept that we should treat such problems in the same way that we treat core security vulnerabilities. Hence my insistence on taking a longer view on this with v0.10 and v0.12, prioritising greater ecosystem stability over the possibility of ecosystem packages containing API misuse similar to what we are seeing. Vulnerabilities in individual npm packages are almost always going to be of much lesser impact than vulnerabilities in core, and they are also much easier to patch and distribute without the kind of breakage and uncertainty that happens when we do it with core.

And lastly, something I keep on repeating: it's essential that we do our best to educate people to think about Node as a server-side programming platform where evil can wreak havoc if you don't carefully think through what you're doing and stay attuned to the kinds of common vulnerabilities that can be created. Node.js attracts users who are used to a browser sandbox and are not accustomed to thinking carefully about input sanitisation and interaction with system resources. This is going to be a continual theme (TrendMicro being the current big news on this front), and while we can tackle it in part by making the API safer, we also need to accept that nannying users risks lulling them into a perception that Node.js is more of a sandbox than it is. We must accept a caveat emptor approach.

@ChALkeR
Member

ChALkeR commented Jan 18, 2016

@jasnell can we have clarification on what "soft-deprecation" means? Is it doc-deprecation? With "hard-deprecation" meaning util.deprecate()?

This is what I meant in all my messages, and I hope that this is what it means here.

Backporting to v0.10, v0.12, and v4 could actually make this more painful than it should be … I'd really prefer we stick to semver as much as possible

v0.10 and v0.12 are not semver, but I understand your concerns. Note, though, that this EPS does not describe backporting this feature to v0.10/v0.12, and I personally view that as a separate question. If you want my opinion on it: backporting the new API methods to v0.10 and v0.12 would ensure faster ecosystem migration to the safer methods, and would motivate people who are using older patch releases of v0.10 and v0.12 to update to the latest patch version (which I also view as a positive change), but only when they update their libraries. Also note that popular libraries could have these concerns themselves, so they could introduce the migration to the new Buffer API in a semver-major version of the library. Also, merging this into all the supported branches will allow new libraries to use the safer API from the start. Still, I view this discussion as controversial and less important, so I do not want to discuss it in the current pull request unless backporting the new API is included in the EPS itself.

I don't think the accidental-number problem is something we can hit straight away except through education.

Of course it wouldn't be hit straight away, partly because the old methods would only be soft-deprecated for quite a while, and partly because people generally don't react very fast and there are a lot of packages that need to be changed. The primary goal of this is to provide a safer API for use in the future (as it seems to me).

aside: I don't particularly like zalloc and alloc, I also don't like the word "safe" in these, but that seems like a minor discussion when we move past the big issues

Again: I view this as very important, because choosing the wrong method names would fix only the second problem (accidentally hitting Buffer(number) instead of Buffer(value)) but not the first one (careless use of uninitialized Buffer(number), even in code that is not performance-critical). Worse, it would leave us in a situation where the first problem is even less likely to be fixed than it is now, because renaming methods later would break more things than introducing new ones.

(maybe!) introduce zero-fill-by-default for the constructor and and cmdline flag to turn it off in all active release lines, including v0.10 and v0.12

Please, no. While the new API will cause previous versions (in the branches it gets backported to) to crash if libraries start relying on it, this change will make them leak uninitialized buffers unnoticed if libraries start relying on zero-fill-by-default, which is far, far worse. Also note that you can't stop people from relying on zero-fill-by-default once it's introduced: almost no one will want to zero-fill twice.

And lastly, something I keep on repeating, it's essential that we do our best to educate people to think about Node as a server-side programming platform where evil can wreak havoc if you don't think through what you're doing carefully and be attune to the kinds of common vulnerabilities that can be created.

Strongly agreed, moreover, there are several more specific education-related issues that I wanted to discuss after this gets fixed (and one other change).

and while we can tackle it in part by making the API safer we also need to accept that nannying users risks lulling them into a perception that Node.js is more of a sandbox than it is

The problem is (as you already said) that they already have this perception, and there is no reason not to guard them from common pitfalls (like the Buffer API), leaving the more dangerous methods as the less obvious ones, reachable by more experienced developers. By the way, if your statement was an argument for not changing the API, then it is very poor reasoning for not making the ecosystem safer: those effects are not comparable, and the Buffer() API is hurting everyone right now. If I misread, please excuse me.

@joepie91

@rvagg Well, @ChALkeR has said most of it, but just to weigh in on the naming issue once again: it is critical that the unsafe terminology or a similarly dangerous-looking word is used, to steer people away from the dangerous option unless they know what they are doing.

That being said, I do not particularly care about using safe anywhere - so having a Buffer.alloc and buffer.unsafeAlloc method, for example, would be completely okay as far as I'm concerned. I do understand that it may be a bad idea to call something safe when there may be other possible issues, so this might be the best way to approach the naming.

@glenjamin

To re-state what I said above, introducing factory methods to v0.10 and v0.12 will not solve the version discrepancies because it assumes people use the latest, by not adding the factory methods to those versions people are much more likely to write and use feature-detection mechanisms, which is good, and we could even encourage this in the docs (var buf = Buffer.zalloc ? Buffer.zalloc(1024) : new Buffer(1024).fill(0)). This all goes away at the end of 2016, like saying goodbye to IE6.

Do you think this is a good case for doing something similar to readable-stream from the 0.8 to 0.10 transition? Is that something which was considered successful and/or a good idea with hindsight?

i.e. a widely used userland module that provides these semantics, using native methods where available.
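A minimal sketch of what such a userland shim might look like for the value-based factory; bufferFrom is a hypothetical name and no published module is implied. The extra identity check guards against the inherited Uint8Array.from shown earlier in the thread, which a runtime where Buffer inherits from Uint8Array but predates Node's own override would expose as Buffer.from.

```javascript
// Hypothetical shim: a value-to-Buffer factory that never treats a number
// as a size, delegating to Node's native Buffer.from when it really is Node's.
var hasNodeFrom =
  typeof Buffer.from === 'function' && Buffer.from !== Uint8Array.from;

function bufferFrom(value, encodingOrOffset, length) {
  if (typeof value === 'number') {
    // Refuse numbers on every runtime instead of allocating raw memory.
    throw new TypeError('bufferFrom: "value" must not be a number');
  }
  if (hasNodeFrom) {
    return Buffer.from(value, encodingOrOffset, length);
  }
  // Older runtimes: only the legacy constructor is available.
  return new Buffer(value, encodingOrOffset, length);
}
```

Throwing on numbers in the shim itself means callers get the same behavior whether or not the native factory exists.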

@joepie91

Do you think this is a good case for doing something similar to readable-stream from the 0.8 to 0.10 transition? Is that something which was considered successful and/or a good idea with hindsight?

I realize you weren't asking me specifically, but: it's actually not entirely clear to me what the purpose of the readable-stream module is, or rather, whether it still plays a relevant role in the current state of Node.js. I have seen many others voice the same.

Aside from that, many people still perceive a 'cost' to using a userland module, even if there isn't any such cost. "I don't want to depend on third-party modules for this" is still a commonly heard argument in #Node.js, and it is still an extra hurdle for people to clear. The same concern applies to "recommending people to write a polyfill".

@jasnell
Member Author

jasnell commented Jan 18, 2016

@rvagg ... thank you for the lengthy review... I think you and I are on the same page.

@ChALkeR @joepie91 and others... the key challenge with the unsafe terminology is that the method itself is not unsafe; only incorrect usage of it is. There are perfectly legitimate and quite safe uses for allocating uninitialized memory. The goal here is to come up with a compromise that strikes a good balance between all of the various points of view. I'm quite certain that we're not going to make everyone happy in this case, but we need to try to settle on names that are good enough to address the core problem.

Currently I'm thinking:

  • rename zalloc to alloc to ensure that it comes first in the alphabetical listing
  • rename alloc to ualloc for "uninitialized allocation" ... this will make it appear to be the non-default choice and will ensure that it comes later in the alphabetical listing.

@joepie91

the key challenge with the unsafe terminology is that the method itself is not unsafe, only the incorrect usage of it is.

That isn't the case. An uninitialized memory allocation is always unsafe from a memory-safety point of view. That you can mitigate the issue doesn't change that the operation itself is unsafe, and neither does the existence of 'safe uses'. None of that concerns the operation itself.

EDIT: And if you (general you, not personal you) are sufficiently competent to understand how to use an uninitialized allocation, then you are also capable of understanding that "unsafe" concerns memory safety. Yes, this makes it "look scary" - that is precisely the point. There's no technical argument not to.

@glenjamin

@jasnell I think the rust approach to this is quite nice: https://doc.rust-lang.org/book/unsafe.html

The general theme is that pretty much everything else is safe, and unchecked usage of this function is unsafe. Therefore the method gets marked unsafe, and you are required to wrap it in some safety.

Node.js and rust have quite different target audiences and use-cases, but I think in this case there is some overlap.

Another argument for making them more different is that if you're skimming through code, a single letter difference isn't always immediately visible. Something that can potentially expose runtime memory should look visually distinct, and jump out at the reader.

As with everything else I've written, this is all IMO.

@jasnell
Member Author

jasnell commented Jan 18, 2016

To be clear, personally I have no problem with using unsafe in the term. Others have raised objections to that language in the various threads and my goal is to come up with a compromise that works for the most part.

@joepie91

@jasnell I understand - I believe you've explicitly remarked on representing objections from others in the previous thread, so I've been more or less assuming that throughout the course of the threads :) Hence my attempt to re-explain why the use of the unsafe terminology is appropriate here.

If there are people who still object to the usage of the term unsafe, I'd suggest they raise their objection in this thread personally, and preferably that they also address the distinction I've made regarding memory safety.

I've made this argument several times now, but technical counter-arguments have not been forthcoming - and it seems silly to block the decision on an objection that's not backed by a technical argument, or which may even already have been retracted because of the "memory safety" distinction.

@ChALkeR
Member

ChALkeR commented Jan 18, 2016

@jasnell

I don't feel strongly about the need for the word unsafe in the method name itself: IMO alloc/ualloc is much better than zalloc/alloc, but alloc/unsafeAlloc or alloc/uninitializedAlloc is slightly better than alloc/ualloc.

That is valid if and only if the deprecation message (both the soft-deprecation docs one and the future hard-deprecation code one) of Buffer(number) links only to Buffer.alloc() and not to Buffer.ualloc() (so that everyone not reading the docs carefully will choose the safe method), and Buffer.alloc documentation has a reference to Buffer.ualloc, marking it as not being zero-filled and potentially unsafe (i.e. «double check that you are sure what you are doing and if you need this» type of warning).

Let's wait for what the opponents of the unsafe word in the method name will say.

@glenjamin

I really like alloc/uninitializedAlloc, it ticks all my boxes!

  • visually distinct from the safe version
  • safer one looks nicer
  • does what it says on the tin

And for bonus points, doesn't include unsafe.

👍

@jasnell
Member Author

jasnell commented Jan 18, 2016

Yes, but uninitializedAlloc has so many characters to type... lol... I'd be much happier with unsafeAlloc :-)

@trevnorris
Contributor

I take offence at your insistence that usage of uninitialized memory is always unsafe. Take the following:

var b = Buffer.alloc(n).fill('abc');

or maybe this:

var b = Buffer.alloc(n);
fs.read(fd, b, 0, b.length, 0, cb);

or possibly I'm just not being lazy and actually keep track of what I'm doing. The argument about having a moment of potentially "unsafe" memory is pointless. There's a potential for anything to happen if a developer isn't careful. Allowing unchecked variables from an outside source to be processed is a pretty big mess-up regardless of what the results are. Don't tell me what's unsafe. Just tell me what's technically happening.
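The two patterns above, sketched with the names that later shipped in Node.js (Buffer.allocUnsafe() for the uninitialized allocation; the helper names are hypothetical). One caveat worth making explicit: a read may return fewer bytes than requested, so only the bytesRead prefix is actually initialized and only that slice should be exposed.

```javascript
const fs = require('fs');

// Pattern 1: allocate without zero-filling, then overwrite every byte.
function patterned(n) {
  return Buffer.allocUnsafe(n).fill('abc');
}

// Pattern 2: read a file into uninitialized memory. readSync can return
// fewer bytes than requested (e.g. a short file), and the tail past
// bytesRead still holds old memory, so return only the filled prefix.
function readHead(file, n) {
  const buf = Buffer.allocUnsafe(n);
  const fd = fs.openSync(file, 'r');
  try {
    const bytesRead = fs.readSync(fd, buf, 0, n, 0);
    return buf.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}
```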

That said, alloc() and allocRaw() seems appropriate enough. While allocUninitialized() is more technically correct I'd smack someone if I had to be typing that in constantly.

@trevnorris
Contributor

@jasnell There was a mess-up in how these EPS's are supposed to be processed. The document is supposed to start with XXX and it's numbered based on what order it lands. I'm going to actually land Fedor's first one today and get this kicked off.

@wbl

wbl commented Jan 18, 2016

What the programmer did was assume that new Buffer(object) always serializes a representation of object. That doesn't happen when object is a number. Pointless pontificating about "server-side can unleash evil" and the as-yet unsuccessful idea of sufficiently smart programmers doesn't change the underlying problem, which can be fixed by a simple change that requires no action by any programmer.
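A sketch of the failure mode described above. legacyToBuffer is a hypothetical stand-in that mirrors the old polymorphic constructor on a current Node.js (using Buffer.allocUnsafe() for the number case), and the JSON body is an invented example input.

```javascript
// Hypothetical stand-in mirroring the old new Buffer(value) overloads:
// strings and arrays are serialized, but a number only sets the size.
function legacyToBuffer(value) {
  return typeof value === 'number'
    ? Buffer.allocUnsafe(value) // uninitialized memory, like new Buffer(n)
    : Buffer.from(value);
}

// A handler expecting {"text": "..."} receives {"text": 100} instead.
const body = JSON.parse('{"text": 100}');

const leaked = legacyToBuffer(body.text);
console.log(leaked.length); // 100: raw memory, not the bytes of "100"

// The factory API refuses numbers instead of guessing what was meant.
let rejected = false;
try {
  Buffer.from(body.text);
} catch (err) {
  rejected = err instanceof TypeError;
}
console.log(rejected); // true
```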

The current approach in this branch doesn't fix this. Instead it creates new constructors, which existing code will need to be rewritten to use. In the meantime existing code remains vulnerable. The funny overload behavior will remain.

@trevnorris
Contributor

@jasnell Also before this is merged the API documentation (not full, just technical overview) should be placed directly in this EPS.

@feross

feross commented Jan 19, 2016

I just found a few more modules affected by this issue:

Not a huge deal, since I think they're pretty hard to exploit. But wanted to share the additional data points.

@joepie91

@jasnell I've thought about the backporting strategy a bit more, overnight. I can see the following options:

  1. We backport automatic zero-filling, back all the way to 0.10. This would likely be a justifiable change (given your earlier explanation), and would provide immediate protection for everybody using a supported version, as long as they update.

    The upside is that it does not introduce additional methods in older versions. The downside is that it fails deadly when people use an outdated version, as it will silently fail to zero-fill while userland module developers believe that it'll zero-fill correctly.

  2. We backport alloc, allocUnsafe and from, back all the way to 0.10. This would add API surface - however, per @ChALkeR's data, this should not cause any conflicts (and further, the recommendation to not modify prototypes you don't own has been around for a very long time, for precisely this kind of reason).

    This fails safe, in that if people use an outdated version of Node, it will simply throw a hard error when these methods are used, instead of quietly doing the wrong thing.

  3. We don't backport either of these, or we only do it for some of the newer versions. As explained before, this will likely cause adoption problems. Many developers in the Node.js ecosystem (unfortunately) still have an aversion towards third-party dependencies and 'bloat', so a userland implementation or polyfill would likely not work very well here - rather than defaulting to the safe thing, you'll have to convince every single individual developer to care about it. This seems like the least workable solution to me.

All of the above options assume that an opt-in flag exists (and is backported) to forcibly zero-fill all buffers, application-wide. I think we're all in agreement about that part.
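For reference, later Node.js releases did ship an opt-in flag of this kind, --zero-fill-buffers. The sketch below spawns a child process with that flag and checks that a Buffer.allocUnsafe() result comes back zeroed; without the flag the contents are unspecified, so no expectation is stated for that case. The helper name is hypothetical.

```javascript
const { spawnSync } = require('child_process');

// Ask a fresh Node.js process (started with the given flags) whether a
// 4096-byte Buffer.allocUnsafe() result contains only zero bytes.
function allocUnsafeIsZeroed(flags) {
  const script =
    'process.stdout.write(String(Buffer.allocUnsafe(4096).every((b) => b === 0)))';
  const child = spawnSync(process.execPath, [...flags, '-e', script], {
    encoding: 'utf8',
  });
  return child.stdout === 'true';
}

console.log(allocUnsafeIsZeroed(['--zero-fill-buffers'])); // true
```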

@ChALkeR You say that Buffer.from() already exists - but this is not documented, at least not in the Node.js documentation. I think it would be safe to override it, for that reason?

@ChALkeR
Member

ChALkeR commented Jan 19, 2016

@jasnell

Perhaps the more difficult part to measure is how extensive of an impact changing the default behavior of new Buffer(number) would have.

About 8% of npm modules use Buffer() directly. About 2% of npm modules use Buffer(constant number) (e.g. Buffer(10)) directly. It would be harder to count how many people actually zero-fill manually allocated buffers (because that's often done later in the code, sometimes manually using a loop rather than .fill(0)), but from what I have seen, a considerable number of packages do that.

I don't think that we can numerically estimate the potential damage introduced by people removing the zero-fill.

@trevnorris
Contributor

@jasnell Note on why Buffer.from() can't be used: nodejs/node#4682 (comment)

@ChALkeR
Member

ChALkeR commented Jan 19, 2016

Buffer.value(), Buffer.fromValue(), Buffer.create(), Buffer.val(), Buffer.make(), Buffer.toBuffer(), Buffer.init().

@jasnell
Member Author

jasnell commented Jan 19, 2016

I've got Buffer.from working correctly in the PR with no issues. All tests passing.

@trevnorris
Contributor

Is Buffer() eventually meant to not be a constructor? Don't think that's really possible.

@lime

lime commented Jan 20, 2016

@joepie91: We backport automatic zero-filling, back all the way to 0.10. [...] The downside is that it fails deadly when people use an outdated version, as it will silently fail to zero-fill while userland module developers believe that it'll zero-fill correctly.

It seems like the most significant argument against a backported change to new Buffer(10) is this: people will start depending on its new, predictable behaviour. I'm inclined to believe that this is likely to happen.

However, the conclusion drawn from this has been to paint any backported change as dangerous. I think this can be revisited.

-- A silly proposal --

Consider this: the only requirement for a less-dangerous new Buffer(10) is that it is initialized with something. At no point was it a requirement that it is initialized with zeroes. It can be anything at all, and as long as it's not your private keys, we should be fine.

This was touched upon in nodejs/node#4660 (comment):

The existing API specifies that the memory is uninitialized, which means it can have any value. Zeros are a valid value.

Zeroes are indeed a valid value. So are ones. So are the bytes in the ASCII string "banana", repeated over and over. Or the current Node.js version string, followed by all zeroes. You name it.

Now don't get me wrong, zeroing out is absolutely the most sensible choice in every other way. It's backwards compatible, in the sense that all-zeroes was always a valid outcome with uninitialized memory. It's much more performant than repeating an ASCII string, or some other silly initialization. The only drawback is that it creates a new pitfall, making people wrongly assume that zeroing out happens in all versions.

If (and only if) that pitfall is determined to be a deal breaker, I propose that we initialize new Buffer(10) with something other than zeroes. This affects only the constructor; .alloc() and .allocUnsafe() do not change.

-- Why that's undesirable --

  1. Performance: at multiple occasions, there have been people proposing that nothing at all should be backported because of performance concerns. Seeing as zeroing with calloc is very likely the most performant initialization we can do, any other initialization would be difficult to justify.
  2. Determinism: just as people would start depending on the all-zeroes implementation, there will be a non-zero number of developers who (ab)use any deterministic initialization. "Need to write 'banana'? Just do new Buffer(6).toString()." This is largely a non-issue though, as long as it's different from .alloc().

-- The tradeoff --

Is the depending-on-all-zeroes pitfall so great that we can sacrifice some more performance in initialization?

@ChALkeR
Member

ChALkeR commented Jan 20, 2016

@trevnorris

Is Buffer() eventually meant to not be a constructor? Don't think that's really possible.

But it could throw on all values. Eventually.

@ChALkeR
Member

ChALkeR commented Jan 20, 2016

@lime

I propose that we initialize new Buffer(10) with something other than zeroes.

That makes no actual difference with being zero-filled.

Is the depending-on-all-zeroes pitfall so great that we can sacrifice some more performance in initialization?

I do not view this (zero-fill-all-the-buffers) approach to be a performance vs security tradeoff. I view it as a bad ecosystem security vs even worse ecosystem security tradeoff.

When I mention performance concerns, I mean the concerns that some people would have when updating to a new Node.js version (hence delaying the update and staying on an unpatched version), and that other people would have when deciding whether or not to keep their own initialization of allocated Buffers (e.g. .fill(0), .fill(42), for loops, or whatever; I have seen various kinds of those) on top of the default one in the updated Node.js. Combine the two and it goes boom.

Also, the proponents of the just-zero-fill-all-the-buffers-and-don't-touch-the-API approach are forgetting that it alone would not fix the issue completely, even in the long term (taking only patched versions of Node.js into account).

@jasnell
Member Author

jasnell commented Jan 20, 2016

The proposal PR has been updated to use the names alloc() and allocUnsafe().

@jasnell
Member Author

jasnell commented Jan 20, 2016

Thinking further about the issue of the default behavior of Buffer(number) moving forward: while the deprecation of the constructor would be doc-only (i.e. "soft deprecation"), one concrete step we can take to begin educating users to move away from the constructor is to print a one-time warning to stderr, similar to how we print the "possible memory leak" warning when more than 10 listeners are added to an EventEmitter event. This would at least let us notify users that something is potentially amiss and should be looked at.
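A rough sketch of the "warn once" mechanism follows. It assumes a hypothetical `deprecatedBufferFromNumber()` wrapper in place of the real constructor and an illustrative message text. A module-level counter makes the notice appear at most once per process, in the same spirit as the EventEmitter leak warning.

```javascript
'use strict';

// Number of times the deprecation notice has been printed in this process.
let bufferWarningsPrinted = 0;

// Hypothetical stand-in for the Buffer(number) code path: it keeps the old
// (uninitialized) behaviour and only adds a one-time notice on stderr.
function deprecatedBufferFromNumber(size) {
  if (bufferWarningsPrinted === 0) {
    bufferWarningsPrinted += 1;
    console.error(
      'Buffer(number) is deprecated: use Buffer.alloc() for zero-filled ' +
      'memory, or Buffer.allocUnsafe() if every byte will be overwritten.'
    );
  }
  return Buffer.allocUnsafe(size);
}

deprecatedBufferFromNumber(16); // prints the notice
deprecatedBufferFromNumber(16); // silent from now on
```

Later Node.js releases route notices like this through `process.emitWarning()`. Users can silence deprecation warnings sent that way with `--no-deprecation`, which partly answers the stderr-noise concern.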

@joepie91

@lime: The issue is not so much that people would expect a zero-filled Buffer, but that they would expect a safely filled Buffer, i.e. one not containing stuff that was previously in memory. Unfortunately, filling the Buffer with non-zero (or even random) data does not resolve that problem, and the same consequences remain applicable.

@jasnell, @trevnorris: How likely is it that .from will break with V8 updates (i.e. tests not passing and no real way to fix it)? This sounds a bit hacky, although I don't know enough about V8 internals to say that with any certainty.
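At the JavaScript level, defining `Buffer.from` is plain static-method shadowing. The sketch below uses a hypothetical `FakeBuffer` wired up like the snippet at the top of this thread. Without an own `from`, lookup falls through to the inherited TypedArray `from`. Assigning an own `from` shadows it and leaves `Uint8Array.from` untouched. This only shows the language mechanics. It says nothing about how V8 internals might change.

```javascript
'use strict';

// Hypothetical stand-in for the 2016-era Buffer function, wired up the
// same way as the snippet quoted at the top of this thread.
function FakeBuffer() {}
Object.setPrototypeOf(FakeBuffer.prototype, Uint8Array.prototype);
Object.setPrototypeOf(FakeBuffer, Uint8Array);

// With no own `from`, lookup walks the constructor chain and finds the
// inherited TypedArray `from`.
const inheritedFrom = FakeBuffer.from === Uint8Array.from;
const hadOwnFrom = Object.prototype.hasOwnProperty.call(FakeBuffer, 'from');

// An own static `from` shadows the inherited one for FakeBuffer only.
FakeBuffer.from = function from(arrayLike) {
  // Hypothetical implementation: copy the values, then re-tag the result.
  const bytes = Uint8Array.from(arrayLike);
  Object.setPrototypeOf(bytes, FakeBuffer.prototype);
  return bytes;
};

console.log(inheritedFrom, hadOwnFrom);           // true false
console.log(FakeBuffer.from === Uint8Array.from); // false
```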

Aside from that:

  • Buffer.fromValue, while a little longer, still sounds like a good alternative name to me, if it were needed.
  • Buffer.value and Buffer.val might create confusion for those who have come to expect a getter/setter function from something named that (e.g. jQuery users).
  • Buffer.toBuffer sounds confusing to me (are we creating a Buffer from a Buffer? Huh?)
  • Buffer.make, Buffer.create and Buffer.init don't sound bad to me, but their purpose may not be obvious from reading the code.

While the deprecation of the constructor would be doc-only (i.e. "soft deprecation"), one concrete step we can take to begin educating users to move away from the constructor is to print a one-time warning to stderr, similar to how we print the "possible memory leak" warning when more than 10 listeners are added to an EventEmitter event. This would at least let us notify users that something is potentially amiss and should be looked at.

While I like this idea, we should expect some backlash from some users who don't like warnings to appear in their stderr. I've seen some people complain about this for deprecation warnings in other projects in the past. I personally don't feel that it's a major problem, though.

@lime

lime commented Jan 20, 2016

@joepie91 & @ChALkeR: Since my proposal is slightly tongue-in-cheek, I won't make any big attempts to defend it. However, I'd like to try clarifying my intent just a bit.

The way I see it, the documented behaviour could go something like this:

  • alloc(): Allocates a zero-initialized buffer.
  • allocUnsafe(): Allocates an uninitialized buffer.
  • new Buffer(number): DEPRECATED. Allocates a buffer, making no guarantees about memory safety.
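The first two documented behaviours map onto `Buffer.alloc()` and `Buffer.allocUnsafe()`, the names that exist in current Node.js. A short usage sketch follows. The deprecated `new Buffer(number)` form is left out on purpose. The key rule is that an `allocUnsafe()` buffer must be fully overwritten before anyone reads it.

```javascript
'use strict';

// Zero-initialized: safe to read or hand out before writing to it.
const safe = Buffer.alloc(8);

// Uninitialized: may hold whatever was previously in that memory, so every
// byte must be overwritten before the buffer is read or sent anywhere.
const fast = Buffer.allocUnsafe(8);
fast.fill(0); // or write all 8 bytes explicitly

// Explicit content: no size-vs-value ambiguity, unlike Buffer(arg).
const text = Buffer.from('banana');

console.log(safe);            // <Buffer 00 00 00 00 00 00 00 00>
console.log(text.toString()); // banana
```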

Now, the problem at hand is that documentation isn't enough. Instead of reading / remembering this documented behaviour, people will take a look at the observable behaviour of new Buffer(number) on their version, and assume it applies for all versions.

If the observable behaviour of new Buffer(number) is different from alloc() in whatever way, it's ever so slightly harder to make dangerous assumptions based on that behaviour.

@lime

lime commented Jan 20, 2016

The suggested deprecation warning printed to stderr would do a lot to alleviate these concerns. It's much harder to make dangerous assumptions from observable behaviour, when that behaviour includes a written warning. :)

@ChALkeR
Member

ChALkeR commented Jan 20, 2016

@lime

new Buffer(number): DEPRECATED. Allocates a buffer, making no guarantees about memory safety.

That would work only in an ideal world where everyone reads docs and follows them. But, as I already said above, it would be good to zero-fill Buffer(number) once it is hard-deprecated.

@ChALkeR
Member

ChALkeR commented Jan 20, 2016

The suggested deprecation warning printed to stderr would do a lot to alleviate these concerns. It's much harder to make dangerous assumptions from observable behaviour, when that behaviour includes a written warning against such assumptions. :)

Given how many modules are currently using Buffer(), and how it would not be possible to simply switch to the new API without losing compatibility with older Node.js versions (including, for example, 4.2.x and 5.4.1) or adding a feature-detection hack, I doubt that the warning will do anything good in the near future; it will likely only annoy people.
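The "feature-detection hack" could look roughly like the sketch below. The helper names `allocZeroed()` and `bufferFromString()` are hypothetical. `Buffer.from` needs a stricter check than `Buffer.alloc`. On versions where Buffer inherits from Uint8Array but has no own `from` (see the snippet at the top of this thread), `Buffer.from` resolves to `Uint8Array.from`. That function treats its second argument as a map function, so it throws when given an encoding string.

```javascript
'use strict';

// Hypothetical helper: zero-filled allocation on old and new Node.js alike.
function allocZeroed(size) {
  if (typeof Buffer.alloc === 'function') {
    return Buffer.alloc(size);
  }
  const buf = new Buffer(size); // old Node.js only: uninitialized memory
  buf.fill(0);                  // so zero it explicitly
  return buf;
}

// Hypothetical helper: string-to-Buffer conversion. An inherited
// Uint8Array.from must not count as "Buffer.from is available".
function bufferFromString(str, encoding) {
  if (typeof Buffer.from === 'function' && Buffer.from !== Uint8Array.from) {
    return Buffer.from(str, encoding);
  }
  return new Buffer(str, encoding); // old Node.js fallback
}

console.log(allocZeroed(4));                           // <Buffer 00 00 00 00>
console.log(bufferFromString('6869', 'hex').toString()); // hi
```

Every module that wants to support both old and new releases ends up carrying something like this. That is part of the migration cost described in the comment.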

@lime

lime commented Jan 20, 2016

That would work only in an ideal world where everyone reads docs and follows them.

I believe I specifically said, in the comment you quoted, that documentation isn't enough. That is why I suggest that new Buffer(number) should do something to differentiate its observable behaviour from alloc().

That something can be a different kind of initialization. FWIW, I don't think this would be desirable.

That something can also be a deprecation warning, as suggested by @jasnell. Personally, I think this sounds like the best approach.

@trevnorris
Contributor

@ChALkeR Buffer at minimum must be the Uint8Array constructor. ES6 inheritance requires this. So throwing on all values also isn't an option.

@rvagg
Copy link
Member

rvagg commented Jan 20, 2016

For completeness we need to talk about the native node::Buffer API and whether any changes are in scope or not and be explicit why not if they are not on the table. We need to be careful to set expectations properly if they are going to operate differently.

@jasnell
Member Author

jasnell commented Jan 20, 2016

@rvagg ... Yep, I was going to turn my attention to the native side next, once it was clear we had things fairly well settled on the JS side.

@jasnell
Member Author

jasnell commented Jan 25, 2016

Closed this because I realized I had screwed the pull request up. Opening a new corrected PR. Discussion can continue here, of course.

@jasnell
Member Author

jasnell commented Jan 25, 2016

Refs: #7
