fixup! doc: improve Buffer documentation
addaleax committed Mar 7, 2020
1 parent 5eac91a commit 9c85a8a
Showing 1 changed file with 77 additions and 54 deletions.
131 changes: 77 additions & 54 deletions doc/api/buffer.md
@@ -6,7 +6,8 @@
In Node.js, `Buffer` objects are used to represent binary data in the form
of a sequence of bytes. Many Node.js APIs, for example streams and file system
operations, support `Buffer`s.
operations, support `Buffer`s, as interactions with the operating system or
other processes generally happen in terms of binary data.

The `Buffer` class is a subclass of the [`Uint8Array`][] class that is built
into the JavaScript language. A number of additional methods are supported
@@ -25,7 +26,8 @@ would need to ever use `require('buffer').Buffer`.
// Creates a zero-filled Buffer of length 10.
const buf1 = Buffer.alloc(10);

// Creates a Buffer of length 10, filled with bytes which all have the value 1.
// Creates a Buffer of length 10,
// filled with bytes which all have the value `1`.
const buf2 = Buffer.alloc(10, 1);

// Creates an uninitialized buffer of length 10.
@@ -35,14 +37,20 @@ const buf2 = Buffer.alloc(10, 1);
// contents.
const buf3 = Buffer.allocUnsafe(10);

// Creates a Buffer containing the bytes [0x1, 0x2, 0x3].
// Creates a Buffer containing the bytes [1, 2, 3].
const buf4 = Buffer.from([1, 2, 3]);

// Creates a Buffer containing the UTF-8 bytes [0x74, 0xc3, 0xa9, 0x73, 0x74].
const buf5 = Buffer.from('tést');
// Creates a Buffer containing the bytes [1, 1, 1, 1] – the entries
// are all truncated using `(value & 255)` to fit into the range 0–255.
const buf5 = Buffer.from([257, 257.5, -255, '1']);

// Creates a Buffer containing the UTF-8-encoded bytes for the string 'tést':
// [0x74, 0xc3, 0xa9, 0x73, 0x74] (in hexadecimal notation)
// [116, 195, 169, 115, 116] (in decimal notation)
const buf6 = Buffer.from('tést');

// Creates a Buffer containing the Latin-1 bytes [0x74, 0xe9, 0x73, 0x74].
const buf6 = Buffer.from('tést', 'latin1');
const buf7 = Buffer.from('tést', 'latin1');
```

## Buffers and Character Encodings
@@ -77,26 +85,26 @@ console.log(Buffer.from('fhqwhgads', 'utf16le'));
The character encodings currently supported by Node.js are the following:

* `'utf8'`: Multi-byte encoded Unicode characters. Many web pages and other
document formats use UTF-8. This is the default character encoding.
document formats use [UTF-8][]. This is the default character encoding.
When decoding a `Buffer` into a string that does not exclusively contain
valid UTF-8 data, the Unicode replacement character `U+FFFD` � will be used
to represent those errors.

* `'utf16le'`: Multi-byte encoded Unicode characters. Unlike `'utf8'`, each
character in the string will be encoded using either 2 or 4 bytes.
Node.js only supports the little-endian variant of UTF-16.
Node.js only supports the [little-endian][endianness] variant of [UTF-16][].

* `'latin1'`: Latin-1 stands for ISO-8859-1. This character encoding only
* `'latin1'`: Latin-1 stands for [ISO-8859-1][]. This character encoding only
supports the Unicode characters from `U+0000` to `U+00FF`. Each character is
encoded using a single byte. Characters that do not fit into that range are
truncated and will be mapped to characters in that range.
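As a rough illustration of the character encodings listed above, the following sketch encodes the same string in three ways and decodes a byte sequence that is not valid UTF-8. It relies only on the global `Buffer` class; the sample strings and byte values are arbitrary choices.

```js
// 'é' (U+00E9) is a single character, but it needs two bytes in UTF-8.
const utf8 = Buffer.from('tést', 'utf8');
console.log(utf8.length);
// Prints: 5

// In UTF-16LE, each of these characters takes two bytes...
const utf16 = Buffer.from('tést', 'utf16le');
console.log(utf16.length);
// Prints: 8
// ...while a character outside the Basic Multilingual Plane takes four.
console.log(Buffer.from('\u{1F600}', 'utf16le').length);
// Prints: 4

// In Latin-1, every character from U+0000 to U+00FF takes a single byte.
const latin1 = Buffer.from('tést', 'latin1');
console.log(latin1.length);
// Prints: 4

// The byte 0xFF never occurs in valid UTF-8, so it is decoded as U+FFFD.
const invalid = Buffer.from([0x74, 0xff, 0x74]);
console.log(invalid.toString('utf8') === 't\ufffdt');
// Prints: true
```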

Node.js also supports the following two binary-to-text encodings. For
binary-to-text encodings, converting a `Buffer` into a string is typically
referred to as encoding, rather than decoding as is in the case of character
encodings like the ones listed above, and vice versa.
referred to as encoding. In the case of character encodings, like the ones
listed above, the naming is reversed.

* `'base64'`: Base64 encoding. When creating a `Buffer` from a string,
* `'base64'`: [Base64][] encoding. When creating a `Buffer` from a string,
this encoding will also correctly accept "URL and Filename Safe Alphabet" as
specified in [RFC 4648, Section 5][].

@@ -106,10 +114,10 @@ encodings like the ones listed above, and vice versa.

The following legacy character encodings are also supported:

* `'ascii'`: For 7-bit ASCII data only. When encoding a string into a `Buffer`,
this is equivalent to using `'latin1'`. When decoding a `Buffer` into a
string, using encoding this will additionally unset the highest bit of each
byte before decoding as `'latin1'`.
* `'ascii'`: For 7-bit [ASCII][] data only. When encoding a string into a
`Buffer`, this is equivalent to using `'latin1'`. When decoding a `Buffer`
into a string, using this encoding will additionally unset the highest bit of
each byte before decoding as `'latin1'`.
Generally, there should be no reason to use this encoding, as `'utf8'`
(or, if the data is known to always be ASCII-only, `'latin1'`) will be a
better choice when encoding or decoding ASCII-only text. It is only provided
@@ -151,8 +159,11 @@ changes:
description: The `Buffer` class now inherits from `Uint8Array`.
-->

`Buffer` instances are also [`Uint8Array`][] instances. However, there are
subtle incompatibilities with [`TypedArray`][].
`Buffer` instances are also [`Uint8Array`][] instances, which is the language’s
built-in class for working with binary data. [`Uint8Array`][] in turn is a
subclass of [`TypedArray`][]. Therefore, all [`TypedArray`][] methods are also
available on `Buffer`s. However, there are subtle incompatibilities between
the `Buffer` API and the [`TypedArray`][] API.

In particular:

@@ -753,8 +764,10 @@ The index operator `[index]` can be used to get and set the octet at position
range is between `0x00` and `0xFF` (hex) or `0` and `255` (decimal).

This operator is inherited from `Uint8Array`, so its behavior on out-of-bounds
access is the same as `Uint8Array`. In other words, getting returns `undefined`
and setting does nothing for out-of-bounds indices.
access is the same as `Uint8Array`. In other words, `buf[index]` returns
`undefined` when `index` is negative or `>= buf.length`, and
`buf[index] = value` does not modify the buffer if `index` is negative or
`>= buf.length`.
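A minimal sketch of the out-of-bounds behavior described above; the buffer contents are arbitrary.

```js
const buf = Buffer.from([1, 2, 3, 4]);

// Reading at an index that is negative or >= buf.length returns `undefined`.
console.log(buf[4]);
// Prints: undefined
console.log(buf[-1]);
// Prints: undefined

// Writing at such an index changes neither the contents nor the length.
buf[4] = 5;
buf[-1] = 5;
console.log(buf);
// Prints: <Buffer 01 02 03 04>
console.log(buf.length);
// Prints: 4
```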

```js
// Copy an ASCII string into a `Buffer` one byte at a time.
@@ -902,8 +915,9 @@ added: v0.1.90
Copies data from a region of `buf` to a region in `target`, even if the `target`
memory region overlaps with `buf`.

[`TypedArray#set()`][] performs a similar operation, and is available for all
TypedArrays, including Node.js `Buffer`s.
[`TypedArray#set()`][] performs the same operation, and is available for all
TypedArrays, including Node.js `Buffer`s, although it takes different
function arguments.
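The following sketch copies a region of a buffer onto an overlapping region of the same buffer, once with `buf.copy()` and once with `TypedArray#set()`. Note the different argument orders: `buf.copy(target, targetStart, sourceStart, sourceEnd)` is called on the source, whereas `target.set(source, offset)` is called on the target and receives the source range as a view created with `subarray()`. The string contents are arbitrary.

```js
// Copy bytes 0 through 3 ('abcd') onto bytes 2 through 5 of the same buffer.
const bufA = Buffer.from('abcdefgh');
const bytesCopied = bufA.copy(bufA, 2, 0, 4);
console.log(bytesCopied);
// Prints: 4
console.log(bufA.toString());
// Prints: ababcdgh

// The same operation with TypedArray#set(), which takes the offset in the
// target as its second argument.
const bufB = Buffer.from('abcdefgh');
bufB.set(bufB.subarray(0, 4), 2);
console.log(bufB.toString());
// Prints: ababcdgh
```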

```js
// Create two `Buffer` instances.
@@ -917,6 +931,8 @@ for (let i = 0; i < 26; i++) {

// Copy `buf1` bytes 16 through 19 into `buf2` starting at byte 8 of `buf2`.
buf1.copy(buf2, 8, 16, 20);
// This is equivalent to:
// buf2.set(buf1.subarray(16, 20), 8);

console.log(buf2.toString('ascii', 0, 25));
// Prints: !!!!!!!!qrst!!!!!!!!!!!!!
@@ -1326,7 +1342,7 @@ added: v12.0.0
* Returns: {bigint}

Reads a signed 64-bit integer from `buf` at the specified `offset` with
the specified endianness (`readBigInt64BE()` reads as big endian,
the specified [endianness][] (`readBigInt64BE()` reads as big endian,
`readBigInt64LE()` reads as little endian).

Integers read from a `Buffer` are interpreted as two's complement signed values.
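To illustrate the two's complement interpretation, this sketch reads the same eight bytes once with `readBigInt64BE()` and once with its unsigned counterpart `readBigUInt64BE()`. The byte values are arbitrary.

```js
// All 64 bits set except the lowest one.
const buf = Buffer.from([0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xfe]);

// As a two's complement signed value, these bits represent -2.
console.log(buf.readBigInt64BE(0));
// Prints: -2n
// As an unsigned value, the same bits represent 2 ** 64 - 2.
console.log(buf.readBigUInt64BE(0));
// Prints: 18446744073709551614n
```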
@@ -1342,7 +1358,7 @@ added: v12.0.0
* Returns: {bigint}

Reads an unsigned 64-bit integer from `buf` at the specified `offset` with
the specified endianness (`readBigUInt64BE()` reads as big endian,
the specified [endianness][] (`readBigUInt64BE()` reads as big endian,
`readBigUInt64LE()` reads as little endian).

```js
@@ -1371,7 +1387,7 @@ changes:
* Returns: {number}

Reads a 64-bit double from `buf` at the specified `offset` with the specified
endianness (`readDoubleBE()` reads as big endian, `readDoubleLE()` reads as
[endianness][] (`readDoubleBE()` reads as big endian, `readDoubleLE()` reads as
little endian).

```js
@@ -1401,7 +1417,7 @@ changes:
* Returns: {number}

Reads a 32-bit float from `buf` at the specified `offset` with the specified
endianness (`readFloatBE()` reads as big endian, `readFloatLE()` reads as
[endianness][] (`readFloatBE()` reads as big endian, `readFloatLE()` reads as
little endian).

```js
@@ -1460,7 +1476,7 @@ changes:
* Returns: {integer}

Reads a signed 16-bit integer from `buf` at the specified `offset` with
the specified endianness (`readInt16BE()` reads as big endian,
the specified [endianness][] (`readInt16BE()` reads as big endian,
`readInt16LE()` reads as little endian).

Integers read from a `Buffer` are interpreted as two's complement signed values.
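The effects of endianness and of the two's complement interpretation can both be seen by reading the same two bytes in different ways. A minimal sketch with arbitrary byte values:

```js
const buf = Buffer.from([0xff, 0xfe]);

// Big endian: the first byte is the most significant one (0xfffe).
console.log(buf.readInt16BE(0));
// Prints: -2
// Little endian: the first byte is the least significant one (0xfeff).
console.log(buf.readInt16LE(0));
// Prints: -257
// The bits 0xfffe again, this time read as an unsigned integer.
console.log(buf.readUInt16BE(0));
// Prints: 65534
```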
@@ -1492,7 +1508,7 @@ changes:
* Returns: {integer}

Reads a signed 32-bit integer from `buf` at the specified `offset` with
the specified endianness (`readInt32BE()` reads as big endian,
the specified [endianness][] (`readInt32BE()` reads as big endian,
`readInt32LE()` reads as little endian).

Integers read from a `Buffer` are interpreted as two's complement signed values.
@@ -1585,7 +1601,7 @@ changes:
* Returns: {integer}

Reads an unsigned 16-bit integer from `buf` at the specified `offset` with
the specified endianness (`readUInt16BE()` reads as big endian, `readUInt16LE()`
the specified [endianness][] (`readUInt16BE()` reads as big endian, `readUInt16LE()`
reads as little endian).

```js
@@ -1619,7 +1635,7 @@ changes:
* Returns: {integer}

Reads an unsigned 32-bit integer from `buf` at the specified `offset` with
the specified endianness (`readUInt32BE()` reads as big endian,
the specified [endianness][] (`readUInt32BE()` reads as big endian,
`readUInt32LE()` reads as little endian).

```js
@@ -2005,9 +2021,9 @@ added: v12.0.0
satisfy: `0 <= offset <= buf.length - 8`. **Default:** `0`.
* Returns: {integer} `offset` plus the number of bytes written.

Writes `value` to `buf` at the specified `offset` with the specified endianness
(`writeBigInt64BE()` writes as big endian, `writeBigInt64LE()` writes as little
endian).
Writes `value` to `buf` at the specified `offset` with the specified
[endianness][] (`writeBigInt64BE()` writes as big endian, `writeBigInt64LE()`
writes as little endian).

`value` is interpreted and written as a two's complement signed integer.
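For example, writing `-1n` stores eight bytes with every bit set, and the return value points just past the written bytes, so it can serve as the `offset` of a following write. A short sketch:

```js
const buf = Buffer.alloc(8);

// -1n in 64-bit two's complement has every bit set.
const next = buf.writeBigInt64BE(-1n, 0);
console.log(next);
// Prints: 8
console.log(buf);
// Prints: <Buffer ff ff ff ff ff ff ff ff>

// In little endian order, the least significant byte comes first.
buf.writeBigInt64LE(1n, 0);
console.log(buf);
// Prints: <Buffer 01 00 00 00 00 00 00 00>
```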

@@ -2031,7 +2047,7 @@ added: v12.0.0
satisfy: `0 <= offset <= buf.length - 8`. **Default:** `0`.
* Returns: {integer} `offset` plus the number of bytes written.

Writes `value` to `buf` at the specified `offset` with specified endianness
Writes `value` to `buf` at the specified `offset` with the specified [endianness][]
(`writeBigUInt64BE()` writes as big endian, `writeBigUInt64LE()` writes as
little endian).

@@ -2060,10 +2076,10 @@ changes:
satisfy `0 <= offset <= buf.length - 8`. **Default:** `0`.
* Returns: {integer} `offset` plus the number of bytes written.

Writes `value` to `buf` at the specified `offset` with the specified endianness
(`writeDoubleBE()` writes as big endian, `writeDoubleLE()` writes as little
endian). `value` must be a JavaScript number. Behavior is undefined when
`value` is anything other than a JavaScript number.
Writes `value` to `buf` at the specified `offset` with the specified
[endianness][] (`writeDoubleBE()` writes as big endian, `writeDoubleLE()` writes
as little endian). `value` must be a JavaScript number. Behavior is undefined
when `value` is anything other than a JavaScript number.

```js
const buf = Buffer.allocUnsafe(8);
@@ -2095,7 +2111,7 @@ changes:
satisfy `0 <= offset <= buf.length - 4`. **Default:** `0`.
* Returns: {integer} `offset` plus the number of bytes written.

Writes `value` to `buf` at the specified `offset` with specified endianness
Writes `value` to `buf` at the specified `offset` with the specified [endianness][]
(`writeFloatBE()` writes as big endian, `writeFloatLE()` writes as little
endian). `value` must be a JavaScript number. Behavior is undefined when
`value` is anything other than a JavaScript number.
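Because a JavaScript number is a 64-bit double, writing it as a 32-bit float may round it. The sketch below shows the byte order produced by each method and the rounding of `0.1`; `Math.fround()` is used only to compute the expected single-precision value.

```js
const buf = Buffer.alloc(4);

// 1.0 in IEEE 754 single precision is 0x3f800000.
buf.writeFloatBE(1, 0);
console.log(buf);
// Prints: <Buffer 3f 80 00 00>
buf.writeFloatLE(1, 0);
console.log(buf);
// Prints: <Buffer 00 00 80 3f>

// 0.1 has no exact 32-bit representation, so the value read back differs.
buf.writeFloatLE(0.1, 0);
console.log(buf.readFloatLE(0));
// Prints: 0.10000000149011612
console.log(buf.readFloatLE(0) === Math.fround(0.1));
// Prints: true
```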
@@ -2161,9 +2177,9 @@ changes:
satisfy `0 <= offset <= buf.length - 2`. **Default:** `0`.
* Returns: {integer} `offset` plus the number of bytes written.

Writes `value` to `buf` at the specified `offset` with the specified endianness
(`writeInt16BE()` writes as big endian, `writeInt16LE()` writes as little
endian). `value` must be a valid signed 16-bit integer. Behavior is
Writes `value` to `buf` at the specified `offset` with the specified
[endianness][] (`writeInt16BE()` writes as big endian, `writeInt16LE()` writes
as little endian). `value` must be a valid signed 16-bit integer. Behavior is
undefined when `value` is anything other than a signed 16-bit integer.

`value` is interpreted and written as a two's complement signed integer.
@@ -2194,9 +2210,9 @@ changes:
satisfy `0 <= offset <= buf.length - 4`. **Default:** `0`.
* Returns: {integer} `offset` plus the number of bytes written.

Writes `value` to `buf` at the specified `offset` with the specified endianness
(`writeInt32BE()` writes aS big endian, `writeInt32LE()` writes AS little
endian). `value` must be a valid signed 32-bit integer. Behavior is
Writes `value` to `buf` at the specified `offset` with the specified
[endianness][] (`writeInt32BE()` writes as big endian, `writeInt32LE()` writes
as little endian). `value` must be a valid signed 32-bit integer. Behavior is
undefined when `value` is anything other than a signed 32-bit integer.

`value` is interpreted and written as a two's complement signed integer.
@@ -2294,9 +2310,9 @@ changes:
satisfy `0 <= offset <= buf.length - 2`. **Default:** `0`.
* Returns: {integer} `offset` plus the number of bytes written.

Writes `value` to `buf` at the specified `offset` with the specified endianness
(`writeUInt16BE()` writes as big endian, `writeUInt16LE()` writes as little
endian). `value` must be a valid unsigned 16-bit integer. Behavior is
Writes `value` to `buf` at the specified `offset` with the specified
[endianness][] (`writeUInt16BE()` writes as big endian, `writeUInt16LE()` writes
as little endian). `value` must be a valid unsigned 16-bit integer. Behavior is
undefined when `value` is anything other than an unsigned 16-bit integer.

```js
@@ -2331,9 +2347,9 @@ changes:
satisfy `0 <= offset <= buf.length - 4`. **Default:** `0`.
* Returns: {integer} `offset` plus the number of bytes written.

Writes `value` to `buf` at the specified `offset` with the specified endianness
(`writeUInt32BE()` writes as big endian, `writeUInt32LE()` writes as little
endian). `value` must be a valid unsigned 32-bit integer. Behavior is
Writes `value` to `buf` at the specified `offset` with the specified
[endianness][] (`writeUInt32BE()` writes as big endian, `writeUInt32LE()` writes
as little endian). `value` must be a valid unsigned 32-bit integer. Behavior is
undefined when `value` is anything other than an unsigned 32-bit integer.

```js
@@ -2667,11 +2683,12 @@ performed.

For example, if an attacker can cause an application to receive a number where
a string is expected, the application may call `new Buffer(100)`
instead of `new Buffer("100")`, it will allocate a 100 byte buffer instead
instead of `new Buffer("100")`, leading it to allocate a 100 byte buffer instead
of allocating a 3 byte buffer with content `"100"`. This is commonly possible
using JSON API calls. Since JSON distinguishes between numeric and string types,
it allows injection of numbers where a naive application might expect to always
receive a string. Before Node.js 8.0.0, the 100 byte buffer might contain
it allows injection of numbers where a naively written application that does not
validate its input sufficiently might expect to always receive a string.
Before Node.js 8.0.0, the 100 byte buffer might contain
arbitrary pre-existing in-memory data, so it may be used to expose in-memory
secrets to a remote attacker. Since Node.js 8.0.0, exposure of memory cannot
occur because the data is zero-filled. However, other attacks are still
@@ -2784,5 +2801,11 @@ introducing security vulnerabilities into an application.
[`buffer.constants.MAX_STRING_LENGTH`]: #buffer_buffer_constants_max_string_length
[`buffer.kMaxLength`]: #buffer_buffer_kmaxlength
[`util.inspect()`]: util.html#util_util_inspect_object_options
[ASCII]: https://en.wikipedia.org/wiki/ASCII
[Base64]: https://en.wikipedia.org/wiki/Base64
[ISO-8859-1]: https://en.wikipedia.org/wiki/ISO-8859-1
[UTF-8]: https://en.wikipedia.org/wiki/UTF-8
[UTF-16]: https://en.wikipedia.org/wiki/UTF-16
[binary strings]: https://developer.mozilla.org/en-US/docs/Web/API/DOMString/Binary
[endianness]: https://en.wikipedia.org/wiki/Endianness
[iterator]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Iteration_protocols