Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string_decoder: make write after end to reset #16594

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions doc/api/string_decoder.md
Original file line number Diff line number Diff line change
Expand Up @@ -82,3 +82,13 @@ Returns a decoded string, ensuring that any incomplete multibyte characters at
the end of the `Buffer` are omitted from the returned string and stored in an
internal buffer for the next call to `stringDecoder.write()` or
`stringDecoder.end()`.

### stringDecoder.reset([encoding])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit: it seems this should go before the stringDecoder.write(), ABC-wise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

<!-- YAML
added: REPLACEME
-->

* `encoding` {string} The character encoding the `StringDecoder` will use.
Defaults to the current `encoding` of the decoder.

Flushes all the internal data and the decoder will behave like a new object.
27 changes: 22 additions & 5 deletions lib/string_decoder.js
Original file line number Diff line number Diff line change
Expand Up @@ -41,36 +41,52 @@ function normalizeEncoding(enc) {
// characters.
exports.StringDecoder = StringDecoder;
function StringDecoder(encoding) {
if (encoding === this.encoding && encoding !== undefined) {
this.lastNeed = 0;
this.lastTotal = 0;
this.lastChar = Buffer.allocUnsafe(this._nb || 0);
this._closed = false;
return;
}

this.encoding = normalizeEncoding(encoding);
var nb;
switch (this.encoding) {
case 'utf16le':
this.text = utf16Text;
this.end = utf16End;
nb = 4;
this._nb = 4;
break;
case 'utf8':
this.fillLast = utf8FillLast;
nb = 4;
this._nb = 4;
break;
case 'base64':
this.text = base64Text;
this.end = base64End;
nb = 3;
this._nb = 3;
break;
default:
this.write = simpleWrite;
this.end = simpleEnd;
this._nb = 0;
this._closed = false;
return;
}
this.lastNeed = 0;
this.lastTotal = 0;
this.lastChar = Buffer.allocUnsafe(nb);
this.lastChar = Buffer.allocUnsafe(this._nb);
this._closed = false;
}

StringDecoder.prototype.reset = StringDecoder;

StringDecoder.prototype.write = function(buf) {
if (this._closed === true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=== true is unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we explicitly compare strictly against the value, the performance would be better right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not with TurboFan, no. I asked about this in another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, I think this is the previous discussion in question: #16397 (comment)

this.reset();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this logic be in StringDecoder.prototype.end instead? Make the per-encoding this.end switching in the constructor actually set to an internal symbol-named property, and use a shared StringDecoder.prototype.end that calls that internal encoding-specific function and clean up after itself. Would probably help with write performance too if StringDecoders of different encodings (e.g. UTF-8 which does not have an own property end and UTF-16LE that does) are used simultaneously.


if (buf.length === 0)
return '';

var r;
var i;
if (this.lastNeed) {
Expand Down Expand Up @@ -210,6 +226,7 @@ function utf8Text(buf, i) {
// character.
function utf8End(buf) {
const r = (buf && buf.length ? this.write(buf) : '');
this._closed = true;
if (this.lastNeed)
return r + '\ufffd';
return r;
Expand Down
30 changes: 30 additions & 0 deletions test/parallel/test-string-decoder-end.js
Original file line number Diff line number Diff line change
Expand Up @@ -66,3 +66,33 @@ function testBuf(encoding, buf) {
assert.strictEqual(res1, res3, 'one byte at a time should match toString');
assert.strictEqual(res2, res3, 'all bytes at once should match toString');
}

{
// test to check if the write after end doesn't accumulate the data
const decoder = new SD('utf8');
const euroPart1 = Buffer.from([0xE2]);
const euroPart2 = Buffer.from([0x82, 0xAC]);
decoder.end(euroPart1);
const result = decoder.write(euroPart2);
assert.notStrictEqual(result, '€');
}

{
// test to check if write after end reopens the decoder
const decoder = new SD();
assert.strictEqual(decoder._closed, false);
decoder.end();
assert.strictEqual(decoder._closed, true);
decoder.write(Buffer.from([0xE2]));
assert.strictEqual(decoder._closed, false);
}

{
// test to check if reset after end reopens the decoder
const decoder = new SD();
assert.strictEqual(decoder._closed, false);
decoder.end();
assert.strictEqual(decoder._closed, true);
decoder.reset();
assert.strictEqual(decoder._closed, false);
}