Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string_decoder: make write after end to reset #16594

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions doc/api/string_decoder.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,16 @@ substitution characters appropriate for the character encoding.
If the `buffer` argument is provided, one final call to `stringDecoder.write()`
is performed before returning the remaining input.

### stringDecoder.reset([encoding])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is an implementation detail users don't have to know about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is, there is no way for the users to reset the string decoder, right? They have to create a new instance if needed. This will enable reusability.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new way to reset the state each time was not requested as far as I read the issue. So I also prefer not to expose the reset function. If I am correct the following should work.

const { StringDecoder } = require('string_decoder');
const decoder = new StringDecoder('utf8');

decoder.write(Buffer.from([0xE2, 0x82])); // => ''
decoder.end(); // => '�'
decoder.write(Buffer.of(0x61)); // => 'a'

<!-- YAML
added: REPLACEME
-->

* `encoding` {string} The character encoding the `StringDecoder` will use.
Defaults to the current `encoding` of the decoder.

Flushes all the internal data and the decoder will behave like a new object.

### stringDecoder.write(buffer)
<!-- YAML
added: v0.1.99
Expand All @@ -82,3 +92,4 @@ Returns a decoded string, ensuring that any incomplete multibyte characters at
the end of the `Buffer` are omitted from the returned string and stored in an
internal buffer for the next call to `stringDecoder.write()` or
`stringDecoder.end()`.

65 changes: 42 additions & 23 deletions lib/string_decoder.js
Original file line number Diff line number Diff line change
Expand Up @@ -41,36 +41,43 @@ function normalizeEncoding(enc) {
// characters.
exports.StringDecoder = StringDecoder;
function StringDecoder(encoding) {
if (encoding === this.encoding && encoding !== undefined) {
this.lastNeed = 0;
this.lastTotal = 0;
this.lastChar = Buffer.allocUnsafe(this._nb || 0);
return;
}

this.encoding = normalizeEncoding(encoding);
var nb;
switch (this.encoding) {
case 'utf16le':
this.text = utf16Text;
this.end = utf16End;
nb = 4;
this._nb = 4;
break;
case 'utf8':
this.fillLast = utf8FillLast;
nb = 4;
this._nb = 4;
break;
case 'base64':
this.text = base64Text;
this.end = base64End;
nb = 3;
this._nb = 3;
break;
default:
this.write = simpleWrite;
this.end = simpleEnd;
this._nb = 0;
return;
}
this.lastNeed = 0;
this.lastTotal = 0;
this.lastChar = Buffer.allocUnsafe(nb);
this.lastChar = Buffer.allocUnsafe(this._nb);
}

StringDecoder.prototype.reset = StringDecoder;

StringDecoder.prototype.write = function(buf) {
if (buf.length === 0)
return '';

var r;
var i;
if (this.lastNeed) {
Expand All @@ -87,7 +94,7 @@ StringDecoder.prototype.write = function(buf) {
return r || '';
};

StringDecoder.prototype.end = utf8End;
StringDecoder.prototype.end = end;

// Returns only complete characters in a Buffer
StringDecoder.prototype.text = utf8Text;
Expand Down Expand Up @@ -208,9 +215,9 @@ function utf8Text(buf, i) {

// For UTF-8, a replacement character is added when ending on a partial
// character.
function utf8End(buf) {
const r = (buf && buf.length ? this.write(buf) : '');
if (this.lastNeed)
function utf8End(ctx, buf) {
const r = (buf && buf.length ? ctx.write(buf) : '');
if (ctx.lastNeed)
return r + '\ufffd';
return r;
}
Expand Down Expand Up @@ -242,11 +249,11 @@ function utf16Text(buf, i) {

// For UTF-16LE we do not explicitly append special replacement characters if we
// end on a partial character, we simply let v8 handle that.
function utf16End(buf) {
const r = (buf && buf.length ? this.write(buf) : '');
if (this.lastNeed) {
const end = this.lastTotal - this.lastNeed;
return r + this.lastChar.toString('utf16le', 0, end);
function utf16End(ctx, buf) {
const r = (buf && buf.length ? ctx.write(buf) : '');
if (ctx.lastNeed) {
const end = ctx.lastTotal - ctx.lastNeed;
return r + ctx.lastChar.toString('utf16le', 0, end);
}
return r;
}
Expand All @@ -267,10 +274,10 @@ function base64Text(buf, i) {
}


function base64End(buf) {
const r = (buf && buf.length ? this.write(buf) : '');
if (this.lastNeed)
return r + this.lastChar.toString('base64', 0, 3 - this.lastNeed);
function base64End(ctx, buf) {
const r = (buf && buf.length ? ctx.write(buf) : '');
if (ctx.lastNeed)
return r + ctx.lastChar.toString('base64', 0, 3 - ctx.lastNeed);
return r;
}

Expand All @@ -279,6 +286,18 @@ function simpleWrite(buf) {
return buf.toString(this.encoding);
}

function simpleEnd(buf) {
return (buf && buf.length ? this.write(buf) : '');
function simpleEnd(ctx, buf) {
return (buf && buf.length ? ctx.write(buf) : '');
}

const endMappings = new Map([
['utf16le', utf16End],
['utf8', utf8End],
['base64', base64End]
]);

function end(buf) {
const result = (endMappings.get(this.encoding) || simpleEnd)(this, buf);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks "elegant" but using a switch case for the encoding and calling the function directly is probably faster.

this.reset(this.encoding);
return result;
}
10 changes: 10 additions & 0 deletions test/parallel/test-string-decoder-end.js
Original file line number Diff line number Diff line change
Expand Up @@ -66,3 +66,13 @@ function testBuf(encoding, buf) {
assert.strictEqual(res1, res3, 'one byte at a time should match toString');
assert.strictEqual(res2, res3, 'all bytes at once should match toString');
}

{
// test to check if the write after end doesn't accumulate the data
const decoder = new SD('utf8');
const euroPart1 = Buffer.from([0xE2]);
const euroPart2 = Buffer.from([0x82, 0xAC]);
decoder.end(euroPart1);
const result = decoder.write(euroPart2);
assert.notStrictEqual(result, '€');
}