doc: mark String decoder as legacy #39301
Comments
I think
My bad, I mixed it with
In a quick test, TextDecoder is an order of magnitude slower than StringDecoder at decoding a Uint8Array to a string on Node 16, and I can't find anything that works faster than StringDecoder. Please don't deprecate StringDecoder until the performance is at least equivalent.
@mojavelinux would you mind sharing a performance test case, with test results? We may not go as far as deprecating it and eventually removing it. Have you also considered that TextDecoder takes longer because it solves some problems StringDecoder does not handle? It may handle things StringDecoder does not, like the BOM, for instance. I'm also curious how the browserified version compares to the native TextDecoder; this is the main reason I want to see less use of StringDecoder: to stop people from exporting Node modules into browsers, bloating their bundle size, and to encourage better cross-compatible code. string_decoder depends on Buffer, so depending on https://www.npmjs.com/package/string_decoder in the browser adds a whole bunch of bloat to your bundle, and the browser Buffer is not as fast as Node's Buffer.
const { StringDecoder } = require('string_decoder');
const data = new Uint8Array([239, 187, 191, 49]) // BOM + 1(49)

// fails
JSON.parse(new StringDecoder('utf8').write(data))
JSON.parse(Buffer.from(data.buffer).toString())

// works
JSON.parse(new TextDecoder().decode(data))
new Response(data).json()

I believe fetch decodes data similarly to how TextDecoder does it, more correctly according to a spec...
A pretty basic test can demonstrate the order-of-magnitude difference:

const iterations = 10000000
const stringDecoder = new (require('string_decoder').StringDecoder)()
const textDecoder = new (require('util').TextDecoder)()
const SAMPLE = new Uint8Array([112,97,103,101,45,111,110,101,46,97,100,111,99])
function variationA (arr) {
return stringDecoder.write(arr)
}
function variationB (arr) {
return textDecoder.decode(arr)
}
let result
const variation = process.argv[2] === 'A' ? variationA : variationB
console.time(variation.name)
for (let i = 0; i < iterations; i++) {
result = variation(SAMPLE)
}
console.timeEnd(variation.name)

On my machine, the results are as follows:
Perhaps I don't need that behavior ;) I also don't think BOM processing warrants an order-of-magnitude difference in execution time.
Missing variationA in browser
I don't know what the API in the browser is for StringDecoder.
something like:

// file.js
x = require('string_decoder')
I'm fine with removing the public use of StringDecoder. Perhaps, under the covers in Node.js, TextDecoder can use the StringDecoder code when the encoding is utf-8 and ignoreBOM is true. I just don't want to bear such a tremendous drop in performance, as it will affect the performance of my application (which calls this method hundreds of thousands of times).
All that's doing is testing the Node.js implementation in a browser. I wouldn't expect there to be any difference, since it's the same code (and the same JavaScript engine). I thought you were comparing to a native implementation in the browser.
As I suspected, the performance is just as good in the browser for StringDecoder. In Chrome, I get:
What we see is that the performance of the native TextDecoder in Chrome is better than the TextDecoder in Node.js. But there is still a measurable difference when compared to StringDecoder.
I modified the test a bit and ran it on some webpages with a realistic, larger sample set (const SAMPLE = new TextEncoder().encode(document.body.innerText)); it shows variationA is much slower (in Chrome, after being browserified):

const iterations = 1000
const stringDecoder = new (require('string_decoder').StringDecoder)()
const textDecoder = (new TextDecoder())
const SAMPLE = new TextEncoder().encode(document.body.innerText)
function variationA (arr) {
return stringDecoder.write(arr)
}
function variationB (arr) {
return textDecoder.decode(arr)
}
let result
console.time(variationA.name)
for (let i = 0; i < iterations; i++) {
result = variationA(SAMPLE)
}
console.timeEnd(variationA.name)
console.time(variationB.name)
for (let i = 0; i < iterations; i++) {
result = variationB(SAMPLE)
}
console.timeEnd(variationB.name)

On nodejs.org innerText: variationA: 31.10400390625 ms
Here is my browser bundle: https://pastebin.com/DDMKi3yH (for those who don't want to run browserify and want to test it out for themselves). The tests are at the bottom. The fact that it requires ~2400 lines of code in the browser and runs much slower is exactly the reason why I want to discourage use of it.

EDIT: here is a CDN solution:

const {StringDecoder} = await import('https://jspm.dev/string_decoder')
const iterations = 1000
const stringDecoder = new StringDecoder()
const textDecoder = new TextDecoder()
const SAMPLE = new TextEncoder().encode(document.body.innerText)
function variationA (arr) {
return stringDecoder.write(arr)
}
function variationB (arr) {
return textDecoder.decode(arr)
}
let result
console.time(variationA.name)
for (let i = 0; i < iterations; i++) {
result = variationA(SAMPLE)
}
console.timeEnd(variationA.name)
console.time(variationB.name)
for (let i = 0; i < iterations; i++) {
result = variationB(SAMPLE)
}
console.timeEnd(variationB.name)
I can confirm that TextDecoder does seem to perform better for larger sample sizes. Perhaps there is a problem with the implementation that makes it perform so poorly for small sample sizes. I don't know enough at this point to explain what's going on, but these are certainly useful observations.
I did other runs (of #39301 (comment)) with the code from master instead of a release, and here's what I get: the 60% of "self time" is in line 432 of e2148d7.
Marking an API as legacy is not the same as deprecating it. APIs that are marked legacy are unlikely to ever change or be deprecated. |
Thanks for pointing that out. Sorry for creating any confusion by mixing up the terminology. |
The
This feature is very convenient when printing output byte by byte, for example when implementing the streaming interface of LLM inference.
You can do that with
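One way to do the same chunked decoding without string_decoder is TextDecoder's stream option; a sketch:

```javascript
const decoder = new TextDecoder('utf-8');

let out = '';
for (const b of [0xe2, 0x82, 0xac]) { // '€' fed one byte at a time
  // stream: true tells the decoder more input may follow,
  // so incomplete sequences are buffered across calls.
  out += decoder.decode(new Uint8Array([b]), { stream: true });
}
out += decoder.decode(); // a final call with no argument flushes any buffered bytes
console.log(out); // '€'
```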
📗 API Reference Docs Problem
https://nodejs.org/docs/latest-v16.x/api/string_decoder.html
Description
Like with util.inherits and querystring, could you also mark string_decoder as legacy and say something in terms of:
It would be best for cross-platform coding... Maybe even possibly deprecate it?