stream: Readable batched iteration #34207
Comments
@nodejs/streams @benjamingr |
Another possible implementation with maybe better performance:

```js
stream.read(0);
const buffer = state.length > 0 ? state.buffer.splice(0, batchLen) : null;
if (buffer) yield buffer;
else ...
```

Though then it becomes a question of whether it's an array or just an iterable.
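To make that concrete, a fuller version of the sketch might look like this (hedged: it reaches into `_readableState` for illustration only, and assumes the internal buffer is array-like, as in the snippet above):

```js
// Hypothetical batched iterator built on the idea above: drain whatever
// is buffered in one go instead of awaiting a promise per chunk.
async function* createBatchedIterator(stream, batchLen) {
  const state = stream._readableState;
  while (true) {
    stream.read(0); // trigger a read without consuming anything
    if (state.length > 0) {
      // Hand over up to batchLen buffered chunks as one array.
      yield state.buffer.splice(0, batchLen);
    } else if (state.ended) {
      return;
    } else {
      // Buffer is empty; wait for more data, end, or error.
      await new Promise((resolve, reject) => {
        const done = () => {
          stream.off('readable', done);
          stream.off('end', done);
          stream.off('error', fail);
          resolve();
        };
        const fail = (err) => {
          stream.off('readable', done);
          stream.off('end', done);
          reject(err);
        };
        stream.on('readable', done);
        stream.on('end', done);
        stream.on('error', fail);
      });
    }
  }
}
```
|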
I think this is great. What do you think @jasnell? |
Giving this further thought, if you want there always to be max concurrency we could also go for a pattern like this:

```js
for await (const result of Readable.parallel(stream, async (chunk) => {
  // Process chunk
  return String(chunk).toUpperCase()
}, 128)) {
  // Result chunk in order
}

for await (const result of Readable.unorderedParallel(stream, async (chunk) => {
  // Process chunk
  return String(chunk).toUpperCase()
}, 128)) {
  // Result chunk not in order
}
```

Or something...

```js
for await (const chunks of pipeline(
  Readable.from([1, 2, 3]),
  Readable.parallel(async function* (chunk) {
    yield chunk * 2
  }, 128),
  // [2,4,6]
  Readable.batched(async function* (chunks) {
    yield chunks
  }, 128),
  // [[2,4,6]]
  Readable.parallel(async function* (chunks) {
    yield* chunks
    yield* chunks
  }, 128),
  // [2,4,6,2,4,6]
  Readable.batched(128),
  // [[2,4,6,2,4,6]]
)) {
  for (const chunk of chunks) {
    console.log(chunk)
  }
}
```
|
Anyway, going back to the original proposal, I think it's a good idea to put the stream parameter as the last argument, i.e. |
I would use an options object instead. |
How would this effectively differ from increasing the readable HWM? |
Would differ quite a bit in objectMode.
What I suggest here would basically bring the HWM buffer from the stream directly into the iteration.
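Roughly (a sketch; `Readable.batched` here is the API proposed in this issue, not something that exists, and `source()` stands in for any object-mode data source):

```js
const { Readable } = require('stream');

// Raising the HWM in objectMode buffers more objects internally, but
// async iteration still resolves one promise per object:
const stream = Readable.from(source(), { objectMode: true, highWaterMark: 128 });
for await (const obj of stream) {
  // one await per object, regardless of the HWM
}

// Batched iteration instead yields the buffered objects as one array,
// i.e. one await per batch:
for await (const objs of Readable.batched(128, Readable.from(source()))) {
  // objs is an array of up to 128 objects taken straight from the buffer
}
```
|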
Just want to pitch in my 2 cents: instead of a fixed batch size, how about accepting the size dynamically? The syntax is actually quite simple:

```js
const { asyncReadable } = require('async-readable');

const stream = createReadStream('./sample.gif');
const { read, off } = asyncReadable(stream);

const [ G, I, F, EIGHT ] = await read(4);
const [ SEVEN_OR_NINE, A ] = await read(2);
const width = (await read(2)).readUInt16LE(0);
const height = (await read(2)).readUInt16LE(0);

off();
console.info({ width, height });
```

Based on that, an async generator could also come into play:

```js
const { toReadableStream } = require('async-readable');

const chop = toReadableStream(async function* ({ read }) {
  while (true) {
    const size = rand(1, 9);
    const chunk = await read(size);
    yield { size, chunk };
  }
});

for await (const { size, chunk } of chop(stream)) {
  console.info({ size, chunk });
}
```

More realistic examples would be things like 1) a bitcoin block parser, or 2) a SOCKS5 client negotiator. If you're interested, the underlying implementation is only around 60 lines of code. Not sure it would be a good fit in core, but would love any feedback, thanks.

edit: code sample correction |
I'm not sure I see how that differs from my example?

```js
for await (const { size, chunk } of chop(stream)) {
  console.info({ size, chunk }); // size === length, chunk == items
}
```

vs

```js
for await (const chunks of Readable.batched(128, stream)) {
  console.info(chunks); // chunks.length === length, chunks == items
}
```
|
Its |
Ah, I think that's a different topic though? |
Sorry, I don't quite follow; the topic here has been batch reading methodology and its implementation, no? |
Here is another case where this would be useful:

```js
const file = await fs.open()
try {
  for await (const chunks of Readable.batched(source)) {
    await file.writev(chunks)
  }
} finally {
  await file.close()
}
```
|
WHATWG talked about having an extendable queuingStrategy
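For reference, in WHATWG streams the queuing strategy only shapes the internal buffer; reads still hand out one chunk per promise (example using the standard `CountQueuingStrategy`):

```js
// WHATWG streams: CountQueuingStrategy caps how many chunks the
// stream buffers internally.
const rs = new ReadableStream({
  start(controller) {
    for (let i = 0; i < 256; i++) controller.enqueue(i);
    controller.close();
  },
}, new CountQueuingStrategy({ highWaterMark: 128 }));

const reader = rs.getReader();
for (let r = await reader.read(); !r.done; r = await reader.read()) {
  // still one promise resolution per chunk
}
```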
|
@jimmywarting that would not help in this case. A significant part of the cost is having a promise per chunk; essentially we'd need an alternative async iteration implementation that returns an array (see node/lib/internal/streams/readable.js, line 1105 at 537da19).
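A userland wrapper makes the limitation visible: it batches what the consumer sees, but the inner loop still resolves one promise per chunk, so the overhead stays (sketch; `batchedWrapper` is hypothetical):

```js
// Naive userland batching on top of the existing iterator: the inner
// `for await` still awaits one promise per chunk, which is the exact
// cost being discussed; only the consumer's loop body is batched.
async function* batchedWrapper(stream, size) {
  let batch = [];
  for await (const chunk of stream) {
    batch.push(chunk);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch;
}
```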
|
I'm honestly fine with just landing one of the APIs Robert mentioned above as experimental and going from there; they seem adequate. |
I think we should just add this in. |
This is a continuation of #34035 and the promises session we had on OpenJS about async iteration performance of streams. One alternative discussed was batched reading.
I was thinking we could do something along the lines of:
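Something like a static helper on Readable that yields arrays of whatever is currently buffered (a sketch only, using public APIs; the exact signature and placement are part of the question):

```js
const { once } = require('events');
const { Readable } = require('stream');

// Sketch: a helper that yields arrays of up to `size` chunks, draining
// everything already buffered with a single await per batch.
Readable.batched = async function* batched(stream, size) {
  while (true) {
    const first = stream.read();
    if (first !== null) {
      const batch = [first];
      let chunk;
      while (batch.length < size && (chunk = stream.read()) !== null) {
        batch.push(chunk);
      }
      yield batch;
    } else if (stream.readableEnded) {
      return;
    } else {
      // Wait for more data or end-of-stream; events.once also rejects
      // if the stream emits 'error'. (The losing branch's listener is
      // left behind; acceptable for a sketch.)
      await Promise.race([once(stream, 'readable'), once(stream, 'end')]);
    }
  }
};
```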
Which would make the following possible:
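For example (`processChunk` is a placeholder for the consumer's per-chunk work):

```js
for await (const chunks of Readable.batched(stream, 128)) {
  // One promise tick per batch rather than per chunk, and the whole
  // batch can be processed concurrently.
  await Promise.all(chunks.map((chunk) => processChunk(chunk)));
}
```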
It's still not perfect, since if one element takes very long it reduces concurrency. However, it would still be a step forward, and it also reduces the async iteration overhead.