stream: added experimental support for for-await #17755

mcollina · 2017-12-19T12:36:12Z

Adds support for Symbol.asyncIterator into the Readable class.
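A minimal sketch of what this enables, assuming a Node.js version in which Readable exposes Symbol.asyncIterator. The `collect` helper and the in-memory `makeStream` source are illustrative names, not part of this PR.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: concatenates every chunk of a readable stream.
async function collect(readable) {
  let result = '';
  // for await obtains the iterator via readable[Symbol.asyncIterator]().
  for await (const chunk of readable) {
    result += chunk;
  }
  return result;
}

// An in-memory stream stands in for something like fs.createReadStream().
function makeStream() {
  const chunks = ['a', 'b', 'c'];
  return new Readable({
    encoding: 'utf8',
    read() {
      // Push one chunk per read() call, then signal end-of-stream with null.
      this.push(chunks.length > 0 ? chunks.shift() : null);
    }
  });
}

collect(makeStream()).then(console.log).catch(console.error);
```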

Fixes: #15709

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

Affected core subsystem(s)

stream

mcollina · 2017-12-19T12:37:25Z

cc @vsemozhetbyt @Fishrock123 @benjamingr @calvinmetcalf @bmeurer

mcollina · 2017-12-19T12:38:21Z

Currently make lint is not passing, as it does not recognize for await as valid JS. I'm not sure how to make it understand that, can someone help me?

targos · 2017-12-19T12:48:38Z

> Currently make lint is not passing, as it does not recognize for await as valid JS. I'm not sure how to make it understand that, can someone help me?

ESLint only supports ES features when they reach stage 4. The async iteration proposal is still at stage 3, so we would need to install babel-eslint: https://github.com/babel/babel-eslint

benjamingr

Very nice start. I see a bigger issue, though: this approach should work with for...await loops, but async iterators in general get no such guarantee.

We need to deal with backpressure here :)

benjamingr · 2017-12-19T12:52:18Z

doc/api/stream.md

@@ -1159,6 +1159,31 @@ readable stream will release any internal resources.
 Implementors should not override this method, but instead implement
 [`readable._destroy`][readable-_destroy].

+##### readable[Symbol.asyncIterator]


Offtopic: I almost want to weep at how nice this is. I've seen it before (with the package and when we tested this a year ago), but finally having I/O that looks like Python's asyncio and performs well is awesome - no more callbacks everywhere, clean asynchronous code :)

benjamingr · 2017-12-19T12:54:38Z

doc/api/stream.md

+print(fs.createReadStream('file')).catch(console.log);
+```
+
+If you break or throw from within the for-await loop, the stream will be


We ~~might want to~~ should elaborate exactly what methods breaking and throwing call on the underlying ReadableStream.

I mean:

for await (const k of readable) { break; }

this would call return() on the AsyncIterator which will call .destroy().
This is debatable, as after .destroy() the readable cannot be used anymore.

However:

for await (const k of readable) { throw new Error('kaboom'); }

would call return() as well on the AsyncIterator which will call .destroy().
This is the correct behavior, otherwise we would want to do all the time:

try { for await (const k of readable) { throw new Error('kaboom') } } finally { readable.destroy() }

Which will be prone to file descriptor leaking.

I'm not aware of a way to distinguish these two flows.

Should I document it in this way? Do you agree with the behavior?

I agree with the behavior 100%: once you've started consuming an async iterator in a for await loop, it should probably close if you break.

Pinging @zenparsing - I remember a lot of discussion about this - what should be the behavior in your opinion?

As for documentation - that is exactly what I meant - we should document that destroy will be called (rather than that the stream is destroyed) since it is a better guarantee for people subclassing ReadableStream.

Calling destroy is consistent with how async generator functions behave after calling return on their iterators, so 👍 .
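The behavior agreed on in this thread can be checked with a small sketch. It assumes a Node.js version in which leaving the loop calls return() on the iterator and return() destroys the stream; `makeEndless`, `breakEarly`, and `throwEarly` are illustrative names.

```javascript
const { Readable } = require('stream');

// Illustrative object-mode stream that never ends on its own.
function makeEndless() {
  let n = 1;
  return new Readable({
    objectMode: true,
    read() {
      this.push(n++);
    }
  });
}

async function breakEarly() {
  const readable = makeEndless();
  for await (const k of readable) {
    if (k === 3) break; // leaving the loop calls return() on the iterator
  }
  return readable.destroyed;
}

async function throwEarly() {
  const readable = makeEndless();
  try {
    for await (const k of readable) {
      throw new Error('kaboom'); // also calls return() on the iterator
    }
  } catch (err) {
    // The error thrown in the loop body still reaches the caller.
  }
  return readable.destroyed;
}

Promise.all([breakEarly(), throwEarly()]).then(([afterBreak, afterThrow]) => {
  console.log({ afterBreak, afterThrow });
});
```

In both cases no explicit try/finally around the loop is needed to release the stream.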

benjamingr · 2017-12-19T12:58:59Z

lib/internal/streams/async_iterator.js

+const { promisify } = require('util');
+
+class Item {
+  constructor(value, done) {


Pinging @caitp and @bmeurer - it would be interesting to know if we can avoid this being so expensive (allocating an object for the value of the iteration). In regular iterators V8 optimizes away for..of to regular iteration and other nice optimizations.

I think we should get a sense of how hard/easy this is to optimize, and if it's hard consider recycling objects for the iterator here which is dangerous but might be the only way we get reasonable performance outside of "scripting".

Or at least, do away with the done slot for the vast majority of objects.

Why not just use a plain object with 2 fields? Why does everything have to be a class?

@YurySolovyov not everything has to be a class, but in this particular case there are several benefits in naming the objects:

When looking at heap allocation profiles, it makes them easier to recognize (rather than just Object).

When debugging and looking at stack traces, you get more useful information since the objects are named.

As a platform, this makes naming objects appealing. I'm not sure it's worth it but it's definitely a reasonable call. I would name it AsyncIteratorRecord to be more similar to the spec.

This shouldn't have a big perf impact either way. I looked at perf of object / class instance creation in my work on nextTick and it's negligible.

I don't like the idea of reusing it as that means the returned object can no longer be stored.

I would also add that it makes sure that objects created in multiple places maintain the same shape, which helps V8 in the optimization process.

> do away with the done slot for the vast majority of objects.

We can't do that if we want to conform to the spec for what next should return.

The one suggestion I might have here is making done default to false so we don't have to repeat that everywhere. It's the default state after all...

@apapirovski I meant - we can put done = false on the prototype but I suspect it'll be slow (and might not conform).

I want this to eventually be fast to be useful :)
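A minimal sketch of the "named object with a stable shape" argument from this thread. The class name `AsyncIteratorRecord` follows the suggestion above and is hypothetical; it is not the code that landed.

```javascript
// Hypothetical name taken from the suggestion in this thread.
class AsyncIteratorRecord {
  constructor(value, done) {
    // Both fields are always assigned, in the same order, so every
    // instance created anywhere in the file has the same shape.
    this.value = value;
    this.done = done;
  }
}

const chunk = new AsyncIteratorRecord(Buffer.from('hi'), false);
const end = new AsyncIteratorRecord(null, true);

// The constructor name is what makes these objects recognizable
// (instead of a plain "Object") when inspecting them.
console.log(chunk.constructor.name);
console.log(Object.keys(chunk), Object.keys(end));
```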

benjamingr · 2017-12-19T13:01:33Z

lib/internal/streams/async_iterator.js

+
+    stream.on('end', () => {
+      if (this.lastResolve !== null) {
+        this.lastResolve(new Item(null, true));


I'm fine with this design decision (emitting true on the iterator as a separate value with null). Just making sure it's explicitly fine.

benjamingr · 2017-12-19T13:03:47Z

lib/internal/streams/async_iterator.js

+    // destroy(err, cb) is a private API
+    // we can guarantee we have that here, because we control the
+    // Readable class this is attached to
+    const destroy = promisify(this.stream.destroy.bind(this.stream));


We should check if this is reasonably fast.

This is not called often, only with break or throw within the loop.

I wouldn't say that breaking inside a loop is an edge case, note this is also called when you return inside a loop which is also pretty common.

benjamingr · 2017-12-19T13:03:56Z

lib/internal/streams/async_iterator.js

+    // Readable class this is attached to
+    const destroy = promisify(this.stream.destroy.bind(this.stream));
+    await destroy(null);
+    return new Item(null, true);


benjamingr · 2017-12-19T13:11:26Z

So this is a very nice start, but I'm not sure this works for a lot of cases where the async iterator isn't consumed in a for await.

Can you add tests for consuming streams without for..await, calling next() several times before the previous next call resolved and seeing what happens when destroy is called when we still have pending promises for next?
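A sketch of the kind of consumption meant here: driving the iterator by hand and issuing several next() calls before any has settled. What happens depends on the implementation under test; the sketch assumes a Node.js version whose iterator queues overlapping calls, and `makeStream` and `manualConsume` are illustrative names.

```javascript
const { Readable } = require('stream');

// Illustrative two-item object-mode stream.
function makeStream() {
  const items = ['a', 'b'];
  return new Readable({
    objectMode: true,
    read() {
      this.push(items.length > 0 ? items.shift() : null);
    }
  });
}

async function manualConsume() {
  const iterator = makeStream()[Symbol.asyncIterator]();
  // Three next() calls are issued before any of them has settled.
  const results = await Promise.all([
    iterator.next(),
    iterator.next(),
    iterator.next()
  ]);
  return results.map(({ value, done }) => ({ value, done }));
}

manualConsume().then(console.log);
```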

apapirovski

This seems great to me. Scattered some actionable and non-actionable comments.

I'm not really clear on the back-pressure concerns brought up; it seems to work similarly to consuming a stream the normal way. Maybe I'm misunderstanding something...

apapirovski · 2017-12-19T13:37:49Z

lib/internal/streams/async_iterator.js

+      process.nextTick(readAndResolve, this);
+    });
+
+    stream.on('end', () => {


I think, in general, bind would be better here for performance.

That said, this makes me wish we already had WeakRefs in JS because we could just make stream[kAsyncIterator] and get rid of closures altogether.

apapirovski · 2017-12-19T13:41:28Z

lib/internal/streams/async_iterator.js

+const { promisify } = require('util');
+
+class Item {
+  constructor(value, done) {


This shouldn't have a big perf impact either way. I looked at perf of object / class instance creation in my work on nextTick and it's negligible.

I don't like the idea of reusing it as that means the returned object can no longer be stored.

apapirovski · 2017-12-19T13:48:21Z

lib/internal/streams/async_iterator.js

+      if (data) {
+        resolve(new Item(data, false));
+      } else if (this.lastResolve !== null) {
+        throw new Error('next can be called only once');


Hmmm... given that readable is handled on nextTick, it seems like it could be possible to have this.lastResolve !== null and at the same time have data. Maybe that condition should go first, before even calling read()? (And yes, I know, obscure edge case...)

apapirovski · 2017-12-19T13:51:52Z

lib/internal/streams/async_iterator.js

+    this.lastResolve = null;
+    this.lastReject = null;
+    this.error = null;
+    this.ended = false;


Could we store all of these in a way that's not publicly accessible? Whenever anything is made unintentionally public, we usually regret it later. 😞

I'm actually considering a ReadableAsyncIterator a private object, meaning it's consumed by for await, and it's not really user facing. Should we treat it as user-facing? I will replace those with symbols then.

Right, but it can be returned by readable[Symbol.asyncIterator] so it's not truly private. I think the fact that it's really easy to get it and there might be legitimate use cases for it, makes me uneasy about exposing these props publicly.

Regarding the "instance creation" comment above (can't comment on it for some reason) - the scary part isn't creating the objects, it's the GC afterwards :)

apapirovski · 2017-12-19T13:55:37Z

lib/_stream_readable.js

+var warningEmitted = false;
+Readable.prototype[Symbol.asyncIterator] = function() {
+  if (!warningEmitted) {
+    process.emitWarning(


There's an emitExperimentalWarning somewhere in internal utils. We should start using it since it avoids needing to track warningEmitted. If it needs more features, we should extend it as needed.
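The internal helper is not reachable from user code, so the following is only a sketch of the idea using the public process.emitWarning() API. The helper name `warnExperimentalOnce` and the message text are assumptions for illustration, not Node's internal implementation.

```javascript
// Hypothetical helper; tracks which features have already warned so that
// call sites do not need their own warningEmitted flag.
const warned = new Set();

function warnExperimentalOnce(feature) {
  if (warned.has(feature)) return false;
  warned.add(feature);
  // process.emitWarning(message, type) is a public API.
  process.emitWarning(
    `${feature} is an experimental feature. This feature could change at any time`,
    'ExperimentalWarning'
  );
  return true;
}

warnExperimentalOnce('Readable[Symbol.asyncIterator]'); // emits the warning
warnExperimentalOnce('Readable[Symbol.asyncIterator]'); // does nothing
```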

apapirovski · 2017-12-19T13:59:08Z

lib/internal/streams/async_iterator.js

+const { promisify } = require('util');
+
+class Item {
+  constructor(value, done) {


> do away with the done slot for the vast majority of objects.

We can't do that if we want to conform to the spec for what next should return.

apapirovski · 2017-12-19T14:00:51Z

> calling next() several times before the previous next call resolved

That would throw.

mcollina · 2017-12-19T14:03:07Z

@benjamingr the current code works only with for await, it's not built for general usage. I went for the simplest possible code that could work there.

I'm not familiar with the tc39 proposal format, and I couldn't find where the behavior attached to the iterator object is defined regarding things like backpressure and such. At this point, only the latest promise returned by next() will be resolved as expected. Handling next() calls in parallel is currently not supported.

I'm keen on not implementing a backpressure mechanism, which would be needed to support multiple parallel next() calls. It would need a significant amount of work to be optimized, and we would likely lose a lot of throughput there.

Supporting only one next() at a time would likely guarantee the maximum performance, and performance is a known problem of async iterators.

Can you point me to cases where having parallel next() support can be useful for Readable?

apapirovski · 2017-12-19T14:15:28Z

> I'm not familiar with the tc39 proposal format, and I couldn't find where the behavior attached to the iterator object is defined regarding things like backpressure and such. At this point, only the latest promise returned by next() will be resolved as expected. Handling next() calls in parallel is currently not supported.

The spec calls for an internal queue that tracks outstanding next calls. So each next would still return a Promise that would await the one prior to it, then we could store them all in a singly linked list. Maybe there's a better way tho...

That said, I don't really know a practical situation where this would be useful...

benjamingr · 2017-12-19T14:47:26Z

> @benjamingr the current code works only with for await, it's not built for general usage. I went for the simplest possible code that could work there.

I see, but it is not a valid implementation of Symbol.asyncIterator in general - I can help you with any questions you have about the spec and we can iterate on it.

Note that it's also fine to make the for...await version fast and the non-for...await version slower.

> I'm not familiar with the tc39 proposal format, and I couldn't find where the behavior attached to the iterator object is defined regarding things like backpressure and such.

With iterators, unlike push-streams or observables, backpressure is actually very easy since the consumer is the one asking for items. Instead of things being pushed to the consumer, they are explicitly asked for with next.

> At this point, only the latest promise returned by next() will be resolved as expected. Handling next() calls in parallel is currently not supported.

Yes, although it's not supposed to be much more work:

When next is called, if lastResolve is null - do exactly what you're doing now.
Otherwise, push it to a queue (slow case)
Resolve things in the queue in order when data becomes available from the stream.

This should remain fast while supporting the spec in its entirety.
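The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `queuedIterator` and `pending` are hypothetical names, the stream is assumed to be in object mode, and error and destroy handling are left out.

```javascript
const { Readable } = require('stream');

// Hypothetical sketch of the fast path / queued slow path described above.
function queuedIterator(stream) {
  const pending = []; // resolvers of next() calls still waiting for data
  let ended = false;

  // Settle waiting next() calls in order, as far as buffered data allows.
  function flush() {
    while (pending.length > 0) {
      const chunk = stream.read();
      if (chunk === null) break;
      pending.shift()({ value: chunk, done: false });
    }
    if (ended) {
      while (pending.length > 0) {
        pending.shift()({ value: undefined, done: true });
      }
    }
  }

  stream.on('readable', flush);
  stream.on('end', () => {
    ended = true;
    flush();
  });

  return {
    next() {
      if (pending.length === 0) {
        // Fast path: nothing is waiting, so try to read right away.
        const chunk = ended ? null : stream.read();
        if (chunk !== null) {
          return Promise.resolve({ value: chunk, done: false });
        }
        if (ended) return Promise.resolve({ value: undefined, done: true });
      }
      // Slow path: park the resolver until data (or the end) arrives.
      return new Promise((resolve) => pending.push(resolve));
    },
    [Symbol.asyncIterator]() {
      return this;
    }
  };
}

async function demo() {
  const it = queuedIterator(Readable.from(['a', 'b', 'c']));
  // Four overlapping calls: three chunks plus the end-of-stream result.
  return Promise.all([it.next(), it.next(), it.next(), it.next()]);
}

demo().then(console.log);
```

This version uses a plain array with shift() for brevity; the choice of queue structure is a separate question.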

I'm excited about this feature and the PR and I feel very strongly about giving users an API that would behave like a normal AsyncIterator.

benjamingr · 2017-12-19T14:52:00Z

@apapirovski

> That said, I don't really know a practical situation where this would be useful...

Actually, come to think of it I think we might get away with rejecting the promise with an OutOfOrderIterationError if you call next() before the previous next resolved. I'll check what Python does.

As long as it's not ignored I guess we can say this is expected behavior with readable streams and document it.

It's not as good a user experience, but it's much better than what we do here and arguably simpler than supporting waiting for multiple values. It should also be tested.

apapirovski · 2017-12-19T14:55:49Z

@benjamingr @mcollina Here's a quick PoC of what I was talking about: c875223

It's not ready for usage or anything but it works as expected. Could likely be optimized quite a bit. Might have bugs.

mcollina · 2017-12-19T14:58:37Z

> Actually, come to think of it I think we might get away with rejecting the promise with an OutOfOrderIterationError if you call next() before the previous next resolved. I'll check what Python does.

I will do the OutOfOrderIterationError for now. I prefer to avoid creating that many promises.
@apapirovski in your code we are creating one for each async function and one when we do new Promise().

As I said, the next step is writing a benchmark, so we can make those tradeoffs with informed numbers.

benjamingr · 2017-12-19T14:58:48Z

@apapirovski I'm not sure why we'd want a LinkedList implementation for this rather than just an array. It's not faster (we always push to the end), it's more allocations, less optimizable and less cache local. We actually want a deque most likely - but we can totally just use an array here - if we're concerned about shift() we can increment a counter instead of .shift()ing and setting it to 0 when the queue is empty.
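The counter idea could look roughly like this. `IndexQueue` is a hypothetical name and the sketch is not taken from Node or bluebird.

```javascript
// Hypothetical array-backed FIFO that avoids Array.prototype.shift().
class IndexQueue {
  constructor() {
    this.items = [];
    this.head = 0; // index of the next item to remove
  }

  push(item) {
    this.items.push(item);
  }

  shift() {
    if (this.head === this.items.length) return undefined;
    const item = this.items[this.head];
    this.items[this.head] = undefined; // drop the reference for the GC
    this.head++;
    if (this.head === this.items.length) {
      // Queue drained: reset the array and the counter.
      this.items.length = 0;
      this.head = 0;
    }
    return item;
  }

  get length() {
    return this.items.length - this.head;
  }
}

const queue = new IndexQueue();
queue.push(1);
queue.push(2);
console.log(queue.shift(), queue.length);
```

The backing array is only reset once the queue drains, so a consumer that never catches up keeps it growing.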

mcollina · 2017-12-19T15:00:01Z

@benjamingr OutOfOrderIterationError is not currently defined on master.

benjamingr · 2017-12-19T15:01:31Z

@mcollina I just liked that name - sorry for being confusing. Such an error would have no meaning on a regular iterator - I was just suggesting an error name.

apapirovski · 2017-12-19T15:02:58Z

> I'm not sure why we'd want a LinkedList implementation for this rather than just an array. It's not faster (we always push to the end), it's more allocations, less optimizable and less cache local. We actually want a deque most likely - but we can totally just use an array here - if we're concerned about shift() we can increment a counter instead of .shift()ing and setting it to 0 when the queue is empty.

The same reason we switched to using LinkedList in nextTick. Doing both shift & push on an Array is worse for performance. If instead we're only clearing the Array when the queue is done then we run into the same issues as on nextTick where we can run out of memory. (Never underestimate users' ability to do things that make no sense like calling nextTick 1e8 times... or, in this case, .next().)

> @apapirovski in your code we are creating one for each async function and one when we do new Promise().

Yeah, I'm aware. It's just a PoC, if we had benchmarks we could start optimizing that.

apapirovski · 2017-12-19T15:09:01Z

@benjamingr Anyway, linked list and array are both overkill, I think we can just store latest promise since each new one just depends on the one before it.
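A sketch of the "store only the latest promise" idea. `chainedNext` and `readOne` are hypothetical names, and this simplified version allocates extra promises, so it is not necessarily how the linked commit does it.

```javascript
// Hypothetical sketch: readOne is an illustrative async function that
// produces the next { value, done } result.
function chainedNext(readOne) {
  let last = Promise.resolve();
  return function next() {
    // Each call starts only after the previous one has settled.
    const current = last.then(() => readOne());
    // Keep the chain usable even if this call rejects.
    last = current.catch(() => {});
    return current;
  };
}

// Usage with a fake source whose individual reads take different times.
const data = [['a', 30], ['b', 10], ['c', 0]];
let position = 0;
const next = chainedNext(() => {
  if (position >= data.length) {
    return Promise.resolve({ value: undefined, done: true });
  }
  const [value, delay] = data[position++];
  return new Promise((resolve) => {
    setTimeout(() => resolve({ value, done: false }), delay);
  });
});

Promise.all([next(), next(), next(), next()]).then((results) => {
  console.log(results.map((r) => r.value));
});
```

No queue is needed because ordering falls out of the promise chain itself.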

benjamingr · 2017-12-19T15:15:20Z

@apapirovski

> The same reason we switched to using LinkedList in nextTick. Doing both shift & push on an Array is worse for performance.

This is getting a little offtopic - so feel free to open an issue about it. A huge performance gain in bluebird is by using a double ended queue rather than an array or a linked list see this file.

apapirovski · 2017-12-19T16:41:40Z

@mcollina This version is a lot simpler and no extra Promise required: 02b6336

We could likely simplify the conditionals even further. In the for..await scenario it only does an extra if check, so it should be almost equally as fast.

> This is getting a little offtopic - so feel free to open an issue about it. A huge performance gain in bluebird is by using a double ended queue rather than an array or a linked list see this file.

Technically, a singly linked list (head-tail linked list) is the most common implementation of an unknown-length queue, and we don't need a double-ended queue (since we only remove from the head and add to the tail). Bluebird can get away with doing some things that we can't since they specify capacity; the trade-off is that it has to have a resize operation. I'm not 100% certain that what's implemented there is the fastest solution possible. Anyway, this is purely academic at this point since we're not using either. :)

mcollina · 2017-12-19T16:58:10Z

@apapirovski I like your suggestion and I've included it.

I will need to test and check what happens if the stream is destroyed in the meantime, but if it's not working, it's an easy fix.

benjamingr · 2017-12-19T17:11:28Z

I'm +1 on the suggestion and its inclusion. @apapirovski I've moved our linked list discussion to mail to keep the thread clean.

benjamingr · 2017-12-19T17:15:11Z

lib/internal/streams/async_iterator.js

+    // destroy(err, cb) is a private API
+    // we can guarantee we have that here, because we control the
+    // Readable class this is attached to
+    return new Promise((resolve, reject) => {


I'm not sure that's faster than what you had before this change (with promisify) :D

I'm sorry if I was confusing in the comment.

I actually think that this is faster because we would have to call promisify on every destroy method, as some instances override it. We will have to benchmark.
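The two variants being compared can be sketched side by side. `Resource` is a hypothetical stand-in with a callback-style destroy(err, cb), used here because the stream's destroy(err, cb) form is described above as a private API; which variant is faster would still need a benchmark.

```javascript
const { promisify } = require('util');

// Hypothetical resource with a callback-style destroy(err, cb).
class Resource {
  constructor() {
    this.destroyed = false;
  }

  destroy(err, cb) {
    this.destroyed = true;
    process.nextTick(cb, err);
  }
}

// Variant 1: promisify builds a new wrapper function on every call,
// because each instance may override destroy.
function destroyWithPromisify(resource) {
  return promisify(resource.destroy.bind(resource))(null);
}

// Variant 2: wrap the single call in a Promise by hand.
function destroyWithPromise(resource) {
  return new Promise((resolve, reject) => {
    resource.destroy(null, (err) => (err ? reject(err) : resolve()));
  });
}

destroyWithPromisify(new Resource()).then(() => console.log('promisify done'));
destroyWithPromise(new Resource()).then(() => console.log('manual done'));
```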

mcollina · 2018-01-11T09:26:21Z

Rebased, PTAL.

CI: https://ci.nodejs.org/job/node-test-pull-request/12499/

targos · 2018-01-11T09:30:10Z

lib/_stream_readable.js

@@ -922,6 +924,12 @@ Readable.prototype.wrap = function(stream) {
  return this;
 };

+Readable.prototype[Symbol.asyncIterator] = function() {
+  emitExperimentalWarning('Readable[Symbol.AsyncIterator]');


asyncIterator

targos · 2018-01-11T09:42:58Z

test/parallel/test-stream-readable-async-iterators.js

+
+    let err;
+    try {
+      /*eslint no-unused-vars: 0*/


I think this disables the rule for the rest of the file. Instead you can use // eslint-disable-next-line no-unused-vars
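A small sketch of the suggested directive, which applies only to the line that follows it. The `drain` function is illustrative and not taken from the test file.

```javascript
const { Readable } = require('stream');

// Illustrative helper: counts chunks without using their contents.
async function drain(readable) {
  let count = 0;
  // eslint-disable-next-line no-unused-vars
  for await (const chunk of readable) {
    count++;
  }
  return count;
}

drain(Readable.from(['x', 'y'])).then(console.log);
```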


mcollina · 2018-01-11T10:19:23Z

CI: https://ci.nodejs.org/job/node-test-pull-request/12500/

(last before landing)

mcollina · 2018-01-11T12:29:59Z

Landed as 61b4d60.

Adds support for Symbol.asyncIterator into the Readable class. The stream is destroyed when the loop terminates with break or throw.

Fixes: #15709
PR-URL: #17755
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Anatoli Papirovski <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Michaël Zasso <[email protected]>

This is required because we need to add the babel-eslint dependency and it has to be able to resolve "eslint". babel-eslint is required to support future ES features such as async iterators and import.meta.

Refs: #17755
PR-URL: #17820
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>

vsemozhetbyt · 2018-08-23T16:00:13Z

Will it remain experimental in v11?

mcollina · 2018-08-23T16:26:41Z

@vsemozhetbyt I hope we can get it out of experimental before v10 goes to LTS.

WHATWG streams is implementing the same thing atm, and I would like the semantics and APIs to match so that code that uses this would work identically.

cc @benjamingr @devsnek

benjamingr · 2018-08-23T16:37:35Z

@mcollina I've also been talking to @jakearchibald about it

benjamingr · 2018-08-23T16:38:05Z

In short: I'm +1 on unflagging and using async iterators as an interop mechanism between whatwg and node streams.

betomoretti · 2018-11-24T20:33:13Z

Hi all! I'm aware that this is an experimental feature, but is there a way to enable it? I couldn't find how to do it in either the docs or the node --help output. I'm using node version 10.7.0.

Thanks!

rauschma · 2018-11-24T21:31:28Z

@betomoretti I’ve blogged about it – no flag needed: http://2ality.com/2018/04/async-iter-nodejs.html

vsemozhetbyt · 2018-11-24T23:57:51Z

@rauschma And we have readline async iterator in master now: #23916

nodejs-github-bot added build Issues and PRs related to build files or the CI. stream Issues and PRs related to the stream subsystem. labels Dec 19, 2017

mcollina requested a review from jasnell December 19, 2017 12:36

mcollina requested a review from benjamingr December 19, 2017 12:37

benjamingr requested changes Dec 19, 2017


apapirovski reviewed Dec 19, 2017


mcollina force-pushed the asynciterators branch from da33625 to 37aaf36 Compare December 19, 2017 14:33

mcollina force-pushed the asynciterators branch from 37aaf36 to 65eef32 Compare December 19, 2017 14:54

mcollina force-pushed the asynciterators branch from 65eef32 to 9acefcd Compare December 19, 2017 16:57

benjamingr reviewed Dec 19, 2017


targos approved these changes Jan 11, 2018


stream: added experimental support for for-await

de549cd

Adds support for Symbol.asyncIterator into the Readable class. The stream is destroyed when the loop terminates with break or throw. Fixes: nodejs#15709

mcollina force-pushed the asynciterators branch from acdfb3a to de549cd Compare January 11, 2018 10:17

mcollina closed this Jan 11, 2018

mcollina deleted the asynciterators branch January 11, 2018 12:30

vsemozhetbyt mentioned this pull request Jan 28, 2018

NodeJS Stream Meeting Notes - Async Iterators tc39/proposal-async-iteration#74

Closed

vsemozhetbyt mentioned this pull request Feb 6, 2018

readline: add support for asynchronous iteration (tracking feature request) #18603

Closed

prog1dev mentioned this pull request Feb 21, 2018

readline: add support for async iteration #18904

Closed

4 tasks

benjamingr mentioned this pull request Mar 8, 2018

Idea: built-in support for promises mobxjs/mobx#1370

Closed

BridgeAR mentioned this pull request Oct 9, 2018

Summit Topic: Promises openjs-foundation/summit#119

Closed

alanshaw mentioned this pull request Oct 25, 2018

Awesome Endeavour: Async Iterators ipfs/js-ipfs#1670

Closed

rektide mentioned this pull request Mar 6, 2019

Advance Readable[Symbol.asyncIterator] out of experimental #26479

Closed

Gasol mentioned this pull request Jun 26, 2019

Use for-await sindresorhus/get-stdin#25

Closed
