Callback APIs should come first. Evented APIs should be scaffolded around them #188
-1. The only change I want to see in the streams API is for it to use https://github.com/whatwg/streams
I feel that the current relationship between Streams and EventEmitters is inverted. So yep, no argument from me here.
min-stream, I think, fully backs up your point here.
Callback-based streams are still going to have to document the sequence of events -- or put another way: no matter what, there will be a state machine that needs to be documented.
Agreed. No argument here!
Determining when and where to forward errors is tricky no matter what. Basing streams on callbacks does not necessarily imply that
Agreed, though I disagree with "watchdog value." All values should be in-alphabet, the normal/ended state of the stream should be kept as a property of the stream. A read from an empty, "ended" stream should signal the "end" event to the reader. A stream may be ended but still have data buffered, waiting to be read. Sentinel "out of alphabet" values are an enticing, but brittle, solution.
Yes and no. There will always be underlying sources that push data at you -- TCP sockets, for example -- that will have to be handled.
Hopefully this isn't too laconic! I think you would be pleasantly surprised by the whatwg stream spec that @jonathanong pointed out. The intent of core, as I understand it, is to move in that direction. The WHATWG streams, having the benefit of hindsight, pull apart a lot of the concerns of Node streams quite nicely -- especially with regard to content introspection, buffering, and backpressure.
That said, the path from where we are today to where we'd like to be isn't set in stone. There are a lot of packages that depend on streams. Somewhat more onerously, those streams depend on a specific
To @jonathanong's point: it is unlikely that we will be able to flip the switch and go directly from node streams to whatwg streams. There will almost certainly be a middle-ground state that is reached -- if nothing else, because promises would have to be accepted into core before the stream spec.
In any case, thanks for the well-written issue.
@chrisdickinson Thanks a lot for the detailed reply. It feels good to engage in constructive discussions.
I had looked at the whatwg specs and yes, it seems to clean up a number of things. But I still find it hairy and I think that it mixes two levels: the device level and the stream level. If we distinguish these two levels we can get to monadic paradise at the stream level. If instead we couple them (which is the case in node streams and in whatwg) this is much more difficult. Probably sounds cryptic but I'll explain below:
Terminology
To clarify things I need a bit of terminology. I will stay away from whatwg and concentrate on node and ez-streams for now, to keep things relatively simple.
In node, everything is a stream. If I take a chain like:
and if I cut it in two as:
Everything is a stream: In ez-streams, the chain is slightly different:
let's cut it:
Now I can introduce a bit of terminology:
Inheritance: a device is always a stream. Some examples:
Note: it is also possible to create synthetic devices. For example, the number reader that I used to illustrate ez-streams.
Monadic stream API
My claim is that it is important to distinguish the device and stream levels, and that, if we do so, we can have a very simple, callback-based, monadic API for streams. I won't give many details here; the ez-streams readme explains it all. Just an important point about this API: as I already said, it consists of reducers and non-reducers.
Devices
In this approach the role of a device is to deal with the low-level, device-specific issues. This is where we interact with low-level events and handle backpressure. The device exposes a monadic stream API (reader or writer) and it may also expose a device-specific API.
I think that it is important to decouple the device-specific API from the stream API. This way we can keep the stream API as simple as possible and monadic (KISS). The device-specific interactions are handled out-of-band, outside of the stream pipeline.
Back to the discussion
This being said, I can get back to the discussion points:
Documentation: there is no sequence of events in the proposed stream API. There is a single
(*) The reader design is lacking an
Error handling: the distinction between reducers and non-reducers helps a lot here. Every chain must be terminated by a reducer, and this is where errors will be propagated. The reducer will not always be a
Multiple sources and destinations are still a bit experimental in the ez-streams design and we haven't used them much in our application yet (just simple
I think that having a differentiated terminology (devices, readers, writers, transforms, mappers, reducers) and a rich set of methods (
Watchdog value: this is a more minor issue. As
If a reader has buffered data and receives an EOF it should deliver things in order: first the data that it has buffered and then the EOF. Whether the EOF is a
Anyway, thanks again Chris for taking the time to respond.
At a high level, I feel that there are two possible directions here. The first one is towards whatwg streams; it is quite device-centric and more aligned with node's current design. The second is more towards lazy.js, more application-centric; it leaves the device details out of the picture and it is more centered on data processing (transforms, filter, map, reduce).
There is a lot of interesting information in this issue. Streams with a smaller interface would be of huge benefit. The more I think about it, the simpler it seems to create a wrapper around them so that they are compatible with the current streams we know.
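The wrapper mentioned here can be quite small. A minimal sketch, assuming a hypothetical reader whose only method is `read(cb)` and which yields `undefined` at end of stream (as proposed in the issue). `arrayReader` and `toNodeReadable` are illustrative names; only the `stream.Readable` constructor with its `read` option, `push()` and `destroy()` is real Node API.

```javascript
const { Readable } = require('stream');

// Hypothetical callback reader: read(cb) yields the items in order,
// then undefined to signal end of stream.
function arrayReader(items) {
  let i = 0;
  return {
    read(cb) {
      setImmediate(() => cb(null, items[i++]));
    },
  };
}

// Adapt a read(cb) reader to a classic object-mode Readable.
// Node invokes read() below whenever it wants more data, and we issue
// exactly one reader.read() per invocation, so pulls follow demand.
function toNodeReadable(reader) {
  return new Readable({
    objectMode: true,
    read() {
      reader.read((err, value) => {
        if (err) {
          this.destroy(err); // surfaces as an 'error' event
          return;
        }
        // undefined (end of stream) becomes push(null), which ends the Readable.
        this.push(value === undefined ? null : value);
      });
    },
  });
}

// Usage: existing evented consumers keep working unchanged.
toNodeReadable(arrayReader(['a', 'b', 'c']))
  .on('data', (chunk) => console.log('data:', chunk))
  .on('end', () => console.log('end'));
```

Because `reader.read()` is only called when Node asks for more data, the Readable's own buffering (`highWaterMark`) throttles the pulls. One caveat of this sketch: `null` cannot travel through it, since object-mode Readables reserve `null` for end of stream.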
I like the structured categorising of streams. The only term I dislike is "devices", but then I may have missed some neckbeard classic book or something. Is there a precedent for calling producing/consuming streams "devices"?
I like this suggestion. Not sure if we could replace the current API with a callback-based one completely, but it is worth discussing. How about a user-land module that wraps streams in the suggested way, exposing a callback API?
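A user-land module along these lines could be as small as the following sketch. The name `callbackReader` is hypothetical; it relies only on the standard paused-mode `Readable` API (`read()`, and the `'readable'`, `'end'` and `'error'` events). Backpressure stays inside the wrapper: Node stops asking the underlying source for data once its internal buffer reaches `highWaterMark`, and each `read(cb)` drains it a little.

```javascript
const { Readable } = require('stream');

// Hypothetical adapter: exposes read(cb) on top of a paused-mode Readable.
// cb(null, chunk) delivers data, cb(null, undefined) signals end of stream,
// cb(err) reports a failure.
function callbackReader(stream) {
  let ended = false;
  let error = null;
  stream.on('end', () => { ended = true; });
  stream.on('error', (err) => { error = err; });

  function read(cb) {
    if (error) return process.nextTick(cb, error);
    const chunk = stream.read();
    if (chunk !== null) return process.nextTick(cb, null, chunk);
    if (ended) return process.nextTick(cb, null, undefined);
    // Nothing buffered yet: wait for more data, the end, or an error, then retry.
    const retry = () => {
      stream.removeListener('readable', retry);
      stream.removeListener('end', retry);
      stream.removeListener('error', retry);
      read(cb);
    };
    stream.on('readable', retry);
    stream.on('end', retry);
    stream.on('error', retry);
  }

  return { read };
}

// Usage: drain a stream with a plain callback loop.
const reader = callbackReader(Readable.from(['x', 'y', 'z']));
(function loop() {
  reader.read((err, chunk) => {
    if (err) throw err;
    if (chunk === undefined) return console.log('end');
    console.log('chunk:', chunk);
    loop();
  });
})();
```

A `null` from `stream.read()` only means "nothing buffered right now", which is why the adapter waits for `'readable'`, `'end'` or `'error'` before answering; `undefined` reaches `cb` only after `'end'` has fired, so buffered data is always delivered before the EOF.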
@sonewman @rlidwka ez-streams is currently implemented with streamline.js but this is just an implementation detail. You can use it directly from pure callback code (and promises, BTW).
@algesten I chose "device" because devices are the real inputs and outputs of stream chains. This is where the chains interface with the physical world (files, networks, consoles, databases). A bit like the
I think I'd go with everything in the proposal (having worked with streams for almost a year now), including the sentinel value of
Frankly, what really bothers me with event emitters is that there's
Now that's one conspiracy to make debugging a pain and three good reasons for not using them.
BTW I think
All in all I find the proposal very intriguing; right now in my streaming code I use
@loveencounterflow it is interesting what you say about listening to every event. Other libraries have implemented this. It would be a useful feature, but it would introduce an extra step in the event emitter code, which is currently a hot code path in almost any program.
The problem with this is that we can't set a predefined list of events beforehand (and I am not sure that we should have to), because you are always going to want to start listening to an event before it happens. All of these things can be simply applied in a small module wrapping an event emitter. I don't think it is justified for this stuff to be applied at a lower level, where performance matters more.
The idea of changing the stream EOF has been a topic of discussion and @chrisdickinson has some good ideas about this.
I don't think we should change the node callback pattern for convenience: these are unique API ideas that would add overhead by creating more state-containing objects for every callback, as well as further complexity in use. Again, this pattern can easily be provided by a wrapping module (which I assume you have already implemented).
+1 for not changing the callback pattern
Closing as there isn't much actionable here; discuss the streams points in https://github.com/iojs/readable-stream if necessary.
Note that iojs is currently moving towards a "callbacks-only" based internal API, and EventEmitters will be an option for use cases where they are more convenient.
I'm posting this as a follow-up to #92, #89 and #153. This issue is particularly acute around streams, but the layering of events on top of callbacks is actually a more general issue in node APIs (the `connect` event, for example, could/should be layered on top of a `connect(cb)` call).
Counter truth 1: Stream APIs must be evented.
Wrong: `read(cb)` is sufficient for a readable stream API and `write(data, cb)` is sufficient for a writable stream. With these APIs, `pipe` can be implemented in 3 lines of code.
Counter truth 2: Counter truth 1 is true because a callback API cannot handle backpressure.
Wrong again: it is all a question of encapsulation. The low-level resource you are interacting with may have an evented API (`pause`/`resume` on the reading side, `drain` on the writing side), but you can encapsulate this low-level event handling into a callback API (`read(cb)` and `write(data, cb)`).
Once you have the callback APIs you don't need to worry about backpressure. It will happen naturally through the event loop. You will reason in terms of buffering rather than backpressure. Think of it: if data comes in fast you'll have to buffer it and then pause the input. Then, when someone calls your `read`, you can resume the stream to refill the buffer. Backpressure is handled inside the `read` call; it does not need to be exposed to the consumers of your stream. In other words, what comes in must come out: if nobody's ready to read, all you can do is buffer, and you'd rather close the tap.
Truth 1: A callback API is easy to document, an evented one is not.
With a callback API, you just need to document what the method does and what its parameters are. You don't even need to say that `cb` will be called as `cb(err)` if there is an error and `cb(null, result)` otherwise, because this is a general rule in node. You don't even need to document that `cb` will only be called once when the operation completes (successfully or not), because this too is a general rule.
With an evented API you need to document all the events and their parameters, but this is not all: you also need to document how the events are sequenced and what expectations the consumer of the API can make on their sequencing. If you are on the producer side (implementing a stream) you must make sure that you meet these sequencing rules. This is the part that gets really tricky and is the source of so many questions/issues.
Truth 2: It is very easy to scaffold an evented API around a callback API. The reverse is more difficult.
Proof:
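A minimal sketch of this direction, assuming a hypothetical reader whose only method is `read(cb)`, with `undefined` marking end of stream. The helper `toEvented` (a hypothetical name, not ez-streams or core code) layers `'data'`/`'end'`/`'error'` events plus `pause()`/`resume()` on top of nothing but `read(cb)`.

```javascript
const { EventEmitter } = require('events');

// Hypothetical callback reader: items in order, then undefined at end.
function arrayReader(items) {
  let i = 0;
  return {
    read(cb) {
      setImmediate(() => cb(null, items[i++]));
    },
  };
}

// Scaffold 'data' / 'end' / 'error' events and pause()/resume() on top of
// read(cb). Only one read is in flight at a time; pause() just stops the
// loop from issuing the next read.
function toEvented(reader) {
  const emitter = new EventEmitter();
  let paused = false;
  let reading = false;
  let finished = false;

  function pump() {
    if (paused || reading || finished) return;
    reading = true;
    reader.read((err, value) => {
      reading = false;
      if (err) {
        finished = true;
        emitter.emit('error', err);
      } else if (value === undefined) {
        finished = true;
        emitter.emit('end');
      } else {
        emitter.emit('data', value);
        pump();
      }
    });
  }

  emitter.pause = () => { paused = true; };
  emitter.resume = () => { paused = false; pump(); };
  // Start on the next tick so callers can attach listeners first.
  process.nextTick(pump);
  return emitter;
}

// Usage
const events = toEvented(arrayReader(['a', 'b']));
events.on('data', (v) => console.log('data:', v));
events.on('end', () => console.log('end'));
events.on('error', (err) => console.error('error:', err.message));
```

The event sequencing (zero or more `'data'`, then exactly one `'end'` or `'error'`) falls out of the loop structure instead of having to be specified and enforced separately.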
Truth 3: Rigorous error handling is possible (even easy) with a callback API.
This is still tricky with callbacks (but possible). But this is easy with promises, generators and some of the other async solutions.
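As an illustration of the promise route, a minimal sketch under the same hypothetical `read(cb)` contract (with `undefined` marking end of stream). `util.promisify` turns each read into a promise, so a failure in any read surfaces as a single rejection that one `try`/`catch` at the call site handles. `flakyReader` and `readAll` are hypothetical names.

```javascript
const { promisify } = require('util');

// Hypothetical reader: yields items in order, fails when it reaches position
// failAt (pass -1 to never fail), and yields undefined at end of stream.
function flakyReader(items, failAt) {
  let i = 0;
  return {
    read(cb) {
      setImmediate(() => {
        if (i === failAt) return cb(new Error('read failed at item ' + i));
        cb(null, items[i++]);
      });
    },
  };
}

// Turn read(cb) into a promise-returning function and loop until the
// end-of-stream sentinel. A failed read rejects the returned promise.
async function readAll(reader) {
  const read = promisify(reader.read.bind(reader));
  const items = [];
  for (;;) {
    const value = await read();
    if (value === undefined) return items;
    items.push(value);
  }
}

async function main() {
  try {
    console.log(await readAll(flakyReader(['a', 'b'], -1)));
    await readAll(flakyReader(['a', 'b', 'c'], 2));
  } catch (err) {
    // One handler covers every read in both loops.
    console.error('caught:', err.message);
  }
}

main();
```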
The big difference between a callback API and an evented API is that `pipe` will naturally take a callback in a callback API. The signature will be `reader.pipe(writer, cb)`. The callback is called when the pipe operation completes. If the pipe fails, `cb` will receive the error.
Also, it is better to have separate `transform` and `pipe` calls. The `transform` calls do not take a callback; they just pass errors along the chain. Only the `pipe` call does take a callback; it is always at the end of the chain. So the chain looks like `source.transform(t1).transform(t2).pipe(writer, cb);`
No error is lost in such a chain. If something fails, the error will always end up in the `pipe` callback.
Truth 4: a stream API can be content-agnostic
No need to distinguish a text mode, a binary mode and an object mode at the stream level. The core API can just handle data in an agnostic way. The only thing that's needed to keep the API simple is a watchdog value for the end of stream: `undefined` is the natural candidate.
Truth 5: a callback API lends itself naturally to a monadic, extensible API.
With a callback API, all it takes to implement a readable stream is a simple `read(cb)` call. All the fancier stream APIs can be scaffolded around this single call, in a monadic style.
The monadic API will combine two flavors of calls: reducer calls (like `pipe`) that terminate a chain and take a continuation callback parameter, and non-reducer calls (like `transform`) that produce another stream and can be chained. A chain that only contains non-reducers does not pull anything; it just lazily sits there. The reducer at the end of the chain triggers the pull.
Wrapping up
The tone is probably not right, but I would like to shake the coconut tree. I just see all these discussions going on forever around streams being complex, error handling being problematic, etc., when there is a simple way to address all this.
I know that I'm touching a sensitive point and that this may likely get closed with a laconic comment.
If streams were simple and well understood and if error handling was not a problem any more I would not post this. But this is not the case and the debates are coming back (see recent discussions). I know that it is "very late" to change things but apparently io.js is there to shake things up and maybe take some new directions. So I'm trying one last time.
I have a working implementation of all this (https://github.com/Sage/ez-streams) and we are using it extensively in our product. So this is not a fantasy. My goal is not to have core take it literally, just to consider the concept behind it.
Note: there are lots of similarities with @dominictarr's event-stream (https://github.com/dominictarr/event-stream) and with lazy.js (http://danieltao.com/lazy.js/). The main difference is that in ez-streams all the functions that may be grafted on the chain (transforms, filters, mappers) are async functions by default.
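To illustrate both the lazy, reducer-terminated chains described under Truth 5 and the async-by-default functions mentioned in this note, here is a hypothetical sketch rather than the ez-streams API: `map` and `filter` are non-reducers that return new readers without pulling anything, while `reduce` is a reducer that drives the chain and reports through one callback. `countingReader` is an illustrative helper that counts calls to `read`.

```javascript
// Hypothetical sketch, not the ez-streams API. A reader wraps read(cb);
// undefined marks end of stream; all user functions are async (take a cb).
function reader(readFn) {
  return {
    read: readFn,

    // Non-reducers: return a new reader; nothing is pulled yet.
    map(fn) {
      const source = this;
      return reader((cb) => {
        source.read((err, v) => (err || v === undefined ? cb(err, v) : fn(v, cb)));
      });
    },
    filter(pred) {
      const source = this;
      return reader(function next(cb) {
        source.read((err, v) => {
          if (err || v === undefined) return cb(err, v);
          pred(v, (predErr, keep) => {
            if (predErr) return cb(predErr);
            return keep ? cb(null, v) : next(cb);
          });
        });
      });
    },

    // Reducer: pulls until end of stream, then delivers one result via cb.
    reduce(fn, initial, cb) {
      const source = this;
      let acc = initial;
      (function loop() {
        source.read((err, v) => {
          if (err) return cb(err);
          if (v === undefined) return cb(null, acc);
          fn(acc, v, (fnErr, nextAcc) => {
            if (fnErr) return cb(fnErr);
            acc = nextAcc;
            loop();
          });
        });
      })();
    },
  };
}

// Illustrative helper: an array reader that counts how often it is read.
function countingReader(items, stats) {
  let i = 0;
  return reader((cb) => {
    stats.reads++;
    setImmediate(() => cb(null, items[i++]));
  });
}

const stats = { reads: 0 };
const oddSquares = countingReader([1, 2, 3, 4, 5], stats)
  .filter((x, cb) => setImmediate(cb, null, x % 2 === 1))
  .map((x, cb) => setImmediate(cb, null, x * x));
console.log('reads before reduce:', stats.reads); // 0: the chain is lazy
oddSquares.reduce((sum, x, cb) => cb(null, sum + x), 0, (err, sum) => {
  if (err) throw err;
  console.log('sum:', sum, 'reads:', stats.reads); // sum: 35 reads: 6
});
```

The six reads are the five items plus the final read that returns the `undefined` end marker; none of them happen until `reduce` is called.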