Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stream pump #13506

Closed
wants to merge 5 commits into from
Closed

Stream pump #13506

wants to merge 5 commits into from

Conversation

calvinmetcalf
Copy link
Contributor

Docs not finished.

As discussed at the streams working group face to face in Berlin we wanted to bring @mafintosh's pump module into core. The reason being that pipe is fundamentally broken since it doesn't forward errors or clean up so the advice given is always not use it and instead use the pump module. The consensus at the meeting was that if a user land module is mandatory to use streams, it should probably be part of streams.

see nodejs/readable-stream#283, mafintosh/pump#17

cc @nodejs/streams

@calvinmetcalf calvinmetcalf added the stream Issues and PRs related to the stream subsystem. label Jun 6, 2017
@calvinmetcalf calvinmetcalf requested a review from mcollina June 6, 2017 19:18
@nodejs-github-bot nodejs-github-bot added build Issues and PRs related to build files or the CI. stream Issues and PRs related to the stream subsystem. labels Jun 6, 2017
@@ -0,0 +1,191 @@
// The MIT License (MIT)
//
// Copyright (c) 2014 Mathias Buus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea that this file keeps getting vendored in? So I guess review comments on the code here make limited sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a semi-related note, I wish we could collect any/all vendored code under a single directory to better indicate that PRs/issues/etc. should not be opened in nodejs/node for those files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can drop the copyright, as stated by @mafintosh in mafintosh/pump#17 (comment). Are you still of that idea, @mafintosh?

'use strict';

const stream = require('stream');
require('../common');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be the very first require()

const assert = require('assert');
const crypto = require('crypto');
const pump = stream.pump;
// tiny node-tap lookalike.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m really not keen on introducing more of these in the tests, the ones that use “tiny node-tap lookalike” are usually far from readable…

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually 👎 on using the "tiny" node-tap lookalike. They are very hard to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is because I copied some other test that had that in it

@mscdex mscdex removed the build Issues and PRs related to build files or the CI. label Jun 6, 2017
#### Class Method: stream.pump(...streams[, callback])

* two or more streams to pipe between
* optional callback
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These parameters should be formatted according to the style used in the rest of the documentation.

Also, it would be a good idea to either add the function signature here in some way or at the very least describe in the description below what parameters the callback should have.

@sam-github
Copy link
Contributor

I always use pump, agree its better than pipe.

Isn't the consensus among stream experts that no one should never use require('stream'), and instead use require('readable-stream')?

I'm sure the alternative (documenting in the streams API that its docs are present to describe how streams work in the node builtins, but that user code should not require this module directly and instead use the npm-installable alternatives - with a list of them, perhaps bringing the best into github.com/nodejs/...) was considered, maybe describe in this PR why its better to just vendor in pump?

Will we recommend using npm installable pump (for the benefits of semver, etc.)?

@mikeal
Copy link
Contributor

mikeal commented Jun 6, 2017

If I'm reading this correctly, this adds another API to Stream, .pump(), which does essentially what .pipe() does but "better."

IMO, this is not a suitable solution for Core.

If this is the "right way" to do this, then it should replace .pipe(). We should build a consensus that this is the correct API and migrate as a breaking change (error forwarding certainly qualifies as a breaking change).

I don't think it's a good practice for Core to include two APIs that do essentially the same thing. If there is a consensus that this is better then lets move to it. If there is not a consensus then, as annoying as it might be, we should stick with the current behavior.

If memory serves, we always wanted to figure out how to forward errors properly but ended up cutting attempts to do so in order to ship major releases on time that included large breaking upgrades to streams. It sounds like the kinks in this were worked out by this module in the ecosystem and if it doesn't break the ecosystem I think it would be a huge benefit to make an improvement to the Core pipe() API.

@yoshuawuyts
Copy link

yoshuawuyts commented Jun 6, 2017 via email

@mikeal
Copy link
Contributor

mikeal commented Jun 6, 2017

@yoshuawuyts how breaking would the change be? Is there something beyond the error forwarding that I'm not seeing?

@yoshuawuyts
Copy link

yoshuawuyts commented Jun 6, 2017 via email

@mikeal
Copy link
Contributor

mikeal commented Jun 6, 2017

@yoshuawuyts That seems like an actual bug in the current implementation.

I'm not arguing that a change this substantial would not break a few people, but have we actually quantified how much it would break? We have tools to help us figure this out (CITGM) and we can even stage the change in the next major's nightlies and introduce it in a version not slated for LTS: the upcoming v7. We're actually at an optimal time to try this. If the break is too large we can back it out.

If we run CITGM and a bunch of packages break we can drop this entire line of argument, but before we add a redundant API out of this particular fear we should try our best to quantify the effect it would have. We're no longer completely in the dark about how much ecosystem impact changes have and we don't need to make decisions without any data the way we have traditionally had to.

@refack
Copy link
Contributor

refack commented Jun 7, 2017

@refack
Copy link
Contributor

refack commented Jun 7, 2017

Since the CI is free I'm running CITGM on refack/node@2993ff2
https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/860/

Well npm fails so we won't get the data that easily 😞
https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/860/nodes=debian8-64/console

@mikeal
Copy link
Contributor

mikeal commented Jun 7, 2017

@refack that error is because pump() doesn't return the destination stream like pipe() does. You'll need to account for that in your glue code.

@mafintosh
Copy link
Member

@refack @mikeal pump returns the last stream (like pipe) but the code in that commit is wrong. you wanna do

Stream.prototype.pipe = function (dest) {
  return pump(this, dest)
}

@mikeal
Copy link
Contributor

mikeal commented Jun 7, 2017

@mafintosh ah, that makes sense, when I saw that it supported multiple streams I assumed the feature was just cut ;)

@refack
Copy link
Contributor

refack commented Jun 7, 2017

@mafintosh
Copy link
Member

@refack @mikeal my gist is also missing handling the pipe options (like {end: false}). that should probably fix a few of the tests as well

@mikeal
Copy link
Contributor

mikeal commented Jun 7, 2017

The fails on linuxone all appear to be because of the same bug and is probably fixable. I'm seeing similar failures in the smoker. Any thoughts @mafintosh

function _addListener(target, type, listener, prepend) {
                     ^

RangeError: Maximum call stack size exceeded
    at _addListener (events.js:232:22)
    at Extract.addListener (events.js:298:10)
    at Extract.Abstract.on (/data/iojs/build/workspace/citgm-smoker/nodes/rhel72-s390x/smoker/lib/node_modules/citgm/node_modules/fstream/lib/abstract.js:18:25)
    at eos (_stream_pump.js:106:10)
    at destroyer (_stream_pump.js:137:3)
    at streams.map (_stream_pump.js:176:12)
    at Array.map (native)
    at Function.pump (_stream_pump.js:173:28)
    at Extract.Stream.pipe (stream.js:37:17)
    at Extract.Reader.pipe (/data/iojs/build/workspace/citgm-smoker/nodes/rhel72-s390x/smoker/lib/node_modules/citgm/node_modules/fstream/lib/reader.js:235:32)

@mafintosh
Copy link
Member

@mikeal doh i'm being silly. pump calls pipe internally haha. we'll have to inline the implementation for an actual test

@refack
Copy link
Contributor

refack commented Jun 7, 2017

Yep, started stack-overflowing 🤷‍♂️

_stream_pump.js:40
function eos(stream, opts, _callback) {
            ^

RangeError: Maximum call stack size exceeded
    at eos (_stream_pump.js:40:13)
    at destroyer (_stream_pump.js:137:3)
    at streams.map (_stream_pump.js:176:12)
    at Array.map (native)
    at Function.pump (_stream_pump.js:173:28)
    at Extract.Stream.pipe (stream.js:37:17)
    at Extract.Reader.pipe (/home/iojs/build/workspace/citgm-smoker/nodes/ppcbe-ubuntu1404/smoker/lib/node_modules/citgm/node_modules/fstream/lib/reader.js:235:32)
    at pipe (_stream_pump.js:161:33)
    at Array.reduce (native)
    at Function.pump (_stream_pump.js:188:18)
Build step 'Conditional steps (multiple)' marked build as failure
Recording test results
ERROR: Step ?Publish JUnit test result report? failed: No test report files were found. Configuration error?
Notifying upstream projects of job completion
Finished: FAILURE

@mafintosh
Copy link
Member

easiest way to test this would be to set Stream.prototype.__pipe__ = Stream.prototype.pipe before overwriting it here and then have pump internally call __pipe__ if it is available

@mcollina
Copy link
Member

mcollina commented Jun 7, 2017

@mikeal

If I'm reading this correctly, this adds another API to Stream, .pump(), which does essentially what .pipe() does but "better."

No. Adds a higher-level API that does pipe() across a chain of streams, correctly handling errors.
pump() is a multi-stream concept, concentrating the 'error' along the chain in a single place:

pump(streamA, streamB, streamC, function (err) {
  // err can originate in any of the streams
})

The hard part of forwarding errors in a.pipe(b).pipe(c), is that this.emit('error', err) has the meaning of "this error originated here".

IMO, this is not a suitable solution for Core.

I disagree, and I hope this is different enough for you to change your mind.

If this is the "right way" to do this, then it should replace .pipe(). We should build a consensus that this is the correct API and migrate as a breaking change (error forwarding certainly qualifies as a breaking change).

I think that a function that consider a pipeline of streams vs a method on the single stream is the way to go. It has been proven in production several times, and the API is easy to read. pump(a, b, c) does what a.pipe(b).pipe(c) does, but consider the flow at a higher level.

If there is a consensus that this is better then lets move to it. If there is not a consensus then, as annoying as it might be, we should stick with the current behavior.

Absolutely.

If memory serves, we always wanted to figure out how to forward errors properly but ended up cutting attempts to do so in order to ship major releases on time that included large breaking upgrades to streams. It sounds like the kinks in this were worked out by this module in the ecosystem and if it doesn't break the ecosystem I think it would be a huge benefit to make an improvement to the Core pipe() API.

I do not see how that has changed. Back then, we wanted to move fast. Right now, we have a massive ecosystem to move forward.

That seems like an actual bug in the current implementation.

If it is a bug or not it is not really the point here. As things are now, pipe() is a low-level construct that does not handle errors. Even if we change it now, the ecosystem is full of modules built using readable-stream (for good reasons), which will retain the previous behavior. Because of this, we would have to change the pump module to be aware of this new behavior, and encourage users to keep using it. Changing pipe() does not solve the problem.

The current best practice is to use pump(), and it is very hard to implement a non-leaking stream-based application (most of them, as http/net/fs are based on streams) without using it at some point in the stack. This change is needed to be able to explain streams without using an user-land module. This PR consolidates the current best practice without any breakage to currently working code.

I'm perfectly fine if we decide to not have this in core, but it is a nice to have and it will simplify discovery for newbies. It would also mean that all the core components for using streams successfully will be released together.

I'm not arguing that a change this substantial would not break a few people, but have we actually quantified how much it would break? We have tools to help us figure this out (CITGM) and we can even stage the change in the next major's nightlies and introduce it in a version not slated for LTS: the upcoming v7. We're actually at an optimal time to try this. If the break is too large we can back it out.

The change as you propose it is a breaking change, so we cannot release it in a non-major release of Node. Also, it would create a lot of churn in the community, as we would have to bump semver-major on readable-stream, and then on every dependants.

If we run CITGM and a bunch of packages break we can drop this entire line of argument, but before we add a redundant API out of this particular fear we should try our best to quantify the effect it would have. We're no longer completely in the dark about how much ecosystem impact changes have and we don't need to make decisions without any data the way we have traditionally had to.

I'd be very happy to test an alternative implementation that does that. Even if there are no breakages (which I truly doubt), the point still remains: .pipe() is expected to behave in a certain manner right now, and a user would have either to check what type of pipe() the stream he is using implements to handle errors. It would complicate things.
I'm sure it would break some of my modules, because there is a lot of custom error handling there.

I'm 👎 on changing the behavior of pipe(), independently of the % of breaking modules, based on all the notes above. An alternative approach would be to state in the docs: do not use .pipe, just use https://github.com/mafintosh/pump everywhere. I prefer for that same component to be in core.


function isRequest(stream) {
return stream.setHeader && isFn(stream.abort);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a test that exercises http request and fs.

I think you can inline the typeof fs === 'function' check.

throw new Error('pump requires two streams per minimum');

let error;
const destroys = streams.map((stream, i) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to close over this  here, as it might be the global Stream  object.

return typeof fn === 'function';
}

function eos(stream, opts, _callback) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might also expose this, as it is another must-have.

function isChildProcess(stream) {
return stream.stdio &&
Array.isArray(stream.stdio) && stream.stdio.length === 3;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a test for child processes as well.

@mcollina
Copy link
Member

mcollina commented Jun 7, 2017

Isn't the consensus among stream experts that no one should never use require('stream'), and instead use require('readable-stream')?

Yes. When/If this is released, we will pick it up in 'readable-stream' as well.

@mikeal
Copy link
Contributor

mikeal commented Jun 7, 2017

No. Adds a higher-level API that does pipe() across a chain of streams, correctly handling errors.
pump() is a multi-stream concept, concentrating the 'error' along the chain in a single place

Describing it this way makes it sound like this is something that should be kept in the ecosystem.

I think a definitive question that we should ask is: If this is merged is there any case where we would advise users to use pipe() instead of pump()?

@mikeal
Copy link
Contributor

mikeal commented Jun 7, 2017

The change as you propose it is a breaking change, so we cannot release it in a non-major release of Node. Also, it would create a lot of churn in the community, as we would have to bump semver-major on readable-stream, and then on every dependants.

My suggestion was to stick it in the next major :)

@BridgeAR
Copy link
Member

@calvinmetcalf I guess it would be good if you could get around to pushing the changes you did so far :-)

@devsnek
Copy link
Member

devsnek commented Nov 22, 2017

as long as this isn't going to land yet, can I make a request that _stream_pump be moved to internal/streams/pump

@lpinca
Copy link
Member

lpinca commented Feb 3, 2018

Big 👍 on adding pump to core but I'm 👎 on changing pipe() behavior or suggesting to always use pump. In some cases pump is not the right tool. For example, mafintosh/pump#33 (comment).

@BridgeAR
Copy link
Member

BridgeAR commented Feb 7, 2018

@calvinmetcalf @mcollina @mafintosh I am going to close this PR due to no further progress in the next few days. If someone wants to pick this up and update it before that, that would be great!

@mcollina
Copy link
Member

mcollina commented Feb 7, 2018

Can you leave this open? Thanks.

@BridgeAR
Copy link
Member

I just rebased and also addressed a few comments but not all.

@BridgeAR BridgeAR removed the stalled Issues and PRs that are stalled. label Feb 18, 2018
@targos
Copy link
Member

targos commented Mar 7, 2018

I would really like to see this landed. What needs to be done?

@targos
Copy link
Member

targos commented Mar 7, 2018

Should we update the code to match the latest version of the pump module?

@mcollina
Copy link
Member

mcollina commented Mar 7, 2018

Definitely. Most importantly, we need to port the unit tests too.
This also depends on end-of-stream, which needs to receive the same treatment.

@prog1dev
Copy link
Contributor

prog1dev commented Mar 20, 2018

@mcollina @targos @calvinmetcalf Do you need any help with this? Or this one is only for experienced node core devs?

@mcollina
Copy link
Member

@prog1dev we need help, and yes, this is probably going to be an hard one! If you want to help, feel free to contribute anyway - we need all the help we can get

@@ -150,14 +150,14 @@ function destroyer(stream, reading, writing, _callback) {
return stream.abort();
// request.destroy just do .end - .abort is what we want

if (isFn(stream.destroy)) return stream.destroy();
if (isFn(stream.destroy)) return stream.destroy(err);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, this wont work on older streams, which is why it was not calling it with err before

// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was the license removed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BridgeAR
Copy link
Member

BridgeAR commented Apr 5, 2018

@mafintosh as a collaborator you can now push changes to this branch to make sure this gets in a good state. Alternatively you can of course also open a new PR.

@BridgeAR
Copy link
Member

Closing in favor of #19828

@BridgeAR BridgeAR closed this Apr 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stream Issues and PRs related to the stream subsystem. wip Issues and PRs that are still a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.