
fix: use pipeline over stream.pipe #9819

Merged
merged 4 commits into nestjs:master from fix/pipeline-over-pipe on Jul 20, 2022

Conversation

@jmcdo29 (Member) commented Jun 21, 2022

PR Checklist

Please check if your PR fulfills the following requirements:

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Other... Please describe:

What is the current behavior?

If a request is cancelled during a StreamableFile response, the stream is not properly closed, which can lead to memory leaks and, if enough of them accumulate, server failures.

Issue Number: #9759

What is the new behavior?

pipeline destroys the streams involved if an error occurs in any of them, so errored-out streams can no longer leak memory. This PR also adds an error handler to StreamableFile: stream errors now return a 400 by default, and the handler can be customized to return whatever error message a developer would like. These changes only affect the Express adapter, because Fastify already handles streams internally.

Does this PR introduce a breaking change?

  • Yes
  • No

Other information

fix: #9759

@coveralls commented Jun 22, 2022

Pull Request Test Coverage Report for Build 85be597d-e53e-43e6-a6dc-a62eb5acd350

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-94.09%) to 0.0%

Totals Coverage Status
Change from base Build 989d7a11-e284-4a37-8a57-6d51089adf11: -94.09%
Covered Lines: 0
Relevant Lines: 0

💛 - Coveralls

@kamilmysliwiec (Member)

Tests are failing atm

[screenshot: failing test output]

@jmcdo29 (Member, Author) commented Jun 22, 2022

Tests are failing atm

[screenshot: failing test output]

I noticed, but I'm not sure why. They all pass locally, as long as npm run build is run first. Is that what the integration flow should be? Right now that's not what we're doing in the integration step in CircleCI:

[screenshot: CircleCI integration step]

@kamilmysliwiec (Member)

They all pass locally, as long as npm run build is run first. Is that what the integration flow should be?

npm run build is called as part of the "Prepare tests" step

@jmcdo29 force-pushed the fix/pipeline-over-pipe branch from c2b2cfd to 18c27cd on June 23, 2022
@jmcdo29 (Member, Author) commented Jun 23, 2022

They all pass locally, as long as npm run build is run first. Is that what the integration flow should be?

npm run build is called as part of the "Prepare tests" step

Sure enough, there was something going on with my local build that ran fine for me but not in CI. Got it fixed up; should be good once CI resolves now.

@alexd6631

Hello @jmcdo29,

I have some doubts about the way error handling is implemented in this PR.

I think it will work fine if the error happens at the beginning of the stream, for instance if a file is not present, as in your test suite. But an error could also happen anytime mid-stream (imagine your streamable file streams data from an S3 service, but the S3 service crashes or becomes unreachable). In that case you will already have sent 200 OK headers and some content to the client, and it would be too late to send a 400. (Also, IMHO 400 is a client "bad request" error and seems inappropriate as a default.)

To my knowledge, HTTP doesn't have a mechanism for signaling an error once the headers and part of the content have been sent. So I think the "least worst" approach would be to close the client socket from the server, so the client is notified that something is wrong with the streaming (by receiving an IO error) and can retry later.

If we just end the response in case of error, the client will not be able to differentiate between a normal stream completion and an error, leaving it with truncated content without noticing.

An alternative for the client would be to compare the Content-Length with the actual streamed content; however, it is not always possible to have a Content-Length (we may not know the length of the stream, and it could be generated dynamically).

Let me know if you have a better way to handle this. I think adding a test case with a stream failing in a deferred way would be a good way to experiment with possible solutions.

Best regards,
Alex

@alexd6631

@jmcdo29

As an afterthought to my last comment: in case of an error in the source (here the StreamableFile), pipeline will destroy the destination stream (here the Express Response). So in this case the client socket should be destroyed (because the Express response is an http.OutgoingMessage).

So in all cases (error at start or mid-stream) the socket will be closed. (Note this may not be optimal from the point of view of keep-alive in the start-error case, because there we should be able to emit the error in the headers and end the response properly, without killing the underlying socket. A rough idea to implement that behavior would be to not use pipeline, but pipe plus some manual event handling.)

So in the error handler, maybe it would be enough to check headersSent before sending the 400, to differentiate a start error from a mid-stream error.
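A minimal sketch of that headersSent check (handleStreamError is a hypothetical name; the 400 default comes from the PR description, not from this comment):

```javascript
// Decide how to report a stream error to the client:
// - before headers are sent, a normal error status can still go out;
// - after headers are sent, destroying the socket is the only way to
//   signal that the body is truncated.
function handleStreamError(err, response) {
  if (!response.headersSent) {
    response.statusCode = 400;
    response.end(err.message);
  } else {
    // Mid-stream failure: abort the connection.
    response.destroy(err);
  }
}
```

With a real http.ServerResponse, response.destroy() tears down the underlying socket, so the client observes an IO error instead of a clean end-of-body.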

@jmcdo29 (Member, Author) commented Jun 24, 2022

pipeline will destroy the destination stream (here the express Response)

That's actually why body.getStream() gets an added .on('error', (err: Error) => { body.errorHandler(err, response); }): if an error happens when the stream is created, an error response can still be sent before the response stream is closed.

A rough idea to implement that behavior would be to not use pipeline

Even though the original issue of #9759 is to use pipeline instead of pipe?

but pipe and some manually event handling

Got an idea of what this would look like?

@alexd6631 commented Jun 24, 2022

Taking inspiration from this article:
https://www.alxolr.com/articles/understanding-memory-leaks-in-node-js-part-1

pipeline is roughly equivalent to:

source.pipe(destination);

destination.on('close', () => {
   source.destroy();
});

destination.on('error', () => {
   source.destroy();
});

source.on('error', () => {
  destination.destroy();
});

source.on('end', () => {
  destination.destroy();
});

So we could do something like this (pseudocode):

source.pipe(destination);

destination.on('close', () => {
   source.destroy();
});

destination.on('error', (e) => {
   source.destroy(e);
});

source.on('error', (e) => {
  if (!destination.headersSent) { // Handle the early error
    destination.sendStatus(400);
  } else { // Mid-stream error: close the socket abruptly to notify the client
    destination.destroy(e);
  }
});

source.on('end', () => {
  // destination.destroy()
  // Contrary to the article, it should not be necessary to destroy the
  // destination stream on successful completion, so I think this handler
  // is not useful.
});

EDIT: I am not sure about the original implementation in the article, and I made some minor modifications based on a quick experiment with local streams, so of course take it as a way to draft the idea, not as working code ^^

@kamilmysliwiec (Member)

Is there anything left we need to implement here @jmcdo29? 🙌

@micalevisk mentioned this pull request on Jul 17, 2022
@jmcdo29 (Member, Author) commented Jul 17, 2022

Let me make the changes @micalevisk suggested, and then this should be good to merge. If we need to modify it again later to take other features into account, that's always an option.

@Tony133 (Contributor) commented Jul 18, 2022

I think it refers to the changes made in the last commit

[screenshot 2022-07-18 at 08:06:56]

@jmcdo29 force-pushed the fix/pipeline-over-pipe branch from 4f42aae to 248596b on July 18, 2022
@jmcdo29 (Member, Author) commented Jul 18, 2022

I think it refers to the changes made in the last commit

[screenshot 2022-07-18 at 08:06:56]

Just saw this and just pushed a fix; give it a minute.

@kamilmysliwiec kamilmysliwiec merged commit 429dfa1 into nestjs:master Jul 20, 2022
@kamilmysliwiec (Member)

LGTM


Successfully merging this pull request may close these issues.

StreamableFile pipe can leak resources if response is closed/errored prematurely
7 participants