Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: add _flush to Writable streams #112

Closed
markstos opened this issue Feb 12, 2015 · 43 comments
Closed

Proposal: add _flush to Writable streams #112

markstos opened this issue Feb 12, 2015 · 43 comments

Comments

@markstos
Copy link

@TomFrost made a good case for this pre-fork:

nodejs/node-v0.x-archive#7631

There was also significant discussion previously in this issue:
nodejs/node-v0.x-archive#7348

@TomFrost also implemented a workaround that is published on npmjs.org:

https://www.npmjs.com/package/flushwritable

Perhaps it will be helpful in considering the issue.

I'm one of the other developers from the previous thread that would also find it to be useful.

@calvinmetcalf
Copy link
Contributor

This is something I've needed and had to implement my own half baked
implementation. I'd be happy to champion this (do we do that?)

On Thu, Feb 12, 2015, 11:57 AM Mark Stosberg [email protected]
wrote:

@TomFrost https://github.com/TomFrost made a good case for this
pre-fork:

nodejs/node-v0.x-archive#7631 nodejs/node-v0.x-archive#7631

There was also significant discussion previously in this issue:
nodejs/node-v0.x-archive#7348 nodejs/node-v0.x-archive#7348

@TomFrost https://github.com/TomFrost also implemented a workaround
that is published on npmjs.org:

https://www.npmjs.com/package/flushwritable

Perhaps it will be helpful in considering the issue.

I'm one of the other developers from the previous thread that would also
find it to be useful.


Reply to this email directly or view it on GitHub
#112.

@domenic
Copy link

domenic commented Feb 12, 2015

+1, got one of these in WHATWG streams already (under the name "close", although now I am contemplating whether flush is nicer).

@calvinmetcalf
Copy link
Contributor

The tricky thing is that transform flush is after writable half has
finished but before the readable one is finished. For writable ones we'd
need it to be earlier so either a breaking change or a different name like
close.

On Thu, Feb 12, 2015, 12:52 PM Domenic Denicola [email protected]
wrote:

+1, got one of these in WHATWG streams already (under the name "close",
although now I am contemplating whether flush is nicer).


Reply to this email directly or view it on GitHub
#112 (comment)
.

@chrisdickinson
Copy link
Contributor

@calvinmetcalf I'm a little confused by:

The tricky thing is that transform flush is after writable half has finished but before the readable one is finished.

My impression is that we would preserve and extend this behavior to Writables as well. Am I misunderstanding?

I am also curious: what is the order of events?

  1. emit prefinish
  2. _flush
  3. emit finish

OR:

  1. emit _flush
  2. prefinish
  3. emit finish

@vkurchatkin
Copy link

My impression is that we would preserve and extend this behavior to Writables as well. Am I misunderstanding?

there is no point in flushing writable after finish. the whole idea is to defer finish

@calvinmetcalf
Copy link
Contributor

@chrisdickinson like @vkurchatkin said we want to defer finish, adding in new function (like calling it close) that fires earlier would likely be the only backwards compatible way to do it, we could leave _flush in but depreciate it.

@vkurchatkin
Copy link

@calvinmetcalf I was thinking about _finish. I want _close to be used to close underlying resource after finish.

@sonewman
Copy link
Contributor

I am not that opinionated about this. Is there any reason why we couldn't move it from transform?

@calvinmetcalf
Copy link
Contributor

@vkurchatkin I was assuming closing resources would be a use case for this

@sonewman because the transform one happens after finish is emitted and we'd want it to delay finish being emitted.

@mafintosh
Copy link
Member

Why not call it ._end?
Similar to how .write calls ._write internally .end would call ._end before emitting finish

@calvinmetcalf
Copy link
Contributor

@mafintosh and then simply end when passing it to the simplified constructor

@mafintosh
Copy link
Member

@calvinmetcalf yea! and transforms would support both flush and end (different use cases). i know this would simplify a lot of my code.

@sonewman
Copy link
Contributor

@calvinmetcalf prefinish is emitted before finish

function prefinish(stream, state) {
  if (!state.prefinished) {
    state.prefinished = true;
    stream.emit('prefinish');
  }
}

function finishMaybe(stream, state) {
  var need = needFinish(stream, state);
  if (need) {
    if (state.pendingcb === 0) {
      prefinish(stream, state);
      state.finished = true;
      stream.emit('finish');
    } else
      prefinish(stream, state);
  }
  return need;
}

lib/_stream_writable.js#L464-L482

@sonewman
Copy link
Contributor

Also i don't think it would be right for _flush to be called after finish since it is usually where you would push some last bit of protocol data...

Having a close or ended event or implementation method, could be useful, but you wouldn't want to accept data during this time.

I do think we should start addressing these stream lifecycle things as part of a stream strategy similar to WHATWG underlyingSource or @chrisdickinson flows strategy idea.

@sonewman
Copy link
Contributor

The thing is, there is always going to be more, and more(,...and more,...) new things people are going to want, to instrument their stream in some specific way. Open up a model for the underlying sources and those things are easy for anyone to instrument to any requirement.

To be clear, I am not saying any of these ideas are bad, as they are generic and useful.

But IMO going forward we need to open up this cycle/internals for people to customise at their will, meaning the base of streams can be simple and avoid continuous scope creep.

@sonewman - gets off soap box

@calvinmetcalf
Copy link
Contributor

sorry it's that finish can be called before _flush is done if it's async. I seem to remember testing what would happen if you did delay it and it causing quite a few node transform streams to break.

@sonewman
Copy link
Contributor

@calvinmetcalf hmmm, interesting. I can see how that would be true, since _flush is called as of an event, then finish is triggered synchronously afterwards. If we moved _flush to the writable stream, we could remove the internal dependency on that event completely (although we would not be able to remove it, since it is used by http)

@mafintosh
Copy link
Member

the prefinish event isn't super useful since you can't do anything async before finish happens

@mafintosh
Copy link
Member

@sonewman @calvinmetcalf moving the flush function to writable streams is a major change since the flush function currently is called before the readable part of the stream ends (not the writable).

@calvinmetcalf
Copy link
Contributor

so _flush is part of the public api and when it is called is documented. This was why I was suggesting a new function name.

@sonewman
Copy link
Contributor

I think I am +1 on doing this, it seems like that would actually be useful.

@mafintosh prefinish is emitted in lib/_stream_writable.js#L467, forgive me if I am wrong, but I don't see why it would have any affect on the readable side of a transform stream.

@mafintosh
Copy link
Member

@sonewman my point is that the ._flush function used in transform streams is used to do stuff before the readable part of a stream ends - not the writable.

var transform = stream.Transform({
  transform: function (data, enc, cb) {
    cb(null, data)
  },
  flush: function(cb) {
    setTimeout(function () {
      transform.push('world')
      cb()
    }, 1000)
  }
})

transform.on('data', function (data) {
  console.log(data.toString())
})
transform.on('finish', function() {
  console.log('(finish)')
})
transform.on('end', function() {
  console.log('(end)')
})
transform.write('hello')
transform.end()

running the above will result in

hello
(finish)
world
(end)

@mafintosh
Copy link
Member

which is why i think having it being called something like end would be useful. this would allow you to async stuff before finish is emitted.

var stream = require('stream')

var transform = stream.Transform({
  transform: function (data, enc, cb) {
    cb(null, data)
  },
  end: function (cb) {
    setTimeout(function() {
      transform.push('before finish')
      cb()
    }, 1000)
  },
  flush: function (cb) {
    setTimeout(function () {
      transform.push('world')
      cb()
    }, 1000)
  }
})

transform.on('data', function (data) {
  console.log(data.toString())
})
transform.on('finish', function() {
  console.log('(finish)')
})
transform.on('end', function() {
  console.log('(end)')
})
transform.write('hello')
transform.end()

-->

hello
before finish
(finish)
world
(end)

@sonewman
Copy link
Contributor

@mafintosh whether something comes after _flush is probably a separate discussion.

But the point I am trying to make is that _flush when called, should allow for asynchronicity e.g.:

function finishWritable(stream, state) {
  if (state.pendingcb === 0) {
    state.finished = true;
    stream.emit('finish');
  }
}

function prefinish(stream, state) {
  if (!state.prefinished) {
    state.prefinished = true;
    stream.emit('prefinish');

    if ('function' === typeof stream._flush)
      stream._flush(() => finishWritable(stream, state));
    else
      finishWritable(stream, state);
  }
}

function finishMaybe(stream, state) {
  var need = needFinish(stream, state);
  if (need)
      prefinish(stream, state);

  return need;
}

@chrisdickinson
Copy link
Contributor

@vkurchatkin Ah, I see. Transform currently hooks onto prefinish, but flush does not block finish.

@mafintosh I'm -1 on calling it "end". Between "end" events, "end" methods, and the "finish" events, the term is a bit overloaded as it stands.

@sonewman I agree that the proposed flush should be able to "block" writables from emitting finish. This is currently incompatible with Transform#_flush, but what if we took advantage of the simplified stream constructor interface to give the old-style TransformSubclass.prototype._flush = <fn> different semantics from new-style Tranfrom({flush: (cb) => {}}) creation?

@chrisdickinson
Copy link
Contributor

Arg, yes, flush as a shorthand option has already been released. Changing that behavior would be breaking, though probably only minimally so due to how recently introduced it was.

Alternatively, Transform can have an "oddball" flush that cannot block "finish", while all other writable finish functions would be able to.

@mafintosh
Copy link
Member

@chrisdickinson having simple constructor flush and .prototype._flush do different things seems weird though. couldn't flush on Transforms just block both finish and end?

is there a use case for adding ._flush on Readable streams as well?

@chrisdickinson
Copy link
Contributor

@mafintosh Re: delaying finish until flush completes, quoting @calvinmetcalf:

sorry it's that finish can be called before _flush is done if it's async. I seem to remember testing what would happen if you did delay it and it causing quite a few node transform streams to break.

If we use the simplified constructor API to introduce this behavior, we can sidestep that breakage, but still introduce the functionality for all Writables (duplexes and transforms included) It is a little weird that new Transform({flush: (cb) => {}}) will be different than t = new Transform; t._flush = () => {}, but we should strongly discourage use of the latter usage pattern for all streams going forward.


I just noticed this when rereading the thread, but one of the originally linked issues suggested this feature for the purposes of batch writes – specifically, so that a batching writer can be made aware of the end of the stream and output the rest of its data before close. However, we already have a mechanism for batching writes – writev, cork, and uncork – which should make a userland batching writestream superfluous.

Are there other valid uses for this feature outside of batch writes?

@mafintosh
Copy link
Member

I have a couple of use cases myself:

In pumpify (turns multiple writable/readable streams into a single one) I don't want to emit finish until the last stream in the pipeline has emitted finish after all writes are done. Since we don't have a flush function on writable streams I do a bunch of hacks to do this.

In content-addressable-blob-store we write to a tmp file that we want to move somewhere before emitting finish after all writes are done.

@mafintosh
Copy link
Member

@chrisdickinson @calvinmetcalf we could just break prototype._flush and bump the major version \o/

@calvinmetcalf
Copy link
Contributor

@chrisdickinson writev, cork, and uncork don't actually do the trick because they must be explicitly called by the caller so don't work cases like when the stream is being piped too.

Besides batch writes there is also closing underlying resources hence why the name _close came up. While _end is a bit confusing due to the event it d ones fit with the pattern of write, writev, and read.

@mafintosh I totally don't think it's worth a major version bump, I'd call it __flush before bumping the major.

@sonewman
Copy link
Contributor

Hmm, i don't know. I can't see how finish would be emitted asynchronous in a transform stream, but perhaps I am missing something. I guess it could break any modules, which were rely on a transform to emit that it is finished before it flushes...

@chrisdickinson
Copy link
Contributor

Noting this here: it seems like there's interest in flush for the purposes of net.Socket cleanup in core (per @indutny in the io.js IRC channel.)

@indutny
Copy link
Member

indutny commented Mar 16, 2015

I will expand my thoughts:

We have a problem with TLS streams in core at the moment. Current implementation calls socket.shutdown() on finish event, and in many cases does not wait for the completion before destroying the socket. This thing is totally fine, unless you are doing TLS, because TLS sends additional packet on shutdown() and prematurely closing the socket will lead to errors, or non-graceful destruction of connection.

_flush would help there a lot.

@indutny
Copy link
Member

indutny commented Mar 16, 2015

Yeah, and my use case needs callback.

@chrisdickinson
Copy link
Contributor

@indutny Curious: if there's a write going out during flush, would you expect flush to be called again once that write has completed?

@indutny
Copy link
Member

indutny commented Mar 16, 2015

@chrisdickinson nope. In case of TLS, this write is happening internally in C++.

@indutny
Copy link
Member

indutny commented Mar 16, 2015

Just in case, I'm going to stub out some implementation and share it with you guys.

@indutny
Copy link
Member

indutny commented Mar 16, 2015

See nodejs/node#1164

@indutny
Copy link
Member

indutny commented Mar 16, 2015

So far I haven't reunified it with transform, so it is called __flush as suggested here.

@indutny
Copy link
Member

indutny commented Mar 17, 2015

Finished the PR there, PTAL.

@mcollina
Copy link
Member

This is currently being proposed in nodejs/node#12828.

@calvinmetcalf
Copy link
Contributor

merged into node as final

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants