[WIP] Putting a chunk back in the readable stream queue #275

Closed
wants to merge 1 commit

Conversation

@domenic (Member) commented Feb 3, 2015:

See #3, and especially the real-world use case in the MSE spec, step 9.

This doesn't contain any spec updates or examples yet, just a prototype in the reference implementation, including some fairly exhaustive tests. Suggestions for more tests are welcome!

Thoughts:

  • putBack was a name thrown around in #3 ("Is Node's "unshift" necessary?"). It's better than Node's "unshift", certainly. The Wikipedia double-ended queue article is entirely unhelpful (and we're using read() instead of dequeue() anyway). One idea that I came up with and am starting to like is rs.unRead(chunk). (Or should it be unread, no capital R?) I might go with that if nobody objects or comes up with something better.
  • It might be nice to let the underlying source provide some sort of hook that gets called upon putBack. This could allow things like: causing any array buffers that are put back to be transferred (see the discussion in #3), validating that all chunks in a stream stay the same type, or just disallowing putBack altogether for a given stream. I am hesitant to add this until someone explicitly asks for it, though. Anyone want to convince me it's a really good idea, or shall we leave it for later?
  • Unlike enqueue, right now putBack does not return any backpressure-indicating value. We could easily add it, but I am not sure it should be the consumer's responsibility to worry about whether they're putting back too much data. putBack is supposed to be a pretty rare, out-of-band occurrence, and not part of the normal backpressure negotiation protocol. (See the usage sketch after this list.)
  • How does this extend to ReadableByteStream, exactly?
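
A minimal consumer-side sketch of how the proposed operation might be used in an MSE-like situation. It assumes Uint8Array chunks, a synchronous rs.read() that returns the next chunk, and the putBack name from this PR; all of these are illustrative assumptions, not settled API:

// Hypothetical MSE-style consumer: append at most maxSize bytes to a
// SourceBuffer, and put the unconsumed remainder back at the front of the
// readable stream's queue so the next read sees it first.
function appendUpTo(rs, sourceBuffer, maxSize) {
  const chunk = rs.read();
  if (chunk.byteLength <= maxSize) {
    sourceBuffer.appendBuffer(chunk);                      // consume the whole chunk
  } else {
    sourceBuffer.appendBuffer(chunk.subarray(0, maxSize)); // consume a prefix
    rs.putBack(chunk.subarray(maxSize));                   // remainder is re-queued at the front
  }
}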

@domenic added this to the Week of 2015-02-02 milestone on Feb 3, 2015
@@ -159,6 +159,42 @@ export function IsReadableStream(x) {
   return true;
 }
 
+export function PutBackIntoReadableStream(stream, chunk) {
+  if (stream._state === 'closed') {
+    throw new TypeError('stream is closed');
Member:
I'm a bit worried about whether not supporting this corner case would be problematic for any use case.

Maybe it happens in the MSE case, too: all the data has been read, but we have remaining data we want to put back.

Member Author:

I think you're right. I'm not sure what the right solution would be though :(. Maybe, after reading the last chunk from the queue, delay for a turn before deciding to close the stream? That gives you a one-turn grace period.

Member Author:

Hmm, but that kind of sucks because what if you want to asynchronously make the decision to put it back or not?

Maybe we need .read({ preventClose: true })?? But then how do you close it later? .cancel() I guess??? Icky.

Member:

It seems the MSE issue should be addressed by ReadableByteStream, i.e. by specifying the size argument based on maxSize?

@wanderview (Member):

Is there a reason we can't use the "peek" concept instead of "read and put back"?

@domenic (Member Author) commented Feb 6, 2015:

@wanderview I was initially hoping to do that, but the MSE case, and any like it, don't work. (MSE wants to put back a portion of the chunk.) Maybe @tyoshino is right and ReadableByteStream works better for that particular case... I'm worried about cases like it, though.

@wanderview (Member):

@domenic Can't it "peek", see how much data it would actually consume, and then "read" just that many bytes? As far as I can tell it's not changing the order of any bytes here. I guess this is more conducive to the ReadableByteStream concept.

Alternatively, we could just force corner cases like this to be implemented as a wrapper. It doesn't put the data back on the underlying source stream, but locally buffers the remaining data. It then consults this local buffer first on its next pass through its algorithm.
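
A rough sketch of that wrapper idea, assuming a simple synchronous read() on the wrapped stream; the class and method names here are purely illustrative:

// Illustrative wrapper: nothing is ever put back on the underlying stream.
// Unconsumed data is stashed locally and served first on the next read.
class BufferingReader {
  constructor(stream) {
    this._stream = stream;
    this._leftover = undefined;
  }

  read() {
    if (this._leftover !== undefined) {
      const chunk = this._leftover;   // serve the stashed remainder first
      this._leftover = undefined;
      return chunk;
    }
    return this._stream.read();       // otherwise fall through to the real stream
  }

  stash(remainder) {
    this._leftover = remainder;       // the consumer calls this instead of putBack()
  }
}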

@wanderview (Member):

Or for the chunk-oriented ReadableStream, offer "peek" and the MSE consumer then keeps state about "skip this many bytes in the next chunk" to simulate the put-back.
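
A sketch of that pattern, assuming a hypothetical chunk-oriented rs.peek() that returns the head chunk without dequeuing it, Uint8Array chunks, and a synchronous rs.read(); none of these are existing API:

// Consumer-side state: how many bytes of the head chunk are already consumed.
// consume is a callback standing in for whatever the consumer does with the
// bytes, e.g. (bytes) => sourceBuffer.appendBuffer(bytes).
let skip = 0;

function consumeSome(rs, consume, maxSize) {
  const head = rs.peek();
  const unread = head.subarray(skip);
  const take = Math.min(unread.byteLength, maxSize);
  consume(unread.subarray(0, take));
  if (skip + take === head.byteLength) {
    rs.read();      // head chunk fully consumed: actually dequeue it now
    skip = 0;
  } else {
    skip += take;   // simulate the put-back by remembering the offset
  }
}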

@tyoshino (Member) commented Feb 9, 2015:

For RBSs that are not ArrayBuffer-queue-backed, we could add an attribute on ReadableByteStream, say .amountReadable, with which we can probe how much data can be read out synchronously. (If the underlying byte source is a socket, we can't determine that without read(2)-ing, so I'd like it to be able to return something indicating "I don't know".) This allows us to allocate an ArrayBuffer that is sufficient but not too big to pass to .readInto().

@wanderview If the ReadableByteStream is queue-backed (and ArrayBuffers are stored in the queue) like ReadableStream, then we want to look at the queued ArrayBuffers as-is rather than read into an ArrayBuffer we bring ourselves. Maybe this is what motivated you to suggest peek.

The benefits of .peek() are the following:

  1. reduce buffer/state management by relying on the Streams library
  2. [For ArrayBuffer-queue-backed RBSs] read data directly from the ArrayBuffers in the queue, but without telling the stream to pull as many bytes as were read

(1) can be realized with .readInto() and .amountReadable as well; (2) cannot.

For non-ArrayBuffer-queue-backed RBSs, .readInto() + .amountReadable would be:

  1. more efficient in terms of the number of copies if the destination API wants to receive a single contiguous ArrayBuffer (not fragments represented as a list of ArrayBufferViews)
  2. more efficient in terms of the number of ArrayBuffer allocations

In my old W3C Streams API spec, I allowed users to control precisely how many bytes to pull, rather than replenishing the quota for all the bytes in the output of read(). Do you think we should just add .peek(), or would more precise quota-replenishment control be useful, considering the complexity?

I guess addition of .peek() doesn't complicate the library much. I'm ok with that.
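
For comparison, a rough sketch of the readInto() + amountReadable approach on ReadableByteStream. Both names are proposals still under discussion in this thread, and the sketch assumes a synchronous readInto(arrayBuffer, offset, size) that returns the number of bytes written:

// Size the destination buffer from amountReadable (capped at maxSize) so that
// no bytes ever need to be put back or peeked at.
function readAtMost(rbs, maxSize) {
  const amount = rbs.amountReadable;   // may report "unknown" for e.g. socket-backed sources
  const size = Math.min(typeof amount === 'number' && amount > 0 ? amount : maxSize, maxSize);
  const buffer = new ArrayBuffer(size);
  const bytesRead = rbs.readInto(buffer, 0, size);
  return new Uint8Array(buffer, 0, bytesRead);
}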

@tyoshino (Member) commented Feb 9, 2015:

One more point to consider about .peek() is that if the underlying byte source doesn't support precise flow-control, we cannot peek data from the source without draining data. E.g. we need to read(2) from a socket to build an ArrayBuffer for .peek(), and the read(2) would replenish the TCP window.

@wanderview (Member):

I'm not insisting on peek(). I just think it might be cleaner than trying to implement an unshift operation (particularly given the stream closed state).

I think I agree with your statement that a way to read a precise number of bytes (readInto + amountReadable) is sufficient for ReadableByteStream. Most byte-stream algorithms will work just fine with this since it's how BSD socket streams, etc., work today.

I just dislike the unshift for ReadableStream. My reasoning goes something like this: (Sorry for stream of consciousness list... still early for me...)

  1. "normal" stream mutation is putting things on the front and taking them off the back.
  2. For ReadableStream it seems the "chunks" are effectively opaque types to the stream.
  3. Reading just a few bytes from a ReadableStream chunk and then putting the remainder back is effectively mutating the last chunk in the ReadableStream. To me this is "not normal" mutation for the stream. The script could put back anything in that last chunk; even a completely different type.
  4. Using peek() avoids allowing this non-normal mutation. If the client code wants to do sub-chunk splitting, it can do that itself.
  5. Let's not contort the ReadableStream API for an algorithm that's really byte-based, not chunk-based.

So what about removing our version of unshift and adding peek() just for the ReadableStream? ReadableByteStream would not have either of these methods.

Of course we could also survey npm packages to see how often their stream unshift is used. If it's particularly popular, maybe we keep it in with the caveat that it can fail if the stream closes on the last read.

@domenic (Member Author) commented Feb 9, 2015:

Yeah. Given that unshift() doesn't even work for the MSE case I think it's best to drop it.

I'm not even sure we need peek(). We don't really have a use case yet---everything is better handled by ReadableByteStream.

> Of course we could also survey npm packages to see how often their stream unshift is used. If it's particularly popular, maybe we keep it in with the caveat that it can fail if the stream closes on the last read.

On the other hand, this is a good point. And I still haven't even checked how Node's unshift interacts with Node streams closing. So I should do a quick survey of this kind of thing.

@tyoshino (Member):

Created PR #279 to add a readableAmount getter.

@domenic (Member Author) commented Feb 24, 2015:

For the record, Node gives you one turn after calling close() in which you can resurrect the stream with an unshift.

Closing since this is probably a dead-end.

@domenic closed this on Feb 24, 2015
@domenic deleted the putBack branch on June 29, 2015