[WIP] Putting a chunk back in the readable stream queue #275

Closed
wants to merge 1 commit

Conversation

@domenic (Member) commented Feb 3, 2015:

See #3, and especially the real-world use case in the MSE spec, step 9.

This doesn't contain any spec updates or examples yet, just a prototype in the reference implementation, including some fairly exhaustive tests. Suggestions for more tests are welcome!

Thoughts:

  • putBack was a name thrown around in #3 ("Is Node's "unshift" necessary?"). It's better than Node's "unshift", certainly. The Wikipedia double-ended queue article is entirely unhelpful (and we're using read() instead of dequeue() anyway). One idea that I came up with and am starting to like is rs.unRead(chunk). (Or should it be unread, no capital R?) I might go with that if nobody objects or comes up with something better.
  • It might be nice to let the underlying source provide some sort of hook that gets called upon putBack. This could allow things like: causing any array buffers that are put back to be transferred (see the discussion in #3), validating that all chunks in a stream stay the same type, or just disallowing putBack altogether for a given stream. I am hesitant to add this until someone explicitly asks for it, though. Anyone want to convince me it's a really good idea, or shall we leave it for later?
  • Unlike enqueue, right now putBack does not return any backpressure-indicating value. We could easily add it, but I am not sure it should be the consumer's responsibility to worry about whether they're putting back too much data. putBack is supposed to be a pretty rare, out-of-band occurrence, and not part of the normal backpressure negotiation protocol. (See the usage sketch after this list.)
  • How does this extend to ReadableByteStream, exactly?
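
A minimal consumer-side sketch of how the proposed operation might be used in an MSE-like situation. It assumes Uint8Array chunks, a synchronous rs.read() that returns the next chunk, and the putBack name from this PR; all of these are illustrative assumptions, not settled API:

// Hypothetical MSE-style consumer: append at most maxSize bytes to a
// SourceBuffer, and put the unconsumed remainder back at the front of the
// readable stream's queue so the next read sees it first.
function appendUpTo(rs, sourceBuffer, maxSize) {
  const chunk = rs.read();
  if (chunk.byteLength <= maxSize) {
    sourceBuffer.appendBuffer(chunk);                      // consume the whole chunk
  } else {
    sourceBuffer.appendBuffer(chunk.subarray(0, maxSize)); // consume a prefix
    rs.putBack(chunk.subarray(maxSize));                   // remainder is re-queued at the front
  }
}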

@domenic added this to the Week of 2015-02-02 milestone on Feb 3, 2015
@@ -159,6 +159,42 @@ export function IsReadableStream(x) {
   return true;
 }
 
+export function PutBackIntoReadableStream(stream, chunk) {
+  if (stream._state === 'closed') {
+    throw new TypeError('stream is closed');
Member:
I'm a bit worried about whether not supporting this corner case would be problematic for any use case.

Maybe it happens in the MSE case, too: all the data has been read, but we have remaining data we want to put back.

Member Author:

I think you're right. I'm not sure what the right solution would be though :(. Maybe, after reading the last chunk from the queue, delay for a turn before deciding to close the stream? That gives you a one-turn grace period.

Member Author:

Hmm, but that kind of sucks because what if you want to asynchronously make the decision to put it back or not?

Maybe we need .read({ preventClose: true })?? But then how do you close it later? .cancel() I guess??? Icky.

Member:

It seems the MSE issue should be addressed by ReadableByteStream, i.e. by specifying the size argument based on maxSize?

@wanderview (Member):

Is there a reason we can't use the "peek" concept instead of "read and put back"?

@domenic (Member Author) commented Feb 6, 2015:

@wanderview I was initially hoping to do that, but the MSE case, and any like it, don't work. (MSE wants to put back a portion of the chunk.) Maybe @tyoshino is right and ReadableByteStream works better for that particular case... I'm worried about cases like it, though.

@wanderview (Member):

@domenic Can't it "peek", see how much data it would actually consume, and then "read" just that many bytes? As far as I can tell it's not changing the order of any bytes here. I guess this is more conducive to the ReadableByteStream concept.

Alternatively, we could just force corner cases like this to be implemented as a wrapper. It doesn't put the data back on the underlying source stream, but locally buffers the remaining data. It then consults this local buffer first on its next pass through its algorithm.
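
A rough sketch of that wrapper idea, assuming a simple synchronous read() on the wrapped stream; the class and method names here are purely illustrative:

// Illustrative wrapper: nothing is ever put back on the underlying stream.
// Unconsumed data is stashed locally and served first on the next read.
class BufferingReader {
  constructor(stream) {
    this._stream = stream;
    this._leftover = undefined;
  }

  read() {
    if (this._leftover !== undefined) {
      const chunk = this._leftover;   // serve the stashed remainder first
      this._leftover = undefined;
      return chunk;
    }
    return this._stream.read();       // otherwise fall through to the real stream
  }

  stash(remainder) {
    this._leftover = remainder;       // the consumer calls this instead of putBack()
  }
}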

@wanderview (Member):

Or for the chunk-oriented ReadableStream, offer "peek" and the MSE consumer then keeps state about "skip this many bytes in the next chunk" to simulate the put-back.
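
A sketch of that pattern, assuming a hypothetical chunk-oriented rs.peek() that returns the head chunk without dequeuing it, Uint8Array chunks, and a synchronous rs.read(); none of these are existing API:

// Consumer-side state: how many bytes of the head chunk are already consumed.
// consume is a callback standing in for whatever the consumer does with the
// bytes, e.g. (bytes) => sourceBuffer.appendBuffer(bytes).
let skip = 0;

function consumeSome(rs, consume, maxSize) {
  const head = rs.peek();
  const unread = head.subarray(skip);
  const take = Math.min(unread.byteLength, maxSize);
  consume(unread.subarray(0, take));
  if (skip + take === head.byteLength) {
    rs.read();      // head chunk fully consumed: actually dequeue it now
    skip = 0;
  } else {
    skip += take;   // simulate the put-back by remembering the offset
  }
}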

@tyoshino (Member) commented Feb 9, 2015:

For RBSs that are not ArrayBuffer-queue-backed, we could add an attribute on ReadableByteStream, say .amountReadable, with which we can probe how much data can be read out synchronously. (If the underlying byte source is a socket, we can't determine that without read(2)-ing, so I'd like it to be able to return something indicating "I don't know".) This allows us to allocate an ArrayBuffer that is sufficient but not too big to pass to .readInto().

@wanderview If the ReadableByteStream is queue-backed (and ArrayBuffers are stored in the queue) like ReadableStream, then we want to look at the queued ArrayBuffers as-is rather than read into an ArrayBuffer we bring ourselves. Maybe this is what motivated you to suggest peek.

The benefits of .peek() are the following:

  1. reduce buffer/state management by relying on the Streams library
  2. [For ArrayBuffer-queue-backed RBSs] read data directly from the ArrayBuffers in the queue, but without telling the stream to pull as many bytes as were read

(1) can be realized with .readInto() and .amountReadable as well; (2) cannot.

For non-ArrayBuffer-queue-backed RBSs, .readInto() + .amountReadable would be:

  1. more efficient in terms of the number of copies if the destination API wants to receive a single contiguous ArrayBuffer (not fragments represented as a list of ArrayBufferViews)
  2. more efficient in terms of the number of ArrayBuffer allocations

In my old W3C Streams API spec, I allowed users to control precisely how many bytes to pull, rather than replenishing the quota for all the bytes in the output of read(). Do you think we should just add .peek(), or would more precise quota-replenishment control be useful, considering the complexity?

I guess addition of .peek() doesn't complicate the library much. I'm ok with that.
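
For comparison, a rough sketch of the readInto() + amountReadable approach on ReadableByteStream. Both names are proposals still under discussion in this thread, and the sketch assumes a synchronous readInto(arrayBuffer, offset, size) that returns the number of bytes written:

// Size the destination buffer from amountReadable (capped at maxSize) so that
// no bytes ever need to be put back or peeked at.
function readAtMost(rbs, maxSize) {
  const amount = rbs.amountReadable;   // may report "unknown" for e.g. socket-backed sources
  const size = Math.min(typeof amount === 'number' && amount > 0 ? amount : maxSize, maxSize);
  const buffer = new ArrayBuffer(size);
  const bytesRead = rbs.readInto(buffer, 0, size);
  return new Uint8Array(buffer, 0, bytesRead);
}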

@tyoshino (Member) commented Feb 9, 2015:

One more point to consider about .peek() is that if the underlying byte source doesn't support precise flow-control, we cannot peek data from the source without draining data. E.g. we need to read(2) from a socket to build an ArrayBuffer for .peek(), and the read(2) would replenish the TCP window.

@wanderview (Member):

I'm not insisting on peek(). I just think it might be cleaner than trying to implement an unshift operation (particularly given the stream closed state).

I think I agree with your statement that a way to read a precise number of bytes (readInto + amountReadable) is sufficient for ReadableByteStream. Most byte-stream algorithms will work just fine with this since it's how BSD socket streams, etc., work today.

I just dislike the unshift for ReadableStream. My reasoning goes something like this: (Sorry for stream of consciousness list... still early for me...)

  1. "normal" stream mutation is putting things on the front and taking them off the back.
  2. For ReadableStream it seems the "chunks" are effectively opaque types to the stream.
  3. Reading just a few bytes from a ReadableStream chunk and then putting the remainder back is effectively mutating the last chunk in the ReadableStream. To me this is "not normal" mutation for the stream. The script could put back anything in that last chunk; even a completely different type.
  4. Using peek() avoids allowing this non-normal mutation. If the client code wants to do sub-chunk splitting, it can do that itself.
  5. Let's not contort the ReadableStream API for an algorithm that's really byte-based, not chunk-based.

So what about removing our version of unshift and adding peek() just for the ReadableStream? ReadableByteStream would not have either of these methods.

Of course we could also survey npm packages to see how often their stream unshift is used. If it's particularly popular, maybe we keep it in with the caveat that it can fail if the stream closes on the last read.

@domenic (Member Author) commented Feb 9, 2015:

Yeah. Given that unshift() doesn't even work for the MSE case I think it's best to drop it.

I'm not even sure we need peek(). We don't really have a use case yet---everything is better handled by ReadableByteStream.

> Of course we could also survey npm packages to see how often their stream unshift is used. If it's particularly popular, maybe we keep it in with the caveat that it can fail if the stream closes on the last read.

On the other hand, this is a good point. And I still haven't even checked how Node's unshift interacts with Node streams closing. So I should do a quick survey of this kind of thing.

@tyoshino (Member):

Created PR #279 to add a readableAmount getter.

@domenic (Member Author) commented Feb 24, 2015:

For the record, Node gives you one turn after calling close() in which you can resurrect the stream with an unshift.

Closing since this is probably a dead-end.

@domenic closed this on Feb 24, 2015
@domenic deleted the putBack branch on June 29, 2015