Feature Proposal: Buffered read stream with direct buffer access #66610

geoffkizer · 2022-03-14T19:41:49Z

It's very common to want to consume a Stream by reading it into a buffer and then consuming the bytes in the buffer progressively. The framework itself does this in multiple places -- e.g. SocketsHttpHandler, SslStream, DeflateStream, etc.

See the "peek buffer" item listed here: #58216

Doing this properly is not a trivial task. We should make this easier by adding something like a ReadBufferStream class. This class would manage a read buffer on top of an underlying Stream and provide direct access to the read buffer as a Memory<byte>. Since it derives from Stream itself as well, it can also be passed to other consumers of Stream, or easily be delegated to by a higher-layer implementation of Stream.

Here's an API sketch. This is derived largely from looking at how read buffering is managed in SocketsHttpHandler and SslStream currently.

public class ReadBufferStream : Stream
{
  // Note: This is a read-only Stream. CanWrite/CanSeek return false and methods like Write() etc throw NotSupportedException.

  public ReadBufferStream(Stream innerStream, int bufferCapacity = 8192);

  // Ensure the read buffer contains at least the specified number of bytes, by repeatedly reading from the underlying stream
  // Note, if the read buffer already contains sufficient bytes, then this returns immediately.
  // Returns false if EOF is hit before the specified number of bytes can be read
  // Throws InvalidOperationException if specified number of bytes is greater than the buffer capacity.
  public bool FillBuffer(int minBufferedBytes = 1);
  public ValueTask<bool> FillBufferAsync(int minBufferedBytes = 1, CancellationToken ct = default);
  
  // Returns a Memory<byte> over the currently buffered bytes (may be empty if no bytes are buffered)
  // This is only valid until the next call to FillBuffer[Async] or Consume (below)
  public Memory<byte> ReadBuffer { get; }

  // Removes the first [byteCount] bytes from the read buffer.
  // The next call to ReadBuffer will return a Memory<byte> starting from the next non-consumed byte.
  // Throws InvalidOperationException if byteCount > ReadBuffer.Length.
  public void ConsumeBytes(int byteCount);

  public int BufferCapacity { get; }

  // Grow or shrink the buffer capacity.
  // Existing buffered bytes will be copied to the new buffer.
  // Throws InvalidOperationException if the specified capacity < ReadBuffer.Length.
  public void ResizeBuffer(int newCapacity);

  // Note on Stream method implementations:
  // Read/ReadAsync should read from the existing buffered bytes first.
  // If there are no buffered bytes, then Read/ReadAsync can either read into the local read buffer or directly into the user specified buffer.
  // If the size of the user's buffer is greater than the local buffer capacity, we may as well read directly into it.
  // If it's not, then reading into the local buffer may allow more bytes to be read from the underlying stream in a single call.
}

The text was updated successfully, but these errors were encountered:

ghost · 2022-03-14T19:41:54Z

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

It's very common to want to consume a Stream by reading it into a buffer and then consuming the bytes in the buffer progressively. The framework itself does this in multiple places -- e.g. SocketsHttpHandler, SslStream, DeflateStream, etc.

See the "peek buffer" item listed here: #58216

Doing this properly is not a trivial task. We should make this easier by adding something like a ReadBufferStream class. This class would manage a read buffer on top of an underlying Stream and provide direct access to the read buffer as a Memory<byte>. Since it derives from Stream itself as well, it can also be passed to other consumers of Stream, or easily be delegated to by a higher-layer implementation of Stream.

Here's an API sketch. This is derived largely from looking at how read buffering is managed in SocketsHttpHandler and SslStream currently.

public class ReadBufferStream : Stream
{
  // Note: This is a read-only Stream. CanWrite/CanSeek return false and methods like Write() etc throw NotSupportedException.

  public ReadBufferStream(Stream innerStream, int bufferCapacity = 8192);

  // Ensure the read buffer contains at least the specified number of bytes, by repeatedly reading from the underlying stream
  // Note, if the read buffer already contains sufficient bytes, then this returns immediately.
  // Returns false if EOF is hit before the specified number of bytes can be read
  // Throws InvalidOperationException if specified number of bytes is greater than the buffer capacity.
  public bool FillBuffer(int minBufferedBytes = 1);
  public ValueTask<bool> FillBufferAsync(int minBufferedBytes = 1, CancellationToken ct = default);
  
  // Returns a Memory<byte> over the currently buffered bytes (may be empty if no bytes are buffered)
  // This is only valid until the next call to FillBuffer[Async] or Consume (below)
  public Memory<byte> ReadBuffer { get; }

  // Removes the first [byteCount] bytes from the read buffer.
  // The next call to ReadBuffer will return a Memory<byte> starting from the next non-consumed byte.
  // Throws InvalidOperationException if byteCount > ReadBuffer.Length.
  public void ConsumeBytes(int byteCount);

  public int BufferCapacity { get; }

  // Grow or shrink the buffer capacity.
  // Existing buffered bytes will be copied to the new buffer.
  // Throws InvalidOperationException if the specified capacity < ReadBuffer.Length.
  public void ResizeBuffer(int newCapacity);

  // Note on Stream method implementations:
  // Read/ReadAsync should read from the existing buffered bytes first.
  // If there are no buffered bytes, then Read/ReadAsync can either read into the local read buffer or directly into the user specified buffer.
  // If the size of the user's buffer is greater than the local buffer capacity, we may as well read directly into it.
  // If it's not, then reading into the local buffer may allow more bytes to be read from the underlying stream in a single call.
}



<table>
  <tr>
    <th align="left">Author:</th>
    <td>geoffkizer</td>
  </tr>
  <tr>
    <th align="left">Assignees:</th>
    <td>-</td>
  </tr>
  <tr>
    <th align="left">Labels:</th>
    <td>

`area-System.IO`, `untriaged`

</td>
  </tr>
  <tr>
    <th align="left">Milestone:</th>
    <td>-</td>
  </tr>
</table>
</details>

jozkee · 2022-03-16T07:50:42Z

This proposal sounds to me like BufferedStream but with the buffer being exposed, except BufferedStream's buffer works for both read and write.

Shouldn't ReadBuffer be ReadOnlyMemory<byte>?
Could be an alternative to add ReadBuffer to the existing BufferedStream?

jozkee · 2022-03-16T16:32:13Z

cc @stephentoub

geoffkizer · 2022-03-16T18:49:25Z

This proposal sounds to me like BufferedStream but with the buffer being exposed, except BufferedStream's buffer works for both read and write.

Possibly. There's some legacy with BufferedStream that makes it not obvious (to me anyway) whether it makes sense to add this to existing BufferedStream, or to create a new class without the legacy.

Shouldn't ReadBuffer be ReadOnlyMemory?

There are actually some places in existing code where the buffer is modified in-place. Two that come to mind:
(1) SslStream decrypts in place. This is in part due to the fact that SCHANNEL APIs decrypt in place.
(2) For HTTP/1.1, we support "folded headers" where a header value is continued on the next line. We do this by modifying the previous line terminator, i.e. "\r\n", to " " (two spaces) so that subsequent parsing code that doesn't deal with newlines will still work properly. This could probably be changed with a modest amount of work.

In general, it seems like there's some value in being able to modify the buffer, and I'm not sure there's much downside to it.

Alternatively, we could make this ReadOnlyMemory<byte> and then do an unsafe cast to Memory<byte> for the cases above.

dotnet-issue-labeler bot added area-System.IO untriaged New issue has not been triaged by the area owner labels Mar 14, 2022

geoffkizer mentioned this issue Mar 14, 2022

Feature Proposal: Allow access to internal read buffer of certain Streams #66611

Open

jozkee added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Mar 16, 2022

jozkee removed the untriaged New issue has not been triaged by the area owner label Mar 16, 2022

jozkee added this to the Future milestone Mar 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature Proposal: Buffered read stream with direct buffer access #66610

Feature Proposal: Buffered read stream with direct buffer access #66610

geoffkizer commented Mar 14, 2022

ghost commented Mar 14, 2022

jozkee commented Mar 16, 2022 •

edited

Loading

jozkee commented Mar 16, 2022

geoffkizer commented Mar 16, 2022

Feature Proposal: Buffered read stream with direct buffer access #66610

Feature Proposal: Buffered read stream with direct buffer access #66610

Comments

geoffkizer commented Mar 14, 2022

ghost commented Mar 14, 2022

jozkee commented Mar 16, 2022 • edited Loading

jozkee commented Mar 16, 2022

geoffkizer commented Mar 16, 2022

jozkee commented Mar 16, 2022 •

edited

Loading