Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Proposal: Buffered read stream with direct buffer access #66610

Open
geoffkizer opened this issue Mar 14, 2022 · 4 comments
Open

Feature Proposal: Buffered read stream with direct buffer access #66610

geoffkizer opened this issue Mar 14, 2022 · 4 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO
Milestone

Comments

@geoffkizer
Copy link
Contributor

It's very common to want to consume a Stream by reading it into a buffer and then consuming the bytes in the buffer progressively. The framework itself does this in multiple places -- e.g. SocketsHttpHandler, SslStream, DeflateStream, etc.

See the "peek buffer" item listed here: #58216

Doing this properly is not a trivial task. We should make this easier by adding something like a ReadBufferStream class. This class would manage a read buffer on top of an underlying Stream and provide direct access to the read buffer as a Memory<byte>. Since it derives from Stream itself as well, it can also be passed to other consumers of Stream, or easily be delegated to by a higher-layer implementation of Stream.

Here's an API sketch. This is derived largely from looking at how read buffering is managed in SocketsHttpHandler and SslStream currently.

public class ReadBufferStream : Stream
{
  // Note: This is a read-only Stream. CanWrite/CanSeek return false and methods like Write() etc throw NotSupportedException.

  public ReadBufferStream(Stream innerStream, int bufferCapacity = 8192);

  // Ensure the read buffer contains at least the specified number of bytes, by repeatedly reading from the underlying stream
  // Note, if the read buffer already contains sufficient bytes, then this returns immediately.
  // Returns false if EOF is hit before the specified number of bytes can be read
  // Throws InvalidOperationException if specified number of bytes is greater than the buffer capacity.
  public bool FillBuffer(int minBufferedBytes = 1);
  public ValueTask<bool> FillBufferAsync(int minBufferedBytes = 1, CancellationToken ct = default);
  
  // Returns a Memory<byte> over the currently buffered bytes (may be empty if no bytes are buffered)
  // This is only valid until the next call to FillBuffer[Async] or Consume (below)
  public Memory<byte> ReadBuffer { get; }

  // Removes the first [byteCount] bytes from the read buffer.
  // The next call to ReadBuffer will return a Memory<byte> starting from the next non-consumed byte.
  // Throws InvalidOperationException if byteCount > ReadBuffer.Length.
  public void ConsumeBytes(int byteCount);

  public int BufferCapacity { get; }

  // Grow or shrink the buffer capacity.
  // Existing buffered bytes will be copied to the new buffer.
  // Throws InvalidOperationException if the specified capacity < ReadBuffer.Length.
  public void ResizeBuffer(int newCapacity);

  // Note on Stream method implementations:
  // Read/ReadAsync should read from the existing buffered bytes first.
  // If there are no buffered bytes, then Read/ReadAsync can either read into the local read buffer or directly into the user specified buffer.
  // If the size of the user's buffer is greater than the local buffer capacity, we may as well read directly into it.
  // If it's not, then reading into the local buffer may allow more bytes to be read from the underlying stream in a single call.
}
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.IO untriaged New issue has not been triaged by the area owner labels Mar 14, 2022
@ghost
Copy link

ghost commented Mar 14, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

It's very common to want to consume a Stream by reading it into a buffer and then consuming the bytes in the buffer progressively. The framework itself does this in multiple places -- e.g. SocketsHttpHandler, SslStream, DeflateStream, etc.

See the "peek buffer" item listed here: #58216

Doing this properly is not a trivial task. We should make this easier by adding something like a ReadBufferStream class. This class would manage a read buffer on top of an underlying Stream and provide direct access to the read buffer as a Memory<byte>. Since it derives from Stream itself as well, it can also be passed to other consumers of Stream, or easily be delegated to by a higher-layer implementation of Stream.

Here's an API sketch. This is derived largely from looking at how read buffering is managed in SocketsHttpHandler and SslStream currently.

public class ReadBufferStream : Stream
{
  // Note: This is a read-only Stream. CanWrite/CanSeek return false and methods like Write() etc throw NotSupportedException.

  public ReadBufferStream(Stream innerStream, int bufferCapacity = 8192);

  // Ensure the read buffer contains at least the specified number of bytes, by repeatedly reading from the underlying stream
  // Note, if the read buffer already contains sufficient bytes, then this returns immediately.
  // Returns false if EOF is hit before the specified number of bytes can be read
  // Throws InvalidOperationException if specified number of bytes is greater than the buffer capacity.
  public bool FillBuffer(int minBufferedBytes = 1);
  public ValueTask<bool> FillBufferAsync(int minBufferedBytes = 1, CancellationToken ct = default);
  
  // Returns a Memory<byte> over the currently buffered bytes (may be empty if no bytes are buffered)
  // This is only valid until the next call to FillBuffer[Async] or Consume (below)
  public Memory<byte> ReadBuffer { get; }

  // Removes the first [byteCount] bytes from the read buffer.
  // The next call to ReadBuffer will return a Memory<byte> starting from the next non-consumed byte.
  // Throws InvalidOperationException if byteCount > ReadBuffer.Length.
  public void ConsumeBytes(int byteCount);

  public int BufferCapacity { get; }

  // Grow or shrink the buffer capacity.
  // Existing buffered bytes will be copied to the new buffer.
  // Throws InvalidOperationException if the specified capacity < ReadBuffer.Length.
  public void ResizeBuffer(int newCapacity);

  // Note on Stream method implementations:
  // Read/ReadAsync should read from the existing buffered bytes first.
  // If there are no buffered bytes, then Read/ReadAsync can either read into the local read buffer or directly into the user specified buffer.
  // If the size of the user's buffer is greater than the local buffer capacity, we may as well read directly into it.
  // If it's not, then reading into the local buffer may allow more bytes to be read from the underlying stream in a single call.
}



<table>
  <tr>
    <th align="left">Author:</th>
    <td>geoffkizer</td>
  </tr>
  <tr>
    <th align="left">Assignees:</th>
    <td>-</td>
  </tr>
  <tr>
    <th align="left">Labels:</th>
    <td>

`area-System.IO`, `untriaged`

</td>
  </tr>
  <tr>
    <th align="left">Milestone:</th>
    <td>-</td>
  </tr>
</table>
</details>

@jozkee jozkee added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Mar 16, 2022
@jozkee
Copy link
Member

jozkee commented Mar 16, 2022

This proposal sounds to me like BufferedStream but with the buffer being exposed, except BufferedStream's buffer works for both read and write.

Shouldn't ReadBuffer be ReadOnlyMemory<byte>?
Could be an alternative to add ReadBuffer to the existing BufferedStream?

@jozkee
Copy link
Member

jozkee commented Mar 16, 2022

cc @stephentoub

@jozkee jozkee removed the untriaged New issue has not been triaged by the area owner label Mar 16, 2022
@jozkee jozkee added this to the Future milestone Mar 16, 2022
@geoffkizer
Copy link
Contributor Author

This proposal sounds to me like BufferedStream but with the buffer being exposed, except BufferedStream's buffer works for both read and write.

Possibly. There's some legacy with BufferedStream that makes it not obvious (to me anyway) whether it makes sense to add this to existing BufferedStream, or to create a new class without the legacy.

Shouldn't ReadBuffer be ReadOnlyMemory?

There are actually some places in existing code where the buffer is modified in-place. Two that come to mind:
(1) SslStream decrypts in place. This is in part due to the fact that SCHANNEL APIs decrypt in place.
(2) For HTTP/1.1, we support "folded headers" where a header value is continued on the next line. We do this by modifying the previous line terminator, i.e. "\r\n", to " " (two spaces) so that subsequent parsing code that doesn't deal with newlines will still work properly. This could probably be changed with a modest amount of work.

In general, it seems like there's some value in being able to modify the buffer, and I'm not sure there's much downside to it.

Alternatively, we could make this ReadOnlyMemory<byte> and then do an unsafe cast to Memory<byte> for the cases above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO
Projects
None yet
Development

No branches or pull requests

2 participants