Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Provide an Stream abstraction centered on best-practices #79261

Closed
jozkee opened this issue Dec 6, 2022 · 16 comments
Closed
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO
Milestone

Comments

@jozkee
Copy link
Member

jozkee commented Dec 6, 2022

Background and motivation

Stream is a class that has lived to see Tasks and Spans and has been extended to align with them by adding virtual members/overloads. This has the consequence that Stream has grown more and more complex over time and has become a pit of failure for new implementations, to name a few cases:

  • virtual implementations of ReadAsync and WriteAsync do async over sync.
  • Read(Span) and Write(Span) use ArrayPool to get a buffer that is passed to the non-Span overload and then the bytes are copied to the buffer argument.

Given that is easy to fall into these traps, specially if you are not familiar with Streams, I think that we could provide an abstraction that covers the pit-of-failures and enforces implementers to use the methods that align with current standards.

This proposal tries to provide a path to fulfill the checkbox "Making it easier to create custom streams" in #58216 by providing an abstraction that is centered in implementing the Stream members according to the most-performant standards and removing the burden of being forced to implement members that can be considered stalled.

API Proposal

New Stream abstraction with Template Method Pattern.

To ensure the latest and shiniest Read[Async]/Write[Async] APIs are used, we need to seal the old methods. This can be achieved with the template method pattern.
On top of that, we can extend the pattern to other methods that require boilerplate validations. This way a class deriving from NewStream only needs to worry for implementing the Core methods and work with validated arguments.

namespace System.IO;

public abstract class NewStream : Stream // Name TBD
{
    public sealed override int Read(byte[] buffer, int offset, int count) { throw null; }
    public sealed override int Read(Span<byte> buffer) { throw null; }
    public sealed override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { throw null; }
    public sealed override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default) { throw null; }
    public sealed override IAsyncResult BeginRead(byte[] buffer, int offset, int count, AsyncCallback callback, object state) { throw null; }
    public sealed override int EndRead(IAsyncResult asyncResult) { throw null; }
    protected abstract int ReadCore(Span<byte> buffer);
    protected abstract ValueTask<int> ReadCoreAsync(Memory<byte> buffer, CancellationToken cancellationToken);

    public sealed override IAsyncResult BeginWrite(byte[] buffer, int offset, int count, AsyncCallback callback, object state) { throw null; }
    public sealed override void EndWrite(IAsyncResult asyncResult) { throw null; }
    public sealed override void Write(byte[] buffer, int offset, int count) { throw null; }
    public sealed override void Write(ReadOnlySpan<byte> buffer) { throw null; }
    public sealed override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { throw null; }
    public sealed override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default) { throw null; }
    protected abstract void WriteCore(ReadOnlySpan<byte> buffer);
    protected abstract ValueTask WriteCoreAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken);

    public override long Seek(long offset, SeekOrigin origin) { throw null; }
    protected abstract long SeekCore(long offset, SeekOrigin origin);
    public override void SetLength(long value) { throw null; }
    protected abstract void SetLengthCore(long value);
    protected override void Dispose(bool disposing) { throw null; }
}

Alternative Designs

NewStream abstraction that promotes the latest and shiniest Read[Async]/Write[Async] methods to become abstract and seal the old ones.

This is another way to achieve the same goal of enforcing the use of the best-performant methods but with minimal changes and using abstract override to "promote" existing virtual methods.

This approach is more convenient for future-proof as any new Stream API that would replace one of the current best-performant ones could be enabled by adding another class on top. See versioning proof-of-concept here.

public abstract class NewStream : Stream // Name TBD
{
    // seal the methods that DO NOT align with best practices, we will provide an impl. that aligns with efficiency according to the newest paradigms in Stream.
    public sealed override IAsyncResult BeginRead(byte[] buffer, int offset, int count, AsyncCallback? callback, object? state) { throw null; }
    public sealed override int EndRead(IAsyncResult asyncResult) { throw null; }
    public sealed override int Read(byte[] buffer, int offset, int count) { throw null; }
    public sealed override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { throw null; }

    public sealed override IAsyncResult BeginWrite(byte[] buffer, int offset, int count, AsyncCallback? callback, object? state) { throw null; }
    public sealed override void EndWrite(IAsyncResult asyncResult) { throw null; }
    public sealed override void Write(byte[] buffer, int offset, int count) { throw null; }
    public sealed override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { throw null; }

    // promote to abstract the methods that DO align with best practices.
    public abstract override int Read(Span<byte> buffer);
    public abstract override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default);
    public abstract override void Write(ReadOnlySpan<byte> buffer);
    public abstract override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default);
}

Risks

Future-proof is a risk, as much as we plan today, we may not be prepared for the next big thing, similar to how Stream was planned without Task or Span in mind. As proposed above, the alternative design can allow for versioning and friendly patching of the class with new abstractions on top.

@jozkee jozkee added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 6, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Dec 6, 2022
@ghost
Copy link

ghost commented Dec 6, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

Stream is a class that has lived to see Tasks and Spans and has been extended to align with them by adding virtual members/overloads. This has the consequence that Stream has grown more and more complex over time and has become a pit of failure for new implementations to name a few:

  • virtual implementations of ReadAsync and WriteAsync do async over sync.
  • Read(Span) and Write(Span) use ArrayPool to get a buffer that is passed to the non-Span overload and then the bytes are copied to the buffer argument.

Given that is easy to fall into these trpas, specially if you are not familiar with Streams, I think that we could provide an abstraction that covers the pit-of-failures and enforces implementers to use the methods that align with current standards.

This proposal tries to provide a path to fulfill the checkbox "Making it easier to create custom streams" in #58216 by providing an abstraction that is centered in implementing the Stream members according to the most-performant standards and removing the burden of being forced to implement members that can be considered stalled.

API Proposal

New Stream abstraction with Template Method Pattern.

To ensure the latest and shiniest Read[Async]/Write[Async] APIs are used, we need to seal the old methods. This can be achieved with the template method pattern.
On top of that, we can extend the pattern to other methods that require boilerplate validations. This way a class deriving from NewStream only needs to worry for implementing the Core methods and work with validated arguments.

namespace System.IO;

public abstract class NewStream : Stream // Name TBD
{
    public sealed override int Read(byte[] buffer, int offset, int count) { throw null; }
    public sealed override int Read(Span<byte> buffer) { throw null; }
    public sealed override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { throw null; }
    public sealed override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default) { throw null; }
    public sealed override IAsyncResult BeginRead(byte[] buffer, int offset, int count, AsyncCallback callback, object state) { throw null; }
    public sealed override int EndRead(IAsyncResult asyncResult) { throw null; }
    protected abstract int ReadCore(Span<byte> buffer);
    protected abstract ValueTask<int> ReadCoreAsync(Memory<byte> buffer, CancellationToken cancellationToken);

    public sealed override IAsyncResult BeginWrite(byte[] buffer, int offset, int count, AsyncCallback callback, object state) { throw null; }
    public sealed override void EndWrite(IAsyncResult asyncResult) { throw null; }
    public sealed override void Write(byte[] buffer, int offset, int count) { throw null; }
    public sealed override void Write(ReadOnlySpan<byte> buffer) { throw null; }
    public sealed override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { throw null; }
    public sealed override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default) { throw null; }
    protected abstract void WriteCore(ReadOnlySpan<byte> buffer);
    protected abstract ValueTask WriteCoreAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken);

    public override long Seek(long offset, SeekOrigin origin) { throw null; }
    protected abstract long SeekCore(long offset, SeekOrigin origin);
    public override void SetLength(long value) { throw null; }
    protected abstract void SetLengthCore(long value);
    protected override void Dispose(bool disposing) { throw null; }
}

API Usage

~

Alternative Designs

NewStream abstraction that promotes the latest and shiniest Read[Async]/Write[Async] methods to become abstract and seal the old ones.

This is another way to achieve the same goal of enforcing the use of the best-performant methods but with minimal changes and using abstract override to "promote" existing virtual methods.

This approach is more convenient for future-proof as any new Stream API that would replace one of the current best-performant ones could be enabled by adding another class on top. See versioning proof-of-concept here.

public abstract class NewStream : Stream // Name TBD
{
    // seal the methods that DO NOT align with best practices, we will provide an impl. that aligns with efficiency according to the newest paradigms in Stream.
    public sealed override IAsyncResult BeginRead(byte[] buffer, int offset, int count, AsyncCallback? callback, object? state) { throw null; }
    public sealed override int EndRead(IAsyncResult asyncResult) { throw null; }
    public sealed override int Read(byte[] buffer, int offset, int count) { throw null; }
    public sealed override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { throw null; }

    public sealed override IAsyncResult BeginWrite(byte[] buffer, int offset, int count, AsyncCallback? callback, object? state) { throw null; }
    public sealed override void EndWrite(IAsyncResult asyncResult) { throw null; }
    public sealed override void Write(byte[] buffer, int offset, int count) { throw null; }
    public sealed override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { throw null; }

    // promote to abstract the methods that DO align with best practices.
    public abstract override int Read(Span<byte> buffer);
    public abstract override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default);
    public abstract override void Write(ReadOnlySpan<byte> buffer);
    public abstract override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default);
}

Risks

Future-proof is a risk, as much as we plan today, we may not be prepared for the next big thing, similar to how Stream was planned without Task or Span in mind. As proposed above, the alternative design can allow for versioning and friendly patching of the class with new abstractions on top.

Author: Jozkee
Assignees: -
Labels:

api-suggestion, area-System.IO, untriaged

Milestone: -

@adamsitnik
Copy link
Member

Why do we need both when they accept same arguments and return same result?

protected abstract int ReadCore(Span<byte> buffer);
public sealed override int Read(Span<byte> buffer) { throw null; }

@madelson
Copy link
Contributor

madelson commented Dec 6, 2022

@adamsitnik I think it is because Read(Span) is virtual rather than abstract in the base Stream. If someone tries to use the new base class and everything just compiles, that will be confusing since it won't be that obvious what to override.

@Joe4evr
Copy link
Contributor

Joe4evr commented Dec 6, 2022

This is actually a lot more reasonable than I expected from the title.

  • Am I correct in assuming that NewStream is just a strawman name? It better be.
  • Could we also expect an analyzer around the same release that notifies developers this would be the more recommended type to derive for the cases that are deriving Steam directly today? It would likely help with adoption.

@bartonjs
Copy link
Member

bartonjs commented Dec 6, 2022

Why do we need both (Read and ReadCore) when they accept same arguments and return same result?

It's from the Template Method Pattern. Read will throw a normalized exception if CanRead is false, or Dispose() was already invoked, and then defer to ReadCore, which now gets to ignore the boilerplate state validation and can just write the logic of doing the actual read.

@jozkee
Copy link
Member Author

jozkee commented Dec 6, 2022

@Joe4evr yes to both of your questions.

@teo-tsirpanis
Copy link
Contributor

Have you considered using source generators to fill this boilerplate?

@jozkee
Copy link
Member Author

jozkee commented Dec 6, 2022

How does a source generator would make it better? As reference, this is how I envision NewStream if we follow the main proposal rather than the alternative one: https://gist.github.com/Jozkee/27aa6ab757e8d87dc3ef27d7be69d9d3.

@teo-tsirpanis
Copy link
Contributor

Here's how I imagine the source generator:

[GenerateStreamBoilerplate]
public partial MyStream
{
    // (Read|Write)Core(Async)? would be provided by the user.

    // The Stream overrides would be provided by the source generator.
}

Here are some observations of mine about a subclass-based approach:

  • It would make an internal implementation detail of the stream part of its public API contract (the descendance from NewStream).
    • Existing classes would be unable to migrate (to reduce boilerplate) without breaking changes.
    • It would allow things that don't make sense like passing around values of type NewStream.
  • I think that the subclass' name will be a point of contention, because it would not communicate a usage characteristic but an implementation detail. SpanStream? Users would think that it is backed by a span. I wouldn't mind calling it Stream2 but I guess many would disagree. :trollface:
  • A source generator would be more flexible; it could support specifying custom visibilities to the members, or overriding only the methods you want (for example SslStream has a custom implementation of Read(byte[])).
  • Other classes like Encoding, Encoder, Decoder, TextReader and TextWriter have the same problem of overload boilerplate. We could make subclasses for them as well but does not seem very nice to me.

@stephentoub
Copy link
Member

Here's how I imagine the source generator:

I had on my todo list the action item of writing up almost this exact same idea :)

@jozkee
Copy link
Member Author

jozkee commented Dec 6, 2022

@teo-tsirpanis so, if I follow correctly, you are saying that we can add a GenerateBoilerplateAttribute, or GenerateStreamBoilerplateAttribute, if we don't want to do it for other types, that signals that not implemented members (even abstract ones) will be generated in compilation with boilerplate validation?

How does this solve the issue of users implementing say Read(byte[], int, int) instead of Read(Span<byte>)? In other words, how does this enforce that users choose the "best overload"?
I guess we could rely on analyzers for that.

Do you see any other drawbacks on this approach? I'm still not very familiar with source generators.

@teo-tsirpanis
Copy link
Contributor

As I imagine it, the generator would add a protected partial int ReadCore(Span<byte> buffer); method, forcing the user to implement it (no need for analyzers), and on top of it, if they already override Read(byte[]) for example, the generator would just not emit boilerplate for it.

@jozkee
Copy link
Member Author

jozkee commented Dec 6, 2022

I thought your attribute was meant to be used in any Stream subclass, not only the ones inheriting from the NewStream Subclass.

@teo-tsirpanis
Copy link
Contributor

Yes, there won't be a subclass. In source generators we usually create partial methods to have the generator implement it, but here it will be the opposite; the generator will create a partial method to have us implement it.

Implementing the (synchronous) span methods will be a required minimum; it is the new way forward for those who want to use the boilerplate generator.

@madelson
Copy link
Contributor

madelson commented Dec 7, 2022

@teo-tsirpanis @jozkee I think the source generator design seems pretty compelling. One of the challenges today with implementing Stream is that there are "bundles" of methods that must be implemented together to achieve certain functionality. Specifically:

  • Streams that support async IO should implement ReadAsync/WriteAsync/DisposeAsync/FlushAsync
  • Streams that support reading should implement CanRead/Read/ReadAsync/ReadTimeout
  • Streams that support writing should implement CanWrite/Write/WriteAsync/WriteTimeout
  • Streams that support timeouts should implement CanTimeout/ReadTimeout/WriteTimeout
  • Streams that support seeking should implement CanSeek/Position/Length/SetLength/Seek

An attribute-based approach could help us with this in a way that is much cleaner than a base class, for example:

public sealed class GenerateStreamBoilerplateAttribute
{
    // If true, generates ReadAsyncCore, WriteAsyncCore, DisposeAsyncCore, FlushAsyncCore as required methods to implement.
    // If false, async methods are implemented by calling the sync equivalents and then returning a completed task
    public bool Async { get; set; } = false;

    // If true, ReadCore/ReadAsyncCore/ReadTimeoutCore may be generated
    // If false, read methods implemented as throwing NotSupportedException
    public bool CanRead { get; set; } = true;

    // If true, WriteCore/WriteAsyncCore/WriteTimeoutCore may be generated
    // If false, write methods are implemented as throwing NotSupportedException
    public bool CanWrite { get; set; } = true;
    
    // If true, generates ReadTimeoutCore, WriteTimeoutCore as required methods to implement
    // If false, timeout properties are implemented as throwing NotSupportedException
   public bool CanTimeout { get; set; } = false;

   // If true, generates PositionCore/LengthCore as properties to implement (LengthCore has a setter). 
   // SetLength and Seek can be implemented in terms of these.
   // If false, seek methods and properties are implemented as throwing NotSupportedException
   public bool CanSeek { get; set; } = false;
}

This way, you can mix and match desired stream capabilities and have the compiler guide you towards exactly what needs to be implemented, filling in the boilerplate correctly for all other cases.

@jeffhandley
Copy link
Member

Based on the discussions here and our promising findings on the source generator experiment, I'm going to close this proposal issue. We will work to get that source generator merged into runtimelab and advertise it as an experiment that we'd like to get feedback on. We'll try it out ourselves too, to see if it can reduce the code we manually author/maintain in our own Stream derivatives.

@teo-tsirpanis teo-tsirpanis closed this as not planned Won't fix, can't repro, duplicate, stale May 16, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO
Projects
None yet
Development

No branches or pull requests

8 participants