Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.IO.Stream Backlog #58216

Open
1 of 14 tasks
stephentoub opened this issue Aug 26, 2021 · 17 comments
Open
1 of 14 tasks

System.IO.Stream Backlog #58216

stephentoub opened this issue Aug 26, 2021 · 17 comments
Assignees
Labels
area-System.IO Epic Groups multiple user stories. Can be grouped under a theme. Priority:3 Work that is nice to have Team:Libraries
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Aug 26, 2021

Summary

System.IO.Stream has been at the center of data processing and flow in .NET since its inception. Even with newer models being introduced related to streams, like pipelines and channels, it continues to be a key component of many important workloads. We've incrementally improved Stream over the years, e.g. with the introduction of new memory/span-based APIs in .NET Core 2.1, but we haven't revisited it to take a holistic look at what's missing for modern workloads and round out the story.

Investigations

The following investigations can be pursued to inform our designs and approach to the backlog below:

  • What helpers / wrappers / custom streams / etc. are frequently (re)implemented by projects in the ecosystem
  • What other ecosystems provide for their equivalent types
  • What known gaps we have
  • In our own libraries and tests, what custom streams and stream-related functionality have we needed to implement
  • Does F# have any Stream abstractions that provide the combinator capabilities or custom operators?

Backlog (roughly in priority order)

The first theme is providing some read-all helpers. These helpers will help address a common scenario that emerged after a .NET 6 breaking change: Partial and zero-byte reads in DeflateStream, GZipStream, and CryptoStream.

  • Read-all helpers. Stream.Read{Async} doesn’t guarantee that it reads as many bytes as we requested. This frequently leads certain kinds of consuming code to write one of two helpers: a "read at least N bytes unless EOF is encountered" helper or a "read as many bytes as I asked for unless EOF is encountered" helper, the latter of which is implementable in terms of the former. We should expose at least the former if not also the latter in some fashion, either as instance methods (maybe or maybe not virtual) on Stream, maybe as static helpers, etc. dotnet/runtime#16598

The second theme is to provide improved composability of Stream functionality to reduce the number of scenarios where a custom Stream must be implemented; this will be done through wrappers and/or combinators.

  • Making it easier to create custom streams. Deriving from Stream entails overriding at least 10 abstract members, and then optionally overriding more than 10 virtual members. We should look at ways to reduce friction here, e.g. adding a DelegatingStream (dotnet/runtime#43139) to make it easier to customize wrapping streams. In particular for testing, we might also want to look into shipping a delegate- based stream like we use in a variety of our test suites, which avoids the need to repeatedly create custom streams for this or that purpose.
  • Stream wrappers for more types. Developers have asked for the ability to create a Stream around the contents of a ReadOnlyMemory, a ReadOnlySequence dotnet/runtime#27156, a String dotnet/runtime#46663, and likely others.
  • Stream combinators. We should look at ways to make it easier to create new Streams from other Streams, e.g. AsRead/WriteOnly to expose just the reading/writing portions of streams, Concat for concatenating multiple streams into a new one for consumption, etc.

Beyond those themes, there are other new APIs that could be added for common scenarios.

  • Peeking. This is where pipelines shines, but it’s something needed here and there when working with streams as well. We should look at adding peeking support (reading N bytes without consuming them) to BufferedStream, or implementing a custom PeekStream, or other ways of achieving the capability here. And/or should we consider a mechanism whereby Streams can expose any internal buffer they may have to be examined?
  • Connected streams. Especially for testing scenarios, but extending to other producer/consumer scenarios as well, it’s commonly needed to be able to create two bidirectional Streams that are connected to each other. One can do this today by creating and connecting two named or anonymous pipes, or creating and connecting two sockets, or writing a custom stream that wraps two pipeline pipes, but none of these are simple, and they all involve more overhead than is needed. dotnet/runtime includes a ConnectedStream type used for testing purposes; we should look at cleaning this up and shipping it publicly. dotnet/runtime#43101
  • Better Pipelines integration. System.IO.Pipelines already provides Stream<->PipeReader/Writer conversions for creating a Stream from a PipeReader or PipeWriter, and vice versa, but it doesn’t make it easy to create a Stream from a duplex pipe, nor does it provide a built-in implementation of a duplex pipe.
  • Scatter/gather. We’ve had multiple requests over the years for supporting Read/WriteAsync methods that accept multiple buffers rather than just one. We know we can implement this on NetworkStream, and we just added support in .NET 6 that should enable doing so on FileStream easily as well. Other streams, like SslStream, could likely benefit at least a little by making it easier to write out a header plus payload. dotnet/runtime#25344
  • Improved support for duplex streams, whether with a custom DuplexStream base type. dotnet/runtime#51566 or new methods on Stream itself (dotnet/runtime#)
@stephentoub stephentoub added this to the 7.0.0 milestone Aug 26, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Aug 26, 2021
@ghost
Copy link

ghost commented Aug 26, 2021

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

System.IO.Stream has been at the center of data processing and flow in .NET since its inception. Even with newer models being introduced related to streams, like pipelines and channels, it continues to be a key component of many important workloads. We've incrementally improved Stream over the years, e.g. with the introduction of new memory/span-based APIs in .NET Core 2.1, but we haven't revisited it to take a holistic look at what's missing for modern workloads and round out the story. We should do so for .NET 7.

We should start this effort by investigating:

  • What helpers / wrappers / custom streams / etc. are frequently (re)implemented by projects in the ecosystem
  • What other ecosystems provide for their equivalent types
  • What known gaps we have

We already know of some gaps / common requests to be considered:

  • Better Pipelines integration. System.IO.Pipelines already provides Stream<->PipeReader/Writer conversions for creating a Stream from a PipeReader or PipeWriter, and vice versa, but it doesn’t make it easy to create a Stream from a duplex pipe, nor does it provide a built-in implementation of a duplex pipe.
  • Connected streams. Especially for testing scenarios, but extending to other producer/consumer scenarios as well, it’s commonly needed to be able to create two bidirectional Streams that are connected to each other. One can do this today by creating and connecting two named or anonymous pipes, or creating and connecting two sockets, or writing a custom stream that wraps two pipeline pipes, but none of these are simple, and they all involve more overhead than is needed. dotnet/runtime includes a ConnectedStream type used for testing purposes; we should look at cleaning this up and shipping it publicly. dotnet/runtime#43101
  • Stream wrappers for more types. Developers have asked for the ability to create a Stream around the contents of a ReadOnlyMemory, a ReadOnlySequence dotnet/runtime#27156, a String dotnet/runtime#46663, and likely others.
  • Scatter/gather. We’ve had multiple requests over the years for supporting Read/WriteAsync methods that accept multiple buffers rather than just one. We know we can implement this on NetworkStream, and we just added support in .NET 6 that should enable doing so on FileStream easily as well. Other streams, like SslStream, could likely benefit at least a little by making it easier to write out a header plus payload. dotnet/runtime#25344
  • Improved support for duplex streams, whether with a custom DuplexStream base type (dotnet/runtime#51566 or new methods on Stream itself (dotnet/runtime#)
  • Read-all helpers. Stream.Read{Async} doesn’t guarantee that it reads as many bytes as we requested. This frequently leads certain kinds of consuming code to write one of two helpers: a “read at least N bytes unless EOF is encountered” helper or a “read as many bytes as I asked for unless EOF is encountered” helper, the latter of which is implementable in terms of the former. We should expose at least the former if not also the latter in some fashion, either as instance methods (maybe or maybe not virtual) on Stream, maybe as static helpers, etc.
  • Making it easier to create custom streams. Deriving from Stream entails overriding at least 10 abstract members, and then optionally overriding more than 10 virtual members. We should look at ways to reduce friction here, e.g. adding a DelegatingStream (dotnet/runtime#43139) to make it easier to customize wrapping streams.
  • Peeking. This is where pipelines shines, but it’s something needed here and there when working with streams as well. We should look at adding peeking support (reading N bytes without consuming them) to BufferedStream, or implementing a custom PeekStream, or other ways of achieving the capability here. And/or should we consider a mechanism whereby Streams can expose any internal buffer they may have to be examined?
  • Stream combinators. We should look at ways to make it easier to create new Streams from other Streams, e.g. AsRead/WriteOnly to expose just the reading/writing portions of streams, Concat for concatenating multiple streams into a new one for consumption, etc.
Author: stephentoub
Assignees: -
Labels:

area-System.IO

Milestone: 7.0.0

@jeffschwMSFT jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Aug 27, 2021
@stephentoub
Copy link
Member Author

May I also ask for a Stream wrapper for an unmanaged memory block?

Something different from https://docs.microsoft.com/en-us/dotnet/api/system.io.unmanagedmemorystream?

@acaly
Copy link

acaly commented Aug 28, 2021

May I also ask for a Stream wrapper for an unmanaged memory block?

Something different from https://docs.microsoft.com/en-us/dotnet/api/system.io.unmanagedmemorystream?

Yes I realized that after I posted. Don't remember why I had to do it manually :p

@AraHaan
Copy link
Member

AraHaan commented Aug 29, 2021

@stephentoub could these changes also help simplify my stream class too located in this repository?

Currently there are a ton of properties which I am forced to override (sometimes I have to keep them the same as the base Stream class, a few functions which I then have to throw NotSupportedException, etc which I would actually like to remove eventually for the ones already inside the base abstract class.

I also got tiered of manually writing the Dispose functions as well and so I made it to where people can just source generate their Dispose functions for them (related to implementing the Dispose interface), however I think I should eventually implement where they can optionally tell it to generate a finalizer for them as well (with the Dispose() (no params) gets generated with the GC.SuppressFinalize(this) call and etc). All of the usage can be found in this file (where the stream is).

@stephentoub
Copy link
Member Author

Currently there are a ton of properties which I am forced to override

This is the "Making it easier to create custom streams" bullet.

@msedi
Copy link

msedi commented Aug 31, 2021

I am agreeing to all the things described above. What I needed many times:

  • an IStream interface: Does it make sense to have an interface or are to many components relying on Stream? What be the implications of having an interface on it?
  • ReadArray<T> and ReadSpan<T> like it is on the SafeBuffer. In its original implementation ReadArray was way to slow so I had to do my own. Thanks to Span and MemoryMarshal this is pretty is now and could be made available on the Stream. Maybe an extension would be sufficient.

@bjornen77
Copy link
Contributor

Is the plan to also include a non/less allocating stream like the following proposal:
#22428

Maybe that is covered in the "Better Pipelines integration" bullet?

@stephentoub
Copy link
Member Author

Is the plan

There's no "plan" currently, other to investigate what should be done to formulate the plan of what we want to implement in 7.

@bjornen77
Copy link
Contributor

Ok, I understand. I think it would be good to also include the issue mentioned above in the investigation. Thanks!

@wfurt
Copy link
Member

wfurt commented Nov 7, 2021

For the read helpers, I'm wondering if we can add something like

ReadAsync(Memory<Byte> buffer, CancellationToken token, int minBytes);

where

  • minBytes == 0 -> existing behavior
  • minBytes > 0 -> read at least minBytes
  • minBytes == buffer.Length -> exact read. This may be inconvenient with harry but Slice() makes it very easy IMHO to read exact chunls
  • minBytes > buffer.Length ->InvalidArgumentException

With that if we get EOF before minBytes we may throw IOExeption.
Alternatively we can just return the data we have so far and readLength < minLenght would indicate that EOF was received and subsequent Read would return 0. That allows to handle incomplete data if need - like logging as well it gives ability to not throw and handle it more gracefully or throw custom Exception.

@GSPP
Copy link

GSPP commented Feb 5, 2022

Stream combinators: There could be a RangeStream that exposes a subrange of another stream:

    var rangeStream = new RangeStream(innerStream, skip: 2, take: 10);

It would support both seekable and non-seekable inner streams. Write support could be there for seekable ones (allowing to overwrite a range).


A concat stream could concatenate streams. But it could also support a more general form of "segment". A segment could be a stream, but also a Memory<byte>. This could be user-extensible. Something like:

    public abstract class ConcatStreamSegment : IDisposable
    {
        public abstract void Dispose();

        public abstract long? Length { get; }

        public abstract int Read(Span<byte> buffer);

       //Plus additional members for async.
    }

A segment would be read-once forward-only.

Having memory-based segments would also provide a way to create a stream from a memory object. And you could mix, for example to prefix a normal stream with a memory-based header.

@AraHaan
Copy link
Member

AraHaan commented Feb 7, 2022

Honestly now that I think about the stream stuff, I think the base Stream and Stream related stuff should be replaced with span based APIs. Mainly because they are not only cleaner, easier to set up and use, and easier to maintain.

Pipelines and Channels could also replace Streams as well for cases that span based API's might not fit well for as well.

Perhaps the base Stream class could be obsoleted, but in the future be changed to an internal abstract class for the public stream classes (or rework them to be a pipeline or channel based class without renaming or changing behavior).

The reasoning for this is well, streams are great and all, but there is complexity when you need to overload a ton of members to do stuff a certain way (take a look at ZlibStream for example).

So, I get it that many use or like streams but many more like me prefer Span based (if possible), pipelines, or channels to avoid streams entirely if we can help it to save our time with possible issues in the future.

So while there is no intention of removing all of the Stream based classes from the runtime, perhaps they could be considered to be reworked into calling into Span,Channel, or Pipeline based APIs internally to simulate the "Stream" functionality.

And this goes for MemoryStream, I prefer the more performant version (System.Memory<T>) provided I do not need to worry about not knowing sizes of things that it needs to be used for.

@msedi
Copy link

msedi commented Feb 7, 2022

@AraHaan: I'm not sure how your API suggestion to this would look like. For a lot of things I have also created my own sort of Stream called SpanStream as a ref struct because I needed to keep the Span<T> as a member and ref structs do not support interfaces ( I would nevertheless love to see an IStream interface at least).

But for me there is one caveat. The stream currently needs to support long as Position and Length. The current Span<T> or Memory<T> cannot express this. You would need a Span64<T> or so. Thats for example my current problem when using memory mapped files where I could do things in a safe way, but need to go with pointers since I have more than 32bit.

@adamsitnik
Copy link
Member

Twitter survey results

Numbers in () represent the number of "likes" on Twitter at the time of writing. So (10) means that 10 people liked the idea.

Custom streams:

  • StreamWithKnownLength,SliceStream, SubStream, SubReadStream (28+8+2+2)
  • AggregatedStream, StreamOfStreams (6+1+1)
  • SplitStream: split one stream into many (2)
  • DelegatingStream: base class for wrappers (3+4)
  • UnclosableStream, KeepOpenStream (3+8+10+2+4)
  • DisposeWithStream: "Do some disposal (e.g. delete a temp file) when the stream is disposed by a process (ASP.NET) that you don't have control over" (2)
  • UnseekableStream: "Wrapper around a MemoryStream useful for unit tests when there are different code paths based on CanSeek." (3)
  • ReadOnlyStream (4)
  • TemporaryFileStream: "Opens a file with FILE_ATTRIBUTE_TEMPORARY on Windows." (2)
  • DelayedFileOpenStream: "delays FileOpen until first API use." (10)
  • CountingStream, ProgressStream: "count the length of a stream as it being read" (3+1+7+1+2+5+14)
  • BreakStream: "stops reading when encountering a given sequence of bytes" (3)
  • HashingStream: (7+1+3+2)
  • PushToPullStream: "a memory-backed stream that can be consumed while being written-to by a different thread, with throttling logic to account for a laggy consumer or producer. We use it very frequently to provide multimedia streams that can be played as they are being created" (7]

MemoryStream:

  • {ReadOnly}Memory<byte> constructors (1+1)

Extensions:

  • ReadFillBuffer, int ReadBlock, void ReadAtLeast and its variants (span, async etc), which calls Read as many times as necessary to fill the given buffer. (3+2+2)
  • And also an extension method to CopyToAsync(...) that accepts a IProgress (or IProgress in my case: https://github.com/Tyrrrz/Gress) (20+1)
  • Peeking from non-seekable stream: (12)
  • byte[] Stream.ReadToEnd() (5)
  • ValueTask.WhenAll (3)

Readers:

  • StreamReader.ReadLineAsync(Memory<char>) (4)
  • AsyncBinaryReader/Writer (3)
  • Specifying the endianness for BinaryWriter/Reader (1)

My comments

  • SubStream - a very common request, we do already have an internal implementation and plenty of libs have their own (1, 2, 3). We would need two ctors: one for streams that support seeking: ctor(Stream src, long position, long length) and one for streams that don't support seeking: ctor(Stream src, long length). Dispose should most likely do nothing on purpose.
  • AggregatedStream - common request. Should we introduce a lazy multipart stream as well? Some of our users wrote their own: (1, 2))
  • SplitStream - only one person mentioned it, I believe the goal could be achieved by using SubStream. Sth like:
long bytesRemaining = largeInputStreamSize;
while (bytesRemaining > 0)
{
    long bytesToCopy = Math.Min(MAX_SIZE, bytesRemaining);
    using Stream src = new SubStream(largeInputStream, largeInputStreamSize - bytesRemaining, bytesToCopy);
    using Stream dst = File.Create(GetPath());
    src.CopyTo(dst);
    bytesRemaining -= bytesToCopy;
}
  • DelegatingStream - common request, would make it easy to satisfy other requirements. API Proposal: DelegatingStream #43139
  • UnclosableStream, KeepOpenStream - common request, could be implemented on top of DelegatingStream. It could be a DelegatingStream feature: DelegatingStream(Stream source, bool dispose = true), potentially with an extension method `Stream AsUnclosable(this Stream src).
  • ReadOnlyStream, WriteOnlyStream - I am not sure if the request was common enough to introduce a new type, but we should be able to achieve the goal by using DelegatingStream (another ctor?). New extension methods would be nice to have: Stream AsReadOnly(this Stream stream).
  • DisposeWithStream - the goal could be achieved by using DelegatingStream
  • UnseekableStream - I do see the point (unit testing), but I am not sure if it's worth adding a new type to BCL.
  • TemporaryFileStream - it should rather be a new FileOption rather than a new type. We could implement it by using FILE_ATTRIBUTE_TEMPORARY on Windows and O_TMPFILE on Linux and maybe somehow emulate it on macOS. Proposal: Add Temporary and Sparse and to FileOptions #65155
  • DelayedFileOpenStream - I am not convinced, it can be achieved by using Lazy<FileStream> as of today.
  • ProgressStream - a very common request, I need more data to understand the problem and provide good solution. Perhaps all we need is an Stream.CopyToAsync overload that is capable of reporting the progress?
  • BreakStream - Pipelines address this problem.
  • HashingStream - we already have APIs that allow for that, we should extend the docs with file hashing examples (context).
  • Peeking - we should consider extending BufferedStream with such possibility.
  • PushToPullStream - I need more data
  • MemoryStream({ReadOnly}Memory<byte>) - I defnitely see the need for it, it would be great if we could change MemoryStream to internally use Memory<byte> instead of an array.
  • ReadFillBuffer, int ReadBlock, void ReadAtLeast - we defnitely need to implement it. Naming will be the hardest part (Provide a version of Stream.Read that reads as much as possible #16598).
  • byte[] Stream.ReadToEnd() - in my opinion it's a performance anti-pattern and we should not introduce it.
  • Specifying the endianness for BinaryWriter/Reader- this is not a new request (Proposal: Endianness Enum and Extensions for BinaryReader an BinaryWriter #26904), I don't see us doing it in 7.0.

Suggested actions for .NET 7:

  1. Implement read-all helpers (Provide a version of Stream.Read that reads as much as possible #16598)
  2. Implement DelegatingStream as PoC, try to meet more requirements (don't close the stream, expose it as read/write only), propose new API including AsReadOnly() and AsWriteOnly methods.
  3. Implement SubStream as PoC, propose new API. Do more research and figure out if we really need AggregatedStream.
  4. MemoryStream({ReadOnly}Memory<byte>) - try to see if it would be possible to extend MemoryStream with two new ctors. Research best option for Feature request: Create a Stream that reads from a ReadOnlySequence<byte> #27156 and API Proposal: StringStream (a Stream that wraps a string) #46663.
  5. Do more research about ProgressStream and figure out what problem it is trying to solve and find the best solution.

@AdamCoulterOz
Copy link

@adamsitnik - Could there be a series of new System.IO interfaces which define what needs to be implemented to meet different types of streams? i.e. IReadableStream, IWriteableStream, ISeekableStream, etc?

Also a new "Modern" inheritance base class of ModernStream : Stream which implements a lot of the boiler-plate Stream in best practice ways and does away with legacy decisions? i.e. using Memory slices, async patterns, buffer fillers and copying.
It would expose a much simpler implementation surface for the more primitive operations which are used by ModernStream without the implementor of ModernStream needing to even really understand how to write a Stream, other than implement the wrapper to some client library (for example). It would also have parameters on things like buffer size (min, mix, growth strategy), and other stream implementation arguments.

In my example I want to build an abstract CloudBlobStream, which has its own child adapter classes to call the native DotNet storage clients for Azure, Amazon or Google multi-part uploaders or downloaders. Could this CloudBlobStream layer be mostly implemented by ModernStream with just a few arguments being needed in a child's CloudBlobStream constructor?

Could we also reconcile the role of System.IO.Pipelines? Does it need to be its own thing? Or is it just another stream implementation strategy?

@alexrp
Copy link
Contributor

alexrp commented May 16, 2022

@adamsitnik Regarding ProgressStream: A CopyToAsync overload would not be sufficient in my case. I'm reading the uncompressed length of some data in a CryptoStream, which is immediately followed by the compressed blob that I then read with a ZLibStream wrapping the CryptoStream. Once I'm done with the ZLibStream, I do a sanity check that the amount of bytes I've read is equal to the uncompressed length I read earlier. A ProgressStream is pretty much a perfect fit here. I currently implement the progress tracking on my BinaryReader equivalent but that's kind of a hack.

Here's a (simplified) snippet of my code illustrating the use case:

using var aes = DataCenter.CreateCipher(options.Key, options.IV);
using var decryptor = aes.CreateDecryptor();

await using (var cryptoStream = new CryptoStream(stream, decryptor, CryptoStreamMode.Read, true))
{
    var size = await new DataCenterBinaryReader(cryptoStream).ReadUInt32Async();

    await using (var zlibStream = new ZLibStream(cryptoStream, CompressionMode.Decompress, true))
    {
        var reader = new DataCenterBinaryReader(zlibStream);

        await _header.ReadAsync(reader);
        await _keys.ReadAsync(reader);
        await _attributes.ReadAsync(reader);
        await _nodes.ReadAsync(reader);
        await _values.ReadAsync(reader);
        await _names.ReadAsync(reader);
        await _footer.ReadAsync(reader);

        if (reader.Progress != size)
            throw new InvalidDataException();
    }
}

@acaly
Copy link

acaly commented May 24, 2022

When developing some IPC program, I am using a shared-memory based stream, which is written from one side and read from another. While the data is backed by memory, it's necessary to occasionally free some memory that has been read already and will never be read again. Because I organize data in frames, this actually happens every time after the reader side finishes reading a frame.

There seems to have no standard way to do this in Stream class now. In order to achieve this, I added a new method called Truncate, which informs the stream that everything before Position will never be used again and is safe to free. Later use of freed region will cause InvalidOperationException. I imagine this might also be helpful in other scenarios, for example, to allow wrapping a non-seekable stream with a memory buffer to make it seekable.

I notice that there has been proposal to wrap non-seekable streams. If it is added, I hope something like this could be added to it, either as new methods on Stream, or as an interface method. It saves a lot of memory if the memory is used for a long time. The name "Truncate" I used is ambiguous. Maybe a better name is needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.IO Epic Groups multiple user stories. Can be grouped under a theme. Priority:3 Work that is nice to have Team:Libraries
Projects
None yet
Development

No branches or pull requests