-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standardize pattern for exposing advanced configuration for compression streams #42820
Comments
I personally like the |
The |
This isn't necessarily true: class ZLibOptions
{
public static readonly int FlushFinish = 4;
public int FlushMode { get; set; }
}
class ZLibStream : Stream
{
public ZLibStream(Stream stream, ZLibOptions options, bool leaveOpen = false);
public ZLibOptions Options { get; set; }
}
// ...
using var stream = new ZLibStream(myStream, options);
stream.Options.FlushMode = ZLibOptions.FlushFinish;
stream.Write(...); However, couldn't the existing |
Ah yeah, no. Deflate.Flush is a separate operation from Stream.Flush. |
FWIW, I'd say that current abstraction is broken (both in terms of correspondence with e.g. actual zlib functions, of POLA, and in simplicity of use) and adding a better abstraction would be a better idea. #2236 (comment) sums up my stance on this. As such, I'd say that providing an access to more low-level API would solve all the issues raised here (and in #2236, #16923, and probably some other places too). What am I talking about precisely? Look at actual zlib. We of course can skip both the init/end as separate calls, but still: both deflate() and inflate() calls take
This is neither intuitive or useful in any way. A proper way would be to have either (
or, if for some reason we want to have them bundled into a single class (I would advocate against it, but it would be still better than current approach):
Using an option pattern for flushing would be IMVHO the best option (sic!), since we can just set flush option as read-write (as @FiniteReality stated). I would keep the other options, especially those used in e.g. https://github.com/Elskom/zlib.managed/tree/master/zlib.managed does this in the most straightforward way. tl;dr please, for the sake of humanity and future generations, use options instead of wacky constructors. |
Never mind; I just realized that I don't have a design proposal, just a vague direction to go after 😄 |
I think the best option is to overhaul System.IO.Compression and deprecate the current classes and enums inside for better designed compression types, or at least change them but to carefully do it to not break someome badly. This is why I think the best option is to deprecate them and start over on them. I will see what I can do on my end to fix a regression in my zlib that I did not have tests to catch and then propose them to replace the current zlib apis for .NET 7, I might also at the same time work on the other types in System.IO.Compression as well and work them all in a single proposal that could go into this issue. However it might take a few months to work on. |
I think a new Api could be worthwhile. Just a few minutes ago I ran into another problem with the current API design. I upgraded a WebApi project to .Net 6 and some of my endpoint response times dropped from 200ms to 3seconds. I was very confused why my response time was 15x longer than before. Turns out it was due to the design of the compression API's. I used services.AddResponseCompression(options =>
{
options.MimeTypes = CompressedMimeTypes;
options.Providers.Add(
new BrotliCompressionProvider(new BrotliCompressionProviderOptions()
{
Level = (CompressionLevel) 3
}));
}); in Net5 to get a brotli compression level of 3 which provided a good combination of compression size vs compression time. However, in Net6, the CompressionLevel enum was expanded to include SmallestSize which is now enum value 3: CompressionLevel
{
Optimal,
Fastest,
NoCompression,
SmallestSize,
} Now, my code instead of using Brotli compression level 3 was using level 11! Why was that? Because of this code: internal static int GetQualityFromCompressionLevel(CompressionLevel level) =>
level switch
{
CompressionLevel.Optimal => Quality_Default,
CompressionLevel.NoCompression => Quality_Min,
CompressionLevel.Fastest => 1,
CompressionLevel.SmallestSize => Quality_Max,
_ => (int)level,
}; The The more I think about this issue, the more I lean towards a new API for compression. So another +1 for a new options-based or some other new API where compression API's are explicit and understandable rather than the confusing situation we have today. |
Further, the choice of enum conversions in BrotliUtils.cs is strange. Optimal compression and SmallestSize both map to the highest compression, value 11. Smallest size makes sense as 11, however, Optimal should not IMHO map to 11. In my application, on a roughly 1.5MB payload, compression level 3 reduces the payload by 95% and takes 200ms. Compression level 11 reduces the payload by 97% and takes 3,000ms. Doesn't seem like it is worth taking 15 times as long for an additional 2% reduction in size. Optimal in BrotliUtils.cs should map to something else like level 3. Better still of course is a new API, but this seems like a quick 1-line change that would make Brotli easier to use by avoiding the probably never used compression level 11. internal static partial class BrotliUtils
{
public const int WindowBits_Min = 10;
public const int WindowBits_Default = 22;
public const int WindowBits_Max = 24;
public const int Quality_Min = 0;
public const int Quality_Default = 11; //In my opinion, Quality_Default should be 3 or something similar, not 11.
public const int Quality_Max = 11;
public const int MaxInputSize = int.MaxValue - 515; // 515 is the max compressed extra bytes
internal static int GetQualityFromCompressionLevel(CompressionLevel level) =>
level switch
{
CompressionLevel.Optimal => Quality_Default,
CompressionLevel.NoCompression => Quality_Min,
CompressionLevel.Fastest => 1,
CompressionLevel.SmallestSize => Quality_Max,
_ => (int)level,
};
} Also, just looked at the source for CompressionLevel: /// <summary>
/// The compression operation should balance compression speed and output size.
/// </summary>
Optimal = 0, So |
I agree I think either an entire redesign of the apis is needed or an public ctor overload accepting int's so we can bypass the range check (and the badly designed enum that is reused for all of the compression stream classes). In zlib if I wanted to use compression level 9 by casting the 9 to the enum with DeflateStream/ZlibStream (.NET 6) then it will throw an out of range exception instead of just working out of the box and using the native (and supported) compression level 9, but that is an example with 9, some people might want the 8, 7, 5, 4, 3, 2 compression levels on zlib and not 9 that is not covered by the enum. Alternatively I would say that the enum should be obsoleted and the ctors accepting them and recommending them to move to the ones accepting ints and have their own range checking as well between the minimal compression levels, and the maximum one and if not in between them throw in the ctor. Which would probably be even better to do than needing to maintain an poorly designed enum that is reused and not even enough for what most developer's needs are. We have to consider these:
I think that bullet 1 and 3 are the best options, it will expose new properties and overloads of the ctors accepting int's instead of polluting the namespace with x more enums that people then have to look up the documentations for. Which would make it a lot easier to migrate to the ctors of the streams accepting an int compression level. |
API Proposalnamespace System.IO.Compression
{
+ public enum ZlibCompressionLevel
+ {
+ DefaultCompression = -1,
+ NoCompression,
+ Level1,
+ BestSpeed = Level1,
+ Level2,
+ Level3,
+ Level4,
+ Level5,
+ Level6,
+ Level7,
+ Level8,
+ Level9,
+ BestCompression = Level9
+ }
+ public enum ZlibCompressionStrategy
+ {
+ DefaultStrategy,
+ Filtered,
+ HuffmanOnly,
+ Rle,
+ Fixed
+ }
+ public enum ZlibFlushCode
+ {
+ NoFlush,
+ PartialFlush,
+ SyncFlush,
+ FullFlush,
+ Finish,
+ Block,
+ Trees
+ }
+ public enum ZlibMemoryLevel
+ {
+ Level1 = 1,
+ Level2,
+ Level3,
+ Level4,
+ Level5,
+ Level6,
+ Level7,
+ Level8,
+ Level9
+ }
+ public class ZlibOptions
+ {
+ public int WindowBits { get { throw null; } }
+ public ZlibMemoryLevel MemoryLevel { get { throw null; } }
+ public ZlibFlushCode FlushMode { get { throw null; } set { throw null; } }
+ public ZlibCompressionLevel CompressionLevel { get { throw null; } }
+ public ZlibCompressionStrategy Strategy { get { throw null; } }
+ public CompressionMode CompressionMode { get { throw null; } }
+ public ZlibOptions() { } // Do not override this in subclasses (hopefully those compiler generated ones will be generated to call this base one).
+ public ZlibOptions(CompressionMode compressionMode) { } // override this one in subclasses.
+ public ZlibOptions(int windowBits, ZlibMemoryLevel memoryLevel, ZlibCompressionStrategy strategy, CompressionMode compressionMode) { }
+ public ZlibOptions(ZlibCompressionLevel compressionLevel) { }
+ }
+ public class DeflateOptions : ZlibOptions
+ {
+ public DeflateOptions(CompressionMode compressionMode) { }
+ }
+ public class GZipOptions : ZlibOptions
+ {
+ public GZipOptions(CompressionMode compressionMode) { }
+ }
+ public enum BrotliCompressionLevel
+ {
+ NoCompression,
+ Level1,
+ Level2,
+ Level3,
+ Level4,
+ Level5,
+ Level6,
+ Level7,
+ Level8,
+ Level9,
+ Level10,
+ Level11
+ }
+ public struct BrotliOptions
+ {
+ public readonly BrotliCompressionLevel CompressionLevel { get { throw null; } }
+ public readonly int WindowBits { get { throw null; } }
+ public readonly CompressionMode CompressionMode { get { throw null; } }
+ public BrotliOptions() { }
+ public BrotliOptions(BrotliCompressionLevel compressionLevel) { }
+ public BrotliOptions(BrotliCompressionLevel compressionLevel, int windowBits) { }
+ }
public partial struct BrotliEncoder : System.IDisposable
{
+ public BrotliEncoder(BrotliOptions options) { throw null; }
+ public static bool TryCompress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesWritten, BrotliOptions options) { throw null; }
}
public sealed partial class BrotliStream : System.IO.Stream
{
+ public BrotliStream(Stream stream, BrotliOptions options) { }
+ public BrotliStream(Stream stream, BrotliOptions options, bool leaveOpen) { }
} UsageFor usage see the proposal: #62113. RiskMinimal TestsWill have to be added to the PR that will fix #62113. Likewise due to the tests needing to be added there the implementation might just have go in that same PR. DetailsThis is due to needing the Encoder/Decoder added in that PR to then finish off this issue, to where the options classes can be used in the stream compression api classes (with a public constructor added that can accept an options instance). Why do subclasses of ZlibOptions?I feel like the subclass model to doing custom configurations of the windowbits, memory level, and strategy is the way to go (it will allow developers to quickly pick what they want with little effort). |
No, we will not be obsoleting/deprecating the Stream-based compression APIs. |
@JimBobSquarePants @AraHaan and whoever interested in these APIs, do you need the |
@buyaa-n I don’t think I would ever have a need for them and I don’t think many people would. Compression levels and strategy are commonly used features. |
"Bug" or not, it's still a breaking change because people wrote code that depends on it, that code worked as expected, and it no longer works in the updated version. That's a real problem, one that Microsoft used to understand well. |
And was documented as such. |
While that may be true, there are times where people want to make software using custom window bits so having a public api that supports providing it is a free win for them. |
And here is an image explaining why to expose window bits at least for zlib: In my |
For which DeflateStream, ZLibStream, and GZipStream exist. |
I sorta made something that can replace those streams via Version 1.2.13.4 though as I have yet to update the native zlip for it to 1.3.1. Things to note: For the upcoming .NET version the code could use a new |
That doesn't sound like a reason to add those for runtime. We can add that later, when it's really needed. I removed them from the API proposal. |
I disagree, having only stream based API's is counter productive if they are in hot path code as it would reduce performance. Span based on the other hand would increase performance of those hot paths and can make a difference in places. There was a discussion about this in a particular discord server about that. Also I have found that in a majority of cases when compressing or decompressing data people tend to not need streams and only have use for them for the existing compression streams to zlib and that POC package avoids them while allowing those people to still decompress/compress their data. |
The two of you are talking about different things. @buyaa-n is taking about Stream ctors that take the additional settings. @AraHaan is talking about non-Stream-based encoder/decoders. This issue is about Stream ctors. |
Right, sorry for the confusion, I am talking about adding |
namespace System.IO.Compression
{
public enum ZLibCompressionStrategy
{
Default = 0,
Filtered = 1,
HuffmanOnly = 2,
RunLengthEncoding = 3,
Fixed = 4,
}
public sealed class ZLibCompressionOptions
{
public int CompressionLevel { get; set; }
public ZLibCompressionStrategy CompressionStrategy { get; set; }
}
public sealed class ZLibStream : Stream
{
// CompressionMode.Compress
public ZLibStream(Stream stream, ZLibCompressionOptions compressionOptions, bool leaveOpen = false);
}
public partial class DeflateStream : Stream
{
public DeflateStream(Stream stream, ZLibCompressionOptions compressionOptions, bool leaveOpen = false);
}
public partial class GZipStream : Stream
{
public GZipStream(Stream stream, ZLibCompressionOptions compressionOptions, bool leaveOpen = false);
}
public sealed class BrotliCompressionOptions
{
public int Quality { get; set; }
}
public sealed partial class BrotliStream : System.IO.Stream
{
public BrotliStream(Stream stream, BrotliCompressionOptions compressionOptions, bool leaveOpen = false) { }
}
} |
What about dictionary size or window length |
Compresión level maybe would be better with an enum to know which options are available, same as brotli quality |
I think node js can compress brotli with multiple threads, could this be possible on c# too? |
If you need them, please open a new issue (API poposal) with your user scenario, we are open to add more options to the new |
I tried opening them and I couldn't do it If you can do it, please I would like to be able to increase dictionary length and set custom brotli mode, also deflate doesn't got any custom compression options? |
What prevented you from doing so? |
I didn't have permission or something it gives me an error |
If you could copy/paste the exact text of the error here, that would be extremely helpful. |
No permission needed for creating an issue |
Yes I tried and on PC and let me do it, sorry. |
There is a desire to expose compression-algorithm specific configuration when constructing compression streams. The same options can be used for
ZLibStream
,DeflateStream
,GZipStream
. Named those options withZlib
prefix same as the native library name they use another prefix we considered isDeflate
- the algorithm name, the prefix needed because we have existingCompressionLevel
enum that used for all compression streams includingBrotliStream
and it cannot be used for fine tuning the compression level for specific streamruntime/src/libraries/System.IO.Compression/ref/System.IO.Compression.cs
Lines 9 to 15 in d710379
API Proposal
ZlibCompressionLevel
,ZlibCompressionStrategy
are only for Compress mode.Currently
ZlibFlushMode.NoFlush
used in normalRead/Write
operations,ZlibFlushMode.SyncFlush
used for Stream.Flush andZlibFlushMode.Finish
is used on dispose. The value set by the newFlushMode
property will be used for normalRead/Write
operations only.The
MemoryLevel
and/orWindowBits
options are omitted because there is no ask for them, we could add these and other options if/when they are requested.API Usage
Alternative Designs
An alternative design would introduce an options type, maybe per each stream, like so:
We can then decide whether we want to be in the business of defining enums for the underlying types or whether we consider them passthru.
The text was updated successfully, but these errors were encountered: