Fix CompressionLevel.Optimal for Brotli #72266
Conversation
The intent of Optimal is to be a balanced trade-off between compression ratio and speed, but whereas DeflateStream, GZipStream, and ZLibStream all treat Optimal as such (using zlib's default setting for such a balanced tradeoff), Brotli treats Optimal the same as maximum compression, which is very slow. Especially now that maximum compression is expressible as CompressionLevel.SmallestSize, it's even more valuable for Optimal to represent that balanced tradeoff. Based on a variety of sources around the net and some local testing, I've changed the Optimal value from 11 to 4. This is also more important now that we've fixed the argument validation bug that allowed arbitrary numerical values to be passed through unvalidated (DeflateStream, GZipStream, and ZLibStream all properly validate).
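For illustration, here is the user-facing API the remapping affects; BrotliStream and the CompressionLevel values are real, while the file names are placeholders (a minimal sketch, not code from this PR):

using System.IO;
using System.IO.Compression;

// Balanced speed/size: after this change, Optimal maps to Brotli quality 4 rather than 11.
using (FileStream output = File.Create("data.bin.br"))
using (var brotli = new BrotliStream(output, CompressionLevel.Optimal))
using (FileStream input = File.OpenRead("data.bin"))
{
    input.CopyTo(brotli);
}

// Maximum compression (Brotli quality 11) is now opt-in via SmallestSize.
using (FileStream output = File.Create("data.smallest.br"))
using (var brotli = new BrotliStream(output, CompressionLevel.SmallestSize))
using (FileStream input = File.OpenRead("data.bin"))
{
    input.CopyTo(brotli);
}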
Tagging subscribers to this area: @dotnet/area-system-io-compression
Issue Details: the documented intent of Optimal is:
/// <summary>
/// The compression operation should balance compression speed and output size.
/// </summary>
Optimal = 0,
Fixes #46595. For reference, here are local measurements of time and size to compress each of our test files in runtime-assets using this PR. Prior to this PR, the Optimal values matched the SmallestSize values. This is based on writing in 1K chunks.
This looks great. Thanks for sharing the detailed test results.
@stephentoub I'm not sure if you're able to do this, but I'd suggest renaming "Optimal" to actually be named "Balanced". I think that is more self-describing and avoids confusion with "SmallestSize". |
We're not. |
Curious that fastest compression is often faster than no compression. Maybe less time writing the file? |
Don't put a lot of stock in the values that are so close. This is only one iteration per level. The important thing is the big gaps from SmallestSize. |
The run-time for SmallestSize is absolutely ridiculous for Brotli and should not be mapped to an option a lot of users will choose. As someone who has spent a fair amount of time with compression algorithms from an academic perspective, it is not unusual to see algorithm creators support a wide range of settings (memory usage, window size, chain length, etc.) that go beyond what is reasonable in 99% of use cases. To make it easier for users, these settings are (usually) mapped onto a scale from 0 to 10, with 11 as a nod to "turning it up to 11", a level that a lot of implementers of these algorithms do not seem to understand should never be used outside theoretical/synthetic use cases. I reported this back in 2018; thankfully, this merge fixes the default from 11 to 4, which is a lot saner. However, the issue of users being unable to set the quality setting themselves remains. |
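For callers who do want an explicit quality rather than the three enum buckets, the lower-level BrotliEncoder struct already accepts a quality and window directly; a minimal sketch (the input buffer and the chosen quality/window values are placeholders):

using System;
using System.Buffers;
using System.IO.Compression;

byte[] input = new byte[1_000_000];   // placeholder data
byte[] output = new byte[BrotliEncoder.GetMaxCompressedLength(input.Length)];

// Brotli itself accepts quality 0-11 and window 10-24; pick values explicitly here.
using var encoder = new BrotliEncoder(quality: 5, window: 22);
OperationStatus status = encoder.Compress(input, output, out int consumed, out int written, isFinalBlock: true);
Console.WriteLine($"{status}: {consumed} bytes in, {written} bytes out");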
There are scenarios, of course, where you are willing to burn almost any CPU to minimize the size. For example, I am guessing that the Visual Studio setup team dials up the LZMA setting to the max, because the savings will be multiplied by millions of downloads. I did the same for PNGs on a web product, for example. That doesn't directly address your suggestion, but since "11" might save bytes, it's not unreasonable in some cases. |
You are correct, but how far are you willing to take it? You can completely bypass the quality parameter and tweak the underlying settings to get an even smaller result. The underlying algorithms are often derivatives of the shortest common supersequence problem, which is an O(N^2) class of algorithms (often solved in O(N^k) in practice), so you should see exponential growth in run time as you increase the quality factor. I'm not sure how Stephen got those large numbers (seconds) for the test corpus, but it follows similar patterns I've seen before. I've bolded the numbers in the measurements for UncompressedTestFiles\alice29.txt (table not reproduced here): they show that quality = 11 is completely out of bounds on the exponential growth curve. Edit: I'm unsure how the numbers were derived, so I'm going to test it myself in a few minutes. |
I did a quick test with a file of 10 MB.
Or, as a graph (not reproduced here). Edit: this is with a window size of 22, which is the default. Edit 2: this is with .NET 7 preview 6, using BrotliEncoder directly. |
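A rough sketch of how such a per-quality measurement could be done with BrotliEncoder directly; the input file path and the one-shot TryCompress call are assumptions, not the commenter's actual harness:

using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

byte[] data = File.ReadAllBytes("test-10mb.bin");   // placeholder 10 MB input
byte[] dest = new byte[BrotliEncoder.GetMaxCompressedLength(data.Length)];

for (int quality = 0; quality <= 11; quality++)
{
    Stopwatch sw = Stopwatch.StartNew();
    BrotliEncoder.TryCompress(data, dest, out int written, quality, window: 22);
    sw.Stop();
    Console.WriteLine($"quality={quality,2}: {sw.ElapsedMilliseconds,6} ms, {written,9} bytes");
}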
Note that I'm not necessarily saying Quality = 11 should be completely inaccessible. It is definitely "SmallestSize", but at such an extreme cost that it cannot be used by 99% of users. MS decided to abstract away the granularity of compression settings into Optimal, SmallestSize, etc., which is nice for newbies, but setting SmallestSize to the equivalent of "extreme compression settings, use with caution" will probably cause a lot of headaches. For comparison, LZMA2 on level 9 (the highest, called Ultra in 7-Zip Manager) compressed the 10 MB file in 0.49s on my computer. |
Perf Improvements dotnet/perf-autofiling-issues#6965 🙂 |
Seems to be regression on windows/arm64: #73391 and dotnet/perf-autofiling-issues#6971 |
Windows/arm64 improvements: dotnet/perf-autofiling-issues#6950, dotnet/perf-autofiling-issues#6980 |
The test is faulty: |
It is a change in algorithmic defaults: the CompressionLevel enum values are bound to different quality levels. The benchmark should use quality settings directly so that a change in defaults doesn't show up as a perf improvement. |
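One way a benchmark could pin the quality instead of going through the enum; a sketch assuming BenchmarkDotNet, with an arbitrary payload, not the actual dotnet/performance code:

using System;
using System.IO.Compression;
using BenchmarkDotNet.Attributes;

// Run via BenchmarkRunner.Run<BrotliQualityBenchmarks>() from BenchmarkDotNet.Running.
public class BrotliQualityBenchmarks
{
    private byte[] _data = Array.Empty<byte>();
    private byte[] _dest = Array.Empty<byte>();

    // Explicit Brotli qualities, so a remapping of CompressionLevel cannot change what is measured.
    [Params(1, 4, 9, 11)]
    public int Quality;

    [GlobalSetup]
    public void Setup()
    {
        _data = new byte[1 << 20];
        new Random(42).NextBytes(_data);   // placeholder payload
        _dest = new byte[BrotliEncoder.GetMaxCompressedLength(_data.Length)];
    }

    [Benchmark]
    public int Compress()
    {
        BrotliEncoder.TryCompress(_data, _dest, out int written, Quality, window: 22);
        return written;
    }
}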
Reacting to dotnet/runtime#72266, which changed CompressionLevel.Optimal to no longer mean "smallest size", but instead a balance between compression speed and output size. In Blazor WASM publishing, we really want smallest size - it is preferred to spend more time during publish in order for less bytes to be downloaded and cached in the browser. The fix is to change the default compression level to SmallestSize in the brotli tool used by WASM publish.
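In application or build code, the old behavior can be restored by opting into SmallestSize explicitly; a minimal sketch of pre-compressing a published asset, where the file names are placeholders and this is not the SDK's actual tool code:

using System.IO;
using System.IO.Compression;

// Pre-compress an asset with maximum Brotli compression, trading publish time for download size.
using FileStream input = File.OpenRead("blazor.boot.json");   // placeholder asset path
using FileStream output = File.Create("blazor.boot.json.br");
using var brotli = new BrotliStream(output, CompressionLevel.SmallestSize);
input.CopyTo(brotli);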
@stephentoub - do you think this change warrants a breaking change doc being written? |
I don't personally think so; the impact is some code will get a lot faster and the resulting compressed data may be a bit larger. If we updated to a new version of Brotli and it itself incurred a change that resulted in a bit larger output, we wouldn't call that a breaking change. |
We always have folks tell us we broke them when we make a change like this. It's better to let them know it was intentional. It's also a good place to tell them what steps they can take to get back to the old behavior - if that's what they want. |
Then we should also have one for updating zlib, unless you're sure that all the changes will result in exactly the same output bytes for any input. |
I think such a notice would be helpful and would help folks understand what our guarantees are. In those cases the changes should be more subtle and we can imagine that folks wouldn't want / need to be able to restore the old behavior - but I am sure they would appreciate the notice that a change is expected. In this case we can imagine that someone might want to restore the old behavior (as was the case in the SDK scenario) so it's more important to have the docs in place to help them do that. |
I don't follow. Due to this change, when upgrading an app, an important codepath may well get 30-40x slower (viz dotnet/sdk#27659). For most customers it won't be immediately obvious what happened or how to fix it. A simple breaking change note about it costs us little but would set people in the right direction. A zlib update seems rather different, as far as I know we expect it to just work, so I don't see the connection. |
Huh? This PR (Fix CompressionLevel.Optimal for Brotli) made that task in SDK 30x-40x faster (209ms instead of 7600ms), not slower. It did so by not employing as much compression, such that the resulting binary size was then 3.28 MB instead of 2.64 MB. The PR you link to, which incurs the 40x slowdown, explicitly specifies CompressionLevel.SmallestSize in order to revert to the behavior prior to this change, getting back that 0.6 MB in size in exchange for ~7s of additional processing time. |
The connection is there are five years' worth of improvements in that zlib upgrade. I have no idea if any of those changes might impact exactly what (valid) bytes are output as part of compression. Maybe the compressed output between zlib 1.2.11 and 1.2.12 is always 100% identical, or maybe there are cases where they both produce valid compressed outputs that aren't byte-for-byte identical. If this Brotli PR is a breaking change, it'd be because we consider byte-for-byte changes from version to version a breaking change worthy of documenting. This PR "just works", too. If the goal of adding a breaking change notice is just that it's a place where we can call attention to interesting differences between versions, then we need to come up with a term other than "breaking". As we all know, literally every bug fix we make, every performance improvement we make, can break someone somewhere somehow, so we have a bar for what constitutes a break, and from my perspective, this PR doesn't rise to the level of that bar. I have zero concerns with creating a notice somewhere about "if you see your BrotliStream working faster but creating larger files and that's an issue for you, here's how to get back to the old slowness and smallness"; I just don't believe it's a "break" by our definition (unless we consider byte-for-byte compressed output compatibility a thing). |
Ha, yes, brainstorm. My point was -- it was a change that made the output significantly larger, and a fair number of users may need to react because after the upgrade that no longer meets their requirements. I think that's materially more than an output that isn't "byte for byte identical". The breaking change note would be simple. We did this, the result you will see is this, if that's not what you want make this explicit change. |
It's not. The goal is to call out changes that we believe have a good chance of breaking some existing code (meaning, it goes out of spec) usually where there is a clear recommendation we can make to adapt to the change. This meets those criteria and a zlib update does not. |
My concern isn't around the complexity of the note. It's that documenting this kind of change as a "break" dilutes the meaning of "breaking change" and the value of a breaking change list. Plain and simple. |
Gotcha. Maybe we can make that clear by putting it at the end separately, with a note that it's not a breaking change, but may be a visible change that upgraders may want to accommodate. |
@PriyaPurkayastha do you have written criteria for the breaking change list? |
No, we don't have specific criteria that define what should get documented as a breaking change.
To the best of my knowledge, we have the following modes for documentation:
Customers tend to look at Known Issues and Breaking Changes when they observe something that is different from what they are used to seeing when they migrate to a new .NET version, especially if they are looking for details that will help them revert to the previous behavior. Another factor to consider is how easily customers can diagnose and root-cause a change in their output to this specific change. The reason we don't have strict criteria about what is considered breaking or not is that it is not possible to come up with an exhaustive/complete list. I generally rely upon teams to determine if there is any change that customers might not be able or want to consume, based on code inspection, subject matter expertise, or compat lab indicators, and to document those changes as breaking changes. An important point to remember is that we are not discussing what is allowed vs. not allowed; this is purely about what would be helpful to surface to customers via documentation. |
That's really neat! |
This PR is marked as "merged," but it's not in .NET 7 RC1. Brotli compression under level "Optimal" is excessively slow, and decompiling |
It is in .NET 7 RC1.
That's not RC1. It's "7.0.0-preview.7.22375.6" (a "preview 7" build). RC1 is 7.0.0-rc.1.22426.10. |
All right, maybe I'm missing something. On the post announcing RC1, it said to use the latest Visual Studio Preview build if you want to use RC1 with Visual Studio. I'm on that preview. This is the .NET 7 build I have. What gives? |
You also need to install RC1. |
Well I installed that, and now Visual Studio is broken. I try to launch it and open my project, and it hangs indefinitely on "Loading project 22 of 36." |
Hmm, does that occur consistently? And if you uninstall RC1, does it go away? |
Yes, it occurs consistently. I was able to work around it by removing a specific project out of my SLN, but there's no clear reason I can see as to why that project should have given it any trouble in the first place. Now, everything continues to be broken, for a different reason. How is it even possible that this is the state of an RC? This is what I'd expect out of an Alpha release. |
The intent of CompressionLevel.Optimal is to be a balanced tradeoff between compression ratio and speed, but whereas DeflateStream, GZipStream, and ZLibStream all treat Optimal as such (using zlib's default setting for such a balanced tradeoff), BrotliStream treats Optimal the same as maximum compression, which is very slow. Especially now that maximum compression is expressible as CompressionLevel.SmallestSize, it's even more valuable for Optimal to represent that balanced tradeoff. Based on a variety of sources around the net and some local testing, I've changed the Optimal value from 11 to 4; I've also changed the default used when no level is set to be Optimal (e.g. new BrotliStream(..., CompressionMode.Compress)). This is also more important now that we've fixed the argument validation bug that allowed arbitrary numerical values to be passed through unvalidated (DeflateStream, GZipStream, and ZLibStream all properly validated already).
Fixes #46595
Fixes #64185
Fixes #72220
For reference, here are local measurements of time and size to compress each of our test files in runtime-assets using this PR. Prior to this PR, the Optimal values matched the SmallestSize values. This is based on writing in 1K chunks.