Bzip input stream simple vectorization #611
Codecov Report
@@ Coverage Diff @@
## master #611 +/- ##
==========================================
+ Coverage 70.96% 73.27% +2.30%
==========================================
Files 68 68
Lines 13417 8718 -4699
==========================================
- Hits 9522 6388 -3134
+ Misses 3895 2330 -1565
Nothing wrong with the implementation, but it took a while of staring at it to understand what was happening. Perhaps add some comments around it?
The speed improvements are really nice and badly needed for the bzip2 algorithm, which is painfully slow (among other problems).
I once had a (very) brief go at using the SSE intrinsics in the deflate code, but never tried just System.Numerics.Vector on its own (those require a .NET Core 3+ TFM to build, though).
Sure, I'll add some.
I would also like some of you to confirm those speed improvements, just to be sure that my conclusions are correct.
Right. The sole purpose of choosing |
this is what I get using your branch
Good, this is very similar to my results. Since I was developing it on an M1, I was only able to do native tests on my daughter's laptop, and I also included some results from a friend's PC.
Yeah, got basically the same results:
Great, is there any obstacle left to having this merged soon?
Hi guys.
Here's a simple approach to get some speed-up on BZip2 decompression. The rotation loop (`yy[j] = yy[j - 1]`) is vectorized here in a platform-independent way, i.e. by means of `Vector<byte>` instead of e.g. `Vector128<byte>`, which is tied to a specific platform. The API is available from .NET Core/Standard 2.1, however the real gain starts in .NET Core 3.

The first commits add .NET Core 3.1 as a target platform for the performance tests, along with such a test for BZip2 decompression. The last commit is the actual vectorization. Here are some results from two Intel machines (one on Windows and one on a rather antique MacBook Air).
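For readers who have not looked at the diff, the rotation described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `MoveToFrontSketch` and the method names `RotateScalar` and `RotateVectorized` are hypothetical, and `yy` is assumed to be the 256-entry move-to-front table with `n` the index of the symbol being moved to the front.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only; names are not taken from SharpZipLib.
public static class MoveToFrontSketch
{
    // Scalar reference: shift yy[0..n-1] one slot to the right,
    // then place the old yy[n] at the front.
    public static void RotateScalar(byte[] yy, int n)
    {
        byte tmp = yy[n];
        for (int j = n; j > 0; j--)
        {
            yy[j] = yy[j - 1];
        }
        yy[0] = tmp;
    }

    // Same rotation, moving Vector<byte>.Count bytes per step where possible.
    public static void RotateVectorized(byte[] yy, int n)
    {
        byte tmp = yy[n];
        int width = Vector<byte>.Count; // hardware dependent, e.g. 16 or 32
        int j = n;

        // Walk from the high end downwards. Each chunk is fully loaded
        // before it is stored, so the overlapping ranges are safe.
        while (j >= width)
        {
            var chunk = new Vector<byte>(yy, j - width); // reads yy[j-width .. j-1]
            chunk.CopyTo(yy, j - width + 1);             // writes yy[j-width+1 .. j]
            j -= width;
        }

        // Scalar tail for the remaining low indices.
        while (j > 0)
        {
            yy[j] = yy[j - 1];
            j--;
        }
        yy[0] = tmp;
    }
}
```

Because `Vector<byte>.Count` is chosen by the runtime, the same source should work on any platform; whether it is actually faster depends on `Vector.IsHardwareAccelerated` being true for the target runtime.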
Without vectorization:
First machine
Second machine
With vectorization:
First machine
Second machine
Machine details.
First machine
Second machine
The speed-up on the test machines, vectorized vs. non-vectorized, is about 35-50%. Note that it is only observable from .NET Core 3 onwards (I also tested on .NET 5; the results are similar).
I certify that I own, and have sufficient rights to contribute, all source code and related material intended to be compiled or integrated with the source code for the SharpZipLib open source product (the "Contribution"). My Contribution is licensed under the MIT License.