Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve CRC32 performance #516

Merged
merged 1 commit into from
Nov 21, 2020

Conversation

nsgomez
Copy link
Contributor

@nsgomez nsgomez commented Sep 22, 2020

This change replaces the CRC32 update code with a slicing-by-16 implementation that can process 16 bytes at a time. It uses the original byte-by-byte code as a fallback, but also marks that method for aggressive inlining to get rid of overhead from function calls. This reduces the runtime of the BZip and ZLib CRC checksums by about 75%, which is also good for overall I/O.

The performance can be improved even further by using unsafe code to eliminate array bounds checks. While that code isn't included in this PR, the benchmarks for the unsafe code are included below. The code can be found here: nsgomez@6a2e954

CLA

While this is my personal GitHub account, I'm submitting this PR in my capacity as a Microsoft employee.

The method CrcUtilities.GenerateSlicingLookupTable is derived from Crc32.NET's table generation code. We do not own that source code, but because it's MIT Licensed we have sufficient rights to use, copy, and distribute it. Otherwise, I certify that I own, and have sufficient rights to contribute, all source code and related material intended to be compiled or integrated with the source code for the SharpZipLib open source product (the "Contribution"). My Contribution is licensed under the MIT License.

Benchmarks

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.1248 (2004/?/20H1)
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
[Host] : .NET Framework 4.8 (4.8.4200.0), X64 RyuJIT
Job-KMVHYW : .NET Core 2.1.12 (CoreCLR 4.6.27817.01, CoreFX 4.6.27818.01), X64 RyuJIT
Job-DCFJWB : .NET Framework 4.8 (4.8.4200.0), X64 RyuJIT

net461

Method Mean ± StdDev (master) Mean ± StdDev (PR) Diff (mean) Mean ± StdDev (unsafe)
Adler32LargeUpdate 196.1 ± 1.04 ms 195.6 ± 0.76 ms ~0.00% 195.5 ± 0.45 ms
BZip2CrcLargeUpdate 988.8 ± 0.99 ms 226.6 ± 0.95 ms -77.08% 104.5 ± 0.32 ms
Crc32LargeUpdate 777.1 ± 4.24 ms 204.4 ± 2.46 ms -73.70% 102.5 ± 0.62 ms
ReadZipInputStream 311.9 ± 2.53 ms 166.6 ± 0.55 ms -46.59% 141.0 ± 0.35 ms
WriteZipOutputStream 616.7 ± 3.83 ms 497.0 ± 3.11 ms -19.41% 462.1 ± 0.82 ms

netcoreapp2.1

Method Mean ± StdDev (master) Mean ± StdDev (PR) Diff (mean) Mean ± StdDev (unsafe)
Adler32LargeUpdate 229.5 ± 0.68 ms 230.0 ± 0.89 ms ~0.00% 231.8 ± 1.96 ms
BZip2CrcLargeUpdate 1,079.4 ± 2.90 ms 233.5 ± 0.97 ms -78.37% 104.4 ± 0.30 ms
Crc32LargeUpdate 776.2 ± 3.11 ms 224.5 ± 1.06 ms -71.08% 102.2 ± 0.13 ms
ReadZipInputStream 310.7 ± 1.41 ms 167.2 ± 0.42 ms -46.19% 141.8 ± 0.38 ms
WriteZipOutputStream 606.6 ± 3.34 ms 495.0 ± 3.33 ms -18.40% 438.0 ± 1.86 ms

@Numpsy
Copy link
Contributor

Numpsy commented Sep 22, 2020

#319 did something similar with a variation of the Crc32,Net code that had some pretty nice improvements (There are couple of benchmarks in #328), but unfortunately has been sitting around for some time.
It would be nice to pick up the speed ups though.

@nsgomez
Copy link
Contributor Author

nsgomez commented Sep 23, 2020

We'd also love to have the perf improvement, let me know if I can help with getting either #319 or this PR merged in.

@piksel
Copy link
Member

piksel commented Oct 3, 2020

Yeah, we have some PRs for CRC improvements, but it felt kind of overwhelming to compare and choose. I made some efforts on setting up benchmarks to compare them, but we never reached a conclusion on what to do here. I will try to look in to it this weekend.

@piksel piksel merged commit ddc545b into icsharpcode:master Nov 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants