-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize HTTP2 HPack huffman decoding #43603
Optimize HTTP2 HPack huffman decoding #43603
Conversation
Optimized to use 8 bits lookup tables tree with result of about 0.35 CPU utilization as oppose to former version. Decoding table is lazy generated as ushort[]. runtime#1506
Tagging subscribers to this area: @dotnet/ncl |
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
- removing var where unclear - remove outdated comments - using local variable as oppose to static
Closed/reopened to pickup fixes from master - worked :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the end of GenerateDecodingLookupTree
, will continue on Monday.
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Show resolved
Hide resolved
- renaming LUT - swapping leaf flag meaning (MSB) - typos
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general LGTM. Some little comments and questions. Thank you!
BTW, this should go through ASP.NET folks before merging. Do you have a person of contact or should I invoke someone?
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
That's me 😁.
You're trading disk size for startup time. How long in ms does it take generate the table? |
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
- fix shift count computation - change acc size to 16 bits and append to it by byte
…k/Huffman.cs Co-authored-by: Marie Píchová <[email protected]>
…k/Huffman.cs Co-authored-by: Marie Píchová <[email protected]>
…k/Huffman.cs Co-authored-by: Marie Píchová <[email protected]>
This reverts commit a09b4d3, partially, all but Huffman.cs
…tps://github.com/rokonec/runtime into rokonec/1506-optimize-huffman-decoding-using-lut
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm once outstanding comments are addressed. I'm not an expert in Huffman -- lets please make sure tests are exercising everything.
src/libraries/Common/src/System/Security/Cryptography/Asn1/AlgorithmIdentifierAsn.xml.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
The "jump table" algorithm is an optimization of the classic, naive algorithm for decoding bit-by-bit using a binary tree. In the classic algorithm, you start at the root of the tree and process one bit at a time. If the bit is 0, you walk to the left; if it's 1 you walk to the right. When you hit a leaf, you emit the associated character and start over at the root. Of course you wouldn't literally represent this as a tree in memory. Instead you'd assign each internal node in the tree a number, from 0 (root) to 256. Then you'd have a lookup table that takes (node#, bit) and maps it to (isLeaf, emittedChar, nextNode). Note that emittedChar is only valid when isLeaf is true, and nextNode is always 0 for leaf nodes. The interesting observation here is that you don't need to walk the tree just one bit at a time. You can build a lookup table that takes multiple bits at a time. The best choice here seems to be 4, as it's easy to get 4 bits at a time from a byte, and it keeps the lookup table from getting too huge. So, your lookup table is now (node#, 4bits) -> (isCharEmitted, emittedChar, nextNode). Note that isCharEmitted means that a character was emitted in walking any of those 4 bits -- and since all chars need at least 5 bits to encode, we know only one can be emitted per 4 bits. And note that nextNode now is independent of whether a character was emitted or not -- we could have emitted a character after the first bit, but nextNode will give the node that we ended up on after returning to the root and walking the next three bits. So the inner loop here basically looks like: bits = getnext4bits();
(isCharEmitted, emittedChar, currentNode) = lookuptable[currentNode, bits];
if (isCharEmitted)
output[index++] = emittedChar; There's a little bit more complexity to handle invalid sequences (EOS) and the end of the encoding. |
I have had carefully read your description and I do not see difference between "jump table" and my algorithm. Here is what my loop body does: bits = getnext8bits();
(isCharEmitted, emittedChar, unusedBits, nextNode) = lookuptable[currentNode, bits];
if (isCharEmitted)
output[index++] = emittedChar;
keepBits(unusedBits); // not all of 8 bits are used by emitted char, keep them for next code
currentNode=rootNode;
else
currentNode=nextNode; Because we consume 8 bits it could well be that particular node could have enough info to emit two characters. For example if we would have 9-bit-code,5bit-code. I tried to write data structure which would allow multiple chars to be emitted but not only the decoding table was 50% bigger (in bytes size) it was slower in benchmark due to added complexity in supporting multiple emitted chars. We have chosen 8 bits as it render best performance, about twice as fast as 4 bit, with acceptable size of decoding tree. My code is much less readable than above pseudocode as I have tried to write it as high perf code with these optimization:
These made it about 30% faster than not-optimized version. |
@geoffkizer After this PR gets approved, I will create corresponding PR to aspnetcore, as described here also creating new benchmark close to existing hpack benchmark. Expected Benchmark will looks like: https://gist.github.com/rokonec/4f63d464fd9a359add75056789e0f9e1 |
It would be good to have the benchmark code in this repo too. Let's figure out how to get it in here. |
@Tratcher can you review this as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know huffman, but I don't see anything alarming and I'm content with the test coverage we have here.
I've added @halter73 in case he wants to take a look.
…mize-huffman-decoding-using-lut
…ec/1506-optimize-huffman-decoding-using-lut
…ec/1506-optimize-huffman-decoding-using-lut
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few technicalities around the benchmark project, otherwise LGTM.
Thanks for this perf improvement!
...es/System.Net.Http/tests/PerformanceTests/HPackHuffmanBenchmark/HPackHuffmanBenchmark.csproj
Outdated
Show resolved
Hide resolved
...es/System.Net.Http/tests/PerformanceTests/HPackHuffmanBenchmark/HPackHuffmanBenchmark.csproj
Show resolved
Hide resolved
...es/System.Net.Http/tests/PerformanceTests/HPackHuffmanBenchmark/HPackHuffmanBenchmark.csproj
Show resolved
Hide resolved
And please also update .gitignore because I'm seeing |
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/Huffman.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Stephen Toub <[email protected]>
Optimized to use 8 bits lookup tables tree with result of about 0.35 CPU utilization as oppose to former version.
Decoding table is lazy generated as ushort[].
#1506
After this changes goes through review and receive final approve, I will create corresponding pull request to aspnetcore including these changes and new benchmark similar to existing hpack benchmark