
Optimize HTTP2 HPack huffman decoding #43603

Merged

Conversation

rokonec
Member

@rokonec rokonec commented Oct 19, 2020

Optimized to use an 8-bit lookup table tree, resulting in about 0.35× the CPU utilization of the former version.
The decoding table is lazily generated as a ushort[].

#1506

After these changes go through review and receive final approval, I will create a corresponding pull request to aspnetcore including these changes and a new benchmark similar to the existing hpack benchmark.

@ghost

ghost commented Oct 19, 2020

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

- removing var where unclear
- removing outdated comments
- using a local variable as opposed to a static
@rokonec rokonec requested a review from stephentoub October 22, 2020 15:55
@rokonec rokonec self-assigned this Oct 22, 2020
@rokonec rokonec closed this Oct 26, 2020
@rokonec rokonec reopened this Oct 26, 2020
@rokonec
Member Author

rokonec commented Oct 26, 2020

Closed/reopened to pick up fixes from master - worked :-)

Member

@ManickaP ManickaP left a comment


At the end of GenerateDecodingLookupTree, will continue on Monday.

- renaming LUT
- swapping leaf flag meaning (MSB)
- typos
Member

@ManickaP ManickaP left a comment


In general LGTM. Some little comments and questions. Thank you!

BTW, this should go through ASP.NET folks before merging. Do you have a point of contact or should I invoke someone?

@Tratcher
Member

Tratcher commented Nov 9, 2020

BTW, this should go through ASP.NET folks before merging. Do you have a point of contact or should I invoke someone?

That's me 😁.

The decoding table is lazily generated as a ushort[].

You're trading disk size for startup time. How long in ms does it take to generate the table?

rokonec and others added 6 commits November 9, 2020 23:01
- fix shift count computation
- change acc size to 16 bits and append to it by byte
This partially reverts commit a09b4d3 (all but Huffman.cs)
Contributor

@scalablecory scalablecory left a comment


LGTM once outstanding comments are addressed. I'm not an expert in Huffman -- let's please make sure tests are exercising everything.

@geoffkizer
Contributor

geoffkizer commented Nov 12, 2020

The "jump table" algorithm is an optimization of the classic, naive algorithm for decoding bit-by-bit using a binary tree.

In the classic algorithm, you start at the root of the tree and process one bit at a time. If the bit is 0, you walk to the left; if it's 1 you walk to the right. When you hit a leaf, you emit the associated character and start over at the root.

Of course you wouldn't literally represent this as a tree in memory. Instead you'd assign each internal node in the tree a number, from 0 (root) to 255. Then you'd have a lookup table that takes (node#, bit) and maps it to (isLeaf, emittedChar, nextNode). Note that emittedChar is only valid when isLeaf is true, and nextNode is always 0 for leaf nodes.

The interesting observation here is that you don't need to walk the tree just one bit at a time. You can build a lookup table that takes multiple bits at a time. The best choice here seems to be 4, as it's easy to get 4 bits at a time from a byte, and it keeps the lookup table from getting too huge.

So, your lookup table is now (node#, 4bits) -> (isCharEmitted, emittedChar, nextNode). Note that isCharEmitted means that a character was emitted in walking any of those 4 bits -- and since all chars need at least 5 bits to encode, we know only one can be emitted per 4 bits. And note that nextNode now is independent of whether a character was emitted or not -- we could have emitted a character after the first bit, but nextNode will give the node that we ended up on after returning to the root and walking the next three bits.

So the inner loop here basically looks like:

  bits = getnext4bits();
  (isCharEmitted, emittedChar, currentNode) = lookuptable[currentNode, bits];
  if (isCharEmitted)
    output[index++] = emittedChar;

There's a little bit more complexity to handle invalid sequences (EOS) and the end of the encoding.

@rokonec
Member Author

rokonec commented Nov 12, 2020

I have carefully read your description, and I do not see a difference between the "jump table" algorithm and my algorithm.

Here is what my loop body does:

bits = getnext8bits();
(isCharEmitted, emittedChar, unusedBits, nextNode) = lookuptable[currentNode, bits];
if (isCharEmitted)
{
  output[index++] = emittedChar;
  keepBits(unusedBits); // not all of the 8 bits are used by the emitted char; keep them for the next code
  currentNode = rootNode;
}
else
{
  currentNode = nextNode;
}

Because we consume 8 bits at a time, it could well be that a particular lookup contains enough bits to emit two characters, for example the final bit of a 9-bit code followed by a complete 5-bit code. I tried to write a data structure that would allow multiple chars to be emitted, but not only was the decoding table 50% bigger in byte size, it was also slower in benchmarks due to the added complexity of supporting multiple emitted chars.
So in such a situation we simply keep the 7 bits from the last lookup/jump, add 1 bit from the source, reset to the root, and start over.

We have chosen 8 bits as it renders the best performance, about twice as fast as 4 bits, with an acceptable size of the decoding tree.

My code is much less readable than the pseudocode above, as I have tried to write it as high-perf code with these optimizations:

  • the 2D array flattened into a 1D array
  • (isCharEmitted, emittedChar, unusedBits, nextNode) encoded into a ushort
  • the final 7 bits processed separately to keep the main hot loop as simple as possible

These made it about 30% faster than the non-optimized version.

@rokonec
Member Author

rokonec commented Nov 16, 2020

Can we get the benchmark code added to the PR as well? This is useful for measuring future improvements.

@geoffkizer After this PR gets approved, I will create a corresponding PR to aspnetcore, as described here, also creating a new benchmark close to the existing hpack benchmark.

The expected benchmark will look like: https://gist.github.com/rokonec/4f63d464fd9a359add75056789e0f9e1

@geoffkizer
Contributor

I will create corresponding PR to aspnetcore

It would be good to have the benchmark code in this repo too. Let's figure out how to get it in here.

@geoffkizer
Contributor

@Tratcher can you review this as well?

@Tratcher Tratcher requested a review from halter73 November 19, 2020 19:47
Member

@Tratcher Tratcher left a comment


I don't know Huffman, but I don't see anything alarming and I'm content with the test coverage we have here.

I've added @halter73 in case he wants to take a look.

Member

@ManickaP ManickaP left a comment


Just a few technicalities around the benchmark project, otherwise LGTM.
Thanks for this perf improvement!

@ManickaP
Member

And please also update .gitignore, because I'm seeing System.Net.Http/tests/PerformanceTests/HPackHuffmanBenchmark/BenchmarkDotNet.Artifacts in uncommitted changes.

@ManickaP ManickaP merged commit 7171407 into dotnet:master Nov 27, 2020
@rokonec rokonec deleted the rokonec/1506-optimize-huffman-decoding-using-lut branch November 27, 2020 14:23
@ghost ghost locked as resolved and limited conversation to collaborators Dec 27, 2020
@karelz karelz added this to the 6.0.0 milestone Jan 26, 2021