Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hoist some of the calculations from loops of 3DMoments() #1818

Merged
merged 4 commits into from
Nov 13, 2021

Conversation

kunalspathak
Copy link
Contributor

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

The 4-level nested for loop in 3DMoments is very expensive, the inner most loop executes almost 1.1M times. We call GetPaletteIndex() method that does bunch of calculation using index variables of all the 4-loops.

for (int r = 1; r < IndexCount; r++)
{
volumeSpan.Clear();
for (int g = 1; g < IndexCount; g++)
{
areaSpan.Clear();
for (int b = 1; b < IndexCount; b++)
{
Moment line = default;
for (int a = 1; a < IndexAlphaCount; a++)
{
int ind1 = GetPaletteIndex(r, g, b, a);
line += momentSpan[ind1];
areaSpan[a] += line;
int inv = (b * IndexAlphaCount) + a;
volumeSpan[inv] += areaSpan[a];
int ind2 = ind1 - baseIndex;
momentSpan[ind1] = momentSpan[ind2] + volumeSpan[inv];
}
}
}

There is a potential to hoist some of the invariants calculation outside the loop. I verified the benchmark and see around 2-3% improvement.

Here is the assembly difference: https://www.diffchecker.com/I4uqCtBO

@codecov
Copy link

codecov bot commented Nov 10, 2021

Codecov Report

Merging #1818 (0db87a5) into master (21d95a7) will increase coverage by 0%.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #1818    +/-   ##
=======================================
  Coverage      87%     87%            
=======================================
  Files         936     936            
  Lines       48164   48191    +27     
  Branches     6037    6037            
=======================================
+ Hits        41970   42097   +127     
+ Misses       5190    5093    -97     
+ Partials     1004    1001     -3     
Flag Coverage Δ
unittests 87% <100%> (+<1%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ing/Processors/Quantization/WuQuantizer{TPixel}.cs 91% <100%> (+<1%) ⬆️
...ImageSharp/Formats/Webp/Lossless/Vp8LBitEntropy.cs 98% <0%> (-2%) ⬇️
.../ImageSharp/Formats/Webp/Lossless/LosslessUtils.cs 97% <0%> (+8%) ⬆️
...ageSharp/Formats/Webp/Lossless/PredictorEncoder.cs 98% <0%> (+8%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 21d95a7...0db87a5. Read the comment docs.

@JimBobSquarePants
Copy link
Member

@kunalspathak Thanks for this it is very interesting. I've just read your linked runtime issue also.

If we can add some supplementary comments to explain the optimization and refactor some of the variable naming (prefer ind1R over ind1_r ) then that would be a great addition to the codebase! 👍

@kunalspathak
Copy link
Contributor Author

I have also noticed other places where similar technique could be use although I didn't measure its performance.
For eg, if I just search for all the references of GetPaletteIndex:

private static int GetPaletteIndex(int r, int g, int b, int a)
, once of the reference is :
private static Moment Volume(ref Box cube, ReadOnlySpan<Moment> moments)
=> moments[GetPaletteIndex(cube.RMax, cube.GMax, cube.BMax, cube.AMax)]
- moments[GetPaletteIndex(cube.RMax, cube.GMax, cube.BMax, cube.AMin)]
- moments[GetPaletteIndex(cube.RMax, cube.GMax, cube.BMin, cube.AMax)]
+ moments[GetPaletteIndex(cube.RMax, cube.GMax, cube.BMin, cube.AMin)]
- moments[GetPaletteIndex(cube.RMax, cube.GMin, cube.BMax, cube.AMax)]
+ moments[GetPaletteIndex(cube.RMax, cube.GMin, cube.BMax, cube.AMin)]
+ moments[GetPaletteIndex(cube.RMax, cube.GMin, cube.BMin, cube.AMax)]
- moments[GetPaletteIndex(cube.RMax, cube.GMin, cube.BMin, cube.AMin)]
- moments[GetPaletteIndex(cube.RMin, cube.GMax, cube.BMax, cube.AMax)]
+ moments[GetPaletteIndex(cube.RMin, cube.GMax, cube.BMax, cube.AMin)]
+ moments[GetPaletteIndex(cube.RMin, cube.GMax, cube.BMin, cube.AMax)]
- moments[GetPaletteIndex(cube.RMin, cube.GMax, cube.BMin, cube.AMin)]
+ moments[GetPaletteIndex(cube.RMin, cube.GMin, cube.BMax, cube.AMax)]
- moments[GetPaletteIndex(cube.RMin, cube.GMin, cube.BMax, cube.AMin)]
- moments[GetPaletteIndex(cube.RMin, cube.GMin, cube.BMin, cube.AMax)]
+ moments[GetPaletteIndex(cube.RMin, cube.GMin, cube.BMin, cube.AMin)];

Here, r related calculation for cube.RMax and cube.RMin can be precalculated and reused.

Another interesting method is Mark(). What is the common range of RMin ~ RMax, GMin ~ GMax, etc. If it is >= 32, we are already calculating GetPaletteIndex() million times and some of the calculation can be hoisted out the way its been done in this PR. Let me know and I can do for Mark() method as well.

for (int r = cube.RMin + 1; r <= cube.RMax; r++)
{
for (int g = cube.GMin + 1; g <= cube.GMax; g++)
{
for (int b = cube.BMin + 1; b <= cube.BMax; b++)
{
for (int a = cube.AMin + 1; a <= cube.AMax; a++)
{
tagSpan[GetPaletteIndex(r, g, b, a)] = label;
}
}
}
}

@@ -85,7 +85,7 @@ public void PngCoreWu()
public void PngCoreWuNoDither()
{
using var memoryStream = new MemoryStream();
var options = new PngEncoder { Quantizer = new WuQuantizer(new QuantizerOptions { Dither = null }) };
var options = new PngEncoder { Quantizer = new WuQuantizer(new QuantizerOptions { Dither = null }), ColorType = PngColorType.Palette };
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change so I can get perf coverage for it. Should I keep it or revert it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. That's an oversight from us.

Very happy to see what other optimizations you can bring as long as they are well commented. Thanks! 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, happy to contribute. I think it will be more productive for us to contribute or investigate the performance issues if these benchmarks are in our system. Any progress on #1795?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving that one for @antonfirsov Likely after we complete #1730 and release V2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kunalspathak kunalspathak marked this pull request as ready for review November 12, 2021 15:41
@kunalspathak
Copy link
Contributor Author

This should be ready for review now.

Copy link
Member

@JimBobSquarePants JimBobSquarePants left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! 👍

@JimBobSquarePants JimBobSquarePants merged commit a39fe4c into SixLabors:master Nov 13, 2021
@kunalspathak
Copy link
Contributor Author

This workaround will not be needed anymore because codegen will take care of it as part of dotnet/runtime#68061. Here is the diff for the earlier version of C# code:

image

Full diff: https://www.diffchecker.com/DKGAZKq0

@JimBobSquarePants
Copy link
Member

That’s .NET 7 though yeah? We’ll be sticking to LTS release targets from now on so just 6 for now.

I am planning on revising many parts of the codebase though to simplify things where possible now we have a much more modern single target.

@kunalspathak
Copy link
Contributor Author

That’s .NET 7 though yeah?

Yes.

We’ll be sticking to LTS release targets from now on so just 6 for now.

Sure, this was just FYI :)

@JimBobSquarePants
Copy link
Member

It’s really good to know!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants