1D convolution optimization and general codegen tweaks #1477

Merged
JimBobSquarePants merged 12 commits into master from sp/2pass-convolution-speedup
Dec 16, 2020

@Sergio0694 Sergio0694 commented Dec 15, 2020

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practices as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This PR does a few things:

  • Speed optimizations to the 2-pass convolution processor (powering gaussian blur, sharpen, etc.)
  • Speed optimizations to the bokeh blur
  • Some general codegen optimizations that should apply to all common pixel conversions, etc.
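For readers unfamiliar with the term, a two-pass (separable) convolution applies a 1D kernel along the rows and then along the columns, instead of one full 2D kernel. The following is only a minimal sketch of that idea on a single-channel float buffer. The class and method names are hypothetical, and border clamping and a symmetric kernel are assumptions; this is not the ImageSharp implementation.

```csharp
using System;

// Hypothetical illustration only; none of these names come from ImageSharp.
public static class SeparableConvolutionSketch
{
    // Horizontal pass: filters each row of a row-major image with a 1D kernel.
    // Sample coordinates are clamped at the borders (an assumption).
    public static float[] ConvolveRows(float[] source, int width, int height, float[] kernel)
    {
        var target = new float[source.Length];
        int radius = kernel.Length / 2;

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                float sum = 0F;
                for (int k = 0; k < kernel.Length; k++)
                {
                    int sx = Math.Clamp(x + k - radius, 0, width - 1);
                    sum += source[(y * width) + sx] * kernel[k];
                }

                target[(y * width) + x] = sum;
            }
        }

        return target;
    }

    // Vertical pass: same kernel, applied along each column.
    public static float[] ConvolveColumns(float[] source, int width, int height, float[] kernel)
    {
        var target = new float[source.Length];
        int radius = kernel.Length / 2;

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                float sum = 0F;
                for (int k = 0; k < kernel.Length; k++)
                {
                    int sy = Math.Clamp(y + k - radius, 0, height - 1);
                    sum += source[(sy * width) + x] * kernel[k];
                }

                target[(y * width) + x] = sum;
            }
        }

        return target;
    }

    // Two 1D passes cost O(kernel length) per pixel each, versus O(kernel length squared)
    // for the equivalent 2D kernel.
    public static float[] Convolve2Pass(float[] source, int width, int height, float[] kernel)
        => ConvolveColumns(ConvolveRows(source, width, height, kernel), width, height, kernel);
}
```

Because the kernel is not flipped, the sketch matches a true convolution only for symmetric kernels, which covers Gaussian and box kernels.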

Benchmarks

Here's a preview of the current improvements for the gaussian blur benchmark:

[image: gaussian blur benchmark results]

And here are some more bokeh blur optimizations compared to master, after #1475 got merged:

[image: bokeh blur benchmark results]

@Sergio0694 Sergio0694 added this to the 1.1.0 milestone Dec 15, 2020

codecov bot commented Dec 15, 2020

Codecov Report

Merging #1477 (5601559) into master (a8cae3f) will decrease coverage by 0.07%.
The diff coverage is 78.37%.

@@            Coverage Diff             @@
##           master    #1477      +/-   ##
==========================================
- Coverage   83.55%   83.48%   -0.08%     
==========================================
  Files         741      740       -1     
  Lines       32462    32559      +97     
  Branches     3648     3652       +4     
==========================================
+ Hits        27125    27181      +56     
- Misses       4625     4665      +40     
- Partials      712      713       +1     
Flag Coverage Δ
unittests 83.48% <78.37%> (-0.08%) ⬇️

Impacted Files Coverage Δ
...s/Convolution/Convolution2PassProcessor{TPixel}.cs 60.86% <58.85%> (-39.14%) ⬇️
...mageSharp/ColorSpaces/Companding/SRgbCompanding.cs 100.00% <100.00%> (ø)
src/ImageSharp/Common/Helpers/Numerics.cs 97.80% <100.00%> (+0.15%) ⬆️
...rp/PixelFormats/Utils/Vector4Converters.Default.cs 100.00% <100.00%> (ø)
...ssing/Processors/Convolution/BokehBlurProcessor.cs 100.00% <100.00%> (ø)
...ocessors/Convolution/BokehBlurProcessor{TPixel}.cs 99.35% <100.00%> (+0.01%) ⬆️
...Processors/Convolution/BoxBlurProcessor{TPixel}.cs 100.00% <100.00%> (ø)
...cessors/Convolution/ConvolutionProcessorHelpers.cs 100.00% <100.00%> (ø)
...ssors/Convolution/GaussianBlurProcessor{TPixel}.cs 100.00% <100.00%> (ø)
...rs/Convolution/GaussianSharpenProcessor{TPixel}.cs 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f1a0fb6...5601559.

@Sergio0694 Sergio0694 marked this pull request as ready for review December 15, 2020 22:59
@@ -90,4 +94,4 @@ public static void Compress(ref Vector4 vector)
[MethodImpl(InliningOptions.ShortMethod)]
public static float Compress(float channel) => channel <= 0.0031308F ? 12.92F * channel : (1.055F * MathF.Pow(channel, 0.416666666666667F)) - 0.055F;
If we ever figure out how to do an accurate SIMD-enabled approximation of this, we would be laughing.

pow(channel, 0.416666666666667F) => exp(0.416666666666667F * log(channel))

So...

public static void Compress(ref Vector4 vector)
{
    var channels = Unsafe.As<Vector4, Vector128<float>>(ref vector);
    var exponent = Vector128.Create(0.416666666666667F);

    // Log and Exp aren't SIMD intrinsics; they stand in for vectorized approximations.
    // The linear segment (channel <= 0.0031308F) is not handled here.
    channels = Exp(Sse.Multiply(Log(channels), exponent));

    if (Fma.IsSupported)
    {
        channels = Fma.MultiplyAdd(Vector128.Create(1.055F), channels, Vector128.Create(-0.055F));
    }
    else
    {
        channels = Sse.Add(Sse.Multiply(Vector128.Create(1.055F), channels), Vector128.Create(-0.055F));
    }

    Unsafe.As<Vector4, Vector128<float>>(ref vector) = channels;
}

But Exp and Log aren't SIMD intrinsics; however, could you approximate them with the sequences from sse_mathfun or avx_mathfun?
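As a sanity check on the rewrite a vectorized version would rely on, pow(x, p) equals exp(p * log(x)) for x > 0. The scalar sketch below compares that form against the MathF.Pow expression from the diff above. The class and method names are hypothetical and only for illustration; nothing here is vectorized, and the accuracy of any SIMD approximation of Exp and Log would still need to be measured separately.

```csharp
using System;

// Hypothetical scalar check; not part of ImageSharp.
public static class SrgbCompressSketch
{
    private const float Exponent = 0.416666666666667F; // 1 / 2.4

    // Scalar reference, same expression as in the diff above.
    public static float CompressPow(float channel)
        => channel <= 0.0031308F ? 12.92F * channel : (1.055F * MathF.Pow(channel, Exponent)) - 0.055F;

    // Same curve with pow(x, p) expanded to exp(p * log(x)).
    public static float CompressExpLog(float channel)
        => channel <= 0.0031308F ? 12.92F * channel : (1.055F * MathF.Exp(Exponent * MathF.Log(channel))) - 0.055F;

    // Largest absolute difference between the two forms over evenly spaced samples in [0, 1].
    public static float MaxDifference(int steps)
    {
        float max = 0F;
        for (int i = 0; i <= steps; i++)
        {
            float channel = i / (float)steps;
            float difference = MathF.Abs(CompressPow(channel) - CompressExpLog(channel));
            max = MathF.Max(max, difference);
        }

        return max;
    }
}
```

Calling SrgbCompressSketch.MaxDifference(256) gives a quick measure of how closely the two scalar forms agree; a vectorized Exp/Log approximation could be compared against CompressPow in the same way.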

@JimBobSquarePants JimBobSquarePants left a comment

Very, very nice! 🚀

@JimBobSquarePants JimBobSquarePants merged commit f84d525 into master Dec 16, 2020
@JimBobSquarePants JimBobSquarePants deleted the sp/2pass-convolution-speedup branch December 16, 2020 00:12
JimBobSquarePants added a commit that referenced this pull request Mar 13, 2021
1D convolution optimization and general codegen tweaks