1D convolution optimization and general codegen tweaks #1477

Sergio0694 · 2020-12-15T21:12:30Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that my changes follow the existing coding patterns and practices demonstrated in the repository. These follow strict StyleCop rules 👮.
I have provided test coverage for my change (where applicable)

Description

This PR does a few things:

- Speed optimizations to the 2-pass convolution processor (powering Gaussian blur, sharpen, etc.)
- Speed optimizations to the bokeh blur
- General codegen optimizations that should apply to all common pixel conversions, etc.
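For context on the "2-pass" processor: a separable kernel such as a Gaussian can be applied as a horizontal 1D pass followed by a vertical 1D pass, costing O(k) work per pixel per pass instead of O(k²) for a full 2D kernel. The following is a simplified sketch on a single-channel `float` image, assuming clamp-to-edge sampling. The `SeparableConvolution` class and `Convolve2Pass` method are hypothetical names; this is not the actual `Convolution2PassProcessor{TPixel}` code.

```csharp
using System;

// Hypothetical illustration, not ImageSharp's Convolution2PassProcessor<TPixel>.
// Applies the same 1D float[] kernel along X, then along Y. The kernel is not
// flipped (i.e. this is correlation), which is identical to convolution for
// symmetric kernels such as a Gaussian.
public static class SeparableConvolution
{
    // source is a row-major width x height image; edge samples are clamped.
    public static float[] Convolve2Pass(float[] source, int width, int height, float[] kernel)
    {
        int radius = kernel.Length / 2;
        var firstPass = new float[source.Length];
        var result = new float[source.Length];

        // Horizontal pass: each output row only reads from the same input row.
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                float sum = 0;
                for (int i = 0; i < kernel.Length; i++)
                {
                    int sx = Math.Clamp(x + i - radius, 0, width - 1);
                    sum += kernel[i] * source[(y * width) + sx];
                }

                firstPass[(y * width) + x] = sum;
            }
        }

        // Vertical pass: the same 1D kernel walks down each column of the
        // intermediate buffer, so no transposed copy of the kernel is needed.
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                float sum = 0;
                for (int i = 0; i < kernel.Length; i++)
                {
                    int sy = Math.Clamp(y + i - radius, 0, height - 1);
                    sum += kernel[i] * firstPass[(sy * width) + x];
                }

                result[(y * width) + x] = sum;
            }
        }

        return result;
    }
}
```

Because clamping is applied independently per axis, the two passes produce the same result as a single 2D pass with the outer-product kernel `kernel[i] * kernel[j]` under the same edge handling.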

Benchmarks

Here's a preview of the current improvements for the Gaussian blur benchmark:

And here are some more bokeh blur optimizations compared to master, after #1475 got merged:

codecov · 2020-12-15T21:27:27Z

Codecov Report

Merging #1477 (5601559) into master (a8cae3f) will decrease coverage by 0.07%.
The diff coverage is 78.37%.

```diff
@@            Coverage Diff             @@
##           master    #1477      +/-   ##
==========================================
- Coverage   83.55%   83.48%   -0.08%     
==========================================
  Files         741      740       -1     
  Lines       32462    32559      +97     
  Branches     3648     3652       +4     
==========================================
+ Hits        27125    27181      +56     
- Misses       4625     4665      +40     
- Partials      712      713       +1
```

| Flag | Coverage Δ |
|---|---|
| unittests | `83.48% <78.37%> (-0.08%)` ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| ...s/Convolution/Convolution2PassProcessor{TPixel}.cs | `60.86% <58.85%> (-39.14%)` ⬇️ |
| ...mageSharp/ColorSpaces/Companding/SRgbCompanding.cs | `100.00% <100.00%> (ø)` |
| src/ImageSharp/Common/Helpers/Numerics.cs | `97.80% <100.00%> (+0.15%)` ⬆️ |
| ...rp/PixelFormats/Utils/Vector4Converters.Default.cs | `100.00% <100.00%> (ø)` |
| ...ssing/Processors/Convolution/BokehBlurProcessor.cs | `100.00% <100.00%> (ø)` |
| ...ocessors/Convolution/BokehBlurProcessor{TPixel}.cs | `99.35% <100.00%> (+0.01%)` ⬆️ |
| ...Processors/Convolution/BoxBlurProcessor{TPixel}.cs | `100.00% <100.00%> (ø)` |
| ...cessors/Convolution/ConvolutionProcessorHelpers.cs | `100.00% <100.00%> (ø)` |
| ...ssors/Convolution/GaussianBlurProcessor{TPixel}.cs | `100.00% <100.00%> (ø)` |
| ...rs/Convolution/GaussianSharpenProcessor{TPixel}.cs | `100.00% <100.00%> (ø)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f1a0fb6...5601559.

JimBobSquarePants · 2020-12-16T00:03:27Z

src/ImageSharp/ColorSpaces/Companding/SRgbCompanding.cs

@@ -90,4 +94,4 @@ public static void Compress(ref Vector4 vector)

```csharp
[MethodImpl(InliningOptions.ShortMethod)]
public static float Compress(float channel) => channel <= 0.0031308F ? 12.92F * channel : (1.055F * MathF.Pow(channel, 0.416666666666667F)) - 0.055F;
```


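If we ever figure out how to do an accurate SIMD-enabled approximation of this, we would be laughing.

pow(channel, 0.416666666666667F) == exp(0.416666666666667F * log(channel))

(0.416666666666667F is 1 / 2.4, the sRGB encoding exponent.)

So...

```csharp
// Requires System.Numerics, System.Runtime.CompilerServices,
// System.Runtime.Intrinsics and System.Runtime.Intrinsics.X86.
// Log and Exp are placeholder helpers: neither is a SIMD intrinsic.
public static void Compress(ref Vector4 vector)
{
    Vector128<float> channels = Unsafe.As<Vector4, Vector128<float>>(ref vector);

    // pow(c, 1 / 2.4) computed as exp((1 / 2.4) * log(c)) in every lane.
    Vector128<float> power = Exp(Sse.Multiply(Log(channels), Vector128.Create(0.416666666666667F)));

    // Gamma segment: 1.055 * power - 0.055.
    Vector128<float> curve = Fma.IsSupported
        ? Fma.MultiplyAdd(Vector128.Create(1.055F), power, Vector128.Create(-0.055F))
        : Sse.Add(Sse.Multiply(Vector128.Create(1.055F), power), Vector128.Create(-0.055F));

    // Linear segment (12.92 * c) for lanes where c <= 0.0031308, selected with a bitmask.
    Vector128<float> linear = Sse.Multiply(Vector128.Create(12.92F), channels);
    Vector128<float> isLinear = Sse.CompareLessThanOrEqual(channels, Vector128.Create(0.0031308F));
    channels = Sse.Or(Sse.And(isLinear, linear), Sse.AndNot(isLinear, curve));

    Unsafe.As<Vector4, Vector128<float>>(ref vector) = channels;
}
```

But Exp and Log aren't SIMD intrinsics; could we approximate them with the sequences from sse_mathfun or avx_mathfun?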
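Before investing in vectorized approximations, the exp/log rewrite of the power term can be sanity-checked in scalar code. This is a minimal sketch with a hypothetical `SRgbCompandingSketch` class (not ImageSharp code) that evaluates the sRGB compression curve once with `MathF.Pow` and once with `MathF.Exp`/`MathF.Log`, so the two can be compared over [0, 1].

```csharp
using System;

// Hypothetical scalar reference, not ImageSharp code. Both methods implement the
// piecewise sRGB compression curve; they differ only in how the power is computed.
public static class SRgbCompandingSketch
{
    // 1 / 2.4, the same literal used by SRgbCompanding.Compress(float).
    private const float InverseGamma = 0.416666666666667F;

    // Direct form, matching the existing scalar implementation.
    public static float CompressPow(float channel)
        => channel <= 0.0031308F ? 12.92F * channel : (1.055F * MathF.Pow(channel, InverseGamma)) - 0.055F;

    // Same curve using pow(c, y) == exp(y * log(c)); a SIMD port would swap
    // MathF.Exp/MathF.Log for vectorized approximations.
    public static float CompressExpLog(float channel)
        => channel <= 0.0031308F ? 12.92F * channel : (1.055F * MathF.Exp(InverseGamma * MathF.Log(channel))) - 0.055F;
}
```

A vectorized Exp/Log approximation (for example one derived from the sse_mathfun sequences) could then be validated against `CompressPow` in the same way, lane by lane.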

src/ImageSharp/Processing/Processors/Convolution/Convolution2PassProcessor{TPixel}.cs

JimBobSquarePants

Very, very nice! 🚀


Sergio0694 added 9 commits December 15, 2020 18:35

Port horizontal convolution processor, remove Y loop

8e67153

Port vertical convolution processor, remove X loop

a618b76

Remove unnecessary inner loop coordinate sampling

f52802d

Switch to shared sampling map for convolution passes

a9c1652

Remove convolution state, more optimizations

e60827f

Remove transposed 1D kernels, switch to float[] type

e574232

Remove leftover ConvolutionRowOperation<TPixel> type

5a38307

Minor code tweaks

e11adc6

More performance improvements to 2 pass convolution

cb5c868
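One of the commits above switches the convolution passes to a shared sampling map. As a rough illustration of that general idea (an assumption about the technique, not the code in a9c1652), the clamped source index for every output position and kernel tap can be precomputed once per axis and then reused by every row of a pass, so the inner loop does a table lookup instead of a clamp per sample. `SamplingMap`, `Build` and `ConvolveRow` are hypothetical names.

```csharp
using System;

// Hypothetical sketch of a precomputed sampling map for one axis.
public static class SamplingMap
{
    // Returns a flat [length * kernelSize] table: map[(i * kernelSize) + k] is the
    // clamped source index read by kernel tap k when producing output index i.
    public static int[] Build(int length, int kernelSize)
    {
        int radius = kernelSize / 2;
        var map = new int[length * kernelSize];
        for (int i = 0; i < length; i++)
        {
            for (int k = 0; k < kernelSize; k++)
            {
                map[(i * kernelSize) + k] = Math.Clamp(i + k - radius, 0, length - 1);
            }
        }

        return map;
    }

    // Convolves one row using a map built for the row length; the same map
    // can serve every row of the horizontal pass.
    public static void ConvolveRow(ReadOnlySpan<float> row, Span<float> destination, float[] kernel, int[] map)
    {
        for (int i = 0; i < destination.Length; i++)
        {
            // The precomputed taps for this output position.
            ReadOnlySpan<int> taps = map.AsSpan(i * kernel.Length, kernel.Length);
            float sum = 0;
            for (int k = 0; k < kernel.Length; k++)
            {
                sum += kernel[k] * row[taps[k]];
            }

            destination[i] = sum;
        }
    }
}
```

The trade-off is extra memory (`length * kernelSize` ints per axis) in exchange for removing bounds arithmetic from the hot loop; the table is built once and shared, rather than rebuilt per row.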

Sergio0694 added the area:performance label Dec 15, 2020

Sergio0694 added this to the 1.1.0 milestone Dec 15, 2020

Sergio0694 added 3 commits December 15, 2020 22:49

More codegen improvements to bokeh blur

979baf7

More codegen improvements to shared methods

1a3e1e7

Codegen improvements to Numerics.Clamp

5601559

Sergio0694 marked this pull request as ready for review December 15, 2020 22:59

Sergio0694 requested a review from JimBobSquarePants December 15, 2020 23:05

JimBobSquarePants reviewed Dec 16, 2020


src/ImageSharp/Processing/Processors/Convolution/Convolution2PassProcessor{TPixel}.cs

JimBobSquarePants approved these changes Dec 16, 2020


JimBobSquarePants merged commit f84d525 into master Dec 16, 2020

JimBobSquarePants deleted the sp/2pass-convolution-speedup branch December 16, 2020 00:12

Sergio0694 mentioned this pull request Feb 2, 2021

Improve JIT loop optimizations (.NET 6) dotnet/runtime#43549

Closed

25 tasks

JimBobSquarePants added a commit that referenced this pull request Mar 13, 2021

Merge pull request #1477 from SixLabors/sp/2pass-convolution-speedup

b4e7d80

1D convolution optimization and general codegen tweaks
