Speed improvements to resize convolution (no vpermps w/ FMA) #1518
Co-authored-by: Clinton Ingram <[email protected]>
…geSharp into js/Shuffle3Channel
Fix JpegDecoderTests.Decode_IsCancellable
Assembly for loading in the loop went from:

```asm
vmovss xmm2, [rax]
vbroadcastss xmm2, xmm2
vmovss xmm3, [rax+4]
vbroadcastss xmm3, xmm3
vinsertf128 ymm2, ymm2, xmm3, 1
```

To:

```asm
vmovsd xmm3, [rax]
vbroadcastsd ymm3, xmm3
vpermps ymm3, ymm1, ymm3
```
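To see why the two instruction sequences in that commit message can produce the same register contents, here is a scalar C model of the eight float lanes. It is only a sketch: the function names are made up for illustration, and the permute indices `{0,0,0,0,1,1,1,1}` are an assumption, since the commit message does not show what `ymm1` holds.

```c
#include <assert.h>
#include <string.h>

/* Old sequence: broadcast each weight into its own 128-bit half,
   then join the halves (vbroadcastss x2 + vinsertf128). */
static void load_pair_insert(const float *w, float out[8]) {
    assert(w != NULL);
    for (int i = 0; i < 4; i++) {
        out[i] = w[0];     /* low half:  w0 in all four lanes */
        out[4 + i] = w[1]; /* high half: w1 in all four lanes */
    }
}

/* New sequence: one 64-bit load of (w0, w1), broadcast the pair to
   every 64-bit slot (vmovsd + vbroadcastsd), then reorder the lanes
   with a per-lane index vector (vpermps). */
static void load_pair_permute(const float *w, const int idx[8], float out[8]) {
    float pair[8];
    assert(w != NULL);
    for (int i = 0; i < 8; i++) {
        pair[i] = w[i & 1];        /* [w0,w1,w0,w1,w0,w1,w0,w1] */
    }
    for (int i = 0; i < 8; i++) {
        out[i] = pair[idx[i] & 7]; /* vpermps uses the low 3 bits of each index */
    }
}

/* Returns 1 when both sequences yield identical lanes for the weights at w. */
static int lanes_match(const float *w) {
    static const int idx[8] = {0, 0, 0, 0, 1, 1, 1, 1}; /* assumed ymm1 contents */
    float a[8], b[8];
    load_pair_insert(w, a);
    load_pair_permute(w, idx, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Under that assumed index vector, both paths end with `[w0,w0,w0,w0,w1,w1,w1,w1]`; the second one simply does it with a single memory load.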
See Vector256.Create issue: dotnet/runtime#47236
Speed improvements to resize kernel (w/ SIMD)
@Sergio0694 looks like there are several tests failing in
Would be nice if we could dig out the git history and check how @JimBobSquarePants's original code used
@antonfirsov yeah I noticed that too, just hadn't had the time to work on that just yet 😅 I'm thinking maybe we should split up this PR and only merge the |
The resize tests do not cover all the cases. Kernel map creation is tested against all kinds of weird image dimensions + different resampler dimensions. It's not worth running expensive end-to-end resize tests for all those combinations; instead we have unit tests in
Splitting out would be great yeah! By " |
Ooh I see, makes sense, thanks! Will take a look at those extra tests then 😄
Yup, exactly - the bit that expands the factors buffer to 4x its length and then removes the shuffle.
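For readers unfamiliar with the idea being discussed, here is a rough scalar C sketch of what expanding the factors buffer 4x buys. It assumes RGBA pixels stored as 4 floats each; the function names are hypothetical and this is not ImageSharp's actual code. Once every weight is repeated four times, weight lane `i` lines up with channel lane `i`, so a vectorized loop could multiply contiguous loads directly, with no broadcast or shuffle per weight.

```c
#include <assert.h>
#include <stddef.h>

/* Repeat each kernel weight 4 times:
   [w0, w1] -> [w0,w0,w0,w0, w1,w1,w1,w1]. dst must hold 4 * n floats. */
static void expand_weights(const float *weights, size_t n, float *dst) {
    assert(weights != NULL && dst != NULL);
    for (size_t p = 0; p < n; p++) {
        for (size_t c = 0; c < 4; c++) {
            dst[4 * p + c] = weights[p];
        }
    }
}

/* Convolve n RGBA pixels against the expanded weights. Both arrays are
   walked with the same index, which is what lets a SIMD version use
   plain contiguous loads for pixels and weights alike. */
static void convolve_expanded(const float *pixels, const float *expanded,
                              size_t n, float out[4]) {
    out[0] = out[1] = out[2] = out[3] = 0.0f;
    for (size_t i = 0; i < 4 * n; i++) {
        out[i & 3] += pixels[i] * expanded[i];
    }
}

/* Reference version: one weight per pixel, applied to each channel. */
static void convolve_reference(const float *pixels, const float *weights,
                               size_t n, float out[4]) {
    out[0] = out[1] = out[2] = out[3] = 0.0f;
    for (size_t p = 0; p < n; p++) {
        for (size_t c = 0; c < 4; c++) {
            out[c] += pixels[4 * p + c] * weights[p];
        }
    }
}
```

The trade-off is memory: the expanded buffer is four times larger, in exchange for removing the per-iteration lane shuffling.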
db51f69 to 172c48e
Sorry @Sergio0694. The introduction of Git LFS (and subsequent history rewrite) has broken this. I spent a couple of hours trying to do a merge with unmatched history but Git simply won't bend to my will. It'd be simpler to create a new branch from master, copy your changes into it and open a new PR.
So... I decided to revisit this after far too many years and reimplemented each commit against the v4 codebase. Differences in kernel map generation have something to do with the …

Benchmarks don't really yield any meaningful difference between this and main.
Main
PR
You can see my changes in this branch: https://github.com/SixLabors/ImageSharp/tree/js/resize-map-optimizations
Prerequisites
Description
Follow up to #1513. This PR does a couple things:
float
Span<T>.CopyTo
instead

Resize convolution codegen diff
Before:
After: