Use SSE2 optimizations on ARM Neon for warping, pansharpening, gridding, dithering, RPC, PNG, GTI #11239
(on top of #11237)
Comparing https://github.com/OSGeo/gdal/actions/runs/11766759419/job/32774680289?pr=11237 (before) and https://github.com/rouault/gdal/actions/runs/11766932147/job/32775064275 (this PR) shows the following on Apple Silicon:
before:
this PR:
So execution time for the single-threaded use case goes from 1,402,279.4 us down to 1,332,336.0 us, about a 5% improvement, and the multithreaded code path improves by 13%. Note: these timings might not be very reliable, since they come from two separate VM executions, but they at least show some improvement.
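For readers unfamiliar with the approach named in the title, here is a minimal sketch of how a loop written against SSE2-style intrinsics can also be compiled for ARM Neon. It is not the code from this PR: `ScaleOffset`, the `HAVE_SSE2_API` macro and the handful of inline wrappers are hypothetical names chosen for illustration, and a real project would more likely rely on a complete translation header (for example the open-source sse2neon header) than on hand-written wrappers like these.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64)
// x86/x86_64: use the native SSE/SSE2 intrinsics.
#include <emmintrin.h>
#define HAVE_SSE2_API 1
#elif defined(__ARM_NEON) || defined(__aarch64__)
// ARM: map the few SSE names used below onto Neon equivalents.
// Hypothetical, deliberately tiny shim; only five operations are covered.
#include <arm_neon.h>
typedef float32x4_t __m128;
static inline __m128 _mm_loadu_ps(const float *p) { return vld1q_f32(p); }
static inline __m128 _mm_set1_ps(float v) { return vdupq_n_f32(v); }
static inline __m128 _mm_mul_ps(__m128 a, __m128 b) { return vmulq_f32(a, b); }
static inline __m128 _mm_add_ps(__m128 a, __m128 b) { return vaddq_f32(a, b); }
static inline void _mm_storeu_ps(float *p, __m128 v) { vst1q_f32(p, v); }
#define HAVE_SSE2_API 1
#endif

// Computes out[i] = in[i] * scale + offset for n floats.
// The vector loop is written once, with SSE names, and is shared by
// both architectures; a scalar loop handles the tail (and platforms
// where neither instruction set is available).
void ScaleOffset(const float *in, float *out, size_t n,
                 float scale, float offset)
{
    assert(in != nullptr && out != nullptr);
    size_t i = 0;
#ifdef HAVE_SSE2_API
    const __m128 vScale = _mm_set1_ps(scale);
    const __m128 vOffset = _mm_set1_ps(offset);
    for (; i + 4 <= n; i += 4)
    {
        __m128 v = _mm_loadu_ps(in + i);                 // load 4 floats
        v = _mm_add_ps(_mm_mul_ps(v, vScale), vOffset);  // v * scale + offset
        _mm_storeu_ps(out + i, v);                       // store 4 floats
    }
#endif
    for (; i < n; ++i)
        out[i] = in[i] * scale + offset;
}
```

The point of the design is that the vectorized loop body is written only once; whether the same code path is actually faster on a given ARM core still has to be measured, as with the timings above.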