Use SSE2 optimizations on ARM Neon for warping, pansharpening, gridding, dithering, RPC, PNG, GTI #11239

rouault · 2024-11-10T17:16:38Z

(on top of #11237)

Comparing https://github.com/OSGeo/gdal/actions/runs/11766759419/job/32774680289?pr=11237 (before) with https://github.com/rouault/gdal/actions/runs/11766932147/job/32775064275 (this PR) gives the following results on Apple Silicon:

before:

Name (time in us)                                                             Min                       Max                      Mean                 StdDev                    Median                     IQR            Outliers          OPS            Rounds  Iterations
test_gdalwarp[cubic-1]                                             1,393,559.0000 (>1000.0)  1,408,203.0000 (>1000.0)  1,402,279.4000 (>1000.0)   5,568.3731 (753.32)   1,402,622.0000 (>1000.0)    6,975.2500 (>1000.0)       2;0       0.7131 (0.00)          5           1
test_gdalwarp[cubic-ALL_CPUS]                                      1,393,650.0000 (>1000.0)  1,600,824.0000 (>1000.0)  1,455,685.8000 (>1000.0)  85,820.3716 (>1000.0)  1,413,421.0000 (>1000.0)   98,352.0000 (>1000.0)       1;0       0.6870 (0.00)          5           1

this PR:

Name (time in us)                                                             Min                       Max                      Mean                 StdDev                    Median                     IQR            Outliers          OPS            Rounds  Iterations
test_gdalwarp[cubic-1]                                             1,294,533.0000 (>1000.0)  1,332,336.0000 (>1000.0)  1,311,688.6000 (>1000.0)  14,026.4810 (>1000.0)  1,313,216.0000 (>1000.0)  17,055.7500 (>1000.0)       2;0       0.7624 (0.00)          5           1
test_gdalwarp[cubic-ALL_CPUS]                                      1,271,232.0000 (>1000.0)  1,287,036.0000 (>1000.0)  1,280,295.6000 (>1000.0)   6,870.0957 (>1000.0)  1,282,034.0000 (>1000.0)  12,030.0000 (>1000.0)       1;0       0.7811 (0.00)          5           1

So the mean execution time of the single-threaded case goes down from 1,402,279.4 us to 1,311,688.6 us, a 7% throughput improvement (OPS from 0.7131 to 0.7624), and the multithreaded (ALL_CPUS) code path improves by about 13% (OPS from 0.6870 to 0.7811). Note: these timings may not be fully reliable, since they come from two separate VM runs, but they at least show some improvement.
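The percentages can be re-derived from the Mean and OPS columns. pytest-benchmark reports OPS as 1 / Mean, so a throughput gain is before / after - 1, which comes out slightly larger than the matching reduction in run time. A small sketch of that arithmetic follows; the function names are ours, for illustration only.

```cpp
#include <cassert>

// Relative reduction of the mean run time, in percent.
inline double time_reduction_pct(double before_us, double after_us) {
    assert(before_us > 0.0 && after_us > 0.0);
    return 100.0 * (before_us - after_us) / before_us;
}

// Relative throughput gain, in percent. Throughput (OPS) is 1 / mean time,
// so the gain is before / after - 1.
inline double throughput_gain_pct(double before_us, double after_us) {
    assert(before_us > 0.0 && after_us > 0.0);
    return 100.0 * (before_us / after_us - 1.0);
}
```

With the mean times from the tables, time_reduction_pct(1402279.4, 1311688.6) is about 6.5 and throughput_gain_pct for the same pair is about 6.9. For ALL_CPUS (1455685.8 to 1280295.6) the values are about 12.0 and 13.7.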

autotest: fix benchmark which was no longer running since 122cc14

eb3dd2f

rouault added this to the 3.11.0 milestone Nov 10, 2024

rouault added 9 commits November 10, 2024 18:19

CI: osx: run benchmarks

35225d0

gcore/gdalsse_priv.h: enable SSE4.1 code path for AVX and Neon

316aaf4

alg/: enable ARM Neon optimizations for warping, pansharpening, gridding, dithering, RPC

d309a24

GTI: use SSE2 code path for ARM Neon optimizations

eef42b5

PNG: use SSE2 code path for ARM Neon optimizations

1a9ffb6

gcore/gdal_minmax_element.hpp: use SSE4.1 code path with AVX and Neon

504c8c9

gcore/gdal_priv_templates.hpp: use SSE4.1 code path with AVX and Neon

ef167ea

overview.cpp: use SSE4.1 optim with AVX

db73dcc

warp: use SSE4.1 code path with AVX

0642738
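Several of the commits above enable an SSE4.1 code path "with AVX and Neon". Below is a minimal sketch of what such a guard can look like. It assumes GCC/Clang-style predefined macros, and that on ARM the sse2neon header (which reimplements `_mm_*` intrinsics with Neon) is on the include path. The `SKETCH_HAS_SSE41` macro and `min_int32` function are hypothetical, not GDAL's actual code. A plausible reason to test `__AVX__` as well as `__SSE4_1__` is that MSVC signals AVX builds through `__AVX__` but, at least historically, does not define `__SSE4_1__`; this is our assumption, not something stated in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical guard: take the SSE4.1 path when SSE4.1 is enabled, or when
// AVX is enabled (every AVX-capable x86 CPU also supports SSE4.1).
#if defined(__SSE4_1__) || defined(__AVX__)
#include <smmintrin.h>
#define SKETCH_HAS_SSE41 1
#elif defined(__ARM_NEON) && defined(__has_include)
#if __has_include("sse2neon.h")
#include "sse2neon.h"  // maps SSE/SSE4.1 intrinsics onto Neon
#define SKETCH_HAS_SSE41 1
#endif
#endif

// Returns the minimum of a non-empty int32 array.
inline int32_t min_int32(const int32_t* values, size_t n) {
    assert(values != nullptr && n > 0);
    size_t i = 0;
    int32_t result = values[0];
#ifdef SKETCH_HAS_SSE41
    if (n >= 4) {
        __m128i vmin = _mm_loadu_si128(reinterpret_cast<const __m128i*>(values));
        for (i = 4; i + 4 <= n; i += 4) {
            const __m128i v =
                _mm_loadu_si128(reinterpret_cast<const __m128i*>(values + i));
            // Signed 32-bit min is SSE4.1; SSE2 only has 16-bit signed min.
            vmin = _mm_min_epi32(vmin, v);
        }
        // Horizontal reduction: swap 64-bit halves, then adjacent lanes.
        vmin = _mm_min_epi32(vmin, _mm_shuffle_epi32(vmin, _MM_SHUFFLE(1, 0, 3, 2)));
        vmin = _mm_min_epi32(vmin, _mm_shuffle_epi32(vmin, _MM_SHUFFLE(2, 3, 0, 1)));
        result = _mm_cvtsi128_si32(vmin);
    }
#endif
    // Scalar tail, or the whole array when no SIMD path is compiled in.
    for (; i < n; ++i)
        if (values[i] < result) result = values[i];
    return result;
}
```

Without `-msse4.1`/`-mavx` on x86, or without sse2neon on ARM, the function silently falls back to the scalar loop, so it returns the same result on every target.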

rouault added the "funded through GSP" label (work funded through the GDAL Sponsorship Program) Nov 10, 2024

rouault force-pushed the alg_neon branch from 158870e to 0642738 November 10, 2024 17:41

rouault added a commit to rouault/sse2neon that referenced this pull request Nov 10, 2024

README.md: mention GDAL

abe504b

We are happy users of sse2neon with OSGeo/gdal#11202 and OSGeo/gdal#11239
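As the PR title says, the existing SSE2 intrinsic code paths are reused on ARM by compiling them through sse2neon, a header-only library that reimplements SSE intrinsics with Neon. Here is a minimal sketch of that pattern on a hypothetical `scale_offset` kernel over doubles, the kind of arithmetic found in warping or RPC code. It assumes GCC/Clang/MSVC-style predefined macros and `sse2neon.h` on the include path; the `SKETCH_HAS_SSE2` macro and the function are illustrative, not GDAL code. The ARM branch is restricted to AArch64 because double-precision `_pd` intrinsics map natively to Neon only there.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#elif defined(__aarch64__) && defined(__has_include)
#if __has_include("sse2neon.h")
#include "sse2neon.h"  // the same _mm_* calls below now compile to Neon
#define SKETCH_HAS_SSE2 1
#endif
#endif

// Computes out[i] = a * in[i] + b for n doubles.
inline void scale_offset(const double* in, double* out, size_t n, double a, double b) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    size_t i = 0;
#ifdef SKETCH_HAS_SSE2
    const __m128d va = _mm_set1_pd(a);
    const __m128d vb = _mm_set1_pd(b);
    for (; i + 2 <= n; i += 2) {  // two doubles per 128-bit register
        const __m128d v = _mm_loadu_pd(in + i);
        _mm_storeu_pd(out + i, _mm_add_pd(_mm_mul_pd(v, va), vb));
    }
#endif
    for (; i < n; ++i)  // scalar tail or full fallback
        out[i] = a * in[i] + b;
}
```

The appeal of this design is that one SSE2 source serves x86 and ARM, at the cost of relying on sse2neon's translation rather than hand-tuned Neon.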

rouault mentioned this pull request Nov 10, 2024

README.md: mention GDAL DLTcollab/sse2neon#654 (merged)

rouault merged commit 9bc4894 into OSGeo:master Nov 25, 2024
36 checks passed