Use SSE2 optimizations on ARM Neon for warping, pansharpening, gridding, dithering, RPC, PNG, GTI #11239

Merged on Nov 25, 2024 (10 commits)

Conversation

rouault
Member

@rouault rouault commented Nov 10, 2024

(on top of #11237)

Comparing https://github.com/OSGeo/gdal/actions/runs/11766759419/job/32774680289?pr=11237 (before) and https://github.com/rouault/gdal/actions/runs/11766932147/job/32775064275 (this PR) shows the following on Apple Silicon:

before:

Name (time in us)                                                             Min                       Max                      Mean                 StdDev                    Median                     IQR            Outliers          OPS            Rounds  Iterations
test_gdalwarp[cubic-1]                                             1,393,559.0000 (>1000.0)  1,408,203.0000 (>1000.0)  1,402,279.4000 (>1000.0)   5,568.3731 (753.32)   1,402,622.0000 (>1000.0)    6,975.2500 (>1000.0)       2;0       0.7131 (0.00)          5           1
test_gdalwarp[cubic-ALL_CPUS]                                      1,393,650.0000 (>1000.0)  1,600,824.0000 (>1000.0)  1,455,685.8000 (>1000.0)  85,820.3716 (>1000.0)  1,413,421.0000 (>1000.0)   98,352.0000 (>1000.0)       1;0       0.6870 (0.00)          5           1

this PR:

Name (time in us)                                                             Min                       Max                      Mean                 StdDev                    Median                     IQR            Outliers          OPS            Rounds  Iterations
test_gdalwarp[cubic-1]                                             1,294,533.0000 (>1000.0)  1,332,336.0000 (>1000.0)  1,311,688.6000 (>1000.0)  14,026.4810 (>1000.0)  1,313,216.0000 (>1000.0)  17,055.7500 (>1000.0)       2;0       0.7624 (0.00)          5           1
test_gdalwarp[cubic-ALL_CPUS]                                      1,271,232.0000 (>1000.0)  1,287,036.0000 (>1000.0)  1,280,295.6000 (>1000.0)   6,870.0957 (>1000.0)  1,282,034.0000 (>1000.0)  12,030.0000 (>1000.0)       1;0       0.7811 (0.00)          5           1

So the mean execution time goes from 1,402,279.4 us down to 1,311,688.6 us for the single-threaded use case, a 7% improvement in throughput, and from 1,455,685.8 us down to 1,280,295.6 us in the multithreaded code path, a bit over 13%. Note: those timings might not be very reliable since they come from two separate VM executions, but they at least show some improvement.
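For anyone wanting to re-derive the percentages, here is a minimal sketch that uses only the Mean column of the two tables above. The function names are illustrative and not part of GDAL or pytest-benchmark. It reports both the throughput gain (the ratio of mean times, equivalent to the OPS ratio) and the reduction in mean time, since the two are easy to confuse.

```python
# Mean times in microseconds, copied from the "Mean" column of the
# benchmark tables above: (before, this PR).
MEAN_US = {
    "cubic-1": (1_402_279.4, 1_311_688.6),
    "cubic-ALL_CPUS": (1_455_685.8, 1_280_295.6),
}


def speedup_percent(before_us: float, after_us: float) -> float:
    """Throughput gain in percent (same as the gain in OPS)."""
    return (before_us / after_us - 1.0) * 100.0


def time_reduction_percent(before_us: float, after_us: float) -> float:
    """Reduction of the mean execution time in percent."""
    return (1.0 - after_us / before_us) * 100.0


for name, (before_us, after_us) in MEAN_US.items():
    print(
        f"{name}: throughput +{speedup_percent(before_us, after_us):.1f}%, "
        f"mean time -{time_reduction_percent(before_us, after_us):.1f}%"
    )
```

The throughput gain comes out at roughly 7% for the single-threaded case and between 13% and 14% for the multithreaded one. The reduction in mean time is slightly smaller in both cases.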

@rouault rouault added this to the 3.11.0 milestone Nov 10, 2024
@rouault rouault added the "funded through GSP" label (Work funded through the GDAL Sponsorship Program) Nov 10, 2024
rouault added a commit to rouault/sse2neon that referenced this pull request Nov 10, 2024
We are happy users of sse2neon with OSGeo/gdal#11202 and OSGeo/gdal#11239
@rouault rouault merged commit 9bc4894 into OSGeo:master Nov 25, 2024
36 checks passed