-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make SIMD configure tests work for MacOS multiarch builds #78
Conversation
Multiarch builds compile source files twice for different architectures and then joins the object files together using 'lipo'. This can cause architecture-specific configure tests, like the ones for SIMD features, to go wrong because while the source compiles for one architecture, it may fail on the other causing the test as a whole to incorrectly report that the feature is not supported. To fix this, #ifdef guards are added to both the configure checks and the source code being built so that they won't fail on the wrong architecture. The configure tests are also changed so that instead of testing to see if a particular flag works, they instead check to see what flags are needed to build code using a particular SIMD feature. Due to the #ifdef guards, the tests will work and give an empty result on the wrong architecture (the test result will be reported as "none"). They will also do this if no flag is needed, e.g. because other options enable the feature already. If the flag is needed then they will report it and add if to CFLAGS. Finally if the test fails completely even though trying the correct architecture they will report "unsupported" and disable the SIMD code. In the multiarch case, the build on the "wrong" architecture should now do nothing, which the "right" one will determine the flags needed, which will be reported as the end result. The test for NEON didn't need to try any specific flags. As the NEON code is selected by #ifdef __ARM_NEON already, the NEON test has been removed and rANS_static32x16pr_neon.c is now compiled unconditionally. As some of the source files may now be compiled but essentially empty due to platform #ifdef guards, some dummy symbols have been added to avoid complaints about empty translation units. The AX_CHECK_COMPILE_FLAG() autoconf macro is replaced with HTS_CHECK_COMPILE_FLAGS_NEEDED() which implements the flag-checking tests.
On the Intel side, what about |
We did discuss this briefly, but concluded the only i386 machines with SIMD support are basically modern CPUs running in 32-bit mode. The code should work fine in 32-bit mode, but it won't gain access to the accelerated implementations. I don't really know how big an issue this is, and whether we actually care given it's a rather unusual scenario. Do you have a better feeling for the reasoning behind using new CPUs in old modes? (Attempts to save memory are the only reason I could think of, but in general does it have performance issues elsewhere?) |
I did think about adding We could add |
I had an idea the popcnt one would work on real i386, but perhaps not and perhaps the other requirements for that implementation would disable it on i386 anyway. Debian still build for i386 so I was expecting they would notice the difference. However in the light of that guard on the dispatcher just being |
Indeed, while popcnt itself may exist on older CPUs (I didn't check), we only use it in conjunction with a minimum of SSE4 I think. I guess there is potential to use it elsewhere, but I'm not really minded to spend time optimising i386 :) |
Multiarch builds source files for each target, and then merges the results together into a single universal object file. This causes issues with configure tests that attempt to detect architecture-specific features because if they fail on one architecture then they effectively fail for all of them. In htscodecs this caused x86_64 SIMD code to be disabled when trying to build an x86_64 / arm64 universal binary because the related configure checks failed on the arm64 side - and likewise the arm NEON code was disabled due to a failing x86_64 test.
To fix:
#ifdef __x86_64__
and#ifdef __ARM_NEON
to the source code so that architecture-specific code is filtered out at compile time.#ifdef __x86_64__
guards to the x86 SIMD configure checks so that they don't fail for multiarch builds.none
(no option needed),unsupported
(compilation failed both with and without the option), or an option that needs to be used to enable the feature.rANS_static32x16pr_neon.c
unconditionally.#ifdef
checks.The combination of the
#ifdef __x86_64__
guards and revised configure checks means that when building forx86_64
, including as part of a multiarch build, the flags will be enabled and disabled as needed; while on other platforms the test code will be skipped and the configure check will determine that no special compiler flags are needed.Fixes #76