Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make SIMD configure tests work for MacOS multiarch builds #78

Merged
merged 2 commits into from
Mar 20, 2023

Conversation

daviesrob
Copy link
Member

Multiarch builds source files for each target, and then merges the results together into a single universal object file. This causes issues with configure tests that attempt to detect architecture-specific features because if they fail on one architecture then they effectively fail for all of them. In htscodecs this caused x86_64 SIMD code to be disabled when trying to build an x86_64 / arm64 universal binary because the related configure checks failed on the arm64 side - and likewise the arm NEON code was disabled due to a failing x86_64 test.

To fix:

  • Add #ifdef __x86_64__ and #ifdef __ARM_NEON to the source code so that architecture-specific code is filtered out at compile time.
  • Add #ifdef __x86_64__ guards to the x86 SIMD configure checks so that they don't fail for multiarch builds.
  • Refactor the x86 SIMD tests so that instead of checking if a specific compiler option works, they find out if it is actually needed to build the code. They do this by attempting to compile first without and then (if that fails) with the option. The result will either be none (no option needed), unsupported (compilation failed both with and without the option), or an option that needs to be used to enable the feature.
  • Remove the NEON configure test, and compile rANS_static32x16pr_neon.c unconditionally.
  • Add dummy symbols to some source files to avoid "empty translation unit" errors if the code they contain has all been skipped due to #ifdef checks.

The combination of the #ifdef __x86_64__ guards and revised configure checks means that when building for x86_64, including as part of a multiarch build, the flags will be enabled and disabled as needed; while on other platforms the test code will be skipped and the configure check will determine that no special compiler flags are needed.

Fixes #76

Multiarch builds compile source files twice for different
architectures and then joins the object files together using
'lipo'.  This can cause architecture-specific configure tests,
like the ones for SIMD features, to go wrong because while the
source compiles for one architecture, it may fail on the other
causing the test as a whole to incorrectly report that the feature
is not supported.

To fix this, #ifdef guards are added to both the configure checks
and the source code being built so that they won't fail on the
wrong architecture.  The configure tests are also changed so that
instead of testing to see if a particular flag works, they instead
check to see what flags are needed to build code using a
particular SIMD feature.  Due to the #ifdef guards, the tests
will work and give an empty result on the wrong architecture (the
test result will be reported as "none").  They will also do this
if no flag is needed, e.g. because other options enable the
feature already.  If the flag is needed then they will report it
and add if to CFLAGS.  Finally if the test fails completely
even though trying the correct architecture they will report
"unsupported" and disable the SIMD code.

In the multiarch case, the build on the "wrong" architecture should
now do nothing, which the "right" one will determine the flags
needed, which will be reported as the end result.

The test for NEON didn't need to try any specific flags.  As the
NEON code is selected by #ifdef __ARM_NEON already, the NEON
test has been removed and rANS_static32x16pr_neon.c is now
compiled unconditionally.

As some of the source files may now be compiled but essentially
empty due to platform #ifdef guards, some dummy symbols have
been added to avoid complaints about empty translation units.

The AX_CHECK_COMPILE_FLAG() autoconf macro is replaced with
HTS_CHECK_COMPILE_FLAGS_NEEDED() which implements the flag-checking
tests.
@jmarshall
Copy link
Member

On the Intel side, what about __i386__?

@jkbonfield
Copy link
Collaborator

jkbonfield commented Mar 16, 2023

We did discuss this briefly, but concluded the only i386 machines with SIMD support are basically modern CPUs running in 32-bit mode. The code should work fine in 32-bit mode, but it won't gain access to the accelerated implementations. I don't really know how big an issue this is, and whether we actually care given it's a rather unusual scenario.

Do you have a better feeling for the reasoning behind using new CPUs in old modes? (Attempts to save memory are the only reason I could think of, but in general does it have performance issues elsewhere?)

@daviesrob
Copy link
Member Author

I did think about adding __i386__, but then discovered that the accelerated code is currently only called on x86_64. As James mentions, old CPUs won't have the accelerated instructions anyway so it was simpler to just use __x86_64__ everywhere.

We could add __i386__ if there's demand for it, but I think it would be best done as a separate project.

@jmarshall
Copy link
Member

I had an idea the popcnt one would work on real i386, but perhaps not and perhaps the other requirements for that implementation would disable it on i386 anyway.

Debian still build for i386 so I was expecting they would notice the difference. However in the light of that guard on the dispatcher just being __x86_64__, never mind.

@jkbonfield
Copy link
Collaborator

Indeed, while popcnt itself may exist on older CPUs (I didn't check), we only use it in conjunction with a minimum of SSE4 I think.

I guess there is potential to use it elsewhere, but I'm not really minded to spend time optimising i386 :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SIMD code selection and configury is incorrect for multi-arch compilation
3 participants