rANS_static32x16pr_neon.c doesn't build on AArch32 #81
Comments
Is this building from a release, or from the develop branch at github? I assume this is related to the changes in samtools/htslib#1587, but I cannot test it as I don't have any way of testing on an AArch32 system (or does See also #78. It sounds like Rob's changes are to use compile time |
This was found on 1.17, your latest release. ARMv7 systems do generally support NEON, but NEON on AArch32 is only a subset of NEON on AArch64. The ARM intrinsics guides specify which NEON intrinsics are available on both and which are AArch64 only. The problem is that many of the intrinsics your code uses are available for AArch64 only. The macro |
Thanks. It sounds like we need to check |
Yes, correct. Alternatively, rework the code so it only uses intrinsics that also work on AArch32, or add polyfills for the missing intrinsics when building for AArch32.
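To illustrate the polyfill option, here is a minimal sketch. It assumes GCC or Clang (which predefine `__aarch64__` and `__ARM_NEON`) and uses `vaddvq_u32` purely as an example of an AArch64-only intrinsic; it is not necessarily one that `rANS_static32x16pr_neon.c` uses. The names `compat_vaddvq_u32`, `HAVE_ANY_NEON` and `hsum4_u32` are hypothetical.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#  include <arm_neon.h>
#  define HAVE_ANY_NEON 1            /* hypothetical macro name */
#  if defined(__aarch64__)
     /* AArch64: the across-vector add exists natively. */
#    define compat_vaddvq_u32 vaddvq_u32
#  else
     /* AArch32: emulate it with intrinsics available on both targets. */
static inline uint32_t compat_vaddvq_u32(uint32x4_t v) {
    /* Fold the four lanes into two, then into one. */
    uint32x2_t s = vadd_u32(vget_low_u32(v), vget_high_u32(v));
    s = vpadd_u32(s, s);
    return vget_lane_u32(s, 0);
}
#  endif
#else
#  define HAVE_ANY_NEON 0
#endif

/* Sum exactly four 32-bit values (wraps modulo 2^32). */
uint32_t hsum4_u32(const uint32_t v[4]) {
#if HAVE_ANY_NEON
    return compat_vaddvq_u32(vld1q_u32(v));
#else
    /* No NEON at all: plain scalar fallback. */
    return v[0] + v[1] + v[2] + v[3];
#endif
}
```

Every AArch64-only intrinsic in the file would need a wrapper like this, which is why the polyfill route is more work than simply restricting the NEON code to AArch64.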
PR #82 should enable it to compile, but the difference is trivial. I don't have SIMD for AArch32 as that's complex and probably also fruitless, but it can obviously run the scalar implementation just fine. Unfortunately I'm not able to test it as AWS Arm is AArch64 only.
NEON alone isn't a sufficient guard as AArch32 also has some limited NEON capabilities. While we could no doubt have a 32-bit alternative, for now the simple fix is to let AArch32 use the scalar implementation. Writing a 32-bit NEON implementation is a complex task, and without access to the hardware it's pretty much impossible. I also wouldn't have high hopes for any significant speed gains over scalar with only half the lanes available. Fixes samtools#81
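As a rough sketch of the kind of guard described in this commit message (not the actual patch from PR #82): the NEON path is compiled only when both `__ARM_NEON` and `__aarch64__` are defined, and everything else falls through to scalar code. The macro name `USE_NEON64` and the function `max_u32` are hypothetical, and `vmaxvq_u32` is used only as an example of an AArch64-only intrinsic.

```c
#include <stddef.h>
#include <stdint.h>

/* Checking __ARM_NEON alone would also match AArch32 builds,
 * where AArch64-only intrinsics such as vmaxvq_u32 do not exist. */
#if defined(__ARM_NEON) && defined(__aarch64__)
#  include <arm_neon.h>
#  define USE_NEON64 1               /* hypothetical macro name */
#else
#  define USE_NEON64 0
#endif

/* Return the largest of n values, or 0 when n == 0. */
uint32_t max_u32(const uint32_t *v, size_t n) {
    uint32_t best = 0;
    size_t i = 0;
#if USE_NEON64
    /* AArch64 NEON path: process four lanes per iteration. */
    uint32x4_t acc = vdupq_n_u32(0);
    for (; i + 4 <= n; i += 4)
        acc = vmaxq_u32(acc, vld1q_u32(v + i));
    best = vmaxvq_u32(acc);          /* across-vector max, AArch64 only */
#endif
    /* Scalar path: handles the tail, and everything on other targets. */
    for (; i < n; i++)
        if (v[i] > best)
            best = v[i];
    return best;
}
```

Because the decision is made by the preprocessor for each target, this style should also work with multi-arch compilers that build the same source once per architecture.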
This matches the changes I plan to deploy as local patches to the FreeBSD port of this package. I've also patched the NEON detection in |
The NEON tests in configure were removed in #78 as they conflict with macOS multi-arch builds. The Mac basically has a compiler that performs two compilations, one for ARM and one for AMD64 CPUs, so any test of compiler capability (such as whether it supports a -mavx2 option) will break. Instead, it all needs to be done with on-the-fly CPU detection via ifdefs.
@jkbonfield Good to hear. This is LGTM then from my side. |
Your code base has NEON-optimised code in rANS_static32x16pr_neon.c. Unfortunately, this code only builds on AArch64 (arm64). It does not build on AArch32 as many of the intrinsics used therein are not supported in AArch32 mode:
Please adapt your build system to build this code on AArch64 only. Alternatively, fix the code so these files compile on AArch32, too.