Implement Neon SIMD #6
fccf53d to 8733f3c
I think those |
They totally could be, but |
Actually only the interleaved loads are affected ( |
Maybe you want to take a look at my adler32 SIMD implementation for NEON from way back then: It also contains some tricks that can be used with other SIMD instruction sets (x86), mainly vector_chop, a modulo approximation you can do on most SIMD instruction sets. This way you can leave the data in the vector register and don't have to do a full reduction, copy to the normal register set, do a modulo, and copy back to the vector register. Also, since you then often have 4 or more separate sums, your inner loop count is not limited to NMAX (5552). |
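For readers unfamiliar with the modulo approximation mentioned above, here is a minimal scalar sketch of the idea. It is not the linked implementation: the names `chop` and `reduce` are hypothetical, and the code only illustrates the arithmetic, which relies on 2^16 ≡ 15 (mod 65521). In a SIMD version the same shift, mask, and multiply-add would presumably be applied to each 32-bit lane, so the sums never leave the vector register.

```rust
/// Adler-32 modulus: the largest prime below 2^16.
const BASE: u32 = 65_521;

/// Partially reduce `x` modulo BASE without a division.
/// Because 2^16 = 65536 ≡ 15 (mod 65521), we have
/// x = hi * 2^16 + lo ≡ hi * 15 + lo (mod 65521).
/// The result is congruent to `x` but not necessarily below BASE.
fn chop(x: u32) -> u32 {
    let hi = x >> 16;
    let lo = x & 0xFFFF;
    // hi * 15 + lo is at most 16 * 65535, so this cannot overflow a u32.
    lo + hi * 15
}

/// Fully reduce `x` modulo BASE using two chops and one conditional subtract.
/// After two chops the value is at most 65535 + 15 * 15 = 65760, which is
/// below 2 * BASE, so a single subtraction is enough.
fn reduce(x: u32) -> u32 {
    let y = chop(chop(x));
    if y >= BASE { y - BASE } else { y }
}
```

A single `chop` is enough to keep running sums from overflowing between blocks; the full `reduce` is only needed once, when producing the final checksum.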
Is there any progress or status on this? I need adler32 to be fast on arm64. Is this PR not deemed ok? Any plans to finish the support for NEON? I would love to contribute, but I have no SIMD experience. |
I'll try running the build again when I have time. Last I checked, Rust support for some of the operations we need was lacking, although this was a while ago. |
bump |
I'll quickly remove all the nightly conditions that shouldn't apply anymore to aarch64. |
This ports the WASM algorithm over to Aarch64 / ARM Neon.
Seems like |
Actually the error seems incorrect. The function is stable, but requires |
It's been a bit since I looked at this code, but if I remember correctly this would require adding an additional |
Mmh, turns out that the |
Would this dotprod be supported on Android ARM64 devices? Or is it a very obscure instruction? Would it impact performance much if a device doesn't have it? (i.e., can we still use NEON for the rest?) |
At a glance, @CryZe's comment indicates we could get it working when compiling with nightly. The alternative is to port the C implementations mentioned above and hope that the intrinsics are stable. |
I am willing to offer a bounty of 50 Euro to get this working on ARM64 Android. One for the implementer (@CryZe ?) and one for the owner of this crate. |
I have now resorted to using libdeflater-sys, which wraps the libdeflate C library and provides SIMD-accelerated adler32 besides deflate and crc32. It has a dot product check and implementations both with and without dot product, using NEON intrinsics. It sped up my app on Android by 33% compared to using nightly simd-adler32. For now I'll leave it at that; maybe I'll find the time and motivation in the future to extract the adler32 part from the library into a Rust one, to make it somewhat safer. But it's SIMD, so I doubt many improvements can be made in that regard. |
It's supported on more modern chips. It is by no means universally present, though. |
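Since the dot product extension is not universally present, one option is to detect it at runtime and fall back to plain NEON or scalar code. The sketch below is only an illustration, not code from this PR: the `Backend` enum and `select_backend` function are hypothetical names, and the only standard library item used is the `std::arch::is_aarch64_feature_detected!` macro with the `"dotprod"` and `"neon"` feature names.

```rust
/// Hypothetical set of implementations a crate could dispatch to.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Scalar,
    Neon,
    NeonDotProd,
}

/// On aarch64, ask the standard library which CPU features are present
/// at runtime and pick the most capable backend.
#[cfg(target_arch = "aarch64")]
fn select_backend() -> Backend {
    if std::arch::is_aarch64_feature_detected!("dotprod") {
        Backend::NeonDotProd
    } else if std::arch::is_aarch64_feature_detected!("neon") {
        Backend::Neon
    } else {
        Backend::Scalar
    }
}

/// On every other architecture, fall back to the scalar implementation.
#[cfg(not(target_arch = "aarch64"))]
fn select_backend() -> Backend {
    Backend::Scalar
}
```

With a check like this, a device without dot product support would still use the plain NEON path rather than dropping all the way to scalar code.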
This ports the WASM algorithm over to Aarch64 / ARM Neon. Rust itself isn't quite ready yet, but this should start compiling soon.
Blocked on: rust-lang/stdarch#1230