Implement Neon SIMD #6

CryZe · 2021-10-13T16:09:47Z

This ports the WASM algorithm over to Aarch64 / ARM Neon. Rust itself isn't quite ready yet, but this should start compiling soon.

Blocked on: rust-lang/stdarch#1230

pthariensflame · 2021-10-13T23:43:45Z

I think those transmutes should probably be vld1qs?

CryZe · 2021-10-14T07:52:50Z

They totally could be, but ptr::read_unaligned also should work just fine across the board (and is platform independent, so that makes the code easier to port to a different platform). Also this issue where vld* instructions are supposedly broken concerns me: rust-lang/stdarch#1227

hkratz · 2021-10-14T10:55:03Z

Actually, only the interleaved loads (vld[234]*) are affected. Using vld1* should be fine, as those use ptr::read_unaligned under the hood.
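
To make the two options discussed above concrete, here is a minimal sketch (not the code from this PR) that loads 16 bytes from an arbitrary offset either with ptr::read_unaligned or, on aarch64 only, with vld1q_u8. The function names load_portable and load_neon_or_portable are hypothetical, and u128 stands in for the SIMD vector type so the example also compiles on non-ARM targets.

```rust
use core::ptr;

/// Portable 16-byte load: works on every target, no alignment requirement.
fn load_portable(data: &[u8], offset: usize) -> u128 {
    assert!(offset <= data.len() && data.len() - offset >= 16);
    // SAFETY: the assert above guarantees 16 readable bytes at `offset`,
    // and read_unaligned does not require the pointer to be aligned.
    unsafe { ptr::read_unaligned(data.as_ptr().add(offset) as *const u128) }
}

/// On aarch64, perform the same load through the vld1q_u8 intrinsic.
#[cfg(target_arch = "aarch64")]
fn load_neon_or_portable(data: &[u8], offset: usize) -> u128 {
    use core::arch::aarch64::{vld1q_u8, vst1q_u8};
    assert!(offset <= data.len() && data.len() - offset >= 16);
    let mut out = [0u8; 16];
    // SAFETY: bounds are checked above; NEON is part of the baseline
    // feature set of the standard aarch64 targets.
    unsafe {
        let v = vld1q_u8(data.as_ptr().add(offset));
        vst1q_u8(out.as_mut_ptr(), v);
    }
    u128::from_ne_bytes(out)
}

/// Everywhere else, fall back to the portable load.
#[cfg(not(target_arch = "aarch64"))]
fn load_neon_or_portable(data: &[u8], offset: usize) -> u128 {
    load_portable(data, offset)
}
```

Both functions should return the same value for the same input, which is the point made above: the non-interleaved load and the plain unaligned read are interchangeable here.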

kaffeemonster · 2022-09-03T15:45:15Z

Maybe you want to take a look at my adler32 SIMD implementation for NEON from way back then:
https://github.com/kaffeemonster/zlib/blob/adler32_vec/arm/adler32.c

It also contains some tricks that can be used in other SIMD implementations (x86), mainly vector_chop, a modulo approximation you can do on most SIMD instruction sets. This way you can leave the data in the vector register and don't have to do a full reduction, copy to the general-purpose registers, do a modulo, and move back to a vector register. Also, since you then often have 4 or more separate sums, your inner loop count is not limited to NMAX (5552).
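
A rough scalar model of such a modulo approximation, for readers unfamiliar with the trick. It relies only on the identity 65536 ≡ 15 (mod 65521), where 65521 is the Adler-32 modulus; the linked C file may differ in detail, and the names chop, chop_lanes, and reduce are hypothetical. Each array element stands for one 32-bit lane of a vector register.

```rust
/// Adler-32 modulus (largest prime below 2^16).
const BASE: u32 = 65_521;

/// Partial reduction: split x into high and low 16-bit halves and fold the
/// high half back in. Because 2^16 = 65536 is congruent to 15 modulo BASE,
/// the result is congruent to x modulo BASE but much smaller (below 2^20).
fn chop(x: u32) -> u32 {
    let hi = x >> 16;
    let lo = x & 0xFFFF;
    lo + hi * 15
}

/// The same operation applied lane-wise, as a SIMD version would do with
/// a shift, a mask, a multiply, and an add on the whole register.
fn chop_lanes(v: [u32; 4]) -> [u32; 4] {
    v.map(chop)
}

/// Full reduction, only needed once at the very end.
fn reduce(x: u32) -> u32 {
    chop(chop(x)) % BASE
}
```

The partial reduction keeps every lane small enough to continue accumulating, so the exact modulo can be deferred until the final result is extracted.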

peterdk · 2023-01-06T13:17:43Z

Is there any progress or status on this? I need adler32 to be fast on arm64. Is this PR not deemed ok? Any plans to finish the support for NEON? I would love to contribute, but no SIMD experience.

mcountryman · 2023-01-20T13:34:16Z

> Is there any progress or status on this? I need adler32 to be fast on arm64. Is this PR not deemed ok? Any plans to finish the support for NEON? I would love to contribute, but no SIMD experience.

I'll try running the build again when I have time. Last I checked, Rust support for some of the operations we need was lacking, although this was a while ago.

mcountryman · 2023-01-21T13:30:53Z

bump

CryZe · 2023-01-21T13:33:12Z

I'll quickly remove all the nightly conditions that shouldn't apply anymore to aarch64.

CryZe · 2023-01-21T13:43:54Z

Seems like vdotq_u32 is not stable yet somehow.

CryZe · 2023-01-21T13:46:05Z

Actually the error seems incorrect. The function is stable, but requires dotprod in addition to neon.

mcountryman · 2023-01-21T13:54:46Z

> Actually the error seems incorrect. The function is stable, but requires dotprod in addition to neon.

It's been a bit since I looked at this code, but if I remember correctly, this would require adding an additional #[cfg] clause to allow compilation on targets that have neon but not dotprod.
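
As an illustration of how code could be kept compiling and running on targets without dotprod, here is a minimal sketch of backend selection. It is not the crate's actual dispatch code: the Backend enum and pick_backend function are hypothetical. It uses runtime detection through std::arch::is_aarch64_feature_detected!; a compile-time alternative would be gating the dotprod path behind #[cfg(all(target_arch = "aarch64", target_feature = "dotprod"))].

```rust
/// Hypothetical set of implementations a crate could choose from.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    NeonDotprod,
    Neon,
    Scalar,
}

/// Pick the best available backend for the CPU we are running on.
fn pick_backend() -> Backend {
    // This block only exists when compiling for aarch64.
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("dotprod") {
            return Backend::NeonDotprod;
        }
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Backend::Neon;
        }
    }
    // Non-aarch64 targets, or aarch64 without NEON, use the scalar path.
    Backend::Scalar
}
```

With this shape, a device that lacks dotprod would still get the plain NEON path instead of falling all the way back to scalar code.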

CryZe · 2023-01-21T14:10:22Z

Mmh, turns out that the dotprod feature is indeed still unstable:

https://github.com/rust-lang/stdarch/blob/master/crates/stdarch-gen/src/main.rs#L1590

peterdk · 2023-01-21T14:14:00Z

Would this dotprod be supported on Android ARM64 devices? Or is it a very obscure instruction? Would it impact performance much if a device doesn't have it? (That is, can we still use NEON for the rest?)

mcountryman · 2023-01-21T14:22:37Z

> Would this dotprod be supported on Android ARM64 devices? Or is it a very obscure instruction? Would it impact performance much if a device doesn't have it? (That is, can we still use NEON for the rest?)

At a glance, @CryZe's comment indicates we could get it working when compiling with nightly. The alternative is to port the C implementations mentioned above and hope that the intrinsics they need are stable.

peterdk · 2023-01-25T14:41:12Z

I am willing to offer a bounty of 50 Euro to get this working on ARM64 Android. One for the implementer (@CryZe ?) and one for the owner of this crate.

peterdk · 2023-03-30T20:15:59Z

I have now resorted to using libdeflater-sys, which wraps the libdeflate C library and provides SIMD-accelerated adler32 besides deflate and crc32. It has a dot product check and implementations both with and without dotprod, using NEON intrinsics. It sped up my app on Android by 33% compared to using nightly simd-adler32. For now I'll leave it at that. Maybe I'll find time and motivation in the future to extract the adler32 part from the library into a Rust one, to make it somewhat safer. But it's SIMD, so I doubt much improvement can be made in that regard.

thomcc · 2024-08-05T10:17:33Z

> Would this dotprod be supported on Android ARM64 devices? Or is it a very obscure instruction?

It's supported on more modern chips. It is by no means universally present, though.

CryZe force-pushed the neon-simd branch 2 times, most recently from fccf53d to 8733f3c Compare October 13, 2021 16:54

mcountryman marked this pull request as ready for review February 14, 2022 11:11

mcountryman closed this Feb 14, 2022

mcountryman reopened this Feb 14, 2022

mcountryman closed this Jan 21, 2023

mcountryman reopened this Jan 21, 2023

Implement Neon SIMD

eb0b039

This ports the WASM algorithm over to Aarch64 / ARM Neon.

CryZe force-pushed the neon-simd branch from 8733f3c to eb0b039 Compare January 21, 2023 13:39
