aes: rework backends #442
Conversation
}
}

impl cipher::BlockBackend for &$enc_name {
`BlockBackend` is implemented for references because its methods work with `&mut self`. We probably should introduce two separate traits: `BlockCipherBackend` (with `&self` methods) and `BlockModeBackend` (with `&mut self` methods).
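For reference, a minimal sketch of what such a split could look like (the method names and `InOut` signatures here are assumptions for illustration, not the current `cipher` API):

```rust
use cipher::{inout::InOut, Block, ParBlocks, ParBlocksSizeUser};

/// Hypothetical backend trait for stateless block ciphers: processing a
/// block does not mutate the backend, so `&self` is sufficient and the
/// trait can be implemented for the cipher type itself.
pub trait BlockCipherBackend: ParBlocksSizeUser {
    fn proc_block(&self, block: InOut<'_, '_, Block<Self>>);
    fn proc_par_blocks(&self, blocks: InOut<'_, '_, ParBlocks<Self>>);
}

/// Hypothetical backend trait for block modes (CBC, CFB, ...): processing a
/// block updates internal state such as the IV, so `&mut self` is required.
pub trait BlockModeBackend: ParBlocksSizeUser {
    fn proc_block(&mut self, block: InOut<'_, '_, Block<Self>>);
    fn proc_par_blocks(&mut self, blocks: InOut<'_, '_, ParBlocks<Self>>);
}
```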
aes/src/ni.rs (outdated diff)
dec_name = Aes128BackDec,
key_size = consts::U16,
keys_ty = expand::Aes128RoundKeys,
par_size = consts::U15,
Since x86 has only 16 XMM registers (AVX-512 is out of scope for now), processing 15 blocks in parallel on x86 means that each round key has to be reloaded on every iteration. This maximizes ILP, but introduces additional loads from the L1 cache.
For AES-128, 192, and 256 (which use 11, 13, and 15 round keys respectively) we can process only 5, 3, and 1 block in parallel without reloading any keys. On my laptop the sweet spot seems to be 11 blocks (~5% better than the 15-block baseline according to the crate's ECB benchmarks), but it's likely highly dependent on the CPU model. We will need additional benchmarks, including the CTR mode, to find the optimal numbers. For now, I decided to use 15 blocks for cleaner assembly. I also considered using inline assembly to work around the stack-spilling issue, but it's better to try that in a separate PR.
Generated assembly for AES-128 looks approximately like this: https://rust.godbolt.org/z/or5ccd5da
UPD: After measuring performance a bit more carefully with Criterion, 9 blocks produce the best result, at least on AMD CPUs. For AES-128 and AES-192 similar results are achieved with 11 and 10 blocks respectively, but since 9 blocks result in slightly smaller code, I updated the code to use 9. Surprisingly, 8 blocks result in ~5-10% lower throughput.
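For context on where this knob lives: the parallel block count surfaces through the backend's `ParBlocksSize` associated type. A simplified sketch with illustrative type names (not the exact PR code):

```rust
use cipher::{
    consts::{U16, U9},
    BlockSizeUser, ParBlocksSizeUser,
};

/// Illustrative stand-in for the AES-NI AES-128 encryption backend.
pub struct Aes128EncBackend {
    // expanded round keys would live here
}

impl BlockSizeUser for Aes128EncBackend {
    type BlockSize = U16; // AES block size: 16 bytes
}

impl ParBlocksSizeUser for Aes128EncBackend {
    // How many blocks a single parallel-processing call handles; this is
    // what the `par_size` macro parameter controls (9 after the update).
    type ParBlocksSize = U9;
}
```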
aes/src/armv8.rs (outdated diff)
dec_name = Aes128BackDec,
key_size = consts::U16,
keys_ty = expand::Aes128RoundKeys,
par_size = consts::U15,
ARMv8 NEON has 32 SIMD registers, so technically we can process 21, 19, and 17 blocks in parallel for AES-128, 192, and 256 respectively (32 registers minus the 11, 13, or 15 round keys) while keeping all round keys in registers. But since the code forces inlining, such wide parallelism also balloons the binary size, so additional benchmarks are needed.
Generated assembly for AES-128 looks approximately like this: https://rust.godbolt.org/z/EWzPe47c6
This PR splits the `BlockBackend` trait into 4 specific traits: `BlockCipherEncBackend`, `BlockCipherDecBackend`, `BlockModeEncBackend`, and `BlockModeDecBackend`. Same for `BlockClosure`. This allows cipher backends to drop the awkward `&mut &backend` juggling (see RustCrypto/block-ciphers#442), makes the code a bit easier to read (e.g. `encrypt_blocks` instead of `proc_blocks`), and allows one backend type to be used for both encryption and decryption. The `impl_simple_block_encdec` macro is removed since we can now implement the backend traits directly on cipher types, which should make implementation crates slightly easier to understand. Additionally, the traits are moved to the `block` and `cipher` modules to reduce clutter in the crate root. Later we can add docs to each module describing the traits in detail.
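A rough sketch of the shape this enables, with a toy cipher standing in for a real implementation (the trait is redefined locally here and its signature is only an approximation; see the `cipher` crate for the real definitions):

```rust
use cipher::{
    consts::{U1, U16},
    inout::InOut,
    Block, BlockSizeUser, ParBlocksSizeUser,
};

/// Local approximation of one of the four new traits; the other three
/// (`BlockCipherDecBackend`, `BlockModeEncBackend`, `BlockModeDecBackend`)
/// follow the same pattern, with the mode traits taking `&mut self`.
pub trait BlockCipherEncBackend: ParBlocksSizeUser {
    fn encrypt_block(&self, block: InOut<'_, '_, Block<Self>>);
}

/// Toy cipher used only to show that a backend trait can be implemented
/// directly on the cipher type, without `impl_simple_block_encdec`.
pub struct ToyCipher;

impl BlockSizeUser for ToyCipher {
    type BlockSize = U16;
}

impl ParBlocksSizeUser for ToyCipher {
    type ParBlocksSize = U1; // no SIMD parallelism in this toy example
}

impl BlockCipherEncBackend for ToyCipher {
    fn encrypt_block(&self, mut block: InOut<'_, '_, Block<Self>>) {
        // Placeholder "encryption": XOR every byte with a constant.
        let mut out = block.get_in().clone();
        out.iter_mut().for_each(|b| *b ^= 0xAA);
        *block.get_out() = out;
    }
}
```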
This PR unifies code between the AES-NI and ARMv8 backends and prepares the ground for a future removal of the duplicated definitions of the AES types in the `autodetect`, `soft`, `ni`, and `armv8` modules. Additionally, it allows quickly changing the number of blocks processed in parallel by the different intrinsics-based backends instead of hardcoding it to 8 blocks.
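For illustration, with the unified definitions each backend boils down to a single macro invocation where the parallelism is one field (the field names mirror the diff hunks above, while the macro name and the `enc_name` value here are placeholders, not necessarily the ones used in the PR):

```rust
// Hypothetical invocation of the shared backend-definition macro; only the
// field names are taken from the diff above, the macro name is illustrative.
define_backend_impls! {
    enc_name = Aes128BackEnc,
    dec_name = Aes128BackDec,
    key_size = consts::U16,
    keys_ty = expand::Aes128RoundKeys,
    // Blocks processed in parallel by the intrinsics backend; changing this
    // single line re-tunes parallelism (e.g. from 15 to 9 blocks).
    par_size = consts::U9,
}
```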