Commit
Reorder struct for better Neon codegen.
I unexpectedly discovered that we can reduce our splat-copy ops by one instruction by swapping these struct fields. This is apparently because we can wedge a right-shift by 32 into an add instruction. (`uxtw` means zero-extend.)

Before (splat_2_constants):

    bcf4: 28 04 40 f9   ldr    x8, [x1, #8]
    bcf8: 09 fd 60 d3   lsr    x9, x8, #32            <--- eliminated
    bcfc: 30 01 27 1e   fmov   s16, w9
    bd00: 10 06 04 4e   dup.4s v16, v16[0]
    bd04: 88 40 28 8b   add    x8, x4, w8, uxtw       <--- changed
    bd08: 10 41 00 ad   stp    q16, q16, [x8]
    bd0c: 25 0c 41 f8   ldr    x5, [x1, #16]!
    bd10: a0 00 1f d6   br     x5

After:

    baa0: 28 04 40 f9   ldr    x8, [x1, #8]
    baa4: 10 01 27 1e   fmov   s16, w8
    baa8: 10 06 04 4e   dup.4s v16, v16[0]
    baac: 88 80 48 8b   add    x8, x4, x8, lsr #32
    bab0: 10 41 00 ad   stp    q16, q16, [x8]
    bab4: 25 0c 41 f8   ldr    x5, [x1, #16]!
    bab8: a0 00 1f d6   br     x5

(This also saves an op on Haswell!)

Change-Id: Icea7196b42bc4057d697bbf049d368193d46f27e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/679719
Commit-Queue: John Stiles <[email protected]>
Auto-Submit: John Stiles <[email protected]>
Reviewed-by: Brian Osman <[email protected]>
Commit-Queue: Brian Osman <[email protected]>
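For readers who want to see why the field order matters, here is a minimal sketch of the two layouts. The names `CtxBefore`, `CtxAfter`, `dstOffset`, and the helper functions are hypothetical and are not taken from the Skia source. It assumes a little-endian target (true for AArch64 and x86-64 as normally configured), where the first 32-bit field of an 8-byte struct lands in the low half of a single 64-bit load. In the old order the splatted value sits in the high half, so it needs a separate `lsr` before `fmov`, and the offset in the low half is added with `uxtw`. In the new order `fmov` can read the low half directly, and the shift folds into `add ... lsr #32`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the context struct; field names are assumptions.
// Old order: destination offset first (low 32 bits), value second (high 32 bits).
struct CtxBefore { uint32_t dstOffset; float value; };
// New order: value first (low 32 bits), destination offset second (high 32 bits).
struct CtxAfter  { float value; uint32_t dstOffset; };

static_assert(sizeof(CtxBefore) == 8, "both fields must fit in one 64-bit load");
static_assert(sizeof(CtxAfter)  == 8, "both fields must fit in one 64-bit load");

// Reinterpret an 8-byte struct as the 64-bit word a single `ldr x8` would read.
template <typename T>
uint64_t packed_bits(const T& ctx) {
    static_assert(sizeof(T) == sizeof(uint64_t), "expected an 8-byte struct");
    uint64_t bits;
    std::memcpy(&bits, &ctx, sizeof(bits));
    return bits;
}

// Low half: what a `w` register view of the loaded word sees.
uint32_t low_half(uint64_t bits)  { return static_cast<uint32_t>(bits); }
// High half: what `lsr #32` produces.
uint32_t high_half(uint64_t bits) { return static_cast<uint32_t>(bits >> 32); }

// Sanity check of both layouts (little-endian assumed).
void check_layouts() {
    const uint32_t kOneAsBits = 0x3F800000u;  // IEEE-754 bit pattern of 1.0f

    uint64_t before = packed_bits(CtxBefore{64u, 1.0f});
    assert(low_half(before)  == 64u);         // offset: usable via `uxtw`
    assert(high_half(before) == kOneAsBits);  // value: needs an explicit `lsr`

    uint64_t after = packed_bits(CtxAfter{1.0f, 64u});
    assert(low_half(after)  == kOneAsBits);   // value: `fmov s16, w8` directly
    assert(high_half(after) == 64u);          // offset: folded into `add ... lsr #32`
}
```

Calling `check_layouts()` on a little-endian machine exercises both layouts. The sketch only models where each field lands in the loaded word; the instruction selection itself is up to the compiler.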