Skip to content

Commit

Permalink
Add bignum_mont{mul,sqr}_p384_neon, speed improvements/refactoring …
Browse files Browse the repository at this point in the history
…in tactics

This patch adds `bignum_mont{mul,sqr}_p384_neon` which are slightly faster than
`bignum_mont{mul,sqr}_p384`.
They use SIMD instructions and better scheduling found with SLOTHY.
Their correctness is verified using equivalence check w.r.t. specifications of their scalar ops.
The new SUBROUTINE lemmas are added to the specification list using
```
./tools/collect-specs.sh arm >arm/proofs/specifications.txt
```

Benchmark results on Graviton2:
```
bignum_montsqr_p384             :    58.6 ns each (var  0.3%, corr  0.06) =   17053295 ops/sec
bignum_montsqr_p384_neon        :    52.6 ns each (var  0.4%, corr -0.04) =   19017192 ops/sec
bignum_montmul_p384             :    72.9 ns each (var  0.2%, corr -0.02) =   13726633 ops/sec
bignum_montmul_p384_neon        :    68.1 ns each (var  0.3%, corr  0.02) =   14680905 ops/sec
```

Test and benchmark were updated to include these & fix incorrect naming bugs
in my previous p256_neon patch.

Also, some speedups in tactics are made:

1. `ARM_STEPS'_AND_ABBREV_TAC` and `ARM_STEPS'_AND_REWRITE_TAC`.

They are tactics for symbolic execution when showing equivalence of two programs
after reordering instructions.
`ARM_STEPS'_AND_ABBREV_TAC` does symbolic execution of the 'left' program and
abbreviates every RHS of new `read comp s = RHS`s,
meaning that after the tactic is done there are a bunch of equality assumptions whose
number increases linearly to the number of instructions.
`ARM_STEPS'_AND_REWRITE_TAC` then does symbolic execution of the 'right' program
and rewrites the results using the assumptions.
This means the overall complexity of `ARM_STEPS'_AND_REWRITE_TAC` was quadratic
to the number of instructions (# assum * # insts = (# insts)^2).
This is fixed to be (close to) linear, by separately maintaining the
abbreviations as a list of theorems internally rather than assumptions.
This doesn’t mean that the therotical time complexity is now linear,
but many tactics inside `ARM_STEPS'_AND_REWRITE_TAC` that inspect assumptions
now run linearly.

2. `FIND_HOLE_TAC`

`FIND_HOLE_TAC` tactic finds the 'hole' in the memory space that can place the
machine code that is used in program equivalence. This is done by inspecting
`nonoverlapping` assumptions, properly segmenting the memory with fixed-width
ranges and doing case analysis. Previously the # splitted cases was something
like 2^((# segments)^2), but now it is reduced to (# segments)^(#segments).
Comparing these two numbers is easier if logarithm is used.

Finally, some lemmas in existing `_neon.ml` proofs are updated so that
they do not mix usage of '*_mc' and '*_core_mc'. '*_core_mc' is a machine
code that is a sub-list of '*_mc' retrieved by stripping the callee-save register
store/loads as well as the ret instruction.
If possible, a lemmas is updated to only use '*_core_mc' because this
makes the modular usage of the lemma possible in bigger theorems.
  • Loading branch information
aqjune-aws committed May 9, 2024
1 parent b9266e7 commit d3a7b19
Show file tree
Hide file tree
Showing 27 changed files with 4,613 additions and 718 deletions.
2 changes: 2 additions & 0 deletions arm/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -279,8 +279,10 @@ BIGNUM_OBJ = curve25519/bignum_add_p25519.o \
p384/bignum_mod_p384_6.o \
p384/bignum_montmul_p384.o \
p384/bignum_montmul_p384_alt.o \
p384/bignum_montmul_p384_neon.o \
p384/bignum_montsqr_p384.o \
p384/bignum_montsqr_p384_alt.o \
p384/bignum_montsqr_p384_neon.o \
p384/bignum_mux_6.o \
p384/bignum_neg_p384.o \
p384/bignum_nonzero_6.o \
Expand Down
2 changes: 2 additions & 0 deletions arm/allowed_asm
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
: csel$
: cset$
: csetm$
: dup$
: eor$
: extr$
: ldp$
Expand All @@ -38,6 +39,7 @@
: sbc$
: sbcs$
: shl$
: shrn$
: stp$
: str$
: strb$
Expand Down
2 changes: 2 additions & 0 deletions arm/p384/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -35,8 +35,10 @@ OBJ = bignum_add_p384.o \
bignum_mod_p384_6.o \
bignum_montmul_p384.o \
bignum_montmul_p384_alt.o \
bignum_montmul_p384_neon.o \
bignum_montsqr_p384.o \
bignum_montsqr_p384_alt.o \
bignum_montsqr_p384_neon.o \
bignum_mux_6.o \
bignum_neg_p384.o \
bignum_nonzero_6.o \
Expand Down
Loading

0 comments on commit d3a7b19

Please sign in to comment.