Add bignum_mont{mul,sqr}_p384_neon, speed improvements/refactoring in tactics
#122
This patch adds `bignum_mont{mul,sqr}_p384_neon`, which are slightly faster than `bignum_mont{mul,sqr}_p384`. They use SIMD instructions and better scheduling found with SLOTHY. Their correctness is verified by an equivalence check with respect to the specifications of their scalar counterparts. The new SUBROUTINE lemmas are added to the specification list using
Benchmark results on Graviton2:
The tests and benchmarks were updated to include these functions and to fix incorrect naming bugs from my previous p256_neon patch.
Also, some speedups are made in the tactics:

`ARM_STEPS'_AND_ABBREV_TAC` and `ARM_STEPS'_AND_REWRITE_TAC`. These are tactics for symbolic execution when showing the equivalence of two programs after reordering instructions. `ARM_STEPS'_AND_ABBREV_TAC` symbolically executes the 'left' program and abbreviates the RHS of every new `read comp s = RHS` equation, so that once the tactic is done there is a set of equality assumptions whose size grows linearly with the number of instructions. `ARM_STEPS'_AND_REWRITE_TAC` then symbolically executes the 'right' program and rewrites the results using these assumptions.

This means the overall complexity of `ARM_STEPS'_AND_REWRITE_TAC` was quadratic in the number of instructions (# assum * # insts = (# insts)^2). This is fixed to be (close to) linear by maintaining the abbreviations internally as a separate list of theorems rather than as assumptions. This does not mean that the theoretical time complexity is now linear, but many tactics inside `ARM_STEPS'_AND_REWRITE_TAC` that inspect assumptions now scale linearly.
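The quadratic-versus-linear argument for `ARM_STEPS'_AND_REWRITE_TAC` can be illustrated with a toy cost model. This is only a sketch in Python, not the HOL Light implementation: the function names and the `base_assums` parameter are hypothetical, and the model assumes that each symbolic-execution step scans the whole assumption list exactly once.

```python
def scan_cost_assumption_based(num_insts: int, base_assums: int = 5) -> int:
    """Toy cost when abbreviations are stored as assumptions.

    After the 'left' program is executed there are num_insts extra
    assumptions, so each of the num_insts steps on the 'right' program
    scans base_assums + num_insts entries.
    """
    return num_insts * (base_assums + num_insts)


def scan_cost_separate_list(num_insts: int, base_assums: int = 5) -> int:
    """Toy cost when abbreviations are kept in a separate theorem list.

    The assumption list stays at base_assums entries, so the tactics
    that inspect assumptions do a constant amount of work per step.
    """
    return num_insts * base_assums


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, scan_cost_assumption_based(n), scan_cost_separate_list(n))
```

In this model, doubling the number of instructions roughly quadruples the first cost but only doubles the second. The model deliberately ignores the cost of the rewriting itself, which is why the theoretical complexity is not claimed to be linear.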
`FIND_HOLE_TAC` finds a 'hole' in the memory space in which the machine code used in the program equivalence can be placed. It does so by inspecting the `nonoverlapping` assumptions, segmenting the memory into fixed-width ranges, and doing case analysis. Previously the number of split cases was roughly 2^((# segments)^2); it is now reduced to (# segments)^(# segments). The two numbers are easier to compare after taking logarithms.

Finally, some lemmas in the existing `_neon.ml` proofs are updated so that they do not mix uses of '_mc' and '_core_mc'. '_core_mc' is machine code that is a sub-list of '_mc', obtained by stripping the callee-save register stores/loads as well as the ret instruction. Where possible, a lemma is updated to use only '*_core_mc', because this allows the lemma to be used modularly in bigger theorems.
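For the `FIND_HOLE_TAC` case counts quoted earlier, taking base-2 logarithms gives (# segments)^2 for the old bound and (# segments) * log2(# segments) for the new one. The short Python sketch below just tabulates these two expressions; the function names are hypothetical and the formulas are the rough estimates from the description, not measured case counts.

```python
import math


def log2_cases_before(segments: int) -> float:
    """log2 of 2^(segments^2), the rough old number of split cases."""
    return float(segments ** 2)


def log2_cases_after(segments: int) -> float:
    """log2 of segments^segments, the rough new number of split cases."""
    return segments * math.log2(segments)


if __name__ == "__main__":
    for s in (2, 4, 8, 16):
        print(s, log2_cases_before(s), log2_cases_after(s))
```

With 8 segments, for instance, the exponents are 64 and 24, so the old estimate is larger by a factor of 2^40.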
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.