AWS-LC s2n-bignum update 2024-07-22 #1718
This patch adds constant-time table-lookup functions (`bignum_copy_row_from_table*`) and their proofs. It contains only the AArch64 versions; the x86 version will follow later. The failure to prove the x86 version seems to be related to handling negative offsets, and (if this is right) it can be avoided by proving only positive offsets. I will record this as a GitHub issue with a promise that the x86 scalar version will be provided after the RSA-related work is finished.

This patch contains four table-lookup functions:

1. `bignum_copy_row_from_table`: a lookup for a generic table size
2. `bignum_copy_row_from_table_8n_neon`: a Neon version for a table whose width is a multiple of 8
3. `bignum_copy_row_from_table_16_neon`: Neon implementation for a table whose width is 16*64=1024 bits
4. `bignum_copy_row_from_table_32_neon`: Neon implementation for a table whose width is 32*64=2048 bits

The last two versions were originally written by Hanno Becker.

To successfully compile and run `test` and `benchmark` on x86, the scalar `bignum_copy_row_from_table` function is processed in a way similar to the Neon functions.

s2n-bignum original commit: awslabs/s2n-bignum@f1ad23c
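For readers unfamiliar with the technique, the constant-time lookup described in the commit message above can be modeled roughly as follows. This is a minimal Python sketch, not the s2n-bignum code: the function name `copy_row_from_table` and its parameters (`table`, `height`, `width`, `index`) are assumptions chosen for illustration, and Python itself gives no timing guarantees. The point is only the access pattern: every row is read and combined under a mask, so which memory is touched does not depend on the secret index.

```python
MASK64 = (1 << 64) - 1  # model 64-bit machine words


def copy_row_from_table(table, height, width, index):
    """Return row `index` of a row-major table of 64-bit words.

    Hypothetical model: all `height` rows are visited regardless of
    `index`, and the wanted row is selected with an all-ones/all-zeros
    mask instead of a data-dependent branch or address.
    """
    row = [0] * width
    for i in range(height):
        # mask is 0xFFFF...FFFF when i == index, otherwise 0
        mask = -int(i == index) & MASK64
        for j in range(width):
            row[j] |= table[i * width + j] & mask
    return row
```

With a 3x4 table holding the words 0..11, `copy_row_from_table(list(range(12)), 3, 4, 1)` selects the middle row while still reading all twelve words.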
Add bignum_copy_row_from_table and its Neon-variants for AArch64 s2n-bignum original commit: awslabs/s2n-bignum@50aa85b
s2n-bignum original commit: awslabs/s2n-bignum@74d34c3
This implements the point compression encoding to a byte array from https://datatracker.ietf.org/doc/html/rfc8032#section-5.1.2 as function "edwards25519_encode". It assumes the input is a point (x,y) on the edwards25519 curve, with coordinates reduced mod p_25519 = 2^255 - 19, and does not check any of that. s2n-bignum original commit: awslabs/s2n-bignum@67430be
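As a reference point for the encoding in RFC 8032 section 5.1.2, here is a minimal Python model. It is only a sketch of the arithmetic, not the assembly routine: the Python function below merely borrows the name `edwards25519_encode`, returns a `bytes` object rather than writing to an output buffer, and, like the function described above, assumes reduced coordinates without checking them.

```python
P_25519 = 2**255 - 19


def edwards25519_encode(x, y):
    """Compress (x, y) to 32 bytes as in RFC 8032 section 5.1.2.

    The 255-bit y coordinate is stored little-endian and the least
    significant bit of x is copied into the top bit of the last byte.
    Inputs are assumed to satisfy 0 <= x, y < P_25519 (not checked).
    """
    encoded = y | ((x & 1) << 255)
    return encoded.to_bytes(32, "little")
```

Only the parity of `x` influences the output, which is why decoding has to recover `x` by a square-root computation.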
s2n-bignum original commit: awslabs/s2n-bignum@b49b4f9
This implements point decoding from a 256-bit little-endian byte sequence to a point (x,y) on the edwards25519 curve as specified in https://datatracker.ietf.org/doc/html/rfc8032#section-5.1.3 The function returns 0 for success and 1 for failure, the latter meaning that the input is not the encoding of any edwards25519 point. s2n-bignum original commit: awslabs/s2n-bignum@97f7493
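The decoding steps of RFC 8032 section 5.1.3 can be sketched in Python as below. This is an illustrative model under stated assumptions, not the s2n-bignum implementation: the tuple return value `(status, point)` is a modeling choice (the real function returns 0 or 1 and writes the point through a pointer), and the helper constants `D_25519` and `SQRT_M1` are names introduced here.

```python
P_25519 = 2**255 - 19
# Curve constant d = -121665/121666 mod p
D_25519 = (-121665 * pow(121666, P_25519 - 2, P_25519)) % P_25519
# A square root of -1 mod p
SQRT_M1 = pow(2, (P_25519 - 1) // 4, P_25519)


def edwards25519_decode(data):
    """Return (0, (x, y)) on success or (1, None) on failure."""
    if len(data) != 32:
        return 1, None
    n = int.from_bytes(data, "little")
    sign = n >> 255                      # top bit carries the parity of x
    y = n & ((1 << 255) - 1)
    if y >= P_25519:
        return 1, None
    u = (y * y - 1) % P_25519
    v = (D_25519 * y * y + 1) % P_25519
    x2 = u * pow(v, P_25519 - 2, P_25519) % P_25519
    x = pow(x2, (P_25519 + 3) // 8, P_25519)   # candidate square root
    if (x * x - x2) % P_25519 != 0:
        x = x * SQRT_M1 % P_25519
    if (x * x - x2) % P_25519 != 0:
        return 1, None                   # x2 is not a square
    if x == 0 and sign == 1:
        return 1, None
    if (x & 1) != sign:
        x = P_25519 - x
    return 0, (x, y)
```

Decoding the byte string `58 66 66 ... 66` (the encoding of y = 4/5 with an even x) succeeds, while an input whose y field equals p is rejected because it is not reduced.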
The function bignum_mod_n25519 performs reduction of an input of any size (k digits) modulo the order of the curve25519/edwards25519 basepoint, n_25519 = 2^252 + 27742317777372353535851937790883648493. It generalizes bignum_mod_n25519_4, which is the special case of 4-digit (256-bit) inputs. s2n-bignum original commit: awslabs/s2n-bignum@e23fd30
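The input/output relationship described above can be stated as a small Python model. This is a sketch of the specification only, assuming little-endian 64-bit digits and a 4-digit result; the list-based interface is an assumption for illustration, and Python's `%` operator stands in for the actual reduction algorithm.

```python
N_25519 = 2**252 + 27742317777372353535851937790883648493
MASK64 = (1 << 64) - 1


def bignum_mod_n25519(digits):
    """Reduce a k-digit little-endian bignum modulo n_25519.

    `digits` is a list of k 64-bit words (k may be any size, including
    0); the result is returned as 4 little-endian 64-bit words.
    """
    value = sum(d << (64 * i) for i, d in enumerate(digits))
    r = value % N_25519
    return [(r >> (64 * i)) & MASK64 for i in range(4)]
```

The 4-digit special case `bignum_mod_n25519_4` corresponds to calling this model with a list of length 4.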
s2n-bignum original commit: awslabs/s2n-bignum@2c8e273
This replaces the inlined variant of "bignum_modinv" with code from "bignum_inv_p25519" in all "curve25519_" functions returning an affine point and hence using modular inverse. There are also a few consequential changes related to the slightly different amount of temporary storage needed by this function. s2n-bignum original commit: awslabs/s2n-bignum@777d574
…ck_no Document that x25519 function does not implement zero-check s2n-bignum original commit: awslabs/s2n-bignum@5c4b15a
s2n-bignum original commit: awslabs/s2n-bignum@73ec55a
This replaces the inlined variant of "bignum_modinv" with code from "bignum_inv_p25519" in all "edwards25519_scalarmul*" functions. Again, there are consequential changes related to the slightly different amount of temporary storage needed by bignum_inv_p25519. s2n-bignum original commit: awslabs/s2n-bignum@7e7b18e
Ed25519 support and related updates s2n-bignum original commit: awslabs/s2n-bignum@db8409d
s2n-bignum original commit: awslabs/s2n-bignum@58a6bdf
Add BFM, BIC, FCSEL, INS, SUB, TRN1, TRN2, USHR, ZIP2 to ARM model s2n-bignum original commit: awslabs/s2n-bignum@f1caaf1
In general, BOUNDER_RULE now directly handles operations over Z and N, assuming an outer real_of_int / real_of_num cast into R (this is also automated in the tactic form BOUNDER_TAC). In particular, this change can greatly improve bounds for terms involving integer or natural number division and remainder (DIV, div, MOD and rem) as well as cutoff subtraction over N. There is also now support for conditionals, though the condition is not used as extra context, simply being the basis for a case split. This update rolls in various trivial typographic fixes in comments. s2n-bignum original commit: awslabs/s2n-bignum@ccefa2a
s2n-bignum original commit: awslabs/s2n-bignum@e6ef86f
…5519 Avoid duplicate labels in ed25519 x86 implementation s2n-bignum original commit: awslabs/s2n-bignum@f629458
s2n-bignum original commit: awslabs/s2n-bignum@c1fbdb5
64-bit SIMD regs in ARM model, better BOUNDER_RULE, slow-ARM field optimizations s2n-bignum original commit: awslabs/s2n-bignum@06781d2
s2n-bignum original commit: awslabs/s2n-bignum@d93670d
…_input_const Make _input_ parameter to ed25519 decode function const s2n-bignum original commit: awslabs/s2n-bignum@4097178
* Allow MIT-0 license as well as Apache-2.0 and ISC * Add appropriate year range to MIT-0 license s2n-bignum original commit: awslabs/s2n-bignum@48fb153
This completely changes the implementation of ARM curve25519_x25519 and curve25519_x25519_byte (not the _alt forms, which remain faster on their target microarchitectures) to a base-25.5 unsaturated version with interleaved integer and SIMD operations, the inner loop closely following Emil Lenngren's implementation described in the paper https://github.com/Emill/X25519-AArch64/blob/master/X25519_AArch64.pdf and available here: https://github.com/Emill/X25519-AArch64 A version of this code was generated by SLOTHY from the reorganized implementation by Abdulrahman, Becker, Kannwischer and Klein here: https://github.com/slothy-optimizer/slothy/blob/main/paper/clean/neon/X25519-AArch64-simple.s as described in the associated paper https://eprint.iacr.org/2022/1303.pdf with some additional annotations for use in the formal proof. The final modular inverse computation reverts to the usual saturated representation and s2n-bignum's divstep-based inverse function. s2n-bignum original commit: awslabs/s2n-bignum@fc0b9bf
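To make "base-25.5 unsaturated" concrete: a 255-bit field element is split into ten limbs whose nominal widths alternate between 26 and 25 bits, leaving headroom in each machine word so that additions can be done limb by limb without immediate carry propagation. The Python sketch below assumes the conventional 26/25 alternation starting with a 26-bit limb; the exact limb layout and the helper names `to_limbs` and `from_limbs` are assumptions for illustration, not taken from the s2n-bignum source.

```python
LIMB_BITS = [26, 25] * 5  # ten limbs, 255 bits in total


def to_limbs(x):
    """Split a 255-bit integer into ten limbs of alternating width."""
    limbs = []
    for bits in LIMB_BITS:
        limbs.append(x & ((1 << bits) - 1))
        x >>= bits
    return limbs


def from_limbs(limbs):
    """Recombine limbs; limbs may exceed their nominal width."""
    value, shift = 0, 0
    for limb, bits in zip(limbs, LIMB_BITS):
        value += limb << shift
        shift += bits
    return value
```

Because `from_limbs` adds rather than ORs, it also accepts limbs that have grown past 25 or 26 bits, for example after a limb-wise sum `[a + b for a, b in zip(to_limbs(u), to_limbs(v))]`.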
s2n-bignum original commit: awslabs/s2n-bignum@f82da8f
Lenngren-based X25519 for non-alt ARM code s2n-bignum original commit: awslabs/s2n-bignum@57eb68a
Enable testing bignum_copy_row_from_table on x86 s2n-bignum original commit: awslabs/s2n-bignum@26bfe44
This simplifies the remaining Montgomery ladder implementations by avoiding the special code for zero handling, since it is not actually necessary given the behavior of the modular inverse in this case. In addition, the proofs have been tidied up a bit, factoring out the basic mathematics so that the loop invariant becomes simpler. The update also fixes a README typo pointed out by Dan Bernstein and removes a couple of stray comments arising from SLOTHY output in the Lenngren-derived X25519 code. s2n-bignum original commit: awslabs/s2n-bignum@e14394d
This patch performs a few syntactic updates to make AWS-LC's delocator work. s2n-bignum original commit: awslabs/s2n-bignum@3b4f73c
Update curve25519_x25519{_byte} to make AWS-LC's delocator work s2n-bignum original commit: awslabs/s2n-bignum@88324d8
s2n-bignum original commit: awslabs/s2n-bignum@729a27e
The code now handles specially the case where P1 = (x,y,z) is the point at infinity, i.e. has z = 0. It then returns the other point P2 augmented (since that is in affine coordinates, this being mixed addition) with z = 1 or its Montgomery equivalent to give the more desirable result 0 + P2 = P2. The selection is constant-time as usual with a single code path. s2n-bignum original commit: awslabs/s2n-bignum@72ccfda
This is analogous to the earlier changes for mixed addition. In a point addition operation P1 + P2, the cases where P1 = 0 or P2 = 0 are handled specially (though of course using constant-time selection) as 0 + P2 = P2 and P1 + 0 = P1. More precisely, writing P1 = (x1,y1,z1) and P2 = (x2,y2,z2), the special-case logic is triggered when precisely *one* of z1 = 0 or z2 = 0 holds; in the case that both z1 = 0 and z2 = 0 the standard computation is followed and yields the "right" result (one with its z coordinate also zero). s2n-bignum original commit: awslabs/s2n-bignum@061ea51
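The special-case logic described above amounts to a masked selection applied after the standard computation. A minimal Python sketch, assuming 256-bit coordinates and taking the result of the standard addition as an input (`generic`): the names `ct_select` and `add_with_zero_handling` are hypothetical, and Python comparisons are of course not constant-time, so this only models which result is chosen.

```python
MASK = (1 << 256) - 1  # assume coordinates fit in 256 bits


def ct_select(flag, a, b):
    """Return a if flag == 1 else b, using masks rather than a branch."""
    m = -flag & MASK
    return (a & m) | (b & ~m & MASK)


def add_with_zero_handling(p1, p2, generic):
    """Pick the final result of P1 + P2 in Jacobian coordinates.

    `generic` is the output of the standard addition formula. It is
    overridden only when exactly one of z1, z2 is zero.
    """
    z1_zero = int(p1[2] == 0)
    z2_zero = int(p2[2] == 0)
    use_p2 = z1_zero & (1 - z2_zero)   # 0 + P2 = P2
    use_p1 = z2_zero & (1 - z1_zero)   # P1 + 0 = P1
    out = []
    for g, a, b in zip(generic, p1, p2):
        r = ct_select(use_p2, b, g)
        r = ct_select(use_p1, a, r)
        out.append(r)
    return tuple(out)
```

When both z coordinates are zero neither flag is set, so the standard result is returned unchanged, matching the behavior described in the commit message.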
As with the earlier update for doublings, the Jacobian point addition and mixed addition operations for the curves P-256, P-384, P-521, secp256k1 and SM2 now all have the usual two versions targeting different microarchitectures, one of them called "_alt", following the general s2n-bignum convention. The "_alt" forms for ARM now present are just renamed versions of the originals (which were based on "_alt" field operations), with the new code taking over the old non-alt name. For x86 the non-alt ones are the same as before and the "_alt" forms are new. s2n-bignum original commit: awslabs/s2n-bignum@acd4fd3
Complete improvements to Weierstrass point additions s2n-bignum original commit: awslabs/s2n-bignum@b9266e7
…in tactics

This patch adds `bignum_mont{mul,sqr}_p384_neon`, which are slightly faster than `bignum_mont{mul,sqr}_p384`. They use SIMD instructions and better scheduling found with SLOTHY. Their correctness is verified using an equivalence check w.r.t. the specifications of their scalar ops. The new SUBROUTINE lemmas are added to the specification list using

```
./tools/collect-specs.sh arm >arm/proofs/specifications.txt
```

Benchmark results on Graviton2:

```
bignum_montsqr_p384 : 58.6 ns each (var 0.3%, corr 0.06) = 17053295 ops/sec
bignum_montsqr_p384_neon : 52.6 ns each (var 0.4%, corr -0.04) = 19017192 ops/sec
bignum_montmul_p384 : 72.9 ns each (var 0.2%, corr -0.02) = 13726633 ops/sec
bignum_montmul_p384_neon : 68.1 ns each (var 0.3%, corr 0.02) = 14680905 ops/sec
```

Test and benchmark were updated to include these and to fix incorrect naming bugs in my previous p256_neon patch.

Also, some speedups in tactics are made:

1. `ARM_STEPS'_AND_ABBREV_TAC` and `ARM_STEPS'_AND_REWRITE_TAC`. These are tactics for symbolic execution when showing equivalence of two programs after reordering instructions. `ARM_STEPS'_AND_ABBREV_TAC` does symbolic execution of the 'left' program and abbreviates every RHS of new `read comp s = RHS`s, meaning that after the tactic is done there are a number of equality assumptions that grows linearly with the number of instructions. `ARM_STEPS'_AND_REWRITE_TAC` then does symbolic execution of the 'right' program and rewrites the results using the assumptions. This means the overall complexity of `ARM_STEPS'_AND_REWRITE_TAC` was quadratic in the number of instructions (# assum * # insts = (# insts)^2). This is fixed to be (close to) linear, by separately maintaining the abbreviations as a list of theorems internally rather than as assumptions. This doesn't mean that the theoretical time complexity is now linear, but many tactics inside `ARM_STEPS'_AND_REWRITE_TAC` that inspect assumptions now run linearly.
2. `FIND_HOLE_TAC`. The `FIND_HOLE_TAC` tactic finds the 'hole' in the memory space that can hold the machine code used in program equivalence. This is done by inspecting `nonoverlapping` assumptions, properly segmenting the memory with fixed-width ranges and doing case analysis. Previously the number of split cases was something like 2^((# segments)^2), but now it is reduced to (# segments)^(# segments). Comparing these two numbers is easier if logarithms are used.

Finally, some lemmas in existing `_neon.ml` proofs are updated so that they do not mix usage of '*_mc' and '*_core_mc'. '*_core_mc' is machine code that is a sub-list of '*_mc', obtained by stripping the callee-save register stores/loads as well as the ret instruction. If possible, a lemma is updated to only use '*_core_mc' because this makes modular usage of the lemma possible in bigger theorems.

s2n-bignum original commit: awslabs/s2n-bignum@d3a7b19
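The remark about logarithms in the `FIND_HOLE_TAC` description can be made concrete: taking log2 of both case counts turns 2^(n^2) into n^2 and n^n into n * log2(n), where n is the number of segments. The short Python sketch below uses hypothetical helper names and treats the "something like" estimates from the commit message as exact formulas, which is an assumption.

```python
import math


def log2_old_cases(segments):
    """log2 of the old estimate 2^(segments^2)."""
    return segments ** 2


def log2_new_cases(segments):
    """log2 of the new estimate segments^segments."""
    return segments * math.log2(segments)
```

For 8 segments this compares 64 against 24, i.e. roughly 2^64 cases before versus 2^24 after.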
Add `bignum_mont{mul,sqr}_p384_neon`, speed improvements/refactoring in tactics s2n-bignum original commit: awslabs/s2n-bignum@cbef866
This patch adds `bignum_mont{sqr,mul}_p521_neon`.

```
bignum_montsqr_p521 : 114.7 ns each (var 0.2%, corr 0.06) = 8720010 ops/sec
bignum_montsqr_p521_neon : 83.8 ns each (var 0.4%, corr -0.04) = 11926387 ops/sec
bignum_montmul_p521 : 130.8 ns each (var 0.2%, corr -0.00) = 7644702 ops/sec
bignum_montmul_p521_neon : 111.4 ns each (var 0.2%, corr 0.04) = 8978421 ops/sec
```

The new subroutine specs are added to specification.txt, and test as well as benchmark are updated. Modular squaring/multiplication functions are not included in this patch.

This patch also contains the following updates:

- A tactic for showing equivalence of loops is added (the tactic is not used yet).
- Definitions for input state equivalence are canonicalized as `.. /\ (?a. read c1 s = a /\ read c1 s' = a /\ (?b. read c2 s = b /\ read c2 s' = b /\ ( ... )))`
- Minor buggy behaviors in equiv tactics are fixed and performance improvements done

s2n-bignum original commit: awslabs/s2n-bignum@65f046e
Add `bignum_mont{sqr,mul}_p521_neon` s2n-bignum original commit: awslabs/s2n-bignum@e6ac9bd
s2n-bignum original commit: awslabs/s2n-bignum@e600752
This adds `bignum_{sqr,mul}_p521_neon` and their proofs. The new subroutine specs are added to specification.txt, and test as well as benchmark are updated. Benchmark results on GV2 are:

```
bignum_mul_p521 : 135.1 ns each (var 0.2%, corr -0.01) = 7404184 ops/sec
bignum_mul_p521_neon : 115.5 ns each (var 0.3%, corr 0.00) = 8660108 ops/sec
bignum_sqr_p521 : 108.9 ns each (var 0.2%, corr 0.08) = 9184994 ops/sec
bignum_sqr_p521_neon : 78.7 ns each (var 0.3%, corr 0.06) = 12708368 ops/sec
```

s2n-bignum original commit: awslabs/s2n-bignum@02df8e4
s2n-bignum original commit: awslabs/s2n-bignum@b559a56
s2n-bignum original commit: awslabs/s2n-bignum@96086bb
P-256 scalar multiplication and related tweaks s2n-bignum original commit: awslabs/s2n-bignum@2237fe8
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #1718 +/- ##
==========================================
- Coverage 78.35% 78.35% -0.01%
==========================================
Files 573 573
Lines 96065 96059 -6
Branches 13764 13771 +7
==========================================
- Hits 75270 75263 -7
+ Misses 20195 20194 -1
- Partials 600 602 +2
☔ View full report in Codecov by Sentry.
I got two differences with this PR when following the runbook:
- third_party/s2n-bignum/arm/generic/bignum_copy_row_from_table_16_neon.S
- third_party/s2n-bignum/arm/generic/bignum_copy_row_from_table_32_neon.S
## What's Changed
* Added options to x509 tool by @ecdeye in #1696
* Add support to detect Neoverse V2 cores by @andrewhop in #1706
* Move OCSP functions for Ruby out of internal.h by @samuel40791765 in #1704
* Add aes-256-xts to EVP_get_cipherbyname by @torben-hansen in #1707
* Match using CMAKE_SYSTEM_PROCESSOR_LOWER by @justsmth in #1709
* Update MySQL to 9.0.0 by @skmcgrail in #1685
* [EC] Unify scalar multiplication for P-256/384/521 by @dkostic in #1693
* Adds const qualifier to ciphertext parameter in EVP_PKEY_decapsulate by @maddeleine in #1713
* Upstream merge 2024 06 24 by @nebeid in #1661
* NIST SP 800-108r1-upd1: KDF Counter Implementation by @skmcgrail in #1644
* Upstream merge 2024 07 09 by @nebeid in #1694
* Design for support of HMAC precomputed keys by @fabrice102 in #1574
* Fix for select point from table in ec_nistp scalar_mul by @dkostic in #1719
* X509toolcomparison by @ecdeye in #1714
* AWS-LC s2n-bignum update 2024-07-22 by @dkostic in #1718
* Add OpenVPN to CI by @smittals2 in #1705
* Lower required Go version, add CI test for specific version by @andrewhop in #1717
* ec2-test-framework enhancements and graviton 4 testing by @samuel40791765 in #1715
* sha + chacha: Move AArch64/X86-64 dispatching to C. by @justsmth in #1625
* Show number of pruned ec2 instances in dashboard by @samuel40791765 in #1728
* rsa and md5 tools by @ecdeye in #1722
* FIPS 203 IPD update: ML-KEM-IPD-768 and ML-KEM-IPD-1024 by @jakemas in #1724
* bump mysql CI to 9.0.1 by @samuel40791765 in #1727
* Support utility OCSP request functions by @samuel40791765 in #1708
* add support for OCSP_SINGLERESP functions by @samuel40791765 in #1703
Issues:
N/A
Description of changes:
Latest update from s2n-bignum.
Call-outs:
Point out areas that need special attention or support during the review process. Discuss architecture or design changes.
Testing:
How is this change tested (unit tests, fuzz tests, etc.)? Are there any testing steps to be verified by the reviewer?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.