Aws lc s2n bignum update 2024 03 06 #1478

aqjune-aws · 2024-03-06T17:48:44Z

Description of changes:

This pulls in the latest changes from s2n-bignum, which are:

Performance improvement in curve25519_x25519{_byte}.S
Copyright updates (adds MIT-0)

Details of the x25519 performance improvement are as follows; they are also described in the corresponding commit message:

This completely changes the implementation of ARM curve25519_x25519
and curve25519_x25519_byte (not the _alt forms, which remain faster
on their target microarchitectures) to a base-25.5 unsaturated version
with interleaved integer and SIMD operations, the inner loop closely
following Emil Lenngren's implementation described in the paper

  https://github.com/Emill/X25519-AArch64/blob/master/X25519_AArch64.pdf

and available here:

  https://github.com/Emill/X25519-AArch64

A version of this code was generated by SLOTHY from the reorganized
implementation by Abdulrahman, Becker, Kannwischer and Klein here:

 https://github.com/slothy-optimizer/slothy/blob/main/paper/clean/neon/X25519-AArch64-simple.s

as described in the associated paper

  https://eprint.iacr.org/2022/1303.pdf

with some additional annotations for use in the formal proof. The
final modular inverse computation reverts to the usual saturated
representation and s2n-bignum's divstep-based inverse function.
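
To make "base-25.5 unsaturated" concrete, here is a minimal Python sketch. It assumes the conventional layout of ten limbs alternating 26 and 25 bits; the names to_limbs, from_limbs and add_limbs are hypothetical and do not come from s2n-bignum, and the actual assembly may differ in detail.

```python
# Illustrative sketch only, not the s2n-bignum code.
# Field prime for Curve25519.
P = 2**255 - 19

# Assumed layout: ten limbs alternating 26 and 25 bits (5 * 51 = 255 bits),
# i.e. an average of 25.5 bits per limb.
LIMB_BITS = [26, 25] * 5


def to_limbs(x):
    """Split 0 <= x < 2^255 into ten limbs of alternating 26/25 bits."""
    limbs = []
    for bits in LIMB_BITS:
        limbs.append(x & ((1 << bits) - 1))
        x >>= bits
    return limbs


def from_limbs(limbs):
    """Recombine limbs (which may exceed their nominal width) modulo P."""
    total, shift = 0, 0
    for limb, bits in zip(limbs, LIMB_BITS):
        total += limb << shift
        shift += bits
    return total % P


def add_limbs(a, b):
    """Limb-wise addition with no carry propagation between limbs."""
    return [x + y for x, y in zip(a, b)]
```

Because each limb is held in a register wider than its nominal width, sums like the one in add_limbs can be formed limb by limb and the carries deferred rather than propagated after every operation.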

./tool/bssl speed -filter 25519 shows a 74% speedup on Graviton 2 (c6g.16xlarge):

Did 13000 Curve25519 arbitrary point multiplication operations in 1010330us (12867.1 ops/sec)
->
Did 23000 Curve25519 arbitrary point multiplication operations in 1028533us (22361.9 ops/sec)
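
As a sanity check on the quoted figure, the speedup can be recomputed from the two output lines above. This is a small Python sketch; the helper name ops_per_sec is hypothetical and only the numbers printed by bssl are taken from the PR.

```python
def ops_per_sec(ops, micros):
    """Convert an operation count and elapsed microseconds to ops/sec."""
    return ops / (micros / 1_000_000)


# Figures copied from the two bssl speed output lines.
before = ops_per_sec(13000, 1010330)
after = ops_per_sec(23000, 1028533)

# Relative throughput gain of the new implementation, in percent.
speedup_percent = (after / before - 1) * 100
print(f"{before:.1f} -> {after:.1f} ops/sec, {speedup_percent:.1f}% speedup")
```

The result is a little under 74%, consistent with the rounded figure stated above.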

Also, this patch updates delegate.go to handle the condition operand (eq, ne, ...) of fcsel.
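
For readers unfamiliar with fcsel: its last operand is a condition code rather than a register, which is why a tool that rewrites assembly has to recognise it separately. The Python sketch below only illustrates that operand shape; it is not the logic in delegate.go, and split_fcsel and AARCH64_CONDITIONS are hypothetical names.

```python
# Illustrative sketch only; not the delocator's implementation.
# Standard AArch64 condition-code mnemonics.
AARCH64_CONDITIONS = {
    "eq", "ne", "cs", "hs", "cc", "lo", "mi", "pl",
    "vs", "vc", "hi", "ls", "ge", "lt", "gt", "le", "al", "nv",
}


def split_fcsel(line):
    """Split 'fcsel Rd, Rn, Rm, cond' into (registers, condition)."""
    parts = line.strip().split(None, 1)
    if len(parts) != 2 or parts[0].lower() != "fcsel":
        raise ValueError("not an fcsel instruction")
    operands = [op.strip() for op in parts[1].split(",")]
    if len(operands) != 4:
        raise ValueError("fcsel takes three registers and a condition")
    cond = operands[3].lower()
    if cond not in AARCH64_CONDITIONS:
        raise ValueError("unknown condition code: " + cond)
    return operands[:3], cond
```

For example, split_fcsel("fcsel d16, d17, d18, eq") separates the three register operands from the trailing eq condition.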

The commands in the internal Quip document were used. PATHS_TO_KEEP remains unchanged, since this PR only contains updates to already existing s2n-bignum assembly files. The PATHS_TO_KEEP definition used is:

PATHS_TO_KEEP="\
./arm/p384 ./x86_att/p384 ./arm/p521 ./x86_att/p521 \
./arm/fastmul/bignum_emontredc_8n.S \
./arm/fastmul/bignum_kmul_16_32.S \
./arm/fastmul/bignum_kmul_32_64.S \
./arm/fastmul/bignum_ksqr_16_32.S \
./arm/fastmul/bignum_ksqr_32_64.S \
./arm/generic/bignum_ge.S \
./arm/generic/bignum_mul.S \
./arm/generic/bignum_optsub.S \
./arm/generic/bignum_sqr.S \
./x86_att/curve25519/curve25519_x25519.S \
./x86_att/curve25519/curve25519_x25519base.S \
./x86_att/curve25519/curve25519_x25519_alt.S \
./x86_att/curve25519/curve25519_x25519base_alt.S  \
./x86_att/curve25519/bignum_neg_p25519.S \
./x86_att/curve25519/bignum_mod_n25519.S  \
./x86_att/curve25519/edwards25519_decode.S  \
./x86_att/curve25519/edwards25519_decode_alt.S  \
./x86_att/curve25519/edwards25519_encode.S  \
./x86_att/curve25519/edwards25519_scalarmulbase.S  \
./x86_att/curve25519/edwards25519_scalarmulbase_alt.S  \
./x86_att/curve25519/edwards25519_scalarmuldouble.S  \
./x86_att/curve25519/edwards25519_scalarmuldouble_alt.S  \
./arm/curve25519/curve25519_x25519.S \
./arm/curve25519/curve25519_x25519base.S \
./arm/curve25519/curve25519_x25519_alt.S \
./arm/curve25519/curve25519_x25519base_alt.S \
./arm/curve25519/curve25519_x25519_byte.S \
./arm/curve25519/curve25519_x25519base_byte.S \
./arm/curve25519/curve25519_x25519_byte_alt.S \
./arm/curve25519/curve25519_x25519base_byte_alt.S \
./arm/curve25519/bignum_neg_p25519.S \
./arm/curve25519/bignum_mod_n25519.S  \
./arm/curve25519/edwards25519_decode.S  \
./arm/curve25519/edwards25519_decode_alt.S  \
./arm/curve25519/edwards25519_encode.S  \
./arm/curve25519/edwards25519_scalarmulbase.S  \
./arm/curve25519/edwards25519_scalarmulbase_alt.S  \
./arm/curve25519/edwards25519_scalarmuldouble.S  \
./arm/curve25519/edwards25519_scalarmuldouble_alt.S \
./include/_internal_s2n_bignum.h"

Testing:

./tool/bssl speed -filter 25519

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

In general, BOUNDER_RULE now directly handles operations over Z and N, assuming an outer real_of_int / real_of_num cast into R (this is also automated in the tactic form BOUNDER_TAC). In particular, this change can greatly improve bounds for terms involving integer or natural number division and remainder (DIV, div, MOD and rem) as well as cutoff subtraction over N. There is also now support for conditionals, though the condition is not used as extra context, simply being the basis for a case split. This update rolls in various trivial typographic fixes in comments. s2n-bignum original commit: awslabs/s2n-bignum@ccefa2a
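
The kind of bound this commit message describes can be illustrated outside HOL Light. The Python sketch below is not BOUNDER_RULE; it only shows, under the stated assumptions on each argument, upper bounds for natural-number division, remainder, cutoff subtraction and a conditional handled by case split. All function names are hypothetical.

```python
# Illustrative only: NOT HOL Light's BOUNDER_RULE, just simple
# upper-bound reasoning over natural numbers.

def bound_div(x_max, d_min):
    """Upper bound on x DIV d, assuming 0 <= x <= x_max and d >= d_min >= 1."""
    return x_max // d_min


def bound_mod(x_max, d_max):
    """Upper bound on x MOD d, assuming 0 <= x <= x_max and 1 <= d <= d_max."""
    return min(x_max, d_max - 1)


def bound_cutoff_sub(x_max, y_min):
    """Upper bound on cutoff subtraction max(x - y, 0), assuming x <= x_max, y >= y_min."""
    return max(x_max - y_min, 0)


def bound_cond(then_max, else_max):
    """Bound for 'if c then a else b' by case split; the condition c is ignored."""
    return max(then_max, else_max)
```

Note that bound_cond takes the worse of the two branches, which mirrors the remark that the condition is only the basis for a case split and is not used as extra context.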

s2n-bignum original commit: awslabs/s2n-bignum@fc0b9bf

jargh and others added 16 commits November 14, 2023 13:47

Avoid duplicate labels in ed25519 x86 implementation

34438c9

s2n-bignum original commit: awslabs/s2n-bignum@e6ef86f

Merge pull request aws#92 from torben-hansen/fix_duplicate_labels_ed2…

235b577

…5519 Avoid duplicate labels in ed25519 x86 implementation s2n-bignum original commit: awslabs/s2n-bignum@f629458

Merge branch 'awslabs:main' into main

edbe466

s2n-bignum original commit: awslabs/s2n-bignum@c1fbdb5

Merge pull request aws#94 from jargh/main

2df1938

64-bit SIMD regs in ARM model, better BOUNDER_RULE, slow-ARM field optimizations s2n-bignum original commit: awslabs/s2n-bignum@06781d2

Make parameter to ed25519 decode function const

9405fc5

s2n-bignum original commit: awslabs/s2n-bignum@d93670d

Merge pull request aws#99 from torben-hansen/ed25519_decode_signature…

04143ee

…_input_const Make _input_ parameter to ed25519 decode function const s2n-bignum original commit: awslabs/s2n-bignum@4097178

Allow MIT-0 license as well as Apache-2.0 and ISC (aws#104)

580c317

* Allow MIT-0 license as well as Apache-2.0 and ISC * Add appropriate year range to MIT-0 license s2n-bignum original commit: awslabs/s2n-bignum@48fb153

Tweak attribution of SLOTHY work

28a5cf4

s2n-bignum original commit: awslabs/s2n-bignum@f82da8f

Merge pull request aws#108 from jargh/main

79009c4

Lenngren-based X25519 for non-alt ARM code s2n-bignum original commit: awslabs/s2n-bignum@57eb68a

Merge pull request aws#109 from aqjune-aws/table-benchmark

7a374fb

Enable testing bignum_copy_row_from_table on x86 s2n-bignum original commit: awslabs/s2n-bignum@26bfe44

Update curve25519_x25519{_byte} to make AWS-LC's delocator work

f26ae72

This patch performs a few syntactic updates to make AWS-LC's delocator work. s2n-bignum original commit: awslabs/s2n-bignum@3b4f73c

Merge pull request aws#112 from aqjune-aws/fips25519

11a483c

Update curve25519_x25519{_byte} to make AWS-LC's delocator work s2n-bignum original commit: awslabs/s2n-bignum@88324d8

Merge branch 's2n-bignum-2024-03-06' into aws-lc-s2n-bignum-update-20…

90f02b9

…24-03-06

Add fcsel to delegate.go

9deb84f

aqjune-aws requested a review from a team as a code owner March 6, 2024 17:48

nebeid approved these changes Mar 6, 2024

dkostic approved these changes Mar 6, 2024

nebeid merged commit 3aa28fb into aws:main Mar 6, 2024
41 checks passed
