Aws lc s2n bignum update 2024 03 06 #1478

Merged
merged 16 commits into from
Mar 6, 2024

@aqjune-aws aqjune-aws commented Mar 6, 2024

Description of changes:

This pulls the latest changes in s2n-bignum which are:

  • Performance improvement in curve25519_x25519{_byte}.S
  • Copyright updates (adds MIT-0)

Details of the performance improvement in x25519 are as follows; they are also described in the message of the corresponding commit:

This completely changes the implementation of ARM curve25519_x25519
and curve25519_x25519_byte (not the _alt forms, which remain faster
on their target microarchitectures) to a base-25.5 unsaturated version
with interleaved integer and SIMD operations, the inner loop closely
following Emil Lenngren's implementation described in the paper

  https://github.com/Emill/X25519-AArch64/blob/master/X25519_AArch64.pdf

and available here:

  https://github.com/Emill/X25519-AArch64

A version of this code was generated by SLOTHY from the reorganized
implementation by Abdulrahman, Becker, Kannwischer and Klein here:

 https://github.com/slothy-optimizer/slothy/blob/main/paper/clean/neon/X25519-AArch64-simple.s

as described in the associated paper

  https://eprint.iacr.org/2022/1303.pdf

with some additional annotations for use in the formal proof. The
final modular inverse computation reverts to the usual saturated
representation and s2n-bignum's divstep-based inverse function.
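
To make the "base-25.5 unsaturated" wording concrete, here is a minimal Python sketch of a radix-2^25.5 representation of field elements modulo 2^255 - 19: ten limbs with alternating widths of 26 and 25 bits. It is an illustration of the general technique only. The helper names (`to_limbs`, `from_limbs`, `add_lazy`, `to_saturated`) are hypothetical, and the sketch does not reflect the register layout or instruction scheduling of the actual assembly.

```python
# Illustrative sketch only: not the layout used by the s2n-bignum assembly.
P = 2**255 - 19

# Limb i starts at bit ceil(25.5 * i): 0, 26, 51, 77, 102, 128, 153, 179, 204, 230.
OFFSETS = [(51 * i + 1) // 2 for i in range(10)]
WIDTHS = [26, 25] * 5  # alternating limb widths, 255 bits in total


def to_limbs(x):
    """Split 0 <= x < 2**255 into ten limbs of 26 or 25 bits each."""
    return [(x >> off) & ((1 << w) - 1) for off, w in zip(OFFSETS, WIDTHS)]


def from_limbs(limbs):
    """Recombine limbs (which may exceed their nominal width) modulo P."""
    return sum(limb << off for limb, off in zip(limbs, OFFSETS)) % P


def add_lazy(a, b):
    """Limb-wise addition with no carry propagation between limbs."""
    return [x + y for x, y in zip(a, b)]


def to_saturated(limbs):
    """Reduce modulo P and repack into four full 64-bit words, low word first."""
    x = from_limbs(limbs)
    return [(x >> (64 * i)) & (2**64 - 1) for i in range(4)]


a, b = P - 1, 12345
s = add_lazy(to_limbs(a), to_limbs(b))  # limbs may now overflow 26/25 bits
assert from_limbs(s) == (a + b) % P
assert sum(w << (64 * i) for i, w in enumerate(to_saturated(s))) == (a + b) % P
```

Because each limb leaves headroom inside its machine word, additions can skip carry propagation, which is what makes this form convenient for SIMD lanes. The saturated form packs the same value into four fully used 64-bit words.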

./tool/bssl speed -filter 25519 shows a 74% speedup on Graviton 2 (c6g.16xlarge):

Did 13000 Curve25519 arbitrary point multiplication operations in 1010330us (12867.1 ops/sec)
->
Did 23000 Curve25519 arbitrary point multiplication operations in 1028533us (22361.9 ops/sec)
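
For reference, the 74% figure can be recomputed from the two output lines above. The short Python sketch below does only that arithmetic; the helper name `ops_per_sec` is made up for illustration and is not part of the bssl tool.

```python
def ops_per_sec(ops, micros):
    """Throughput from an operation count and elapsed microseconds."""
    return ops / (micros / 1_000_000)


before = ops_per_sec(13000, 1010330)  # numbers from the first output line
after = ops_per_sec(23000, 1028533)   # numbers from the second output line
speedup_pct = (after / before - 1) * 100

print(f"{before:.1f} -> {after:.1f} ops/sec ({speedup_pct:.0f}% speedup)")
```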

Also, this patch updates delegate.go to handle the condition operand (eq, ne, ...) of fcsel.

The commands in the internal Quip document were used. PATHS_TO_KEEP remains unchanged, since this PR only contains updates to already existing s2n-bignum assembly files. The PATHS_TO_KEEP definition used is:

PATHS_TO_KEEP="\
./arm/p384 ./x86_att/p384 ./arm/p521 ./x86_att/p521 \
./arm/fastmul/bignum_emontredc_8n.S \
./arm/fastmul/bignum_kmul_16_32.S \
./arm/fastmul/bignum_kmul_32_64.S \
./arm/fastmul/bignum_ksqr_16_32.S \
./arm/fastmul/bignum_ksqr_32_64.S \
./arm/generic/bignum_ge.S \
./arm/generic/bignum_mul.S \
./arm/generic/bignum_optsub.S \
./arm/generic/bignum_sqr.S \
./x86_att/curve25519/curve25519_x25519.S \
./x86_att/curve25519/curve25519_x25519base.S \
./x86_att/curve25519/curve25519_x25519_alt.S \
./x86_att/curve25519/curve25519_x25519base_alt.S  \
./x86_att/curve25519/bignum_neg_p25519.S \
./x86_att/curve25519/bignum_mod_n25519.S  \
./x86_att/curve25519/edwards25519_decode.S  \
./x86_att/curve25519/edwards25519_decode_alt.S  \
./x86_att/curve25519/edwards25519_encode.S  \
./x86_att/curve25519/edwards25519_scalarmulbase.S  \
./x86_att/curve25519/edwards25519_scalarmulbase_alt.S  \
./x86_att/curve25519/edwards25519_scalarmuldouble.S  \
./x86_att/curve25519/edwards25519_scalarmuldouble_alt.S  \
./arm/curve25519/curve25519_x25519.S \
./arm/curve25519/curve25519_x25519base.S \
./arm/curve25519/curve25519_x25519_alt.S \
./arm/curve25519/curve25519_x25519base_alt.S \
./arm/curve25519/curve25519_x25519_byte.S \
./arm/curve25519/curve25519_x25519base_byte.S \
./arm/curve25519/curve25519_x25519_byte_alt.S \
./arm/curve25519/curve25519_x25519base_byte_alt.S \
./arm/curve25519/bignum_neg_p25519.S \
./arm/curve25519/bignum_mod_n25519.S  \
./arm/curve25519/edwards25519_decode.S  \
./arm/curve25519/edwards25519_decode_alt.S  \
./arm/curve25519/edwards25519_encode.S  \
./arm/curve25519/edwards25519_scalarmulbase.S  \
./arm/curve25519/edwards25519_scalarmulbase_alt.S  \
./arm/curve25519/edwards25519_scalarmuldouble.S  \
./arm/curve25519/edwards25519_scalarmuldouble_alt.S \
./include/_internal_s2n_bignum.h"

Testing:

./tool/bssl speed -filter 25519

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

jargh and others added 16 commits November 14, 2023 13:47
In general, BOUNDER_RULE now directly handles operations over Z and N,
assuming an outer real_of_int / real_of_num cast into R (this is also
automated in the tactic form BOUNDER_TAC). In particular, this change
can greatly improve bounds for terms involving integer or natural
number division and remainder (DIV, div, MOD and rem) as well as
cutoff subtraction over N. There is also now support for conditionals,
though the condition is not used as extra context, simply being the
basis for a case split.

This update rolls in various trivial typographic fixes in comments.

s2n-bignum original commit: awslabs/s2n-bignum@ccefa2a
…5519

Avoid duplicate labels in ed25519 x86 implementation
s2n-bignum original commit: awslabs/s2n-bignum@f629458
64-bit SIMD regs in ARM model, better BOUNDER_RULE, slow-ARM field optimizations
s2n-bignum original commit: awslabs/s2n-bignum@06781d2
…_input_const

Make _input_ parameter to ed25519 decode function const
s2n-bignum original commit: awslabs/s2n-bignum@4097178
* Allow MIT-0 license as well as Apache-2.0 and ISC

* Add appropriate year range to MIT-0 license
s2n-bignum original commit: awslabs/s2n-bignum@48fb153
This completely changes the implementation of ARM curve25519_x25519
and curve25519_x25519_byte (not the _alt forms, which remain faster
on their target microarchitectures) to a base-25.5 unsaturated version
with interleaved integer and SIMD operations, the inner loop closely
following Emil Lenngren's implementation described in the paper

  https://github.com/Emill/X25519-AArch64/blob/master/X25519_AArch64.pdf

and available here:

  https://github.com/Emill/X25519-AArch64

A version of this code was generated by SLOTHY from the reorganized
implementation by Abdulrahman, Becker, Kannwischer and Klein here:

 https://github.com/slothy-optimizer/slothy/blob/main/paper/clean/neon/X25519-AArch64-simple.s

as described in the associated paper

  https://eprint.iacr.org/2022/1303.pdf

with some additional annotations for use in the formal proof. The
final modular inverse computation reverts to the usual saturated
representation and s2n-bignum's divstep-based inverse function.

s2n-bignum original commit: awslabs/s2n-bignum@fc0b9bf
Lenngren-based X25519 for non-alt ARM code
s2n-bignum original commit: awslabs/s2n-bignum@57eb68a
Enable testing bignum_copy_row_from_table on x86
s2n-bignum original commit: awslabs/s2n-bignum@26bfe44
This patch performs a few syntactic updates to make AWS-LC's delocator
work.

s2n-bignum original commit: awslabs/s2n-bignum@3b4f73c
Update curve25519_x25519{_byte} to make AWS-LC's delocator work
s2n-bignum original commit: awslabs/s2n-bignum@88324d8
@aqjune-aws aqjune-aws requested a review from a team as a code owner March 6, 2024 17:48
@nebeid nebeid merged commit 3aa28fb into aws:main Mar 6, 2024
41 checks passed