Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

P-256 ARMv8 optimization - first steps #52

Merged
merged 7 commits into from
Oct 13, 2020
Merged

Conversation

nebeid
Copy link
Contributor

@nebeid nebeid commented Oct 1, 2020

Issues:

Resolves CryptoAlg-536, CryptoAlg-556

Description of changes:

This PR is the first in a sequence of PR's that introduce ARMv8 assembly optimisations for P-256 into AWS-LC. The goal is bringing the P-256 performance on ARMv8 at par with x86_64. The ARMv8 assembly code is taken from OpenSSL 1.1.1 (at this commit).
The goal is achieved by reusing the AWS-LC P-256 implementation in p256-x86_64.c (nistz256) with ARMv8. This is possible because the assembly functions required to support that code have their analogous functions in the imported OpenSSL. Namely, the file openss/crypto/ec/asm/ecp_nistz256-armv8.pl was imported with slight modification in the first 2 commits.
However, there are 3 x86_64 assembly functions that do not have corresponding functions in that file.
Those functions are ecp_nistz256_select_w5 and ecp_nistz256_select_w7 and beeu_mod_inverse_vartime.
The 3rd commit in this PR implements: ecp_nistz256_select_w5.

Call-outs:

  • The code duplication of C functions from p256-x86_64.c into p256.c will be removed in subsequent commits; when all assembly functions called by nistz256 are available (or worked around, if needed). Instead the file p256-x86_64.c will be renamed (possibly p256-nistz256.c) and will be used with both x86_64 and armv8 platforms.

Testing:

Tested boringssl/crypto/crypto_test on ARMv8 and x86_64 platforms.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

nebeid added 3 commits October 1, 2020 12:33
Assembly code from OpenSSL called from C code from BoringSSL
in a new EC_METHOD: EC_GFp_nistz256_arm_method().

Tests:
On EC2M6g (ARMv8):
- compiled with -DCMAKE_BUILD_TYPE=Release
  ensured that readelf -Ws bssl | grep -i nistz
  showed the new nistz functions
- tested:
  $ ninja -C build run_tests
- speed tests showed slight improvement in ECDH and ECDSA sign

On Mac (x86_64):
- compiles and runs as usual
These are the functions that could be moved easily to use nistz256 with
C implementation from BoringSSL's p256-x86_64.c which call
readily available assemply functions from OpenSSL in p256-armv8-asm.pl.

Tests:
- As previous commit.
- speed tests showed further improvement, especially in ECDSA sign.
…lect_w5

This function selects an entry from a table of points in constant-time
It is used in BoringSSL by the x86_64 optimized C code (nistz256).
It did not exist in OpenSSL's ARMv8 assembly that supports nistz256.
@nebeid nebeid changed the title Ca536 P-256 ARMv8 optimization - first steps Oct 1, 2020
@nebeid nebeid self-assigned this Oct 1, 2020
Copy link
Contributor

@torben-hansen torben-hansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test my understanding, is this a correct description of the purpose of this PR?
This PR is the first in a sequence of PR's that is including ARMv8 assembly optimisations for P256 into AWS-LC. The ARMv8 assembly code is taken from OpenSSL 1.1.1. The goal is to reuse the AWS-LC P256 implementation in p256-x86_64.c. This is achieved by making the assembly functions for ARMv8 match the x86-64 assembly functions. The initial missing ARMv8 functions are beeu_mod_inverse_vartime, ecp_nistz256_select_w5 and ecp_nistz256_select_w7. This PR also implements: ecp_nistz256_select_w5.

I have manually verified the following:

  • The replicated C-functions are the same as in the file p256-x86_64.c
  • The ARMv8 code taken from OpenSSL 1.1.1 (p256-armv8-asm.pl) is identical to the code imported here, except for 1) modifying some paths 2) including "openssl/arm_arch.h" 3) implementation of ecp_nistz256_select_w5.

I have the following general questions and remarks:

  • There are test cases in p256-x86-64-test.cc that tests some of the assembly functions and the p256 in general. These should also be run for ARM (currently disabled for anything else than x86_64). What is the testing plan for this? What is the testing plan for the new assembly functions and the changes in general? We should enable this as early as possible.
  • What are the differences between scatter-gather functions and select functions? Is one obviously worse than the other in terms of resistance against various side-channel attack vectors?
  • For good measure: did you verify that the assembly functions are compiled and actually run on the ARM instance?
  • For good measure: did you run the clang linter?

patches/series Outdated
@@ -10,3 +10,4 @@
0010-reduce-threads-num-in-rsa-test.patch
0011-fix-fuzz-test-path-to-libc.patch
0012-replace-boringssl-api-version-macro.patch
0013-start-arm-optimized-nistz256
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should we use a more descriptive name? I.e. what does this particular PR implement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is p256-armv8-optimizations-select_w5 a suitable name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

const EC_METHOD *EC_GFp_nistz256_method(void);

+// [TODO] EC_GFp_nistz256_arm_method is a GFp method using montgomery multiplication, with
+// x86-64 optimized P256. See http://eprint.iacr.org/2013/816.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this TODO about? Presumably we are using arm optimised P256? Could you elaborate on the story here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Edited.

#include <openssl/mem.h>
#include <openssl/type_check.h>

+#if !defined(OPENSSL_NO_ASM) && defined(OPENSSL_AARCH64) && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason we can't have one if/def? If not, it would be better to move everything under the same if/def and comment on why it is there and what is the plan with it in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks. It's one ifdef now.

+ OPENSSL_memcpy(r->Z.words, a.Z, P256_LIMBS * sizeof(BN_ULONG));
+}
+
+typedef P256_POINT_AFFINE PRECOMP256_ROW[64];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All typedefs, globals and includes should probably be grouped together in the beginning of the if/def. That makes a better logical division of the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

+.text
+___
+########################################################################
+# Convert ecp_nistz256_table.c to layout expected by ecp_nistz_gather_w7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this conversion? Are we still going to use gather/scatter? Does this affect the select methods?

Copy link
Contributor Author

@nebeid nebeid Oct 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's removed now.

+___
+}
+########################################################################
+# scatter-gather subroutines
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we remove these functions? They are being replaced by the "select" constant-time functions instead and nothin in AWS-LC should be calling these functions.

Copy link
Contributor Author

@nebeid nebeid Oct 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are removed now.

EC_GFp_nistz256_method();
+#elif !defined(OPENSSL_NO_ASM) && defined(OPENSSL_AARCH64) && \
+ !defined(OPENSSL_SMALL)
+ EC_GFp_nistz256_arm_method();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we name the x86_64 EC_METHOD for EC_GFp_nistz256_x86_64_method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should end up with this method being used both with x86_64 and ARMv8

@nebeid nebeid changed the base branch from main to p256-armv8 October 5, 2020 18:28
* Add batch build spec.

* Use batch-list.

* Remove invalid characters from identifier.

* Fix syntax warning.

* Upgrade cdk to latest version 1.64.

* Add github_codebuild_batch_stack.py.

* Move build specs.

* Refactor cdk code.

* Use aws-lc AWS account.

* Unify default values of env variable.

* Fix windows docker build issue.

* Add s3 bucket deletion when destroy resource.

* Add new variable github_source_version.

* Upgrade ec2 windows from 2016 to 2019.

* Add git to PATH.

* Revert "Upgrade ec2 windows from 2016 to 2019."

This reverts commit a72c367.

* Add 2019 windows.

* Move cyg and quilt installation to build spec.

* Add docker image build doc.

* Fix identifier name.

* Switch to 2016. CodeBuild does not support 2019.

* Clean up and add more docs.

* Fix typo.

* Modify README.md

* Remove '*'.

* Change privileged-mode to false.

* ARM sanitizer needs privileged-mode.
@nebeid
Copy link
Contributor Author

nebeid commented Oct 6, 2020

Thanks @torben-hansen's for the great comment, I edited the PR message.
In reply to your questions

I have the following general questions and remarks:

* There are test cases in `p256-x86-64-test.cc` that tests some of the assembly functions 
   and the p256 in general. These should also be run for ARM (currently disabled for anything 
   else than x86_64). What is the testing plan for this? 
   What is the testing plan for the new assembly functions and the changes in general? 
   We should enable this as early as possible.

Great call! They are now enabled in the last commit, 34022ff. Only the tests of the 2 functions that are to be implemented are disabled.

* What are the differences between scatter-gather functions and select functions? 
   Is one obviously worse than the other in terms of resistance against various side-channel attack 
   vectors?

I didn't go in details but there are attacks on the scatter/gather techniques for RSA, CacheBleed, Yarom etal., CHES-2016. It relies on measuring time variations due to contention on the same cache bank. The attack was carried on OpenSSL v. 1.0.2f. It's plausible that OpenSSL improved the resistance of memory access to this type of attacks.
On the other hand, the select method was what was originally in the patch provided by Gueron and Krasnov to OpenSSL and was adopted by BoringSSL since. I believe it performs better on platforms with SIMD instructions. In fact, in the patch comments, it is mentioned that OpenSSL preferred to keep the scatter/gather technique because the select technique was less performant on platforms without SIMD.

* For good measure: did you verify that the assembly functions are compiled and actually run 
   on the ARM instance?

Yes, on an ARMv8 M6g instance and an x86_64 MacBook.

* For good measure: did you run the clang linter?

I did on files under ec. It outputs a lot of warnings for existent code. We can discuss further the course of action.

- As per Torben's comments, the tests in `p256-x86_64_test.cc` for the assembly
functions were only enabled for x86_64. In this change, they are enabled
for ARMv8 assembly as well, excluding the functions that are to be implemented.
- Other comments from Torben were also addressed:
  - cleaning up TODOs, globals, #ifdefs
  - removing the gather/scatter routines from `p256-armv8-asm.pl`
    and the corresponding (what I believe is) scattering of the table
    of precomputed points.
- Fixed a typo in a comment in `ecp_nistz256_select_w5`.

Tested on:
- ARMv8 (M6g)
- x86_64 (MacBook)
Copy link
Contributor

@torben-hansen torben-hansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have good confidence in the functional correctness of the new assembly function. But I'm not familiar enough with ARMv8 to verify that the function is constant-time, e.g. does the instructions used operate in constant-time and how does our supported OS's/platforms factor into that? Given the cryptographic sensitivity of this code I propose we do the following before merging into Main:

  • More thoroughly review the new ARMv8 code
  • Apply fuzz testing since the code is handling sensitive user-supplied input

@nebeid nebeid merged commit 3489c88 into aws:p256-armv8 Oct 13, 2020
nebeid added a commit that referenced this pull request Oct 16, 2020
### Issues:
CryptoAlg-573

### Description of changes: 
Continuing on top of [PR#52](#52), this is the second assembly function needed to support the optimized implementation of P-256, nistz256, to be used on ARMv8 as it is currently used only with x86_64.

### Call-outs:
* This PR would be merged on top of [PR#52](#52).
* C code duplication continued in this change. This is only transient so that the assembly functions are added and tested gradually. The code duplication will be removed in subsequent PR(s).

### Testing:
* The test for that function was enabled in `p256-x86_64_test.cc` by this change.
* Tested on M6g (ARMv8) and M5a (x86_64) instances, as well as on a MacBook (x86_64).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
bryce-shang pushed a commit to bryce-shang/aws-lc that referenced this pull request Nov 6, 2020
Resolves CryptoAlg-536, CryptoAlg-556

This PR is the first in a sequence of PR's that introduce ARMv8 assembly optimisations for P-256 into AWS-LC. The goal is bringing the P-256 performance on ARMv8 at par with x86_64. The ARMv8 assembly code is taken from OpenSSL 1.1.1 (at [this commit](openssl/openssl@46a9ee8)).
The goal is achieved by reusing the AWS-LC P-256 implementation in `p256-x86_64.c` (nistz256) with ARMv8. This is possible because the assembly functions required to support that code have their analogous functions in the imported OpenSSL. Namely, the file [openss/crypto/ec/asm/ecp_nistz256-armv8.pl](https://github.com/openssl/openssl/blob/46a9ee8c796c8b5f8d95290676119b4f3d72be91/crypto/ec/asm/ecp_nistz256-armv8.pl) was imported with slight modification in the first 2 commits.
However, there are 3 x86_64 assembly functions that do not have corresponding functions in that file.
Those functions are `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7` and `beeu_mod_inverse_vartime`.
The 3rd commit in this PR implements: `ecp_nistz256_select_w5`.

- The code duplication of C functions from `p256-x86_64.c` into `p256.c` will be removed in subsequent commits; when all assembly functions called by nistz256 are available (or worked around, if needed). Instead the file `p256-x86_64.c` will be renamed (possibly `p256-nistz256.c`) and will be used with both x86_64 and armv8 platforms.

Tested `boringssl/crypto/crypto_test` on ARMv8 and x86_64 platforms.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
bryce-shang pushed a commit to bryce-shang/aws-lc that referenced this pull request Nov 6, 2020
CryptoAlg-573

Continuing on top of [PR#52](aws#52), this is the second assembly function needed to support the optimized implementation of P-256, nistz256, to be used on ARMv8 as it is currently used only with x86_64.

* This PR would be merged on top of [PR#52](aws#52).
* C code duplication continued in this change. This is only transient so that the assembly functions are added and tested gradually. The code duplication will be removed in subsequent PR(s).

* The test for that function was enabled in `p256-x86_64_test.cc` by this change.
* Tested on M6g (ARMv8) and M5a (x86_64) instances, as well as on a MacBook (x86_64).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
nebeid added a commit to nebeid/aws-lc that referenced this pull request Nov 6, 2020
Resolves CryptoAlg-536, CryptoAlg-556

This PR is the first in a sequence of PR's that introduce ARMv8 assembly optimisations for P-256 into AWS-LC. The goal is bringing the P-256 performance on ARMv8 at par with x86_64. The ARMv8 assembly code is taken from OpenSSL 1.1.1 (at [this commit](openssl/openssl@46a9ee8)).
The goal is achieved by reusing the AWS-LC P-256 implementation in `p256-x86_64.c` (nistz256) with ARMv8. This is possible because the assembly functions required to support that code have their analogous functions in the imported OpenSSL. Namely, the file [openss/crypto/ec/asm/ecp_nistz256-armv8.pl](https://github.com/openssl/openssl/blob/46a9ee8c796c8b5f8d95290676119b4f3d72be91/crypto/ec/asm/ecp_nistz256-armv8.pl) was imported with slight modification in the first 2 commits.
However, there are 3 x86_64 assembly functions that do not have corresponding functions in that file.
Those functions are `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7` and `beeu_mod_inverse_vartime`.
The 3rd commit in this PR implements: `ecp_nistz256_select_w5`.

- The code duplication of C functions from `p256-x86_64.c` into `p256.c` will be removed in subsequent commits; when all assembly functions called by nistz256 are available (or worked around, if needed). Instead the file `p256-x86_64.c` will be renamed (possibly `p256-nistz256.c`) and will be used with both x86_64 and armv8 platforms.

Tested `boringssl/crypto/crypto_test` on ARMv8 and x86_64 platforms.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
nebeid added a commit to nebeid/aws-lc that referenced this pull request Nov 6, 2020
CryptoAlg-573

Continuing on top of [PR#52](aws#52), this is the second assembly function needed to support the optimized implementation of P-256, nistz256, to be used on ARMv8 as it is currently used only with x86_64.

* This PR would be merged on top of [PR#52](aws#52).
* C code duplication continued in this change. This is only transient so that the assembly functions are added and tested gradually. The code duplication will be removed in subsequent PR(s).

* The test for that function was enabled in `p256-x86_64_test.cc` by this change.
* Tested on M6g (ARMv8) and M5a (x86_64) instances, as well as on a MacBook (x86_64).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
nebeid added a commit that referenced this pull request Nov 20, 2020
… nebeid/p256-armv8-quilt)

PR message:
### Issues:
* CryptoAlg-536: import OpenSSL ARMv8 assembly functions for P-256 ([PR#52](#52))
* CryptoAlg-556: implement the 1st of 3 missing assembly functions in the imported code ([PR#52] (#52))
* CryptoAlg-573: implement the 2nd of 3 missing assembly functions ([PR#56](#56) )
* CryptoAlg-574: implement the 3rd of 3 missing assembly functions ([PR#57](#57))
* CryptoAlg-591: Remove duplicate C code that was used during gradual development of ARMv8 optimization ([PR#60](#60))
* CryptoAlg-587: Introduce a new macro, OPENSSL_AARCH64_P256, to enable P-256 ARMv8 optimisations ([PR#61](#61))
* CryptoAlg-530: Merge changed files directly, delete the patch, create a new ci test and update `BUILDING.md`, this PR ([PR#62](#62))

### Description of changes: 
The first 5 commits in this PR are cherry-picked from branch [p256-armv8](https://github.com/awslabs/aws-lc/tree/p256-armv8), where they were changes to a `quilt` patch. Subsequent commits committed the changed files directly and deleted the patch.

The changes in this PR introduce ARMv8 (AArch64) assembly optimizations for P-256 into AWS-LC. The goal is bringing the P-256 performance on ARMv8 at par with x86_64. The ARMv8 assembly code is taken from OpenSSL 1.1.1's [`openss/crypto/ec/asm/ecp_nistz256-armv8.pl`](https://github.com/openssl/openssl/blob/46a9ee8c796c8b5f8d95290676119b4f3d72be91/crypto/ec/asm/ecp_nistz256-armv8.pl)(at [this commit](openssl/openssl@46a9ee8)).

The goal is achieved by reusing the AWS-LC P-256 implementation in `p256-x86_64.c` (nistz256) with ARMv8. This is possible because the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly file. 

Specifically, `ecp_nistz256-armv8.pl` was imported with the following modifications:
* Renamed to `p256-armv8-asm.pl`
* Modified the location of `arm-xlate.pl` and `arm_arch.h`
* Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`.
* Another assembly function called by the C code in `p256-x86_64.c` is `beeu_mod_inverse_vartime()`. This function was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`.

A new macro `OPENSSL_AARCH64_P256` is introduced to control the enabling of the ARMv8 P-256 optimizations. A new ci test is created to enable it and test the new code.

The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled.

### Call-outs:
* This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7.
* The assembly functions added contain _pointer authentication_ instructions that match the rest of the imported code. In subsequent PRs, pointer authentication (PAuth) and Branch Target Identification (BTI), will be modified/added to the assembly code introduced in this PR. They will conform to [this commit to BoringSSL](https://boringssl-review.googlesource.com/c/boringssl/+/42084) which is merged in `main`.
* ~The files containing `p256-x86_64` in their name should be ideally renamed to, e.g., `p256-nistz256` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled.~
* The changes described above were introduced gradually in the code base, hence the code duplication from `p256-x86_64.c` into `p256.c`, which was subsequently removed when all assembly functions called by that code were implemented for ARMv8.

### Testing:
Tested on ARMv8 and x86_64 platforms by running `cmake` as follows:
```
cmake -DOPENSSL_AARCH64_P256=1 -DCMAKE_BUILD_TYPE=Release -GNinja ..
ninja
./tool/bssl speed -filter "ECDH P-256"
./tool/bssl speed -filter "ECDSA P-256"
```
The speed tests report a significant difference based on whether `-DOPENSSL_AARCH64_P256=1` or `=0`.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
@nebeid nebeid deleted the CA536 branch November 29, 2021 13:09
torben-hansen pushed a commit to torben-hansen/aws-lc that referenced this pull request Nov 15, 2023
Add edwards25519 (Ed25519) basepoint multiplication
s2n-bignum original commit: awslabs/s2n-bignum@c499f8f
torben-hansen pushed a commit to torben-hansen/aws-lc that referenced this pull request Sep 18, 2024
Add edwards25519 (Ed25519) basepoint multiplication
s2n-bignum original commit: awslabs/s2n-bignum@c499f8f

s2n-bignum original commit: awslabs/s2n-bignum@c816a48
dkostic pushed a commit to dkostic/aws-lc that referenced this pull request Dec 5, 2024
Add edwards25519 (Ed25519) basepoint multiplication
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants