-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
P-256 ARMv8 optimization - first steps #52
Conversation
Assembly code from OpenSSL called from C code from BoringSSL in a new EC_METHOD: EC_GFp_nistz256_arm_method(). Tests: On EC2M6g (ARMv8): - compiled with -DCMAKE_BUILD_TYPE=Release ensured that readelf -Ws bssl | grep -i nistz showed the new nistz functions - tested: $ ninja -C build run_tests - speed tests showed slight improvement in ECDH and ECDSA sign On Mac (x86_64): - compiles and runs as usual
These are the functions that could be moved easily to use nistz256 with C implementation from BoringSSL's p256-x86_64.c which call readily available assemply functions from OpenSSL in p256-armv8-asm.pl. Tests: - As previous commit. - speed tests showed further improvement, especially in ECDSA sign.
…lect_w5 This function selects an entry from a table of points in constant-time It is used in BoringSSL by the x86_64 optimized C code (nistz256). It did not exist in OpenSSL's ARMv8 assembly that supports nistz256.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To test my understanding, is this a correct description of the purpose of this PR?
This PR is the first in a sequence of PR's that is including ARMv8 assembly optimisations for P256 into AWS-LC. The ARMv8 assembly code is taken from OpenSSL 1.1.1. The goal is to reuse the AWS-LC P256 implementation in p256-x86_64.c
. This is achieved by making the assembly functions for ARMv8 match the x86-64 assembly functions. The initial missing ARMv8 functions are beeu_mod_inverse_vartime
, ecp_nistz256_select_w5
and ecp_nistz256_select_w7
. This PR also implements: ecp_nistz256_select_w5
.
I have manually verified the following:
- The replicated C-functions are the same as in the file
p256-x86_64.c
- The ARMv8 code taken from OpenSSL 1.1.1 (
p256-armv8-asm.pl
) is identical to the code imported here, except for 1) modifying some paths 2) including"openssl/arm_arch.h"
3) implementation ofecp_nistz256_select_w5
.
I have the following general questions and remarks:
- There are test cases in
p256-x86-64-test.cc
that tests some of the assembly functions and the p256 in general. These should also be run for ARM (currently disabled for anything else than x86_64). What is the testing plan for this? What is the testing plan for the new assembly functions and the changes in general? We should enable this as early as possible. - What are the differences between scatter-gather functions and select functions? Is one obviously worse than the other in terms of resistance against various side-channel attack vectors?
- For good measure: did you verify that the assembly functions are compiled and actually run on the ARM instance?
- For good measure: did you run the clang linter?
patches/series
Outdated
@@ -10,3 +10,4 @@ | |||
0010-reduce-threads-num-in-rsa-test.patch | |||
0011-fix-fuzz-test-path-to-libc.patch | |||
0012-replace-boringssl-api-version-macro.patch | |||
0013-start-arm-optimized-nistz256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Should we use a more descriptive name? I.e. what does this particular PR implement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is p256-armv8-optimizations-select_w5
a suitable name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
const EC_METHOD *EC_GFp_nistz256_method(void); | ||
|
||
+// [TODO] EC_GFp_nistz256_arm_method is a GFp method using montgomery multiplication, with | ||
+// x86-64 optimized P256. See http://eprint.iacr.org/2013/816. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this TODO about? Presumably we are using arm optimised P256? Could you elaborate on the story here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Edited.
#include <openssl/mem.h> | ||
#include <openssl/type_check.h> | ||
|
||
+#if !defined(OPENSSL_NO_ASM) && defined(OPENSSL_AARCH64) && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason we can't have one if/def? If not, it would be better to move everything under the same if/def and comment on why it is there and what is the plan with it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks. It's one ifdef now.
+ OPENSSL_memcpy(r->Z.words, a.Z, P256_LIMBS * sizeof(BN_ULONG)); | ||
+} | ||
+ | ||
+typedef P256_POINT_AFFINE PRECOMP256_ROW[64]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All typedefs, globals and includes should probably be grouped together in the beginning of the if/def. That makes a better logical division of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks.
+.text | ||
+___ | ||
+######################################################################## | ||
+# Convert ecp_nistz256_table.c to layout expected by ecp_nistz_gather_w7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of this conversion? Are we still going to use gather/scatter? Does this affect the select methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's removed now.
+___ | ||
+} | ||
+######################################################################## | ||
+# scatter-gather subroutines |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we remove these functions? They are being replaced by the "select" constant-time functions instead and nothin in AWS-LC should be calling these functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are removed now.
EC_GFp_nistz256_method(); | ||
+#elif !defined(OPENSSL_NO_ASM) && defined(OPENSSL_AARCH64) && \ | ||
+ !defined(OPENSSL_SMALL) | ||
+ EC_GFp_nistz256_arm_method(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we name the x86_64 EC_METHOD for EC_GFp_nistz256_x86_64_method
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should end up with this method being used both with x86_64 and ARMv8
* Add batch build spec. * Use batch-list. * Remove invalid characters from identifier. * Fix syntax warning. * Upgrade cdk to latest version 1.64. * Add github_codebuild_batch_stack.py. * Move build specs. * Refactor cdk code. * Use aws-lc AWS account. * Unify default values of env variable. * Fix windows docker build issue. * Add s3 bucket deletion when destroy resource. * Add new variable github_source_version. * Upgrade ec2 windows from 2016 to 2019. * Add git to PATH. * Revert "Upgrade ec2 windows from 2016 to 2019." This reverts commit a72c367. * Add 2019 windows. * Move cyg and quilt installation to build spec. * Add docker image build doc. * Fix identifier name. * Switch to 2016. CodeBuild does not support 2019. * Clean up and add more docs. * Fix typo. * Modify README.md * Remove '*'. * Change privileged-mode to false. * ARM sanitizer needs privileged-mode.
Thanks @torben-hansen's for the great comment, I edited the PR message.
Great call! They are now enabled in the last commit, 34022ff. Only the tests of the 2 functions that are to be implemented are disabled.
I didn't go in details but there are attacks on the scatter/gather techniques for RSA, CacheBleed, Yarom etal., CHES-2016. It relies on measuring time variations due to contention on the same cache bank. The attack was carried on OpenSSL v. 1.0.2f. It's plausible that OpenSSL improved the resistance of memory access to this type of attacks.
Yes, on an ARMv8 M6g instance and an x86_64 MacBook.
I did on files under |
- As per Torben's comments, the tests in `p256-x86_64_test.cc` for the assembly functions were only enabled for x86_64. In this change, they are enabled for ARMv8 assembly as well, excluding the functions that are to be implemented. - Other comments from Torben were also addressed: - cleaning up TODOs, globals, #ifdefs - removing the gather/scatter routines from `p256-armv8-asm.pl` and the corresponding (what I believe is) scattering of the table of precomputed points. - Fixed a typo in a comment in `ecp_nistz256_select_w5`. Tested on: - ARMv8 (M6g) - x86_64 (MacBook)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have good confidence in the functional correctness of the new assembly function. But I'm not familiar enough with ARMv8 to verify that the function is constant-time, e.g. does the instructions used operate in constant-time and how does our supported OS's/platforms factor into that? Given the cryptographic sensitivity of this code I propose we do the following before merging into Main:
- More thoroughly review the new ARMv8 code
- Apply fuzz testing since the code is handling sensitive user-supplied input
### Issues: CryptoAlg-573 ### Description of changes: Continuing on top of [PR#52](#52), this is the second assembly function needed to support the optimized implementation of P-256, nistz256, to be used on ARMv8 as it is currently used only with x86_64. ### Call-outs: * This PR would be merged on top of [PR#52](#52). * C code duplication continued in this change. This is only transient so that the assembly functions are added and tested gradually. The code duplication will be removed in subsequent PR(s). ### Testing: * The test for that function was enabled in `p256-x86_64_test.cc` by this change. * Tested on M6g (ARMv8) and M5a (x86_64) instances, as well as on a MacBook (x86_64). By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Resolves CryptoAlg-536, CryptoAlg-556 This PR is the first in a sequence of PR's that introduce ARMv8 assembly optimisations for P-256 into AWS-LC. The goal is bringing the P-256 performance on ARMv8 at par with x86_64. The ARMv8 assembly code is taken from OpenSSL 1.1.1 (at [this commit](openssl/openssl@46a9ee8)). The goal is achieved by reusing the AWS-LC P-256 implementation in `p256-x86_64.c` (nistz256) with ARMv8. This is possible because the assembly functions required to support that code have their analogous functions in the imported OpenSSL. Namely, the file [openss/crypto/ec/asm/ecp_nistz256-armv8.pl](https://github.com/openssl/openssl/blob/46a9ee8c796c8b5f8d95290676119b4f3d72be91/crypto/ec/asm/ecp_nistz256-armv8.pl) was imported with slight modification in the first 2 commits. However, there are 3 x86_64 assembly functions that do not have corresponding functions in that file. Those functions are `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7` and `beeu_mod_inverse_vartime`. The 3rd commit in this PR implements: `ecp_nistz256_select_w5`. - The code duplication of C functions from `p256-x86_64.c` into `p256.c` will be removed in subsequent commits; when all assembly functions called by nistz256 are available (or worked around, if needed). Instead the file `p256-x86_64.c` will be renamed (possibly `p256-nistz256.c`) and will be used with both x86_64 and armv8 platforms. Tested `boringssl/crypto/crypto_test` on ARMv8 and x86_64 platforms. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
CryptoAlg-573 Continuing on top of [PR#52](aws#52), this is the second assembly function needed to support the optimized implementation of P-256, nistz256, to be used on ARMv8 as it is currently used only with x86_64. * This PR would be merged on top of [PR#52](aws#52). * C code duplication continued in this change. This is only transient so that the assembly functions are added and tested gradually. The code duplication will be removed in subsequent PR(s). * The test for that function was enabled in `p256-x86_64_test.cc` by this change. * Tested on M6g (ARMv8) and M5a (x86_64) instances, as well as on a MacBook (x86_64). By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Resolves CryptoAlg-536, CryptoAlg-556 This PR is the first in a sequence of PR's that introduce ARMv8 assembly optimisations for P-256 into AWS-LC. The goal is bringing the P-256 performance on ARMv8 at par with x86_64. The ARMv8 assembly code is taken from OpenSSL 1.1.1 (at [this commit](openssl/openssl@46a9ee8)). The goal is achieved by reusing the AWS-LC P-256 implementation in `p256-x86_64.c` (nistz256) with ARMv8. This is possible because the assembly functions required to support that code have their analogous functions in the imported OpenSSL. Namely, the file [openss/crypto/ec/asm/ecp_nistz256-armv8.pl](https://github.com/openssl/openssl/blob/46a9ee8c796c8b5f8d95290676119b4f3d72be91/crypto/ec/asm/ecp_nistz256-armv8.pl) was imported with slight modification in the first 2 commits. However, there are 3 x86_64 assembly functions that do not have corresponding functions in that file. Those functions are `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7` and `beeu_mod_inverse_vartime`. The 3rd commit in this PR implements: `ecp_nistz256_select_w5`. - The code duplication of C functions from `p256-x86_64.c` into `p256.c` will be removed in subsequent commits; when all assembly functions called by nistz256 are available (or worked around, if needed). Instead the file `p256-x86_64.c` will be renamed (possibly `p256-nistz256.c`) and will be used with both x86_64 and armv8 platforms. Tested `boringssl/crypto/crypto_test` on ARMv8 and x86_64 platforms. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
CryptoAlg-573 Continuing on top of [PR#52](aws#52), this is the second assembly function needed to support the optimized implementation of P-256, nistz256, to be used on ARMv8 as it is currently used only with x86_64. * This PR would be merged on top of [PR#52](aws#52). * C code duplication continued in this change. This is only transient so that the assembly functions are added and tested gradually. The code duplication will be removed in subsequent PR(s). * The test for that function was enabled in `p256-x86_64_test.cc` by this change. * Tested on M6g (ARMv8) and M5a (x86_64) instances, as well as on a MacBook (x86_64). By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
… nebeid/p256-armv8-quilt) PR message: ### Issues: * CryptoAlg-536: import OpenSSL ARMv8 assembly functions for P-256 ([PR#52](#52)) * CryptoAlg-556: implement the 1st of 3 missing assembly functions in the imported code ([PR#52] (#52)) * CryptoAlg-573: implement the 2nd of 3 missing assembly functions ([PR#56](#56) ) * CryptoAlg-574: implement the 3rd of 3 missing assembly functions ([PR#57](#57)) * CryptoAlg-591: Remove duplicate C code that was used during gradual development of ARMv8 optimization ([PR#60](#60)) * CryptoAlg-587: Introduce a new macro, OPENSSL_AARCH64_P256, to enable P-256 ARMv8 optimisations ([PR#61](#61)) * CryptoAlg-530: Merge changed files directly, delete the patch, create a new ci test and update `BUILDING.md`, this PR ([PR#62](#62)) ### Description of changes: The first 5 commits in this PR are cherry-picked from branch [p256-armv8](https://github.com/awslabs/aws-lc/tree/p256-armv8), where they were changes to a `quilt` patch. Subsequent commits committed the changed files directly and deleted the patch. The changes in this PR introduce ARMv8 (AArch64) assembly optimizations for P-256 into AWS-LC. The goal is bringing the P-256 performance on ARMv8 at par with x86_64. The ARMv8 assembly code is taken from OpenSSL 1.1.1's [`openss/crypto/ec/asm/ecp_nistz256-armv8.pl`](https://github.com/openssl/openssl/blob/46a9ee8c796c8b5f8d95290676119b4f3d72be91/crypto/ec/asm/ecp_nistz256-armv8.pl)(at [this commit](openssl/openssl@46a9ee8)). The goal is achieved by reusing the AWS-LC P-256 implementation in `p256-x86_64.c` (nistz256) with ARMv8. This is possible because the assembly functions found in `crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl` required to support that code have their analogous functions in the imported OpenSSL ARMv8 Perl assembly file. Specifically, `ecp_nistz256-armv8.pl` was imported with the following modifications: * Renamed to `p256-armv8-asm.pl` * Modified the location of `arm-xlate.pl` and `arm_arch.h` * Replaced the `scatter-gather subroutines` with `select subroutines`. The `select subroutines` are implemented for ARMv8 similarly to their x86_64 counterparts, `ecp_nistz256_select_w5` and `ecp_nistz256_select_w7`. * Another assembly function called by the C code in `p256-x86_64.c` is `beeu_mod_inverse_vartime()`. This function was implemented for AArch64 in `p256_beeu-armv8-asm.pl` similarly to its implementation in `p256_beeu-x86_64-asm.pl`. A new macro `OPENSSL_AARCH64_P256` is introduced to control the enabling of the ARMv8 P-256 optimizations. A new ci test is created to enable it and test the new code. The files containing `p256-x86_64` in their name were renamed to, `p256-nistz` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled. ### Call-outs: * This change focuses on AArch64 (64-bit architecture of ARMv8). It does not support ARMv4 or ARMv7. * The assembly functions added contain _pointer authentication_ instructions that match the rest of the imported code. In subsequent PRs, pointer authentication (PAuth) and Branch Target Identification (BTI), will be modified/added to the assembly code introduced in this PR. They will conform to [this commit to BoringSSL](https://boringssl-review.googlesource.com/c/boringssl/+/42084) which is merged in `main`. * ~The files containing `p256-x86_64` in their name should be ideally renamed to, e.g., `p256-nistz256` since the functions and tests defined in them are hereby running on ARMv8 as well, if enabled.~ * The changes described above were introduced gradually in the code base, hence the code duplication from `p256-x86_64.c` into `p256.c`, which was subsequently removed when all assembly functions called by that code were implemented for ARMv8. ### Testing: Tested on ARMv8 and x86_64 platforms by running `cmake` as follows: ``` cmake -DOPENSSL_AARCH64_P256=1 -DCMAKE_BUILD_TYPE=Release -GNinja .. ninja ./tool/bssl speed -filter "ECDH P-256" ./tool/bssl speed -filter "ECDSA P-256" ``` The speed tests report a significant difference based on whether `-DOPENSSL_AARCH64_P256=1` or `=0`. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Add edwards25519 (Ed25519) basepoint multiplication s2n-bignum original commit: awslabs/s2n-bignum@c499f8f
Add edwards25519 (Ed25519) basepoint multiplication s2n-bignum original commit: awslabs/s2n-bignum@c499f8f s2n-bignum original commit: awslabs/s2n-bignum@c816a48
Add edwards25519 (Ed25519) basepoint multiplication
Issues:
Resolves CryptoAlg-536, CryptoAlg-556
Description of changes:
This PR is the first in a sequence of PR's that introduce ARMv8 assembly optimisations for P-256 into AWS-LC. The goal is bringing the P-256 performance on ARMv8 at par with x86_64. The ARMv8 assembly code is taken from OpenSSL 1.1.1 (at this commit).
The goal is achieved by reusing the AWS-LC P-256 implementation in
p256-x86_64.c
(nistz256) with ARMv8. This is possible because the assembly functions required to support that code have their analogous functions in the imported OpenSSL. Namely, the file openss/crypto/ec/asm/ecp_nistz256-armv8.pl was imported with slight modification in the first 2 commits.However, there are 3 x86_64 assembly functions that do not have corresponding functions in that file.
Those functions are
ecp_nistz256_select_w5
andecp_nistz256_select_w7
andbeeu_mod_inverse_vartime
.The 3rd commit in this PR implements:
ecp_nistz256_select_w5
.Call-outs:
p256-x86_64.c
intop256.c
will be removed in subsequent commits; when all assembly functions called by nistz256 are available (or worked around, if needed). Instead the filep256-x86_64.c
will be renamed (possiblyp256-nistz256.c
) and will be used with both x86_64 and armv8 platforms.Testing:
Tested
boringssl/crypto/crypto_test
on ARMv8 and x86_64 platforms.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.