[sw, otbn, crypto] Sync with the master branch #23492

sameo · 2024-06-04T10:06:32Z

We applied all changes from master, for the sw/otbn/crypto folder.

sameo · 2024-06-05T04:28:54Z

This is built on top of #23480 for now, until it gets merged.

In FPGA implementation of the earlgrey, blanking is not used for Bignum RF due to high resource consumption and this causes leakage with FPGA experiments. This is an intermediate solution. We are keeping shares in registers with higher Hamming distance between addresses so that we are reducing the probability of reading out both shares when address switches. Co-authored-by: Jade Philipoom <[email protected]> Signed-off-by: Abdullah Varici <[email protected]>

This changes are done regarding to outcomes of the COCO-Alma tool usage. Moreover, many more rules are put in the OTBN Security Guidance Document. Please see it before developing algorithms on OTBN which handle secret data. If we don't have these changes, the tool reports there are transient leakage and we may see leakage on the final product even if we don't see a leakage on FPGA experiments. We are also seeing a stable leakage on the MSB of the intermediate values. We acknowledge that and since the algorithm runs with 64-bit excess randomness, we don't expect that to be possible to use that leakage and retrieve secret values. We also verified that the stable leakage disappeared after running the routine on 320-bit instead of 321-bit. Signed-off-by: Abdullah Varici <[email protected]>

Previously, kmac_disable_masking() did not correctly initialize the full KMAC config struct. As a result of using uninitialized memory, it could happen that the endianness when reading the state got changed which completely broke all SHA3 captures. This commit fixes that by creating a static KMAC config which, for the sake of documentation, initializes all relevant struct variables to the correct value. kmac_disable_masking() then just modifies the variables required to switch off the masking. Signed-off-by: Pirmin Vogel <[email protected]>

Adds batch capture for ECC256 keygen TVLA fixed-vs-random key test. Signed-off-by: Vladimir Rozic <[email protected]>

This commit modifies the p256_scalar_mult function so that it returns an arithmetically masked result. The arithmetic masking is done before converting back to affine coordinates. For this the internal scalar_mult_int function is changed to return projective coordinates. The transformation to affine coordinates now needs to be done outside scalar_mult_int. This commit also adds code to convert from arithmetic masking to boolean masking. The conversion is implemented following Gaubin's algorithm to prevent side-channel leakage. The conversion is implemented in a 257-bit domain to enable carry bit handling, which is necessary when the arithmetic masking is performed within a mod p domain. Both described changes are then put together to generate the P256 ECDH shared session key. Further, this commit also adds the relevant tests. Signed-off-by: Moritz Wettermann <[email protected]>

The changes to P256 OTBN software, introduced in the previous commit, increase code size so much that IMEM of OTBN is too small for the application. Therefore, this commit restructures P256 code to prevent an IMEM overflow. For this, p256.s is split into several individual files containing different functions. In case the functions are not needed for a specific binary, they now won't get inserted into the binary anymore. Signed-off-by: Moritz Wettermann <[email protected]>

This commit includes the following changes: - The internal scalar multiplication function now requires the scalar value to be handed over as additive (modulo n) shares. - Therefore, scalar k and private key d are now stored in shares in DMEM - The shares are expanded by 64 bits to 448-bit values. This is a side-channel protection measurement. - Scalar and base point multiplication functions as well as the signing function are adapted to the mentioned scalar/key handling. - For signing the shares are masked multiplicatively to not reveal the scalar/key value via side-channel leakage during signature computation. - For multiplicative masking a new 448x128-bit multiplication function is added, as well as a modular reduction function (modulo n). - The mentioned changes increase code size quite significantly. Because of this, (internal) scalar and base point multiplication are separated into own source code files to prevent IMEM overflow. Signed-off-by: Moritz Wettermann <[email protected]>

sameo · 2024-06-10T12:32:01Z

@jadephilipoom I rebased this PR on top of integrated_dev, now that we got the hw/otbn changes in.

I had to pull the sw/device/sca changes as well because the current code would not build with the new sw/otbn/crypto, see https://dev.azure.com/lowrisc/opentitan/_build/results?buildId=153946&view=logs&j=5df5adf8-3577-5927-f0b2-31afed2697ee&t=1253ba27-4e8c-57af-d0ba-888237644875

Update ecc384_serial.c and p384_ecdsa_sca.s files to be compatible with the latest capture.py file in the ot-sca repo. Signed-off-by: Bilgiday Yuce <[email protected]>

Signed-off-by: Moritz Wettermann <[email protected]>

This commit adds three functions. One to generate a nonzero random value in the scalar field which is then used by two other functions to generate either a random secret key d or a random secret scalar k. Random values are returned as two additive 448-bit shares (modulo n). Further, a test is added to check if generated values of d and k are distinct and in range. Note that this test is very basic and doesn't replace proper testing for randomness through statistical analysis. Signed-off-by: Moritz Wettermann <[email protected]>

This commit fixes a bug in p256_random_scalar. To generate a random scalar, randomness is taken from RND. The WDRs which contain the RND bits are then XORed with bits from URND, just in case there's any vulnerability in EDN that lets the attacker recover bits before they reach OTBN. But one of the three WDRs that contain RND bits isn't XORed with URND. This commit fixes this. Signed-off-by: Moritz Wettermann <[email protected]>

Signed-off-by: Moritz Wettermann <[email protected]>

This commit separates the field order p specific reduction algorithm from the modulo-multiplication function, and puts it into an own function. This allows us to use the reduction algorithm in other functions too. Signed-off-by: Moritz Wettermann <[email protected]>

This commit adds code to convert from arithmetic masking to boolean masking. The conversion is implemented following Gaubin's algorithm to prevent side-channel leakage. The conversion is implemented in a 385-bit domain to enable carry bit handling, which is necessary when the arithmetic masking is performed within a mod p domain. Further, this commit adds the relevant tests. Signed-off-by: Moritz Wettermann <[email protected]>

This makes the test easier to use and parallelize. Signed-off-by: Jade Philipoom <[email protected]>

This reduces the code size for binaries that only need to include p256_sign. No code is changed in this commit, only moved. Signed-off-by: Jade Philipoom <[email protected]>

This entrypoint is designed to be always-loaded during boot and run all of the crypto that ROM_EXT needs for attestation and for verifying code signatures. Signed-off-by: Jade Philipoom <[email protected]>

The number of encryptions per batch (number of segments) is determined by the ChipWhisperer Husky scope and depends on the number of samples per segment. Now that we clock ratios changed and we're capturing fewer samples per segment, the number of segments increases and the buffers storing the pre-generated plaintexts and keys need to be increased. Previously, the buffer size was chosen very pessimistically. It should be perfectly fine to allocate space for up to 256 plaintexts and keys (8 kBytes) as we're having 128 kBytes of main SRAM on all relevant top levels. Signed-off-by: Pirmin Vogel <[email protected]>

Added two commands to support waverunner batch capture. Binary generated at: https://github.com/vogelpi/opentitan/tree/ot-sca_2023-07-15_dd8b709_cw310 Signed-off-by: Johann Heyszl <[email protected]>

Signed-off-by: Jade Philipoom <[email protected]>

In order to comply with the Derived Test Requirements for AES TVLA all input data needs to be generated using AES-based PRNG. This commit replaces the currently used Mersenne twister PRNG with SW AES running on ibex. Signed-off-by: Vladimir Rozic <[email protected]>

This builds on the previous commits and means that we can now use the CSR names instead of (somewhat arcane) CSR indices. Signed-off-by: Rupert Swarbrick <[email protected]>

Signed-off-by: Rupert Swarbrick <[email protected]>

This commit - adds entrypoint code for P-384 ECDSA. It is split up into three separate binaries which correspond to the main ECDSA operations key generation, signing, and verifying. This separation is necessary due to the limited IMEM size. Further, in case of signature verification, validity checking of the public key has to be done in advance via another separate OTBN binary (similar to ECDH). - relocates several functions into own files to minimize IMEM usage. - removes mod_inv_var and replaces every call by a call to mod_inv_n_p384. This is necessary because mod_inv_var requires much more IMEM. Signed-off-by: Moritz Wettermann <[email protected]>

ECDH was the only call site for `p256_scalar_mult`, so this commit renames it to `p256_shared_key` and further specializes it to ECDH shared-key generation. Then, it can be moved to `p256_a2b.s`, which I then renamed to `p256_shared_key.s` to better fit its new purpose. Signed-off-by: Jade Philipoom <[email protected]>

This change speeds up ECDSA signing and verification by about 20%. The older modular multiply is still used for elements of the scalar field (i.e. arithmetic modulo the curve order n). However, the performance-critical multiplications in the coordinate field (i.e. arithmetic modulo the prime p) use the new, specialized routine. Signed-off-by: Jade Philipoom <[email protected]>

I found this useful for unit testing. Signed-off-by: Jade Philipoom <[email protected]>

As of FIPS 186-5, it is permissible to restrict the primes p and q to a certain value modulo 8. If this value is 3 or 7 (i.e. the candidate prime is always 3 mod 4), then constant-time Miller-Rabin becomes a lot simplier. This is nice from a code-complexity and performance perspective, but the code-size benefit is the real prize. RSA key generation is close to the IMEM size limit, so this frees up space for further speed improvements. Signed-off-by: Jade Philipoom <[email protected]>

Setting these bits is allowed by FIPS 186-5 and lets us skip the range check on p and q. This commit also removes the tests that ensure we check the range, since now it's just a postcondition of `generate_prime_candidate` and we never expect an out-of-range value in `check_p` or `check_q`. Signed-off-by: Jade Philipoom <[email protected]>

Signed-off-by: Jade Philipoom <[email protected]>

Previously, some tests sent randomly-generated candidate primes that were not 3 mod 4 through the primality check; now that the primality check is specialized to primes that are 3 mod 4, the test data needs to change. Signed-off-by: Jade Philipoom <[email protected]>

This is essentially a division followed by deriving d; since the `derive_d` procedure originally tail-called `rsa_keygen` if `d` was invalid, it also required splitting `derive_d` into `derive_d` and `check_d` so that the key-from-cofactor mode could only call `derive_d`. Signed-off-by: Jade Philipoom <[email protected]>

Implements the same algorithm we used in the specialized RSA-3072 implementation to speed up computation of R^2. Also fixes an outdated comment in rsa_1024_enc_test. Signed-off-by: Jade Philipoom <[email protected]>

sameo · 2024-06-19T14:05:13Z

Just a couple notes in response, but no blockers. This LGTM now.

👍

I added the silicon creator code back in. We will need it with Darjeeling sooner than later.

I think you'll also need the keymgr changes under silicon_creator/lib/drivers to make it work

I believe they are in the PR, see a62ea53. Or are you talking about something else?

Since this is already a pretty big change, I'd recommend keeping it out of this PR and merging it in separately along with the necessary driver changes, maybe simply by checking out the latest versions rather than cherry-picking commits. One can always refer to the history on master if needed.

Makes sense to me.

I dont think we want p384 for now, as this would have some real RTL changes impact (double imem, iiuc).

Not yet, actually -- the P384 code is still made to fit in 4kB IMEM, although it will soon be refactored to make better use of the new 8kB space. But probably best not to lump that in with this PR anyway; like the otbn_boot_services lib, you can always checkout or cherry-pick the code separately.

Noted, thanks.

jadephilipoom · 2024-06-21T12:24:58Z

I think you'll also need the keymgr changes under silicon_creator/lib/drivers to make it work

I believe they are in the PR, see a62ea53. Or are you talking about something else?

My mistake, those are indeed the changes I meant!

sameo · 2024-06-26T07:58:28Z

@jadephilipoom @moidx As this PR s too big for GitHub, I am trying to split into two logically independent ones.

I cut the line at @jadephilipoom [crypto] Speed up computation of R^2 in RSA.commit for now. Let's see if that passes CI.

sameo · 2024-06-26T11:23:11Z

@jadephilipoom @moidx As this PR s too big for GitHub, I am trying to split into two logically independent ones.

I cut the line at @jadephilipoom [crypto] Speed up computation of R^2 in RSA.commit for now. Let's see if that passes CI.

So that seems to work, and the only think we would need to get that PR merged is a waiver for the hw/top_darjeeling/rtl/autogen/chip_darjeeling_cw310.sv changes. FWIW they obviously build and pass CI.

andreaskurth · 2024-06-27T08:35:35Z

CHANGE AUTHORIZED: hw/top_darjeeling/rtl/autogen/chip_darjeeling_cw310.sv

This only changes the CW310 instance of Darjeeling, which isn't part of RTL that goes into TO. Nonetheless, double checking with Rivos HW teams that this is fine? @Razer6 @neeraj-rv

Razer6 · 2024-06-27T09:37:56Z

This only changes the CW310 instance of Darjeeling, which isn't part of RTL that goes into TO. Nonetheless, double checking with Rivos HW teams that this is fine? @Razer6 @neeraj-rv

This is fine.

jadephilipoom · 2024-06-27T09:50:05Z

CHANGE AUTHORIZED: hw/top_darjeeling/rtl/autogen/chip_darjeeling_cw310.sv

The rationale above makes sense, LGTM.

sameo marked this pull request as draft June 4, 2024 10:06

sameo force-pushed the topic/dj-otbn-sw-crypto-p256 branch 2 times, most recently from 3c30bb6 to fc5258c Compare June 5, 2024 04:24

sameo added the Component:Darjeeling label Jun 5, 2024

Abdullah Varici and others added 7 commits June 10, 2024 09:51

[sca] Add batch capture for ECC256 keygen

aefe0f9

Adds batch capture for ECC256 keygen TVLA fixed-vs-random key test. Signed-off-by: Vladimir Rozic <[email protected]>

sameo force-pushed the topic/dj-otbn-sw-crypto-p256 branch from fc5258c to 401e634 Compare June 10, 2024 12:25

sameo marked this pull request as ready for review June 10, 2024 12:32

sameo requested a review from a team as a code owner June 10, 2024 12:32

sameo requested review from pamaury and removed request for a team June 10, 2024 12:32

bilgiday and others added 10 commits June 10, 2024 14:47

[sca] Update the ecdsa384_sca code

37d4677

Update ecc384_serial.c and p384_ecdsa_sca.s files to be compatible with the latest capture.py file in the ot-sca repo. Signed-off-by: Bilgiday Yuce <[email protected]>

[sw/crypto] Fix format/comment + remove redundant instr. in P384 sign

4924b55

Signed-off-by: Moritz Wettermann <[email protected]>

[sw/crypto] Fix a few comments in P-256 implementation

26d8a24

Signed-off-by: Moritz Wettermann <[email protected]>

[crypto] Separate large RSA keygen test into multiple targets.

60828d8

This makes the test easier to use and parallelize. Signed-off-by: Jade Philipoom <[email protected]>

[crypto] Move verify-only ECDSA-P256 subroutines to p256_verify.

dc32044

This reduces the code size for binaries that only need to include p256_sign. No code is changed in this commit, only moved. Signed-off-by: Jade Philipoom <[email protected]>

[rom_ext] Add a boot-time OTBN program.

8002441

This entrypoint is designed to be always-loaded during boot and run all of the crypto that ROM_EXT needs for attestation and for verifying code signatures. Signed-off-by: Jade Philipoom <[email protected]>

sameo force-pushed the topic/dj-otbn-sw-crypto-p256 branch 2 times, most recently from f85b9f7 to c6e3909 Compare June 10, 2024 13:07

vogelpi and others added 16 commits June 19, 2024 15:57

[sca, aes] Add commands for alternative batch capture

add6cae

Added two commands to support waverunner batch capture. Binary generated at: https://github.com/vogelpi/opentitan/tree/ot-sca_2023-07-15_dd8b709_cw310 Signed-off-by: Johann Heyszl <[email protected]>

[crypto] Remove a redundant load.

13a1973

Signed-off-by: Jade Philipoom <[email protected]>

[otbn,sw] Tweak assembly to use named OTBN CSRs

14c4bf5

This builds on the previous commits and means that we can now use the CSR names instead of (somewhat arcane) CSR indices. Signed-off-by: Rupert Swarbrick <[email protected]>

[otbn,sw] Tweak assembly to use named WSRs

038db23

Signed-off-by: Rupert Swarbrick <[email protected]>

[crypto] Add a test for multiplication modulo p.

92f3bec

I found this useful for unit testing. Signed-off-by: Jade Philipoom <[email protected]>

[crypto] Re-set bottom 2 bits in candidate RSA primes.

aea48ef

Signed-off-by: Jade Philipoom <[email protected]>

[crypto] Speed up computation of R^2 in RSA.

a6b37ac

Implements the same algorithm we used in the specialized RSA-3072 implementation to speed up computation of R^2. Also fixes an outdated comment in rsa_1024_enc_test. Signed-off-by: Jade Philipoom <[email protected]>

sameo force-pushed the topic/dj-otbn-sw-crypto-p256 branch from c950fcb to df3adb4 Compare June 19, 2024 13:57

sameo force-pushed the topic/dj-otbn-sw-crypto-p256 branch 2 times, most recently from b053288 to 6ddd758 Compare June 20, 2024 11:07

sameo force-pushed the topic/dj-otbn-sw-crypto-p256 branch from 6ddd758 to a6b37ac Compare June 26, 2024 07:56

jadephilipoom merged commit 55dd165 into lowRISC:integrated_dev Jun 27, 2024
21 checks passed

This was referenced Jun 27, 2024

[sw, otbn, crypto] Sync with the master branch - Part 2 #23825

Merged

[top_darjeeling] OTBN SW crypto sync - p256 #23478

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[sw, otbn, crypto] Sync with the master branch #23492

[sw, otbn, crypto] Sync with the master branch #23492

sameo commented Jun 4, 2024

sameo commented Jun 5, 2024

sameo commented Jun 10, 2024 •

edited

Loading

sameo commented Jun 19, 2024

jadephilipoom commented Jun 21, 2024

sameo commented Jun 26, 2024

sameo commented Jun 26, 2024

andreaskurth commented Jun 27, 2024

Razer6 commented Jun 27, 2024

jadephilipoom commented Jun 27, 2024

[sw, otbn, crypto] Sync with the master branch #23492

[sw, otbn, crypto] Sync with the master branch #23492

Conversation

sameo commented Jun 4, 2024

sameo commented Jun 5, 2024

sameo commented Jun 10, 2024 • edited Loading

sameo commented Jun 19, 2024

jadephilipoom commented Jun 21, 2024

sameo commented Jun 26, 2024

sameo commented Jun 26, 2024

andreaskurth commented Jun 27, 2024

Razer6 commented Jun 27, 2024

jadephilipoom commented Jun 27, 2024

sameo commented Jun 10, 2024 •

edited

Loading