-
Notifications
You must be signed in to change notification settings - Fork 807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sw, otbn, crypto] Sync with the master branch #23492
[sw, otbn, crypto] Sync with the master branch #23492
Conversation
3c30bb6
to
fc5258c
Compare
This is built on top of #23480 for now, until it gets merged. |
In FPGA implementation of the earlgrey, blanking is not used for Bignum RF due to high resource consumption and this causes leakage with FPGA experiments. This is an intermediate solution. We are keeping shares in registers with higher Hamming distance between addresses so that we are reducing the probability of reading out both shares when address switches. Co-authored-by: Jade Philipoom <[email protected]> Signed-off-by: Abdullah Varici <[email protected]>
This changes are done regarding to outcomes of the COCO-Alma tool usage. Moreover, many more rules are put in the OTBN Security Guidance Document. Please see it before developing algorithms on OTBN which handle secret data. If we don't have these changes, the tool reports there are transient leakage and we may see leakage on the final product even if we don't see a leakage on FPGA experiments. We are also seeing a stable leakage on the MSB of the intermediate values. We acknowledge that and since the algorithm runs with 64-bit excess randomness, we don't expect that to be possible to use that leakage and retrieve secret values. We also verified that the stable leakage disappeared after running the routine on 320-bit instead of 321-bit. Signed-off-by: Abdullah Varici <[email protected]>
Previously, kmac_disable_masking() did not correctly initialize the full KMAC config struct. As a result of using uninitialized memory, it could happen that the endianness when reading the state got changed which completely broke all SHA3 captures. This commit fixes that by creating a static KMAC config which, for the sake of documentation, initializes all relevant struct variables to the correct value. kmac_disable_masking() then just modifies the variables required to switch off the masking. Signed-off-by: Pirmin Vogel <[email protected]>
Adds batch capture for ECC256 keygen TVLA fixed-vs-random key test. Signed-off-by: Vladimir Rozic <[email protected]>
This commit modifies the p256_scalar_mult function so that it returns an arithmetically masked result. The arithmetic masking is done before converting back to affine coordinates. For this the internal scalar_mult_int function is changed to return projective coordinates. The transformation to affine coordinates now needs to be done outside scalar_mult_int. This commit also adds code to convert from arithmetic masking to boolean masking. The conversion is implemented following Gaubin's algorithm to prevent side-channel leakage. The conversion is implemented in a 257-bit domain to enable carry bit handling, which is necessary when the arithmetic masking is performed within a mod p domain. Both described changes are then put together to generate the P256 ECDH shared session key. Further, this commit also adds the relevant tests. Signed-off-by: Moritz Wettermann <[email protected]>
The changes to P256 OTBN software, introduced in the previous commit, increase code size so much that IMEM of OTBN is too small for the application. Therefore, this commit restructures P256 code to prevent an IMEM overflow. For this, p256.s is split into several individual files containing different functions. In case the functions are not needed for a specific binary, they now won't get inserted into the binary anymore. Signed-off-by: Moritz Wettermann <[email protected]>
This commit includes the following changes: - The internal scalar multiplication function now requires the scalar value to be handed over as additive (modulo n) shares. - Therefore, scalar k and private key d are now stored in shares in DMEM - The shares are expanded by 64 bits to 448-bit values. This is a side-channel protection measurement. - Scalar and base point multiplication functions as well as the signing function are adapted to the mentioned scalar/key handling. - For signing the shares are masked multiplicatively to not reveal the scalar/key value via side-channel leakage during signature computation. - For multiplicative masking a new 448x128-bit multiplication function is added, as well as a modular reduction function (modulo n). - The mentioned changes increase code size quite significantly. Because of this, (internal) scalar and base point multiplication are separated into own source code files to prevent IMEM overflow. Signed-off-by: Moritz Wettermann <[email protected]>
fc5258c
to
401e634
Compare
@jadephilipoom I rebased this PR on top of integrated_dev, now that we got the I had to pull the |
Update ecc384_serial.c and p384_ecdsa_sca.s files to be compatible with the latest capture.py file in the ot-sca repo. Signed-off-by: Bilgiday Yuce <[email protected]>
Signed-off-by: Moritz Wettermann <[email protected]>
This commit adds three functions. One to generate a nonzero random value in the scalar field which is then used by two other functions to generate either a random secret key d or a random secret scalar k. Random values are returned as two additive 448-bit shares (modulo n). Further, a test is added to check if generated values of d and k are distinct and in range. Note that this test is very basic and doesn't replace proper testing for randomness through statistical analysis. Signed-off-by: Moritz Wettermann <[email protected]>
This commit fixes a bug in p256_random_scalar. To generate a random scalar, randomness is taken from RND. The WDRs which contain the RND bits are then XORed with bits from URND, just in case there's any vulnerability in EDN that lets the attacker recover bits before they reach OTBN. But one of the three WDRs that contain RND bits isn't XORed with URND. This commit fixes this. Signed-off-by: Moritz Wettermann <[email protected]>
Signed-off-by: Moritz Wettermann <[email protected]>
This commit separates the field order p specific reduction algorithm from the modulo-multiplication function, and puts it into an own function. This allows us to use the reduction algorithm in other functions too. Signed-off-by: Moritz Wettermann <[email protected]>
This commit adds code to convert from arithmetic masking to boolean masking. The conversion is implemented following Gaubin's algorithm to prevent side-channel leakage. The conversion is implemented in a 385-bit domain to enable carry bit handling, which is necessary when the arithmetic masking is performed within a mod p domain. Further, this commit adds the relevant tests. Signed-off-by: Moritz Wettermann <[email protected]>
This makes the test easier to use and parallelize. Signed-off-by: Jade Philipoom <[email protected]>
This reduces the code size for binaries that only need to include p256_sign. No code is changed in this commit, only moved. Signed-off-by: Jade Philipoom <[email protected]>
This entrypoint is designed to be always-loaded during boot and run all of the crypto that ROM_EXT needs for attestation and for verifying code signatures. Signed-off-by: Jade Philipoom <[email protected]>
f85b9f7
to
c6e3909
Compare
The number of encryptions per batch (number of segments) is determined by the ChipWhisperer Husky scope and depends on the number of samples per segment. Now that we clock ratios changed and we're capturing fewer samples per segment, the number of segments increases and the buffers storing the pre-generated plaintexts and keys need to be increased. Previously, the buffer size was chosen very pessimistically. It should be perfectly fine to allocate space for up to 256 plaintexts and keys (8 kBytes) as we're having 128 kBytes of main SRAM on all relevant top levels. Signed-off-by: Pirmin Vogel <[email protected]>
Added two commands to support waverunner batch capture. Binary generated at: https://github.com/vogelpi/opentitan/tree/ot-sca_2023-07-15_dd8b709_cw310 Signed-off-by: Johann Heyszl <[email protected]>
Signed-off-by: Jade Philipoom <[email protected]>
In order to comply with the Derived Test Requirements for AES TVLA all input data needs to be generated using AES-based PRNG. This commit replaces the currently used Mersenne twister PRNG with SW AES running on ibex. Signed-off-by: Vladimir Rozic <[email protected]>
This builds on the previous commits and means that we can now use the CSR names instead of (somewhat arcane) CSR indices. Signed-off-by: Rupert Swarbrick <[email protected]>
Signed-off-by: Rupert Swarbrick <[email protected]>
This commit - adds entrypoint code for P-384 ECDSA. It is split up into three separate binaries which correspond to the main ECDSA operations key generation, signing, and verifying. This separation is necessary due to the limited IMEM size. Further, in case of signature verification, validity checking of the public key has to be done in advance via another separate OTBN binary (similar to ECDH). - relocates several functions into own files to minimize IMEM usage. - removes mod_inv_var and replaces every call by a call to mod_inv_n_p384. This is necessary because mod_inv_var requires much more IMEM. Signed-off-by: Moritz Wettermann <[email protected]>
ECDH was the only call site for `p256_scalar_mult`, so this commit renames it to `p256_shared_key` and further specializes it to ECDH shared-key generation. Then, it can be moved to `p256_a2b.s`, which I then renamed to `p256_shared_key.s` to better fit its new purpose. Signed-off-by: Jade Philipoom <[email protected]>
This change speeds up ECDSA signing and verification by about 20%. The older modular multiply is still used for elements of the scalar field (i.e. arithmetic modulo the curve order n). However, the performance-critical multiplications in the coordinate field (i.e. arithmetic modulo the prime p) use the new, specialized routine. Signed-off-by: Jade Philipoom <[email protected]>
I found this useful for unit testing. Signed-off-by: Jade Philipoom <[email protected]>
As of FIPS 186-5, it is permissible to restrict the primes p and q to a certain value modulo 8. If this value is 3 or 7 (i.e. the candidate prime is always 3 mod 4), then constant-time Miller-Rabin becomes a lot simplier. This is nice from a code-complexity and performance perspective, but the code-size benefit is the real prize. RSA key generation is close to the IMEM size limit, so this frees up space for further speed improvements. Signed-off-by: Jade Philipoom <[email protected]>
Setting these bits is allowed by FIPS 186-5 and lets us skip the range check on p and q. This commit also removes the tests that ensure we check the range, since now it's just a postcondition of `generate_prime_candidate` and we never expect an out-of-range value in `check_p` or `check_q`. Signed-off-by: Jade Philipoom <[email protected]>
Signed-off-by: Jade Philipoom <[email protected]>
Previously, some tests sent randomly-generated candidate primes that were not 3 mod 4 through the primality check; now that the primality check is specialized to primes that are 3 mod 4, the test data needs to change. Signed-off-by: Jade Philipoom <[email protected]>
This is essentially a division followed by deriving d; since the `derive_d` procedure originally tail-called `rsa_keygen` if `d` was invalid, it also required splitting `derive_d` into `derive_d` and `check_d` so that the key-from-cofactor mode could only call `derive_d`. Signed-off-by: Jade Philipoom <[email protected]>
Implements the same algorithm we used in the specialized RSA-3072 implementation to speed up computation of R^2. Also fixes an outdated comment in rsa_1024_enc_test. Signed-off-by: Jade Philipoom <[email protected]>
c950fcb
to
df3adb4
Compare
👍
I believe they are in the PR, see a62ea53. Or are you talking about something else?
Makes sense to me.
Noted, thanks. |
b053288
to
6ddd758
Compare
My mistake, those are indeed the changes I meant! |
6ddd758
to
a6b37ac
Compare
@jadephilipoom @moidx As this PR s too big for GitHub, I am trying to split into two logically independent ones. I cut the line at @jadephilipoom |
So that seems to work, and the only think we would need to get that PR merged is a waiver for the |
CHANGE AUTHORIZED: hw/top_darjeeling/rtl/autogen/chip_darjeeling_cw310.sv This only changes the CW310 instance of Darjeeling, which isn't part of RTL that goes into TO. Nonetheless, double checking with Rivos HW teams that this is fine? @Razer6 @neeraj-rv |
This is fine. |
CHANGE AUTHORIZED: hw/top_darjeeling/rtl/autogen/chip_darjeeling_cw310.sv The rationale above makes sense, LGTM. |
We applied all changes from
master
, for thesw/otbn/crypto
folder.Fixes #23478
Fixes #23479