Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sw, otbn, crypto] Sync with the master branch #23492

Merged

Conversation

sameo
Copy link
Contributor

@sameo sameo commented Jun 4, 2024

We applied all changes from master, for the sw/otbn/crypto folder.

Fixes #23478

Fixes #23479

@sameo sameo marked this pull request as draft June 4, 2024 10:06
@sameo sameo force-pushed the topic/dj-otbn-sw-crypto-p256 branch 2 times, most recently from 3c30bb6 to fc5258c Compare June 5, 2024 04:24
@sameo
Copy link
Contributor Author

sameo commented Jun 5, 2024

This is built on top of #23480 for now, until it gets merged.

Abdullah Varici and others added 7 commits June 10, 2024 09:51
In FPGA implementation of the earlgrey, blanking is not used
for Bignum RF due to high resource consumption and this causes
leakage with FPGA experiments. This is an intermediate solution.
We are keeping shares in registers with higher Hamming distance
between addresses so that we are reducing the probability of
reading out both shares when address switches.

Co-authored-by: Jade Philipoom <[email protected]>
Signed-off-by: Abdullah Varici <[email protected]>
This changes are done regarding to outcomes of the COCO-Alma
tool usage. Moreover, many more rules are put in the OTBN Security
Guidance Document. Please see it before developing algorithms
on OTBN which handle secret data.

If we don't have these changes, the tool reports there
are transient leakage and we may see leakage on the final
product even if we don't see a leakage on FPGA experiments.

We are also seeing a stable leakage on the MSB of the
intermediate values. We acknowledge that and since the
algorithm runs with 64-bit excess randomness, we don't
expect that to be possible to use that leakage and
retrieve secret values. We also verified that the stable
leakage disappeared after running the routine on 320-bit
instead of 321-bit.

Signed-off-by: Abdullah Varici <[email protected]>
Previously, kmac_disable_masking() did not correctly initialize the full
KMAC config struct. As a result of using uninitialized memory, it could
happen that the endianness when reading the state got changed which
completely broke all SHA3 captures.

This commit fixes that by creating a static KMAC config which, for the
sake of documentation, initializes all relevant struct variables to the
correct value. kmac_disable_masking() then just modifies the variables
required to switch off the masking.

Signed-off-by: Pirmin Vogel <[email protected]>
Adds batch capture for ECC256 keygen TVLA fixed-vs-random key test.

Signed-off-by: Vladimir Rozic <[email protected]>
This commit modifies the p256_scalar_mult function so that it returns
an arithmetically masked result. The arithmetic masking is done before
converting back to affine coordinates. For this the internal
scalar_mult_int function is changed to return projective coordinates.
The transformation to affine coordinates now needs to be done outside
scalar_mult_int.

This commit also adds code to convert from arithmetic masking to
boolean masking. The conversion is implemented following Gaubin's
algorithm to prevent side-channel leakage. The conversion is
implemented in a 257-bit domain to enable carry bit handling,
which is necessary when the arithmetic masking is performed
within a mod p domain.

Both described changes are then put together to generate the P256
ECDH shared session key.

Further, this commit also adds the relevant tests.

Signed-off-by: Moritz Wettermann <[email protected]>
The changes to P256 OTBN software, introduced in the previous commit,
increase code size so much that IMEM of OTBN is too small for the
application. Therefore, this commit restructures P256 code to prevent
an IMEM overflow. For this, p256.s is split into several individual
files containing different functions. In case the functions are not
needed for a specific binary, they now won't get inserted into the
binary anymore.

Signed-off-by: Moritz Wettermann <[email protected]>
This commit includes the following changes:
- The internal scalar multiplication function now requires the scalar
value to be handed over as additive (modulo n) shares.
- Therefore, scalar k and private key d are now stored in shares in DMEM
- The shares are expanded by 64 bits to 448-bit values. This is a
side-channel protection measurement.
- Scalar and base point multiplication functions as well as the signing
function are adapted to the mentioned scalar/key handling.
- For signing the shares are masked multiplicatively to not reveal the
scalar/key value via side-channel leakage during signature computation.
- For multiplicative masking a new 448x128-bit multiplication function
is added, as well as a modular reduction function (modulo n).
- The mentioned changes increase code size quite significantly. Because
of this, (internal) scalar and base point multiplication are separated
into own source code files to prevent IMEM overflow.

Signed-off-by: Moritz Wettermann <[email protected]>
@sameo sameo force-pushed the topic/dj-otbn-sw-crypto-p256 branch from fc5258c to 401e634 Compare June 10, 2024 12:25
@sameo
Copy link
Contributor Author

sameo commented Jun 10, 2024

@jadephilipoom I rebased this PR on top of integrated_dev, now that we got the hw/otbn changes in.

I had to pull the sw/device/sca changes as well because the current code would not build with the new sw/otbn/crypto, see https://dev.azure.com/lowrisc/opentitan/_build/results?buildId=153946&view=logs&j=5df5adf8-3577-5927-f0b2-31afed2697ee&t=1253ba27-4e8c-57af-d0ba-888237644875

@sameo sameo marked this pull request as ready for review June 10, 2024 12:32
@sameo sameo requested a review from a team as a code owner June 10, 2024 12:32
@sameo sameo requested review from pamaury and removed request for a team June 10, 2024 12:32
bilgiday and others added 10 commits June 10, 2024 14:47
Update ecc384_serial.c and p384_ecdsa_sca.s files to be compatible with
the latest capture.py file in the ot-sca repo.

Signed-off-by: Bilgiday Yuce <[email protected]>
This commit adds three functions.
One to generate a nonzero random value in the scalar field which is then
used by two other functions to generate either a random secret key d or
a random secret scalar k.
Random values are returned as two additive 448-bit shares (modulo n).

Further, a test is added to check if generated values of d and k are
distinct and in range. Note that this test is very basic and doesn't
replace proper testing for randomness through statistical analysis.

Signed-off-by: Moritz Wettermann <[email protected]>
This commit fixes a bug in p256_random_scalar.
To generate a random scalar, randomness is taken from RND. The WDRs which
contain the RND bits are then XORed with bits from URND, just in case
there's any vulnerability in EDN that lets the attacker recover bits
before they reach OTBN.
But one of the three WDRs that contain RND bits isn't XORed with URND.
This commit fixes this.

Signed-off-by: Moritz Wettermann <[email protected]>
This commit separates the field order p specific reduction algorithm
from the modulo-multiplication function, and puts it into an own
function. This allows us to use the reduction algorithm in other
functions too.

Signed-off-by: Moritz Wettermann <[email protected]>
This commit adds code to convert from arithmetic masking to
boolean masking. The conversion is implemented following Gaubin's
algorithm to prevent side-channel leakage. The conversion is
implemented in a 385-bit domain to enable carry bit handling,
which is necessary when the arithmetic masking is performed
within a mod p domain.
Further, this commit adds the relevant tests.

Signed-off-by: Moritz Wettermann <[email protected]>
This makes the test easier to use and parallelize.

Signed-off-by: Jade Philipoom <[email protected]>
This reduces the code size for binaries that only need to include
p256_sign. No code is changed in this commit, only moved.

Signed-off-by: Jade Philipoom <[email protected]>
This entrypoint is designed to be always-loaded during boot and run all
of the crypto that ROM_EXT needs for attestation and for verifying code
signatures.

Signed-off-by: Jade Philipoom <[email protected]>
@sameo sameo force-pushed the topic/dj-otbn-sw-crypto-p256 branch 2 times, most recently from f85b9f7 to c6e3909 Compare June 10, 2024 13:07
vogelpi and others added 16 commits June 19, 2024 15:57
The number of encryptions per batch (number of segments) is determined
by the ChipWhisperer Husky scope and depends on the number of samples
per segment. Now that we clock ratios changed and we're capturing
fewer samples per segment, the number of segments increases and the
buffers storing the pre-generated plaintexts and keys need to be
increased. Previously, the buffer size was chosen very pessimistically.
It should be perfectly fine to allocate space for up to 256 plaintexts
and keys (8 kBytes) as we're having 128 kBytes of main SRAM on all
relevant top levels.

Signed-off-by: Pirmin Vogel <[email protected]>
Added two commands to support waverunner batch capture. Binary generated at:
https://github.com/vogelpi/opentitan/tree/ot-sca_2023-07-15_dd8b709_cw310

Signed-off-by: Johann Heyszl <[email protected]>
In order to comply with the Derived Test Requirements for AES TVLA
all input data needs to be generated using AES-based PRNG.
This commit replaces the currently used Mersenne twister PRNG with
SW AES running on ibex.

Signed-off-by: Vladimir Rozic <[email protected]>
This builds on the previous commits and means that we can now use the
CSR names instead of (somewhat arcane) CSR indices.

Signed-off-by: Rupert Swarbrick <[email protected]>
This commit
- adds entrypoint code for P-384 ECDSA. It is split up into three
separate binaries which correspond to the main ECDSA operations key
generation, signing, and verifying. This separation is necessary due to
the limited IMEM size. Further, in case of signature verification,
validity checking of the public key has to be done in advance via
another separate OTBN binary (similar to ECDH).
- relocates several functions into own files to minimize IMEM usage.
- removes mod_inv_var and replaces every call by a call to
mod_inv_n_p384. This is necessary because mod_inv_var requires much more
IMEM.

Signed-off-by: Moritz Wettermann <[email protected]>
ECDH was the only call site for `p256_scalar_mult`, so this commit
renames it to `p256_shared_key` and further specializes it to ECDH
shared-key generation.  Then, it can be moved to `p256_a2b.s`, which I
then renamed to `p256_shared_key.s` to better fit its new purpose.

Signed-off-by: Jade Philipoom <[email protected]>
This change speeds up ECDSA signing and verification by about 20%. The
older modular multiply is still used for elements of the scalar field
(i.e. arithmetic modulo the curve order n). However, the
performance-critical multiplications in the coordinate field (i.e.
arithmetic modulo the prime p) use the new, specialized routine.

Signed-off-by: Jade Philipoom <[email protected]>
I found this useful for unit testing.

Signed-off-by: Jade Philipoom <[email protected]>
As of FIPS 186-5, it is permissible to restrict the primes p and q to a
certain value modulo 8. If this value is 3 or 7 (i.e. the candidate
prime is always 3 mod 4), then constant-time Miller-Rabin becomes a lot
simplier. This is nice from a code-complexity and performance
perspective, but the code-size benefit is the real prize. RSA key
generation is close to the IMEM size limit, so this frees up space for
further speed improvements.

Signed-off-by: Jade Philipoom <[email protected]>
Setting these bits is allowed by FIPS 186-5 and lets us skip the range
check on p and q. This commit also removes the tests that ensure we
check the range, since now it's just a postcondition of
`generate_prime_candidate` and we never expect an out-of-range value in
`check_p` or `check_q`.

Signed-off-by: Jade Philipoom <[email protected]>
Previously, some tests sent randomly-generated candidate primes that
were not 3 mod 4 through the primality check; now that the primality
check is specialized to primes that are 3 mod 4, the test data needs
to change.

Signed-off-by: Jade Philipoom <[email protected]>
This is essentially a division followed by deriving d; since the
`derive_d` procedure originally tail-called `rsa_keygen` if `d` was
invalid, it also required splitting `derive_d` into `derive_d` and
`check_d` so that the key-from-cofactor mode could only call `derive_d`.

Signed-off-by: Jade Philipoom <[email protected]>
Implements the same algorithm we used in the specialized RSA-3072
implementation to speed up computation of R^2.

Also fixes an outdated comment in rsa_1024_enc_test.

Signed-off-by: Jade Philipoom <[email protected]>
@sameo sameo force-pushed the topic/dj-otbn-sw-crypto-p256 branch from c950fcb to df3adb4 Compare June 19, 2024 13:57
@sameo
Copy link
Contributor Author

sameo commented Jun 19, 2024

Just a couple notes in response, but no blockers. This LGTM now.

👍

I added the silicon creator code back in. We will need it with Darjeeling sooner than later.

I think you'll also need the keymgr changes under silicon_creator/lib/drivers to make it work

I believe they are in the PR, see a62ea53. Or are you talking about something else?

Since this is already a pretty big change, I'd recommend keeping it out of this PR and merging it in separately along with the necessary driver changes, maybe simply by checking out the latest versions rather than cherry-picking commits. One can always refer to the history on master if needed.

Makes sense to me.

I dont think we want p384 for now, as this would have some real RTL changes impact (double imem, iiuc).

Not yet, actually -- the P384 code is still made to fit in 4kB IMEM, although it will soon be refactored to make better use of the new 8kB space. But probably best not to lump that in with this PR anyway; like the otbn_boot_services lib, you can always checkout or cherry-pick the code separately.

Noted, thanks.

@sameo sameo force-pushed the topic/dj-otbn-sw-crypto-p256 branch 2 times, most recently from b053288 to 6ddd758 Compare June 20, 2024 11:07
@jadephilipoom
Copy link
Contributor

I think you'll also need the keymgr changes under silicon_creator/lib/drivers to make it work

I believe they are in the PR, see a62ea53. Or are you talking about something else?

My mistake, those are indeed the changes I meant!

@sameo sameo force-pushed the topic/dj-otbn-sw-crypto-p256 branch from 6ddd758 to a6b37ac Compare June 26, 2024 07:56
@sameo
Copy link
Contributor Author

sameo commented Jun 26, 2024

@jadephilipoom @moidx As this PR s too big for GitHub, I am trying to split into two logically independent ones.

I cut the line at @jadephilipoom [crypto] Speed up computation of R^2 in RSA.commit for now. Let's see if that passes CI.

@sameo
Copy link
Contributor Author

sameo commented Jun 26, 2024

@jadephilipoom @moidx As this PR s too big for GitHub, I am trying to split into two logically independent ones.

I cut the line at @jadephilipoom [crypto] Speed up computation of R^2 in RSA.commit for now. Let's see if that passes CI.

So that seems to work, and the only think we would need to get that PR merged is a waiver for the hw/top_darjeeling/rtl/autogen/chip_darjeeling_cw310.sv changes. FWIW they obviously build and pass CI.

@andreaskurth
Copy link
Contributor

CHANGE AUTHORIZED: hw/top_darjeeling/rtl/autogen/chip_darjeeling_cw310.sv

This only changes the CW310 instance of Darjeeling, which isn't part of RTL that goes into TO. Nonetheless, double checking with Rivos HW teams that this is fine? @Razer6 @neeraj-rv

@Razer6
Copy link
Member

Razer6 commented Jun 27, 2024

This only changes the CW310 instance of Darjeeling, which isn't part of RTL that goes into TO. Nonetheless, double checking with Rivos HW teams that this is fine? @Razer6 @neeraj-rv

This is fine.

@jadephilipoom
Copy link
Contributor

CHANGE AUTHORIZED: hw/top_darjeeling/rtl/autogen/chip_darjeeling_cw310.sv

The rationale above makes sense, LGTM.

@jadephilipoom jadephilipoom merged commit 55dd165 into lowRISC:integrated_dev Jun 27, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants