Refactor: Have a common base for Kyber/Dilithium structures #4024

reneme · 2024-04-19T14:50:10Z

This is essentially a complete overhaul of the Kyber and Dilithium implementations aiming at as much shared code as possible between the two algorithms. The rough structure is outlined below. I split-up the changes into three commits, that could be separate pull requests, if needed.

Module: `pqcrystals`

Polynomial Structures
This is a templated base for the polynomials (as well as poly vectors/matrices) used in both implementations. The base implementation shares math where possible and is pluggable where necessary. It also exposes its domain (NTT or "normal") in the type system. Also, it makes some initial optimization effort (mostly memory layout), but more work could be done here.
Polynomial Encoding/Decoding
Both algorithms contain bit-packing algorithms to efficiently encode polynomial coefficients of various known value ranges. Both Kyber and Dilithium now share a generic and pluggable implementation of this.

Module: `kyber`

This now makes use of the pqcrystals module. The polynomial instantiation happens in kyber_polynomial.h (replacing the old kyber_structures.h). Most low-level "algorithms" (pseudo-codes of the spec) live in kyber_algos.cpp. The indcpa "K-PKE" scheme is implemented in kyber_keys.cpp and the cca2-secure wrapper can be found in kyber.cpp. Auxiliary stuff like constants and strong types are in other headers.

Note that this now contains almost no verbatim code from the reference implementation anymore.

Module: `dilithium`

Likewise, this uses the pqcrystals module (instantiating the polynomials in dilithium_polynomial.h (replacing the almost verbatim reference implementation code). The dilithium_algos.cpp is home to most low-level "algorithms" (as seen in the spec). dilithium.cpp houses the actual signature/verification algorihtms that bind it all together..

Care has been taken to have a consistent implementation structure between Kyber and Dilithium. Note, though, that Dilithium's flexibility might require some love before a clean integration of ML-KEM (similar to what happened to Kyber in #3887).

Benchmark

~~Despite not explicitly aiming to increase speed,~~ ./botan speed shows nice gains for Kyber and no losses for Dilithium. ~~It might be worth looking into why Kyber is so much faster, though. Feels suspicious.~~ Edit: The new implementation pre-computes and stores the expanded public matrix A. So as long as one re-uses the PK_KEM_Encryptor for multiple encapsulations, one saves a lot of invocations to the XOF. That explains. 😏

GCC 13.2 vs. Clang 18.1 on Linux (in WSL)

Algorithm	This PR (GCC)	Master (GCC)	This PR (Clang)	Master (Clang)
Dilithium-4x4-r3	4602 sign/sec	4332 sign/sec	5358 sign/sec	4681 sign/sec
Dilithium-4x4-r3	28913 verify/sec	31303 verify/sec	37226 verify/sec	35459 verify/sec
Dilithium-4x4-AES-r3	5050 sign/sec	4689 sign/sec	6363 sign/sec	5007 sign/sec
Dilithium-4x4-AES-r3	27824 verify/sec	31180 verify/sec	38452 verify/sec	34830 verify/sec
Dilithium-6x5-r3	2648 sign/sec	2695 sign/sec	2893 sign/sec	2830 sign/sec
Dilithium-6x5-r3	19250 verify/sec	20999 verify/sec	26201 verify/sec	24643 verify/sec
Dilithium-6x5-AES-r3	2848 sign/sec	2850 sign/sec	3398 sign/sec	3033 sign/sec
Dilithium-6x5-AES-r3	18503 verify/sec	21491 verify/sec	26030 verify/sec	24351 verify/sec
Dilithium-8x7-r3	2389 sign/sec	2251 sign/sec	2853 sign/sec	2543 sign/sec
Dilithium-8x7-r3	12768 verify/sec	14551 verify/sec	18622 verify/sec	17211 verify/sec
Dilithium-8x7-AES-r3	2758 sign/sec	2490 sign/sec	3044 sign/sec	2970 sign/sec
Dilithium-8x7-AES-r3	13856 verify/sec	13881 verify/sec	18468 verify/sec	17629 verify/sec

Algorithm	This PR (GCC)	Master (GCC)	This PR (Clang)	Master (Clang)
Kyber-512-r3	69864 encrypt/sec	39354 encrypt/sec	59956 encrypt/sec	38009 encrypt/sec
Kyber-512-r3	51452 decrypt/sec	32157 decrypt/sec	48114 decrypt/sec	32214 decrypt/sec
Kyber-512-90s-r3	81350 encrypt/sec	39878 encrypt/sec	75388 encrypt/sec	44527 encrypt/sec
Kyber-512-90s-r3	57531 decrypt/sec	32740 decrypt/sec	57515 decrypt/sec	37094 decrypt/sec
Kyber-768-r3	50542 encrypt/sec	22767 encrypt/sec	45523 encrypt/sec	21180 encrypt/sec
Kyber-768-r3	37306 decrypt/sec	19418 decrypt/sec	36505 decrypt/sec	18593 decrypt/sec
Kyber-768-90s-r3	48522 encrypt/sec	22167 encrypt/sec	59785 encrypt/sec	24793 encrypt/sec
Kyber-768-90s-r3	34923 decrypt/sec	19070 decrypt/sec	46596 decrypt/sec	21613 decrypt/sec
Kyber-1024-r3	35053 encrypt/sec	14597 encrypt/sec	40132 encrypt/sec	14884 encrypt/sec
Kyber-1024-r3	26609 decrypt/sec	12840 decrypt/sec	32397 decrypt/sec	13338 decrypt/sec
Kyber-1024-90s-r3	36312 encrypt/sec	14504 encrypt/sec	43981 encrypt/sec	16118 encrypt/sec
Kyber-1024-90s-r3	27100 decrypt/sec	13312 decrypt/sec	34729 decrypt/sec	14435 decrypt/sec

coveralls · 2024-04-22T07:06:38Z

coverage: 91.69% (-0.02%) from 91.711%
when pulling 8bd26c7 on Rohde-Schwarz:refactor/crystals_april
into 55007af on randombit:master.

reneme

Below are a few highlights and potential conversion starters of things I saw along the way. :)

reneme · 2024-06-20T15:45:41Z

src/lib/pubkey/kyber/kyber_common/kyber_helpers.h

+   // This is a mitigation for a potential side channel called "KyberSlash".
+   // It implements the division by Q using a multiplication and a shift. Most
+   // compilers would generate similar code for such a division by a constant.
+   // Though, in some cases, compilers might use a variable-time int division,
+   // resulting in a potential side channel.
+   //
+   //   See "Hacker's Delight" (Second Edition) by Henry S. Warren, Jr.
+   //   Chapter 10-9 "Unsigned Division by Divisors >= 1"
+   auto divide_by_q = [](uint32_t n) -> unsigned_T {
+      static_assert(KyberConstants::Q == 3329);
+      BOTAN_DEBUG_ASSERT(n < (1 << 23));
+
+      // These constants work for all values that appear in Kyber with the
+      // greatest being 3328 * 2^11 + Q // 2 = 6,817,408 < 2**23 = 8,388,608.
+      constexpr uint64_t m = 2580335;
+      constexpr size_t p = 33;
+      return static_cast<unsigned_T>((n * m) >> p);
+   };
+
+   constexpr unsigned_T mask = (1 << d) - 1;
+   return divide_by_q((static_cast<uint32_t>(x) << d) + KyberConstants::Q / 2) & mask;


I've adapted the KyberSlash countermeasure to also work for the compression instances with d = 10 and d = 11. Not sure it's necessary but doesn't hurt either. I'd definitely be happy about some other sharp pair of eyes.

reneme · 2024-06-20T15:49:01Z

src/lib/pubkey/pqcrystals/pqcrystals.h

+      // TODO: perhaps secure vector
+      std::vector<T> m_coeffs_storage;


Polynomials store their coefficients as a std::vector<>. Some of those store secret values and should therefore use secure_vector. Also, it might be beneficial to add first-level support for Polynomials (and friends) as Strong types. (Most likely this would be another pull request, though.)

reneme · 2024-06-20T15:58:19Z

src/lib/pubkey/pqcrystals/pqcrystals_encoding.h

+namespace detail {
+
+constexpr auto as_byte_source(BufferSlicer& slicer) {
+   return [&](std::span<uint8_t> out) { slicer.copy_into(out); };
+}
+
+constexpr auto as_byte_source(Botan::XOF& xof) {
+   return [&](std::span<uint8_t> out) { xof.output(out); };
+}
+
+}  // namespace detail
+
+template <typename T>
+concept byte_source =
+   requires(T& t) { requires std::invocable<decltype(detail::as_byte_source(t)), std::span<uint8_t>>; };


I feel like this is a top-level concept we might want to establish deeper in the library. Perhaps as an (internal) replacement for the DataSource class.

Namely: A thing that can provide a concrete number of bytes where I don't really know how many bytes will be needed. The source may be infinite or not.

Things that could fall in this abstraction may be:

the RandomNumberGenerator

a XOF

read from some buffer (think BufferSlicer)

from some I/O source (e.g. the network)

In Kyber/Dithithium this is used to either sample some random polynomial from a XOF or to unmarshal it from a bitpacked encoding.

Other use cases may be:

parsing in general (TLS, X.509 and friends)

data loading, e.g. like:
load_be<uint64_t>(rng) to get a random 64 bit integer. (See also Internal: load_be/le should accept a BufferSlicer #4068)

among other things

From a technical perspective, one might even implement it like shown here: As a C++20 concept that tries to build a thin wrapper around "the source". That way, new data sources could just hook themselves into this "helper" and decide for themselves to be a "data source" or not.

Cool idea 👍. I'm not sure if we want to commit to a pure concept construction, though. In some cases, it may be more convenient to have a class like DataSource. We could still extend the DataSource class to fulfill the respective concept if one wants to juggle with templates.

reneme · 2024-06-20T16:01:35Z

src/lib/pubkey/kyber/kyber_common/kyber.cpp

+   // TODO: Remove the need for the montgomery transformation
+   //
+   // -> When calculating A*s below, A is not in montgomery form, but s is. The
+   //    operation uses fqmul internally, which performs a montgomery reduction.
+   auto A = montgomery(kyber_sample_matrix(rho, false /* not transposed */, mode));


This was different in the reference implementation. @FAlbertDev we did look into this a while ago, I feel we should take this up again and get to the bottom of it.

I've just investigated the usage of Montgomery in Kyber. First, a short recap on Montgomery:
The main observation is that every Montgomery multiplication $c=a \cdot b$ adds a factor of $R^{-1}$ to the result c where $R$ is the Montgomery factor (i.e., we compute $a \cdot b \cdot R^{-1}$) At some point we need to multiply $R$ to neutralize this $R^{-1}$ factor. Therefore, we could multiply $R$ to either $a$ or $b$. However, we can also multiply $R$ to the result $c$.
In our Kyber and Dilithium code, the following operations add or remove the Montgomery factor.

the function montgomery adds a factor $R$

the function inv_ntt adds a factor $R$. Note that ntt doesn't add a Montgomery factor.

As already said, a Montgomery multiplication adds a factor $R^{-1}$ to each poly, vector, or matrix element. This holds for all multiplication types, i.e., polynomial-polynomial, vector-vector, matrix-vector multiplication, etc.

For encryption and decryption, this setup works well since there is always one multiplication (adding factor $R^{-1}$) followed by inv_ntt (adding factor $R$). This does not work here since we store the value t in NTT domain and, therefore, do not call inv_ntt. We have a factor of $R^{-1}$ to eliminate. Currently, we do that by adding the factor $R$ to the matrix beforehand using montgomery. Instead, I propose to add this factor to t after the multiplication. The fewer elements multiplied by $R$, the faster it should be.

auto A = kyber_sample_matrix(rho, false /* not transposed */, mode); auto s = ntt(ps.sample_polynomial_vector_cbd_eta1()); const auto e = ntt(ps.sample_polynomial_vector_cbd_eta1()); auto t = montgomery(A * s); t += e; t.reduce(); // End Algorithm 12 ---------------------------

Nice! Thanks for double-checking and the detailed write-up! It would probably be really nice to track the montgomery parameter in the type system as well, but it does feel like overkill.

Also: by applying montgomery() to the t vector, we can remove the overload for PolynomialMatrix altogether. :)

reneme · 2024-06-20T16:04:58Z

src/lib/pubkey/kyber/kyber_common/kyber_types.h

+using KyberPolyNTT = Botan::CRYSTALS::Polynomial<KyberPolyTraits, Botan::CRYSTALS::Domain::NTT>;
+using KyberPolyVecNTT = Botan::CRYSTALS::PolynomialVector<KyberPolyTraits, Botan::CRYSTALS::Domain::NTT>;
+using KyberPolyMat = Botan::CRYSTALS::PolynomialMatrix<KyberPolyTraits>;
+
+using KyberPoly = Botan::CRYSTALS::Polynomial<KyberPolyTraits, Botan::CRYSTALS::Domain::Normal>;
+using KyberPolyVec = Botan::CRYSTALS::PolynomialVector<KyberPolyTraits, Botan::CRYSTALS::Domain::Normal>;


As mentioned before: it would be good if these could be the basis for strong types to disambiguate them in interfaces E.g. to replace the multiple poly_pack_t0, ..._t1, ... methods in dilithium_algos.cpp with an overloaded poly_pack() that just "does the right thing" ™ based on the strong type.

src/lib/pubkey/pqcrystals/pqcrystals_helpers.h

src/lib/xof/xof.h

reneme · 2024-06-20T16:13:28Z

src/lib/pubkey/dilithium/dilithium_common/dilithium_constants.h

+      enum DilithiumTau : uint32_t { _39 = 39, _49 = 49, _60 = 60 };
+
+      enum DilithiumLambda : uint32_t { _128 = 128, _192 = 192, _256 = 256 };
+
+      enum DilithiumGamma1 : uint32_t { ToThe17th = (1 << 17), ToThe19th = (1 << 19) };
+
+      enum DilithiumGamma2 : uint32_t { Qminus1DevidedBy88 = (Q - 1) / 88, Qminus1DevidedBy32 = (Q - 1) / 32 };
+
+      enum DilithiumEta : uint32_t { _2 = 2, _4 = 4 };
+
+      enum DilithiumBeta : uint32_t { _78 = 78, _196 = 196, _120 = 120 };
+
+      enum DilithiumOmega : uint32_t { _80 = 80, _55 = 55, _75 = 75 };


Dilithium has many integer constants that change depending on the instantiation of the algorithm. I found it worthwhile to restrict them with explicit enums. I'm okay with this concept, also given the restricted (algorithm-internal) scope. But I feel there must be a better way for this. :)

I also like this idea. While the implied switch-cases are annoying, they guarantee we do not forget a value.

reneme · 2024-06-20T16:15:39Z

src/lib/pubkey/dilithium/dilithium_common/dilithium_algos.cpp

+      dilithium_poly_pack_t0(p, stuffer);
+   }
+
+   BOTAN_ASSERT_NOMSG(stuffer.full());


Those became ubiquitous and at this point I do start wondering whether we should just put that in the destructors of Stuffer and Slicer (perhaps explicitly disengagable). What do y'all think?

Sounds good to me.

I'll do that after 3.5.0 is cut, to avoid unwanted surprises.

FAlbertDev

I've only looked into dilithium and the crystals module so far. The other sections (including Kyber) will come tomorrow.
For algorithms of this high complexity, the new structure looks very well-organized and clear :)
I've only found some minor nits so far.
(Btw sorry if I also marked lots of removed code. The VSCode extension for GitHub caused this)

src/lib/pubkey/dilithium/dilithium_common/dilithium.cpp

src/lib/pubkey/dilithium/dilithium_common/dilithium_algos.cpp

src/lib/pubkey/dilithium/dilithium_common/dilithium_types.h

src/lib/pubkey/pqcrystals/pqcrystals.h

src/lib/pubkey/pqcrystals/pqcrystals_helpers.h

src/lib/pubkey/dilithium/dilithium_common/dilithium.cpp

FAlbertDev

Final review iteration.
I really like your work. IMO it's the better reference implementation ;)

src/lib/pubkey/kyber/kyber_common/kyber.cpp

FAlbertDev · 2024-06-25T11:44:20Z

src/lib/pubkey/kyber/kyber_common/kyber.cpp

+   // TODO: Remove the need for the montgomery transformation
+   //
+   // -> When calculating A*s below, A is not in montgomery form, but s is. The
+   //    operation uses fqmul internally, which performs a montgomery reduction.
+   auto A = montgomery(kyber_sample_matrix(rho, false /* not transposed */, mode));


I've just investigated the usage of Montgomery in Kyber. First, a short recap on Montgomery:
The main observation is that every Montgomery multiplication $c=a \cdot b$ adds a factor of $R^{-1}$ to the result c where $R$ is the Montgomery factor (i.e., we compute $a \cdot b \cdot R^{-1}$) At some point we need to multiply $R$ to neutralize this $R^{-1}$ factor. Therefore, we could multiply $R$ to either $a$ or $b$. However, we can also multiply $R$ to the result $c$.
In our Kyber and Dilithium code, the following operations add or remove the Montgomery factor.

the function montgomery adds a factor $R$

the function inv_ntt adds a factor $R$. Note that ntt doesn't add a Montgomery factor.

As already said, a Montgomery multiplication adds a factor $R^{-1}$ to each poly, vector, or matrix element. This holds for all multiplication types, i.e., polynomial-polynomial, vector-vector, matrix-vector multiplication, etc.

For encryption and decryption, this setup works well since there is always one multiplication (adding factor $R^{-1}$) followed by inv_ntt (adding factor $R$). This does not work here since we store the value t in NTT domain and, therefore, do not call inv_ntt. We have a factor of $R^{-1}$ to eliminate. Currently, we do that by adding the factor $R$ to the matrix beforehand using montgomery. Instead, I propose to add this factor to t after the multiplication. The fewer elements multiplied by $R$, the faster it should be.

auto A = kyber_sample_matrix(rho, false /* not transposed */, mode); auto s = ntt(ps.sample_polynomial_vector_cbd_eta1()); const auto e = ntt(ps.sample_polynomial_vector_cbd_eta1()); auto t = montgomery(A * s); t += e; t.reduce(); // End Algorithm 12 ---------------------------

src/lib/pubkey/kyber/kyber_common/kyber_algos.h

src/lib/pubkey/kyber/kyber_common/kyber_polynomial.h

src/lib/rng/rng.h

src/tests/test_crystals.cpp

randombit

Partial review still need to look at changes in dilithium and kyber

src/lib/pubkey/pqcrystals/pqcrystals_helpers.h

src/lib/pubkey/pqcrystals/pqcrystals_encoding.h

src/lib/pubkey/pqcrystals/pqcrystals_helpers.h

src/tests/test_crystals.cpp

src/lib/pubkey/pqcrystals/info.txt

src/lib/pubkey/kyber/kyber_common/kyber_helpers.h

randombit · 2024-06-30T14:33:46Z

Everything so far looks good but I'm not enthused about dropping big refactors just before a release so pushing this to 3.6.0

See: randombit/botan#3887 This is a fairly minimal adaption, once the full refactoring is merged, we'll have to rewrite this more substantially. That won't happen before Botan 3.5.0, though. See also: randombit/botan#4024

reneme · 2024-07-03T06:59:54Z

src/lib/pubkey/pqcrystals/pqcrystals_encoding.h

+template <int32_t range, crystals_trait PolyTrait, Domain D, coeff_map_fn<typename PolyTrait::T> MapFnT>
+constexpr void pack(const Polynomial<PolyTrait, D>& p, BufferStuffer& stuffer, MapFnT map) {
+   using trait = BitPackingTrait<range, PolyTrait>;


Could the gist of this be useful beyond the CRYSTALS scope? E.g. FrodoKEM has a pack/unpack routine that -- on first glance at least -- looks like it could use it as well. @randombit perhaps there are more examples?

botan/src/lib/pubkey/frodokem/frodokem_common/frodo_matrix.cpp

Lines 340 to 384 in 3657a72

void FrodoMatrix::pack(const FrodoKEMConstants& constants, StrongSpan<FrodoPackedMatrix> out) const {

const size_t outlen = packed_size(constants);

BOTAN_ASSERT_NOMSG(out.size() == outlen);

size_t i = 0; // whole bytes already filled in

size_t j = 0; // whole uint16_t already copied

uint16_t w = 0; // the leftover, not yet copied

uint8_t bits = 0; // the number of lsb in w

while(i < outlen && (j < element_count() || ((j == element_count()) && (bits > 0)))) {

/*

in: | | |********|********|

^

j

w : | ****|

^

bits

out:|**|**|**|**|**|**|**|**|* |

^^

ib

*/

uint8_t b = 0; // bits in out[i] already filled in

while(b < 8) {

const uint8_t nbits = std::min(static_cast<uint8_t>(8 - b), bits);

const uint16_t mask = static_cast<uint16_t>(1 << nbits) - 1;

const auto t = static_cast<uint8_t>((w >> (bits - nbits)) & mask); // the bits to copy from w to out

out[i] = out[i] + static_cast<uint8_t>(t << (8 - b - nbits));

b += nbits;

bits -= nbits;

if(bits == 0) {

if(j < element_count()) {

w = m_elements.at(j);

bits = static_cast<uint8_t>(constants.d());

j++;

} else {

break; // the input vector is exhausted

}

}

}

if(b == 8) { // out[i] is filled in

i++;

}

}

}

randombit

This looks very good. I left one bikeshed suggestion that to me feels a little cleaner - but it's marginal, feel free to ignore if you disagree.

src/lib/pubkey/dilithium/dilithium_common/dilithium_algos.h

src/lib/pubkey/kyber/kyber_common/kyber_algos.h

randombit · 2024-07-10T09:45:05Z

Final comment, could use some history cleanup before merge

Most notably, this is an abstract implementation to handle CRYSTALS polynomials, poly vectors and poly matrices. Also, a generic implementation for bit-packed encoding/decoding of coefficients useful for both Kyber and Dilithium. Co-Authored-By: Fabian Albert <[email protected]>

t1 is part of the public key and thus independent of any other verification input. Precomputing it saves about 20% of verification runtime when performing more than a single verification with the same Botan::PK_Verifier. Co-Authored-By: Fabian Albert <[email protected]>

For Kyber those are {512,768,1024} and for Dilithium {44,65,87} depending on the respective choice of parameter set.

reneme · 2024-07-15T10:35:04Z

Final comment, could use some history cleanup before merge

Rebased to master, cleaned up history and built the bikeshed. Waiting for CI, then finally drawing a line here. 🥳

This comment was marked as resolved.

Sign in to view

reneme mentioned this pull request Apr 23, 2024

HSS-LMS Signature Algorithm Implementation #3716

Merged

This comment was marked as outdated.

Sign in to view

atreiber94 mentioned this pull request May 17, 2024

PQC: ML-DSA (Initial Public Draft) #4062

Closed

3 tasks

reneme mentioned this pull request May 29, 2024

🤖 Audit: Auto-update with latest upstream patches sehlen-bsi/botan-docs#212

Merged

reneme force-pushed the refactor/crystals_april branch from 64f9db8 to 066fc6d Compare May 31, 2024 13:24

reneme force-pushed the refactor/crystals_april branch 8 times, most recently from dd45e61 to c30b9fe Compare June 20, 2024 15:28

reneme marked this pull request as ready for review June 20, 2024 15:41

This comment was marked as outdated.

Sign in to view

reneme commented Jun 20, 2024

View reviewed changes

reneme force-pushed the refactor/crystals_april branch 2 times, most recently from cf16bf3 to 3fc00ec Compare June 21, 2024 09:40

reneme requested a review from FAlbertDev June 21, 2024 14:28

reneme force-pushed the refactor/crystals_april branch from 3fc00ec to 8dcd72b Compare June 21, 2024 14:38

FAlbertDev reviewed Jun 24, 2024

View reviewed changes

FAlbertDev reviewed Jun 25, 2024

View reviewed changes

src/lib/pubkey/dilithium/dilithium_common/dilithium.cpp Outdated Show resolved Hide resolved

reneme mentioned this pull request Jun 25, 2024

Faster pcurves reductions for P-256 and P-384 #4147

Merged

reneme force-pushed the refactor/crystals_april branch 2 times, most recently from 1a3ac57 to ae4298c Compare June 26, 2024 06:44

FAlbertDev approved these changes Jun 26, 2024

View reviewed changes

randombit reviewed Jun 26, 2024

View reviewed changes

reneme mentioned this pull request Jun 27, 2024

add RandomNumberGenerator::random_array<>() #4152

Merged

reneme force-pushed the refactor/crystals_april branch 4 times, most recently from e6a97fd to e9863c1 Compare June 28, 2024 10:40

randombit added this to the Botan 3.6.0 milestone Jun 30, 2024

reneme mentioned this pull request Jul 2, 2024

PQC: ML-KEM #3893

Merged

7 tasks

reneme commented Jul 3, 2024

View reviewed changes

randombit approved these changes Jul 10, 2024

View reviewed changes

src/lib/pubkey/dilithium/dilithium_common/dilithium_algos.h Outdated Show resolved Hide resolved

src/lib/pubkey/kyber/kyber_common/kyber_algos.h Outdated Show resolved Hide resolved

reneme and others added 7 commits July 15, 2024 12:33

Add to_underlying() (from C++23)

d6e0bf7

Add XOF::output<count>() to obtain an array

bb374ca

Rework Kyber using the common CRYSTALS structs

06480e6

Rework Dilithium using the common CRYSTALS structs

ae8eb45

Kyber/Dilithium_PK::key_length() returns canonical parameter set ID

8bd26c7

For Kyber those are {512,768,1024} and for Dilithium {44,65,87} depending on the respective choice of parameter set.

reneme force-pushed the refactor/crystals_april branch from a476620 to 8bd26c7 Compare July 15, 2024 10:33

reneme merged commit 2fe687b into randombit:master Jul 15, 2024
39 checks passed

reneme deleted the refactor/crystals_april branch July 15, 2024 11:09

reneme mentioned this pull request Jul 18, 2024

Release: Botan 3.5.0 sehlen-bsi/botan-docs#225

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor: Have a common base for Kyber/Dilithium structures #4024

Refactor: Have a common base for Kyber/Dilithium structures #4024

reneme commented Apr 19, 2024 •

edited

Loading

coveralls commented Apr 22, 2024 •

edited

Loading

This comment was marked as resolved.

This comment was marked as outdated.

This comment was marked as outdated.

reneme left a comment

reneme Jun 20, 2024

reneme Jun 20, 2024

reneme Jun 20, 2024

reneme Jun 20, 2024

FAlbertDev Jun 24, 2024

reneme Jun 20, 2024

FAlbertDev Jun 25, 2024

reneme Jun 27, 2024 •

edited

Loading

reneme Jun 20, 2024

reneme Jun 20, 2024

FAlbertDev Jun 26, 2024

reneme Jun 20, 2024

randombit Jun 22, 2024

reneme Jun 25, 2024

FAlbertDev left a comment

FAlbertDev left a comment

FAlbertDev Jun 25, 2024

randombit left a comment

randombit commented Jun 30, 2024

reneme Jul 3, 2024

randombit left a comment

randombit commented Jul 10, 2024

reneme commented Jul 15, 2024

		// TODO: perhaps secure vector
		std::vector<T> m_coeffs_storage;

	void FrodoMatrix::pack(const FrodoKEMConstants& constants, StrongSpan<FrodoPackedMatrix> out) const {
	const size_t outlen = packed_size(constants);
	BOTAN_ASSERT_NOMSG(out.size() == outlen);

	size_t i = 0; // whole bytes already filled in
	size_t j = 0; // whole uint16_t already copied
	uint16_t w = 0; // the leftover, not yet copied
	uint8_t bits = 0; // the number of lsb in w

	while(i < outlen && (j < element_count() \|\| ((j == element_count()) && (bits > 0)))) {
	/*
	in: \| \| \|******\|******\|
	^
	j
	w : \| ****\|
	^
	bits
	out:\|\|\|\|\|\|\|\|\|* \|
	^^
	ib
	*/
	uint8_t b = 0; // bits in out[i] already filled in
	while(b < 8) {
	const uint8_t nbits = std::min(static_cast<uint8_t>(8 - b), bits);
	const uint16_t mask = static_cast<uint16_t>(1 << nbits) - 1;
	const auto t = static_cast<uint8_t>((w >> (bits - nbits)) & mask); // the bits to copy from w to out
	out[i] = out[i] + static_cast<uint8_t>(t << (8 - b - nbits));
	b += nbits;
	bits -= nbits;

	if(bits == 0) {
	if(j < element_count()) {
	w = m_elements.at(j);
	bits = static_cast<uint8_t>(constants.d());
	j++;
	} else {
	break; // the input vector is exhausted
	}
	}
	}
	if(b == 8) { // out[i] is filled in
	i++;
	}
	}
	}

Refactor: Have a common base for Kyber/Dilithium structures #4024

Refactor: Have a common base for Kyber/Dilithium structures #4024

Conversation

reneme commented Apr 19, 2024 • edited Loading

Module: pqcrystals

Module: kyber

Module: dilithium

Benchmark

coveralls commented Apr 22, 2024 • edited Loading

This comment was marked as resolved.

This comment was marked as outdated.

This comment was marked as outdated.

reneme left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

reneme Jun 27, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

FAlbertDev left a comment

Choose a reason for hiding this comment

FAlbertDev left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

randombit left a comment

Choose a reason for hiding this comment

randombit commented Jun 30, 2024

Choose a reason for hiding this comment

randombit left a comment

Choose a reason for hiding this comment

randombit commented Jul 10, 2024

reneme commented Jul 15, 2024

reneme commented Apr 19, 2024 •

edited

Loading

Module: `pqcrystals`

Module: `kyber`

Module: `dilithium`

coveralls commented Apr 22, 2024 •

edited

Loading

reneme Jun 27, 2024 •

edited

Loading