Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: Have a common base for Kyber/Dilithium structures #4024

Merged
merged 7 commits into from
Jul 15, 2024

Conversation

reneme
Copy link
Collaborator

@reneme reneme commented Apr 19, 2024

This is essentially a complete overhaul of the Kyber and Dilithium implementations aiming at as much shared code as possible between the two algorithms. The rough structure is outlined below. I split-up the changes into three commits, that could be separate pull requests, if needed.

Module: pqcrystals

  • Polynomial Structures
    This is a templated base for the polynomials (as well as poly vectors/matrices) used in both implementations. The base implementation shares math where possible and is pluggable where necessary. It also exposes its domain (NTT or "normal") in the type system. Also, it makes some initial optimization effort (mostly memory layout), but more work could be done here.
  • Polynomial Encoding/Decoding
    Both algorithms contain bit-packing algorithms to efficiently encode polynomial coefficients of various known value ranges. Both Kyber and Dilithium now share a generic and pluggable implementation of this.

Module: kyber

This now makes use of the pqcrystals module. The polynomial instantiation happens in kyber_polynomial.h (replacing the old kyber_structures.h). Most low-level "algorithms" (pseudo-codes of the spec) live in kyber_algos.cpp. The indcpa "K-PKE" scheme is implemented in kyber_keys.cpp and the cca2-secure wrapper can be found in kyber.cpp. Auxiliary stuff like constants and strong types are in other headers.

Note that this now contains almost no verbatim code from the reference implementation anymore.

Module: dilithium

Likewise, this uses the pqcrystals module (instantiating the polynomials in dilithium_polynomial.h (replacing the almost verbatim reference implementation code). The dilithium_algos.cpp is home to most low-level "algorithms" (as seen in the spec). dilithium.cpp houses the actual signature/verification algorihtms that bind it all together..

Care has been taken to have a consistent implementation structure between Kyber and Dilithium. Note, though, that Dilithium's flexibility might require some love before a clean integration of ML-KEM (similar to what happened to Kyber in #3887).

Benchmark

Despite not explicitly aiming to increase speed, ./botan speed shows nice gains for Kyber and no losses for Dilithium. It might be worth looking into why Kyber is so much faster, though. Feels suspicious. Edit: The new implementation pre-computes and stores the expanded public matrix A. So as long as one re-uses the PK_KEM_Encryptor for multiple encapsulations, one saves a lot of invocations to the XOF. That explains. 😏

GCC 13.2 vs. Clang 18.1 on Linux (in WSL)

Algorithm This PR (GCC) Master (GCC) This PR (Clang) Master (Clang)
Dilithium-4x4-r3 4602 sign/sec 4332 sign/sec 5358 sign/sec 4681 sign/sec
Dilithium-4x4-r3 28913 verify/sec 31303 verify/sec 37226 verify/sec 35459 verify/sec
Dilithium-4x4-AES-r3 5050 sign/sec 4689 sign/sec 6363 sign/sec 5007 sign/sec
Dilithium-4x4-AES-r3 27824 verify/sec 31180 verify/sec 38452 verify/sec 34830 verify/sec
Dilithium-6x5-r3 2648 sign/sec 2695 sign/sec 2893 sign/sec 2830 sign/sec
Dilithium-6x5-r3 19250 verify/sec 20999 verify/sec 26201 verify/sec 24643 verify/sec
Dilithium-6x5-AES-r3 2848 sign/sec 2850 sign/sec 3398 sign/sec 3033 sign/sec
Dilithium-6x5-AES-r3 18503 verify/sec 21491 verify/sec 26030 verify/sec 24351 verify/sec
Dilithium-8x7-r3 2389 sign/sec 2251 sign/sec 2853 sign/sec 2543 sign/sec
Dilithium-8x7-r3 12768 verify/sec 14551 verify/sec 18622 verify/sec 17211 verify/sec
Dilithium-8x7-AES-r3 2758 sign/sec 2490 sign/sec 3044 sign/sec 2970 sign/sec
Dilithium-8x7-AES-r3 13856 verify/sec 13881 verify/sec 18468 verify/sec 17629 verify/sec
Algorithm This PR (GCC) Master (GCC) This PR (Clang) Master (Clang)
Kyber-512-r3 69864 encrypt/sec 39354 encrypt/sec 59956 encrypt/sec 38009 encrypt/sec
Kyber-512-r3 51452 decrypt/sec 32157 decrypt/sec 48114 decrypt/sec 32214 decrypt/sec
Kyber-512-90s-r3 81350 encrypt/sec 39878 encrypt/sec 75388 encrypt/sec 44527 encrypt/sec
Kyber-512-90s-r3 57531 decrypt/sec 32740 decrypt/sec 57515 decrypt/sec 37094 decrypt/sec
Kyber-768-r3 50542 encrypt/sec 22767 encrypt/sec 45523 encrypt/sec 21180 encrypt/sec
Kyber-768-r3 37306 decrypt/sec 19418 decrypt/sec 36505 decrypt/sec 18593 decrypt/sec
Kyber-768-90s-r3 48522 encrypt/sec 22167 encrypt/sec 59785 encrypt/sec 24793 encrypt/sec
Kyber-768-90s-r3 34923 decrypt/sec 19070 decrypt/sec 46596 decrypt/sec 21613 decrypt/sec
Kyber-1024-r3 35053 encrypt/sec 14597 encrypt/sec 40132 encrypt/sec 14884 encrypt/sec
Kyber-1024-r3 26609 decrypt/sec 12840 decrypt/sec 32397 decrypt/sec 13338 decrypt/sec
Kyber-1024-90s-r3 36312 encrypt/sec 14504 encrypt/sec 43981 encrypt/sec 16118 encrypt/sec
Kyber-1024-90s-r3 27100 decrypt/sec 13312 decrypt/sec 34729 decrypt/sec 14435 decrypt/sec

@coveralls
Copy link

coveralls commented Apr 22, 2024

Coverage Status

coverage: 91.69% (-0.02%) from 91.711%
when pulling 8bd26c7 on Rohde-Schwarz:refactor/crystals_april
into 55007af on randombit:master.

@reneme

This comment was marked as resolved.

@reneme

This comment was marked as outdated.

@reneme reneme force-pushed the refactor/crystals_april branch from 64f9db8 to 066fc6d Compare May 31, 2024 13:24
@reneme reneme force-pushed the refactor/crystals_april branch 8 times, most recently from dd45e61 to c30b9fe Compare June 20, 2024 15:28
@reneme reneme marked this pull request as ready for review June 20, 2024 15:41
@reneme

This comment was marked as outdated.

Copy link
Collaborator Author

@reneme reneme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below are a few highlights and potential conversion starters of things I saw along the way. :)

Comment on lines 37 to 57
// This is a mitigation for a potential side channel called "KyberSlash".
// It implements the division by Q using a multiplication and a shift. Most
// compilers would generate similar code for such a division by a constant.
// Though, in some cases, compilers might use a variable-time int division,
// resulting in a potential side channel.
//
// See "Hacker's Delight" (Second Edition) by Henry S. Warren, Jr.
// Chapter 10-9 "Unsigned Division by Divisors >= 1"
auto divide_by_q = [](uint32_t n) -> unsigned_T {
static_assert(KyberConstants::Q == 3329);
BOTAN_DEBUG_ASSERT(n < (1 << 23));

// These constants work for all values that appear in Kyber with the
// greatest being 3328 * 2^11 + Q // 2 = 6,817,408 < 2**23 = 8,388,608.
constexpr uint64_t m = 2580335;
constexpr size_t p = 33;
return static_cast<unsigned_T>((n * m) >> p);
};

constexpr unsigned_T mask = (1 << d) - 1;
return divide_by_q((static_cast<uint32_t>(x) << d) + KyberConstants::Q / 2) & mask;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adapted the KyberSlash countermeasure to also work for the compression instances with d = 10 and d = 11. Not sure it's necessary but doesn't hurt either. I'd definitely be happy about some other sharp pair of eyes.

Comment on lines +217 to +218
// TODO: perhaps secure vector
std::vector<T> m_coeffs_storage;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Polynomials store their coefficients as a std::vector<>. Some of those store secret values and should therefore use secure_vector. Also, it might be beneficial to add first-level support for Polynomials (and friends) as Strong types. (Most likely this would be another pull request, though.)

Comment on lines 26 to 44
namespace detail {

constexpr auto as_byte_source(BufferSlicer& slicer) {
return [&](std::span<uint8_t> out) { slicer.copy_into(out); };
}

constexpr auto as_byte_source(Botan::XOF& xof) {
return [&](std::span<uint8_t> out) { xof.output(out); };
}

} // namespace detail

template <typename T>
concept byte_source =
requires(T& t) { requires std::invocable<decltype(detail::as_byte_source(t)), std::span<uint8_t>>; };
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is a top-level concept we might want to establish deeper in the library. Perhaps as an (internal) replacement for the DataSource class.

Namely: A thing that can provide a concrete number of bytes where I don't really know how many bytes will be needed. The source may be infinite or not.

Things that could fall in this abstraction may be:

  • the RandomNumberGenerator
  • a XOF
  • read from some buffer (think BufferSlicer)
  • from some I/O source (e.g. the network)

In Kyber/Dithithium this is used to either sample some random polynomial from a XOF or to unmarshal it from a bitpacked encoding.

Other use cases may be:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a technical perspective, one might even implement it like shown here: As a C++20 concept that tries to build a thin wrapper around "the source". That way, new data sources could just hook themselves into this "helper" and decide for themselves to be a "data source" or not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool idea 👍. I'm not sure if we want to commit to a pure concept construction, though. In some cases, it may be more convenient to have a class like DataSource. We could still extend the DataSource class to fulfill the respective concept if one wants to juggle with templates.

Comment on lines 213 to 217
// TODO: Remove the need for the montgomery transformation
//
// -> When calculating A*s below, A is not in montgomery form, but s is. The
// operation uses fqmul internally, which performs a montgomery reduction.
auto A = montgomery(kyber_sample_matrix(rho, false /* not transposed */, mode));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was different in the reference implementation. @FAlbertDev we did look into this a while ago, I feel we should take this up again and get to the bottom of it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just investigated the usage of Montgomery in Kyber. First, a short recap on Montgomery:
The main observation is that every Montgomery multiplication $c=a \cdot b$ adds a factor of $R^{-1}$ to the result c where $R$ is the Montgomery factor (i.e., we compute $a \cdot b \cdot R^{-1}$) At some point we need to multiply $R$ to neutralize this $R^{-1}$ factor. Therefore, we could multiply $R$ to either $a$ or $b$. However, we can also multiply $R$ to the result $c$.
In our Kyber and Dilithium code, the following operations add or remove the Montgomery factor.

  • the function montgomery adds a factor $R$
  • the function inv_ntt adds a factor $R$. Note that ntt doesn't add a Montgomery factor.
  • As already said, a Montgomery multiplication adds a factor $R^{-1}$ to each poly, vector, or matrix element. This holds for all multiplication types, i.e., polynomial-polynomial, vector-vector, matrix-vector multiplication, etc.

For encryption and decryption, this setup works well since there is always one multiplication (adding factor $R^{-1}$) followed by inv_ntt (adding factor $R$). This does not work here since we store the value t in NTT domain and, therefore, do not call inv_ntt. We have a factor of $R^{-1}$ to eliminate. Currently, we do that by adding the factor $R$ to the matrix beforehand using montgomery. Instead, I propose to add this factor to t after the multiplication. The fewer elements multiplied by $R$, the faster it should be.

   auto A = kyber_sample_matrix(rho, false /* not transposed */, mode);
   auto s = ntt(ps.sample_polynomial_vector_cbd_eta1());
   const auto e = ntt(ps.sample_polynomial_vector_cbd_eta1());

   auto t = montgomery(A * s);
   t += e;
   t.reduce();

   // End Algorithm 12 ---------------------------

Copy link
Collaborator Author

@reneme reneme Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for double-checking and the detailed write-up! It would probably be really nice to track the montgomery parameter in the type system as well, but it does feel like overkill.

Also: by applying montgomery() to the t vector, we can remove the overload for PolynomialMatrix altogether. :)

Comment on lines +25 to +30
using KyberPolyNTT = Botan::CRYSTALS::Polynomial<KyberPolyTraits, Botan::CRYSTALS::Domain::NTT>;
using KyberPolyVecNTT = Botan::CRYSTALS::PolynomialVector<KyberPolyTraits, Botan::CRYSTALS::Domain::NTT>;
using KyberPolyMat = Botan::CRYSTALS::PolynomialMatrix<KyberPolyTraits>;

using KyberPoly = Botan::CRYSTALS::Polynomial<KyberPolyTraits, Botan::CRYSTALS::Domain::Normal>;
using KyberPolyVec = Botan::CRYSTALS::PolynomialVector<KyberPolyTraits, Botan::CRYSTALS::Domain::Normal>;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before: it would be good if these could be the basis for strong types to disambiguate them in interfaces E.g. to replace the multiple poly_pack_t0, ..._t1, ... methods in dilithium_algos.cpp with an overloaded poly_pack() that just "does the right thing" ™ based on the strong type.

src/lib/pubkey/pqcrystals/pqcrystals_helpers.h Outdated Show resolved Hide resolved
src/lib/xof/xof.h Show resolved Hide resolved
Comment on lines +63 to +75
enum DilithiumTau : uint32_t { _39 = 39, _49 = 49, _60 = 60 };

enum DilithiumLambda : uint32_t { _128 = 128, _192 = 192, _256 = 256 };

enum DilithiumGamma1 : uint32_t { ToThe17th = (1 << 17), ToThe19th = (1 << 19) };

enum DilithiumGamma2 : uint32_t { Qminus1DevidedBy88 = (Q - 1) / 88, Qminus1DevidedBy32 = (Q - 1) / 32 };

enum DilithiumEta : uint32_t { _2 = 2, _4 = 4 };

enum DilithiumBeta : uint32_t { _78 = 78, _196 = 196, _120 = 120 };

enum DilithiumOmega : uint32_t { _80 = 80, _55 = 55, _75 = 75 };
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dilithium has many integer constants that change depending on the instantiation of the algorithm. I found it worthwhile to restrict them with explicit enums. I'm okay with this concept, also given the restricted (algorithm-internal) scope. But I feel there must be a better way for this. :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like this idea. While the implied switch-cases are annoying, they guarantee we do not forget a value.

dilithium_poly_pack_t0(p, stuffer);
}

BOTAN_ASSERT_NOMSG(stuffer.full());
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those became ubiquitous and at this point I do start wondering whether we should just put that in the destructors of Stuffer and Slicer (perhaps explicitly disengagable). What do y'all think?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do that after 3.5.0 is cut, to avoid unwanted surprises.

@reneme reneme force-pushed the refactor/crystals_april branch 2 times, most recently from cf16bf3 to 3fc00ec Compare June 21, 2024 09:40
@reneme reneme requested a review from FAlbertDev June 21, 2024 14:28
@reneme reneme force-pushed the refactor/crystals_april branch from 3fc00ec to 8dcd72b Compare June 21, 2024 14:38
Copy link
Collaborator

@FAlbertDev FAlbertDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only looked into dilithium and the crystals module so far. The other sections (including Kyber) will come tomorrow.
For algorithms of this high complexity, the new structure looks very well-organized and clear :)
I've only found some minor nits so far.
(Btw sorry if I also marked lots of removed code. The VSCode extension for GitHub caused this)

src/lib/pubkey/dilithium/dilithium_common/dilithium.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/dilithium/dilithium_common/dilithium.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/dilithium/dilithium_common/dilithium.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/dilithium/dilithium_common/dilithium.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/dilithium/dilithium_common/dilithium.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/pqcrystals/pqcrystals.h Outdated Show resolved Hide resolved
src/lib/pubkey/pqcrystals/pqcrystals.h Outdated Show resolved Hide resolved
src/lib/pubkey/pqcrystals/pqcrystals_helpers.h Outdated Show resolved Hide resolved
@reneme reneme force-pushed the refactor/crystals_april branch 2 times, most recently from 1a3ac57 to ae4298c Compare June 26, 2024 06:44
Copy link
Collaborator

@FAlbertDev FAlbertDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final review iteration.
I really like your work. IMO it's the better reference implementation ;)

src/lib/pubkey/kyber/kyber_common/kyber.cpp Outdated Show resolved Hide resolved
Comment on lines 213 to 217
// TODO: Remove the need for the montgomery transformation
//
// -> When calculating A*s below, A is not in montgomery form, but s is. The
// operation uses fqmul internally, which performs a montgomery reduction.
auto A = montgomery(kyber_sample_matrix(rho, false /* not transposed */, mode));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just investigated the usage of Montgomery in Kyber. First, a short recap on Montgomery:
The main observation is that every Montgomery multiplication $c=a \cdot b$ adds a factor of $R^{-1}$ to the result c where $R$ is the Montgomery factor (i.e., we compute $a \cdot b \cdot R^{-1}$) At some point we need to multiply $R$ to neutralize this $R^{-1}$ factor. Therefore, we could multiply $R$ to either $a$ or $b$. However, we can also multiply $R$ to the result $c$.
In our Kyber and Dilithium code, the following operations add or remove the Montgomery factor.

  • the function montgomery adds a factor $R$
  • the function inv_ntt adds a factor $R$. Note that ntt doesn't add a Montgomery factor.
  • As already said, a Montgomery multiplication adds a factor $R^{-1}$ to each poly, vector, or matrix element. This holds for all multiplication types, i.e., polynomial-polynomial, vector-vector, matrix-vector multiplication, etc.

For encryption and decryption, this setup works well since there is always one multiplication (adding factor $R^{-1}$) followed by inv_ntt (adding factor $R$). This does not work here since we store the value t in NTT domain and, therefore, do not call inv_ntt. We have a factor of $R^{-1}$ to eliminate. Currently, we do that by adding the factor $R$ to the matrix beforehand using montgomery. Instead, I propose to add this factor to t after the multiplication. The fewer elements multiplied by $R$, the faster it should be.

   auto A = kyber_sample_matrix(rho, false /* not transposed */, mode);
   auto s = ntt(ps.sample_polynomial_vector_cbd_eta1());
   const auto e = ntt(ps.sample_polynomial_vector_cbd_eta1());

   auto t = montgomery(A * s);
   t += e;
   t.reduce();

   // End Algorithm 12 ---------------------------

src/lib/pubkey/kyber/kyber_common/kyber_algos.h Outdated Show resolved Hide resolved
src/lib/pubkey/kyber/kyber_common/kyber_polynomial.h Outdated Show resolved Hide resolved
src/lib/rng/rng.h Show resolved Hide resolved
src/tests/test_crystals.cpp Outdated Show resolved Hide resolved
src/tests/test_crystals.cpp Outdated Show resolved Hide resolved
Copy link
Owner

@randombit randombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review still need to look at changes in dilithium and kyber

src/lib/pubkey/pqcrystals/pqcrystals_encoding.h Outdated Show resolved Hide resolved
src/lib/pubkey/pqcrystals/pqcrystals_helpers.h Outdated Show resolved Hide resolved
src/tests/test_crystals.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/pqcrystals/info.txt Show resolved Hide resolved
src/lib/pubkey/kyber/kyber_common/kyber_helpers.h Outdated Show resolved Hide resolved
src/lib/pubkey/kyber/kyber_common/kyber_helpers.h Outdated Show resolved Hide resolved
@reneme reneme force-pushed the refactor/crystals_april branch 4 times, most recently from e6a97fd to e9863c1 Compare June 28, 2024 10:40
@randombit
Copy link
Owner

Everything so far looks good but I'm not enthused about dropping big refactors just before a release so pushing this to 3.6.0

@randombit randombit added this to the Botan 3.6.0 milestone Jun 30, 2024
reneme added a commit to sehlen-bsi/botan-docs that referenced this pull request Jul 2, 2024
See: randombit/botan#3887

This is a fairly minimal adaption, once the full refactoring
is merged, we'll have to rewrite this more substantially. That won't
happen before Botan 3.5.0, though.

See also: randombit/botan#4024
@reneme reneme mentioned this pull request Jul 2, 2024
7 tasks
Comment on lines +115 to +117
template <int32_t range, crystals_trait PolyTrait, Domain D, coeff_map_fn<typename PolyTrait::T> MapFnT>
constexpr void pack(const Polynomial<PolyTrait, D>& p, BufferStuffer& stuffer, MapFnT map) {
using trait = BitPackingTrait<range, PolyTrait>;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the gist of this be useful beyond the CRYSTALS scope? E.g. FrodoKEM has a pack/unpack routine that -- on first glance at least -- looks like it could use it as well. @randombit perhaps there are more examples?

void FrodoMatrix::pack(const FrodoKEMConstants& constants, StrongSpan<FrodoPackedMatrix> out) const {
const size_t outlen = packed_size(constants);
BOTAN_ASSERT_NOMSG(out.size() == outlen);
size_t i = 0; // whole bytes already filled in
size_t j = 0; // whole uint16_t already copied
uint16_t w = 0; // the leftover, not yet copied
uint8_t bits = 0; // the number of lsb in w
while(i < outlen && (j < element_count() || ((j == element_count()) && (bits > 0)))) {
/*
in: | | |********|********|
^
j
w : | ****|
^
bits
out:|**|**|**|**|**|**|**|**|* |
^^
ib
*/
uint8_t b = 0; // bits in out[i] already filled in
while(b < 8) {
const uint8_t nbits = std::min(static_cast<uint8_t>(8 - b), bits);
const uint16_t mask = static_cast<uint16_t>(1 << nbits) - 1;
const auto t = static_cast<uint8_t>((w >> (bits - nbits)) & mask); // the bits to copy from w to out
out[i] = out[i] + static_cast<uint8_t>(t << (8 - b - nbits));
b += nbits;
bits -= nbits;
if(bits == 0) {
if(j < element_count()) {
w = m_elements.at(j);
bits = static_cast<uint8_t>(constants.d());
j++;
} else {
break; // the input vector is exhausted
}
}
}
if(b == 8) { // out[i] is filled in
i++;
}
}
}

Copy link
Owner

@randombit randombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very good. I left one bikeshed suggestion that to me feels a little cleaner - but it's marginal, feel free to ignore if you disagree.

src/lib/pubkey/kyber/kyber_common/kyber_algos.h Outdated Show resolved Hide resolved
@randombit
Copy link
Owner

Final comment, could use some history cleanup before merge

reneme and others added 7 commits July 15, 2024 12:33
Most notably, this is an abstract implementation to handle CRYSTALS
polynomials, poly vectors and poly matrices. Also, a generic
implementation for bit-packed encoding/decoding of coefficients
useful for both Kyber and Dilithium.

Co-Authored-By: Fabian Albert <[email protected]>
t1 is part of the public key and thus independent of any other verification input.
Precomputing it saves about 20% of verification runtime when performing more than
a single verification with the same Botan::PK_Verifier.

Co-Authored-By: Fabian Albert <[email protected]>
For Kyber those are {512,768,1024} and for Dilithium {44,65,87}
depending on the respective choice of parameter set.
@reneme reneme force-pushed the refactor/crystals_april branch from a476620 to 8bd26c7 Compare July 15, 2024 10:33
@reneme
Copy link
Collaborator Author

reneme commented Jul 15, 2024

Final comment, could use some history cleanup before merge

Rebased to master, cleaned up history and built the bikeshed. Waiting for CI, then finally drawing a line here. 🥳

@reneme reneme merged commit 2fe687b into randombit:master Jul 15, 2024
39 checks passed
@reneme reneme deleted the refactor/crystals_april branch July 15, 2024 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants