Kyber optimizations #3387

randombit · 2023-03-16T00:43:40Z

In aggregate, these changes improve Kyber performance by between 1.5x and 2.5x.

codecov-commenter · 2023-03-16T01:22:13Z

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (8460ca2) 88.12% compared to head (1123636) 88.13%.

❗ Current head 1123636 differs from pull request most recent head a7d7457. Consider uploading reports for the commit a7d7457 to get more accurate results


Additional details and impacted files

@@           Coverage Diff           @@
##           master    #3387   +/-   ##
=======================================
  Coverage   88.12%   88.13%           
=======================================
  Files         617      616    -1     
  Lines       70331    70303   -28     
  Branches     6985     6985           
=======================================
- Hits        61978    61960   -18     
+ Misses       5424     5401   -23     
- Partials     2929     2942   +13

Impacted Files	Coverage Δ
src/lib/pubkey/kyber/kyber_common/kyber.cpp	`96.96% <100.00%> (-0.17%)`	⬇️
src/tests/test_kyber.cpp	`95.37% <100.00%> (ø)`

... and 14 files with indirect coverage changes


reneme

Nice! Thanks for taking this on. I have added a few minor suggestions.

Out of curiosity: Did you go through the code looking for optimization opportunities or was this the result of some profiling?

src/lib/base/buf_comp.h

src/lib/pubkey/kyber/kyber/kyber_modern.h

src/lib/pubkey/kyber/kyber_common/kyber_symmetric_primitives.h

src/lib/pubkey/kyber/kyber_90s/kyber_90s.h

src/lib/pubkey/kyber/kyber_common/kyber.cpp

reneme · 2023-03-16T07:54:49Z

src/lib/pubkey/kyber/kyber_common/kyber.cpp

+           m_polynomials(std::move(polynomials)),
+           m_seed(std::move(seed)),
+           m_public_key_bits_raw(concat(m_polynomials.to_bytes<std::vector<uint8_t>>(), m_seed)),
+           m_H_public_key_bits_raw(unlock(m_mode.H()->process(m_public_key_bits_raw)))


Side-note: Maybe a Hash_Function::process() with a templated output container would be great to avoid the copy. Similar to the new RandomNumberGenerator::random_vec():

botan/src/lib/rng/rng.h

Lines 202 to 212 in c810e6c

    template<typename T = secure_vector<uint8_t>>
       requires(concepts::contiguous_container<T> &&
                concepts::resizable_container<T> &&
                concepts::default_initializable<T> &&
                std::same_as<typename T::value_type, uint8_t>)
    T random_vec(size_t bytes)
       {
       T result;
       random_vec(result, bytes);
       return result;
       }

Yes, I definitely had that in mind for a follow-up.

randombit · 2023-03-16T12:18:16Z

Did you go through the code looking for optimization opportunities or was this the result of some profiling?

This was sparked by seeing the PR that adds X25519+Kyber key exchange for TLS (ziglang/zig#14920), where the author quotes numbers for Kyber that were much better than those reported by botan speed. I think part of that is that his machine is faster, since his X25519 numbers are also faster.

Using his X25519 numbers versus what I see locally as a scale, Kyber encryption is now as fast as the Zig implementation. However decryption is still significantly slower. I'm still confused on that point, because that PR quotes decryption as faster than encryption. But indcpa_enc is the vast majority of the cost for encrypt or decrypt, and decrypt additionally has to do indcpa_dec. Perhaps there are some additional decryption-side precomputations that we are missing.

Edit: I realized I did not actually answer your question. This work was all based on profiling with valgrind's callgrind tool plus qcachegrind for visualization.

bwesterb · 2023-03-16T12:36:56Z

because that PR quotes decryption as faster than encryption. But indcpa_enc is the vast majority of the cost for encrypt or decrypt, and decrypt additionally has to do indcpa_dec. Perhaps there are some additional decryption-side precomputations that we are missing.

Encapsulation recomputes H(pk) whereas decapsulation doesn't. Yeah, Kyber is so fast that such a short hash makes a difference.
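The effect described here, and the "Cache the hash of the public key" commit in this PR, can be illustrated with a small sketch. The names `ToyPublicKey`, `toy_H`, and the two `toy_encaps_*` functions are hypothetical, and a 64-bit FNV-1a value stands in for the real hash of the public key. The sketch only shows where the hash is computed: once at key construction versus once per operation.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Counts how often the (toy) hash runs, so the saving is observable.
inline int g_hash_calls = 0;

// Toy stand-in for H(pk); NOT the hash Kyber actually uses.
inline std::uint64_t toy_H(const std::vector<std::uint8_t>& data) {
   ++g_hash_calls;
   std::uint64_t h = 0xCBF29CE484222325ULL;
   for(const std::uint8_t b : data) {
      h ^= b;
      h *= 0x100000001B3ULL;
   }
   return h;
}

class ToyPublicKey {
   public:
      // The hash is computed exactly once, when the key object is built.
      explicit ToyPublicKey(std::vector<std::uint8_t> raw) :
            m_raw(std::move(raw)), m_H_raw(toy_H(m_raw)) {
         assert(!m_raw.empty());
      }

      const std::vector<std::uint8_t>& raw() const { return m_raw; }
      std::uint64_t H_raw() const { return m_H_raw; }

   private:
      // Declaration order matters: m_raw is initialised before m_H_raw.
      std::vector<std::uint8_t> m_raw;
      std::uint64_t m_H_raw;
};

// Recomputes H(pk) on every call (the slower pattern).
inline std::uint64_t toy_encaps_recompute(const ToyPublicKey& pk, std::uint64_t m) {
   return toy_H(pk.raw()) ^ m;
}

// Uses the value cached in the key object.
inline std::uint64_t toy_encaps_cached(const ToyPublicKey& pk, std::uint64_t m) {
   return pk.H_raw() ^ m;
}
```

Both functions return the same value; the cached variant simply avoids one hash invocation per call, which matters when the rest of the operation is as cheap as it is in Kyber.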

bwesterb · 2023-03-16T12:43:25Z

You're missing this trick which the Zig implementation uses.


reneme · 2023-03-17T09:04:30Z

@randombit I took the liberty to adapt this to the now merged #3297 and #3294. Due to merge-conflicts I needed to force push.

randombit · 2023-03-17T12:07:49Z

Thanks @reneme

I read @bwesterb's paper on reducing the number of Barrett reductions in the inverse NTT, seems like a nice win, I'll take it up in another PR, maybe this weekend.
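For readers unfamiliar with the term, a single signed Barrett reduction modulo the Kyber prime q = 3329 can be sketched as below. This is written in the style commonly seen in public Kyber code and is an illustration only: it is not Botan's implementation and it does not show the technique from the paper, which is about needing fewer of these reductions. The names `KyberQ`, `barrett_reduce`, and `barrett_self_check` are chosen for this example. The right shift of a negative value is arithmetic by definition in C++20.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int32_t KyberQ = 3329;

// Returns the representative of a (mod q) in [-(q-1)/2, (q-1)/2]
// using one multiplication, one shift, one multiply, and one subtract.
constexpr std::int16_t barrett_reduce(std::int16_t a) {
   // v approximates 2^26 / q, rounded to nearest.
   constexpr std::int32_t v = ((1 << 26) + KyberQ / 2) / KyberQ;
   // t approximates round(a / q); adding 2^25 rounds instead of truncating.
   const std::int32_t t = (v * a + (1 << 25)) >> 26;
   return static_cast<std::int16_t>(a - t * KyberQ);
}

// Exhaustive check over all 16-bit inputs: the result is congruent to the
// input modulo q and lies in the centered range.
inline void barrett_self_check() {
   for(int a = -32768; a <= 32767; ++a) {
      const int r = barrett_reduce(static_cast<std::int16_t>(a));
      assert(r >= -(KyberQ - 1) / 2 && r <= (KyberQ - 1) / 2);
      assert((a - r) % KyberQ == 0);
   }
}
```

Each reduction is cheap, but an NTT applies it to every coefficient in several layers, which is why skipping reductions where the values provably stay in range can add up.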

reneme approved these changes Mar 16, 2023


randombit force-pushed the jack/kyber-opt branch 2 times, most recently from b90c027 to 1123636 Compare March 16, 2023 22:03

randombit and others added 7 commits March 17, 2023 09:50

Cache symmetric objects for Kyber

33ffcb0

In Kyber, batch reading from the XOF

5c5e316

This amortizes the overhead of the virtual call and the stream cipher's buffering logic.

In Kyber use span instead of unnecessary allocations

c6fa087

Co-authored-by: René Meusel <[email protected]>

Cache the hash of the public key instead of recomputing each time

88796d2

New XOF interface for Kyber

fb2f7d6

Co-authored-by: René Meusel <[email protected]>

Cache Kyber at matrix in the operations struct

7bae133

take advantage of GH #3297

a7d7457

reneme force-pushed the jack/kyber-opt branch from 1123636 to a7d7457 Compare March 17, 2023 09:03

randombit merged commit eb51e40 into master Mar 17, 2023

randombit deleted the jack/kyber-opt branch March 17, 2023 12:09

reneme mentioned this pull request Mar 20, 2023

Chore: Iterate the Buffered_Computation API #3396

Merged

reneme mentioned this pull request Jan 18, 2024

PQC: ML-KEM #3893

Merged

7 tasks
