Add functionality for compile time decoding of word constants #3962

randombit · 2024-03-31T23:02:07Z

This is immediately useful for the NIST reduction code which has precomputed tables of constant values but likely will prove useful elsewhere as use of constexpr/constinit expands in the mp layer.

randombit · 2024-04-01T00:34:16Z

@reneme I imagine you would appreciate this and may have some suggestions since you're a lot better at this C++ compile time stuff 😅

coveralls · 2024-04-01T00:48:17Z

coverage: 92.072% (-0.02%) from 92.087%
when pulling eee0a0c on jack/word-literals
into aa12651 on master.

reneme · 2024-04-02T17:45:29Z

Mhh, admittedly, I got somewhat nerd-sniped by that. Anyway: I suggest splitting this up into two functions:

decode the hex string into bytes
decode the byte buffer into words using load_be

That results in a bit more boilerplate but is easier to understand in my opinion. Also, it reuses existing functionality and has the potential to replace the hex-to-bytes function by a constexpr-version of Botan::hex_decode() down the lane.

consteval void decode_from_hex(std::span<uint8_t> out_bytes, std::span<const char> in_hex) {
   auto decode = [](char c) -> uint8_t {
      if(c >= '0' && c <= '9') {
         return static_cast<uint8_t>(c - '0');
      } else if(c >= 'a' && c <= 'f') {
         return static_cast<uint8_t>(c - 'a' + 10);
      } else if(c >= 'A' && c <= 'F') {
         return static_cast<uint8_t>(c - 'A' + 10);
      } else {
         throw std::runtime_error("Invalid hex character: " + std::to_string(c));
      }
   };

   // If the hex string has an odd length, decode the first character first,
   // implicitly adding a zero hex byte as a prefix.
   assert((in_hex.size() + 1) / 2 == out_bytes.size());
   if(in_hex.size() % 2 != 0) {
      out_bytes[0] = decode(in_hex[0]);
      out_bytes = out_bytes.subspan(1);
      in_hex = in_hex.subspan(1);
   }

   // Decode the hex string that is now guaranteed to have an even length.
   assert(out_bytes.size() * 2 == in_hex.size());
   for(auto& byte : out_bytes) {
      const uint8_t hi = decode(in_hex[0]);
      const uint8_t lo = decode(in_hex[1]);
      byte = (hi << 4) | lo;
      in_hex = in_hex.subspan(2);
   }
}

template <WordType W, size_t N>
consteval auto hex_to_words(const char (&hex_string)[N]) {
   constexpr size_t hex_len = N - 1;               // Char count includes null terminator which we ignore
   constexpr size_t byte_len = (hex_len + 1) / 2;  // hex_string might have an odd length
   constexpr size_t word_len = (byte_len + sizeof(W) - 1) / sizeof(W);
   constexpr size_t zero_prefix_len = word_len * sizeof(W) - byte_len;
   constexpr size_t padded_byte_len = byte_len + zero_prefix_len;

   std::array<uint8_t, padded_byte_len> decoded_bytes = {0};
   decode_from_hex(std::span<uint8_t>{decoded_bytes}.subspan<zero_prefix_len, byte_len>(),
                   std::span<const char>{hex_string}.first<hex_len>());

   // Decode the (potentially zero-padded) big-endian byte string into "little-
   // endian" limbs. Words are decoded from the right of the byte string.
   static_assert(decoded_bytes.size() % sizeof(W) == 0, "Padded bytes is a multiple of word size");
   static_assert(decoded_bytes.size() / sizeof(W) == word_len, "Padded bytes is the right size");
   std::array<W, word_len> words = {0};
   std::span<const uint8_t> unread_bytes(decoded_bytes);
   for(auto& word : words) {
      word = load_be(unread_bytes.last<sizeof(W)>());
      unread_bytes = unread_bytes.first(unread_bytes.size() - sizeof(W));
   }

   return words;
}

Another new concept: I'm using vanilla assert() here, which is allowed in constexpr contexts (since C++14, I believe) and simply produces a compiler error if it doesn't check out. Perhaps that could be wrapped in the botan-macro. But then again, we're also using static_assert as is, no?

reneme · 2024-04-02T18:01:00Z

I pushed the above suggestion as ~~bd65c95~~ ce80c8e to see what CI has to say about it. Note that I didn't sign this commit, as I forget the appropriate USB-C adapter for my YubiKey at home. 😡

randombit · 2024-04-02T19:15:32Z

a) assert is absolutely no good. It is safe in consteval ONLY; if you happen to write it in a constexpr context and then call that function at runtime you can cause a runtime abort. Also NDEBUG disables the assertion entirely, which is very bad. Fortunately, all of these issues can be resolved by using BOTAN_ASSERT instead which has the same effects (in consteval context), is safe (throws an exception rather that aborting) if accidentally called at runtime, and doesn't get disabled by random build systems setting macros. [static_assert does not have any of these problems]

b) The provision for odd sized hex character seems to me relevant for big integers (if rarely) but not at all helpful, in fact more of a good way to get a subtle bug, for generic binary hex strings.

c) I can certainly see the utility of a compile time hex decoding for binary strings (I use the analogous hex! macro from Rust's hex-literal crate all the time). However such a function would imo return a std::array so you can use it as is to initialize constinit variables - writing to a span isn't useful, except in this one weird case with odd sized hex strings.

This is imo one of those situations where there are two things that seem quite similar, such that they could be solved in a combined way, but the combination proves more complicated than solving each individually due to handling the subtle differences.

reneme · 2024-04-02T21:19:59Z

assert is absolutely no good. It is safe in consteval ONLY

Agreed. There it is quite useful though. But if BOTAN_ASSERT works just fine, there's no real argument for it.

The provision for odd sized hex character seems to me relevant for big integers (if rarely) but not at all helpful, in fact more of a good way to get a subtle bug, for generic binary hex strings.

Yep, agree on that as well. The out-param is a crutch for this specific use case. For a generic hex_decode we would want to enforce an even-length hex string. Perhaps a std::copy or the ranges-equivalent thereof would be better suited, for this special case. Generally, I do "like" the out-param pattern for its ability to avoid copies and allocations when building serializations, despite the awkward usage. But at compile time that is much less of an argument.

reneme · 2024-04-02T22:03:50Z

Here's another suggestion: 1fea96e, moving the special treatment for the odd-length hex string into hex_to_words() and enforcing even-length hex strings in decode_from_hex(). Also, the latter returns a std::array<> now.

For the record: I'm somewhat surprised that std::ranges::copy doesn't do an out-of-range check when copying, but just accepts an output-iterator. Also, the CIFuzz build configuration doesn't seem to provide std::ranges::copy, yet. 😢

randombit · 2024-04-02T22:47:29Z

For the record: I'm somewhat surprised that std::ranges::copy doesn't do an out-of-range check when copying, but just accepts an output-iterator.

🙀

Also, the CIFuzz build configuration doesn't seem to provide std::ranges::copy, yet. 😢

Yeah this has been a problem for a while, I just poked on google/oss-fuzz#11116

reneme · 2024-04-03T07:11:11Z

Our range-based copy_mem() helper claims to be constexpr, but it uses std::memmove(), which isn't. Hence, I extended our helper and used that instead of std::ranges::copy. eee0a0c

randombit · 2024-04-03T14:56:44Z

TBH I find the new version much harder to understand than what I did originally.

reneme · 2024-04-03T21:04:08Z

Please revert to it then. No hard feelings at all! I think there's value in playing around with this to get a grasp on which abstractions work and which don't. I totally see how the buffer shuffling in my suggestion adds too much complexity for the task its supposed to achieve.

This is immediately useful for the NIST reduction code which has precomputed tables of constant values but likely will prove useful elsewhere as use of constexpr/constinit expands in the mp layer.

reneme · 2024-04-04T07:26:16Z

Nice. I like the shift_left<4>() addition. Makes it much clearer IMO.

randombit force-pushed the jack/word-literals branch 4 times, most recently from 3134d8d to 2a71a7a Compare April 1, 2024 00:20

randombit requested a review from reneme April 1, 2024 00:31

reneme force-pushed the jack/word-literals branch from bd65c95 to ce80c8e Compare April 2, 2024 18:02

reneme force-pushed the jack/word-literals branch from ce80c8e to 1fea96e Compare April 2, 2024 22:01

randombit and others added 3 commits April 3, 2024 18:54

Add functionality for compile time decoding of word constants

accf55a

This is immediately useful for the NIST reduction code which has precomputed tables of constant values but likely will prove useful elsewhere as use of constexpr/constinit expands in the mp layer.

Fix copy_mem to be properly constexpr

736e6d4

Extract out shift_left consteval function

63722b8

randombit force-pushed the jack/word-literals branch from eee0a0c to 63722b8 Compare April 3, 2024 22:56

randombit merged commit 64676be into master Apr 4, 2024
42 checks passed

randombit deleted the jack/word-literals branch April 4, 2024 00:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add functionality for compile time decoding of word constants #3962

Add functionality for compile time decoding of word constants #3962

randombit commented Mar 31, 2024

randombit commented Apr 1, 2024

coveralls commented Apr 1, 2024 •

edited

Loading

reneme commented Apr 2, 2024

reneme commented Apr 2, 2024 •

edited

Loading

randombit commented Apr 2, 2024

reneme commented Apr 2, 2024

reneme commented Apr 2, 2024 •

edited

Loading

randombit commented Apr 2, 2024

reneme commented Apr 3, 2024 •

edited

Loading

randombit commented Apr 3, 2024

reneme commented Apr 3, 2024 •

edited

Loading

reneme commented Apr 4, 2024

Add functionality for compile time decoding of word constants #3962

Add functionality for compile time decoding of word constants #3962

Conversation

randombit commented Mar 31, 2024

randombit commented Apr 1, 2024

coveralls commented Apr 1, 2024 • edited Loading

reneme commented Apr 2, 2024

reneme commented Apr 2, 2024 • edited Loading

randombit commented Apr 2, 2024

reneme commented Apr 2, 2024

reneme commented Apr 2, 2024 • edited Loading

randombit commented Apr 2, 2024

reneme commented Apr 3, 2024 • edited Loading

randombit commented Apr 3, 2024

reneme commented Apr 3, 2024 • edited Loading

reneme commented Apr 4, 2024

coveralls commented Apr 1, 2024 •

edited

Loading

reneme commented Apr 2, 2024 •

edited

Loading

reneme commented Apr 2, 2024 •

edited

Loading

reneme commented Apr 3, 2024 •

edited

Loading

reneme commented Apr 3, 2024 •

edited

Loading