Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functionality for compile time decoding of word constants #3962

Merged
merged 3 commits into from
Apr 4, 2024

Conversation

randombit
Copy link
Owner

This is immediately useful for the NIST reduction code which has precomputed tables of constant values but likely will prove useful elsewhere as use of constexpr/constinit expands in the mp layer.

@randombit randombit force-pushed the jack/word-literals branch 4 times, most recently from 3134d8d to 2a71a7a Compare April 1, 2024 00:20
@randombit randombit requested a review from reneme April 1, 2024 00:31
@randombit
Copy link
Owner Author

@reneme I imagine you would appreciate this and may have some suggestions since you're a lot better at this C++ compile time stuff 😅

@coveralls
Copy link

coveralls commented Apr 1, 2024

Coverage Status

coverage: 92.072% (-0.02%) from 92.087%
when pulling eee0a0c on jack/word-literals
into aa12651 on master.

@reneme
Copy link
Collaborator

reneme commented Apr 2, 2024

Mhh, admittedly, I got somewhat nerd-sniped by that. Anyway: I suggest splitting this up into two functions:

  1. decode the hex string into bytes
  2. decode the byte buffer into words using load_be

That results in a bit more boilerplate but is easier to understand in my opinion. Also, it reuses existing functionality and has the potential to replace the hex-to-bytes function by a constexpr-version of Botan::hex_decode() down the lane.

consteval void decode_from_hex(std::span<uint8_t> out_bytes, std::span<const char> in_hex) {
   auto decode = [](char c) -> uint8_t {
      if(c >= '0' && c <= '9') {
         return static_cast<uint8_t>(c - '0');
      } else if(c >= 'a' && c <= 'f') {
         return static_cast<uint8_t>(c - 'a' + 10);
      } else if(c >= 'A' && c <= 'F') {
         return static_cast<uint8_t>(c - 'A' + 10);
      } else {
         throw std::runtime_error("Invalid hex character: " + std::to_string(c));
      }
   };

   // If the hex string has an odd length, decode the first character first,
   // implicitly adding a zero hex byte as a prefix.
   assert((in_hex.size() + 1) / 2 == out_bytes.size());
   if(in_hex.size() % 2 != 0) {
      out_bytes[0] = decode(in_hex[0]);
      out_bytes = out_bytes.subspan(1);
      in_hex = in_hex.subspan(1);
   }

   // Decode the hex string that is now guaranteed to have an even length.
   assert(out_bytes.size() * 2 == in_hex.size());
   for(auto& byte : out_bytes) {
      const uint8_t hi = decode(in_hex[0]);
      const uint8_t lo = decode(in_hex[1]);
      byte = (hi << 4) | lo;
      in_hex = in_hex.subspan(2);
   }
}

template <WordType W, size_t N>
consteval auto hex_to_words(const char (&hex_string)[N]) {
   constexpr size_t hex_len = N - 1;               // Char count includes null terminator which we ignore
   constexpr size_t byte_len = (hex_len + 1) / 2;  // hex_string might have an odd length
   constexpr size_t word_len = (byte_len + sizeof(W) - 1) / sizeof(W);
   constexpr size_t zero_prefix_len = word_len * sizeof(W) - byte_len;
   constexpr size_t padded_byte_len = byte_len + zero_prefix_len;

   std::array<uint8_t, padded_byte_len> decoded_bytes = {0};
   decode_from_hex(std::span<uint8_t>{decoded_bytes}.subspan<zero_prefix_len, byte_len>(),
                   std::span<const char>{hex_string}.first<hex_len>());

   // Decode the (potentially zero-padded) big-endian byte string into "little-
   // endian" limbs. Words are decoded from the right of the byte string.
   static_assert(decoded_bytes.size() % sizeof(W) == 0, "Padded bytes is a multiple of word size");
   static_assert(decoded_bytes.size() / sizeof(W) == word_len, "Padded bytes is the right size");
   std::array<W, word_len> words = {0};
   std::span<const uint8_t> unread_bytes(decoded_bytes);
   for(auto& word : words) {
      word = load_be(unread_bytes.last<sizeof(W)>());
      unread_bytes = unread_bytes.first(unread_bytes.size() - sizeof(W));
   }

   return words;
}

Another new concept: I'm using vanilla assert() here, which is allowed in constexpr contexts (since C++14, I believe) and simply produces a compiler error if it doesn't check out. Perhaps that could be wrapped in the botan-macro. But then again, we're also using static_assert as is, no?

@reneme
Copy link
Collaborator

reneme commented Apr 2, 2024

I pushed the above suggestion as bd65c95 ce80c8e to see what CI has to say about it. Note that I didn't sign this commit, as I forget the appropriate USB-C adapter for my YubiKey at home. 😡

@reneme reneme force-pushed the jack/word-literals branch from bd65c95 to ce80c8e Compare April 2, 2024 18:02
@randombit
Copy link
Owner Author

a) assert is absolutely no good. It is safe in consteval ONLY; if you happen to write it in a constexpr context and then call that function at runtime you can cause a runtime abort. Also NDEBUG disables the assertion entirely, which is very bad. Fortunately, all of these issues can be resolved by using BOTAN_ASSERT instead which has the same effects (in consteval context), is safe (throws an exception rather that aborting) if accidentally called at runtime, and doesn't get disabled by random build systems setting macros. [static_assert does not have any of these problems]

b) The provision for odd sized hex character seems to me relevant for big integers (if rarely) but not at all helpful, in fact more of a good way to get a subtle bug, for generic binary hex strings.

c) I can certainly see the utility of a compile time hex decoding for binary strings (I use the analogous hex! macro from Rust's hex-literal crate all the time). However such a function would imo return a std::array so you can use it as is to initialize constinit variables - writing to a span isn't useful, except in this one weird case with odd sized hex strings.

This is imo one of those situations where there are two things that seem quite similar, such that they could be solved in a combined way, but the combination proves more complicated than solving each individually due to handling the subtle differences.

@reneme
Copy link
Collaborator

reneme commented Apr 2, 2024

assert is absolutely no good. It is safe in consteval ONLY

Agreed. There it is quite useful though. But if BOTAN_ASSERT works just fine, there's no real argument for it.

The provision for odd sized hex character seems to me relevant for big integers (if rarely) but not at all helpful, in fact more of a good way to get a subtle bug, for generic binary hex strings.

Yep, agree on that as well. The out-param is a crutch for this specific use case. For a generic hex_decode we would want to enforce an even-length hex string. Perhaps a std::copy or the ranges-equivalent thereof would be better suited, for this special case. Generally, I do "like" the out-param pattern for its ability to avoid copies and allocations when building serializations, despite the awkward usage. But at compile time that is much less of an argument.

@reneme reneme force-pushed the jack/word-literals branch from ce80c8e to 1fea96e Compare April 2, 2024 22:01
@reneme
Copy link
Collaborator

reneme commented Apr 2, 2024

Here's another suggestion: 1fea96e, moving the special treatment for the odd-length hex string into hex_to_words() and enforcing even-length hex strings in decode_from_hex(). Also, the latter returns a std::array<> now.

For the record: I'm somewhat surprised that std::ranges::copy doesn't do an out-of-range check when copying, but just accepts an output-iterator. Also, the CIFuzz build configuration doesn't seem to provide std::ranges::copy, yet. 😢

@randombit
Copy link
Owner Author

For the record: I'm somewhat surprised that std::ranges::copy doesn't do an out-of-range check when copying, but just accepts an output-iterator.

🙀

Also, the CIFuzz build configuration doesn't seem to provide std::ranges::copy, yet. 😢

Yeah this has been a problem for a while, I just poked on google/oss-fuzz#11116

@reneme
Copy link
Collaborator

reneme commented Apr 3, 2024

Our range-based copy_mem() helper claims to be constexpr, but it uses std::memmove(), which isn't. Hence, I extended our helper and used that instead of std::ranges::copy. eee0a0c

@randombit
Copy link
Owner Author

TBH I find the new version much harder to understand than what I did originally.

@reneme
Copy link
Collaborator

reneme commented Apr 3, 2024

Please revert to it then. No hard feelings at all! I think there's value in playing around with this to get a grasp on which abstractions work and which don't. I totally see how the buffer shuffling in my suggestion adds too much complexity for the task its supposed to achieve.

randombit and others added 3 commits April 3, 2024 18:54
This is immediately useful for the NIST reduction code which
has precomputed tables of constant values but likely will prove
useful elsewhere as use of constexpr/constinit expands in the mp
layer.
@randombit randombit force-pushed the jack/word-literals branch from eee0a0c to 63722b8 Compare April 3, 2024 22:56
@randombit randombit merged commit 64676be into master Apr 4, 2024
42 checks passed
@randombit randombit deleted the jack/word-literals branch April 4, 2024 00:08
@reneme
Copy link
Collaborator

reneme commented Apr 4, 2024

Nice. I like the shift_left<4>() addition. Makes it much clearer IMO.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants