Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactored SHA-3 in terms of newly added keccak-FIPS; added KMAC #3525

Closed
wants to merge 4 commits into from

Conversation

falko-strenzke
Copy link
Collaborator

@falko-strenzke falko-strenzke commented Apr 26, 2023

Adding only KMAC256 for now. For adding KMAC128, a generic base should be created for KMAC and then both variants derived from it.

@falko-strenzke falko-strenzke requested a review from reneme April 26, 2023 16:19
@coveralls
Copy link

coveralls commented Apr 26, 2023

Coverage Status

Coverage: 91.714% (-0.4%) from 92.148% when pulling 1e8f985 on falko-strenzke:kmac_04_23 into b6e705f on randombit:master.

@@ -251,6 +255,18 @@ std::unique_ptr<HashFunction> HashFunction::create(std::string_view algo_spec,
}
#endif


#if defined(BOTAN_HAS_KECCAK_FIPS)
if(req.algo_name() == "Keccak-FIPS[512]")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said in the ticket, I really dislike this approach because the Keccak being exposed here is not, as I understand it, approved by NIST as a hash function, nor is it implemented by any library I am aware of.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, that makes sense. I removed it from the external interface.

Copy link
Collaborator

@reneme reneme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the overall direction of this. It certainly takes the internal implementation closer to the specs and makes it easier to understand. 👍
I've added a few high-level remarks regarding the implementation strategy.

Maybe related: With several PQC algorithms using SHAKE as XOF, it might be worth investing in a more generic interface for the extendable output flavours as well.

src/lib/hash/sha3/sha3.h Outdated Show resolved Hide resolved
src/lib/mac/kmac/kmac.cpp Outdated Show resolved Hide resolved
src/lib/mac/kmac/kmac.h Show resolved Hide resolved
src/lib/mac/kmac/kmac.h Outdated Show resolved Hide resolved
Comment on lines 18 to 20
/**
* KECCAK FIPS
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great the condense the discussion about the two variants of Keccak in #3279 into some clarifying Doxygen remarks here and in keccak.h. I'm sure future users and contributors would be grateful. 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added clarifying documentation in keccak_fips.h which comes in the next commit. The problem with keccak.h is that I am not really sure what it implements. I suspect it is what is implemented in "KECCAK - Final Algorithm Package" at https://csrc.nist.gov/projects/hash-functions/sha-3-project – but I didn't verify. It might be worthwhile to do so, but currently I don't have the time for it nor does this have priority in our current project.

src/lib/hash/keccak_fips/keccak_fips.h Outdated Show resolved Hide resolved
* KECCAK FIPS
*/

class Keccak_FIPS_generic : public HashFunction
Copy link
Collaborator

@reneme reneme May 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic idea to consider: Would it make sense to define this as a template with the capacity being a value template parameter? Downstream users of the static building blocks could then do Keccak_FIPS_256::absorb() without explicitly stating the bitrate every time.

Not sure whether that's at all feasible, but maybe its worth a try in a follow-up.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same thought but refrained from trying it right now because I wasn't sure how much code would then have to go into the header and what would be the consequences of that. Right now I think it wouldn't be a problem as the header is not part of the public API. Making the capacity (and maybe other parameters) template parameters may improve the readability of the code and - at least potentially – increase performance.

However I would tend to first have the basic implementation merged and then address this later like you are proposing as well.

Comment on lines -33 to -34
std::unique_ptr<HashFunction> new_object() const override;
std::unique_ptr<HashFunction> copy_state() const override;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe those should have overloads in this class. It's somewhat hidden behind the polymorphy of HashFunction but as a user I would expect code along those lines to work:

SHA_3 sha(256);
HashFunction* other_sha = sha.new_object().get();

// this will fail, because `other_sha` is actually an instance of `Keccak_FIPS`
// and not of `SHA_3` (which I would have expected)
SHA_3* other_sha_ptr = dynamic_cast<SHA_3>(other_sha);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly these overloads would be needed in all 4 base classes SHA_3_224, ... Can you confirm this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed. So that was actually an issue with the existing code already, no?
Come to think of it: sha3.h is an internal header, so applications don't actually have access to the SHA_3_* types and wouldn't be able to do the dynamic_cast as I suggested.

Still it is a code smell, in my opinion. @randombit what's your take on that?

Maybe CRTP could be helpful here, if the SHA_3 base class is not meant to be used as an interface by consuming code. It could potentially even be applied to Keccak_FIPS at one point.

A sketch for reference:

template <class DerivedT>
class SHA_3 : public Keccak_FIPS_generic
   {
   public:
      SHA_3(size_t output_bits) : Keccak_FIPS_generic(...) {}

      std::unique_ptr<HashFunction> new_object() const override
         {
         return std::make_unique<DerivedT>(...);
         }

      std::unique_ptr<HashFunction> copy_state() const override
         {
         return std::make_unique<DerivedT>(...);
         }
   };

class SHA_3_224 : public SHA_3<SHA_3_224> {}
class SHA_3_256 : public SHA_3<SHA_3_256> {}
// ...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like enourmously complicating vs just using ownership not inheritance.

SHA-3 is a hash function.
SHA-3 is not "a Keccak"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is unrelated to the inheritance of Keccak, though.

My mental model of ::new_object() is that it returns an instance of the very same hash function class. The fact that SHA_3 is not marked final but has sub-classes (SHA_3_224, SHA_3_256, ...) violates that mental model in the current implementation.

It's really just a nitpick, but it struck me as unusual. Internal code might do something like that:

std::unique_ptr<HashFunction> sha3_256 = std::make_unique<SHA_3_256>();
std::unique_ptr<HashFunction> bare_sha3 = sha3_256->new_object();

Now, both objects will behave exactly the same. But they won't be instances of exactly the same class. While sha3_256 is an instance of the class SHA_3_256, sha3_bare will be an instance of SHA_3.

Applications will always go through HashFunction::create("SHA-3(256)") and therefore get an instance of SHA_3 (without the _256 suffix).

Again: just a minor code smell, if at all.

@@ -45,13 +45,13 @@ void SHAKE_128::clear()

void SHAKE_128::add_data(const uint8_t input[], size_t length)
{
m_S_pos = SHA_3::absorb(SHAKE_128_BITRATE, m_S, m_S_pos, input, length);
m_S_pos = Keccak_FIPS_generic::absorb(SHAKE_128_BITRATE, m_S, m_S_pos, input, length);
Copy link
Collaborator

@reneme reneme May 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Referring to the idea of making Keccak_FIPS_generic a template (with the capacity being a value template-parameter), this might be written as:

Keccak_FIPS_128::absorb(m_S, m_S_pos, input, length);

which I would find desirable. But as stated there: certainly okay to look at separately as a follow-up.

src/lib/hash/hash.cpp Outdated Show resolved Hide resolved
#include <string>

namespace Botan {

/**
* SHA-3
*/
class SHA_3 : public HashFunction

class SHA_3 : public Keccak_FIPS
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inheritance here seems just wrong.

TBH I do not understand at all the motivation for moving the SHA-3 permutation logic, nor for how Keccak_FIPS simplifies anything.

Copy link
Owner

@randombit randombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really dislike adding Keccak as a HashFunction. It is at best an implementation detail. If you really feel it is helpful, define it within kmac.cpp, in terms of the functions on SHA_3.

Considering this patch is net +500 lines of code, I am not even seeing how this approach simplifies anything.

{
// KMAC supports key lengths from zero up to 2²⁰⁴⁰ (2^(2040)) bits
// https://nvlpubs.nist.gov/nistpubs/specialpublications/nist.sp.800-185.pdf#page=28
return Key_Length_Specification(0, std::numeric_limits<size_t>::max());
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no practical reason to support keys larger than 512 bits.

This will just cause weird confusion (and out of memory errors) when someone tries to use mac->maximum_keylength()

Copy link
Collaborator Author

@falko-strenzke falko-strenzke May 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the potential problem but I think this has some implications. KMAC may be used in arbitrary constuctions. Look for instance how the NIST KMAC-based key derivation functions are specified. Here a 'salt' value is used as the key. A salt value might be chosen based on very different consideration than a key. Accordingly, the Botan implementation may fail to support certain construction if it limits the key size to 512 bits. But I completely agree that the integer maximum is a bad idea. Maybe we find a reasonable value that accounts for other uses to a certain degree but also avoids an excessive memory allocation?

Copy link
Collaborator

@reneme reneme May 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 3.0 wouldn't have been out yet, I would have suggested making the maximum value a std::optional<> to clearly communicate "no relevant upper limit in practice".

Maybe provide a "smallish" upper bound but allow to use KMAC with larger keys? This would require a way to pass Symmetric_Algorithm::valid_keylength() for larger keys than Key_Length_Spec advertises in the particular case of KMAC. That makes me feel uneasy as well. 🤔

@reneme
Copy link
Collaborator

reneme commented May 5, 2023

I really dislike adding Keccak as a HashFunction. It is at best an implementation detail.

Mhh, I think I get your point. Keccak is a construction that is "more" than just a HashFunction. Composition (instead of inheritance) better reflects the fact that Keccak is merely used as a building block for the HashFunction that SHA-3 is. Similarly, it could be used to build a SHAKE XOF, or a KMAC message authentication code.

To me Keccak_FIPS seemed equivalent to the MDx_HashFunction concept in mdx_hash.h or (with a slight stretch) similar to the Block_Cipher_Fixed_Params templated base class.

My main source of confusion is that these building blocks (absorb, permute, ...) are currently in SHA_3.

@falko-strenzke
Copy link
Collaborator Author

falko-strenzke commented May 8, 2023

I really dislike adding Keccak as a HashFunction. It is at best an implementation detail. If you really feel it is helpful, define it within kmac.cpp, in terms of the functions on SHA_3.

I think I explained at length how the changes simplify the code and make it clearer in #3279 . I am not going to repeat this here again. As long as there are no arguments negating the one's I've given, I don't expect I will change my mind.

Considering this patch is net +500 lines of code, I am not even seeing how this approach simplifies anything.

The number of changed lines is a mere artifact due to the limitations of git. I moved the entire SHA-3 code to Keccak_FIPS with git mv. But in a pull request, git cannot reflect that. Further, I fail to see how the number of changed lines should be a reliable measure of the degree to which the change simplifies the code. As the initial state of the code may be arbitrary, arbitrary (-ily large) changes may be necessary to optimize it.

@falko-strenzke
Copy link
Collaborator Author

I really dislike adding Keccak as a HashFunction. It is at best an implementation detail.

Mhh, I think I get your point. Keccak is a construction that is "more" than just a HashFunction. Composition (instead of inheritance) better reflects the fact that Keccak is merely used as a building block for the HashFunction that SHA-3 is. Similarly, it could be used to build a SHAKE XOF, or a KMAC message authentication code.

To me Keccak_FIPS seemed equivalent to the MDx_HashFunction concept in mdx_hash.h or (with a slight stretch) similar to the Block_Cipher_Fixed_Params templated base class.

My main source of confusion is that these building blocks (absorb, permute, ...) are currently in SHA_3.

I think the idea which supports using Keccak_FIPS as a base class is pretty straight forward: Keccak[c] is the base for all Keccac[c]-derived functions such as SHA-3, KMAC, etc. Thus if we would relabel it Keccak_FIPS_base, it would become clear enough that inheritance makes sense. However, I don't mind making it a member, if that is preferred for some reason. Since I understand that is what both of you prefer, I suggest making it a member. Probably then the discussion about turning Keccak_FIPS into a template is not relevant any more, as the code will not become more readable through this any more and the exepected performance gain through some compile-time known values will be negligible.

@falko-strenzke
Copy link
Collaborator Author

I really dislike adding Keccak as a HashFunction. It is at best an implementation detail. If you really feel it is helpful, define it within kmac.cpp, in terms of the functions on SHA_3.

Looking at mxd_hash.h I find

/**
* MDx Hash Function Base Class
*/
class MDx_HashFunction : public HashFunction
[...]

Wouldn't it be straightforward to use the same approach for Keccak[c], i.e. use Keccak_FIPS as the base class for all hash functions of this type? It may very well be a purely virtual class. Inside KMAC, we would then instantiate a derived class, which creates the specific hash function used by KMAC.

If this is not desired, then I would be interested to learn the difference between the two patterns (inheritance for MDx hash but not for Keccak[c]-based hash).

@reneme
Copy link
Collaborator

reneme commented May 10, 2023

If this is not desired, then I would be interested to learn the difference between the two patterns (inheritance for MDx hash but not for Keccak[c]-based hash).

Personally I can live with that. But may I suggest a hybrid approach?

  1. Keccak[c] as a free-standing entity (that is not bound to the interface or inheritance hierarchy of Hash_Function)
  2. A Keccak_HashFunction that condenses the common concerns of such hashes and acts as a base for SHA_3 and KMAC.

The free-standing base implementation could then be used as an implementation detail of other Keccak-based constructions (like a XOF based on SHAKE for example). As far as I can tell, the Hash_Function interface isn't necessarily fit for that use case.

What I have in mind as the "free-standing" portion are likely just the existing SHA_3::absorb, ::extend, ... functions. Just combined under a different (more generic) name.

@falko-strenzke
Copy link
Collaborator Author

If this is not desired, then I would be interested to learn the difference between the two patterns (inheritance for MDx hash but not for Keccak[c]-based hash).

Personally I can live with that. But may I suggest a hybrid approach?

1. `Keccak[c]` as a free-standing entity (that is not bound to the interface or inheritance hierarchy of `Hash_Function`)

2. A `Keccak_HashFunction` that condenses the common concerns of such hashes and acts as a base for `SHA_3` and `KMAC`.

The free-standing base implementation could then be used as an implementation detail of other Keccak-based constructions (like a XOF based on SHAKE for example). As far as I can tell, the Hash_Function interface isn't necessarily fit for that use case.

That makes sense to me. @randombit Do you also agree to this approach?

@randombit
Copy link
Owner

Re MDx_HashFunction don't read too much into it - that's not how it should be and is just a refactoring in waiting. Thankfully now that all of the implementations are hidden, removing it even possible to do in a release branch. In general at this point I'm quite strongly against C++ inheritance for implemention. IMO inheritance should only be used to express interface relationships. You can see this trend in PRs such as #3210

Keccak[c] as a free-standing entity (that is not bound to the interface or inheritance hierarchy of Hash_Function)

This seems fine. Not really sure where it should live, I suppose it could be in a new permutations subdir.

A Keccak_HashFunction that condenses the common concerns of such hashes and acts as a base for SHA_3 and KMAC.

This I do not like for aforementioned reasons. It's also not clear how much a common base helps anything. My take is that needing some base class to help implementation just means the Keccak[c] type was not properly designed.

@reneme
Copy link
Collaborator

reneme commented May 22, 2023

A Keccak_HashFunction that condenses the common concerns of such hashes and acts as a base for SHA_3 and KMAC.

This I do not like for aforementioned reasons. It's also not clear how much a common base helps anything. My take is that needing some base class to help implementation just means the Keccak[c] type was not properly designed.

It helps separating API contracts (implementation of the HashFunction interface) from permutation implementation (Keccak[c], MD_Hash (maybe?)) from concrete algorithm parameterization (SHA-3, SHAKE, ...).

I wouldn't see the need for such an "adapter" as a weakness in the design of Keccak[c]. When abstracting things into common base implementations we will usually need to "translate" to public interfaces (e.g. HashFunction, MessageAuthenticationCode or (maybe at some point) Xof). We can do that translation repeatedly in every concrete algorithm class (SHA-3, SHAKE, ...). Or we can lift it into a common adapter base class.

I've sketched that idea based on your refactoring of MDx_Hash here: #3553.

@falko-strenzke
Copy link
Collaborator Author

The new PR #3570 obsoletes this one, thus closing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants