Add static hash helper methods #17590
Comments
There are both good and bad sides to this idea. The succinctness, composability, and reduced strain on the finalizer are all good. But "mess up code quality" is hard to gauge... if you're hashing a lot then it's better to hold the hash object across calls to avoid the setup/teardown overhead. If we were to make it a static method then we need to make it thread safe (by the general rules of the framework), which means one of
None of those are hard, or necessarily bad, but they all have different tradeoffs; and I'm not sure we'd be able to determine which one is best for the framework; other than guessing arbitrarily.
And, of course, there's an instance v static naming problem. The good name (ComputeHash) is already taken as a public instance method on a base class. We could redefine it with (And as for the switching between the managed and native implementations, that could be done on the .NET Framework implementation, but in .NET Core we don't have a managed implementation).
So... given that we've broken IncrementalHash out into its own class, I think this definitely makes sense to have been the original implementation model. I'm just not sure that it can really be worked in to the existing API without creating more confusion than benefit. (Maybe a new class which has each algorithm as a method?
public static class Hash
{
    public static byte[] SHA256(byte[] data);
    // And the other overloads
    ...
}
) |
I like the new class idea. There could be an overload for I did not think of sharing the object. I like that a lot! That would increase efficiency to nearly optimal levels. I consider it bad practice to reuse crypto objects because it invites errors in very sensitive code. That's even more reason to add the static functions. If the framework does it correctly, callers can rely on it and forget about playing caching tricks. Now the advice can be: "If you want to hash anything, just use the |
I've always been a fan of adding convenience methods for the very common situation of having all the data you intend to hash (or do other crypto operations with). I'd rather not add more classes since we already have classes people know to use; adding static methods should generally be fine (maybe something like Sha256.Hash(byte[])?). As @GSPP pointed out, as the framework, we can experiment with the policies you floated and find something that generally works reasonably well. If somebody finds a situation that isn't good for them, they're free to manage their own instances. Among your suggestions, I like attempting to do some kind of sharing for general efficiency. As long as we're not holding on to a ton of hashers, I think it's fine to have them open for a while... they don't use that many resources. |
We need an API proposal. |
API proposal: Let's add static methods to the individual hash classes, e.g.
I'd not add a new class Libraries providing more hash algorithms can simply implement the static method pattern set forth by the framework. We could cache hash algorithm instances. One issue here is that permanent resource usage might result from performing just a single or a few hash operations. The cache could either be a Here is a sample implementation for a thread static solution:
The main advantage is simplicity and performance. The main downside is potentially permanent resource usage from those cached algorithms. There are two instances per thread which can result in hundreds of objects. An alternative here is to simply not cache anything and create a fresh instance each time. There should be overloads for accepting |
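As a hedged illustration of the thread-static caching idea discussed in this comment (this is not the commenter's own sample), a minimal sketch might look like the following. The class name `ThreadStaticSha256` and its `Hash` method are hypothetical; only `SHA256.Create()` and the instance `ComputeHash(byte[])` are existing framework APIs. The sketch relies on `ComputeHash` leaving the instance ready for the next call.

```csharp
#nullable enable
using System;
using System.Security.Cryptography;

// Hypothetical wrapper, for illustration only: caches one SHA256
// instance per thread so repeated calls skip setup and teardown.
public static class ThreadStaticSha256
{
    // One cached hasher per thread; never shared across threads.
    [ThreadStatic]
    private static SHA256? t_hasher;

    public static byte[] Hash(byte[] data)
    {
        if (data is null) throw new ArgumentNullException(nameof(data));

        // Lazily create the per-thread instance on first use.
        SHA256 hasher = t_hasher ??= SHA256.Create();

        // ComputeHash finalizes the digest and reinitializes the
        // instance, so it can be reused on the next call.
        return hasher.ComputeHash(data);
    }
}
```

The cached instance is never disposed, which is the "potentially permanent resource usage" tradeoff described above: each thread that ever hashes keeps its hasher alive for the lifetime of the thread.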
On Win10+, the OS exposes a one-shot p/invoke API On other OSes, we could have a thread-static |
API proposal from the thread, more formally stated: namespace System.Security.Cryptography
{
public class SHA256 : HashAlgorithm
{
/*
* new proposed static API
* uses 'new' keyword since instance member HashAlgorithm.ComputeHash(byte[]) exists
*/
public static new byte[] ComputeHash(byte[] data);
/*
* new proposed static API
* computes the hash of the source data and writes it to the destination buffer.
* returns number of bytes written; throws if destination too small.
*/
public static int ComputeHash(ReadOnlySpan<byte> source, Span<byte> destination);
}
// above static methods also added to classes MD5, SHA1, SHA384, SHA512
} |
As ComputeHash is an instance method, making a static method with the same name seems overly confusing (removed ready-for-review) |
It's actually worse than confusing, it's a recompile-breaking change. Anyone with a strongly typed
|
@bartonjs You have an alternative proposal that puts these as static members on a new static type, which would work around the issue. Even with the issues you point out it still seems like something we'd allow to come through API review since it's an actionable, concrete proposal. |
The proposal for a new type named Hash? Since we've spanified things since then, I think it would be public static class Hash
{
public static byte[] SHA256(byte[] input) => throw null;
public static int SHA256(ReadOnlySpan<byte> input, Span<byte> destination) => throw null;
//repeat for MD5, SHA1, SHA384, SHA512
} Morgan's suggestion was to use Hash as the method name for the one-shot, which would be public partial class SHA256
{
public static byte[] Hash(byte[] input) => throw null;
public static int Hash(ReadOnlySpan<byte> input, Span<byte> destination) => throw null;
} The latter generalizes better (anyone who adds a hash algorithm in a NuGet package, or whatever, can follow that pattern, but they can't add static methods on our Hash accelerator type). Having done a lot of work lately on patterns and practices, I think I prefer it as methods on the existing classes, as long as |
Approved as proposed public partial class MD5
{
public static byte[] Hash(byte[] source) => throw null;
public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA1
{
public static byte[] Hash(byte[] source) => throw null;
public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA256
{
public static byte[] Hash(byte[] source) => throw null;
public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA384
{
public static byte[] Hash(byte[] source) => throw null;
public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA512
{
public static byte[] Hash(byte[] source) => throw null;
public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
} |
I assume |
Grumble, grumble, copy pasta. Fixed. |
Well. That's a bad job on my part. Any better name suggestion? I don't think the name masking will work, because of the legacy interaction with ICryptoTransform. |
Regardless of whether it works, it is also a breaking change:
public static void Main() {
var f = new Frob();
_ = f.Hash;
}
public class Foo {
public byte[]? Hash { get; set; }
}
public class Frob : Foo {
// Uncomment and `Main` will not compile anymore.
//public new byte[] Hash(byte[] cat) => Array.Empty<byte>();
}
... nothing with precedent. We could, though it's somewhat redundant, add the hash name? public byte[] HashSHA256(...);
public bool HashMD5(...); // or whatever the "right" casing is. There's also The alternative is we put the methods on a different class or new class. I'm starting to warm up to the idea of a new class: during API review I asked 'why not on public sealed class MyHsmBackedSHA256 : SHA256
{
// impl
} Calling So this was a very long way of me to say |
re-reading through the proposal, I will now suggest / endorse: namespace System.Security.Cryptography {
public static class Hash {
public static byte[] HashMD5(byte[] source) => throw null;
public static byte[] HashMD5(ReadOnlySpan<byte> source) => throw null;
public static int HashMD5(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHashMD5(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
public static byte[] HashSHA1(byte[] source) => throw null;
public static byte[] HashSHA1(ReadOnlySpan<byte> source) => throw null;
public static int HashSHA1(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHashSHA1(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
public static byte[] HashSHA256(byte[] source) => throw null;
public static byte[] HashSHA256(ReadOnlySpan<byte> source) => throw null;
public static int HashSHA256(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHashSHA256(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
public static byte[] HashSHA384(byte[] source) => throw null;
public static byte[] HashSHA384(ReadOnlySpan<byte> source) => throw null;
public static int HashSHA384(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHashSHA384(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
public static byte[] HashSHA512(byte[] source) => throw null;
public static byte[] HashSHA512(ReadOnlySpan<byte> source) => throw null;
public static int HashSHA512(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHashSHA512(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
} |
And, while I'm at it, can throw in bonus API: namespace System.Security.Cryptography {
public static class Hash {
public static byte[] HashName(HashAlgorithmName hashAlgorithmName, byte[] source) => throw null;
// needs better name. Compute? Just can't be hash unless class name changes.
// etc
}
} and maybe we don't even need the Using a new class at least makes it easier to use |
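To make the "bonus API" idea concrete, here is a minimal sketch of a name-based dispatcher. The class `HashByName` and its `Compute` method are hypothetical names chosen for illustration; the sketch assumes the per-algorithm static one-shots are available as `HashData` (the name under which they shipped in .NET 5), and it is not a claim about how the framework would implement such a method.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper, for illustration only: picks the static
// one-shot method that matches the requested algorithm name.
public static class HashByName
{
    public static byte[] Compute(HashAlgorithmName name, byte[] source)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));

        // HashAlgorithmName defines equality operators, so plain
        // comparisons against the well-known names work.
        if (name == HashAlgorithmName.MD5) return MD5.HashData(source);
        if (name == HashAlgorithmName.SHA1) return SHA1.HashData(source);
        if (name == HashAlgorithmName.SHA256) return SHA256.HashData(source);
        if (name == HashAlgorithmName.SHA384) return SHA384.HashData(source);
        if (name == HashAlgorithmName.SHA512) return SHA512.HashData(source);

        throw new CryptographicException($"Unsupported hash algorithm: {name.Name}");
    }
}
```

A caller would write something like `HashByName.Compute(HashAlgorithmName.SHA256, data)`. Note that a closed list like this cannot be extended by third-party algorithms, which is the extensibility concern raised elsewhere in the thread.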
A standalone Hash class, using the algorithm names as verbs, would certainly solve the problem in-framework, but doesn't create an extensible pattern, so there's nothing for someone to really follow if they wanted to "bring back" RIPEMD160, or wanted to provide historical algorithms the framework never had, like MD4.
Assuming that the derived type of SHA256 is still implementing SHA256, there's room for "but I thought SHA256CryptoServiceProvider.SomeVerb(data) used CAPI...", but it's always SHA256. That's better (in my head) than "SHA256.SomeVerb(algName, data)" actually being MD5. (You could also say the same thing about RSACryptoServiceProvider, but we made that type "just (mostly) work" on non-Windows, showing sometimes "right answer" is good enough.) "OneShot" is what I'd call this functionality for an internal method, or perhaps "StaticHash". "OneShot" doesn't feel right for public API, and while "StaticHash" is still a little on-the-nose it does avoid combining "Hash" from the instance members and type name with "Digest" as the synonym verb. Another reasonable thing is "StatelessHash", separating it from ComputeHash which has a stateful mix with the object (Disposed or not, and poor interaction with the TransformBlock method). (Or "HashStateless", but that feels less good in my gut) |
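The stateful versus stateless distinction can be sketched as follows. The class and method names (`StatefulVersusStateless`, `HashInTwoParts`, `HashAllAtOnce`) are hypothetical; the stateless half assumes the static one-shot exists as `SHA256.HashData`, the name under which it shipped in .NET 5. The instance path accumulates state across `TransformBlock` and `TransformFinalBlock` calls, while the static path has no object state for a caller to get wrong.

```csharp
#nullable enable
using System;
using System.Security.Cryptography;

// Hypothetical demo type, for illustration only.
public static class StatefulVersusStateless
{
    // Stateful: the instance accumulates input across calls, and the
    // digest is read from the Hash property after the final block.
    public static byte[] HashInTwoParts(byte[] first, byte[] second)
    {
        using SHA256 sha = SHA256.Create();
        sha.TransformBlock(first, 0, first.Length, null, 0);
        sha.TransformFinalBlock(second, 0, second.Length);
        return sha.Hash!;
    }

    // Stateless: all data is supplied at once and nothing is retained.
    public static byte[] HashAllAtOnce(byte[] first, byte[] second)
    {
        byte[] all = new byte[first.Length + second.Length];
        first.CopyTo(all, 0);
        second.CopyTo(all, first.Length);
        return SHA256.HashData(all);
    }
}
```

Both methods produce the same digest for the same logical input; the difference is only in how much object state the caller has to manage.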
Okay, if we want to stick with statics on existing algs, I am not terribly concerned with the name. Could do any of yours, or |
Maybe Alternatives: |
We discussed this again, and accepted HashData. public partial class MD5
{
public static byte[] HashData(byte[] source) => throw null;
public static byte[] HashData(ReadOnlySpan<byte> source) => throw null;
public static int HashData(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHashData(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA1
{
public static byte[] HashData(byte[] source) => throw null;
public static byte[] HashData(ReadOnlySpan<byte> source) => throw null;
public static int HashData(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHashData(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA256
{
public static byte[] HashData(byte[] source) => throw null;
public static byte[] HashData(ReadOnlySpan<byte> source) => throw null;
public static int HashData(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHashData(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA384
{
public static byte[] HashData(byte[] source) => throw null;
public static byte[] HashData(ReadOnlySpan<byte> source) => throw null;
public static int HashData(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHashData(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA512
{
public static byte[] HashData(byte[] source) => throw null;
public static byte[] HashData(ReadOnlySpan<byte> source) => throw null;
public static int HashData(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
public static bool TryHashData(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
} |
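A short usage sketch of the accepted shape, using SHA256 as the example. The wrapper class `HashDataDemo` and its method names are hypothetical and exist only to keep the snippet self-contained; `Convert.ToHexString` is assumed to be available (.NET 5 or later).

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical demo type, for illustration only.
public static class HashDataDemo
{
    // Allocating overload: returns a new array holding the digest.
    public static string HexViaArray(string text)
    {
        byte[] input = Encoding.UTF8.GetBytes(text);
        byte[] digest = SHA256.HashData(input);
        return Convert.ToHexString(digest);
    }

    // Span overload: writes into a caller-supplied buffer and
    // returns the number of bytes written (32 for SHA-256).
    public static string HexViaSpan(string text)
    {
        byte[] input = Encoding.UTF8.GetBytes(text);
        Span<byte> destination = stackalloc byte[32];
        int written = SHA256.HashData(input, destination);
        return Convert.ToHexString(destination.Slice(0, written));
    }

    // Try overload: reports failure instead of throwing when the
    // destination is too small to hold the digest.
    public static bool TryIntoSmallBuffer(string text)
    {
        byte[] input = Encoding.UTF8.GetBytes(text);
        Span<byte> tooSmall = stackalloc byte[16];
        return SHA256.TryHashData(input, tooSmall, out _);
    }
}
```

With a 16-byte destination, `TryIntoSmallBuffer` returns `false`, whereas the `int`-returning span overload would throw in the same situation.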
I think I can squeeze this one in over the weekend. |
Regarding the suggestion of "HashBytes": there are existing methods named (Try)HashData... admittedly, they're all protected; but there aren't any named "HashBytes", so "HashData" is more platform-consistent. |
If you want to PAL through to native one-shots, that'd be best. If you want to go quicker with |
That's the intention. We don't really have a native PAL for CNG (do we?) and CNG's one-shot |
Well, we have the PAL (split build for the OS), we just don't carry a native shim. |
New APIs are in. Work left to be done:
|
Edit by @GrabYourPitchforks on 20 Jan 2020: Formal API proposal written at https://github.com/dotnet/corefx/issues/9369#issuecomment-576445733.
Computing a hash code currently requires this code:
Or the more awkward HashAlgorithm.Create("SHA256").
This code is alright. It's not the end of the world. But I think it should be slimmer than that:
Benefits:
SHA256.ComputeHash can look at its input size and dynamically pick the fastest implementation. I found the following to be optimal through testing on Windows x64: estimatedDataLength <= 512 ? new SHA1Managed() : HashAlgorithm.Create("SHA1"). Apparently, using the Windows crypto API has quite some per-hash cost.
I request that static helper methods be added to the framework. This seems like an attractive case for a community contribution.
Proposal:
Using this pattern on all of the HashAlgorithm-derived types that are not KeyedHashAlgorithm-derived types (MD5, SHA1, SHA256, SHA384, SHA512):