-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of Rfc2898DeriveBytes #24897
Comments
It's about as good as it's going to get. It was fixed up in dotnet/corefx#23269. We can't really drop any lower because the .NET API is "streaming" and the lower level API is one-shot. |
What if we added to this class static methods that performed a one-shot without streaming? That would solve our immediate issue where we don't want this logic duplicated in ASP.NET, and it should make this functionality easier to consume for the 99% of callers who just want one-shot anyway. |
Just for comparison...with aspnet dataprotection on Win 10 x64 with .NET Core 2.1.0-preview2-26225-03, we still get ~20% better perf calling into bcrypt.
Code: The benchmark code: using System.Security.Cryptography;
using System.Text;
using BenchmarkDotNet.Attributes;
using Microsoft.AspNetCore.Cryptography.KeyDerivation;
using Microsoft.AspNetCore.Cryptography.KeyDerivation.PBKDF2;
namespace PBKDF2Benchmark
{
public class HashBenchmark
{
[Params(1000, 10000, 100000)]
public int Iterations { get; set; }
private static readonly byte[] _salt = GenerateSalt();
private static readonly Win7Pbkdf2Provider _win7Provider = new Win7Pbkdf2Provider();
private static readonly Win8Pbkdf2Provider _win8Provider = new Win8Pbkdf2Provider();
private static byte[] GenerateSalt()
{
using (var random = new RNGCryptoServiceProvider())
{
var result = new byte[64];
random.GetBytes(result);
return result;
}
}
[Benchmark]
public byte[] Win8Pbkdf2ProviderSHA512()
{
var key = _win8Provider.DeriveKey("Hello World", _salt, KeyDerivationPrf.HMACSHA512, Iterations, 512 / 8);
return key;
}
[Benchmark]
public byte[] Win7Pbkdf2ProviderSHA512()
{
var key = _win7Provider.DeriveKey("Hello World", _salt, KeyDerivationPrf.HMACSHA512, Iterations, 512 / 8);
return key;
}
[Benchmark]
public byte[] Rfc2898DeriveBytesSHA512()
{
byte[] _password = Encoding.UTF8.GetBytes("Hello World");
var rfc = new Rfc2898DeriveBytes(_password, _salt, Iterations, HashAlgorithmName.SHA512);
return rfc.GetBytes( 512 / 8 );
}
}
} |
Is it allowed to add/use the Unsafe library in System.Security.Cryptography.Algorithms? tl;dr: We could gain another ~5%.
then benchmarks show the following, where BenchmarkDotNet=v0.11.3, OS=Windows 10.0.17763.168 (1809/October2018Update/Redstone5)
Intel Core i9-9900K CPU 3.60GHz, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=3.0.100-preview-009812
[Host] : .NET Core 3.0.0-preview-27122-01 (CoreCLR 4.6.27121.03, CoreFX 4.7.18.57103), 64bit RyuJIT
ShortRun : .NET Core 3.0.0-preview-27122-01 (CoreCLR 4.6.27121.03, CoreFX 4.7.18.57103), 64bit RyuJIT
Job=ShortRun IterationCount=3 LaunchCount=1
WarmupCount=3
The modified class: // Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
using System.Buffers;
using System.Text;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using Internal.Cryptography;
namespace System.Security.Cryptography
{
public class Rfc2898DeriveBytes : DeriveBytes
{
private const int MinimumSaltSize = 8;
private readonly byte[] _password;
private byte[] _salt;
private uint _iterations;
private HMAC _hmac;
private readonly int _blockSize;
private byte[] _buffer;
private uint _block;
private int _startIndex;
private int _endIndex;
public HashAlgorithmName HashAlgorithm { get; }
public Rfc2898DeriveBytes(byte[] password, byte[] salt, int iterations)
: this(password, salt, iterations, HashAlgorithmName.SHA1)
{
}
public Rfc2898DeriveBytes(byte[] password, byte[] salt, int iterations, HashAlgorithmName hashAlgorithm)
{
if (salt == null)
throw new ArgumentNullException(nameof(salt));
if (salt.Length < MinimumSaltSize)
throw new ArgumentException(SR.Cryptography_PasswordDerivedBytes_FewBytesSalt, nameof(salt));
if (iterations <= 0)
throw new ArgumentOutOfRangeException(nameof(iterations), SR.ArgumentOutOfRange_NeedPosNum);
if (password == null)
throw new NullReferenceException(); // This "should" be ArgumentNullException but for compat, we throw NullReferenceException.
_salt = salt.CloneByteArray();
_iterations = (uint)iterations;
_password = password.CloneByteArray();
HashAlgorithm = hashAlgorithm;
_hmac = OpenHmac();
// _blockSize is in bytes, HashSize is in bits.
_blockSize = _hmac.HashSize >> 3;
Initialize();
}
public Rfc2898DeriveBytes(string password, byte[] salt)
: this(password, salt, 1000)
{
}
public Rfc2898DeriveBytes(string password, byte[] salt, int iterations)
: this(password, salt, iterations, HashAlgorithmName.SHA1)
{
}
public Rfc2898DeriveBytes(string password, byte[] salt, int iterations, HashAlgorithmName hashAlgorithm)
: this(Encoding.UTF8.GetBytes(password), salt, iterations, hashAlgorithm)
{
}
public Rfc2898DeriveBytes(string password, int saltSize)
: this(password, saltSize, 1000)
{
}
public Rfc2898DeriveBytes(string password, int saltSize, int iterations)
: this(password, saltSize, iterations, HashAlgorithmName.SHA1)
{
}
public Rfc2898DeriveBytes(string password, int saltSize, int iterations, HashAlgorithmName hashAlgorithm)
{
if (saltSize < 0)
throw new ArgumentOutOfRangeException(nameof(saltSize), SR.ArgumentOutOfRange_NeedNonNegNum);
if (saltSize < MinimumSaltSize)
throw new ArgumentException(SR.Cryptography_PasswordDerivedBytes_FewBytesSalt, nameof(saltSize));
if (iterations <= 0)
throw new ArgumentOutOfRangeException(nameof(iterations), SR.ArgumentOutOfRange_NeedPosNum);
_salt = Helpers.GenerateRandom(saltSize);
_iterations = (uint)iterations;
_password = Encoding.UTF8.GetBytes(password);
HashAlgorithm = hashAlgorithm;
_hmac = OpenHmac();
// _blockSize is in bytes, HashSize is in bits.
_blockSize = _hmac.HashSize >> 3;
Initialize();
}
public int IterationCount
{
get
{
return (int)_iterations;
}
set
{
if (value <= 0)
throw new ArgumentOutOfRangeException(nameof(value), SR.ArgumentOutOfRange_NeedPosNum);
_iterations = (uint)value;
Initialize();
}
}
public byte[] Salt
{
get
{
return _salt.CloneByteArray();
}
set
{
if (value == null)
throw new ArgumentNullException(nameof(value));
if (value.Length < MinimumSaltSize)
throw new ArgumentException(SR.Cryptography_PasswordDerivedBytes_FewBytesSalt);
_salt = value.CloneByteArray();
Initialize();
}
}
protected override void Dispose(bool disposing)
{
if (disposing)
{
if (_hmac != null)
{
_hmac.Dispose();
_hmac = null;
}
if (_buffer != null)
Array.Clear(_buffer, 0, _buffer.Length);
if (_password != null)
Array.Clear(_password, 0, _password.Length);
if (_salt != null)
Array.Clear(_salt, 0, _salt.Length);
}
base.Dispose(disposing);
}
public override byte[] GetBytes(int cb)
{
Debug.Assert(_blockSize > 0);
if (cb <= 0)
throw new ArgumentOutOfRangeException(nameof(cb), SR.ArgumentOutOfRange_NeedPosNum);
byte[] password = new byte[cb];
int offset = 0;
int size = _endIndex - _startIndex;
if (size > 0)
{
if (cb >= size)
{
Buffer.BlockCopy(_buffer, _startIndex, password, 0, size);
_startIndex = _endIndex = 0;
offset += size;
}
else
{
Buffer.BlockCopy(_buffer, _startIndex, password, 0, cb);
_startIndex += cb;
return password;
}
}
Debug.Assert(_startIndex == 0 && _endIndex == 0, "Invalid start or end index in the internal buffer.");
byte[] ui = ArrayPool<byte>.Shared.Rent(_blockSize);
try
{
while (offset < cb)
{
byte[] T_block = Func(ui);
int remainder = cb - offset;
if (remainder > _blockSize)
{
Buffer.BlockCopy(T_block, 0, password, offset, _blockSize);
offset += _blockSize;
}
else
{
Buffer.BlockCopy(T_block, 0, password, offset, remainder);
offset += remainder;
Buffer.BlockCopy(T_block, remainder, _buffer, _startIndex, _blockSize - remainder);
_endIndex += (_blockSize - remainder);
return password;
}
}
}
finally
{
Array.Clear(ui, 0, _blockSize);
ArrayPool<byte>.Shared.Return(ui);
}
return password;
}
public byte[] CryptDeriveKey(string algname, string alghashname, int keySize, byte[] rgbIV)
{
// If this were to be implemented here, CAPI would need to be used (not CNG) because of
// unfortunate differences between the two. Using CNG would break compatibility. Since this
// assembly currently doesn't use CAPI it would require non-trivial additions.
// In addition, if implemented here, only Windows would be supported as it is intended as
// a thin wrapper over the corresponding native API.
// Note that this method is implemented in PasswordDeriveBytes (in the Csp assembly) using CAPI.
throw new PlatformNotSupportedException();
}
public override void Reset()
{
Initialize();
}
[System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Security", "CA5350", Justification = "HMACSHA1 is needed for compat. (https://github.com/dotnet/corefx/issues/9438)")]
private HMAC OpenHmac()
{
Debug.Assert(_password != null);
HashAlgorithmName hashAlgorithm = HashAlgorithm;
if (string.IsNullOrEmpty(hashAlgorithm.Name))
throw new CryptographicException(SR.Cryptography_HashAlgorithmNameNullOrEmpty);
if (hashAlgorithm == HashAlgorithmName.SHA1)
return new HMACSHA1(_password);
if (hashAlgorithm == HashAlgorithmName.SHA256)
return new HMACSHA256(_password);
if (hashAlgorithm == HashAlgorithmName.SHA384)
return new HMACSHA384(_password);
if (hashAlgorithm == HashAlgorithmName.SHA512)
return new HMACSHA512(_password);
throw new CryptographicException(SR.Format(SR.Cryptography_UnknownHashAlgorithm, hashAlgorithm.Name));
}
private void Initialize()
{
if (_buffer != null)
Array.Clear(_buffer, 0, _buffer.Length);
_buffer = new byte[_blockSize];
_block = 1;
_startIndex = _endIndex = 0;
}
// This function is defined as follows:
// Func (S, i) = HMAC(S || i) ^ HMAC2(S || i) ^ ... ^ HMAC(iterations) (S || i)
// where i is the block number.
private byte[] Func(byte[] ui)
{
int tempLength = _salt.Length + sizeof(uint);
Span<byte> temp = tempLength > 100
? new byte[tempLength]
: stackalloc byte[tempLength];
_salt.CopyTo(temp);
Helpers.WriteInt(_block, temp, _salt.Length);
Span<byte> uiSpan = new Span<byte>(ui, 0, _blockSize);
if (!_hmac.TryComputeHash(temp, uiSpan, out int bytesWritten) || bytesWritten != _blockSize)
{
ThrowCryptographicException();
}
byte[] ret = new byte[_blockSize];
uiSpan.CopyTo(ret);
for (int i = 2; i <= _iterations; i++)
{
if (!_hmac.TryComputeHash(uiSpan, uiSpan, out bytesWritten) || bytesWritten != _blockSize)
{
ThrowCryptographicException();
}
// The allowed block sizes are all multiples of 4
// SHA1 is 160bit -> it needs uint, everything could use ulong, which would be faster
// Replace with https://github.com/dotnet/corefx/issues/30746
int rounds = _blockSize >> 2;
ref uint[] retUint = ref Unsafe.As<byte[], uint[]>(ref ret);
ref uint[] uiUint = ref Unsafe.As<byte[], uint[]>(ref ui);
for (int j = 0; j < rounds; j++)
{
retUint[j] ^= uiUint[j];
}
}
// increment the block count.
_block++;
return ret;
}
private static void ThrowCryptographicException()
{
throw new CryptographicException();
}
}
} |
A straw man proposal: namespace System.Security.Cryptography
{
public class Rfc2898DeriveBytes
{
public static byte[] Pbkdf2DeriveBytes(
byte[] password,
byte[] salt,
int iterations,
int length,
HashAlgorithmName hashAlgorithm);
public static byte[] Pbkdf2DeriveBytes(
ReadOnlySpan<byte> password,
ReadOnlySpan<byte> salt,
int iterations,
int length,
HashAlgorithmName hashAlgorithm);
public static void Pbkdf2DeriveBytes(
ReadOnlySpan<byte> password,
ReadOnlySpan<byte> salt,
int iterations,
HashAlgorithmName hashAlgorithm,
Span<byte> destination);
public static byte[] Pbkdf2DeriveBytes(
string password,
byte[] salt,
int iterations,
int length,
HashAlgorithmName hashAlgorithm);
public static byte[] Pbkdf2DeriveBytes(
ReadOnlySpan<char> password,
ReadOnlySpan<byte> salt,
int iterations,
int length,
HashAlgorithmName hashAlgorithm);
public static void Pbkdf2DeriveBytes(
ReadOnlySpan<char> password,
ReadOnlySpan<byte> salt,
int iterations,
HashAlgorithmName hashAlgorithm,
Span<byte> destination);
}
} Thoughts:
|
I ran the benchmark from the original issue again: BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET Core SDK=5.0.102
[Host] : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
DefaultJob : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
The difference between KeyDerivation is down to ~15% and with switching the byte xor in Func() to be uint based, see #26651 the difference gets down to ~10% on an amd zen3. |
This also opens the doors to us for platform native implementations ( |
Oh, you beat me to a strawman. Nice.
Nope. We don't want defaults in crypto API, because eventually all defaults are bad. One day SHA-2 will also be a broken algorithm.
The existing class seems right.
Can the proposed trys fail (other than, perhaps, a length-zero destination)? To me this is more like RNG.Fill... you give it a buffer, it writes as much to it as you asked.
I like your set, except that I'd make the Span-writing ones void with Fill semantics. Better than my set from the other issue. |
Yeah that was me trying to switch to decaf coffee. I updated mine. I kept all the method names the same. We could consider making the Span-filling ones named |
What's the reason for an allocation in older versions of Windows? |
Windows 10 has singleton values (which they pseudo-handles) for hash algorithms. Older Windowses don't, so we'd need to instantiate the hash algorithm. While we /could/ write it with IntPtr and try-finally, the done thing is a SafeHandle. |
Windows 10 has HMAC pseudo handles https://docs.microsoft.com/en-us/windows/win32/seccng/cng-algorithm-pseudo-handles. For < Windows 10, we could end up allocating a handle. |
Bah. That's OpenSSL 1.1+. @bartonjs I don't suppose you've been thinking about dropping 1.0 anytime soon? Seems like we could shim it for older openssls. |
I haven't looked at the supported OS matrix for .NET 6. Eyeballing it:
So... maybe? But to be conservative I'm probably going to make the loader support 3.0/1.1/1.0 for 6. Unless that proves impossible, then I'll have to have a conversation with release management around my ability to support that wide of a range of systems.
Yep, we'd just define CryptoNative_PBKDF2 (or whatever) and set up a FALLBACK_FUNCTION for if 1.0 is the loaded version. |
The one-shot |
Would that help that much? the dataprotection pbkdf2 function on win10 is not much faster and that's using windows' bcrypt api, assuming this is true for all the other implementations too, why would you want to have native implementations? I understand for a large perf gain, which in this case doubles as a security gain too, but for somehting in single digit % range. The source code complexity of PBKDF2 isn't high and it can be well tested, so that can't be it either. |
We have not psuedo-ified the HMAC algorithms which are different handles I believe. Edit: Sorry if I misunderstood, perhaps you were just pointing out a place we are already using pseudo handles.
Maybe not, but I think then the HMAC one-shot becomes a pre-req for this. Regardless I can bench it. But if, say, we want to have something for the dataprotection to use, whatever we add can't be slower, or they'll keep using their own implementation. |
@vcsjones Yeah, it's a possible source that we could copy + paste from if we wanted to extend the same benefit to HMAC and KDF handles. It and a few related refactorings showed a modest increase in throughput. Need to determine if it's worth taking the churn in this code. |
I noodled with a proof-of-concept for this, I think the API proposal is reasonable, aside from the name We could implement this in managed code if we wanted, which might actually be the right way. OpenSSL doesn't have this until OpenSSL v1.1, and macOS has zero documentation for If we want to implement in managed code and allocation-free, then this proposal is blocked on #40012 since we have no way to do no-allocating HMAC operations. |
Doing it through only one P/Invoke will cut down on at least (iteration*blocks)-1 transitions, which should allow higher iteration counts at the same clock time. Assuming that only OpenSSL 1.0 is a problem, we'd just have to implement PBKDF2 as a fallback function. |
That appears to be the case. CNG is known to be good, and macOS's API is straight forward enough. One thing that needs to be checked are minimums across platforms for consistency. OpenSSL follows SP800-132 for minimums: So the salt size, number of iterations, and number of bytes to extract from the KDF will have some minimum. As far as how that affects the API proposals, I don't think it does, other than they will throw an |
Some of the min Limits might affect Test vectors, a few published ones include Iteration Count = 1. |
Update based on work in the draft PR:
|
Nit: We should use the throwing (non-substituting) version of UTF-8 when performing the chars -> bytes conversion. |
@GrabYourPitchforks added to the TODO in the PR. |
namespace System.Security.Cryptography
{
public partial class Rfc2898DeriveBytes
{
public static byte[] Pbkdf2(
byte[] password,
byte[] salt,
int iterations,
HashAlgorithmName hashAlgorithm,
int outputLength);
public static byte[] Pbkdf2(
ReadOnlySpan<byte> password,
ReadOnlySpan<byte> salt,
int iterations,
HashAlgorithmName hashAlgorithm,
int outputLength);
public static void Pbkdf2(
ReadOnlySpan<byte> password,
ReadOnlySpan<byte> salt,
Span<byte> destination,
int iterations,
HashAlgorithmName hashAlgorithm);
public static byte[] Pbkdf2(
string password,
byte[] salt,
int iterations,
HashAlgorithmName hashAlgorithm,
int outputLength);
public static byte[] Pbkdf2(
ReadOnlySpan<char> password,
ReadOnlySpan<byte> salt,
int iterations,
HashAlgorithmName hashAlgorithm,
int outputLength);
public static void Pbkdf2(
ReadOnlySpan<char> password,
ReadOnlySpan<byte> salt,
Span<byte> destination,
int iterations,
HashAlgorithmName hashAlgorithm);
}
} |
Is there an issue tracking the removal/replacement of the PBKDF2 implementation ASP.NET Core? |
Hm, not anymore. There was dotnet/aspnetcore#2508 but it's been closed for a while. |
Thanks. I filed dotnet/aspnetcore#30941 |
See aspnet/DataProtection#272 for full context.
ASP.NET has its own implementation of RFC2898 because it needs top-of-the-line performance for password hashing in order to have acceptable latency for login scenarios. If we bring these performance improvements to the framework's implementation of these APIs, ASP.NET can remove its own implementation and rely on the standard classes.
API Proposal
PBKDF2 technically starts with a password (that's what the "P" is), which is generally thought of as a
string
, but the algorithm operates on the bytes of a password, causing the always fun encoding selection problem.Therefore there are two primary entrypoints, and then spanification thereof:
static byte[] PBKDF2(byte[] password, byte[] salt, int iterations, HashAlgorithmName hashAlgorithm, int outputLength)
static byte[] PBKDF2(string password, byte[] salt, int iterations, HashAlgorithmName hashAlgorithm, int outputLength)
(calls the byte-based version after Encoding.UTF8)The text was updated successfully, but these errors were encountered: