-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ADO.NET IHashPicker
customization API + Orleans v3-compatible IHashPicker
implementation
#9217
Conversation
…-only grain id serialization breaking changes handling required
@dotnet-policy-service agree |
{ | ||
private static void Mix(ref uint a, ref uint b, ref uint c) | ||
{ | ||
a -= b; a -= c; a ^= (c >> 13); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a
, b
, and c
are ref
s, each operation must access the (stack) memory.
When the operations are solely done on locals, then registers can be used.
Cf. sharplab
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity I ran a benchmark.
| Method | Mean | Error | StdDev | Ratio | RatioSD |
|----------- |----------:|----------:|----------:|------:|--------:|
| Default | 27.784 ns | 0.5689 ns | 0.5322 ns | 1.00 | 0.03 |
| Suggestion | 6.043 ns | 0.1590 ns | 0.1633 ns | 0.22 | 0.01 |
So roughly 4.5x faster (in the isolated benchmark).
code
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkRunner.Run<Bench>();
public class Bench
{
private uint _a = 42;
private uint _b = 43;
private uint _c = 44;
[Benchmark(Baseline = true)]
public void Default() => Mix(ref _a, ref _b, ref _c);
[Benchmark]
public void Suggestion() => Mix1(ref _a, ref _b, ref _c);
private static void Mix(ref uint a, ref uint b, ref uint c)
{
a -= b; a -= c; a ^= (c >> 13);
b -= c; b -= a; b ^= (a << 8);
c -= a; c -= b; c ^= (b >> 13);
a -= b; a -= c; a ^= (c >> 12);
b -= c; b -= a; b ^= (a << 16);
c -= a; c -= b; c ^= (b >> 5);
a -= b; a -= c; a ^= (c >> 3);
b -= c; b -= a; b ^= (a << 10);
c -= a; c -= b; c ^= (b >> 15);
}
private static void Mix1(ref uint aa, ref uint bb, ref uint cc)
{
uint a = aa;
uint b = bb;
uint c = cc;
a -= b; a -= c; a ^= (c >> 13);
b -= c; b -= a; b ^= (a << 8);
c -= a; c -= b; c ^= (b >> 13);
a -= b; a -= c; a ^= (c >> 12);
b -= c; b -= a; b ^= (a << 16);
c -= a; c -= b; c ^= (b >> 5);
a -= b; a -= c; a ^= (c >> 3);
b -= c; b -= a; b ^= (a << 10);
c -= a; c -= b; c ^= (b >> 15);
aa = a;
bb = b;
cc = c;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to keep the implementation of the JenkinsHash
exactly the same it was in 3.7.2, without introducing potentially breaking changes.
It's possible to improve it but only with a proper coverage by tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that calculation optimization will bring a significant profit. We need to calculate hash only 2 times just before DbCommand execution, ~50ns improvement is unnoticeable in comparison with ADO.NET overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, but think in a different way:
For each operation memory needs to be accessed, so if in L1 read from there, otherwise go the cache hierarchy up to RAM if not found. So we have potential for cache trashing, which can't be shown in micro-benchmarks.
When memory access is minimized, the chance for cache trashing is minimized too.
Here the code change required is trivial, so I'd go with the suggestion -- the algorithm is untouched, just at entry and exit of the method the value are read to local / written back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, pushed the change
uint c = 0; | ||
int i = 0; | ||
|
||
while (i + 12 <= len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i
gets incremented in the loop. thus the loop-condition nees to be re-evaluated each iteration. Change it to
while (i + 12 <= len) | |
while (i <= len - 12) |
then the loop condition is only evaluated once (and can potentially be kept in a register).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
while (i + 12 <= len) | ||
{ | ||
a += (uint)data[i++] | | ||
((uint)data[i++] << 8) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of indexing results in a lot bound checks. Do we care about these?
Alternative is to use raw-pointers (pinning doesn't harm here).
byte[] bytesToHash = Encoding.UTF8.GetBytes(data); | ||
return ComputeHash(bytesToHash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intermediate allocation for the bye-array can be avoided.
byte[] bytesToHash = Encoding.UTF8.GetBytes(data); | |
return ComputeHash(bytesToHash); | |
int maxByteCount = Encoding.UTF8.GetMaxByteCount(data.Length); | |
Span<byte> buffer = maxByteCount <= 256 | |
? stackalloc byte[256] | |
: new byte[maxByteCount]; | |
int written = Encoding.UTF8.GetBytes(data, buffer); | |
return ComputeHash(buffer.Slice(0, written)); |
(assuming data.Length
will most likely be <= 256, otherwise the byte-array fallback could be rented from the array pool instead)
Is it worth to complicate the code here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed all JenkinsHash
methods except ComputeHash(ReadOnlySpan<byte>)
(not used)
/// <summary> | ||
/// <see cref="IHasher.Hash(byte[])"/>. | ||
/// </summary> | ||
public int Hash(byte[] data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public int Hash(byte[] data) | |
public int Hash(ReadOnlySpan<byte> data) |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be a breaking change - IHashPicker
interface is a part of the public API.
We can extend IHashPicker
with new method Hash(ReadOnlySpan<byte>)
with default implementation or add new interface with this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible to add method overload just in implementations. May be it's the best option as byte[] data
is originally produced by AdoGrainKey.GetHashBytes()
.
/// <summary> | ||
/// <see cref="IHasher.Hash(byte[])"/>. | ||
/// </summary> | ||
public int Hash(byte[] data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public int Hash(byte[] data) | |
public int Hash(ReadOnlySpan<byte> data) |
I'd go with the ROS on all these arguments.
var extendedData = new byte[data.Length + 8]; | ||
data.CopyTo(extendedData, 0); | ||
return _innerHasher.Hash(extendedData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can extendedData
be stack-allocated (or rented from array pool) to avoid that allocation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if we extend IHashPicker
or add a new interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done (stackalloc + pooled memory fallback)
// reducing allocations if data is not a grain type | ||
if (data.Length >= _grainType.Length && Encoding.UTF8.GetByteCount(_grainType) == data.Length) | ||
{ | ||
var grainTypeBytes = Encoding.UTF8.GetBytes(_grainType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the rare case (when I read the comment correct)?
Otherwise try to avoid the allocation by using stack-space (see comment above for a snippet how to do this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done (stackalloc + pooled memory fallback)
…nused methods removed
try | ||
{ | ||
data.AsSpan().CopyTo(buffer); | ||
Array.Clear(buffer, data.Length, 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the Clear
needed?
For the stack-alloc path above you can safely assume that the stack-space is zeroed. E.g. when there's [SkipLocalsInit]
applied.
I'd combine the two pathes, to avoid the duplicated logic (though it's very little logic here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, we need to clear rented buffer to make sure all these bytes set to zero - rented buffer can contain arbitrary data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I misread the code here. The extra 8 bytes need to be zeroed.
Thanks for the hint!
As the span also needs to be cleared, these two code-pathes should be collapsed (see other comments).
{ | ||
var grainTypeBytes = buffer.AsSpan(0, grainTypeByteCount); | ||
|
||
if (!Encoding.UTF8.TryGetBytes(_grainType, grainTypeBytes, out _)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, combine the code paths.
|
||
return grainTypeBytes.SequenceEqual(data); | ||
} | ||
finally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finally
isn't needed. In case of failure the rented array will just be dropped (on the GCed). The array-pool is tolerant to this.
W/o finally
there's also room for the JIT to do more optimizations.
The pattern w/ try-finally
for the array pool got used some time ago in .NET, but it isn't anymore.
(Same above)
var extendedLength = data.Length + 8; | ||
if (extendedLength <= 256) | ||
{ | ||
Span<byte> extended = stackalloc byte[extendedLength]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stack-allocated should be to a constant of power of 2, then sliced. It's cheaper.
The extra space (extended) should also be cleared, just to be safe (see SkipLocalsInit).
try | ||
{ | ||
data.AsSpan().CopyTo(buffer); | ||
Array.Clear(buffer, data.Length, 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I misread the code here. The extra 8 bytes need to be zeroed.
Thanks for the hint!
As the span also needs to be cleared, these two code-pathes should be collapsed (see other comments).
|
||
var buffer = ArrayPool<byte>.Shared.Rent(extendedLength); | ||
try | ||
var buffer = extendedLength switch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This introduces branches that are not necessary. Keep it simple like
int extendedLenght = data.Length + 8;
byte[]? bufferFromPool = null;
Span<byte> buffer = (extendedLenght <= 256
? stackalloc byte[256]
: bufferFromPool = ArrayPool<byte>.Shared.Rent(extendedLenght)
).Slice(0, extendedLenght);
data.AsSpan().CopyTo(buffer);
buffer.Slice(data.Length).Clear();
// Hash here
if (bufferFromPool is not null)
{
ArrayPool<byte>.Shared.Return(bufferFromPool);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but with one more if
to assign bufferFromPool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where's the problem?
That if
is cheap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in code above bufferFromPool
is never assigned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, this is from typing the code here within the comments. It's corrected now.
|
||
// assuming code below never throws, so calling ArrayPool.Return without try/finally block for JIT optimization | ||
|
||
var buffer = rentedBuffer is not null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you write it as suggested (i.e. like use almost everywhere with .NET)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong.
When we call ArrayPool.Rent
and don't call ArrayPool.Return
, the buffer we used will be collected by GC. Even if GC will return the array to pool, nobody knows when the GC kicks in. So, further calls to ArrayPool.Return
will allocate new arrays (we will quickly deplete the pool by frequent ArrayPool.Rent
calls) which will lead to high memory traffic. That's why I return the array explicitly as soon as possible to make it recycled by further ArrayPool.Rent
calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small repro
[MemoryDiagnoser]
public class PoolAllocationBenchmark
{
[Benchmark]
public void Test()
{
for (var i = 0; i < 1_000_000; i++)
{
UsePool();
}
}
[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
private static void UsePool()
{
var buffer = ArrayPool<byte>.Shared.Rent(1024);
Array.Clear(buffer);
}
}
// * Summary *
BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4391/23H2/2023Update/SunValley3)
12th Gen Intel Core i9-12900HX, 1 CPU, 24 logical and 16 physical cores
.NET SDK 9.0.100-rc.2.24474.11
[Host] : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
DefaultJob : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
| Method | Mean | Error | StdDev | Gen0 | Allocated |
|------- |---------:|---------:|---------:|-----------:|----------:|
| Test | 42.28 ms | 0.596 ms | 0.557 ms | 66750.0000 | 999.45 MB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've got what you mean (confused by outdated snipped), pushed a change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now.
Sorry for the confusion with the rented array.
@@ -118,7 +118,7 @@ public class AdoNetGrainStorage: IGrainStorage, ILifecycleParticipant<ISiloLifec | |||
/// <summary> | |||
/// The hash generator used to hash natural keys, grain ID and grain type to a more narrow index. | |||
/// </summary> | |||
public IStorageHasherPicker HashPicker { get; set; } = new StorageHasherPicker(new[] { new OrleansDefaultHasher() }); | |||
public IStorageHasherPicker HashPicker { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this default need to change? It seems that the current behavior should be the default, with the option for the Orleans 3.x compatible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we always initialize this property in the constructor, so this initializer is ignored.
But it can be added as a fallback in the constructor to handle cases when value from options is null
.
return; | ||
|
||
// content-aware hashing with different pickers, unable to use standard StorageHasherPicker | ||
options.HashPicker = new Orleans3CompatibleStorageHashPicker(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should be setting the existing default value, new StorageHasherPicker(new[] { new OrleansDefaultHasher() });
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it's a mistake, thanks for checking
references: #9141
In this PR:
AdoNetGrainStorageOptions.IHashPicker
to support the hash picker customization via DIIHashPicker
implementationOrleans3CompatibleStorageHashPicker
for Orleans v3 -> v7+ migration scenariosAdoNetGrainStorageOptions.UseOrleans3CompatibleHasher()
for easierOrleans3CompatibleStorageHashPicker
configurationMicrosoft Reviewers: Open in CodeFlow