Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADO.NET IHashPicker customization API + Orleans v3-compatible IHashPicker implementation #9217

Merged
merged 13 commits into from
Nov 12, 2024

Conversation

vladislav-prishchepa
Copy link
Contributor

@vladislav-prishchepa vladislav-prishchepa commented Nov 5, 2024

references: #9141

In this PR:

  • added property AdoNetGrainStorageOptions.IHashPicker to support the hash picker customization via DI
  • added a new IHashPicker implementation Orleans3CompatibleStorageHashPicker for Orleans v3 -> v7+ migration scenarios
  • added method AdoNetGrainStorageOptions.UseOrleans3CompatibleHasher() for easier Orleans3CompatibleStorageHashPicker configuration
Microsoft Reviewers: Open in CodeFlow

@vladislav-prishchepa
Copy link
Contributor Author

@dotnet-policy-service agree

{
private static void Mix(ref uint a, ref uint b, ref uint c)
{
a -= b; a -= c; a ^= (c >> 13);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a, b, and c are refs, each operation must access the (stack) memory.
When the operations are solely done on locals, then registers can be used.

Cf. sharplab

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity I ran a benchmark.

| Method     | Mean      | Error     | StdDev    | Ratio | RatioSD |
|----------- |----------:|----------:|----------:|------:|--------:|
| Default    | 27.784 ns | 0.5689 ns | 0.5322 ns |  1.00 |    0.03 |
| Suggestion |  6.043 ns | 0.1590 ns | 0.1633 ns |  0.22 |    0.01 |

So roughly 4.5x faster (in the isolated benchmark).

code
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Bench>();

public class Bench
{
    private uint _a = 42;
    private uint _b = 43;
    private uint _c = 44;

    [Benchmark(Baseline = true)]
    public void Default() => Mix(ref _a, ref _b, ref _c);

    [Benchmark]
    public void Suggestion() => Mix1(ref _a, ref _b, ref _c);

    private static void Mix(ref uint a, ref uint b, ref uint c)
    {
        a -= b; a -= c; a ^= (c >> 13);
        b -= c; b -= a; b ^= (a << 8);
        c -= a; c -= b; c ^= (b >> 13);
        a -= b; a -= c; a ^= (c >> 12);
        b -= c; b -= a; b ^= (a << 16);
        c -= a; c -= b; c ^= (b >> 5);
        a -= b; a -= c; a ^= (c >> 3);
        b -= c; b -= a; b ^= (a << 10);
        c -= a; c -= b; c ^= (b >> 15);
    }

    private static void Mix1(ref uint aa, ref uint bb, ref uint cc)
    {
        uint a = aa;
        uint b = bb;
        uint c = cc;

        a -= b; a -= c; a ^= (c >> 13);
        b -= c; b -= a; b ^= (a << 8);
        c -= a; c -= b; c ^= (b >> 13);
        a -= b; a -= c; a ^= (c >> 12);
        b -= c; b -= a; b ^= (a << 16);
        c -= a; c -= b; c ^= (b >> 5);
        a -= b; a -= c; a ^= (c >> 3);
        b -= c; b -= a; b ^= (a << 10);
        c -= a; c -= b; c ^= (b >> 15);

        aa = a;
        bb = b;
        cc = c;
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to keep the implementation of the JenkinsHash exactly the same it was in 3.7.2, without introducing potentially breaking changes.

It's possible to improve it but only with a proper coverage by tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that calculation optimization will bring a significant profit. We need to calculate hash only 2 times just before DbCommand execution, ~50ns improvement is unnoticeable in comparison with ADO.NET overhead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, but think in a different way:

For each operation memory needs to be accessed, so if in L1 read from there, otherwise go the cache hierarchy up to RAM if not found. So we have potential for cache trashing, which can't be shown in micro-benchmarks.

When memory access is minimized, the chance for cache trashing is minimized too.
Here the code change required is trivial, so I'd go with the suggestion -- the algorithm is untouched, just at entry and exit of the method the value are read to local / written back.

Copy link
Contributor Author

@vladislav-prishchepa vladislav-prishchepa Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, pushed the change

uint c = 0;
int i = 0;

while (i + 12 <= len)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i gets incremented in the loop. thus the loop-condition nees to be re-evaluated each iteration. Change it to

Suggested change
while (i + 12 <= len)
while (i <= len - 12)

then the loop condition is only evaluated once (and can potentially be kept in a register).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

while (i + 12 <= len)
{
a += (uint)data[i++] |
((uint)data[i++] << 8) |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of indexing results in a lot bound checks. Do we care about these?

Alternative is to use raw-pointers (pinning doesn't harm here).

Comment on lines 85 to 86
byte[] bytesToHash = Encoding.UTF8.GetBytes(data);
return ComputeHash(bytesToHash);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intermediate allocation for the bye-array can be avoided.

Suggested change
byte[] bytesToHash = Encoding.UTF8.GetBytes(data);
return ComputeHash(bytesToHash);
int maxByteCount = Encoding.UTF8.GetMaxByteCount(data.Length);
Span<byte> buffer = maxByteCount <= 256
? stackalloc byte[256]
: new byte[maxByteCount];
int written = Encoding.UTF8.GetBytes(data, buffer);
return ComputeHash(buffer.Slice(0, written));

(assuming data.Length will most likely be <= 256, otherwise the byte-array fallback could be rented from the array pool instead)

Is it worth to complicate the code here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed all JenkinsHash methods except ComputeHash(ReadOnlySpan<byte>) (not used)

/// <summary>
/// <see cref="IHasher.Hash(byte[])"/>.
/// </summary>
public int Hash(byte[] data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public int Hash(byte[] data)
public int Hash(ReadOnlySpan<byte> data)

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be a breaking change - IHashPicker interface is a part of the public API.

We can extend IHashPicker with new method Hash(ReadOnlySpan<byte>) with default implementation or add new interface with this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to add method overload just in implementations. May be it's the best option as byte[] data is originally produced by AdoGrainKey.GetHashBytes().

/// <summary>
/// <see cref="IHasher.Hash(byte[])"/>.
/// </summary>
public int Hash(byte[] data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public int Hash(byte[] data)
public int Hash(ReadOnlySpan<byte> data)

I'd go with the ROS on all these arguments.

Comment on lines 48 to 50
var extendedData = new byte[data.Length + 8];
data.CopyTo(extendedData, 0);
return _innerHasher.Hash(extendedData);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can extendedData be stack-allocated (or rented from array pool) to avoid that allocation?

Copy link
Contributor Author

@vladislav-prishchepa vladislav-prishchepa Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we extend IHashPicker or add a new interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done (stackalloc + pooled memory fallback)

// reducing allocations if data is not a grain type
if (data.Length >= _grainType.Length && Encoding.UTF8.GetByteCount(_grainType) == data.Length)
{
var grainTypeBytes = Encoding.UTF8.GetBytes(_grainType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the rare case (when I read the comment correct)?

Otherwise try to avoid the allocation by using stack-space (see comment above for a snippet how to do this).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done (stackalloc + pooled memory fallback)

try
{
data.AsSpan().CopyTo(buffer);
Array.Clear(buffer, data.Length, 8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the Clear needed?
For the stack-alloc path above you can safely assume that the stack-space is zeroed. E.g. when there's [SkipLocalsInit] applied.

I'd combine the two pathes, to avoid the duplicated logic (though it's very little logic here).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, we need to clear rented buffer to make sure all these bytes set to zero - rented buffer can contain arbitrary data

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I misread the code here. The extra 8 bytes need to be zeroed.
Thanks for the hint!

As the span also needs to be cleared, these two code-pathes should be collapsed (see other comments).

{
var grainTypeBytes = buffer.AsSpan(0, grainTypeByteCount);

if (!Encoding.UTF8.TryGetBytes(_grainType, grainTypeBytes, out _))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, combine the code paths.


return grainTypeBytes.SequenceEqual(data);
}
finally
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finally isn't needed. In case of failure the rented array will just be dropped (on the GCed). The array-pool is tolerant to this.
W/o finally there's also room for the JIT to do more optimizations.

The pattern w/ try-finally for the array pool got used some time ago in .NET, but it isn't anymore.
(Same above)

var extendedLength = data.Length + 8;
if (extendedLength <= 256)
{
Span<byte> extended = stackalloc byte[extendedLength];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stack-allocated should be to a constant of power of 2, then sliced. It's cheaper.
The extra space (extended) should also be cleared, just to be safe (see SkipLocalsInit).

try
{
data.AsSpan().CopyTo(buffer);
Array.Clear(buffer, data.Length, 8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I misread the code here. The extra 8 bytes need to be zeroed.
Thanks for the hint!

As the span also needs to be cleared, these two code-pathes should be collapsed (see other comments).


var buffer = ArrayPool<byte>.Shared.Rent(extendedLength);
try
var buffer = extendedLength switch
Copy link
Member

@gfoidl gfoidl Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces branches that are not necessary. Keep it simple like

int extendedLenght = data.Length + 8;
byte[]? bufferFromPool = null;

Span<byte> buffer = (extendedLenght <= 256
    ? stackalloc byte[256]
    : bufferFromPool = ArrayPool<byte>.Shared.Rent(extendedLenght)
).Slice(0, extendedLenght);

data.AsSpan().CopyTo(buffer);
buffer.Slice(data.Length).Clear();

// Hash here

if (bufferFromPool is not null)
{
    ArrayPool<byte>.Shared.Return(bufferFromPool);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but with one more if to assign bufferFromPool

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where's the problem?

That if is cheap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in code above bufferFromPool is never assigned

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, this is from typing the code here within the comments. It's corrected now.


// assuming code below never throws, so calling ArrayPool.Return without try/finally block for JIT optimization

var buffer = rentedBuffer is not null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you write it as suggested (i.e. like use almost everywhere with .NET)?

Copy link
Contributor Author

@vladislav-prishchepa vladislav-prishchepa Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong.

When we call ArrayPool.Rent and don't call ArrayPool.Return, the buffer we used will be collected by GC. Even if GC will return the array to pool, nobody knows when the GC kicks in. So, further calls to ArrayPool.Return will allocate new arrays (we will quickly deplete the pool by frequent ArrayPool.Rent calls) which will lead to high memory traffic. That's why I return the array explicitly as soon as possible to make it recycled by further ArrayPool.Rent calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small repro

[MemoryDiagnoser]
public class PoolAllocationBenchmark
{
    [Benchmark]
    public void Test()
    {
        for (var i = 0; i < 1_000_000; i++)
        {
            UsePool();
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
    private static void UsePool()
    {
        var buffer  = ArrayPool<byte>.Shared.Rent(1024);
        Array.Clear(buffer);
    }
}
// * Summary *

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4391/23H2/2023Update/SunValley3)
12th Gen Intel Core i9-12900HX, 1 CPU, 24 logical and 16 physical cores
.NET SDK 9.0.100-rc.2.24474.11
  [Host]     : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2


| Method | Mean     | Error    | StdDev   | Gen0       | Allocated |
|------- |---------:|---------:|---------:|-----------:|----------:|
| Test   | 42.28 ms | 0.596 ms | 0.557 ms | 66750.0000 | 999.45 MB |

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've got what you mean (confused by outdated snipped), pushed a change

Copy link
Member

@gfoidl gfoidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now.

Sorry for the confusion with the rented array.

@@ -118,7 +118,7 @@ public class AdoNetGrainStorage: IGrainStorage, ILifecycleParticipant<ISiloLifec
/// <summary>
/// The hash generator used to hash natural keys, grain ID and grain type to a more narrow index.
/// </summary>
public IStorageHasherPicker HashPicker { get; set; } = new StorageHasherPicker(new[] { new OrleansDefaultHasher() });
public IStorageHasherPicker HashPicker { get; set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this default need to change? It seems that the current behavior should be the default, with the option for the Orleans 3.x compatible

Copy link
Contributor Author

@vladislav-prishchepa vladislav-prishchepa Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we always initialize this property in the constructor, so this initializer is ignored.
But it can be added as a fallback in the constructor to handle cases when value from options is null.

return;

// content-aware hashing with different pickers, unable to use standard StorageHasherPicker
options.HashPicker = new Orleans3CompatibleStorageHashPicker();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should be setting the existing default value, new StorageHasherPicker(new[] { new OrleansDefaultHasher() });

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's a mistake, thanks for checking

@ReubenBond ReubenBond merged commit 77cb079 into dotnet:main Nov 12, 2024
16 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Dec 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants