[API Proposal]: ReadOnlySpan<T> CreateSpan<T>(RuntimeFieldHandle) #60948
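Tagging subscribers to this area: @GrabYourPitchforks, @dotnet/area-system-memory
Issue Details
Background and motivation
As of C# 8.0 it is possible to embed constant byte data into .NET Assemblies using a syntax like
However, unlike the support for array initialization this isn't possible for other primitive data types. Instead for other data types the ReadOnlySpan is initialized by actually allocating an array. The reason for this lack of support is that there is no current way to express that the constant data is in little endian format, and needs to be translated to the runtime endian format, if the application is run on hardware which utilizes big endian numbers.
This proposal is designed to provide a new api which works in a manner similar to how array initialization via RuntimeHelpers.InitializeArray works. This will allow constant data to be cleanly expressed on all architectures.
During a recent hackathon event, support for this was prototyped, and a simple example such as
static int[] _intData = new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };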
[MethodImpl(MethodImplOptions.NoInlining)]
static int GetData(int offset)
{
return _intData[offset];
}
[MethodImpl(MethodImplOptions.NoInlining)]
static int GetDataROS(int offset)
{
ReadOnlySpan<int> intSpan = (ReadOnlySpan<int>)new int[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
return intSpan[offset];
}
was shown to be a fair bit faster. In particular, looped 1.8 billion times, GetData took 9783ms and GetDataROS took 8468ms to execute, an improvement of somewhere around 13.5%.
@jaredpar has looked at the hackathon effort and is in general agreement that the C# team would be willing to implement code generation of this.
@stephentoub has also done some investigation of locations within the BCL where this sort of data would be useful, and there are a significant number of potential optimizations. Individually, these optimizations are generally fairly small, but there are quite a few which might benefit.
API Proposal
namespace System.Runtime.CompilerServices
{
public static class RuntimeHelpers
{
public static ReadOnlySpan<T> CreateSpan<T>(RuntimeFieldHandle field);
}
}
Requirements:
- T must be a primitive constant sized type (byte, sbyte, char, short, ushort, int, uint, long, ulong, float, double)
- field must reference a FieldRVA
- The RVA associated with field must be aligned on the size of the primitive type T.
API Usage
ReadOnlySpan<int> intSpan = new int[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
Or possibly if the C# team wish to implement...
ReadOnlySpan<int> intSpan = stackalloc int[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
There is no expectation that developers not generating IL would ever use this api directly.
Alternative Designs
It may be simpler for the JIT to work with the following construct. However, I don't believe it is a significant improvement.
namespace System.Runtime.CompilerServices
{
public static class RuntimeHelpers
{
public static void CreateSpan<T>(RuntimeFieldHandle field, out ReadOnlySpan<T> span);
}
}
Risks
Since the new api would only be available on new versions of .NET but the syntax is valid for older frameworks, customers may write code expecting the optimized behavior and instead get substantially slower, allocating logic. Suggestions have been made that support for this may also entail building a fallback generator for this sort of thing when targeting older versions of the frameworks to avoid performance cliffs.
|
@AaronRobinsonMSFT (Who worked on the hackathon on this project) |
Nice write-up. #24961 (comment) was already approved... is this different? |
Nit: The approved API has |
@echesakovMSFT PTAL.
@jkotas if it is the exactly same API with a different argument name, should we close this one and merge this thread with #24961? |
Let's keep this one open since it follows the API proposal template and close #24961. |
There is a difference between the two proposed use variants. I think Roslyn should implement optimization for both.
Re: API Usage
ReadOnlySpan<int> intSpan = new int[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
Or possibly if the C# team wish to implement...
ReadOnlySpan<int> intSpan = stackalloc int[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
When optimized, these will result in the same IL, but the language treats them differently. Span local will assume escape scope from its initializer and
{
ReadOnlySpan<int> intSpan = new int[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
// this is ok
return intSpan;
}
However there is a scenario where the user would want the more restricted
Ex:
ReadOnlySpan<int> intSpan = stackalloc int[]{1, 2, 3};
if (condition)
{
// if intSpan is globally escapable, this would be a compile error
intSpan = stackalloc int[] {x, y, z,};
}
|
@VSadov I agree that the various C# lowerings/optimizations are interesting to discuss, but I don't believe that this is the right forum. I believe the discussion about the precise lowering, and when it should happen, etc should be done in the Roslyn or C# language repos. |
Right. When the corresponding Roslyn work item is created, this should go there. I do not think we have one yet. |
How about implementing
ReadOnlySpan<int> mySpan = new int[] {475, 579};
would be translated to something like this:
ReadOnlySpan<int> mySpan =
new ReadOnlySpan<int>(BitConverter.IsLittleEndian ? &mySpan_littleEndian : &mySpan_bigEndian, 2);
The advantages are performance (the |
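The suggestion above (store the data twice and pick a copy at run time) can be approximated in ordinary C# today. The following is only a sketch under stated assumptions: the class and member names are hypothetical, the two byte blobs are hand-written encodings of 475 and 579, and `MemoryMarshal.Cast` stands in for the pointer-based constructor in the comment. It also assumes the byte data happens to be suitably aligned for `int` reads, which is the same alignment concern the proposal's requirements address.

```csharp
using System;
using System.Runtime.InteropServices;

public static class EndianSelectSketch
{
    // Hypothetical pre-encoded copies of { 475, 579 }.
    // 475 = 0x000001DB, 579 = 0x00000243.
    private static ReadOnlySpan<byte> LittleEndianBytes => new byte[]
    {
        0xDB, 0x01, 0x00, 0x00,
        0x43, 0x02, 0x00, 0x00,
    };

    private static ReadOnlySpan<byte> BigEndianBytes => new byte[]
    {
        0x00, 0x00, 0x01, 0xDB,
        0x00, 0x00, 0x02, 0x43,
    };

    public static ReadOnlySpan<int> GetValues()
    {
        // Pick the copy whose byte order matches the current machine,
        // then reinterpret the bytes as ints without copying.
        ReadOnlySpan<byte> raw = BitConverter.IsLittleEndian
            ? LittleEndianBytes
            : BigEndianBytes;
        return MemoryMarshal.Cast<byte, int>(raw);
    }
}
```

The trade-off is visible in the sketch: every constant is stored twice, in exchange for avoiding any run-time translation step.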
Why this fixed set of types vs. just all types that are
Nit: all of these need to include
I wanted to push back on this a bit by using the following code as an example:
ReadOnlySpan<byte> span = new byte[] { 0x1, 0x2, 0x3 };
This code is not universally optimized, it only optimizes on certain versions of the compiler. Yet we have never gotten any pushback about that. Instead the general feeling is if you want the faster optimization then upgrade to the faster toolset + runtime. |
We haven't gotten pushback I expect because you don't need a newer set of reference assemblies to use this, all you need is the newer compiler. We're able to rely on this without question anywhere in dotnet/runtime because we know we're using a compiler that provides the optimization, even if we're building for netstandard2.0. Remove that guarantee, and we now need to think about every place we make this change, as the file we're modifying might be built into an assembly targeting an older set of reference assemblies. |
The complexity for that goes through the roof. It was explored in dotnet/designs#46. |
It's not clear to me why this is the case. For a little-endian machine this should be a direct read of the data, iterated per field. For a big endian machine, we need to:
This is a very simple thing, algorithmically speaking. It does have some up-front cost for big-endian machines but RyuJIT doesn't actually support that today and for Mono it's a rare target.
===========
The IL spec already covers everything required here in The IL grammar itself allows for structured data declarations:
and so something such as the following is legal:
It likewise covers the 3 types of data initialization, all of which are supported on all target platforms either via PE or the ELF files for AOT scenarios or via the runtime itself for JIT scenarios:
The first case is the primitive case where everything is well-known ahead of time. |
You need to do it for little endian machines too to deal with platform specific alignment rules. |
In some cases involving badly packed layouts, yes. There are just as many scenarios, particularly involving our own simple value types that are all laid out in a way that's the same across all platforms.
I'm still confident this is something that could be handled by our tooling and the benefits it could provide could be substantial, particularly in interop, graphics, and HPC related scenarios (the same goes for I'm already working around this in a few of my interop libraries. I have nearly 7000 "static readonly" variables that have to be initialized in one library. These were almost all Changing it all to do this brought the startup cost to 0 (big endian isn't supported here, but my generator inserts the right stuff if I tell it to):
[NativeTypeName("const GUID")]
public static ref readonly Guid IID_ID3D12Device
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
get
{
ReadOnlySpan<byte> data = new byte[] {
0xF1, 0x19, 0x98, 0x18,
0xB6, 0x1D,
0x57, 0x4B,
0xBE,
0x54,
0x18,
0x21,
0x33,
0x9B,
0x85,
0xF7
};
Debug.Assert(data.Length == Unsafe.SizeOf<Guid>());
return ref Unsafe.As<byte, Guid>(ref MemoryMarshal.GetReference(data));
}
} |
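As a rough illustration of the workaround above, the sketch below reinterprets the same sixteen bytes as a `Guid` and compares the result with the `Guid(ReadOnlySpan<byte>)` constructor. The class and method names are hypothetical. The reinterpreting path is only expected to give the intended value on little-endian hardware, matching the caveat in the comment; the constructor path is, to my understanding, defined in terms of a fixed byte layout and so should not depend on the machine's byte order.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class GuidBlobSketch
{
    // Same bytes as the IID_ID3D12Device example above.
    private static ReadOnlySpan<byte> Data => new byte[]
    {
        0xF1, 0x19, 0x98, 0x18,
        0xB6, 0x1D,
        0x57, 0x4B,
        0xBE, 0x54, 0x18, 0x21, 0x33, 0x9B, 0x85, 0xF7,
    };

    // Reinterprets the constant bytes in place; no copy and no static constructor.
    // Assumes a little-endian machine.
    public static Guid Reinterpret()
    {
        return Unsafe.As<byte, Guid>(ref MemoryMarshal.GetReference(Data));
    }

    // Builds the Guid through the span constructor instead of reinterpreting.
    public static Guid Construct()
    {
        return new Guid(Data);
    }
}
```

Reading the first three fields as little-endian integers gives `189819f1-1db6-4b57-be54-1821339b85f7`, which is the value `Construct()` is expected to return.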
I agree that the support for arbitrary types is implementable. It just makes the feature multiple times more complex and expensive to implement. The proposal for primitive types is compatible with eventual extension for arbitrary types if we choose to do it. I think it makes sense to start with primitive types only.
The Guid example from your library does not look like a good motivating scenario for support of arbitrary types. The main problem is that you have all the thousands of Guids in one type. Nothing good performance-wise comes from types with thousands of fields or methods. The cost for loading types like that is far from zero. The hack above helps you eliminate the JIT cost of the huge static constructor, but you are still going to pay for the thousands of fields or methods even if you are using just a few of them.
I think a better factoring would be to have the IID as part of the containing type or something like that. For example, how it is done in CsWinRT: https://github.com/microsoft/CsWinRT/blob/72b1236979de2d444416207c5d14bc513af52592/src/WinRT.Runtime/Interop/IWeakReferenceSource.net5.cs#L80 . Once you do that, the startup overhead of naturally initialized Guid constants becomes much more pay-for-play. For scenarios that care a lot about startup performance, a good AOT compiler should be able to run the initialization for naturally initialized Guid at compile time, without any additional measures. |
I think that makes sense, but at the same time if we are going to do something here it'd be nice to finish it all out in the same release. Getting partway there for .NET 7 and then having to wait an entire separate release (or more) to get the rest can be painful.
There are multiple reasons for this, including (but not limited to) backwards/forwards compatibility and ease of use/porting native code to C#. I did, a long time ago, actually have many separate classes, assemblies, and "clearer" separation of types (such as Putting all the "globally visible" C/C++ members into a single type has helped solve many more problems than it has caused so far, especially given my assemblies are fully trimmable (and so shrink from 12MB to about 50-200kb for most usages). Namely, it means that my libraries are "as close to The largest downside was, by far, the cost of the static initializers, which was mitigated by moving to a mix of properties and "unmanaged constants". There is still some cost for the VM to load and process the main class, but it was basically immeasurable in comparison (it's also no worse than what C++/CLI does for importing metadata, which, while not the perfect example, is an apt comparison of something you can already hit for larger interop libraries). |
I'm glad to see progress for primitive type constants. Is there something similar for more complex types? PowerShell builds some large Dictionary-based caches on startup. It would be great if they could be built beforehand at the crossgen stage, for example. |
@iSazonov Can you link an example of the large Dictionary cache in Powershell? (To me, it sounds like a job for source generator and specialized hashtable structure optimized for persistence.) |
@jkotas There are two examples.
With .Net 6.0 I see great improvement of PowerShell startup scenario (~20% on my note. Thanks .Net team for great work!) (Nevertheless, it is still a long way from Windows PowerShell - 400 ms vs 270 ms on my note.) |
* FieldRVA alignment
In support of dotnet/runtime#60948 the linker (an assembly rewriter) will need to be able to preserve the alignment of RVA based fields which are to be used to create the data for `CreateSpan<T>` records.
This is implemented by adding a concept that RVA fields detect their required alignment by examining the PackingSize of the type of the field (if the field type is defined locally in the module).
* Update Mono.Cecil.Metadata/Buffers.cs
Co-authored-by: Aaron Robinson
* Enhance logic used to ensure type providing PackingSize is local to the module.
Co-authored-by: Aaron Robinson
In support of CreateSpan (#60948), improve alignment for RVA static pre-initialized fields to align memory blocks which may contain long, ulong, or double primitive arrays on 8 byte boundaries.
Mono fix is more involved
- Fix Ref-Emit generated RVA statics
- Set the HasFieldRVA bit. NOTE: earlier code that attempts to set the bit doesn't actually do that as the FieldRVA bit is in FieldAttributes.ReservedMask which is masked away in the FieldBuilder constructor
- Fix the Swizzle lookup. We should not be using the 8 byte swizzle in the final else clause.
- Enhance ref-emitted field to allow for use with CreateSpan
- Ref-emitted fields should specify a pack size appropriate for minimum alignment of the CreateSpan targeted data
- Respect the packing_size specified so that RVA static fields generated for CreateSpan can be properly aligned
Fixes #62314
@davidwrighton, is there work remaining on this issue, or is the API functional enough now for Roslyn to target? |
Un-assigning myself |
There was one open question from @MichalStrehovsky as to whether we should mandate the explicit higher alignment via the .pack directive in CreateSpan, or rely on the compilers to do this without additional validation, but it's ready for Roslyn to target. |
Excellent. |
Why is it called |
Consistency with |
If there are no expected managed callers anyway then I'd fix the names on both methods. |
There's practically zero benefit. |
Just as status: |
dotnet/roslyn#61414 was merged, so I'm going to consider this done. I have a separate commit from dotnet/runtime to switch over a bunch of arrays to spans once we ingest an updated compiler that has the addition. Thanks, all. |
dotnet/runtime is now consuming the Roslyn build that has support for the previously added CreateSpan method, and code in dotnet/runtime is now taking advantage of it, so this work is all done. Thanks to all who contributed. |
Background and motivation
As of C# 8.0 it is possible to embed constant byte data into .NET Assemblies using a syntax like
However, unlike the support for array initialization this isn't possible for other primitive data types. Instead for other data types the ReadOnlySpan is initialized by actually allocating an array. The reason for this lack of support is that there is no current way to express that the constant data is in little endian format, and needs to be translated to the runtime endian format, if the application is run on hardware which utilizes big endian numbers.
This proposal is designed to provide a new api which works in a manner similar to how array initialization via RuntimeHelpers.InitializeArray works. This will allow constant data to be cleanly expressed on all architectures.
During a recent hackathon event, support for this was prototyped, and a simple example such as
was shown to be a fair bit faster.
In particular, looped 1.8 billion times, GetData took 9783ms and GetDataROS took 8468ms to execute, an improvement of somewhere around 13.5%.
@jaredpar has looked at the hackathon effort and is in general agreement that the C# team would be willing to implement code generation of this.
@stephentoub has also done some investigation of locations within the BCL where this sort of data would be useful, and there are a significant number of potential optimizations. Individually, these optimizations are generally fairly small, but there are quite a few which might benefit.
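The endianness concern in the background section can be sketched as follows. This is not the proposed API or its implementation. It is only an illustration, with hypothetical names, of what "translating little endian constant data to the runtime format" means: the blob is always stored little-endian, and an explicit little-endian read yields the same values whatever the byte order of the hardware.

```csharp
using System;
using System.Buffers.Binary;

public static class LittleEndianBlobSketch
{
    // Hypothetical constant blob: the ints 1, 2, 3 stored in little-endian order.
    private static ReadOnlySpan<byte> Blob => new byte[]
    {
        0x01, 0x00, 0x00, 0x00,
        0x02, 0x00, 0x00, 0x00,
        0x03, 0x00, 0x00, 0x00,
    };

    // Decodes the blob element by element. On little-endian hardware this
    // step is a plain copy that a runtime could skip entirely; on big-endian
    // hardware each element needs its bytes reversed.
    public static int[] Decode()
    {
        ReadOnlySpan<byte> blob = Blob;
        int[] result = new int[blob.Length / sizeof(int)];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = BinaryPrimitives.ReadInt32LittleEndian(
                blob.Slice(i * sizeof(int), sizeof(int)));
        }
        return result;
    }
}
```

Note that this sketch allocates an array, which is exactly the cost the proposal aims to avoid on little-endian machines, where the stored bytes can be used in place.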
API Proposal
Requirements:
- T must be a primitive constant sized type (byte, sbyte, char, short, ushort, int, uint, long, ulong, float, double)
- field must reference a FieldRVA
- The RVA associated with field must be aligned on the size of the primitive type T.
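The type and alignment requirements can be read as a simple check. The helper below is a hypothetical sketch, not part of the proposal or the runtime: it treats the RVA as a plain integer offset and only encodes the listed element types and the "aligned on the size of T" rule. The FieldRVA requirement is not modeled.

```csharp
using System;

public static class CreateSpanRequirementsSketch
{
    // Returns the element size for the types listed in the proposal,
    // or 0 for any type outside that list.
    public static int ElementSize(Type t)
    {
        if (t == typeof(byte) || t == typeof(sbyte)) return 1;
        if (t == typeof(char) || t == typeof(short) || t == typeof(ushort)) return 2;
        if (t == typeof(int) || t == typeof(uint) || t == typeof(float)) return 4;
        if (t == typeof(long) || t == typeof(ulong) || t == typeof(double)) return 8;
        return 0;
    }

    // True when the element type is allowed and the (hypothetical) RVA
    // value is a multiple of the element size.
    public static bool IsAcceptable(Type t, int rva)
    {
        int size = ElementSize(t);
        return size != 0 && rva % size == 0;
    }
}
```

For example, an `int` blob at offset 8 passes, while a `long` blob at offset 4 fails the alignment rule and `decimal` fails the type rule.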
API Usage
Or possibly if the C# team wish to implement...
There is no expectation that developers not generating IL would ever use this api directly.
Alternative Designs
It may be simpler for the JIT to work with the following construct. However, I don't believe it is a significant improvement.
Risks
Since the new api would only be available on new versions of .NET but the syntax is valid for older frameworks, customers may write code expecting the optimized behavior and instead get substantially slower, allocating logic. Suggestions have been made that support for this may also entail building a fallback generator for this sort of thing when targeting older versions of the frameworks to avoid performance cliffs.