Create non-primitive-constant-data.md #46

Closed
163 changes: 163 additions & 0 deletions proposed/non-primitive-constant-data.md
# Non-primitive Constant Data

## Introduction
Current implementations of languages in .NET are limited to a small set of possible constant types. It would be useful for computational scenarios to provide a mechanism for specifying efficiently accessible, pre-computed data in more complex data structures. This proposal describes a mechanism for supporting constants of more arbitrary user-defined data structures, covering array production as well as constant data access to either a single constant or a ReadOnlySpan of data.

## Current valid constant forms
Four forms of constant are currently accepted:
- Integer/Float constants
```csharp
void Constants()
{
    int x = 3;
    double y = 4.0;
}
```

This is represented in IL via the various ldc opcodes.
```
IL_0001: ldc.i4.s 3
IL_0003: stloc.s 0
IL_0005: ldc.r8 4.0
IL_000E: stloc.s 1
```

- String constants
```csharp
void Constants()
{
    string z = "z";
}
```

This is represented in IL using the ldstr instruction, which references the #US (user string) metadata heap.
```
IL_0001: ldstr "z"
```
- Byte data constants
```csharp
void Constants()
{
    ReadOnlySpan<byte> byte_data = new ReadOnlySpan<byte>(new byte[] { 0, 1, 2 });
}
```
This is represented in IL using an RVA static field on a compiler-generated type, whose address is then loaded via ldsflda.
```
IL_0001: ldloca.s 0
IL_0003: ldsflda valuetype '<PrivateImplementationDetails>'/'__StaticArrayInitTypeSize=3' '<PrivateImplementationDetails>'::'0C7A623FD2BBC05B06423BE359E4021D36E721AD'
IL_0008: ldc.i4.3
IL_0009: call instance void valuetype [System.Memory]System.ReadOnlySpan`1<uint8>::.ctor(void*, int32)
```


- Primitive data for a dynamically constructed array
```csharp
void Constants()
{
    byte[] data = new byte[] { 0, 1, 2, 4 };
}
```

This is represented in IL using an RVA static field on a compiler-generated type. The field is used in conjunction with the System.Runtime.CompilerServices.RuntimeHelpers.InitializeArray function to populate a dynamically allocated array from data that was statically placed into the assembly.
```
IL_0001: ldc.i4.4
IL_0002: newarr [mscorlib]System.Byte
IL_0007: dup
IL_0008: ldtoken field int32 '<PrivateImplementationDetails>'::'12DADA1FFF4D4787ADE3333147202C3B443E376F'
IL_000d: call void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, valuetype [mscorlib]System.RuntimeFieldHandle)
IL_0012: stloc.0
```
## New valid constant forms

The proposal is to provide a set of CompilerServices APIs, implemented as intrinsics in most cases, that allow the use of more complex constants. These APIs accept, as a byref argument, a reference to data in a well-defined type layout, and return a byref to data in the correct platform layout.

### Definition of types which can be represented as constants

The list of banned types should include Mono's size-variadic types nint and nfloat, as they don't fit into any of those categories.

- Must be a struct
Member

Can it be a generic type, like `int?`?
It feels "why not", but then we must be sure that such types respect sequential layout and have predictable packing.

Member

Well, int? is probably a bad example, since it has private fields, but the question still remains - would generic types be supported?
A better example would be tuples like (byte, byte, byte) - can that be a constant?

Member Author

I think this could be a generic type, but it only makes sense if we were willing to make setting the fields of a nullable a well-known detail of the frameworks. Given the history of nullable I think it would be fine, but it would probably require special casing in the runtime to have this knowledge.

Member Author

And yes, it would be ok to declare the fields of ValueTuple to be a public contract as well. With regards to layout of generic structures, by allowing the layout to differ between the PE file and the actual runtime, support for a variety of capabilities pops out. I hadn't thought of generics, but they should just work.

- The struct must have sequential layout
Member

What about explicit layout?

Member

I think the requirement should just be that the type has a constant size (so no auto-layout and no pointer or pointer-like types).
If the user opts for Sequential layout, they are responsible for ensuring that the layout is the same on all systems; or the runtime should throw when the constant size and the actual size are different.

Member

> I think the requirement should just be that the type has a constant size

That is not sufficient to make this work on bigendian platforms.

Member Author

Layout is sufficiently variable that it must either support explicit layout only (without overlapping fields), or sequential layout could also be supported. If sequential layout is supported at all, it doesn't make sense to put arbitrary restrictions on it, like must be consistent between platforms, as that's impossible to define.

Member Author

It also occurs to me that the current proposal would actually also work for auto layout. I'm not sure how I feel about that, but we could certainly make it work.

- Must not contain pointers of any form (no object references, IntPtr, UIntPtr, pointers, or function pointers)
- All fields must be public, or the type must be a primitive type
Member

Why does public matter at all here? I would assume the type must be primitive not any or qualification.

Member

Requiring all fields be public would also block "opaque" types, such as Vector128<T>

Member

> Why does public matter at all here?

The field layout has to be public contract for this to work with versioning. The easiest way to guarantee it is to require all fields to be public. Consider the case where the struct is defined in one assembly and used in different assembly.

> "opaque" types, such as Vector128<T>

Types like Vector128<T> do not work with this in general. The swizzler would have to special case them.

Member

> The field layout has to be public contract for this to work with versioning

I don't believe so. It could work with fields of any accessibility, provided the user doesn't do anything that modifies the layout of the struct later.

At the language level, it would be possible for the user to specially attribute structs that they want constant to work with (which tells the compiler, "I am opting into this behavior, and I won't change the layout later"). If they were to change it later, then it would be a breaking change and would fail at runtime (the same as any number of other breaking changes that are possible for people to make).

Member

> At the language level, it would be possible for the user to specially attribute structs that they want constant to work with

I think it's going to be a requirement for these types to be marked in a special way. There are too many requirements, most outlined in this document, that go against how structs are generated in general.

Member

The reference assemblies and the tooling around them would need to learn how to deal with it.

It would be great if the tooling could just reuse the C# Ref Assembly feature (which already deals with fields, etc)... It would be conceivable, even for partial facades, if the reference definitions were pulled from the general S.P.Corelib reference assembly (as needed), rather than being manually recreated.

Member Author

@davidwrighton davidwrighton Aug 13, 2018

@tannergooding - Types like Vector128 which are special cased by the runtime are effectively primitives. I'll update the doc to note that Vector128 and Vector256 fall into that category. If we do end up allowing

@jaredpar Generally, non-public fields are not considered to be part of the public contract of a type. I am aware that in some circumstances, the private details of a structure affect various behaviors of the C#, but we generally want to make public contracts impacted by private details.

An alternative approach would be to describe the language compiler generated layout of structures via a set of attributes on the types. This would possibly allow us to describe the special case for VectorXXX as well as make the discussion of public vs private fields moot (at the runtime level). The idea of private fields being part of this sort of a contract makes me uncomfortable, but it works for me if one is talking about constants for types defined in the same module or something.

Member

> It would be great if the tooling could just reuse the C# Ref Assembly

I agree that it would be great, but last time I have checked the C# Ref Assembly feature was insufficient for what CoreFX needs.

> rather than being manually recreated

The manually maintained public surface definition lets us ensure that we have the same surface between platforms, and a compatible surface between versions and different runtimes. The C# Ref Assembly feature does not have an equivalent for this today.

Member

> Generally, non-public fields are not considered to be part of the public contract of a type. I am aware that in some circumstances, the private details of a structure affect various behaviors of the C#, but we generally want to make public contracts impacted by private details.

Struct private fields are very much a part of the C# contract. I think every time people have tried to get cute about them not they get broken. I don't think I'd want to require public fields here in C#. Would detract a bit from the feature.

Member

> I agree that it would be great, but last time I have checked the C# Ref Assembly feature was insufficient for what CoreFX needs.

We should follow up again on this as I'm not sure what feature is missing at this point. There shouldn't be any gap at this point, other than changing the infrastructure for calling the compiler.

- All fields must also be of a type which can be represented as a constant
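The rules above can be checked mechanically. The following C# sketch applies them with reflection; `ConstantTypeRules.IsValidForConstantRepresentation` and the sample structs are hypothetical names, and the sketch follows the rules as currently written (struct, sequential layout, public fields, no pointers) rather than any of the alternatives raised in review. It treats `IntPtr`/`UIntPtr` (C# `nint`/`nuint`) as the banned pointer-sized primitives, relies on C# structs getting sequential layout by default, and does not check function pointer fields separately.

```csharp
using System;
using System.Reflection;

// Hypothetical checker for the constant-type rules listed above (not a proposed API).
public static class ConstantTypeRules
{
    public static bool IsValidForConstantRepresentation(Type t)
    {
        // Primitive types are allowed even though they have private fields,
        // except the pointer-sized IntPtr/UIntPtr (nint/nuint).
        if (t.IsPrimitive)
            return t != typeof(IntPtr) && t != typeof(UIntPtr);

        // Must be a struct with sequential layout (the C# default for structs).
        if (!t.IsValueType || !t.IsLayoutSequential)
            return false;

        foreach (FieldInfo field in t.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
        {
            // All fields must be public.
            if (!field.IsPublic)
                return false;

            // Unmanaged pointers are rejected outright.
            if (field.FieldType.IsPointer)
                return false;

            // Every field must itself be representable as a constant; object references
            // (e.g. string) fail here because they are not value types.
            if (!IsValidForConstantRepresentation(field.FieldType))
                return false;
        }
        return true;
    }
}

// Sample types used to exercise the rules.
public struct Rgb { public byte R, G, B; }
public struct Pixel { public Rgb Color; public short X, Y; }
public struct Opaque { private int _value; public Opaque(int value) => _value = value; }
public struct WithReference { public string Name; }
```

Under these rules `Rgb` and `Pixel` qualify, while `Opaque` (private field) and `WithReference` (object reference) do not; the review thread above discusses relaxing the public-field requirement.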

### Well known type layout
Given that IL binaries may be loaded on many different platforms with different type layout rules, constant data must be represented in a consistent manner across all of them. For consistent type layout, types shall be laid out in a strictly sequential manner, with *no* padding between any fields. For instance, this struct will occupy 9 bytes (1 for b plus 8 for d) when stored, even though its platform layout is typically larger because d is aligned to its natural alignment.

```csharp
struct NonAlignedStruct
{
    public byte b;
    public double d;
}
```
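To make the 9-byte figure concrete, the sketch below computes a type's size in the well-known, padding-free layout and compares it with the runtime's platform size from `Unsafe.SizeOf<T>()`. `WellKnownLayout` and its methods are hypothetical names, and the structs are public-field copies for illustration. It assumes the type already satisfies the constant-type rules and that the well-known layout stores multi-byte values little-endian, which this section does not specify. Under those assumptions, equal sizes mean the sequential platform layout contains no padding, which is one way a check like `WellKnownTypeLayoutMatchesPlatformLayout` could be approximated.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical helpers for the well-known (padding-free) layout described above.
public static class WellKnownLayout
{
    // Size of a type when its fields are stored back to back with no padding.
    // Assumes the type already satisfies the constant-type rules.
    public static int WellKnownSize(Type t)
    {
        if (t.IsPrimitive)
        {
            return Type.GetTypeCode(t) switch
            {
                TypeCode.Boolean or TypeCode.Byte or TypeCode.SByte => 1,
                TypeCode.Char or TypeCode.Int16 or TypeCode.UInt16 => 2,
                TypeCode.Int32 or TypeCode.UInt32 or TypeCode.Single => 4,
                TypeCode.Int64 or TypeCode.UInt64 or TypeCode.Double => 8,
                _ => throw new ArgumentException($"{t} has no well-known size (e.g. IntPtr/UIntPtr).")
            };
        }

        int size = 0;
        foreach (FieldInfo field in t.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
            size += WellKnownSize(field.FieldType);
        return size;
    }

    // Assumption: the well-known layout is little-endian. For a sequential struct, a platform size equal to the
    // padding-free size means the runtime inserted no padding, so the stored bytes could be used in place.
    public static bool WellKnownLayoutMatchesPlatformLayout<T>() where T : struct =>
        BitConverter.IsLittleEndian && Unsafe.SizeOf<T>() == WellKnownSize(typeof(T));
}

// Public-field copy of the example struct above.
public struct NonAlignedStruct
{
    public byte b;
    public double d;
}

// A struct whose platform layout has no padding.
public struct TwoInts
{
    public int A;
    public int B;
}
```

On common 64-bit platforms `Unsafe.SizeOf<NonAlignedStruct>()` is 16, so `NonAlignedStruct` would always need a layout conversion, while `TwoInts` could be reused in place on little-endian machines.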

### RuntimeHelpers APIs

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static partial class RuntimeHelpers
{
    // Fundamental new API capability.
    // Behavior is undefined if the values pointed at by inData change over time.
    // This function is expected to be implemented as a compiler intrinsic with the behavior of the C# written below. The proposal
    // expects that implementors will make this work for arbitrary calls, not just as a JIT intrinsic, but that may not be strictly necessary.
    // The Verify*/Get*/Has*/Is* helpers and `hashtable` are conceptual placeholders.
    public static unsafe ReadOnlySpan<TOutput> LoadConstantData<TOutput, TInput>(ref TInput inData, int count) where TOutput : struct where TInput : struct
    {
        if (!VerifyThatTypeIsValidForConstantRepresentation(typeof(TOutput)))
            throw new InvalidProgramException();

        if (count < 0)
            throw new InvalidProgramException();

        // The well-known representation of count elements must fit inside the input blob.
        if (checked(GetSizeOfConstantRepresentation(typeof(TOutput)) * count) > Unsafe.SizeOf<TInput>())
            throw new InvalidProgramException();

        if (HasGCPointers(typeof(TInput)))
            throw new InvalidProgramException();

        if (!VerifyInDataPointsInsideOfSomeLoadedManagedAssembly(ref inData))
            throw new InvalidProgramException();

        if (IsAlignedForTOutput(ref inData, typeof(TOutput)) && WellKnownTypeLayoutMatchesPlatformLayout(typeof(TOutput)))
        {
            // Fast path: the stored bytes already have the platform layout, so reinterpret them in place.
            return MemoryMarshal.CreateReadOnlySpan(ref Unsafe.As<TInput, TOutput>(ref inData), count);
        }
        else
        {
            // Slow path: use the address of inData as the key of a hashtable that stores converted constant data.
            // The converted constant data will be constructed in some fashion like...
            IntPtr inDataPtr = (IntPtr)Unsafe.AsPointer(ref inData);
            lock (hashtable)
            {
                TOutput[] data;
                if (!hashtable.TryGetValue(inDataPtr, out data) || data.Length < count)
                {
                    data = new TOutput[count];
                    // Read count elements from inData into data, handling the type layout transition
                    hashtable[inDataPtr] = data;
                }
                // The cached array may be longer than requested, so return exactly count elements.
                return data.AsSpan(0, count);
            }
        }
    }

    // Should be treated as an intrinsic by the compiler. Should allow more efficient encoding of single constants in an IL stream.
    public static TOutput LoadIndividualConstant<TOutput, TInput>(ref TInput inData) where TOutput : struct where TInput : struct
    {
        return LoadConstantData<TOutput, TInput>(ref inData, 1)[0];
    }

    // General replacement for the InitializeArray pattern; also handles non-primitive constants.
    public static TOutput[] AllocateArrayFromConstantData<TOutput, TInput>(ref TInput inData, int count) where TOutput : struct where TInput : struct
    {
        TOutput[] result = new TOutput[count];
        LoadConstantData<TOutput, TInput>(ref inData, count).CopyTo(result.AsSpan());
        return result;
    }
}
```

Member

@jkotas jkotas Aug 11, 2018

Alternative design could be to teach the existing InitializeArray API to support more types, and not introduce new API.

Member

@VSadov VSadov Aug 13, 2018

InitializeArray requires that there is an array instance. Span-based API feels a bit more flexible.

It might be nice if InitializeArray works with generalized const-able types, but having span-based APIs, it would not be strictly necessary. The compiler could just emit ToArray or something.

Member Author

Sure, it's straightforward to enable the InitializeArray API to support more types. I generally prefer to avoid extending existing APIs, but that would be reasonable to do.
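The slow path of `LoadConstantData` keys converted arrays by the address of `inData` and rebuilds an entry when a later call asks for more elements than were cached. A minimal, runnable model of just that caching behavior is sketched below. `ConvertedConstantCache`, the `nint` address key, and the `convert` callback (standing in for the layout transition) are illustrative assumptions, not proposed API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the slow-path cache in LoadConstantData (not a proposed API).
public sealed class ConvertedConstantCache<T>
{
    private readonly Dictionary<nint, T[]> _cache = new Dictionary<nint, T[]>();
    private readonly object _lock = new object();

    // Number of times the (expensive) layout conversion actually ran.
    public int Conversions { get; private set; }

    // address: identity of the constant blob (the address of inData in the proposal).
    // convert: stands in for the layout transition; produces `count` converted elements.
    public ReadOnlySpan<T> GetOrConvert(nint address, int count, Func<int, T[]> convert)
    {
        lock (_lock)
        {
            if (!_cache.TryGetValue(address, out T[] data) || data.Length < count)
            {
                data = convert(count);
                Conversions++;
                _cache[address] = data;
            }
            // Slice so callers asking for fewer elements than are cached get exactly `count`.
            return data.AsSpan(0, count);
        }
    }
}
```

Because the key is the address of immutable data in a loaded assembly, a cached conversion never goes stale; it only has to be redone when a caller asks for a longer prefix than was converted before.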

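These runtime helpers will be used in a manner that closely resembles how byte data initialization works today. For example, loading three NonAlignedStruct values from a 27-byte blob (3 elements of 9 bytes each):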

```
IL_0003: ldsflda valuetype '<PrivateImplementationDetails>'/'__StaticArrayInitTypeSize=27' '<PrivateImplementationDetails>'::'0C7A623FD2BBC05B06423BE359E4021D36E721AD'
IL_0008: ldc.i4.3
IL_0009: call valuetype [System.Memory]System.ReadOnlySpan`1<!!0> System.Runtime.CompilerServices.RuntimeHelpers::LoadConstantData<valuetype NonAlignedStruct, valuetype '<PrivateImplementationDetails>'/'__StaticArrayInitTypeSize=27'>(!!1&, int32)
```
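For the call above, the slow path has to turn the 27-byte blob into three NonAlignedStruct values in platform layout. A minimal sketch of that layout transition for this one type follows. It assumes, for illustration, that the well-known layout stores b at offset 0 and d at offset 1 of each 9-byte element, with d encoded little-endian; `NonAlignedStructDecoder`, `Decode`, and `Encode` are hypothetical names. A real implementation would derive field offsets from metadata rather than hard-coding them.

```csharp
using System;
using System.Buffers.Binary;

// Public-field copy of the example struct.
public struct NonAlignedStruct
{
    public byte b;
    public double d;
}

// Hypothetical sketch of the layout transition for NonAlignedStruct.
// Assumed well-known layout per element: b at offset 0, d (little-endian) at offset 1.
public static class NonAlignedStructDecoder
{
    public const int PackedSize = 1 + 8;

    public static NonAlignedStruct[] Decode(ReadOnlySpan<byte> packed, int count)
    {
        // Mirrors the count and size checks in LoadConstantData.
        if (count < 0 || (long)count * PackedSize > packed.Length)
            throw new InvalidProgramException();

        var result = new NonAlignedStruct[count];
        for (int i = 0; i < count; i++)
        {
            ReadOnlySpan<byte> element = packed.Slice(i * PackedSize, PackedSize);
            result[i].b = element[0];
            result[i].d = BinaryPrimitives.ReadDoubleLittleEndian(element.Slice(1));
        }
        return result;
    }

    // Produces the packed form, i.e. what a compiler might place in the RVA field.
    public static byte[] Encode(ReadOnlySpan<NonAlignedStruct> values)
    {
        var packed = new byte[values.Length * PackedSize];
        for (int i = 0; i < values.Length; i++)
        {
            Span<byte> element = packed.AsSpan(i * PackedSize, PackedSize);
            element[0] = values[i].b;
            BinaryPrimitives.WriteDoubleLittleEndian(element.Slice(1), values[i].d);
        }
        return packed;
    }
}
```

Using explicit little-endian reads keeps the decoding identical on big-endian platforms, which is the concern raised in review about relying on a constant size alone.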