Allow memory-overlapped fields for unmanaged unions #1333

Open
6 tasks done
roboz0r opened this issue Nov 3, 2023 · 1 comment
roboz0r commented Nov 3, 2023

Motivation

Along the same lines as "Merge fields of the same type on multi-case union structs" and "More stack-efficient struct DUs", it is desirable to represent struct unions with the minimum memory footprint, i.e. sizeof<Tag> + sizeof<LargestCase>, rather than the existing representation of sizeof<Tag> + sizeof<Case0> + sizeof<Case1> + ...
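To make the size difference concrete, here is a minimal sketch that compares a side-by-side layout with an overlapped one. All type and field names are hypothetical, and the manual layouts only approximate what the compiler generates; exact sizes depend on the runtime's padding rules, so no specific numbers are claimed.

```fsharp
#nowarn "9" // explicit layout is flagged as potentially unverifiable
open System.Runtime.InteropServices

// Hypothetical payloads for two union cases, two int64 fields each.
[<Struct>]
type Case0 = { X0: int64; Y0: int64 }

[<Struct>]
type Case1 = { X1: int64; Y1: int64 }

// Side-by-side: every case gets its own storage after the tag.
[<Struct; StructLayout(LayoutKind.Sequential)>]
type SideBySide = { Tag: int; C0: Case0; C1: Case1 }

// Overlapped: both cases start at offset 0, so the payload is only
// as large as the largest case.
[<Struct; StructLayout(LayoutKind.Explicit)>]
type Overlapped =
    {
        [<FieldOffset(0)>]
        O0: Case0
        [<FieldOffset(0)>]
        O1: Case1
    }

[<Struct; StructLayout(LayoutKind.Sequential)>]
type Compact = { CTag: int; Cases: Overlapped }

let sideBySideSize = sizeof<SideBySide>
let compactSize = sizeof<Compact>
printfn "side by side: %d bytes, overlapped: %d bytes" sideBySideSize compactSize
```

The saving grows with the number of cases, since the side-by-side layout pays for every case while the overlapped layout pays only for the largest.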

The reasoning for the existing representation is that the .NET Runtime forbids overlapping reference and value types in memory:

Offset values shall be non-negative. It is possible to overlap fields in this way, though offsets occupied by an object reference shall not overlap with offsets occupied by a built-in value type or a part of another object reference. While one object reference can completely overlap another, this is unverifiable

(C) Ecma-335 II.10.7
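As an illustration of the kind of overlap the rule permits, the sketch below places two built-in value types at the same offset. The type IntOrFloat is hypothetical and is not part of the proposal; it only shows that overlapping unmanaged fields is accepted by the runtime.

```fsharp
#nowarn "9" // explicit layout is flagged as potentially unverifiable
open System.Runtime.InteropServices

// Two unmanaged fields sharing the same four bytes.
[<Struct; StructLayout(LayoutKind.Explicit)>]
type IntOrFloat =
    [<FieldOffset(0)>]
    val mutable I: int
    [<FieldOffset(0)>]
    val mutable F: float32

let mutable u = Unchecked.defaultof<IntOrFloat>
u.F <- 1.0f
// Reading the other field reinterprets the same bytes (IEEE 754 bits of 1.0f).
let bits = u.I
printfn "0x%08X" bits // prints 0x3F800000
```

Replacing either field with a reference type such as string would violate the quoted rule, which is why the proposal restricts itself to unmanaged fields.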

This proposal differs from #699 and #1311 and avoids this limitation by requiring that all fields in the union be unmanaged.

A type is an unmanaged type if it's any of the following types:

  • sbyte, byte, short, ushort, int, uint, long, ulong, nint, nuint, char, float, double, decimal, or bool
  • Any enum type
  • Any pointer type
  • Any user-defined struct type that contains fields of unmanaged types only.
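F# already exposes this notion through the unmanaged generic constraint. A minimal sketch, where sizeOfUnmanaged and Point are hypothetical names used only for illustration:

```fsharp
// Hypothetical helper: it only type-checks for arguments that satisfy 'T: unmanaged.
let sizeOfUnmanaged<'T when 'T: unmanaged> () = sizeof<'T>

// A struct whose fields are all unmanaged is itself unmanaged.
[<Struct>]
type Point = { X: int; Y: int }

let intSize = sizeOfUnmanaged<int> ()
let pointSize = sizeOfUnmanaged<Point> ()
printfn "int: %d bytes, Point: %d bytes" intSize pointSize

// The following line would be rejected at compile time,
// because string is a reference type:
// let bad = sizeOfUnmanaged<string> ()
```

The proposed attribute could presumably reuse the same compiler check when validating the fields of each case.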

Proposed Syntax

Unmanaged struct unions would use the same syntax as other discriminated unions, but would require the addition of a new [<UnmanagedUnion>] attribute.

[<Struct>]
[<UnmanagedUnion>]
type A =
    | A0
    | A1 of int

Any combination of fields and field names can work provided that all fields are unmanaged and two fields in the same case do not have the same name.

Questions:

  • Should both [<Struct>] and [<UnmanagedUnion>] be required or does [<UnmanagedUnion>] imply [<Struct>]?
  • If [<UnmanagedUnion>] was not required, could the same output be generated, provided that all fields are known to be unmanaged?

Examples of Limitations

[<Struct>]
[<UnmanagedUnion>]
type B =
    | B0 of int * int // Allowed
    | B1 of (int * int) // Not allowed, tuple is a reference type
    | B2 of struct (int * int) // Allowed
    | B3 of string // Not allowed, string is a reference type
    | B4 of obj // Not allowed
    | B5 of {| A: int |} // Not allowed
    | B6 of struct {| A: int |} // Allowed
    | B7 of struct {| A: obj |} // Not allowed, a struct contains a reference type
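One way to explore which field types would pass or fail is the runtime helper RuntimeHelpers.IsReferenceOrContainsReferences. This is only an approximation of the compile-time unmanaged rule, and hasReferences and Wrapper are hypothetical names introduced for this sketch.

```fsharp
open System.Runtime.CompilerServices

// Hypothetical helper: true means the type is, or contains, an object reference
// and so could not be overlapped with other fields.
let hasReferences<'T> () =
    RuntimeHelpers.IsReferenceOrContainsReferences<'T>()

// A struct that wraps a reference, comparable to case B7.
[<Struct>]
type Wrapper = { Value: obj }

let intHasRefs = hasReferences<int> ()                        // false
let structTupleHasRefs = hasReferences<struct (int * int)> () // false
let tupleHasRefs = hasReferences<int * int> ()                // true, System.Tuple is a class
let stringHasRefs = hasReferences<string> ()                  // true
let wrapperHasRefs = hasReferences<Wrapper> ()                // true

printfn "%b %b %b %b %b" intHasRefs structTupleHasRefs tupleHasRefs stringHasRefs wrapperHasRefs
```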

Compiled Representation

Considering the case:

[<Struct>]
[<UnmanagedUnion>]
type A =
    | A0
    | A1 of a:int
    | A2 of b:bool * c:unativeint

It is anticipated that the generated code would be roughly equivalent to:

// Concept compiler-generated code 
open System
open System.Runtime.CompilerServices
open System.Runtime.InteropServices
#nowarn "9"

type A_Gen_Tag = 
    | A0 = 0    
    | A1 = 1
    | A2 = 2

// A0 has no fields so no need to create a struct

[<Struct>]
type A_Gen_A1 = 
    {
        a: int
    }

[<Struct>]
type A_Gen_A2 = 
    {
        b:bool
        c:unativeint
    }

// Having a separate cases struct avoids the need to compute field offsets. They can all be 0.
[<Struct>]
[<StructLayout(LayoutKind.Explicit)>]
type A_Gen_Cases = 
    {
        [<FieldOffset(0)>]   
        A1: A_Gen_A1
        [<FieldOffset(0)>]
        A2: A_Gen_A2
    }

[<Struct>]
[<StructLayout(LayoutKind.Sequential)>] // Is this attribute required or useful?
type A_Gen = 
    {
        Tag: A_Gen_Tag
        Cases: A_Gen_Cases
    }

    static member A0 = 
        { Tag = A_Gen_Tag.A0; Cases = Unchecked.defaultof<A_Gen_Cases> }

    static member A1 (a: int) =
        let mutable case = Unchecked.defaultof<A_Gen_Cases>
        Unsafe.AsRef<A_Gen_A1>(&case.A1) <- { a = a }
        { Tag = A_Gen_Tag.A1; Cases = case }

    static member A2 (b: bool, c: unativeint) =
        let mutable case = Unchecked.defaultof<A_Gen_Cases>
        Unsafe.AsRef<A_Gen_A2>(&case.A2) <- { b = b; c = c }
        { Tag = A_Gen_Tag.A2; Cases = case }

// The union patterns should be switched using the Tag field and present the fields equivalent to other unions
let (|A0|A1|A2|) (a: A_Gen)=
    match a.Tag with
    | A_Gen_Tag.A0 -> A0
    | A_Gen_Tag.A1 -> A1 (a.Cases.A1.a)
    | A_Gen_Tag.A2 -> A2 (a.Cases.A2.b, a.Cases.A2.c)
    | _ -> failwith "Unreachable"

// Construction and usage would be roughly equivalent to:
let a0 = A_Gen.A0

let a1 = A_Gen.A1(1)

let a2 = A_Gen.A2(true, 2un)

match a0 with
| A0 -> printfn "A0"
| _ -> failwith "not A0"

match a1 with
| A1 a -> printfn "A1: %A" a
| _ -> failwith "not A1"

match a2 with
| A2 (b, c) -> printfn "A2: %A" (b, c)
| _ -> failwith "not A2"

Support For Generics

In principle there would be nothing wrong with supporting generics, provided every type parameter had the unmanaged constraint. However, this is not currently supported by the .NET runtime, which does not allow explicit layout on generic types.

If the runtime restriction is lifted, it is expected that the user will still have to explicitly provide the unmanaged constraint.

// Ok
[<Struct>]
[<UnmanagedUnion>]
type UnmanagedOption<'T when 'T: unmanaged> =
| UNone
| USome of v:'T

// Error: UnmanagedUnion requires that type parameter 'T have the unmanaged constraint, i.e. <'T when 'T: unmanaged>
[<Struct>]
[<UnmanagedUnion>]
type UnmanagedOption<'T> =
| UNone
| USome of v:'T

Representation of Tag

The Tag of a union has traditionally been represented by int32; however, there are other logical choices:

  • byte has the potential to give the smallest memory footprint, particularly for unions without any fields, but may degrade performance due to memory alignment issues. There would be very few, if any, discriminated unions in the wild with more than 256 cases.
  • nativeint is larger than int32 on most machines today but has the potential to offer the best performance by using the machine's native integer size and memory alignment. Even on 64-bit machines it may result in the same struct size as int32, depending on how .NET pads the layout.
  • Other candidates (int16, int64 or their unsigned equivalents) don't have compelling advantages.
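The trade-off between tag sizes can be sketched with a few hypothetical struct records. The types below are not what the compiler would emit; they only show how the tag width interacts with padding.

```fsharp
// Tag only, as for a union whose cases carry no fields.
[<Struct>]
type ByteTagOnly = { BTag: byte }

[<Struct>]
type IntTagOnly = { ITag: int }

// Tag followed by an 8-byte payload.
[<Struct>]
type ByteTagWithPayload = { PTagB: byte; PayloadB: int64 }

[<Struct>]
type IntTagWithPayload = { PTagI: int; PayloadI: int64 }

let byteOnly = sizeof<ByteTagOnly>
let intOnly = sizeof<IntTagOnly>
let byteWithPayload = sizeof<ByteTagWithPayload>
let intWithPayload = sizeof<IntTagWithPayload>

printfn "tag only: byte=%d int=%d" byteOnly intOnly
printfn "with int64 payload: byte=%d int=%d" byteWithPayload intWithPayload
```

With no payload the byte tag is smaller, but once a wider field follows the tag, padding tends to erase the difference, which is the alignment effect described above.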

This could, optionally, be user-defined by providing the desired type in the UnmanagedUnion attribute:

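open System

type UnmanagedUnionWithTagAttribute(t: Type) =
    inherit Attribute()
    // If not specified we provide a default
    new () = UnmanagedUnionWithTagAttribute(typeof<int32>)
    // Expose the requested tag type so the compiler could read it back
    member _.TagType = t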

Support for Explicit layout and Packing

Normally the memory layout of structs with [<StructLayout(LayoutKind.Sequential)>] can be further controlled by specifying the Pack field.
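For reference, this is how Pack affects an ordinary sequential struct today. The type names are hypothetical and the sketch does not involve [<UnmanagedUnion>].

```fsharp
#nowarn "9" // silences the unverifiable-code warning if the compiler raises it here
open System.Runtime.InteropServices

// Default packing: Value is aligned to a 4-byte boundary, leaving 3 padding bytes.
[<Struct; StructLayout(LayoutKind.Sequential)>]
type DefaultPacked = { Flag: byte; Value: int }

// Pack = 1 removes the padding between fields.
[<Struct; StructLayout(LayoutKind.Sequential, Pack = 1)>]
type TightlyPacked = { PFlag: byte; PValue: int }

let defaultSize = sizeof<DefaultPacked> // 8
let packedSize = sizeof<TightlyPacked>  // 5
printfn "default: %d bytes, Pack = 1: %d bytes" defaultSize packedSize
```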

It is not anticipated that [<StructLayout>] could be applied to a struct attributed with [<UnmanagedUnion>].

This kind of custom layout would require the suggestion "Allow the union pattern to be implemented explicitly" to be implemented first.

Pros and Cons

The advantages of making this adjustment to F# are better performance and the ability to interoperate with native libraries that use a tagged union, without additional ceremony or allocations due to active patterns.

The disadvantages of making this adjustment to F# are that the limitations of the proposal may mean its usage is rare, and that it is one more thing to implement and maintain.

Extra information

Estimated cost (XS, S, M, L, XL, XXL): M

Related suggestions:

Affidavit (please submit!)

Please tick these items by placing a cross in the box:

  • This is not a question (e.g. like one you might ask on StackOverflow) and I have searched StackOverflow for discussions of this issue
  • This is a language change and not purely a tooling change (e.g. compiler bug, editor support, warning/error messages, new warning, non-breaking optimisation) belonging to the compiler and tooling repository
  • This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it
  • I have searched both open and closed suggestions on this site and believe this is not a duplicate

Please tick all that apply:

  • This is not a breaking change to the F# language design
  • I or my company would be willing to help implement and/or test this

For Readers

If you would like to see this issue implemented, please click the 👍 emoji on this issue. These counts are used to generally order the suggestions by engagement.

@charlesroddie

In our codebase we have only one DU type that we would want this for, but it has a pivotal place and is used for many calculations and needs to be high-performance.

However, if this feature is not implemented in the F# compiler, the alternative is to convert the DU type into a manually defined struct using the exact code that you posted (thanks for that; very useful!). A disadvantage here is that matching would be worse, but matching on the tag and using AsCase() methods doesn't seem so bad for occasional use, and active patterns would still work.

So the feature seems useful, but the workaround may be good enough.
