Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Champion "Discriminated Unions" #113

Open
5 tasks
gafter opened this issue Feb 15, 2017 · 889 comments
Open
5 tasks

Champion "Discriminated Unions" #113

gafter opened this issue Feb 15, 2017 · 889 comments

Comments

@gafter
Copy link
Member

gafter commented Feb 15, 2017

  • Proposal added
  • Discussed in LDM
  • Decision in LDM
  • Finalized (done, rejected, inactive)
  • Spec'ed

See

Design meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-08-31.md#discriminated-unions
https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-09-26.md#discriminated-unions
https://github.com/dotnet/csharplang/blob/main/meetings/2024/LDM-2024-07-24.md#discriminated-unions

@gafter gafter self-assigned this Feb 15, 2017
@gafter
Copy link
Member Author

gafter commented Feb 15, 2017

I wouldn't expect progress on this feature to precede records.

@gafter gafter changed the title Discriminated Unions Champion Discriminated Unions Feb 15, 2017
@DavidArno
Copy link

@gafter,

I've started writing a draft proposal around #75. Would you like me to continue with this, or did you have a proposal planned yourself?

@gafter
Copy link
Member Author

gafter commented Feb 20, 2017

@DavidArno I do not expect to invest any significant effort on this until we make progress on records.

@DavidArno
Copy link

@gafter,

OK, that's good to know. I'll carry on with my proposal then, but will take time over it as there's no rush.

@gafter
Copy link
Member Author

gafter commented Feb 21, 2017

/cc @agocke

@agocke
Copy link
Member

agocke commented Feb 21, 2017

I will be moving my proposal of "closed" type hierarchies from Roslyn to this repo shortly. I also think we should explore "real" discriminated unions and have some thoughts on that, but it's still much more of an exploratory phase for me.

However, I think I'm close to a reasonably complete proposal for closed and an analyzer-based prototype. I'll happily champion this feature, as well.

@gafter gafter changed the title Champion Discriminated Unions Champion "Discriminated Unions" Feb 21, 2017
@gafter gafter modified the milestone: X.0 candidate Mar 17, 2017
@Richiban
Copy link

One question that I think we need to ask (and I haven't seen anyone ask elsewhere) is whether the case classes can be used as types.

Allow me to illustrate with the example of the Option type:

public enum class Option<T>
{
	None,
	Some(T Value)
}

Obviously Option<T> is a type (an abstract base class), but are None and Some types? Since, in an OO language like C#, ADTs are probably actually implemented as type hierarchies one might be tempted to answer "yes", but I'm not sure it makes much sense.

public void MyMethod(Some<string> someString) // Is this allowed? It doesn't make much sense
{
	// ...
}

I think of ADTs as functioning more like enums, however they're actually implemented. So using each case as a type doesn't make sense, any more than this makes sense:

public enum Colours
{
	Red, Green, Blue
}

public void MyMethod(Blue colour)
{
	// ...
}

@alrz
Copy link
Member

alrz commented Apr 5, 2018

whether the case classes can be used as types.

I think it shouldn't be the case with enum class but with another "expanded" syntax which could represent more complex type hierarchies like an AST

class SyntaxNode {
  case class Statement { } // implicitly inherit from SyntaxNode, as in, a "case" of the said type
  case class Expression {
     case enum class Literal { Numeric, String }
  }
}

@mcintyre321
Copy link

mcintyre321 commented May 15, 2018

I think that feature can be added fairly simply using a custom type, in the same way Tuple<T0, ..., TN> was added.

I maintain a library, OneOf, which adds a OneOf<T0, ..., TN> type which has .Match<TOut>(Func<T0, ..., TOut>, ..., Func<T0, ..., TOut> methods. By using implicit operators to create the OneOf instances from values, the syntax is very terse and comprehensible. Also, the allocations are low because it's a struct and doesn't create intermediate 'builder' objects, unlike some other solutions.

The OneOf<T0, .., TN> type also provides .Switch and .TryGetTX(out TX value, out TRemainder remainer) methods.

Example of using a OneOf as a return value:

public OneOf<User, InvalidName, NameTaken> CreateUser(string username)
{
    if (!IsValid(username)) return new InvalidName();
    var user = _repo.FindByUsername(username);
    if(user != null) return new NameTaken();
    var user = new User(username);
    _repo.Save(user);
    return user;
}

example of Matching

    OneOf<string, ColorName, Color> backgroundColor = "Red"; //uses implicit casts to reduce overhead
    Color c = backgroundColor.Match(
        str => CssHelper.GetColorFromString(str),
        name => new Color(name),
        col => col
   );
    

As new types are added to the OneOf definition, compiler errors are generated wherever the union is Matched or Switched, as the methods are required to have the correct number of lambda parameters.

This can be included in the BCL without language changes, although I'm sure some syntactical sugar could be sprinkled.


this proposal was originally made at dotnet/roslyn#14208 and at #1524 . Sorry!

@Richiban
Copy link

Richiban commented May 15, 2018

@mcintyre321 Your OneOf type can be described as equivalent to the Either type or Choice type such as that found in F#. However, the Either type is not an alternative to discriminated unions, in fact it is built on top of discriminated unions:

type Choice<'a, 'b> = Choice1Of2 of 'a | Choice2Of2 of 'b

@HaloFour
Copy link
Contributor

@mcintyre321

While your library does accomplish providing a single discriminated union it also demonstrates the degree of boilerplate required to do so which is what this proposal seeks to reduce. Your types also don't work with C#'s recursive pattern matching which will make it much more efficient and much more capable to match over such a type:

var backgroundColor = ...;

// no delegate invocation required
Color c = backgroundColor switch {
    string str => CssHelper.GetColorFromString(str),
    ColorName name => new Color(name),
    Color col => col
};

@mcintyre321
Copy link

@Richiban OneOf<T0, ..., TN> has up up to 33 parameters, so is more useful as a general return object than Either or Choice.

@HaloFour I agree it would be good to have switch and match support built in to the language,. butI would have thought that the delegate calls will be JITted. I'm not sure what the boilerplate you refer to is.

@HaloFour
Copy link
Contributor

@mcintyre321

I'm not sure what the boilerplate you refer to is.

public enum class OneOf<T1, T2>
{
	First(T1 value),
	Second(T2 value)
}

vs. this*

* Yes, I know that you have all of the arities crammed into one, but the file is too large to link to a specific line.

@Richiban
Copy link

Richiban commented May 15, 2018

@mcintyre321 I don't doubt it's usefulness (or the fact that it's better than Either at those situations).

My point was that discriminated unions are a much more general tool that can also solve the problem that Either solves.

I'm not sure how you would propose to implement the equivalent of this using an Either type?

type FavouriteColour =
    | Red
    | Green
    | Blue
    | Other of string

@mcintyre321
Copy link

mcintyre321 commented May 15, 2018

@Richiban The abilities to naming a DU is useful, but you still get a lot of value with anonymous DUs. Is it a show-stopper to not have named unions (initially at least)?

That said, there are some things that can be done.

A OneOf, as implemented, is a struct, but I think that in F# (by default) they are classes. So you could make OneOf a class too*, and have class FavouriteColour : OneOf<Red, Green, Blue, string> { }. One problem with this is that implicit operators aren't inherited, although I think maybe I saw a proposal suggesting that was coming.

Another alternative for naming is to use a Record type e.g. class FavouriteColour(OneOf<Red, Green, Blue, string> Value).

And you can always use an alias: using FavouriteColour = OneOf<Red, Green, Blue, string>; if it's just the code bloat that's a problem (rather than the lack of a named Type ).

I appreciate none of this is quite as nice as the F# approach, but perhaps the language could be extended to fix some of this. E.g. defining a union class FavouriteColour : OneOf<Red | Green | Blue | string> { } could cause the compiler to output the required implicit operators.

TBH I'm happy with any solution where

  • you can do ad-hoc definition of DUs in method signatures and member declarations
  • concrete types from any library can be used
  • exhaustive matching with compile errors after a change in parameters (obv!)

@HaloFour cramming it into one file is along the lines of https://referencesource.microsoft.com/#mscorlib/system/tuple.cs , although I admit OneOf.cs has become somewhat larger!

*There's a class-based OneOfBase in the library, but the name isn't great IMO.

@HaloFour
Copy link
Contributor

@mcintyre321

Anonymous union type support would be #399 which is quite a bit different from DUs.

@Richiban
Copy link

@mcintyre321

The abilities to naming a DU is useful, but you still get a lot of value with anonymous DUs

The naming is what makes it a discriminated union. Without the names it's just a (type) union (also very useful, in my opinion).

I don't know about the C# language team, but I'm absolutely desperate for a decent discriminated union type in C#. The number of times I'm modelling a domain and I want to say "An instance of this type looks either like this or like that." C# has nothing of the kind and it's really difficult to work around (although some of the new pattern matching features from build might take away some of the pain).

@agocke
Copy link
Member

agocke commented Jul 31, 2024

I'm not saying the type of A us unit. The type of data represented by
cases B, C, and D are unit. Instead, we now have to declare a type for
each one.

Correct, because they are different types. Again if they are types, they must be declared as such.

@jam40jeff
Copy link

I'm not saying the type of A us unit. The type of data represented by
cases B, C, and D are unit. Instead, we now have to declare a type for
each one.

Correct, because they are different types. Again if they are types, they must be declared as such.

Which is exactly why I'm saying it's restrictive to force the tags to be types.

@agocke
Copy link
Member

agocke commented Jul 31, 2024

Sure -- that's probably an open question.

If we create a new concept called "union variant", we could simply make the cases all union variants. Then we could special case the use of variants in pattern-matching.

Obviously, this would preclude using the cases as actual types, meaning that you couldn't have variables of type B. There are a number of cases where this is somewhat useful today. For example, you could imagine Roslyn's syntax tree as a union. There are a number of methods that take BinaryOperatorExpression as a parameter type. If we used unions they would only be able to take the top-level SyntaxNode as a parameter type, weakening the type safety of the model. Of course we could define nested unions, but that has its own ergonomic problems.

Overall, I'm not convinced either way. I think the case for keeping variants is clearer for struct unions, as the subtyping relationship there feels a bit forced.

@TehPers
Copy link

TehPers commented Aug 1, 2024

Would this proposal allow for empty unions? For example, suppose this definition in a theoretical syntax:

public union A {}

Would this be possible, and would it represent an uninhabited type? Would this be useful (including in source generators)?

@tats-u
Copy link

tats-u commented Aug 3, 2024

Can we combine static data shared by all instances of a specific subtype, which Java's enum can contain, and independent data among instances?

if (status is Found { RedirectTo: var redirectTo })
{
    return Retry(redirectTo);
}
if (!status.Successful)
{
    Console.Error.WriteLine($"Error {status.Code} {status.Message}");
    return false;
}

enum class HttpStatus(int Code, bool Successful, bool IsRedirect, bool DueToUser, string Message)
{
    Ok[200, true, false, false, "OK"],
    Found[302, false, true, false, "Found"](string RedirectTo),
    Unauthorized[401, false, false, true, "Unauthorized"](string[] WwwAuthorize),
    NotFound[404, false, false, true, "Not Found"],
    InternalServerError[500, false, false, false, "Internal Server Error"],
}

@nuiva
Copy link

nuiva commented Aug 3, 2024

Would this proposal allow for empty unions? For example, suppose this definition in a theoretical syntax:

public union A {}

Would this be possible, and would it represent an uninhabited type? Would this be useful (including in source generators)?

As I understand the proposal, this would create a [Closed] abstract record A {}, that is, a record class which you can't construct or inherit.

In a nullable-oblivious context, A is inhabited because you can write A a = null;. Even in a nullable aware context, it's technically (if you consider inhabitation to be about expressions rather than values) inhabited because you can write A f() => throw ...; and then the expression f() has type A, even though A has no possible runtime values.

Unlike the bottom type, A is not a subtype of all types, so I can't think of any use for it.

@csharper2010
Copy link

It's going to be erasure for ad hoc unions and that seems to reflect that other approaches would need non-trivial runtime changes. So they'll be based on compile-time and runtime checks.

I'm fine with that as I'm desperately waiting for real discriminated unions and don't see much value in ad hoc type unions.

@JoaoVictorVP
Copy link

I just thought about something, how exactly are the union structs be optimized with generics? Now that I think about it, the question about the memory layout being unknown is the exactly same for these, so they would be unoptimized? @CyrusNajmabadi
If so, then the only point that makes you not using structs for ad hoc unions is the order independence of the declaration?

@snarbies
Copy link

snarbies commented Aug 6, 2024

Union structs are a completely separate beast from ad hoc unions. They entail a separate set of concerns and one really does not inform about the other.

Each option in a union struct is a distinct struct. Best info I can find on how they interact with generics would be what's covered under the "Implementation" portion of the union struct section in the proposal. Honestly, it looks like union structs just won't play terribly well with generics and will probably require boxing.

Not using structs for ad hoc unions is simply a matter of you can't know what's going to end up inside there in advance, due to the various indirection (interfaces, delegates) and type features (generics, variance) in the language. There is no way to orchestrate a layout that is guaranteed to make the runtime happy.

@JoaoVictorVP
Copy link

Union structs are a completely separate beast from ad hoc unions. They entail a separate set of concerns and one really does not inform about the other.

Each option in a union struct is a distinct struct. Best info I can find on how they interact with generics would be what's covered under the "Implementation" portion of the union struct section in the proposal. Honestly, it looks like union structs just won't play terribly well with generics and will probably require boxing.

Not using structs for ad hoc unions is simply a matter of you can't know what's going to end up inside there in advance, due to the various indirection (interfaces, delegates) and type features (generics, variance) in the language. There is no way to orchestrate a layout that is guaranteed to make the runtime happy.

I understand each option in a union struct will be a separate struct, my question is more <how it will be optimized if you don't know the struct layout at compile time for generics> than this. Because even if you say that they are a "completely separate beast" from ad hoc unions they still suffer some of the same problems, unless the idea is that they will not be optimized at all and will only be structs with many other struct fields each occupying their own space in memory.

@snarbies
Copy link

snarbies commented Aug 8, 2024

I don't see any indication that union structs will be optimized for generics nor that there is any potential for overlap concerns. Like I said, I think to be usable within a generic, union structs would have to be boxed (in which case the runtime type is the discriminator).

Overlap is just one major problem for struct-based type unions.

@HaloFour
Copy link
Contributor

HaloFour commented Aug 8, 2024

Overlap is just one major problem for struct-based type unions.

I don't think it's a huge concern. F# has never overlapped fields in struct unions, even in non-generic cases. I think it's a nice to have and I think that the spec should leave the possibility open for the compiler to apply optimizations like this where applicable/safe but I wouldn't consider it a blocker. I'd be more concerned about the compiler making an assumption as to when boxing would be the better solution.

@TahirAhmadov
Copy link

TahirAhmadov commented Aug 8, 2024

Regarding overlapping, I was chatting with somebody who made a union struct source generator which does overlapping, by recursively unwrapping complex structs, etc.

IMO, if an SG can do it, then the compiler can definitely do it.

PS. I also made a union struct SG (before I knew another one already existed) and I went with the simple non-overlapping approach because it made sense for my use case. https://www.nuget.org/packages/TA.SourceGenerators.EnumStruct/
No, I don't remember the name/contact of the person I was speaking with and their Nuget package :(

@HaloFour
Copy link
Contributor

HaloFour commented Aug 8, 2024

@TahirAhmadov

IMO, if an SG can do it, then the compiler can definitely do it.

"Can" and "should" are too different things, though. The more clever the compiler tries to be the harder it is for the runtime.

@TahirAhmadov
Copy link

Yes and no. The way his SG did it, is (by definition) purely within the boundaries of the current language and runtime constraints. Nothing is stopping the language from adopting a similar mechanism, in order to cover the 90% of the use cases, while leaving the generics, etc. 10% of the cases alone.

@masonwheeler
Copy link

I don't think generics would be a 10% use case, not with so many people saying Option<T, U> and similar ideas are what they see as the primary use case.

@TahirAhmadov
Copy link

@masonwheeler I'm not 100% sure what Option<T, U> is supposed to look like, but I was primarily talking about user-defined union struct U { A(int X), B(DateTime Y), C } types.

@HaloFour
Copy link
Contributor

HaloFour commented Aug 8, 2024

I expect common DUs like Option<T> and Result<T, E> are small enough that the lack of overlap wouldn't be a massive deal. Option<T> basically wouldn't cost any more than Nullable<T> does today.

@roboz0r
Copy link

roboz0r commented Aug 8, 2024

I made a proposal in the F# GitHub about a possible way to implement overlapped struct unions given the current runtime limitations fsharp/fslang-suggestions#1333

@agocke
Copy link
Member

agocke commented Aug 8, 2024

FWIW any overlapping is probably trading size for throughput. Having a fixed-address offset, as opposed to a dynamic dispatch, will likely speed up the instruction pipeline.

@agocke
Copy link
Member

agocke commented Aug 8, 2024

I'll also say that I don't think Option<T> is a good candidate for "basic" discriminated unions because it has a very specific optimization (use a null pointer for None) that will not be available to any other type or runtime structure.

I think Option is a good candidate for dedicated behavior, both because of it's particular characteristics and the prominence of the use case.

Once we think about special-casing Option, the rest of the overlap use cases seem much less important to me.

@HaloFour
Copy link
Contributor

HaloFour commented Aug 8, 2024

@agocke

I'll also say that I don't think Option<T> is a good candidate for "basic" discriminated unions because it has a very specific optimization (use a null pointer for None) that will not be available to any other type or runtime structure.

How would that work if T is a value type, and you don't want to box?

@snarbies
Copy link

snarbies commented Aug 8, 2024

I'm sure this has come up before, but sometimes null is among your inhabited values.

@TehPers
Copy link

TehPers commented Aug 8, 2024

The niche optimization that Rust performs on its enums is what lets it treat an Option<NonNullPtr> (where NonNullPtr is one of the many non-null pointer types in Rust) as the same size as a NonNullPtr (since NonNullPtr has null as a niche value), but I'm not sure how common that usage would be in C# for an Option<T> value type, and I'm also curious if there's a demand for that kind of optimization in C# anyway. Not that optimizing it if possible isn't a good thing, I just don't know how common it would be, especially since reference types in C# are all nullable anyway. As for enums, unlike Rust, C#'s enums can hold any value that their backing type can hold (so they function essentially as integers), so enums don't really have niche values either. One example of where it might come up that comes to mind is if you nest another value type that can use some padding bytes to store the variant, but I have no clue how possible that is with the current runtime.

@CyrusNajmabadi
Copy link
Member

As a point of clarification @TehPers, what you are calling "that kind of optimization in c#" would likely be better described as "that kind of optimization in .net". Layouts of objects, and special things around null-handling, are def in in the purview of the runtime vs the language. :)

@roboz0r
Copy link

roboz0r commented Aug 8, 2024

FWIW any overlapping is probably trading size for throughput. Having a fixed-address offset, as opposed to a dynamic dispatch, will likely speed up the instruction pipeline.

In my proposal the address offsets are fixed at compile time and there's no dynamic dispatch. The tradeoffs are to work around the runtime limitations:

  • Generics are not permitted
  • Containing a reference type is not permitted

@orthoxerox
Copy link

I'm sure this has come up before, but sometimes null is among your inhabited values.

This means that the type should have two representations: one for value types and nullable reference types, the other for non-nullable reference types. If only C# had definitely non-nullable types...

@agocke
Copy link
Member

agocke commented Aug 9, 2024

@agocke

I'll also say that I don't think Option<T> is a good candidate for "basic" discriminated unions because it has a very specific optimization (use a null pointer for None) that will not be available to any other type or runtime structure.

How would that work if T is a value type, and you don't want to box?

By special-casing, I mean special-casing in the runtime, similar to Nullable<T>. The runtime can know whether an instantiation is a reference type or value type. In the case of a value type it would store an extra field for None. In the case of a non-null reference type, it would be lowered to null. This would involve also teaching the runtime about reference nullability and enforcing it in at least this case.

@jkone27
Copy link

jkone27 commented Oct 10, 2024

would be great if for Option and Result existing FSharp.Core could be used or shared between roslyn and C#? to mantain some level of compatibility and also many parts of these are just already in place in F#?

*FSharp.Core: Result
*FSharp.Core: Option

why can't C#/Roslyn use FSharp.Core for this feature to some extent? this would allow all existing F# Result and Option types from libraries to be consumable from C# as well, plus, would make future C# result/options consumable from F#

i am sure most likely there is reasons, but just double checking/curious

@HaloFour
Copy link
Contributor

why can't C#/Roslyn use FSharp.Core for this feature to some extent?

F# employs a lot of custom metadata from F# specific assemblies, plus a binary resource blob of additional metadata. I would expect that as a part of this proposal that the C# team will come up with language-agnostic metadata (e.g. attributes defined in BCL assemblies) and convention, and that the F# team could then hopefully adopt that metadata to enable those types to interoperate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests