Proposal: "nominal records" for C# #1667

Closed
381 changes: 381 additions & 0 deletions proposals/data-classes.md
@@ -0,0 +1,381 @@

# Working with Data

When a conversation about C# turns to data, it often moves straight to
advanced data structures and complicated use cases. Here I'd like to do the
opposite: talk about simple data and the representations we use for it.

To start, let's talk about what data is and what it isn't.

Data *is* a collection of values with potentially heterogeneous types. Some
examples include

* Database rows
* JSON/XML messages
* Login info
* Configuration options

Data *is not*

* A process
* A computation
* A conversation
* Interactive
* An object

This example I'm unclear about.

Inheritance is explicitly mentioned further down, so what makes it not an object? Lack of a complex graph?

Member Author

Inheritance can be possible without changing the contract with the type, namely that it is simple data and does not represent an interactive process.

This feels like a feature better suited to an external code generator rather than an addition to the language. It is too complicated to be considered syntactic sugar, but doesn't add 'new' functionality, it just "rewraps" existing available features. A code generator that accepts a data definition and generates the appropriate C# class would provide this without adding complexity to the language.

And such generators already exist - here's one: https://github.com/johnazariah/csharp-algebraictypes

Member Author

There are multiple features that could not be implemented by a code generator in this proposal, including the use of object initializers to initialize read-only members.


What you can do with data

* Name it
* Read it
* Modify it (or prevent modification)
* Compose it
* Compare it
* Copy it

What you can't do

* Call it
* Query it

For an object-oriented language this may seem strange, because isn't
everything an object? In some sense, you can view data as a degenerate object
-- fields with pure transparency. But this also misses the point. The point
of object-oriented architecture is to bundle state and behavior and provide
composable objects that can interactively respond to the system, like cells
in an organism. There's value in this structure, but it also creates a
binding between the data and the behavior. By creating data individually we
allow the data to shift contexts and allow other components to define their
own behaviors.

@chucker chucker Jun 27, 2018

Is the argument here that data isn't an object because it has no behavior?


## What does it look like in C#?

C# has a couple of different ways to represent simple data, but I contend that
all have fundamental problems.

**Anonymous types**. Things look pretty good at first: you can name it, you
can read it, and you can compare it easily (equality is automatically
defined). You can't modify it, though -- anonymous types are always immutable
and you can't easily create a copy with only one change. The real problem
is composability. You can't use it as a real type anywhere outside the
current method and you can't nest it in other data structures, except as
an object.
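
A minimal sketch of those trade-offs in current C#. The names `AnonymousTypeDemo`, `ValueEquality`, and `Escape` are made up for illustration and are not part of the proposal:

```C#
using System;

public static class AnonymousTypeDemo
{
    public static bool ValueEquality()
    {
        var a = new { Username = "andy", RememberMe = false };
        var b = new { Username = "andy", RememberMe = false };

        // a.RememberMe = true;  // does not compile: the properties are read-only

        // "Copying with one change" means restating every other member by hand.
        var c = new { a.Username, RememberMe = true };

        // Equals and GetHashCode are generated from the members.
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode() && !a.Equals(c);
    }

    // The type has no name, so it can only leave the method as `object`
    // (or `dynamic`), which gives up static access to the members.
    public static object Escape() => new { Username = "andy", RememberMe = false };
}
```

Note that the comparison goes through `Equals`; anonymous types do not define `==`, so that operator would fall back to reference comparison.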

**Tuples**. Tuples are a lot like anonymous types. You can provide names,
read the elements, modify them, and compare them. You can't make them
immutable, but the real problem is in composition. Tuples aren't really
an abstraction -- you describe the data structure in full in every
place you use it. This makes it hard to expand tuples past a certain
size and makes it difficult to compose with other data structures because
you cannot refer to them by name.
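
As a rough illustration of the tuple trade-offs, assuming C# 7 or later (`TupleDemo`, `Create`, `Remember`, and the sample values are hypothetical):

```C#
using System;

public static class TupleDemo
{
    // The full shape is restated in every signature that mentions the data.
    public static (string Username, string Password, bool RememberMe) Create()
        => ("andy", "hunter2", false);

    public static (string Username, string Password, bool RememberMe) Remember(
        (string Username, string Password, bool RememberMe) login)
    {
        login.RememberMe = true;  // elements are plain mutable fields
        return login;
    }

    public static bool ValueEquality()
    {
        var a = Create();
        var b = Remember(a);      // tuples are structs, so this mutates a copy

        // Comparison is element-wise.
        return a.Equals(Create()) && !a.Equals(b);
    }
}
```

Adding a fourth element would require touching every one of these signatures, which is the composition problem described above.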

**Classes/Structs**. This is by far the most common representation of data
in C#. A canonical example looks something like this:

```C#
public class LoginResource
{
    public string Username { get; set; }
    public string Password { get; set; }
    public bool RememberMe { get; set; }
}
```

This representation provides names for both the members and the data
structure, it provides easy nominal composition, and it composes easily with
all other data structures. It also provides a convenient syntax for creation
by interacting directly with the named data, e.g.

```C#
var x = new LoginResource {
    Username = "andy",
    Password = password
};
```

Unfortunately, there are still serious problems. There is no piecewise
comparer implicitly defined for C# classes, so if you want simple data
comparison, the real example looks like this:

```C#
using System;
using System.Collections.Generic; // EqualityComparer<T>

public class LoginResource : IEquatable<LoginResource>
{
    public string Username { get; set; }
    public string Password { get; set; }
    public bool RememberMe { get; set; } = false;

    public override bool Equals(object obj)
        => obj is LoginResource resource && Equals(resource);

    public bool Equals(LoginResource other)
    {
        return other != null &&
               Username == other.Username &&
               Password == other.Password &&
               RememberMe == other.RememberMe;
    }

    public override int GetHashCode()
    {
        var hashCode = -736459255;
        hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(Username);
        hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(Password);
        hashCode = hashCode * -1521134295 + RememberMe.GetHashCode();
        return hashCode;
        // Review comment (Member): note: this is simpler now with HashCode.Combine.
        // (just to make the example code fair :)).
    }

    public override string ToString()
    {
        return $"{{{nameof(Username)} = {Username}, {nameof(Password)} = {Password}, {nameof(RememberMe)} = {RememberMe}}}";
    }

    public static bool operator ==(LoginResource resource1, LoginResource resource2)
    {
        return EqualityComparer<LoginResource>.Default.Equals(resource1, resource2);
    }

    public static bool operator !=(LoginResource resource1, LoginResource resource2)
    {
        return !(resource1 == resource2);
    }
}
```
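
For comparison, here is a sketch of how the hashing part shrinks with `System.HashCode.Combine`, as the review note suggests. It assumes a target framework that ships `System.HashCode` (for example .NET Core 2.1 or later); the type name `LoginResourceCompact` is hypothetical and the operators and `ToString` are left out for brevity:

```C#
using System;

public class LoginResourceCompact : IEquatable<LoginResourceCompact>
{
    public string Username { get; set; }
    public string Password { get; set; }
    public bool RememberMe { get; set; }

    public override bool Equals(object obj) => Equals(obj as LoginResourceCompact);

    public bool Equals(LoginResourceCompact other)
        => other is object &&                     // null check that bypasses any == overload
           Username == other.Username &&
           Password == other.Password &&
           RememberMe == other.RememberMe;

    // HashCode.Combine mixes the member hashes and tolerates null members.
    public override int GetHashCode()
        => HashCode.Combine(Username, Password, RememberMe);
}
```

Even in this shorter form, every member still has to be listed twice by hand, once in `Equals` and once in `GetHashCode`.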

Immutable data is also a problem. The object initializer syntax provides
a simple name-based mechanism to create a data type. With `readonly`
fields or properties, a constructor must be used instead. This creates
another set of problems:

1. A constructor must be manually defined.
1. The constructor parameters are ordered, while the fields are not.
Consumers have now taken a dependency on the parameter ordering.
1. The constructor must be maintained with any field changes.

Member

@jaredpar jaredpar Jun 26, 2018

Another issue: constructor parameter names don't necessarily line up with field / property names hence named argument passing doesn't have the same ease of use as object initializers. #Resolved

1. Constructor parameter names don't necessarily line up with field/property
names hence named argument passing doesn't have the same ease of use as
object initializers.

Member

Note: these are definitely cons of a constructor. But it might be nice to mention the pros as well. Namely terse and sensible syntax for 'data' that has well-understood positional ordering.


There is also no way to create a copy of a data structure with readonly
fields with only one item changed. A new instance must be constructed manually.
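
To make the constructor and copy problems concrete, here is a sketch of what an immutable version costs in current C#. `ImmutableLogin` and `WithRememberMe` are illustrative names, not something the proposal defines:

```C#
using System;

public class ImmutableLogin
{
    public string Username { get; }
    public string Password { get; }
    public bool RememberMe { get; }

    // Parameter order is now part of the public contract: swapping the two
    // string parameters would still compile for every positional caller.
    public ImmutableLogin(string username, string password, bool rememberMe = false)
    {
        Username = username;
        Password = password;
        RememberMe = rememberMe;
    }

    // Hand-written copy-with-one-change: every member has to be restated, and
    // the method must be revisited whenever a member is added.
    public ImmutableLogin WithRememberMe(bool rememberMe)
        => new ImmutableLogin(Username, Password, rememberMe);
}
```

A caller would write `new ImmutableLogin("andy", password).WithRememberMe(true)`, and one such method is needed per member.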

Member

@jaredpar jaredpar Jun 26, 2018

double blank line. #Resolved

## Proposal

To resolve many of these issues, I propose a new modifier for classes and structs: `data`.
`data` classes or structs are meant to satisfy the goals listed above by doing the

Member

Can we have a sample of a data class usage in the proposal?

following things:

1. Automatically generating `Equals`, `GetHashCode`, `ToString`, `==`, `!=`, and `IEquatable<T>`

Member

How is ToString relevant here?

Member

+1 on including ToString. It's handy for debugging.
A debugger display tip would be ok, as an alternative.


based on the member data of the type.
1. Allow object initializers to also initialize readonly members.

Member

not sure if it's mentioned later. But is it possible to override any of this? For example, if you want to provide a better GetHashCode impl? Or if you want to provide your own specific ToString?

Member Author

Yup. We may want to give a warning/error if you override everything and data just means nothing.

Member

Well, i was hoping 'data' would solve the issue of:

+There is also no way to create a copy of a data structure with readonly
+fields with only one item changed. A new type must be constructed manually.

So i could imagine having a data class where i override everything. But i still benefit from the fact that you provided me the easy way to generate a copy with only some pieces changed.

Member Author

We would not give the warning whenever there are read-only data members because someone may be using the object initializer support for them.

As far as With-ers goes, they will be generated too, but I don't have the details worked out yet. I think we may want to introduce an actual with { } expression to mirror the object initializer syntax to make handling read-only members easier (and not generate many intermediate objects).

Member

SGTM


Data classes or structs represent unordered, *named* data, like the simple
`LoginResource` class that people write today.

The LoginResource class now could be defined as

```C#
public data class LoginResource
{
    public string Username { get; }
    // Review thread on the property above:
    //   Member: Does it even make sense to make these non-public? Could it
    //     default to public?
    //   Member Author (@agocke, Aug 5, 2018): Private fields and properties are
    //     perfectly fine. We could consider it, but right now I'm not wild about
    //     changing the default accessibility of members. That's a significant
    //     amount of extra complexity in the language, and people's style
    //     guidelines often depend on it.
    //   Member: also related, I think it's actually useful to force properties to
    //     come first in the class body, otherwise it doesn't look like a "data
    //     class" anymore. Sure, it might be basically a style preference, but it
    //     is worth considering.
    public string Password { get; }
    public bool RememberMe { get; } = false;
}
```

Member

It may be called out somewhere already, but one can add other members such as fields and methods (including some implementations that supersede the auto-implemented GetHashCode/Equals/ToString/...).


and the use would be identical:

```C#
var x = new LoginResource {
    // Review comment (Member): I'd like initializers to become special calls to
    // a generated constructor, since I'm using
    //
    //     var x = new LoginResource
    //     (
    //         username: "andy",
    //         password: password
    //     )
    //
    // which looks similar to an object initializer, keeps the benefits of
    // readonly, and prevents old compilers from violating it after the
    // initializer.
    Username = "andy",
    Password = password
};
// Review comment (Member, @jcouv, Jun 27, 2018): We'll have to think about the
// intersection between data types and the nullable feature. The nullable
// feature produces warnings if some non-null fields are left uninitialized by
// the constructor. If we don't initialize a read/write non-null data member,
// we'd probably want to warn... Can you take a note of that somewhere to
// remind ourselves to think the issue through? Thanks
```

Note that `RememberMe` must have an initializer to avoid a warning in the
object initializer about an unset read-only property.

However, the generated class code would look like:

```C#
public class LoginResource : IEquatable<LoginResource>
// Review thread on `IEquatable` and `LoginResource` in the line above:
//   Member: Looking at System.Tuple and System.ValueTuple, there are some more
//     interfaces to consider: IEquatable<T>, IStructuralEquatable,
//     IStructuralComparable, IComparable, IComparable<T>
//   Member: I assume that we would tag this type somehow (maybe with an
//     attribute), to identify it as a data class?
//   Member (@alrz, Aug 5, 2018): Can we make IEquatable impl opt-in? e.g. only
//     if spelled out in the source: data class C : IEquatable<C>
//     (or opt-out a la fsharp)
//   Member Author: Why?
//   Member: to reduce the amount of possibly unused code generated. the opt-in
//     solution would be synonymous to derive or deriving in rust and haskell.
{
    public string <>Backing_Username;
    public string Username => <>Backing_Username;
    public string <>Backing_Password;
    public string Password => <>Backing_Password;
    public bool <>Backing_RememberMe = false;
    public bool RememberMe => <>Backing_RememberMe;

    protected LoginResource() { }

    public static LoginResource Init() => new LoginResource();
    // Review thread on `Init`:
    //   Member: It would be good to show how caller code is transformed to use
    //     these APIs. I assume it's simply:
    //
    //         var temp = LoginResource.Init();
    //         temp.<>Backing_Username = "";
    //         temp.<>Backing_Password = "";
    //         ...
    //
    //     I'm wondering if the Init method should also be unpronounceable. If
    //     someone wrote var temp = LoginResource.Init(); they would escape the
    //     initialization checks (which members are left unset).
    //   Member Author: Yes that should absolutely be unspeakable.

    public override bool Equals(object obj)
        => obj is LoginResource resource && Equals(resource);

    public bool Equals(LoginResource other)
    {
        return other != null &&
               EqualityContractOrigin == other.EqualityContractOrigin &&
               Username == other.Username &&
               Password == other.Password &&
               RememberMe == other.RememberMe;
    }

    protected virtual Type EqualityContractOrigin => typeof(LoginResource);

    public override int GetHashCode()
    {
        unchecked
        {
            return EqualityComparer<string>.Default.GetHashCode(Username) +
                   EqualityComparer<string>.Default.GetHashCode(Password) +
                   RememberMe.GetHashCode();
            // Review comment (Member): since you can only compare with other
            // LoginResource's... is there a reason that hashing is unordered?
        }
    }

    public override string ToString()
    {
        return $"{{{nameof(Username)} = {Username}, {nameof(Password)} = {Password}, {nameof(RememberMe)} = {RememberMe}}}";
    }

    public static bool operator ==(LoginResource resource1, LoginResource resource2)
    {
        return EqualityComparer<LoginResource>.Default.Equals(resource1, resource2);
    }

    public static bool operator !=(LoginResource resource1, LoginResource resource2)
    {
        return !(resource1 == resource2);
    }
}
```

### Equality

First, the generation of equality support. Data members are only public
fields and auto-properties. This allows data classes to have private
implementation details without giving up simple equality semantics. There are
a few places this could be problematic. For instance, only auto-properties
are considered data members by default, but it's not uncommon to have some
simple validation included in a property getter that does not meaningfully
change the semantics, e.g.

```C#
{
    ...
    private int _field;
    public int Field
    {
        get
        {
            Debug.Assert(_field >= 0);
            return _field;
        }
        set { ... }
    }
}
```

To support these cases and provide an easy escape hatch, I propose a
new attribute, `DataMemberAttribute` with a boolean flag argument on the

Contributor

System.Runtime.Serialization.DataMemberAttribute already exists in the BCL.

Member Author

Oops. DataMemberAttribute2 😄

constructor. This allows users to override the normal behavior and include
or exclude extra members in equality. The previous example would now read:

Member

this seems odd to me. Given that it's a Data-Class or Data-Struct, my bias is that the public props/fields should be part of the data-ness. It seems like if someone does not want that, the attribute should be to opt-out, instead of needing to opt-in.

Member Author

auto-properties are part of data-ness, computed properties may not be. That's especially true if you have, say, a half dozen computed properties all reading from single piece of data that you could compare directly for equality. For instance, flags enums to boolean properties.

Member

auto-properties are part of data-ness, computed properties may not be.

Agreed. but my point is simply: when something "may not be", what side of the fence do you decide they're on. My argument is we should just assume they are part of the data-ness. And if you don't want that, you opt-out.

Note: i also think opt-out is a good idea because maybe i want an auto-prop and i do not want that auto-prop to be part of the data-ness. Now, i'd have a consistent way for auto-props and computed-props to say that. i.e.:

[NotData]
internal int LogLevel { get; set; }

I want this to be an auto-prop because i really don't need to compute anything. But i don't want it participating in the data-ness.

Member

That's especially true if you have, say, a half dozen computed properties all reading from single piece of data that you could compare directly for equality.

This example is interesting to me, because for the data classes I would want to write, that piece of data would not be a public prop/field, and would instead be a private field. The example that immediately comes to mind is CommonConversion in Roslyn, where it's a bunch of bool properties that we internally store as bits in an integer. For that example, what would you propose for comparison? Would there be some way of saying "These public things aren't part of equality, but this private field is"?

Member

@333fred I think in that case, the best thing would be to just provide your own override of GetHashCode/Equals. i.e. you know better. So you can give the fastest impl.

Member Author

@333fred That's what the proposal is talking about. You put [DataMember(true)] on the private field and it's considered in equality. The public computed properties are not.

I think @CyrusNajmabadi's approach is too heavy-handed.

Member

:D

Note: in fred's example, he just wants the private field used for equals/gethashcode. Is there then a way to opt-out of something being a data member? i.e. even if you have an auto-prop, can you put [DataMember(false)] on it?

If so, i think i'm happy. Basically, the language would have rules about if an unadorned member was a data-member or not (i.e. auto-props are, blah blah blah are not). But you can always put on DataMember(true/false) on anything to explicitly opt-in/out.

Does that make sense to you Andy?

Member Author

@CyrusNajmabadi Yeah, the boolean flag in the constructor was supposed to work both ways. To both opt-in and opt-out.


```C#
{
    ...
    private int _field;

    [DataMember(true)]
    public int Field
    {
        get
        {
            Debug.Assert(_field >= 0);
            return _field;
        }
    }
}
```

Equality itself would be defined in terms of its data members. An instance of
a `data` type is equal to another instance when there is an implicit
conversion between the target type and the source type and each of the
corresponding members is equal. The members are compared by `==` if it is
available. Otherwise, the method `Equals` is tried according to overload
resolution rules (such that an `Equals` method with an identity conversion to
the target type is preferred over the virtual `Equals(object)` method).

Member

is this better than saying that you use EqualityComparer<X>.Default.Equals(a, b)? That should do "the right thing" for basically all types, right? Of course, the compiler could consider optimizing that for some things like primitive types. but it seems odd to bake in the knowledge about == vs .Equals vs IEquatable.Equals.

Member Author

@agocke agocke Jun 26, 2018

That's a great question that I don't know the answer to. (yet)
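
One place where the two strategies in this thread are known to differ is floating-point `NaN`. The sketch below only illustrates that difference and does not take a side in the open question; `MemberComparisonDemo` and its method names are hypothetical:

```C#
using System;
using System.Collections.Generic;

public static class MemberComparisonDemo
{
    // Strategy from the proposal text: compare members with `==` when available.
    public static bool WithOperator(double a, double b) => a == b;

    // Strategy from the review comment: always go through the default comparer,
    // which ends up calling Double.Equals.
    public static bool WithDefaultComparer(double a, double b)
        => EqualityComparer<double>.Default.Equals(a, b);
}
```

`WithOperator(double.NaN, double.NaN)` is `false` because `==` follows IEEE 754 comparison, while `WithDefaultComparer(double.NaN, double.NaN)` is `true` because `Double.Equals` treats two `NaN` values as equal. A data type with a `double` member would therefore behave differently under the two rules.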


There is also one hidden data member, `protected virtual Type EqualityContractOrigin { get; }`, that is
always considered in equality. By default this member always returns the
static type of its containing type, i.e. `typeof(Containing)`. This means
that sub-classes are not, by default, considered equal to their base classes,
or vice versa. This also ensures that equality is commutative and `GetHashCode`
matches the results of `Equals`. These methods are virtual, so they can be
overridden, but then it is the user's responsibility to ensure that they
abide by the appropriate contract.

`GetHashCode` would be implemented by calling `GetHashCode` on each of
the data members.
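
The effect of the contract member on subtyping can be approximated by hand in current C#. This is a sketch of the idea, not the code a compiler would emit; `Person`, `Student`, and their members are invented for the example:

```C#
using System;
using System.Collections.Generic;

public class Person : IEquatable<Person>
{
    public string Name { get; set; }

    // Stand-in for the hidden member described above.
    protected virtual Type EqualityContractOrigin => typeof(Person);

    public override bool Equals(object obj) => obj is Person other && Equals(other);

    public virtual bool Equals(Person other)
        => other is object &&
           EqualityContractOrigin == other.EqualityContractOrigin &&
           Name == other.Name;

    public override int GetHashCode()
        => EqualityComparer<string>.Default.GetHashCode(Name);
}

public class Student : Person, IEquatable<Student>
{
    public int Year { get; set; }

    protected override Type EqualityContractOrigin => typeof(Student);

    // A Student only ever equals another Student.
    public override bool Equals(Person other) => other is Student s && Equals(s);

    public bool Equals(Student other)
        => base.Equals(other) && Year == other.Year;

    // Built from the hash codes of the data members, base members included.
    public override int GetHashCode()
        => unchecked(base.GetHashCode() + Year.GetHashCode());
}
```

With these definitions a `Person` and a `Student` that share a `Name` are unequal in both directions, because the contract types differ regardless of which side the comparison starts from.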

### Readonly initialization

Support for `readonly` members in object initializers may seem like a small
feature, but it's important that making a `data` type readonly not come with
a lot of extra ceremony. The essence of a `data` type is a set of named fields
and that should stay true regardless of whether or not the fields are
`readonly`. It may be tempting for implementation simplicity to try to use
constructors instead, but this is a design smell that conflates positional
semantics with `readonly` semantics. Requiring initialization via constructor
means that field order becomes a public API and requires careful versioning,
which is not true of mutable `data` types and is a constraint that should
be irrelevant to `readonly` semantics.

Member

interesting idea. This feels like this may be unpalatable for a section of users. It sounds like you would initialize with something akin to the field/property initializers you have today in C#. i.e. new Foo { X = x, Y = y. etc }. I would contend there are customers for whom concerns like smell and careful versioning do not apply, who would find the above far too verbose and cumbersome for their tastes. Not having a form like new Foo(x, y) may be too bitter a pill for some to swallow.

Contributor

@HaloFour HaloFour Jun 26, 2018

The most unpalatable aspect of this would be the compiler surfacing public mutable fields with autogenerated unspeakable names with only a wink and a nudge to enforce their immutability. There must be a significantly better way to accomplish this behavior without requiring private implementation details to be exposed which then every other compiler either has to ignore or has to respect.

What about creating a nested public struct to contain all of the data members and having the constructor or a static factory method accepting that struct?

public data class LoginResource {
    public string Username { get; }
    public string Password { get; }
}
var loginResource = new LoginResource {
    Username = "andy",
    Password = password
};

// translates into

public class LoginResource {
    private readonly DataMembers _data;

    public LoginResource(DataMembers data) => _data = data;
    
    public string Username { get => _data.Username; }
    public string Password { get => _data.Password; }

    public struct DataMembers {
        public string Username { get; set; }
        public string Password { get; set; }
    }

    // other members elided for brevity
}
var loginResource = new LoginResource(new LoginResource.DataMembers {
    Username = "andy",
    Password = password
});

Member Author

@agocke agocke Jun 26, 2018

@HaloFour I came up with this exact same idea! The problem is, what if you have a data struct? Now you have two structs for every struct! That seems like a lot of metadata bloat.

Contributor

@agocke

That seems like a lot of metadata bloat.

Which, in my opinion, is a much more palatable issue to have than publicly-exposed secretly named writable "readonly" fields. 😄

The builder pattern isn't an uncommon one to see, especially with readonly data classes. To see it codified and accessible as a language feature would be a welcome addition, in my opinion.

Member

I, too, think this would be worth exploring.

Though, i'm also amenable to either approach being taken. I think @HaloFour (understandably) has a very visceral reaction to 'faked up' readonly. I'm more ok with it. But it would be nice if the underlying generation was actually just 'clean' and followed some sort of pattern that any language could consume/produce without having to understand hackery...

Member Author

Not to reveal too many secrets but...

all of F#'s supposedly thoroughly safe, immutable data structures are not marked initonly in metadata

😉

This is not to imply F# is unsafe -- just the opposite. I think the truth is that language rules provide the vast majority of safety. This is also the position we took with the new ref readonly feature, where we allow ref variables to be taken directly to readonly fields and rely on the C# language rules to enforce the safety of ref readonly (which the CLR has no conception of).

Contributor

@agocke I think relying on language rules with private or internal members is fine, but doing the same with public members is much more problematic. Does F# ever produce public non-initonly fields for its immutable data structures?

And ref readonly uses modreq, which means that some compiler could circumvent those rules, but it would have to be wilful.

Using unspeakable names offers questionable safety, especially since there is a major .Net language that can speak them, namely F# with its double backtick syntax.

Contributor

@agocke

Warning: I saw Hello World implemented in F# once so I'm totally qualified to comment on the language.

F# also encodes a boatload of secret metadata in binary resources that is intended to only be understood and supported by F#. Where you can expose metadata publicly either the rules are different or the magic breaks down. An example might be F#'s extended generic type parameter constraints which most languages don't understand and don't enforce only to have the program explode at runtime if the generic type argument isn't "valid". In my opinion C# should be a much better citizen in the .NET ecosystem of languages and should try to avoid relying on magic public behavior where it is possible.

I do agree that readonly ref does involve some magic, but I would argue that it's different in that it's relatively niche and that the magic mostly happens on the implementer side. IIRC it would take someone writing some pathological code to simulate a readonly ref from another language which could abuse the lack of enforcement. Data classes, on the other hand, are intended for a significantly wider audience to eliminate the boilerplate of creating these very common classes. Having a leaky abstraction where any consumer can wreck the internal state of the class seems incredibly dangerous.

Anywho, the crux of my argument is that I would prefer that clean/safe approaches be considered before hacky ones. In fact, I think support for a builder pattern would be nice for C# as an orthogonal feature and data classes can simply piggyback on that.


One way to remove dependence on a constructor is simply to not make the
members `readonly` in metadata. The CLR treats `readonly` mostly as guidance
-- it can easily be overridden using reflection anyway. Most of the safety of
`readonly` members in C# is not provided by the runtime, but by C# safety
rules. One way we could enforce compiler rules would be to generate public
`get`-only properties and make the backing field public and mutable. Object
initializers would be able to set the properties, but user code wouldn't be
able to because the backing fields are unspeakable.
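
The remark about reflection can be sketched as follows. This assumes an instance field (current .NET runtimes are documented to reject reflection writes to static `readonly` fields); `ReadonlyBox` and `ReadonlyReflectionDemo` are hypothetical names:

```C#
using System;
using System.Reflection;

public class ReadonlyBox
{
    public readonly int Value;

    public ReadonlyBox(int value) { Value = value; }
}

public static class ReadonlyReflectionDemo
{
    public static int Overwrite(ReadonlyBox target, int newValue)
    {
        FieldInfo field = typeof(ReadonlyBox).GetField(nameof(ReadonlyBox.Value));

        // The field is marked init-only in metadata (field.IsInitOnly), yet
        // reflection still accepts a write to it on an existing instance.
        field.SetValue(target, newValue);
        return target.Value;
    }
}
```

The C# compiler rejects `box.Value = 42;` outside the constructor, so the guarantee a caller relies on day to day comes from the language rules, which is the point made above.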

One problem with this strategy is `readonly` fields. In that case there is
no backing field to hide. There are two possible solutions. The first is
to forbid public `readonly` fields and require properties. The second is
to make all fields into properties automatically. This is strange because
we would be generating a property from a field syntax. However, it removes
what will be a meaningless restriction for the user, only mandated by
implementation difficulties. Property substitution will never be a perfect
abstraction (reflection will be able to see properties, for example) but the
solution would probably be able to match user expectations for the vast
majority of cases. The properties would also be `ref readonly` returning, so
even uses of `in` or `ref readonly` would function as expected.

If data classes contain any `readonly` members that do not have initializers,
they also do not define a default public constructor like other classes.
Instead, they define a protected constructor with no arguments, and an
unspeakable public "initialization" method. This method is called when using
an object initializer and the compiler verifies that all `readonly` members
are initialized, or an error is produced.


## Extensible data classes (data class subtyping)

Like normal C# classes, data classes are not sealed by default and can be
inherited from in sub-classes.

In non-`data` sub-classes, if there are any readonly members without
default initialization in the base class, the subclass is required to
define a protected constructor. The constructor must assign all readonly
members of the base class before the constructor ends, or an error is
produced.

In `data` sub-classes, the requirements of the base become requirements
of the sub-class, such that initialization of the sub-class must also
initialize all of the required members of the base.


## TODO: "With"-ers