Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: "nominal records" for C# #1667

Closed
wants to merge 7 commits into from
Closed

Conversation

agocke
Copy link
Member

@agocke agocke commented Jun 25, 2018

This is the first part of a set of proposals about data manipulation and representation in C#, but can be viewed as a standalone language feature.

@agocke
Copy link
Member Author

agocke commented Jun 25, 2018

cc @jaredpar

@agocke agocke requested a review from a team June 25, 2018 20:58
@jcouv jcouv added the Proposal label Jun 25, 2018
First, the generation of equality support. Data members are only public
fields and auto-properties. This allows data classes to have private
implementation details for things like caching without giving up simple
equality semantics. There are a few places this could be problematic. For
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying this is wrong. But this def feels like it needs to be expanded upon, esp. give the above sections on what data is/isn't. Talking about impl-details(i.e computation) and caching, after somewhat saying that data is not these things isn't wrong, but it's very subtle and muddies things a bit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll elaborate more on this. The idea is that the public contract should be seen as a collection of data. As always, there's some room to hide your dirty laundry in private implementation details if the real world is more complicated than that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this also breaks serialization (in particular, _de_serialization), at least the way it's commonly implemented in e.g. Newtonsoft.Json.


`GetHashCode` would be implemented by calling `GetHashCode` on each of
the data members and these would be combined using a symmetric operation
(like `xor`) to prevent ordering from affecting the results.
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi Jun 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it expected that 'data' is unordered? Above, you mentioned:

Data *is* a collection of values with potentially heterogenous types.

Should that be "an unordered collection of values"?

Or, potentially:

Data *is* a collection of **named** values with potentially heterogenous types.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agocke No, this is a different issue. The problem with using a symmetric operation for combining hash codes is that it means that e.g. new Point { X = 1, Y = 0 }.GetHashCode() returns the same value as new Point { X = 0, Y = 1 }.GetHashCode().

This is easily solvable by adding the name of the field to the hash.

The comment about pretending that types are the same is because I assumed the reason for symmetry is that you wanted e.g. new Contoso.Person { FirstName = "Andy", LastName = "Gocke" }.GetHashCode() to be the same as new AdventureWorks.Person { LastName = "Gocke", FirstName = "Andy" }.GetHashCode().

Again, making the name of the field part of the hash solves the problem.

semantics with `readonly` semantics. Requiring initialization via constructor
means that field order becomes a public API and requires careful versioning,
which is not true of mutable `data` types and is a constraint that should
be irrelevant to `readonly` semantics.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting idea. This feels like this may be unpalatable for a section of users. It sounds like you would initialize with something akin to the field/property initializers you have today in C#. i.e. new Foo { X = x, Y = y. etc }. I would contend there are customers for whom concerns like smell and careful versioning do not apply, who would find the above far too verbose and cumbersome for their tastes. Not having a form like new Foo(x, y) may be too bitter a pill for some to swallow.

Copy link
Contributor

@HaloFour HaloFour Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most unpalatable aspect of this would be the compiler surfacing public mutable fields with autogenerated unspeakable names with only a wink and a nudge to enforce their immutability. There must be a significantly better way to accomplish this behavior without requiring private implementation details to be exposed which then every other compiler either has to ignore or has to respect.

What about creating a nested public struct to contain all of the data members and having the constructor or a static factory method accepting that struct?

public data class LoginResource {
    public string Username { get; }
    public string Password { get; }
}
var loginResource = new LoginResource {
    Username = "andy",
    Password = password
};

// translates into

public class LoginResource {
    private readonly DataMembers _data;

    public LoginResource(DataMembers data) => _data = data;
    
    public string Username { get => _data.Username; }
    public string Password { get => _data.Password; }

    public struct DataMembers {
        public string Username { get; set; }
        public string Password { get; set; }
    }

    // other members elided for brevity
}
var loginResource = new LoginResource(new LoginResource.DataMembers {
    Username = "andy",
    Password = password
});

Copy link
Member Author

@agocke agocke Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HaloFour I'm came up with this exact same idea! The problem is, what if you have a data struct? Now you have two structs for every struct! That seems like a lot of metadata bloat.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agocke

That seems like a lot of metadata bloat.

Which, in my opinion, is a much more palatable issue to have than publicly-exposed secretly named writable "readonly" fields. 😄

The builder pattern isn't an uncommon one to see, especially with readonly data classes. To see it codified and accessible as a language feature would be a welcome addition, in my opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I, too, think this would be worth exploring.

Though, i'm also amenable to either approach being taken. I think @HaloFour (understandably) has a very visceral reaction to 'faked up' readonly. I'm more ok with it. But it would be nice if the underlying generation was actually just 'clean' and followed some sort of pattern that any language could consume/produce without having to understand hackery...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to reveal too many secrets but...

all of F#'s supposedly thoroughly safe, immutable data structures are not marked initonly in metadata

😉

This is not to imply F# is unsafe -- just the opposite. I think the truth is that language rules provide the vast majority of safety. This is also the position we took with the new ref readonly feature, where we allow ref variables to be taken directly to readonly fields and rely on the C# language rules to enforce the safety of ref readonly (which the CLR has no conception of).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agocke I think relying on language rules with private or internal members is fine, but doing the same with public members is much more problematic. Does F# ever produce public non-initonly fields for its immutable data structures?

And ref readonly uses modreq, which means that some compiler could circumvent those rules, but it would have to be wilful.

Using unspeakable names offers questionable safety, especially since there is a major .Net language that can speak them, namely F# with its double backtick syntax.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agocke

Warning: I saw Hello World implemented in F# once so I'm totally qualified to comment on the language.

F# also encodes a boatload of secret metadata in binary resources that is intended to only be understood and supported by F#. Where you can expose metadata publicly either the rules are different or the magic breaks down. An example might be F#'s extended generic type parameter constraints which most languages don't understand and don't enforce only to have the program explode at runtime if the generic type argument isn't "valid". In my opinion C# should be a much better citizen in the .NET ecosystem of languages and should try to avoid relying on magic public behavior where it is possible.

I do agree that readonly ref does involve some magic, but I would argue that it's different in that it's relatively niche and that the magic mostly happens on the implementer side. IIRC it would take someone writing some pathological code to simulate a readonly ref from another language which could abuse the lack of enforcement. Data classes, on the other hand, are intended for a significantly wider audience to eliminate the boilerplate of creating these very common classes. Having a leaky abstraction where any consumer can wreck the internal state of the class seems incredibly dangerous.

Anywho, the crux of my argument is that I would prefer that clean/safe approaches be considered before hacky ones. In fact, I think support for a builder pattern would be nice for C# as an orthogonal feature and data classes can simply piggyback on that.

@HaloFour
Copy link
Contributor

When will an Issue be created for this proposal? I imagine that some of these points will be quite contentious. Just briefly scanning the "readonly initialization" section is enough to give me nightmares.

@gulshan
Copy link

gulshan commented Jun 26, 2018

How this proposal relates to Records #39 ? As I commented there, I am for just a primary constructor based short hand definition capability for all/any types, and then the equality, Hash, with and deconstructor generation on top of that when associated with some keyword like data (or attribute like [Data]). The same data keyword being proposed here with equality and hash generation. Shouldn't these two proposals be consolidated?

I think proposed readonly initialization changes current meaning of initializers in C#. With non-nullable references are coming, initialization of individual reference type fields will be a concern. I personally think, default values for primary constructor parameters are enough to deal with this. It also optionally provides names for fields and ignore the order of the arguments to some extent.

is a subtype of `A` (and therefore has an implicit conversion) and all of
the shared members between `A` and `B` are equal. In contrast, `A` would
likely not have an implicit conversion to `B`, so they would not be
considered equal.
Copy link
Contributor

@svick svick Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand what you're proposing correctly, then I think this is a terrible idea:

  • It means that a.Equals(b) can return true, even though a.GetHashCode() and b.GetHashCode() are not equal.
  • As mentioned, it means Equals is not symmetric.

Both of these points mean that these types won't work correctly in hash-based collections like Dictionary<TKey, TValue> or HashSet<T>. And they also break the invariants required by implementations of Equals(object) and GetHashCode(). #Resolved

Copy link
Member Author

@agocke agocke Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really good point. The problem I ran into was: how do I prohibit it?

Making .Equals do a dynamic type check seems problematic. So does calling GetHashCode inside Equals.

However, Neal has sent me a great idea for producing a symmetric Equals:

class Person
{
    public string Name { get; }
 
    public Person(string name) { Name = name; }
 
    protected virtual Type EqualityContractOrigin => typeof(Person); // Note here
 
    public override bool Equals(object obj)
    {
        var that = obj as Person;
        return that != null &&
            that.EqualityContractOrigin == this.EqualityContractOrigin && // Note here
            that.Name.Equals(this.Name);
    }
}

I really like this solution.

#Resolved

Copy link
Contributor

@svick svick Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agocke

Making .Equals do a dynamic type check seems problematic.

I think that's actually the right solution. Different types should not compare as equal, even if e.g. a derived type does not override methods it should override.

Why would it be problematic?

If it's about performance, I think code like that.GetType() == this.GetType() has the potential to be more efficient than two virtual calls (though the CLR could do a better job at optimizing it). #Resolved


`GetHashCode` would be implemented by calling `GetHashCode` on each of
the data members and these would be combined using a symmetric operation
(like `xor`) to prevent ordering from affecting the results.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the primary result of this would be a bad hash code.

Trying to pretend that two different data classes are the same if they have the same members sounds like a bad idea to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like you're restating the previous comment. If Equals is symmetric, do you have any other objection?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agocke No, this is a different issue. The problem with using a symmetric operation for combining hash codes is that it means that e.g. new Point { X = 1, Y = 0 }.GetHashCode() returns the same value as new Point { X = 0, Y = 1 }.GetHashCode().

I believe that in some data sets, this could increase the number of collisions by a lot.

The comment about pretending that types are the same is because I assumed the reason for symmetry is that you wanted e.g. new Contoso.Person { FirstName = "Andy", LastName = "Gocke" }.GetHashCode() to be the same as new AdventureWorks.Person { LastName = "Gocke", FirstName = "Andy" }.GetHashCode().

Sorry I was unclear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, in that case the intention is: yes, fields are deliberately unordered. However, there is no structural equality -- the types must be nominally compatible for two types to be considered equal. It sounds like we're also centering around subtypes not being equal to supertypes, so that would imply the types are equal as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agocke In that case, I think the order of fields should matter specifically for the purpose of computing the hash code. Hash codes should not be preserved across executions, so I think it's fine if changing the order of fields changes the hash code.

(Even stronger example is .Net Core 2.1 HashCode. It uses randomization, which means hash codes produced by it change even if the types themselves don't. If that behavior is acceptable, then I think depending on the order of fields should be also acceptable, even if they are otherwise unordered.)

default initialization in the base class, the subclass is required to
define a protected constructor. The constructor must assign all readonly
members of the base class before the constructor ends, or an error is
produced.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand it correctly that because of this, it's not possible to make something like the following code work?

data class Person
{
    public string Name { get; }
}

data class Student : Person
{
    public int Grade { get; }
}new Student { Name = "Jane Doe", Grade = 1 }

Or do these rules only apply to inheritors that are normal classes, not data classes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should only apply to normal classes. Data classes should inherit the requirements of their base data classes.

@gafter
Copy link
Member

gafter commented Jun 26, 2018

@agocke we figured out how to make equals be symmetric in the case of inheritance. I’ll send you notes tomorrow.

@agocke
Copy link
Member Author

agocke commented Jun 26, 2018

@HaloFour Several members of the LDM requested a PR, since that's essentially an issue in GitHub, but it has better change tracking and you can leave comments on individual sections.

@agocke
Copy link
Member Author

agocke commented Jun 26, 2018

@gulshan @CyrusNajmabadi If you're looking for an ultra-short syntax for data members, you may be interested in my forthcoming companion proposal for expanding tuples (I call them "named tuples").

{
public string Username { get; set; }
public string Password { get; set; }
public bool RememberMe { get; set; } = false;
Copy link
Member

@jaredpar jaredpar Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have = false here? It's the default and having it put there makes me think it's relevant to the feature. #Resolved

Unfortunately, there are still serious problems. There is no piecewise
comparer implicitly defined for C# classes, so if you want simple data
comparison, the real example looks like
[this](https://gist.github.com/agocke/6ba6e64f77f1212ba7292bfd1f1000e5)
Copy link
Member

@jaredpar jaredpar Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than link to a GIST I would just inline the sample here. Yes it's a big sample but that's part of the point. #Resolved

1. A constructor must be manually defined.
1. The constructor parameters are ordered, while the fields are not.
Consumers have now taken a dependency on the parameter ordering.
1. The constructor must be maintained with any field changes.
Copy link
Member

@jaredpar jaredpar Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another issue: constructor parameter names don't necessarily line up with field / property names hence named argument passing doesn't have the same ease of use as object initializers. #Resolved


There is also no way to create a copy of a data structure with readonly
fields with only one item changed. A new type must be constructed manually.

Copy link
Member

@jaredpar jaredpar Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double blank line. #Resolved

## Proposal

To resolve many of these issues, I propse a new modifier for classes and structs: `data`.
`data` classes or structs are meant to satisfy the goals listed above by doing the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a sample of a data class usage in the proposal?

`data` classes or structs are meant to satisfy the goals listed above by doing the
following things:

1. Automatically generating `Equals`, `GetHashCode`, `ToString`, `==`, `!=`, and `IEquatable<T>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is ToString relevant here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on including ToString. It's handy for debugging.
A debugger display tip would be ok, as an alternative.


In reply to: 198207292 [](ancestors = 198207292)


## Proposal

To resolve many of these issues, I propse a new modifier for classes and structs: `data`.
Copy link
Contributor

@jnm2 jnm2 Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo propse #Resolved

`readonly` in metadata. The CLR treats `readonly` mostly as guidance -- it
can easily be overriden using reflection anyway. Most of the safety of
`readonly` members in C# is not provided by the runtime, but by C# safety
rules. One way we could enfore compiler rules would be to generate public
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo enfore

@jnm2
Copy link
Contributor

jnm2 commented Jun 26, 2018

Concerned about the public unspeakable members, but I'm thrilled about this proposal in general. Can hardly wait for this!

  • There is also no way to create a copy of a data structure with readonly fields with only one item changed. A new type must be constructed manually.

    I'm particularly interested in this since today I'm maintaining With* methods by hand. What would the syntax be for creating a copy with only one item changed?
    If I wanted to suppress a certain With* method or add a custom With* method to require two fields to be set at once, perhaps with very simple validation, would I have to swap the entire class over to a normal class first?

  • Would permitting the manual override of ToString on a data type ever be considered? I write data types and override ToString for UI data binding to control the display of the object, choosing one or two of the fields to use in the display string.

@HaloFour
Copy link
Contributor

@agocke

Several members of the LDM requested a PR, since that's essentially an issue in GitHub, but it has better change tracking and you can leave comments on individual sections.

I agree that PRs make it easier to discuss specific portions of a proposal, but it doesn't feel appropriate to have this one discussion segregated from every other discussion. Once this proposal is merged the conversation is effectively lost since a user would have to know to look in a completely different place to find it. The only thing worse than the shortcomings of the Issues is scattering the discussions all over the place.

@agocke
Copy link
Member Author

agocke commented Jun 26, 2018

@dotnet/csharplangdesign See @HaloFour's comment above about proposal formatting.

@gulshan
Copy link

gulshan commented Jun 26, 2018

So, what about the Records #39 ? Abandoned?

Fixes some typos and changes equality to be commutative
even in the presence of subtyping.
@agocke
Copy link
Member Author

agocke commented Jun 26, 2018

@gulshan You can see this as part of a broader "records" proposal. If the thing you were looking for in records was the syntax, that will be part of a follow-up proposal I'm calling "named tuples".

hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(Username);
hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(Password);
hashCode = hashCode * -1521134295 + RememberMe.GetHashCode();
return hashCode;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this is simpler now with HashCode.Combine. (just to make the example code fair :)).

1. The constructor must be maintained with any field changes.
1. Constructor parameter names don't necessarily line up with field/property
names hence named argument passing doesn't have the same ease of use as
object initializers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: tehse are definitely cons of a constructor. But it might be nice to mention the pros as well. Namely terse and sensible syntax for 'data' that has well-understood positional ordering.


1. Automatically generating `Equals`, `GetHashCode`, `ToString`, `==`, `!=`, and `IEquatable<T>`
based on the member data of the type.
1. Allow object initializers to also initialize readonly members.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if it's mentioned later. But is it possible to override any of this? For example, if you want to provide a better GethashCode impl? Or if you want to provide your own specific ToString?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. We may want to give a warning/error if you override everything and data just means nothing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, i was hoping 'data' would solve the issue of:

+There is also no way to create a copy of a data structure with readonly
+fields with only one item changed. A new type must be constructed manually.

So i could imagine having a data class where i override everything. But i still benefit from the fact that you provided me the easy way to generate a copy with only some pieces changed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would not give the warning whenever there are read-only data members because someone may be using the object initializer support for them.

As far as With-ers goes, they will be generated too, but I don't have the details worked out yet. I think we may want to introduce an actual with { } expression to mirror the object initializer syntax to make handling read-only members easier (and not generate many intermediate objects).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

{
public string Username { get; set; }
public string Password { get; set; }
public bool RememberMe { get; set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nervous about these being mutable. can i make immutable if i want?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok good :) Personally, in any docs, i think it might be good to start with the data being immutable. but then showing how you can opt-into mutability pretty easily. but that' sa very minor point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the example to read-only to show off the codegen hacks.

{
return EqualityComparer<string>.Default.GetHashCode(Username) +
EqualityComparer<string>.Default.GetHashCode(Password) +
RememberMe.GetHashCode();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you can only compare with other LoginResource's... is there a reason that hashing is unordered?

To support these cases and provide an easy escape hatch, I propose a
new attribute, `DataMemberAttribute` with a boolean flag argument on the
constructor. This allows users to override the normal behavior and include
or exclude extra members in equality. The previous example would now read:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems odd to me. Given that it's a Data-Class or Data-Struct, my bias is that the public props/fields should be part of the data-ness. It seems like if someone does not want that, the attribute should be to opt-out, instead of needing to opt-in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto-properties are part of data-ness, computed properties may not be. That's especially true if you have, say, a half dozen computed properties all reading from single piece of data that you could compare directly for equality. For instance, flags enums to boolean properties.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto-properties are part of data-ness, computed properties may not be.

Agreed. but my point is simply: when something "may not be", what side of hte fence do you decide they're on. My argument is we shoudl just assume they are part of the data-ness. And if you don't want that, you opt-out.

Note: i also think opt-out is a good idea because maybe i want an auto-prop and i do not want that auto-prop to be part of the data-ness. Now, i'd have a consistent way for auto-props and computed-props to say that. i.e.:

[NotData]
internal int LogLevel { get; set; }

I want this to be an auto-prop because i really don't need to compute anything. But i don't want it participating in the data-ness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's especially true if you have, say, a half dozen computed properties all reading from single piece of data that you could compare directly for equality.

This example is interesting to me, because for the data classes I would want to write, that piece of data would not be a public prop/field, and would instead be a private field. The example that immediately comes to mind is CommonConversion in Roslyn, where it's a bunch of bool properties that we internally store as bits in an integer. For that example, what would you propose for comparison? Would there be some way of saying "These public things aren't part of equality, but this private field is"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@333fred I think in that case, the best thing would be to just provide your own override of GetHashCode/Equals. i.e. you know better. So you can give the fastest impl.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@333fred That's what the proposal is talking about. You put [DataMember(true)] on the private field and it's considered in equality. The public computed properties are not.

I think @CyrusNajmabadi's approach is too heavy-handed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:D

Note: in fred's example, he jsut wants the private field used for equals/gethashcode. Is there then a way to opt-out of something being a data member? i.e. even if you have an auto-prop, can you put [DataMember(false)] on it?

If so, i think i'm happy. Basically, the language would have rules about if an unadorned member was a data-member or not (i.e. auto-props are, blah blah blah are not). But you can always put on DataMember(true/false) on anything to explicitly opt-in/out.

Does that make sense ot you Andy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CyrusNajmabadi Yeah, the boolean flag in the constructor was supposed to work both ways. To both opt-in and opt-out.

are equal. The members are compared by `==` if it is available. Otherwise,
the method `Equals` is tried according to overload resolution rules (st. an
`Equals` method with an identity conversion to the target type is preferred
over the virtual `Equals(object)` method).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this better than saying that you use EqualityComparer<X>.Default.Equals(a, b)? That shoudl do "the right thing" for basically all types, right? Of course, the compiler could consider optimizing that for some things like primitive types. but it seems odd to bake in the knowledge about == vs .Equals vs IEquatable.Equals.

Copy link
Member Author

@agocke agocke Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great question that I don't know the answer to. (yet)


`GetHashCode` would be implemented by calling `GetHashCode` on each of
the data members and these would be combined using a symmetric operation
(like `+`) to prevent ordering from affecting the results.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not clear ot me why/how ordering could/would matter.

Since you can't compare against a different type... your orders will be the same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could certainly save the hash code and compare before/after reordering members.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting because now you're providing an even stronger guarantee. That the hashcode is not only safe across reordering. But also safe across versioning of the compiler/language. I'm ok with such a guarantee. but we would really have to be specific about it.

Personally though, i don't like it. :)

I would much rather us get behind a weaker guarantee. That only within a session could you even be sure that hte hashcodes would be the same. I would even do what some systems do and introduce some random value so that each session produces new hashcodes. That way, if you actually tried to take a dependency on hashcode values, that you would break almost immediately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agocke The documentation of GetHashCode() strongly warns against doing that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. Still, a commutative hashcode (+) is simpler, so I'm not sure why you would want a non-commutative one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, because collisions are far more likely. frankly, i think from a language perspective we should say nothing more than: if you have the same member values in teh same session, you'll get teh same hashcode. Then, from an impl perspective we should basically either call into HashCode.Combine (if it exists), or just emit the same GetHashCode code as we do for anonymous-types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: > is simpler

This isn't really a virtue (to me at least) for generated code. We already have fairly good hashcodes for tuples/anonymous-types. I would not want us to pick a poorer hashcode here (i.e. more likely to collide) and then regret that later on because customers are negatively impacted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. You've convinced me. We should use whatever hash code combine we think is fast and provides a good distribution. Hash.Combine is probably a decent starting point.

@agocke agocke changed the title Proposal: "data" classes for C# Proposal: "nominal records" for C# Aug 4, 2018
@agocke
Copy link
Member Author

agocke commented Aug 4, 2018

I've made some changes here and renamed my proposal to "nominal records" to differentiate from the existing records proposal, which I'll now refer to as "positional records." I have also changed to an "initializer struct" design for lowering, which I previously abandoned because I was worried it had too much metadata overhead. However, looking at it more, I think it's a bit cleaner and the metadata overhead is almost nothing in my tests.

This is not quite @HaloFour's proposal, because I still favor unspeakable initializers to allow compilers to enforce safety rules around required initialization.

@HaloFour
Copy link
Contributor

HaloFour commented Aug 4, 2018

@agocke

This is not quite @HaloFour's proposal, because I still favor unspeakable initializers to allow compilers to enforce safety rules around required initialization.

In my opinion that makes them unnecessarily inaccessible from other languages or downlevel compilers.

public string Password => <>Backing_Password;
public string <>Backing_RememberMe = false;
public bool RememberMe => <>Backing_RememberMe;
public struct <>Initializer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my personal preference would be that this just be a well known name that then supports this pattern 'by convention'. Like so many patterns in the language, we would just be looking for a general accessible/public API-shape, and if found, would light up this functionality.

So, in this case, having htis be called "Initializer" or "Builder" would be far preferred to some unnameable name like <>Initializer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You realize this is a ridiculously complicated pattern, right? You have to have a public builder. You have to manually move your initializers from the class to the builder (the compiler can't help with or verify this step). The fields must have the exact same names as the properties. What if the property has a setter? Is the pattern illegal now (allowing it would be a breaking change)? What message does the compiler produce?

Worse, now this whole system is a public API surface that can't be changed. What if we want to change the lowering strategy, as we just did in the course of this PR?

This isn't really a pattern, it's a lowering strategy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You realize this is a ridiculously complicated pattern, right?

Not to me :)

You have to have a public builder.

Not ridiculous. Not complicated (IMO).

You have to manually move your initializers from the class to the builder (the compiler can't help with or verify this step).

Neither ridiculous nor complicated :)

The fields must have the exact same names as the properties.

Neither ridiculous nor complicated :)

What if the property has a setter? Is the pattern illegal now (allowing it would be a breaking change)? What message does the compiler produce?

I'm happy to discuss all those potential concerns. But i would like to start with an open mind that this isn't defacto 'absurd' as you are making it out to be :)

It may be that there is enough cases to help drive the idea that this is better as something truly unnameable. But i would prefer that be the result of a good discussion with enough informign cases, rather than haivng that be the presumed starting point.

Effectively, none of the above seemed at all strange or onerous to have as a defacto pattern. Heck, it seems exceptionally tame compared to many other patterns we have in the language :) (Just my 2c on these).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't underestand what it buys you if it was a pattern? why you'd want to implement it manually? also I think @agocke's point about it being "complicated" is not the pattern itself, the compiler should be able to extract semantical information from that pattern which could change as the feature involve in the future, defining such pattern can be quite compilicated to be future-proof. on the other hand, as a "lowering strategy" it will come after all the semnatic analysis and also can change which has a greater value imo.

`readonly` semantics. Requiring initialization via constructor means that
field order becomes a public API and requires careful versioning, which is
not true of mutable `data` types and is a constraint that should be
irrelevant to `readonly` semantics.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok... i've read this a few times... and i feel like i'm not getting it. Specifically, i don't see why it's necessary for the initializer name to be unspeakable.

@HaloFour
Copy link
Contributor

HaloFour commented Aug 5, 2018

@CyrusNajmabadi

I think the argument for unspeakability is that then the compiler can enforce that some members must be initialized. If the initializer is simply based on a normal coding pattern then downlevel compilers can consume it directly and initialization of those required members can't be enforced.

Given that these protections are at best suggestions which can't be enforced particularly with languages like F# that can speak unspeakables and that adding a new required member is very intentionally not a breaking charge and existing consumers would continue to not initialize that member I don't think it's a point worth making the type unspeakable. The compiler enforcement could just as easily be accomplished through an analyzer on the builder member.

look like:

```C#
public class LoginResource : IEquatable<LoginResource>
Copy link
Member

@alrz alrz Aug 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make IEquatable impl opt-in? e.g. only if spelled out in the source: data class C : IEquatable<C>

(or opt-out a la fsharp)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to reduce the amount of possibly unused code generated. the opt-in solution would be synonymous to derive or deriving in rust and haskell.

```C#
public data class LoginResource
{
public string Username { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it even make sense to make these non-public? Could it default to public?

Copy link
Member Author

@agocke agocke Aug 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Private fields and properties are perfectly fine. We could consider it, but right now I'm not wild about changing the default accessibility of members. That's a significant amount of extra complexity in the language, and peoples' style guidelines often depend on it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also related, I think it's actually useful to force properties to come first in the class body, otherwise it doesnt look like a "data class" anymore. sure it might be basically a style preference but it worth to consider.

@agocke
Copy link
Member Author

agocke commented Aug 5, 2018

@HaloFour That's one argument, but I also want to leave this entire initialization phase as implementation defined to allow for codegen changes later.

@agocke
Copy link
Member Author

agocke commented Aug 5, 2018

@CyrusNajmabadi @HaloFour I think you guys are misremembering how patterns work in C#. Patterns are for allowing user-written types to substitute for well-known framework types. So instead of just Task being awaitable, anything that looks like Task is awaitable.

There's no such thing as a "compiler-generated" pattern. If the pattern components are public and speakable, that means that regular C# code can interface with them, which means that they must be real and actually exist in source. If you can type it, then you need to be able to go-to-def on it, so the source must be there and navigable. The compiler, never, ever, alters source trees or produces source code. This is just a non-starter.

@svick
Copy link
Contributor

svick commented Aug 5, 2018

@agocke

If you can type it, then you need to be able to go-to-def on it, so the source must be there and navigable.

What about properties of anonymous types or the Invoke method of a delegate type? You can type them, but you can't really go-to-def on them.

@alrz
Copy link
Member

alrz commented Aug 5, 2018

@svick I think it means that you can type them at the declaration site.

@agocke
Copy link
Member Author

agocke commented Aug 5, 2018

Well, the rule isn't exact: certainly the generated constructor of a class with no constructors is "real" and has a go-to-def target, but this isn't a pattern, per se. I actually have the same question for the "with" methods proposed in the records proposal. If you can call those methods and treat them as "real" methods, how do they appear to the compiler? Do we actually create fake method symbols to represent them? Where's their syntax?

@CyrusNajmabadi
Copy link
Member

@CyrusNajmabadi @HaloFour I think you guys are misremembering how patterns work in C#. Patterns are for allowing user-written types to substitute for well-known framework types.

I'm just 'pattern' in a generic sense. i.e. just the general idea of "what the compiler would look for."

If what we lower to is something reasonable (which i feel like it totally could be0, i don't see any reason why i wouldn't want the compiler to be able to see such a thing in IL (which might have been produced by something else) and go: "ah, i know how to use that! i'll let you use things like Object-Initializers, but then have it compile down to that Class and it's Builder class automatically for you".

So instead of just Task being awaitable, anything that looks like Task is awaitable.

Yes. And that's effectively what i want. Instead of 'object initializers' only working with classes/structs with mutable fields/properties, i want anything that looks 'initializable' to be initializable. However, we would extend the 'initializable' pattern to recognize seeing a well known convention to enable this facility.

@CyrusNajmabadi
Copy link
Member

The compiler, never, ever, alters source trees or produces source code. This is just a non-starter.

I don't know why that needed to be mentioned :) There must be some confusion as i haven't ever expected the compiler to do anything of the sort :)

@CyrusNajmabadi
Copy link
Member

That's one argument, but I also want to leave this entire initialization phase as implementation defined to allow for codegen changes later.

I definitely get that. And i think that's a reasonable position to take. It's just one i disagree with.

I think we should spend hte cycles (even though they may not work out) to consider trying to make this not an implementation detail, but to strongly specify what the phase should look like.

If you then created a data class you would get that specific lowering output. However, conversely, if we saw that lowered 'pattern' (i'm not sure what a great term here is, so i'm going to keep using 'pattern'), we would say: "oh, this is a data class!".

That's all. :)

Note: i'm not saying: "we have to do this". Or "it's ludicrous and stupid to do this another way". I'm simply saying that we should examine and consider this and properly weigh out the pros/cons here before dismissing it out of hand right at the beginning. I personally think there is a lot of merit in defining a convention-based approach to mutable/immutable data. And i think if we're going to spend cycles looking into this space, we shouldn't be so quick to just immediately reject this idea.

@HaloFour
Copy link
Contributor

HaloFour commented Aug 6, 2018

@agocke

I think you guys are misremembering how patterns work in C#. Patterns are for allowing user-written types to substitute for well-known framework types.

Are collection or indexer initializers not patterns? If not, what are they? I certainly don't want to get bogged down in an argument over what things are named. I care about how they behave. I'd rather a general purpose feature, one that enables me to then add said support to my own types or possibly even other types via extension. Immutable object initializers are an orthogonal concern with other existing requests/proposals requesting this capability. Having this proposal come up with a method that both can't be used elsewhere in the language and also can't be used well from other languages or downlevel compilers seems like a real shame.

@CyrusNajmabadi
Copy link
Member

I think i feel very similarly to @HaloFour. It seems like a shame to go down a path that seems less capable and less enabling, when (my hope is) that there is a very cheap alternative that surfaces the same user experience, while also not preventing these other interesting scenarios.

@HaloFour
Copy link
Contributor

HaloFour commented Aug 6, 2018

One question, which I don't think is explicitly addressed in the proposal: is the addition of a new required readonly property to a nominal record considered a breaking change?

@agocke
Copy link
Member Author

agocke commented Aug 6, 2018

@HaloFour It would produce a new warning at the minimum, so yes.

@gulshan
Copy link

gulshan commented Aug 6, 2018

Will positional and nominal records be mutually exclusive?

@agocke
Copy link
Member Author

agocke commented Aug 6, 2018

@gulshan I don't think they have to be, no. But the intersection of the two could be very different than either alone.

@HaloFour
Copy link
Contributor

HaloFour commented Aug 6, 2018

@agocke

It would produce a new warning at the minimum, so yes.

Ok. Just mentally reconciling the gymnastics around initialization with the end result of being able to add new members without breaking existing consumers, at least until they're recompiled. This feature seems closely related to records in F# which take a fairly different approach regarding initialization and consider the addition of members to be a hard breaking change which requires recompilation.

@HaloFour
Copy link
Contributor

I know that I've beaten this poor horse to death but I wanted to note the contention that language-specific unspeakable encoding of features is causing in CoreFX to the extent that they are recommending that such features not be used in public APIs:

https://github.com/dotnet/corefx/issues/33553

@agocke
Copy link
Member Author

agocke commented Nov 19, 2018

@HaloFour You're not wrong. In fact, you've convinced me. The nail in the coffin for me is usage in older versions of the C# compiler. I don't want to completely lock out old compiler versions, so I think the builder should be speakable, but marked with [EditorBrowsable(false)] so it doesn't pollute IDE suggestions.

@agocke
Copy link
Member Author

agocke commented Nov 19, 2018

I didn't mention it in the notes, but in the LDM discussion on this I actually proposed making the builder speakable.

@agocke
Copy link
Member Author

agocke commented Jul 27, 2019

I'm gonna close this out because I have a new proposal that I'll add soon that doesn't require builders or marking anything unspeakable

@agocke agocke closed this Jul 27, 2019
This was referenced Mar 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.