[Proposal]: Only Allow Lexical Keywords in the Language #4460

Open
1 of 4 tasks
jmarolf opened this issue Feb 24, 2021 · 138 comments

@jmarolf

jmarolf commented Feb 24, 2021

Only Allow Lexical Keywords in the Language

  • Proposed
  • Prototype: Not Started
  • Implementation: Not Started
  • Specification: Not Started

Summary

Today there are keywords in the language, such as var or nameof, that cannot be recognized from lexical information alone. These keywords behave this way for backwards-compatibility reasons. I propose that we change the language so that every keyword can be determined through lexical analysis.

Motivation

In general, C# strives to be a language that is explicit about program behavior and normally requires the developer to write out what their intention is without ambiguities. This makes it a language that is easy for someone to read and understand. Once you learn it there is little implicit behavior to consider. I believe having this "gotcha" where keywords are only keywords if nothing else in scope is so-named makes the language harder to read and reason about in general.

There is also the reality that the design goals around C# Language versions and .NET Framework versions have changed. In the past it was paramount that developers could take a new language version update without updating the framework version they were targeting. With language features being increasingly tied to the runtime this makes less sense. We now strongly encourage developers to update both the language version and target framework together.

Detailed design

There is an existing concept in the language called "contextual keywords". I am not proposing doing away with this concept altogether, just changing it so that a keyword's "contextual-ness" can always be determined lexically. Take record, a new keyword at the time of this writing. We can still tell whether we are referring to the record keyword or to some identifier named record from the lexical context alone; there is no ambiguity. However, var, according to the spec, requires us to check whether there are types named var in scope:

spec

In the context of a local variable declaration, the identifier var acts as a contextual keyword. When the local_variable_type is specified as var and no type named var is in scope,
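To make the name-lookup dependency concrete, here is a minimal sketch in which a user-defined type named var is in scope. The names VarDemo and Describe are illustrative only, and the implicit conversion exists purely so the declaration compiles. With that type in scope, var n = 5; does not declare an int: it declares n with the user-defined type. Newer compilers may also warn about the all-lowercase type name.

```csharp
using System;

// Hypothetical type deliberately named `var`, which violates .NET naming conventions.
class var
{
    // Lets an int be assigned to a `var`, so the declaration below compiles.
    public static implicit operator var(int value) => new var();
}

static class VarDemo
{
    public static string Describe()
    {
        // Looks like an implicitly typed int, but because a type named `var`
        // is in scope, `n` is declared with that type and 5 is converted to it.
        var n = 5;
        return n.GetType().Name;
    }

    static void Main() => Console.WriteLine(Describe()); // prints "var", not "Int32"
}
```

Nothing in the declaration itself reveals which meaning applies; a reader has to know what else is in scope, which is the ambiguity the proposal wants to remove.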

Similarly, nameof requires checking whether there are any identifiers called nameof in scope:

spec

Because nameof is not a reserved keyword, a nameof expression is always syntactically ambiguous with an invocation of the simple name nameof. For compatibility reasons, if a name lookup of the name nameof succeeds, the expression is treated as an invocation_expression -- regardless of whether the invocation is legal. Otherwise it is a nameof_expression.
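A minimal sketch of the rule quoted above, assuming a hypothetical method that happens to be named nameof (the class and method names are illustrative). Because name lookup of nameof succeeds, nameof(count) is bound as an ordinary invocation rather than as a nameof_expression.

```csharp
using System;

static class NameofDemo
{
    // Hypothetical method that happens to be called `nameof`.
    static string nameof(int value) => $"method called with {value}";

    public static string Run()
    {
        int count = 3;
        // Lookup of `nameof` finds the method above, so this is an invocation
        // that passes the value 3, not the nameof operator yielding "count".
        return nameof(count);
    }

    static void Main() => Console.WriteLine(Run()); // prints "method called with 3"
}
```

Under the proposal, this method would have to be declared and called as @nameof, and a bare nameof(count) would always mean the operator.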

Implementing this proposal would remove the spec wording about name lookup collisions and allow a compliant compiler to fully determine keywords from parsing information alone.

The following keywords would now produce an error if developers attempted to use them as anything other than a keyword:

  • var
  • nameof
  • dynamic
  • _

Drawbacks

This is a breaking change: anyone relying on this behavior would find that their code no longer compiles. Where a type is named var, dynamic, or _, or a method is called nameof, the developer would need to change the usages to @var, @nameof, @dynamic, or @_.

Alternatives

We could opt to keep _ as a contextual keyword that depends on name lookup rules, as this is the change most likely to break real-world programs (see discussion on #1064).

Unresolved questions

Design meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-09-28.md#ungrouped
https://github.com/dotnet/csharplang/blob/main/meetings/2024/LDM-2024-09-06.md#only-allow-lexical-keywords-in-the-language

@svick
Contributor

svick commented Feb 24, 2021

Previous discussion: #4458.

@svick
Contributor

svick commented Feb 24, 2021

I feel that the breaking changes that would be introduced by adopting this proposal on its own fall into three categories:

  1. "Nobody is broken."
    Almost nobody uses var, dynamic, or nameof as identifiers in C#; it's fine to break the tiny number of people who do.
  2. "Some people are broken."
    The pattern where _ is the name of a used lambda parameter (e.g. _ => _.Name) is fairly rare, but not unheard of in C#. I think it's probably acceptable to break this kind of code, but ways of softening the blow should be seriously considered (e.g. a code fix to rename such parameters; or warning in C# 10 and only making it an error in C# 11).
  3. "Lots of people are broken."
    The pattern where _ is the name of an unused lambda parameter (e.g. _ => {}) is very common in C# and it is completely unacceptable to break such code. The obvious solution would be to change the meaning of that code to make the _ a discard. But I think it's important to note that this additional change would be required, assuming this proposal is not meant to be a massive breaking change.

@Youssef1313
Member

  • Will this break analyzers/codefixes?

@jmarolf
Author

jmarolf commented Feb 24, 2021

@svick
Looking at GitHub, we have:

  • ~17K uses of _ =>
  • ~1K uses of var _ = or using var _ =

While I am only proposing removing name lookup, and var _ = GetResults(); is not lexically ambiguous with _ = GetResults();, there could still be odd errors with _ in this case. I am willing to keep the name lookup rules for _ if there are concerns.

@jmarolf
Author

jmarolf commented Feb 24, 2021

Will this break analyzers/codefixes?

That is entirely dependent on the compiler implementation, but new language versions are always allowed to break analyzers.

Consider when expression-bodied members were added. If your analyzer previously assumed (not unreasonably) that every method with a body contained a block, it broke, since the body could now be just an arrow expression.

@HaloFour
Contributor

See: #4466

Conflating the two was always going to create a lot of confusion, and the parser that gets broken or confused the most is the human parser.

@jnm2
Contributor

jnm2 commented Feb 28, 2021

I've switched my position on #1064 (disallowing _ as an identifier) from downvote to upvote. Sure, it would break years of my past code in which I used _ for lambda parameters, but ship a solution-wide light bulb fix for it and I'd want to use that fix anyway to replace my usages of _, even if I wasn't forced to. Opting into a new major version of C# feels like an expected time for something like this to happen.

Maybe soft-deprecating by adding a new compiler warning in C# 10 that tells you that _ as an identifier will be disallowed starting in C# 11 would make this seem less abrupt.

I agree with @HaloFour. I think the human parser should be the most important factor. While unlikely, the possibility of code like this should make us uneasy:

public class Foo
{
    private int _;

    // Doing something important in some other file that is affected by reading Foo.WrappedValue?
    public int WrappedValue => _;

    public bool IsNumber(string input)
    {
        return int.TryParse(input, out _); // oops! out _ writes to the field _, not a discard
    }
}

(stolen from http://gafter.blogspot.com/2017/06/making-new-language-features-stand-out.html?showComment=1509474504510#c7458806139970524286)
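A runnable variant of the example above, offered as a sketch: it uses int.TryParse so the out argument's type matches the int field, and adds a constructor and a FooDemo class (both illustrative additions) to show the effect. Because the field _ is in scope, out _ passes the field by reference, and int.TryParse always writes its out argument (the parsed value, or 0 on failure), so every call to IsNumber silently overwrites WrappedValue.

```csharp
using System;

public class Foo
{
    private int _;

    public Foo(int initial) => this._ = initial;

    public int WrappedValue => _;

    public bool IsNumber(string input)
    {
        // The field `_` is in scope, so `out _` is NOT a discard: the parsed
        // value (or 0 if parsing fails) is written into the field.
        return int.TryParse(input, out _);
    }
}

static class FooDemo
{
    static void Main()
    {
        var foo = new Foo(42);
        foo.IsNumber("7");
        Console.WriteLine(foo.WrappedValue); // prints 7, not 42
    }
}
```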

@chrizpro

chrizpro commented Apr 2, 2021

I hope this proposal goes nowhere. I like the underscore usage because it makes the code less boring. As a writer, I like to use dashes, colons, semicolons, and so on to make my writing more interesting, just as an _ makes code more interesting, though it's rarely used. I want C# to be an intermediate-level language; developers having trouble with var and the like should look at Lua or other languages.

@HaloFour
Contributor

HaloFour commented Apr 2, 2021

Java, which is used by a larger (and, some would argue, more change-resistant) community, seemingly has had little or no issue with the language deprecating and then disallowing the use of _ as an identifier or var as a type name. They've done this in the past as well with names like assert. Usually it doesn't matter, as the contextual keyword wouldn't be expected to be used as a type name.

I've been pretty outspoken against the use of _ as a discard as well as an identifier. Since the ship has sailed on discards I think its use as an identifier should be reconsidered. It sounds good on paper to avoid reinterpreting/breaking any existing code, but now the language has this wart where developers need to remember which combination of features will cause the compiler to prefer _ as an identifier vs. where the compiler will always consider it a discard, and where _ is preferred to be an identifier the developer has to wade past the type checks on these "variables" that the developer never intended to actually use. The case of accidentally overwriting some field name might be pathological, but the mental burden on the developer will still always be there. I would've much preferred if the compiler phased out _ as an identifier over a few releases, with fixers to replace it with another identifier, and then switch it to a discard wholesale. Names are cheap. Contextual keywords that change their meaning based on nuanced use of other language features and where the contexts are very likely to collide are not.

@jmarolf
Author

jmarolf commented Apr 2, 2021

My views are basically the same as @HaloFour's. I think semantically contextual keywords make a lot of sense in theory but add developer complexity and overhead for little benefit.

@CyrusNajmabadi
Member

for little benefit.

I think this is very debatable. Consider the work we're doing to support field inside properties in C# 10. If we make these keywords and not contextual keywords, we simply break people (including ourselves). And we break them despite their having done nothing wrong. For example, it would break code that is totally normal and reasonable and not at all deviating from the norms of the ecosystem. I don't like the idea that someone could follow every best practice we gave and then end up broken just for expediency on our part. In most (all?) cases, supporting semantic contextual keywords is not difficult. Indeed, it's one of the simpler things to support. You simply bind as normal and accept the prior meaning if it is valid. If it isn't, then you allow the new meaning. This means we can gently add new things to the language and not have to worry at all about breaking people.

@jmarolf
Author

jmarolf commented Apr 2, 2021

Consider the work we're doing to support field inside properties in C# 10.

For clarification: this proposal explicitly states that cases like this should keep working as they do; contextual keywords will always exist. Just because properties use the value contextual keyword does not mean that we should now force that to be a keyword at all times. This proposal is about ensuring that contextual keywords can always be determined based solely on lexical information as opposed to semantic information.

In most (all?) cases, supporting semantic contextual keywords is not difficult. Indeed, it's one of the simpler things to support. You simply bind as normal and accept the prior meaning if it is valid. If it isn't, then you allow the new meaning. This means we can gently add new things to the language and not have to worry at all about breaking people.

I totally agree that there is no engineering reason to change this. It just works for the compiler folks (as far as I am aware). But I think it adds an unnecessary burden on programmers using the language. Things like var and nameof feel very unfortunate to me. Anyone following best practices in C# does not expect var to be unavailable to them, or nameof to have different semantics based on esoteric name lookup rules. It feels like a real "gotcha" moment where I can go on Twitter and "well actually" anyone who uses code with these semantic contextual keywords and say "Oh, you are actually not discarding that but assigning a value to a variable named _".

Every other language I've encountered (C++, Java, TypeScript, Python, Go), including F# and Visual Basic, does not use name lookup rules to determine whether something is a keyword, and there have been no complaints. I personally feel that all this concern over keyword breakage has no real evidence; it's all theoretical. Java can just add a new keyword if they need to and no one complains.

@CyrusNajmabadi
Member

This proposal is about ensuring that contextual keywords can always be determined based solely on lexical information as opposed to semantic information.

Right. But the problem with that is that it directly goes against design goals we have for these features. For example, we want you to just be able to say field. There's nothing lexical/syntactic to distinguish that this is special. It's just going to reference the auto-prop field if nothing else binds.

But I think it adds an unnecessary burden on programmers using the language

I don't really see this as a burden. For people just using the language, using var is going to work. So what needs to be fixed? Same with nameof, etc. People using our APIs could certainly be better served here with better APIs, but that would be a Roslyn concern.

and there have been no complaints.

This is not true. Take Go, for example. There are lots of complaints about the verbosity of the language, and part of that verbosity arises because the language doesn't want to get into this space. So it ensures all its constructs are extremely verbose and often unwieldy, just so it doesn't have to do any semantic checks on this sort of thing. It's a tradeoff they made, but one we're quite loath to make, as it really just bulks up the language.

@jmarolf
Author

jmarolf commented Apr 2, 2021

Right. But the problem with that is that it directly goes against design goals we have for these features. For example, we want you to just be able to say field. There's nothing lexical/syntactic to distinguish that this is special. It's just going to reference the auto-prop field if nothing else binds.

I would need to review the proposal, but isn't this going to work exactly like value? You can just say that field is reserved now and you use @field if you need to "escape" the fact that this is a keyword now. I think this is an important distinction to the reader. You now have to explicitly state what your intent is. You are essentially saying "a casual reading of this might lead you to believe this is the field keyword, which has specific semantics, but that is not what is happening here; this is a custom instance, and @field clues you into what is happening." If we were to do it all over again, would we have everything be a contextual keyword? I dunno, I suppose I could see the argument: why put roadblocks in folks' way? My position is that it's a weird language corner case that most C# developers are not aware of and is surprising to them when they learn about it.

If there is a design goal that can only be achieved with name lookup rules, or everyone else in the LDM just disagrees and thinks that semantic contextual keywords are awesome and we wish we did them more often, great! That's not my position, but I am willing to be convinced.

@CyrusNajmabadi
Member

CyrusNajmabadi commented Apr 2, 2021

I would need to review the proposal but isn't this going to work exactly like value?

No. 'value' always binds to the property parameter prior to anything else in a higher scope. field will not (As that would break existing, perfectly fine code).

You can just say that field is reserved now and you use @field if you need to "escape" the fact that this is a keyword now

That would break lots of code that is totally fine today and which wasn't doing anything strange or inappropriate. I do not see how customers are helped by just changing the meaning of their code on them.

You are essentially saying "a casual reading of this

We are not, and should not be, beholden to 'a casual reading of this'.

If you see this:

local = 0;

What does a casual reading tell you? Almost nothing. This could be a local, or a field, or a property, or a parameter. It could be assigned. It could be assigned by-ref. It could have conversions. It could throw. Etc., etc., etc.

And that's just assignment. Once you get the . operator, all bets are 100% off :)

@CyrusNajmabadi
Member

CyrusNajmabadi commented Apr 2, 2021

My position is that it's a weird language corner case that most C# developers are not aware of and is surprising to them when they learn about it.

Weird corner cases are always like that. But we have tons of those everywhere. The question is: is getting rid of weird corners better or worse than breaking code? The position we've landed on generally comes down to:

  1. is the code that is breaking reasonable? or is it unreasonable?
  2. is it widespread, or likely not used at all?

If it's unreasonable (which often comes down to debate) we are more likely to take the stance: trying to prop up this code is not worth it, so we would prefer to change it and accept that pathological cases break.

Similarly, if something is widespread, then we've already opened the barn door. People clearly are using the language in this fashion in a significant fashion, and I think we have to accept that.

Where we have room to play around is when you get into the 'unreasonable, and not used (or very, very rarely used)' territory. This is like someone coming along now and saying: yeah, I'm going to name my type var even though .NET naming conventions (both formal and informal) have said from day 1 that types are PascalCased. This is both unreasonable, IMO, and likely extraordinarily niche. (Indeed, my expectation is that this only exists in projects that seek to subvert the language/compiler, in which case I don't think of that as a reasonable thing to cater to.)

--

So, in the case of some keywords (var, record, etc.), I'm actually OK with us taking over and saying: yeah, at this point this is ours. Reasonable codebases won't have any pain at all moving to this.

However, for some keywords, I'm not OK with us doing this. If the pattern is either reasonable or widespread, we need to accept that and not harm users when we have a perfectly suitable way to both introduce the feature and keep things working just fine.

@jmarolf
Author

jmarolf commented Apr 2, 2021

So, in the case of some keywords (var, record, etc.), I'm actually OK with us taking over and saying: yeah, at this point this is ours. Reasonable codebases won't have any pain at all moving to this.

However, for some keywords, I'm not OK with us doing this. If the pattern is either reasonable or widespread, we need to accept that and not harm users when we have a perfectly suitable way to both introduce the feature and keep things working just fine.

I think this is a totally reasonable stance to take. var feels pretty uncontroversial (to me), but other keywords feel much further along the spectrum of causing unreasonable harm. If the LDM says "var should just be a keyword, but these others should stay as they are," I would be totally fine with that. I just want us to take the time to re-evaluate this and make sure we still feel the same way.

In the past there were more situations where a newer version of C# could be "pushed" on you. Today it's an explicit decision to update your SDK version to get an updated version of C#. Major SDK versions also have major breaking changes (API name changes, etc.), to the point that developers expect some friction. I think it's not unreasonable to have folks change field to @field in these upgrade situations, but I will admit I am taking a stance that is way over to one side on how OK I am with breaks. Others do not need to join me over here.

@CyrusNajmabadi
Member

CyrusNajmabadi commented Apr 2, 2021

If we scope this to "the language reserves the space of lowercase ASCII identifiers for **type** contexts," then I'm totally OK with that :)

That would address var, unmanaged, notnull, dynamic, and possibly some others that I'm not remembering.
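As a rough sketch of which names such a rule would cover, here is a hypothetical check (IsReservedInTypeContext is not a real compiler API) that treats a type name as reserved when it is non-empty and consists only of the lowercase ASCII letters a to z. Whether digits or underscores should also count is an open assumption and is deliberately left out.

```csharp
using System;
using System.Linq;

static class LowercaseTypeNameRule
{
    // Hypothetical helper: true when the name is non-empty and made only of
    // lowercase ASCII letters, i.e. a candidate for language reservation
    // in type contexts.
    public static bool IsReservedInTypeContext(string name) =>
        name.Length > 0 && name.All(c => c >= 'a' && c <= 'z');

    static void Main()
    {
        foreach (var name in new[] { "var", "dynamic", "notnull", "unmanaged", "Var", "MyType" })
        {
            Console.WriteLine($"{name}: {IsReservedInTypeContext(name)}");
        }
    }
}
```

PascalCased names such as Var or MyType fall outside the rule, which is why conventional codebases would be unaffected.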

@jaredpar
Member

jaredpar commented Apr 4, 2021

I like the scoping here, but I would also like to consider the case of discards. As the feature is written today, it's hard to use discards broadly in a method; instead they are most useful in a limited set of circumstances. In too many cases _ subtly turns into an identifier rather than a discard, which invalidates other uses within the method body, and suddenly you have to fall back to ignored names.

@CyrusNajmabadi
Member

I'd definitely like to break out an issue on discards. I'm curious about the cases that are hard here, where it's difficult to mesh the idea of:

  1. use existing semantics if the code is legal
  2. reinterpret as discard if not

I think discards are also a space where we could potentially experiment with a .NET-upgrade-style approach, where we unilaterally reinterpreted this stuff but had tools fix the issue if you used these as non-discards in your project.

@jaredpar
Member

jaredpar commented Apr 4, 2021

I'm curious about the cases that are hard here

Converting between lambdas and local functions. Parameters in lambdas can be discards but not in local functions. That means when swapping between the two it introduces unnecessary friction because you have to rationalize discard behavior. It's no longer what essentially amounts to a syntax transform.

Whether a _ is a discard or identifier in a lambda comes down to the count of parameters that you have. A single parameter means it's an identifier but multiple mean it's a discard.

// _ is an identifier
Action<int> action = (_) => {
    _ = ""; // Error: _ is an int parameter, so a string cannot be assigned to it
};

// _ is a discard
Action<int, int> action2 = (_, _) => {
    _ = ""; // OK: both parameters are discards, so this is a discard assignment
};
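The same parameter-count rule in a form that compiles, as a small sketch (the class and variable names are illustrative). With a single parameter named _, the body can read it like any other parameter; with two parameters named _ (lambda discard parameters, available since C# 9), both are discards and neither can be referenced.

```csharp
using System;

static class LambdaUnderscoreDemo
{
    public static int Run()
    {
        // Exactly one parameter named `_`: it is an ordinary parameter,
        // so the body can read it.
        Func<int, int> doubleIt = _ => _ * 2;

        // Two parameters named `_`: both are discards, so the body
        // cannot refer to either of them.
        Func<int, int, int> ten = (_, _) => 10;

        return doubleIt(5) + ten(1, 2);
    }

    static void Main() => Console.WriteLine(Run()); // prints 20
}
```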

This is generally frustrating to have to remember, but it gets really frustrating when you consider it in the context of refactoring or code changes. Lambda parameters are often listed as discards because they're a callback value that you may not need. Circumstances change, and it's rational to begin using a parameter, which starts with giving it a name. If giving that parameter a name means there is only one _ remaining, then it becomes an identifier, and suddenly all the other _ inside the method body are interpreted as identifiers, which can cause compilation errors.

string token = ...;
Action<string, string> callback = (_, value) => {
    // Error: this worked before I changed the second parameter to have a name.
    // Now _ is a string parameter, so out _ no longer refers to a discard.
    if (int.TryParse(token, out _)) {
        ...
    }
};

This means there is a huge incentive to prefer out var _ over out _, even when _ currently refers to a discard. The out var _ form is one of the few places where _ unambiguously refers to a discard. So even though out _ is more succinct, developers should consider always using the out var _ form, even though it's longer and doesn't actually declare a variable, because it's more future-proof against cases where _ gets bound as an identifier.
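A compiling sketch of the workaround described above, reusing the shape of the refactored lambda (the class name and the token argument are illustrative). Only the first parameter is named _, so it is a real string parameter; out _ would bind to it and fail to compile against int.TryParse, whereas out var _ is always a discard.

```csharp
using System;

static class OutVarDiscardDemo
{
    public static bool Run(string token)
    {
        // Only one parameter is named `_`, so it is a real string parameter.
        Func<string, string, bool> callback = (_, value) =>
        {
            // `out _` would refer to the string parameter and not compile here;
            // `out var _` is always a discard, so this compiles.
            return int.TryParse(token, out var _);
        };

        return callback("first", "second");
    }

    static void Main()
    {
        Console.WriteLine(Run("123")); // prints True
        Console.WriteLine(Run("abc")); // prints False
    }
}
```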

These together all make it frustrating to use discards. It's too easy to get trapped in a case where _ suddenly binds to an identifier and that will invalidate many other cases in the method where you depended on having discards available and there is little recourse for the developer when that happens.

@CyrusNajmabadi
Member

Whether a _ is a discard or identifier in a lambda comes down to the count of parameters that you have. A single parameter means it's an identifier but multiple mean it's a discard.

Could we change that and instead make it so that if it's an error with the prior semantics, then it can now be reinterpreted as a discard?

These together all make it frustrating to use discards. It's too easy to get trapped in a case where _ suddenly binds to an identifier and that will invalidate many other cases in the method where you depended on having discards available and there is little recourse for the developer when that happens.

I have a supposition we can fix that, without having to go whole-hog into: all _ are always discards.

The open question for me is whether there are cases where code would be legal under either interpretation (identifier or discard), you want the latter, and interpreting it as the former would lead to undesirable behavior. If that exists, then this approach would likely not be viable. However, my hunch is that this would allow for:

  1. existing code to continue to compile with its existing meaning.
  2. Code that is currently in error will now compile, with a meaning that is sensible.
  3. Code that could potentially have both meanings (and this will retain the 'identifier' interpretation) will behave in a desirable way.

@HaloFour
Contributor

HaloFour commented Apr 5, 2021

@CyrusNajmabadi

Could we change that and instead make it so that if it's an error with the prior semantics, then it can now be reinterpreted as a discard?

I like where this is going but it sounds like there could be a lot of potentially tricky edge cases, especially if the code seems to intentionally mix discards and _ as an identifier:

if (int.TryParse(s, out _)) {
    // ...
}
// later ...
var bar = foo.Select(_ => _.Bar);

@jaredpar
Member

jaredpar commented Apr 5, 2021

Could we change that and instead make it so that if it's an error with the prior semantics, then it can now be reinterpreted as a discard?

Can't do that because _ is a legal identifier. As @HaloFour pointed out it's just fine to use it via _.ToString(), etc ... You can't even take shortcuts like saying "okay, if _ is only used for assignment or out then make it a discard" because assignments to a _ can have side effects (implicit conversion tricks).
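A minimal sketch of the "implicit conversion tricks" mentioned above, using a hypothetical Tracker type whose implicit conversion from string has a visible side effect. When _ is an ordinary local, _ = "hello"; runs that conversion; if _ were a discard, the string would simply be evaluated and thrown away, so the compiler cannot treat the two as interchangeable.

```csharp
using System;

// Hypothetical type whose implicit conversion from string has a side effect.
class Tracker
{
    public static int Conversions;

    public static implicit operator Tracker(string text)
    {
        Conversions++;
        return new Tracker();
    }
}

static class UnderscoreAssignmentDemo
{
    public static int Run()
    {
        Tracker.Conversions = 0;

        // `_` is declared as an ordinary local here, so it is not a discard.
        Tracker _ = null;

        // Reads like "throw the value away", but it assigns to the local `_`
        // and therefore runs the user-defined conversion.
        _ = "hello";

        return Tracker.Conversions;
    }

    static void Main() => Console.WriteLine(Run()); // prints 1
}
```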

This is the core problem we're facing. The decisions of C# 1.0 are essentially limiting our ability to make _ a friction free feature. Unless we take some sort of conditional break here then we're essentially stuck with those decisions.

@CyrusNajmabadi
Member

Can't do that because _ is a legal identifier.

I'm not talking about the cases where it has legal, error-free semantics.

I'm talking about the cases where it has illegal semantics. For example, where it would cause a scope collision.

This code would be illegal today, and so we can come up with rules to make it legal by saying: ah, read these all as discards now.

@jaredpar
Member

jaredpar commented Apr 5, 2021

I'm unsure what you're asking for at this point.

@CyrusNajmabadi
Member

CyrusNajmabadi commented Oct 19, 2021

but for new keywords to be introduced to the language without ever breaking existing code that may have (unknowingly) used words that we now want to use as keywords, maybe I had not made that clear.

I understood that. The problems with that have already been discussed.

@CyrusNajmabadi
Member

C# recently introduced record and that is a contextual keyword - you were able to add it despite the fact that some code somewhere might have used record as a name,

We were not. We made a breaking change to support this.

and this is a great ability. I am simply arguing that, ideally, all such terms be contextual.

This is not ideal, for all the reasons I outlined above. Contextual keywords are a mechanism that can allow introduction, but they come with drawbacks (again, listed numerous times above).

@CyrusNajmabadi
Member

If "more things were keywords" then the legal example above would never be allowed -

Correct, and that's a good thing. That's the premise of this entire issue. Removing syntactically or semantically contextual keywords to not have users have to understand that concept, and to make it so that they can't run into these cases.

@CyrusNajmabadi
Member

Seems to me it would make it easier to address the backcompat issue

Backcompat means that your code runs without needing changes. If we could require changes (including making them for you), then there is no compat concern at all and the entire space becomes moot. However, we cannot depend on that, so we have to make the choice around either not doing the change, or doing it and allowing some cases to break (the latter of which we do, but only when we feel it is so worth it, and the pain is hopefully sufficiently small to the ecosystem).

@CyrusNajmabadi
Member

CyrusNajmabadi commented Oct 19, 2021

yet how could it if it were using keywords invalidly?

Contextual keywords are ones that are legal identifiers. So it's totally possible to be using them with their identifier meaning today in legal code. The fixer would prefix them with @, allowing the code to keep its meaning, while allowing the language to take over those identifiers as keywords in the future.

We are likely loath to do this, though, as from the list given above, it's highly likely people are using these as existing identifiers, and thus the chance of compat pain and extra work becomes high and likely to be very unpleasant.

I don't know of such a thing and it seems to raise the problem that the code would have to already compile cleanly

This is not the case. Fixers do not require the code to compile cleanly. Indeed, we would implement this like so:

  1. make these words into keywords from the language level.
  2. update the parser to attempt to interpret the code with the keyword meaning. If correct, accept that.
  3. If incorrect, re-examine to see if it would have been OK with the prior meaning (which we already have the existing code for). If so, issue an error that this is not legal, but state that this can be fixed by adding @.

Put another way, internally, these would still be contextual keywords. We would just error if you used them outside of their context. These would have a clear diagnostic allowing a fixer to run to solve this.
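The three steps above can be sketched as a small decision procedure. This is a hypothetical model, not Roslyn code: the enum, the Resolve method, and the two booleans (standing in for "the keyword interpretation is valid" and "the prior identifier interpretation would have been valid") are all illustrative assumptions.

```csharp
using System;

// Hypothetical outcomes for a word that has become a keyword but used to be
// usable as an identifier.
enum Resolution
{
    KeywordMeaning,   // step 2: the keyword interpretation is valid, accept it
    ErrorOfferAtFix,  // step 3: only the old identifier meaning works; error + "add @" fix
    Error             // neither interpretation works: ordinary error
}

static class KeywordResolver
{
    public static Resolution Resolve(bool bindsAsKeyword, bool bindsAsIdentifier)
    {
        if (bindsAsKeyword) return Resolution.KeywordMeaning;
        if (bindsAsIdentifier) return Resolution.ErrorOfferAtFix;
        return Resolution.Error;
    }

    static void Main()
    {
        // Code that only made sense with the old identifier meaning.
        Console.WriteLine(Resolve(bindsAsKeyword: false, bindsAsIdentifier: true)); // prints ErrorOfferAtFix
    }
}
```

The key design point is that the keyword meaning always wins, so the result never depends on name lookup; the identifier check only decides which diagnostic and fixer to offer.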

--

However, as per above posts, the issue is not about converting code to move them off of using these words as identifiers. If we could forcefully change code to address compat concerns, we'd never have compat concerns.

@CyrusNajmabadi
Member

but for new keywords to be introduced to the language without ever breaking existing code that may have (unknowingly) used words that we now want to use as keywords,

@Korporal we already have this capability. That's exactly the primary reason that contextual keywords exist. It's literally why we created them and have used them in all the versions of C# that have added new syntactic constructs.

However, as has been covered a few times already, this is not without drawbacks. That is the reason this topic is being discussed.

@CyrusNajmabadi
Member

yet you seem to be missing the point, it is not a desire to allow people to use keywords for identifiers but for new keywords to be introduced to the language

I'm not missing the point. You've talked about two separate things. In the words quoted here you're talking about new keywords being introduced. But in your immediately prior post you wrote:

I'm curious as to what a grammar would, could look like that strived to be as close as possible to the current C# syntax yet was changed just enough to eliminate reserved words. This is something that interests me from an academic point of view.

You literally asked about eliminating reserved words, so I was discussing eliminating reserved words. If you want to discuss adding keywords, that's also fine. But these are two different topics, and my responses to one of your questions should not be dismissed because you also want to discuss something else.

@Korporal

Korporal commented Oct 19, 2021

yet you seem to be missing the point, it is not a desire to allow people to use keywords for identifiers but for new keywords to be introduced to the language

I'm not missing the point. You've talked about two separate things. In the words quoted here you're talking about new keywords being introduced. But in your immediately prior post you wrote:

I'm curious as to what a grammar would, could look like that strived to be as close as possible to the current C# syntax yet was changed just enough to eliminate reserved words. This is something that interests me from an academic point of view.

You literally asked about eliminating reserved words, so I was discussing eliminating reserved words. If you want to discuss adding keywords, that's also fine. But these are two different topics, and my responses to one of your questions should not be dismissed because you also want to discuss something else.

By eliminating reserved words, Cyrus, I meant eliminating them as a grammatical restriction by changing the grammar. The motive is to let new keywords be added to a language without ever breaking existing code: if reserved words are a grammatical impossibility, adding a keyword can no longer conflict with an existing identifier.

A language that has no reserved words can obviously have new keywords added to it over time ad infinitum and never fail to compile existing code that might have unwittingly used those new keywords as identifiers.

I use the terms "reserved words" and "keywords" as I defined them earlier.

If I failed to make that clear then I apologize.

@CyrusNajmabadi
Member

By eliminating reserved words Cyrus I meant eliminating them as a grammatical concept in a language by changing the grammar

Yes. That was clear. And I covered both how that could be done for c#, or for a new language. However, I also included the information on what makes this unpalatable (both for the language design, and for the user experience). It's a net negative for both.

never fail to compile

This is not the problem we are trying to solve. We already have effective language tools for this. The problems are things like the user experience.

I use the terms "reserved words" and "keywords" as I defined them earlier.

Can you please use the terms as defined in c#? It is very confusing to use the same terminology to mean something different, esp. in the context of a discussion which overlaps C#.

If I failed to make that clear then I apologize.

Nothing to apologize for.

That said, I don't really see any direction for your conversation to go in. The ideas have been explored here, and the pros and cons laid out. I've also explained why our calculus for c# (or most general-purpose, user-facing languages) would preclude these designs. So I'm not sure what else there is to discuss on this tangent.

@jmarolf
Author

jmarolf commented Oct 19, 2021

So are contextual keywords viewed favorably or not?

@Korporal from a language grammar perspective I have no opinion. They are certainly useful to allow for language evolution without breaking existing code.

This proposal attempts to argue that the way contextual keywords were applied to a specific language (C#) makes it more complex to use. When a language concept such as nameof can also be a method invocation, it becomes another "gotcha" for new (and even experienced) programmers. However, even this proposal is not arguing that all contextual keywords be removed, only those that require name lookup rules. Contextual keywords that are never syntactically ambiguous (for example record) are fine to stay the way they are.
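To make the nameof case concrete: roughly speaking, the compiler treats nameof(...) as the operator only when no invocable method named nameof is in scope. A small sketch (NameofDemo and its nameof method are made up for illustration):

```csharp
using System;

static class NameofDemo
{
    // An ordinary method that happens to be called "nameof".
    static string nameof(object value) => "method called with " + value;

    public static string Run()
    {
        int count = 3;
        // Because a method named 'nameof' is in scope, this is a normal invocation of the
        // method above, not the nameof operator (which would have produced "count").
        return nameof(count);
    }
}
```

Nothing at the call site distinguishes the two meanings; the reader has to know what else is in scope, which is exactly the lookup-dependent behavior this proposal objects to.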

Related to this topic is the discussion of warning users about the use of all-lower-case type names:

@sab39

sab39 commented Oct 20, 2021

Backcompat means that your code runs without needing changes. If we could require changes (including making them for you), then there is no compat concern at all and the entire space becomes moot. However, we cannot depend on that, so we have to make the choice around either not doing the change, or doing it and allowing some cases to break (the latter of which we do, but only when we feel it is so worth it, and the pain is hopefully sufficiently small to the ecosystem).

I agree that it doesn't eliminate the problem entirely, but as you've said, it's all about tradeoffs and weighing costs vs benefits. It seems to me very analogous to the proposal to warn on lower case type names (which presumably will also come with a VS code fix that offers to correct the casing or add '@' as needed): neither of them eliminates the problem entirely, because existing code must continue to work correctly. But making it easier to identify problematic cases, and giving developers the tools to identify and opt in to fixing them (without forcing them to, or breaking anything if they don't), can at least give a better sense of the extent of breakage, and potentially shift the balance when weighing the costs vs benefits at some later date.

As has also been pointed out in this discussion, even the "never break existing code" rule wasn't considered sacrosanct and (pun intended) binding with regard to record, and the insights from that experience might affect how these tradeoffs are weighed against each other in future.

@Korporal

Korporal commented Oct 20, 2021

So are contextual keywords viewed favorably or not?

@Korporal from a language grammar perspective I have no opinion. They are certainly useful to allow for language evolution without breaking existing code.

This proposal attempts to argue that the way contextual keywords were applied to a specific language (C#) makes it more complex to use. When a language concept such as nameof can also be a method invocation, it becomes another "gotcha" for new (and even experienced) programmers. However, even this proposal is not arguing that all contextual keywords be removed, only those that require name lookup rules. Contextual keywords that are never syntactically ambiguous (for example record) are fine to stay the way they are.

Related to this topic is the discussion of warning users about the use of all-lower-case type names:

You say

"Contextual keywords that are never syntactically ambiguous" but surely such situations would never compile; if the compiler and grammar can't resolve an apparent ambiguity then it's a syntax error, no?

Also is there any evidence behind the statement "the way contextual keywords were applied to a specific language (C#) make it more complex to use"?

I've personally never found this aspect of C# even a slight problem, if there was a real risk that a syntactically valid program might execute in a very unexpected way because the author was misled by a contextual keyword then I'd certainly be interested to learn more of these cases.

I suppose this is a candidate though for the kind of thing you're speaking of:

    public class record
    {


    }

To a novice with C# - is that a record declaration or a class declaration?...

@CyrusNajmabadi
Member

but surely such situations would never compile; if the compiler and grammar can't resolve an apparent ambiguity then it's a syntax error, no?

There are no syntactic ambiguities with any contextual keywords (except 'record', though that was done intentionally).

@CyrusNajmabadi
Member

I'd certainly be interested to learn more of these cases.

I covered this above. It would certainly be confusing, for example, if 'if(' might now be an invocation, etc.

@CyrusNajmabadi
Member

To a novice with C#

Yes. This is one of the reasons we've approved making a warning for this. So we can tell users: this likely is a bad idea, and is likely very confusing.

And this is just one case. If there were no keywords, then you get this with everything.

@Korporal

Korporal commented Oct 20, 2021

And this is just one case. If there were no keywords, then you get this with everything.

That's simply not true: if the grammar insists that every identifier must precede its attributes, then there's no confusion whatsoever. The current grammar mixes the ordering of attributes and identifier, and the way these are allowed to be mixed varies from one language construct to another.

public class record is accepted by the grammar but class public record is not.

If the name always had to be the first part of such definitions, then a) there's no more confusion about what its name is and b) the ordering of the various attributes becomes irrelevant.

With that said, any developer can see that these are all logically identical:

Mydata class public sealed
{

}

Mydata public sealed class
{

}

Mydata sealed public class
{

}

Because the name must be the first token nobody can ever be confused about what is its name and the attribute ordering is utterly irrelevant in these examples.

This also completely eliminates the current possibility for confusion with a class named record and record (that cannot be) named class.

I'm not suggesting this as a literal grammar either, just trying to show you how this could in principle be accomplished and to answer your post.

@CyrusNajmabadi
Member

if the grammar insists that every identifier must precede its attributes then there's no confusion whatsoever.

We keep going in circles, @Korporal. This was already discussed heavily above. That sort of anachronism is too restrictive and would not be acceptable in any language we want to invest in. Indeed, we moved away from that in VB because it was simply too unpleasant and unwelcome in the ecosystem.

Furthermore, you've continually asked if it would be possible to do this for the grammar of c#. For example: "I'm curious as to what a grammar would, could look like that strived to be as close as possible to the current C# syntax "

It cannot be both close to the current C# syntax, and yet not support even basic constructs like this. At that point it would not only not be c#, but it would be such a departure as to not even feel slightly close to c#. Effectively, every syntactic construct would need a prefix form. And any place where an identifier could start would have to ensure that any following token both did not collide with any existing grammar construct today (already very not true), and for all future grammar productions we'd want in the future (very very limiting).

As such, this approach would not be viable or desirable. Furthermore, as stated above, there are practically no benefits to this approach. It does not solve any of the issues we're having problems with. However, it introduces substantial issues in both the lang design space and the user-experience space that make it a complete deal-breaker for C#, C#-prime, or any potential future lang.

@CyrusNajmabadi
Member

With that said any developer can see that these are all logically identical:

This is not a virtue. This is a degradation in user experience. You are proposing a 'solution' to a part of language design that we not only consider non-problematic, but consider a virtue. We don't have an issue introducing syntactic keywords. We have a solution for it. We also have a mechanism to allow users to use anything as an identifier if we've squatted on it. Allowing more keywords to be usable as identifiers in different positions is not a non-goal, it's an anti-goal. It's something we do not want. As stated, it would be possible to have the language do this. But we intentionally do not want it, as it's a net negative for usability.

@CyrusNajmabadi
Member

At this point, things seem to be going entirely in circles, and the tangent from the core issue is enormous. Please open a discussion if you would like to discuss this further, or please take this to something like gitter.im/dotnet/csharplang. There's nothing really to be gained from repeating the same points over and over here (esp. as they are on a topic that is unrelated to what this issue is tracking).

@Korporal

This comment has been minimized.

@CyrusNajmabadi
Member

CyrusNajmabadi commented Oct 20, 2021

@Korporal, as specified above, please open a new discussion, or take this tangent to gitter and/or discord. Repeating the same points cyclically isn't valuable (esp. as this is a tangent on an idea that doesn't match this issue).

This was my mistake for allowing this tangent to happen. I thought this was a github discussion, not a github issue. Free-form exploration of ideas like this is what discussions are for. For direct issues like this, we want to keep things on topic to the issue itself and only what is relevant to that.

@Korporal

This comment has been minimized.

@CyrusNajmabadi
Member

@Korporal, as specified above, please open a new discussion, or take this tangent to gitter and/or discord. As I said before, I made a mistake here, believing this to be a discussion already and not an issue. That was my fault and I apologize for it. However, from this point on, any further conversation on this tangent is not appropriate for this issue.

@jmarolf
Author

jmarolf commented Oct 20, 2021

Why not stop responding to me altogether,

lol! That is not the @CyrusNajmabadi way :)

"Contextual keywords that are never syntactically ambiguous" but surely such situations would never compile; if the compiler and grammar can't resolve an apparent ambiguity then it's a syntax error, no?

@Korporal Sorry I was wrong here. I should not have said "syntactically ambiguous". As you say, it will parse to something. The question is whether that something is useful.

The crux of this issue is that this is valid code in C# today:

class await
{
    async async async(async async) => await async[async][async][async][async];
}
full code sample
using System;
using System.Runtime.CompilerServices;

[AsyncMethodBuilder(typeof(builder))]
class async {
    public async this[async async] { get => null; }
}

class await
{
    async async async(async async) => await async[async][async][async][async];
}

static class util { 
   public static awaiter GetAwaiter(this await a) => throw null;
   public static awaiter GetAwaiter(this async a) => throw null;
}

class awaiter : INotifyCompletion {
    public bool IsCompleted => true;
    public void GetResult() { }
    public void OnCompleted(Action continuation) { }
}

class builder
{
    public builder() { }
    public static builder Create() => throw null;
    public void SetResult() { }
    public void SetException(Exception e) { }
    public void Start<TStateMachine>(ref TStateMachine stateMachine)
        where TStateMachine : IAsyncStateMachine => throw null;
    public async Task => throw null;
    public void AwaitOnCompleted<TAwaiter, TStateMachine>(
        ref TAwaiter awaiter, ref TStateMachine stateMachine)
        where TAwaiter : INotifyCompletion
        where TStateMachine : IAsyncStateMachine => throw null;
    public void AwaitUnsafeOnCompleted<TAwaiter, TStateMachine>(
        ref TAwaiter awaiter, ref TStateMachine stateMachine)
        where TAwaiter : ICriticalNotifyCompletion
        where TStateMachine : IAsyncStateMachine => throw null;
    public void SetStateMachine(IAsyncStateMachine stateMachine) => throw null;
}

While this is a technical marvel from a parsing perspective, I ask the question "Does this make the language easier to use?"

This proposal argues that it does not. If you read that sample above and say "That seems fine, or at least not harmful" then you certainly don't agree with this proposal. If you also want more language constructs to behave this way (which is what I am interpreting your statements as being about), I agree with @CyrusNajmabadi that we should have that discussion on a separate issue. I am happy to start that if you want.

@TahirAhmadov

I would say, let's keep _ as contextual, but the other keywords mentioned should be prohibited from being identifiers. I don't have a single identifier in all of my codebases named var etc.; it's anecdotal evidence, but I suspect there aren't many, if any at all, in other teams' projects either. And the IDE can probably suggest a fix for these together with the release of this breaking change, making it a trivial 15-minute exercise of fixing these build errors using rename refactorings. On the other hand, I do use _ from time to time. Imagine method calls like Info.Of<Person>().Property(_=>_.FirstName) - in all of these cases, I purposefully use _ to signify that it's not a real parameter and this whole construct is just a workaround for the time being, until something like infoof or propertyof is added.
I'm sure some or all of these points have already been covered - just don't have the time to read the entire thread.
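For context on why _ is a special case: a lone _ lambda parameter is still an ordinary identifier (kept for backward compatibility), so the body can read it, while two or more _ parameters are treated as discards in C# 9 and later. A small sketch; DiscardDemo is a made-up name:

```csharp
using System;

static class DiscardDemo
{
    public static int Run()
    {
        // A single '_' parameter is a normal identifier, so '_.Length' reads the argument.
        Func<string, int> length = _ => _.Length;

        // With two '_' parameters (C# 9+), both are discards and cannot be referenced.
        Func<int, int, int> constant = (_, _) => 42;

        return length("abc") + constant(1, 2); // 3 + 42
    }
}
```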

@Joe4evr
Contributor

Joe4evr commented Dec 28, 2021

it's anecdotal evidence but I suspect there aren't many

I agree there likely aren't many, but from what I understand, there are some big companies and/or government orgs whose code style policies are such that they never want to have var or the like, so they define some public struct var/dynamic { } to make the compiler issue an error since then any use will bind to the user-defined type instead of as a keyword. In the past it could've been quite costly for such a code base to be broken by a compiler/SDK update that started disallowing the use of such keywords as typenames, so even if there aren't many, those kinds of customers are worth a lot of money, and I have my doubts that Microsoft would've liked to risk losing them over such cases.
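A minimal sketch of that trick. The namespace Policy and class VarDemo are hypothetical, and the type is placed in a namespace only so the effect stays local: once a type named var is in scope, var in a declaration binds to that type instead of requesting type inference, so a line like var n = 1; stops compiling.

```csharp
namespace Policy
{
    // Declaring a type named 'var' makes 'var' refer to this type inside this namespace.
    // (Recent compilers may warn about all-lowercase type names, but it still compiles.)
    public struct var { }

    public static class VarDemo
    {
        public static string Run()
        {
            // An explicitly typed local of type Policy.var; no type inference happens here.
            var v = new var();

            // var n = 1;  // would not compile: there is no conversion from int to Policy.var

            return v.GetType().Name;
        }
    }
}
```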

But today there are much better ways to enforce such a policy: There's a whole section of EditorConfig rules for it, so there's a clear migration path for such code bases that wish to ban the use of certain keywords in their entirety while also (possibly?) making it easier for the LDT to add new useful keywords in the future. (That said, the EditorConfig section in question is only for var at this time. It'd probably be good to have similar rules for dynamic and others, but that's a different discussion.)
