[Proposal]: Only Allow Lexical Keywords in the Language #4460
Comments
Previous discussion: #4458. |
I feel that the breaking changes that would be introduced by adopting this proposal on its own fall into three categories:
|
|
@svick While I am only proposing removing name lookup and |
that is entirely dependent on the compiler implementation, but new language versions are always allowed to break analyzers. Consider when expression-bodied-members were added. If you previously assumed (not unreasonably) in your analyzer that all methods with a body contained a block syntax you were broken since now the body could just be an arrow expression. |
See: #4466 Conflating the two was always going to create a lot of confusion, and the parser that gets broken/confused the most is the human parser. |
I've switched my position on #1064 (disallowing Maybe soft-deprecating by adding a new compiler warning in C# 10 that tells you that I agree with @HaloFour. I think the human parser should be the most important factor. While unlikely, the possibility of code like this should make us uneasy: public class Foo
{
private int _;
// Doing something important in some other file that is affected by reading Foo.WrappedValue?
public int WrappedValue => _;
public bool IsNumber(string input)
{
return double.TryParse(input, out _); // oops!
}
} (stolen from http://gafter.blogspot.com/2017/06/making-new-language-features-stand-out.html?showComment=1509474504510#c7458806139970524286) |
I hope this proposal goes nowhere. I like the underscore usage because it makes the code less boring. As a writer, I like to use dashes, colons, semicolons, and etc to make my writing more interesting just like an _ makes the code more interesting, though it's rarely used. I want C# to be an intermediate-level language and developers having trouble with var and etc, they should look at Lua or etc. |
Java, which is used by a larger (and some would argue more resistant to change) community seemingly has had little/no issue with the language deprecating and then disallowing the use of I've been pretty outspoken against the use of |
My views are basically the same as @HaloFour 's. I think semantically contextual keywords make a lot of sense in theory but add developer complexity and overhead for little benefit. |
I think this is very debatable. Consider the work we're doing to support |
For clarification: this proposal explicitly states that cases like this should keep working as they do, contextual keywords will always exist. Just because properties use the
I totally agree that there is no engineering reason to change this. It just works for the compiler folks (as far as I am aware). But I think it adds an unnecessary burden on programmers using the language. Things like Every other language I've encountered (C++, Java, TypeScript, Python, Go) does not use name lookup rules to determine whether something is a keyword (including F# and Visual Basic) and there have been no complaints. I personally feel that all this concern over keyword breakage has no real evidence; it's all theoretical. Java can just add a new keyword if they need to and no one complains. |
Right. But the problem with that is that it directly goes against design goals we have for these features. For example, we want you to just be able to say
I don't really see this as a burden. For people just using the language, using
This is not true. Take 'go' for example. There are lots of complaints about the verbosity of the language. And part of that verbosity arises because the language doesn't want to get into this space. So it ensures all its constructs are extremely verbose and often unwieldy, just so it doesn't have to do any semantic checks on this sort of thing. It's a tradeoff they made, but one we're quite loath to make as it really just bulks up the language. |
I would need to review the proposal but isn't this going to work exactly like If there is a design goal that can only be achieved with name lookup rules or everyone else in the LDM just disagrees and thinks that semantic contextual keywords are awesome and we wish we did them more often, great! That's not my position but I am willing to be convinced. |
No. 'value' always binds to the property parameter prior to anything else in a higher scope.
That would break lots of code that is totally fine today and which wasn't doing anything strange or inappropriate. I do not see how customers are helped by just changing the meaning of their code on them.
We are not, and should not be, beholden to 'a casual reading of this'. If you see this:
What does a casual reading tell you? Almost nothing. This could be a local, or a field, or a property, or a parameter. It could be assigned. It could be assigned by-ref. It could have conversions. It could throw. etc. etc. etc. And that's just assignment. Once you get the |
Weird corner cases are always like that. But we have tons of those everywhere. The question is: is getting rid of weird corners better or worse than breaking code? The position we've landed on generally comes down to:
If it's unreasonable (which often comes down to debate) we are more likely to take the stance: trying to prop up this code is not worth it, so we would prefer to change it and accept that pathological cases break. Similarly, if something is widespread, then we've already opened the barn door. People clearly are using the language in this fashion to a significant degree, and I think we have to accept that. Where we have room to play around is when you get into the 'unreasonable, and not used (or very very rarely used)' territory. This is like someone coming along now and saying: yeah, I'm going to name my type -- So, in the case of some keywords ( However, for some keywords, I'm not ok with us doing this. If the pattern is either reasonable, or widespread, we need to accept that and not harm users when we have a perfectly suitable way to both introduce the feature and keep things working just fine. |
I think this is a totally reasonable stance to take. In the past there were more situations where a newer version of C# could be "pushed" on you. Today it's an explicit decision to update your SDK version to get an updated version of C#. Major SDK versions also have major breaking changes (API name changes etc.) to the point that developers expect some friction. I think it's not unreasonable to have folks change |
If we scope this to That would address, |
I like the scoping here but I would also like to consider the case of discards. As the feature is written today it's hard to use discards broadly in a method and instead is most useful in a limited set of circumstances. In too many cases it subtly turns into an identifier, not a discard, and suddenly that invalidates other uses within the method body and suddenly you have to drop back to ignored names. |
I'd def like to break out an issue on discards. I'm curious about the cases that are hard here and where it's difficult to mesh the idea of:
I think discards are also a space we could potentially experiment with a |
Converting between lambdas and local functions. Parameters in lambdas can be discards but not in local functions. That means when swapping between the two it introduces unnecessary friction because you have to rationalize discard behavior. It's no longer what essentially amounts to a syntax transform. Whether a // _ is an identifier
Action<int> action = (_) => {
_ = ""; // Error cause _ is an int identifier
};
// _ is a discard
Action<int, int> action2 = (_, _) => {
_ = ""; // Okay cause this is a discard
}; This is generally frustrating to have to remember but really gets frustrating when you consider it in the context of refactoring or code changes. Consider that lambda parameters are often listed as discards because they're a callback value that you may not need. Circumstances change and it's rational to begin using a parameter which begins by assigning it a name. If assigning that parameter a name though means there is only one string token = ...;
Action<string, string> action = (_, value) => {
// Error: This worked before I changed the second parameter to have a name
if (int.TryParse(token, out _)) {
...
} This though means there is a huge incentive to prefer These together all make it frustrating to use discards. It's too easy to get trapped in a case where |
Could we change that and instead make it so that if it's an error with the prior semantics, then it can now be reinterpreted as a discard?
I have a supposition we can fix that, without having to go whole-hog into: all The open question for me is if there are cases where code would be legal under either interpretation (identifier or discard), and you want the latter, and interpreting as the former would lead to undesirable behavior. If that exists, then this approach would likely not be viable. However, my hunch is that this would allow for:
|
I like where this is going but it sounds like there could be a lot of potentially tricky edge cases, especially if the code seems to intentionally mix discards and if (int.TryParse(s, out _)) {
// ...
}
// later ...
var bar = foo.Select(_ => _.Bar) |
Can't do that because This is the core problem we're facing. The decisions of C# 1.0 are essentially limiting our ability to make |
I'm not talking about the cases where it has legal, error-free semantics. I'm talking about the cases where it has illegal semantics. For example, where it would cause a scope collision. This code would be illegal today, and so we can come up with rules to make it legal by saying: ah, read these all as discards now. |
I'm unsure what you're asking for at this point. |
I understood that. The problems with that have already been discussed. |
We were not. We made a breaking change to support this.
This is not ideal, for all the reasons i outlined above. Contextual keywords are a mechanism that can allow introduction, but come with drawbacks (again, listed numerous times above). |
Correct, and that's a good thing. That's the premise of this entire issue. Removing syntactically or semantically contextual keywords to not have users have to understand that concept, and to make it so that they can't run into these cases. |
Backcompat means that your code runs without needing changes. If we could require changes (including making them for you), then there is no compat concern at all and the entire space becomes moot. However, we cannot depend on that, so we have to make the choice around either not doing the change, or doing it and allowing some cases to break (the latter of which we do, but only when we feel it is so worth it, and the pain is hopefully sufficiently small to the ecosystem). |
Contextual keywords are ones that are legal identifiers. So it's totally possible to be using them with their identifier meaning today in legal code. The fixer would prefix them with We are likely loath to do this though as, from the list given above, it's highly likely people are using these for existing identifiers, and thus the chance of compat pain and extra work becomes high and likely to be very unpleasant.
This is not the case. fixers do not require the code to compile cleanly. indeed, we would implement this like so:
Put another way, internally, these would still be contextual keywords. We would just error if you used them outside of their context. These would have a clear diagnostic allowing a fixer to run to solve this. -- However, as per above posts, the issue is not about converting code to move them off of using these words as identifiers. If we could forcefully change code to address compat concerns, we'd never have compat concerns. |
@Korporal we already have this capability. That's exactly the primary reason that contextual keywords exist. It's literally why we created them and have used them in all the versions of C# that have added new syntactic constructs. However, as has been covered a few times already, this is not without drawbacks. That is the reason this topic is being discussed. |
I'm not missing the point. You've talked about two separate things. In the words quoted here you're talking about new keywords being introduced. But in your immediately prior post you wrote:
You literally asked about eliminating reserved words, so i was discussing eliminating reserved words. if you want to discuss adding keywords, that's also fine. But these are two different topics, and my responses about one of your questions should not be dismissed because you also want to discuss something else. |
By eliminating reserved words Cyrus I meant eliminating them as a grammatical restriction in a language by changing the grammar, and the motive for that is to then enable new keywords to be added to a language without ever breaking existing code because adding them can no longer lead to conflicts because reserved words are a grammatical impossibility. A language that has no reserved words can obviously have new keywords added to it over time ad infinitum and never fail to compile existing code that might have unwittingly used those new keywords as identifiers. I use the terms "reserved words" and "keywords" as I defined them earlier. If I failed to make that clear then I apologize. |
Yes. That was clear. And I covered both how that could be done for c#, or for a new language. However, I also included the information on what makes this unpalatable (both for the language design, and for the user experience). It's a net negative for both.
This is not the problem we are trying to solve. We already have effective language tools for this. The problems are things like the user experience.
Can you please use the terms as defined in c#? It is very confusing to use the same terminology to mean something different, esp. in the context of a discussion which overlaps C#.
Nothing to apologize for. That said, I don't really see any direction for this conversation to go in. The ideas have been explored here, and the pros and cons laid out. I've also explained why our calculus for C# (or most general purpose, user facing languages) would preclude these designs. So I'm not sure what else there is to discuss on this tangent. |
@Korporal from a language grammar perspective I have no opinion. They are certainly useful to allow for language evolution without breaking existing code. This proposal attempts to argue that the way contextual keywords were applied to a specific language (C#) makes it more complex to use. When language concepts such as Related to this topic is the discussion of warning users about the use of all-lower-case type names: |
I agree that it doesn't eliminate the problem entirely, but as you've said, it's all about tradeoffs and weighing costs vs benefits. It seems to me very analogous to the proposal to warn on lower case type names (which presumably also will come with a VS code fix that offers to correct the casing or add '@' as needed): neither of them eliminate the problem entirely, because existing code must continue to work correctly. But making it easier to identify problematic cases, and giving developers the tools to identify and opt-in to fixing them (without forcing them to, or breaking anything if they don't), can at least give a better sense of the extent of breakage, and potentially shift the balance when weighing the costs vs benefits at some later date. As has also been pointed out in this discussion, even the "never break existing code" rule wasn't considered sacrosanct and (pun intended) binding with regard to |
You say "Contextual keywords that are never syntactically ambiguous" but surely such situations would never compile. If the compiler and grammar can't resolve an apparent ambiguity then it's a syntax error, no? Also, is there any evidence behind the statement "the way contextual keywords were applied to a specific language (C#) make it more complex to use"? I've personally never found this aspect of C# even a slight problem. If there was a real risk that a syntactically valid program might execute in a very unexpected way because the author was misled by a contextual keyword then I'd certainly be interested to learn more of these cases. I suppose this is a candidate though for the kind of thing you're speaking of:
public class record
{
} To a novice with C# - is that a record declaration or a class declaration?... |
There are no syntactic ambiguities with any contextual keywords (except 'record', though that was done intentionally). |
I covered this above. It would certainly be confusing, for example, if |
Yes. This is one of the reasons we've approved making a warning for this. So we can tell users: this likely is a bad idea, and is likely very confusing. And this is just one case. If there were no keywords, then you get this with everything. |
That's simply not true. If the grammar insists that every identifier must precede its attributes then there's no confusion whatsoever. The current grammar mixes the ordering of attributes and identifier, and the way these are allowed to be mixed varies from one language construct to another.
If the name was always and had to always be, the first part of such definitions then a) there's no more confusion about what is its name and b) the ordering of the various attributes becomes irrelevant. With that said any developer can see that these are all logically identical: Mydata class public sealed
{
}
Mydata public sealed class
{
}
Mydata sealed public class
{
} Because the name must be the first token nobody can ever be confused about what is its name and the attribute ordering is utterly irrelevant in these examples. This also completely eliminates the current possibility for confusion with a I'm not suggesting this as a literal grammar either, just trying to show you how this could in principle be accomplished and to answer your post. |
We keep going in circles, @Korporal. This was already discussed heavily above. That sort of anachronism is too restrictive and would not be acceptable to any language we want to invest in. Indeed, we moved away from that in VB because it was simply too unpleasant and unwelcome by the ecosystem. Furthermore, you've continually asked if it would be possible to do this for the grammar of C#. For example: "I'm curious as to what a grammar would, could look like that strived to be as close as possible to the current C# syntax " It cannot be both close to the current C# syntax, and yet not support even basic constructs like this. At that point it would not only not be C#, but it would be such a departure as to not even feel slightly close to C#. Effectively, every syntactic construct would need a prefix form. And any place where an identifier could start would have to ensure that any following token both did not collide with any existing grammar construct today (already very not true), and for all future grammar productions we'd want in the future (very very limiting). As such, such an approach would not be viable or desirable. Furthermore, as stated above, there are practically no benefits to this approach. It does not solve any issues that we're having problems with. However, it introduces substantial issues both in the lang design space and the user-experience space that make it a complete deal-breaker for C#, C#-prime, or any potential future lang. |
This is not a virtue. This is a degradation in user experience. You are proposing a 'solution' to a part of language design that we not only consider non-problematic, but we consider a virtue. We don't have an issue introducing syntactic keywords. We have a solution for it. We also have a mechanism to allow users to use anything as an identifier if we've squatted on it. Allowing more keywords to be usable as identifiers in different positions is not a non-goal, it's an anti-goal. It's something we do not want. As stated, it would be possible to have the language do this. But we intentionally do not want it as it's a net negative for usability. |
At this point, things seem to be going entirely in circles, and the tangent from the core issue is enormous. Please open a discussion if you would like to discuss this further, or please take this to something like gitter.im/dotnet/csharplang. There's nothing really to be gained repeating the same points over and over here (esp. as they are on a topic that is unrelated to what this issue is tracking). |
@Korporal, as specified above, please open a new discussion, or take this tangent to gitter and/or discord. Repeating the same points cyclically isn't valuable (esp. as this is a tangent on an idea that doesn't match this issue). This was my mistake for allowing this tangent to happen. I thought this was a github discussion, not a github issue. Free form exploration of ideas like this is what discussions are for. For direct issues like this, we want to keep things on topic to the issue itself and only what is relevant to that. |
@Korporal, as specified above, please open a new discussion, or take this tangent to gitter and/or discord. As i said before i made a mistake here, believing this to be a discussion already and not an issue. That was my fault and I apologize for it. However, from this point on, any further conversation on this tangent is not appropriate for this issue. |
lol! That is not the @CyrusNajmabadi way :)
@Korporal Sorry I was wrong here. I should not have said "syntactically ambiguous". As you say, it will parse to something. The question is whether that something is useful. The crux of this issue is that this is valid code in C# today:
class await
{
async async async(async async) => await async[async][async][async][async];
}
full code sample
using System;
using System.Runtime.CompilerServices;
[AsyncMethodBuilder(typeof(builder))]
class async {
public async this[async async] { get => null; }
}
class await
{
async async async(async async) => await async[async][async][async][async];
}
static class util {
public static awaiter GetAwaiter(this await a) => throw null;
public static awaiter GetAwaiter(this async a) => throw null;
}
class awaiter : INotifyCompletion {
public bool IsCompleted => true;
public void GetResult() { }
public void OnCompleted(Action continuation) { }
}
class builder
{
public builder() { }
public static builder Create() => throw null;
public void SetResult() { }
public void SetException(Exception e) { }
public void Start<TStateMachine>(ref TStateMachine stateMachine)
where TStateMachine : IAsyncStateMachine => throw null;
public async Task => throw null;
public void AwaitOnCompleted<TAwaiter, TStateMachine>(
ref TAwaiter awaiter, ref TStateMachine stateMachine)
where TAwaiter : INotifyCompletion
where TStateMachine : IAsyncStateMachine => throw null;
public void AwaitUnsafeOnCompleted<TAwaiter, TStateMachine>(
ref TAwaiter awaiter, ref TStateMachine stateMachine)
where TAwaiter : ICriticalNotifyCompletion
where TStateMachine : IAsyncStateMachine => throw null;
p While this is a technical marvel from a parsing perspective, I ask the question "Does this make the language easier to use?" This proposal argues that it does not. If you read that sample above and say "That seems fine, or at least not harmful" then you certainly don't agree with this proposal. If you also want more language constructs to behave this way (which is what I am interpreting your statements as being about) I agree with @CyrusNajmabadi that we should have that discussion on a separate issue. I am happy to start that if you want. |
I would say, let's keep |
I agree there likely aren't many, but from what I understand, there are some big companies and/or government orgs whose code style policies are such that they never want to have But today there are much better ways to enforce such a policy: There's a whole section of EditorConfig rules for it, so there's a clear migration path for such code bases that wish to ban the use of certain keywords in their entirety while also (possibly?) making it easier for the LDT to add new useful keywords in the future. (That said, the EditorConfig section in question is only for |
Only Allow Lexical Keywords in the Language
Summary
Today there are keywords in the language that cannot be understood with just lexical information, such as var or nameof. These keywords operate this way for backwards compatibility reasons. I propose that we change the language so that there exist no keywords that cannot be determined via lexical analysis.
Motivation
In general, C# strives to be a language that is explicit about program behavior and normally requires the developer to write out what their intention is without ambiguities. This makes it a language that is easy for someone to read and understand. Once you learn it there is little implicit behavior to consider. I believe having this "gotcha" where keywords are only keywords if nothing else in scope is so-named makes the language harder to read and reason about in general.
There is also the reality that the design goals around C# Language versions and .NET Framework versions have changed. In the past it was paramount that developers could take a new language version update without updating the framework version they were targeting. With language features being increasingly tied to the runtime this makes less sense. We now strongly encourage developers to update both the language version and target framework together.
Detailed design
There is an existing concept in the language called "contextual keywords". I am not proposing doing away with this concept altogether, just changing it so that a keyword's "contextual-ness" can always be determined lexically. Take the new (at the time of this writing) keyword record. We can still know whether we are referring to the record keyword or some identifier named record based on the lexical context; there is no ambiguity. However, var, according to the spec, requires us to check whether there are types named var in scope: spec
Similarly, nameof requires checking whether there are any identifiers called nameof in scope: spec
The implementation of this proposal would remove wording from the spec around name lookup collisions, and have a compliant compiler be able to fully determine keywords given only parsing information.
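To make the name lookup dependence concrete, here is a minimal sketch of how var and nameof change meaning today when something with the same name is in scope. The type named var, the class Demo, and the methods DescribeVar and DescribeNameof are hypothetical names invented for illustration; they come from neither the spec nor the compiler. Recent compilers also warn about all-lower-case type names such as this one.

```csharp
using System;

// Hypothetical type named `var`. This is legal today because `var` is only a
// contextual keyword; under this proposal it would have to be written `@var`.
class var
{
    public int Value;

    // Lets `var x = 42;` compile once `var` binds to this type.
    public static implicit operator var(int value) => new var { Value = value };
}

static class Demo
{
    public static string DescribeVar()
    {
        // A type named `var` is in scope, so this declares a local of that
        // type rather than inferring `int`.
        var x = 42;
        return x.GetType().Name;
    }

    // Hypothetical method named `nameof`. Because name lookup finds it,
    // `nameof(...)` inside this class is an ordinary method call.
    static string nameof(object o) => "custom";

    public static string DescribeNameof()
    {
        int count = 0;
        return nameof(count);
    }
}
```

Under the current rules, Demo.DescribeVar() should return "var" and Demo.DescribeNameof() should return "custom" rather than "count". Neither call site looks any different from ordinary use of the keywords, which is the readability concern this proposal raises.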
The following keywords would now error if developers attempted to use them as anything other than a keyword
var
nameof
dynamic
_
Drawbacks
This is a breaking change: if anyone were relying on this behavior, their code would no longer compile. For cases where a type is named var, dynamic, or _, or a method is called nameof, the developer would need to change the usages to @var, @nameof, @dynamic, or @_.
Alternatives
We could opt to keep _ as a contextual keyword that depends on name lookup rules, as this is the change most likely to break real-world programs (see discussion on #1064).
Unresolved questions
Design meetings
https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-09-28.md#ungrouped
https://github.com/dotnet/csharplang/blob/main/meetings/2024/LDM-2024-09-06.md#only-allow-lexical-keywords-in-the-language