-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft proposed revision of syntax model for deconstruction declarations #12688
Conversation
@@ -1852,13 +1837,71 @@ | |||
</Field> | |||
<Field Name="Value" Type="ExpressionSyntax"/> | |||
</Node> | |||
|
|||
|
|||
<AbstractNode Name="MultiVariableDeclarationSyntax" Base="CSharpSyntaxNode"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a naming perspective, it seems like perhaps this should just be called BaseVariableDeclarationSyntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrmm.. this doesn't match my remembrance of what we decided on in our previous meetings. Namely that we would reuse VariableDeclarationSyntax. |
<Kind Name="CloseParenToken"/> | ||
</Field> | ||
</Node> | ||
<Node Name="WildcardVariableDeclaratorSyntax" Base="MultiVariableDeclaratorSyntax"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd call this IgnoredVariableDeclaratorSyntax
.
My recollection of our meeting on this was that we would have the following:
This would handle any deconstruction form we could think of except for: To that end we were having a discussion about fitting the wildcard version into the above. Two suggestions were proposed. Specifically:
Then, when we saw
In both cases, i would recommend having a different In our email thread on the topic, you raised concerns about '1', but i pointed out why i think it's a totally ok perspective on what Given the suitability of using |
This is not intended to be a reflection of the process that resulted in the previous revision of the syntax. That revision ran into problems, including a number of design violations that we do not have a solution for
This revision is intended to be a solution that addresses all of these issues, and also facilitates further extension of this area of the syntax (e.g. wildcard support, support for flexible parsing in error scenarios, etc) by virtue of the new nonterminals. My question for you is whether you believe this would be usable from the perspective of consumption in the IDE code base. In reply to: 234688286 [](ancestors = 234688286) |
I'd like to discuss each of these issues to better understand what they are, if they're important, and if they necessitate a new syntactic model. I'm going to start with bullet 2:
This is not my recollection. When we dicussed this model it was stated that we would not make Type optional and we would use 'OmittedTypeSyntax' to represent the type. The syntactic model would be:
Note that in the above model The
I have looked and i can see no such invariant in our system. For example, the SyntaxFactory we've already shipped has: public static VariableDeclarationSyntax VariableDeclaration(TypeSyntax type)
{
return SyntaxFactory.VariableDeclaration(type,
default(SeparatedSyntaxList<VariableDeclaratorSyntax>));
} It has always been possible to create a VariableDeclaration with no declarators. This does not change with the API that was originally proposed. Indeed, i don't believe we have any invariants with lists ever. They can always be 0, 1 or many in the actual tree. |
Bullet 3:
First, as has been discussed (with support from other team members as well), it is not a given that the SyntaxModel and Language grammar have to have the tight coupling you're requiring. Second, if we do feel such a tight coupling is appropriate/desirable, then i believe that the syntax model should inform the grammar and not the other way around. First and formost, the syntax model has significant binary compatibility requiremetns. As such, if we make it follow the grammar, then we run into these sorts of issues where we end up with a much more complex syntax model than otherwise necessary. Conversely, the spec is much easier to change. It has no binary compat concerns. It cares about the compatability of the language. But it is free to reword/restructure any part of itself as long as it preserves that. It can change it's grammar if it needs to. Second, it should be clear that we've already decided to substantially deviate from the grammar in teh spec. The grammar in the spec is veyr over-constraining. That's terrific for a language spec as it allows hte grammar to just wholely exclude large swaths of inputs without needing to explain why they're not legal. But the same is not as desirable for our syntactic model. There, we want to operate in a much more flexible manner even if it allows for legal syntax trees that go far beyond what the language actually things is syntactically valid. Finally, even if the above statement is correct and this is "far beyond what we've accepted previously" (which i disagree is the case), then that's not a case where "we do not have a solution". The solution is simple: accept this form :) |
This comes down to opinion. From my own experience consuming our syntax model, the original API approach feels much simpler and easier to use. |
I need this to be explained better. I asked for examples on the internal email thread on this and none were provided. I would see no need for a mismatch between the shape of hte API and the factories. Namely i would expect the following to be generated: // Core factory method for full signature.
VariableDeclaration VariableDeclaration(TypeSyntax, Deconstruction, SeparatedSyntaxlist<VariableDeclarator>); // newly added
// Minimal factory method based on optional properties and lists.
// Note: this factory method already exists.
VariableDeclaration VariableDeclaration(TypeSyntax);
// Bridge method we would manually add to preserve binary compat:
VariableDeclaration VariableDeclaration(TypeSyntax, SeparatedSyntaxlist<VariableDeclarator>); It's unclear what changes we would need to make to the generator at all to support this. The addition of an optional property to an existing construct should be no problem. Indeed, we've moved the syntax model forward using that approach numerous times already. |
Before i can get to that, i want to understand this better:
That revision was simply the addition of a single optional property to an existing declaration node. This has never caused design violations in the past, so i'm not understanding why it caused design violations now. |
This concern does not make sense to me. First, we've clearly demonstrated in our syntax model so far that it is acceptable to have optional fields despite the extra space they will take up. For example, here is a small list that causes excess usage despite alternate syntactic models that could have avoided it:
As we've added new language features, we've also gone with the optional slot model, despite the increase in space. For example:
Our other new language constructs even started down the route of adding slots, without any concern about the cost there. For example, the original 'ref' syntax proposal added the 'ref' keyword all over the place (including as a new slot for all local declarations, all equals-value clauses, all returns, etc. etc.). Now, we're not going with that model, but it's not because of the concern over tree size growth. The syntax model, early on, went with a model of preferring node reuse, with optional slots for cases like this. When we could share concrete nodes we chose to in a tremendous number of cases. Subclassing was used when there was little to share, and there was substantial difference in the individual syntactic forms. Something which does not seem to be the case for VariableDeclarations even in the presence of Deconstructor declarations. |
@mattwar i'd like to hear your thoughts on this. You have a lot of feelings on the design behind hte syntax model and the thought process around changing it as we move forward. If you have the time, can you read through this thread and let us know how you feel about this? Thanks! |
Can you please expand on this, and provide specifics? |
Sure. When i see VariableDeclaration i can now use my knowledge of that type everywhere I see it in the API. It also helps me avoid missing things. For example, say we have this new syntax model for variables, now i also need to know that it's not sufficient to just make my code work on LocalDeclarationStatement, i also have to make it work on the new syntax for local declarations that suppports Deconstruction. The same is true for all other forms of the syntax model where we introduce this new node. I now need to support foreach and multivariable-foreach for example. I'd much rather just have my code that works for 'foreach' statements, and then only have to care about the differrences in the variable declaration if that's what my domain cares about. Yes, i could just target CommonForeachSyntax. However, that's very different than how we've done the much of the rest of the syntax API. I don't need to operate on CommonForStatement and then cast down to VariableDeclarationForStatement and ExpressionBasedForStatement. I don't need to cast down from CommonProperty to InitializerProperty or ExpressionBodyProperty. etc. etc. Yes, this pattern does exist elsewhere, but IMO it's the uncommon case and should only be done when appropriate/necessary (which i don't feel is the case here). When the syntax model changes like this, it ends up bleeding higher and making me care about this distinction at a level that doesn't make sense to me. My intuition using the language and syntax model is there is no difference for the foreach itself for:
To me, it's much easier and more intuitive if the difference in shape happens at as low a point as possible. Therefore, all other things being equal, i'm going to strongly prefer syntactic formulations that do not cause this sort of spread upwards through the hierarchy. To me this make it feel like we're bifurcating things Also, the approach here means i now need to understand yet another way of describing what seems to be the same thing to me. When i have, for example "var v" in a field is a VariableDeclaration. "var v" in a local is a VariableDeclaration. "var v" in a "for" statement is a VariableDeclaration. "var v" in a FixedStatemnet is a VariableDeclaration. "var v" in an event is a "VariableDeclaration" By adding another place where "var v" is not a VariableDeclaration you add another tax to my mental model and you make it so i need to add more code that attempts to paper over this syntactic decision so that the IDE code can treat that uniformly with all the other VariableDeclarations out there. With a unified VariableDeclaration form, i only need to learn the distinction once, and i can now apply that knowledge everywhere that VariableDeclaration appears. I can write code that can be used in many more places. I don't need more surface area (for example, on SemanticModel) to ask the same questions i already have the ability to ask. So, going back to my original statement. I want to be able to ignore the different forms of variable declaration until i'm doing something that would actually care about that distinction. For code that doesn't care about the distinction, i don't want to have to see how this type bled upwards. And for code that does care about the distinction, i also don't want to have to care how it bled upwards :) That's why i find the original approach much simpler and easier. The distinction doesn't bleed out. And it means i can learn one thing and then use that in far more places. ** Note: i do understand we violated that with things like "foreach". I personally really regret that we did things like have "foreach" and "catch" not have a VariableDeclaration inside of them. I view that as a very big mistake that we have no good solution for. I do not want to compound that mistake by adding new constructs that don't use VariableDeclaration when they would be able to. |
@@ -2035,6 +2078,7 @@ | |||
<Field Name="RefKeyword" Type="SyntaxToken" Optional="true"> | |||
<Kind Name="RefKeyword"/> | |||
</Field> | |||
<Field Name="Deconstuction" Type="DeconstructionDeclarationAssignment" Optional="true"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deconstruction
Note: this was a similar reason i pushed back on the original 'ref' syntactic changes. The original addition made it necessary to see and absorb the impact on a large number of nodes. And the impact felt like a disjoint merging of concerns. To absorb that change, lots more code would have now have to be updated to handle 'ref' on declarations. And the same was true for all the places it was showing up effectively at the expression level. With the final syntactic model we decided on, 'ref' ended up being added syntactically with only two syntax changes:
The syntactic fallout of this change was now incredibly minor. The changes to respond to this will be commensurately minor and much easier to do. The larger the syntactic fallout, the larger the impact on all higher layers. And the more code we need at layers that appears to exist just to support the syntax choices, versus code that feels necessary to represent the domain of that feature itself. |
That remains the case in the proposed API.
If we take the proposed solution for #1503, we'd only ever generate the new foreach statement, and you'd only have one syntax form to handle.
Not necessary with the proposed fix for #1503.
As described, it would not.
Indeed, it is not, and it needs to be treated differently!
But a deconstruction declaration is fundamentally different. All of those other contexts have a list of declarators-with-initializer, and the deconstruction doesn't. The "initializer" in a deconstruction isn't associated with any particular variable, and it isn't converted to something. The initializer is part of the deconstruction statemnet, and it is subject to deconstruction.
I think you're confusing the aggregation of unlike things with unification. In the old syntax, it isn't unified, they are represented by mutully exclusive sets of subtrees. You need to learn that distinction once for this proposed form too, except that the shape of the syntax helps you instead of hindering you.
With this proposal, that is exactly what you would be able to do. You'd be able to handle one kind of foreach loop, for example, and ignore what it declares. One kind of field. One kind of for loop. One kind of fixed statement. And for each of them, the shape of the syntax tree would be helpful in understanding what can appear in the code. That is particularly useful for producers of syntax trees, which are heavily burdened by the old model. |
it feels like we're talking in circles. We fundamentally see things differently and that affects how words like "simple" and "intuition" play out.
I am loath to solve our versioning story by just no longer returning old constructs and instead generating new ones. This means IDE code now needs to be bifurcated to handle older trees (which may be returned by existing clients) as well as the new trees that the current parser would produce.
I'm not understanding how that's the case. In this very review i see DeconstructionDeclarationStatementSyntax added. The deconstruction has bled up through to higher levels.
I could say that about so many other parts of the API that we've provided our current syntactic model on. Furthermore, any time we add something new to the language, we're adding something "fundamentally different", hence why we're adding it in the first place. Because the language clearly does not support it and thus it must be different from what we have. That does not ever mean htat we need our syntactic model to be versioned in the manner presented here. Indeed, that was the very argument made about the 'ref' feature. The language feature introduced things that were 'fundamentally' different, and initially we had a syntactic model that really pushed that 'different' view. However, we still finalized on a syntactic incorporation that was still very localized and minimal. The syntactic model is ours to do with as we please. We could introduce a massively new type of feature in the language with massively different semantics from what we have today (i.e. switches, ref, etc.), but that doesn't preclude us from reusing existing syntactic elements. There is no requirement or necessity that this be the case.
You have my argument reversed. My argument is that allowing VariableDeclaration to represent both the 'declarators-with-initializer' and the 'deconstruction' means that those existing constructs can now support both these forms without bifurcation. It may be the case that today we won't allow Deconstruction. But if we ever relax that then that's trivial to do. These existing constructs use VariableDeclaration already. Now they're capable of representing deconstructions without any need to do anything to them. The impact on the syntactic model is kept very narrow.
You keep stating opinion as fact. You feel they are unlike. I feel that they are alike. These are both opinions. The fact that i don't feel the same way as you does not make me confused. I'm very familiar with these features and i've spent a lot of time involved on the discussions about them as well as experimenting with using them. I'm also familiar with this whole family of features thanks to experience in other languages that provide them. I see and accept how you view them, but that doesn't mean that how you view them is the only true way to do so. To get into specifics: To me both the forms are about declaring variables. One declares by taking a value, deconstructing (possibly recursively), and storing into variables. The other takes a value and stores into a variable. These are extremely similar to me. I say that as both a user of the language and someone deeply involved in consuming and writing features that sit on top of our APIs. To me, these are incredibly close, and that's one of the reasons (IMO) it was so trivial to incorporate all the different syntaxes together just by performing a very small augmentation to the syntactic model.
We can not stop supporting our features on older trees that our clients are producing. We will need to maintain our code for that. |
Note: this part is important, and why i so regret that we did not use VariableDeclaration in many more places in the language. Were we to have used VariableDeclaration in places like foreach the incorporating 'deconstructors' would have been easy without any work on the foreach side. It woudl also have meant we'd need a smaller SemanticModel surface, with less entrypoints to map from variable declarations to symbols. And it would have allowed more sharing of code at the higher layers that depend on the semantic model. I'd like us to move away from the model where we say "the syntactic node represents a super-set of what is allowed here. Hence we need a specific node that is more restrictive". I'd much rather say "the syntactic node may allow for more, though the parser may not ever produce nodes in that super-set". These restrictive nodes end up causing a lot more work to need to be done on the consumption side. And, as demonstrated here, mean that absorbing new language changes ends up costing much more because each specific area needs a specific solution to address the limitation. |
You are now suggesting a new approach to errors that has not been tried before. Furthermore, the cases where we have done something akin to that (i.e. Incomplete-Whatever) have traditionally been very under-served at both the compiler and IDE layers. I do not want us to introduce a new unproven mechanism when we have an existing system that works.
Great. Now we will need an error deconstruction/declarator that adds more nodes into the system. Which means more cases to handle at the IDE layer. Which requires more binding code. Which may require more semantic model entrypoints. This makes consuming this feature more expensive at hte higher layers. The semantic model exists to serve the compiler and IDE layers first and foremost. It should be implemented to make us efficient and to not introduce unnecessary taxes on it. I do not approach these discussions from a perspective of "we should have a pure translation of the spec into code form" i come from them directly from a perspective of "i need to consume this and i need to absorb the costs of these API decisions". As such, i'm going to push back on APIs that increase costs without buying me as the consumer literally anything. I want to repeat that last point. This isn't a situation with pros/cons where i can at least see some positives of moving to this new model from my consumptive purposes. This is a situations where the new model appears to be, across the board, worse to consume and maintain for us. Without substantial demonstrated benefits for our use cases that's not going to be something i'm going to support. |
To repeat: i really do not care what the syntax forbids :-) . The syntax forbids interface members from having block bodies. And yet we allow it. The syntax forbids in/out modifiers on class type parameters. And yet we allow it. We allow it because the purpose of the syntactic model is to not serve the syntax of the language or the language specific. The purpose of the syntactic model is to serve tooling. That's it. That is it's sole purpose. While it is nice when it can be more in line with the language-specification, that is a complete non-requirement. |
It sounds like the argument there is: "As part of the original model we also thought we might make some factory changes. But we're deciding against making those factory changes." That seems to be an argument against those factory changes (which i would have also likely been against as i see no need for factory changes). But it's not an argument against the syntactic model proposed. |
I think one thing we need to come to a conclusion on first: Can we deprecate existing nodes and move entirely to new, better shaped ones? If we do what is our deprecation story allowed to be? A lot the angst here is over the cost of still needing to maintain support for old nodes as we then have to go support new nodes. If, for example, we were allowed to deprecate/remove and then no longer support these old nodes at all, that would drive down the costs. If, however, we still need to understand and support the old nodes, then i'm going to push for approaches that have us continuing to use them over approaches that introduce new replacement nodes. |
There is also no such thing as |
DeconstructionVariableDeclaration. |
|
I agree we need to decide this in order to decide how to handle the |
DeconstructionVariableDeclaration is the kind, not the type. That's why i said "Not the kind". The type will remain VariableDeclaration. However, in my original proposal i stated that it would be valuable to have different kinds to make it easy for people to consume without deep introspection. For example, i expected we would have at least:
Note: the naming isn't finalized. The last one is optional because i'm not certain i want a separate kind for the |
No, the argument was that the unpleasant side-effects of the previous model would be partially tempered by the introduction of factory changes. |
I really need more actual examples provided. There is talk about what was going on in the factory, and no actual specifics. AFAICT, there would be no need for anything but a bridge method to allow you to still call into Without actual specifics on the problem here, i'm going to have to reject this as a bullet point against the original proposal. |
That is interesting, and though I have the same problems with it, it is nothing like where we ended up. I'm proposing an alternative to the syntax model we have today. |
I feel like we somehow ended up in a different place than what we originally discussed. That original place may have some problems. However, it seems like addressing those problems is much simpler than just replacing the original proposal with a new one entirely. For example, it sounds like:
The other concerns (closeness to the spec's grammar) are things i think should not be included as part of the discussion, except as some sort of tie breaker of otherwise equally good proposals. They should not be part of the initial comparison of proposals as those should be weighed entirely on their appropriateness for our tooling. |
Don't worry about the work involved. I'm so uncomfortable with the existing direction that I'll do a migration of the entire Roslyn code base.
That is a deep mismatch with the shape of the language. See Mads' grammar. Semantically * takes the place of (what would otherwise be) an identifier, or a type and and identifier.
All of the fields of a VariableDeclaration (including the new one) are now optional both for producers and consumers. That provides little guidance to either. This alone may be sufficient to scrap the previous API, but combined with everything else its just one more thing.
The Kinds don't affect the set of members of the type.
I completely disagree, but what we have is far from equally good proposals. The proposal herein models the kinds of factorings to map the language specification to the syntax model in v1.0, though the existing mapping has a few warts. The model proposed to be replaced is both incomplete and exploits warts to the maximum. |
I can't reconcile that with: "a mismatch between the shape of the syntax API and its factories, that required making changes to the syntax generator that were rejected in design review" Can we explicitly state what factory changes were actually being proposed. We've stated that we wanted to temper things by introducing some changes, but those changes were then rejected. Not knowing either:
is making it impossible to actually address this point. As i mentioned above, the only factory method i would feel we would be required to add would be a binary-compat bridge method for Now, i can think of other "nice to have" factory methods. But i don't think they're necessary. And i would not state the proposal could not go through because of them. These nice-to-haves could include:
(***Note that '2' is very-very-optional. We don't have helpers to make "var" more convenient in other areas AFAICT, so we could just require you to call the normal VariableDeclaration factory, passing in 'var' as the Type). In other words, for all the cases where i mentioned wanting new SyntaxKinds for the VariableDeclaration node, i think it would be nice (not required) to have a corresponding hand-written factory helper to call to make such a node for you. |
You say that.. but i think that's circular. To me, it's completely reasonable to look at * and say "it means: allow for a variable of any type, which i won't even bother naming because i'm not going to use it". At the end of the day the result for the usage of the language is the same. As such, i see no issue with thinking about it in that manner. Not that we're also going with other syntactic formalizations which also don't necessarily 'match' the 'shape of the language. For example, "ref Type" and "ref Expressions" seems to be where we're going with 'ref' in our syntactic model. These are likely not going to be the formilizations in the language itself. As has been mentioned by people like vlad/jared, 'ref type' is not a thing. And yet, it still provides from one of the cleanest ways of integrating "ref" into our syntactic model. Similarly, vlad does not believe 'ref expression' is a real thing. But it means we can integrate ref into so many contexts without having to bleed the ref higher. At the end of the day, but incorporating in tihs manner, we just introduce two simple types and we get 'ref' all the places we want it today (and the places we know we want it in the future). It avoid more than a dozen other changes that would have been necessary. This was beneficial because the syntax model serves tooling, and making these nice focused and localized changes makes tooling and maintaining our code much simpler. I see the same as being true for VariableDeclarations. I do agree it doesn't match precisely the shape of the language. I just care very little as that's one of the lowest things on my priority list when deciding what we do with the syntactic model. |
Where is it incomplete?
Not in my opinion. Indeed, i'm a fan of it because i don't think it exhibits barely any warts at all. The only one i'm even slightly feeling is there is that it allows for a node that can produce an empty production. But that's so minor to me, and so irrelevant in practice that i can't even must the slightest bit of caring about it :) |
There is a difference between, on the one hand,
The former is a useful pattern we use successfully throughout the syntax models. The latter is an anti-pattern that we avoid. |
It's not the work that worries me. It's that i don't think the new proposal is a good choice and i feel like it will add more cost than i want to the IDE layer :-/
Note that part of my worry is that we can't just "migrate" to this new model. We must preserve all the code to work with the current shipped model as well. As we cannot go and fix up all the code from all our customers, then we're left in a state where even if you want to go do all the work, we now have a higher maintenance cost going forward. |
We always do. And so do our customers. That code will continue to work just as well tomorrow as it does today. Both we and our customers have to support the new language features if they want to support them, no matter how we represent them. That is the cost of adding features to the language. |
At an added cost for us. My proposal does not have that cost. Our existing code works on the same existing constructs that the parser/third-parties produce today. We only need to add support for the deconstruction cases. In your model the parser would not produce new nodes for foreach statements. But we'd still need to maintain the old code for foreach because customers produce them even if the parser doesn't anymore. That's a maintenance cost for us and it's not something i want to pay when there's an alternative that allows me to keep the same code working today on hte same foreach statements. |
Mine too!
I don't care as much about how we handle
I'm fine leaving the syntax nodes for the existing |
I think we're still talking in circles. AFAICT, the decisions that led to the original design still hold just fine. I'm not feeling that the objection to how Furthermore, the only other new issue i've seen presented (i.e which wasn't already accepted when we did this originally), is that there's some problem with factory methods. But without actually demonstrating what that problem is, i can't see this as an issue that would prevent us from sticking with the original design. Without sufficient good enough reasons, i would push back on moving from the original proposal. |
The latter is something you would like to avoid :-) Whereas the splitting and bleeding approach is a pattern I would like to avoid. |
We're continuing to go round and round. It's clear where each of us stands. As this is a syntactic model decision, i think we need to have the meeting with the necessary stakeholders to decide where to go from here. My position (summarized) is that the existing proposal, with some small tweaks**, is the desirable course forward. Let's see how everyone else feels and go forward based on the collective group decision. ** Tweaks are:
|
I'm going to close this and reopen a more complete implementation when ready. |
This is a proposed revision of the syntax model for deconstruction declarations that is capable of supporting wildcards. It also supports wildcards and tuple deconstruction in out var. This resolves all of the issues around optional and missing parts and incompatibilities in previous proposals.
Only
Syntax.xml
is worth reviewing herein. I have not modified the rest of the compiler to accommodate the changes to the syntax model. I'd like to get feedback on the revised model before I update the rest of the compiler.@jcouv @VSadov @mattwar Please provide a preliminary review of the revised syntax model
@dotnet/roslyn-compiler @CyrusNajmabadi FYI
@tmat Thank you for the name suggestions