Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft proposed revision of syntax model for deconstruction declarations #12688

Closed
wants to merge 2 commits into from

Conversation

gafter
Copy link
Member

@gafter gafter commented Jul 22, 2016

This is a proposed revision of the syntax model for deconstruction declarations that is capable of supporting wildcards. It also supports wildcards and tuple deconstruction in out var. This resolves all of the issues around optional and missing parts and incompatibilities in previous proposals.

Only Syntax.xml is worth reviewing herein. I have not modified the rest of the compiler to accommodate the changes to the syntax model. I'd like to get feedback on the revised model before I update the rest of the compiler.

@jcouv @VSadov @mattwar Please provide a preliminary review of the revised syntax model
@dotnet/roslyn-compiler @CyrusNajmabadi FYI
@tmat Thank you for the name suggestions

@gafter
Copy link
Member Author

gafter commented Jul 22, 2016

Addresses #12664 #12588

Can be used to address #1503 if we use the new foreach node exclusively and deprecate (stop producing from the parser) the old one. If we aren't comfortable doing that, I don't have any idea how we would address #1503.

@@ -1852,13 +1837,71 @@
</Field>
<Field Name="Value" Type="ExpressionSyntax"/>
</Node>


<AbstractNode Name="MultiVariableDeclarationSyntax" Base="CSharpSyntaxNode">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a naming perspective, it seems like perhaps this should just be called BaseVariableDeclarationSyntax.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a base of VariableDeclarationSyntax.


In reply to: 71962344 [](ancestors = 71962344)

@CyrusNajmabadi
Copy link
Member

Hrmm.. this doesn't match my remembrance of what we decided on in our previous meetings. Namely that we would reuse VariableDeclarationSyntax.

<Kind Name="CloseParenToken"/>
</Field>
</Node>
<Node Name="WildcardVariableDeclaratorSyntax" Base="MultiVariableDeclaratorSyntax">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call this IgnoredVariableDeclaratorSyntax.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jul 23, 2016

My recollection of our meeting on this was that we would have the following:

VariableDeclarationSyntax
    Type
    DeconstructionSyntax_opt
    List<VariableDeclarator>

DeconstructionSyntax
     (    SeparatedList<VariableDeclarationSyntax>    )

This would handle any deconstruction form we could think of except for: (*, int i) = ...

To that end we were having a discussion about fitting the wildcard version into the above. Two suggestions were proposed. Specifically:

  1. Make a WildCardTypeSyntax like so:
TypeSyntax
    ...
    WildcardTypeSyntax

WildcardTypeSyntax
    *

Then, when we saw * that would be a VariableDeclaration with a WildcardType, no deconstruction, and no variable declarators.

  1. Make * a legal contextual identifier. Then, when we saw * we would have a VariableDeclaration with an OmittedType, no deconstruction, and a VariableDeclarator with an Identifier for *

In both cases, i would recommend having a different Kind for the VariableDeclaration. Perhaps call it WildcardVariableDeclaration.

In our email thread on the topic, you raised concerns about '1', but i pointed out why i think it's a totally ok perspective on what * means. You also were concerned about the incremental parsing impact of '2', but i believe i demonstrated why that should not be a concern here.

Given the suitability of using VariableDeclarationSyntax for this area, i'm not really behind the proposal put forth here as i'm not sure it is necessary and it seems to add more complexity than i would like for these new variable declaration scenarios.

@gafter
Copy link
Member Author

gafter commented Jul 23, 2016

This is not intended to be a reflection of the process that resulted in the previous revision of the syntax. That revision ran into problems, including a number of design violations that we do not have a solution for

  • a mismatch between the shape of the syntax API and its factories, that required making changes to the syntax generator that were rejected in design review
  • no good mechanism to address the incompatibility moving from required fields to optional fields (e.g. Type in VariableDeclaration, and the new violation of the invariant that there is at least one variable declarator in VariableDeclaration)
  • a mismatch between the shape of the syntax trees and the shape of the language specification that is far beyond what we've accepted previously (e.g. nodes with all members optional being used to represent disjoint language contexts)
  • a public API shape for the syntax trees that will be very difficult to use for consumers due to the plethora of optional fields that are meaningless in context

This revision is intended to be a solution that addresses all of these issues, and also facilitates further extension of this area of the syntax (e.g. wildcard support, support for flexible parsing in error scenarios, etc) by virtue of the new nonterminals.

My question for you is whether you believe this would be usable from the perspective of consumption in the IDE code base.


In reply to: 234688286 [](ancestors = 234688286)

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jul 23, 2016

I'd like to discuss each of these issues to better understand what they are, if they're important, and if they necessitate a new syntactic model.

I'm going to start with bullet 2:

no good mechanism to address the incompatibility moving from required fields to optional fields (e.g. Type in VariableDeclaration,

This is not my recollection. When we dicussed this model it was stated that we would not make Type optional and we would use 'OmittedTypeSyntax' to represent the type. The syntactic model would be:

User Form Type Deconstruction Declarators
int i int <null> i
var (x, y) var (x, y) <empty>
(x, y) OmmittedType (x, y) <empty>
x (when used inside a Deconstruction) OmittedType <null> x
* (option 1) WildCardType <null> <empty
* (option 2) OmmittedType <null> *

Note that in the above model Type is never null.

The Omitted form was thought to be an acceptable way to bridge older null-rejecting APIs with new language constructs that allowed for optionality.

and the new violation of the invariant that there is at least one variable declarator in VariableDeclaration)

I have looked and i can see no such invariant in our system. For example, the SyntaxFactory we've already shipped has:

        public static VariableDeclarationSyntax VariableDeclaration(TypeSyntax type)
        {
            return SyntaxFactory.VariableDeclaration(type, 
                            default(SeparatedSyntaxList<VariableDeclaratorSyntax>));
        }

It has always been possible to create a VariableDeclaration with no declarators. This does not change with the API that was originally proposed. Indeed, i don't believe we have any invariants with lists ever. They can always be 0, 1 or many in the actual tree.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jul 23, 2016

Bullet 3:

a mismatch between the shape of the syntax trees and the shape of the language specification that is far beyond what we've accepted previously (e.g. nodes with all members optional being used to represent disjoint language contexts)

First, as has been discussed (with support from other team members as well), it is not a given that the SyntaxModel and Language grammar have to have the tight coupling you're requiring. Second, if we do feel such a tight coupling is appropriate/desirable, then i believe that the syntax model should inform the grammar and not the other way around. First and formost, the syntax model has significant binary compatibility requiremetns. As such, if we make it follow the grammar, then we run into these sorts of issues where we end up with a much more complex syntax model than otherwise necessary. Conversely, the spec is much easier to change. It has no binary compat concerns. It cares about the compatability of the language. But it is free to reword/restructure any part of itself as long as it preserves that. It can change it's grammar if it needs to.

Second, it should be clear that we've already decided to substantially deviate from the grammar in teh spec. The grammar in the spec is veyr over-constraining. That's terrific for a language spec as it allows hte grammar to just wholely exclude large swaths of inputs without needing to explain why they're not legal. But the same is not as desirable for our syntactic model. There, we want to operate in a much more flexible manner even if it allows for legal syntax trees that go far beyond what the language actually things is syntactically valid.

Finally, even if the above statement is correct and this is "far beyond what we've accepted previously" (which i disagree is the case), then that's not a case where "we do not have a solution". The solution is simple: accept this form :)

@CyrusNajmabadi
Copy link
Member

a public API shape for the syntax trees that will be very difficult to use for consumers due to the plethora of optional fields that are meaningless in context

This comes down to opinion. From my own experience consuming our syntax model, the original API approach feels much simpler and easier to use.

@CyrusNajmabadi
Copy link
Member

a mismatch between the shape of the syntax API and its factories, that required making changes to the syntax generator that were rejected in design review

I need this to be explained better. I asked for examples on the internal email thread on this and none were provided.

I would see no need for a mismatch between the shape of hte API and the factories. Namely i would expect the following to be generated:

// Core factory method for full signature.
VariableDeclaration VariableDeclaration(TypeSyntax, Deconstruction, SeparatedSyntaxlist<VariableDeclarator>); // newly added

// Minimal factory method based on optional properties and lists.
// Note: this factory method already exists.
VariableDeclaration VariableDeclaration(TypeSyntax); 

// Bridge method we would manually add to preserve binary compat:
VariableDeclaration VariableDeclaration(TypeSyntax, SeparatedSyntaxlist<VariableDeclarator>); 

It's unclear what changes we would need to make to the generator at all to support this. The addition of an optional property to an existing construct should be no problem.

Indeed, we've moved the syntax model forward using that approach numerous times already.

@CyrusNajmabadi
Copy link
Member

My question for you is whether you believe this would be usable from the perspective of consumption in the IDE code base.

Before i can get to that, i want to understand this better:

That revision ran into problems, including a number of design violations that we do not have a solution for

That revision was simply the addition of a single optional property to an existing declaration node. This has never caused design violations in the past, so i'm not understanding why it caused design violations now.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jul 23, 2016

no mechanism to address the incompatibility moving from required fields to optional fields (e.g. Type) that would not significantly increase the size of a typical syntax tree

This concern does not make sense to me. First, we've clearly demonstrated in our syntax model so far that it is acceptable to have optional fields despite the extra space they will take up. For example, here is a small list that causes excess usage despite alternate syntactic models that could have avoided it:

  1. All variable declarators have an optional initializer. That means we pay for the space for = expr even if there is no initializer.
  2. All 'if' statements have space for an optional 'else clause'.
  3. All 'return' statements have space for an optional expression.
  4. Lambdas have space for an 'async' modifier, despite being extremely rare.
  5. All local declarations have a slot for modifiers, despite modifiers almost never being used on locals.
  6. All arguments have two optional spaces. One for the named arg portion, and one for out/ref. Arguments are everywhere (likely more common that variable declarations), and we accept that amount of space increase.

As we've added new language features, we've also gone with the optional slot model, despite the increase in space. For example:

  1. Properties now support both = expr and => expr forms. These are modeled as two optional slots on all properties.
  2. All method-like entities have space for a semicolon, block, and now an arrow initializer.

Our other new language constructs even started down the route of adding slots, without any concern about the cost there. For example, the original 'ref' syntax proposal added the 'ref' keyword all over the place (including as a new slot for all local declarations, all equals-value clauses, all returns, etc. etc.). Now, we're not going with that model, but it's not because of the concern over tree size growth.

The syntax model, early on, went with a model of preferring node reuse, with optional slots for cases like this. When we could share concrete nodes we chose to in a tremendous number of cases. Subclassing was used when there was little to share, and there was substantial difference in the individual syntactic forms. Something which does not seem to be the case for VariableDeclarations even in the presence of Deconstructor declarations.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jul 23, 2016

@mattwar i'd like to hear your thoughts on this. You have a lot of feelings on the design behind hte syntax model and the thought process around changing it as we move forward. If you have the time, can you read through this thread and let us know how you feel about this?

Thanks!

@gafter
Copy link
Member Author

gafter commented Jul 23, 2016

@CyrusNajmabadi

This comes down to opinion. From my own experience consuming our syntax model, the original API approach feels much simpler and easier to use.

Can you please expand on this, and provide specifics?

@CyrusNajmabadi
Copy link
Member

From my own experience consuming our syntax model, the original API approach feels much simpler and easier to use.

Can you please expand on this, and provide specifics?

Sure. When i see VariableDeclaration i can now use my knowledge of that type everywhere I see it in the API.

It also helps me avoid missing things. For example, say we have this new syntax model for variables, now i also need to know that it's not sufficient to just make my code work on LocalDeclarationStatement, i also have to make it work on the new syntax for local declarations that suppports Deconstruction. The same is true for all other forms of the syntax model where we introduce this new node. I now need to support foreach and multivariable-foreach for example. I'd much rather just have my code that works for 'foreach' statements, and then only have to care about the differrences in the variable declaration if that's what my domain cares about.

Yes, i could just target CommonForeachSyntax. However, that's very different than how we've done the much of the rest of the syntax API. I don't need to operate on CommonForStatement and then cast down to VariableDeclarationForStatement and ExpressionBasedForStatement. I don't need to cast down from CommonProperty to InitializerProperty or ExpressionBodyProperty. etc. etc. Yes, this pattern does exist elsewhere, but IMO it's the uncommon case and should only be done when appropriate/necessary (which i don't feel is the case here).

When the syntax model changes like this, it ends up bleeding higher and making me care about this distinction at a level that doesn't make sense to me. My intuition using the language and syntax model is there is no difference for the foreach itself for:

foreach (var i in ...) {} vs foreach ((var x, var y) in ...) {}

To me, it's much easier and more intuitive if the difference in shape happens at as low a point as possible.

Therefore, all other things being equal, i'm going to strongly prefer syntactic formulations that do not cause this sort of spread upwards through the hierarchy. To me this make it feel like we're bifurcating things

Also, the approach here means i now need to understand yet another way of describing what seems to be the same thing to me. When i have, for example (var v), that "var v" is not the same as the "var v" i would have gotten nearly every other place**

"var v" in a field is a VariableDeclaration. "var v" in a local is a VariableDeclaration. "var v" in a "for" statement is a VariableDeclaration. "var v" in a FixedStatemnet is a VariableDeclaration. "var v" in an event is a "VariableDeclaration" By adding another place where "var v" is not a VariableDeclaration you add another tax to my mental model and you make it so i need to add more code that attempts to paper over this syntactic decision so that the IDE code can treat that uniformly with all the other VariableDeclarations out there.

With a unified VariableDeclaration form, i only need to learn the distinction once, and i can now apply that knowledge everywhere that VariableDeclaration appears. I can write code that can be used in many more places. I don't need more surface area (for example, on SemanticModel) to ask the same questions i already have the ability to ask.

So, going back to my original statement. I want to be able to ignore the different forms of variable declaration until i'm doing something that would actually care about that distinction. For code that doesn't care about the distinction, i don't want to have to see how this type bled upwards. And for code that does care about the distinction, i also don't want to have to care how it bled upwards :)

That's why i find the original approach much simpler and easier. The distinction doesn't bleed out. And it means i can learn one thing and then use that in far more places.

** Note: i do understand we violated that with things like "foreach". I personally really regret that we did things like have "foreach" and "catch" not have a VariableDeclaration inside of them. I view that as a very big mistake that we have no good solution for. I do not want to compound that mistake by adding new constructs that don't use VariableDeclaration when they would be able to.

@@ -2035,6 +2078,7 @@
<Field Name="RefKeyword" Type="SyntaxToken" Optional="true">
<Kind Name="RefKeyword"/>
</Field>
<Field Name="Deconstuction" Type="DeconstructionDeclarationAssignment" Optional="true"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deconstruction

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jul 23, 2016

Note: this was a similar reason i pushed back on the original 'ref' syntactic changes. The original addition made it necessary to see and absorb the impact on a large number of nodes. And the impact felt like a disjoint merging of concerns.

To absorb that change, lots more code would have now have to be updated to handle 'ref' on declarations. And the same was true for all the places it was showing up effectively at the expression level. With the final syntactic model we decided on, 'ref' ended up being added syntactically with only two syntax changes:

  1. A new 'RefExpression'.
  2. Additional of 'DeclarationModifiers' to for-statements.

The syntactic fallout of this change was now incredibly minor. The changes to respond to this will be commensurately minor and much easier to do.

The larger the syntactic fallout, the larger the impact on all higher layers. And the more code we need at layers that appears to exist just to support the syntax choices, versus code that feels necessary to represent the domain of that feature itself.

@gafter
Copy link
Member Author

gafter commented Jul 24, 2016

@CyrusNajmabadi

Sure. When i see VariableDeclaration i can now use my knowledge of that type everywhere I see it in the API.

That remains the case in the proposed API.

It also helps me avoid missing things. For example, say we have this new syntax model for variables, now i also need to know that it's not sufficient to just make my code work on LocalDeclarationStatement, i also have to make it work on the new syntax for local declarations that suppports Deconstruction. The same is true for all other forms of the syntax model where we introduce this new node. I now need to support foreach and multivariable-foreach for example. I'd much rather just have my code that works for 'foreach' statements, and then only have to care about the differrences in the variable declaration if that's what my domain cares about.

If we take the proposed solution for #1503, we'd only ever generate the new foreach statement, and you'd only have one syntax form to handle.

Yes, i could just target CommonForeachSyntax.

Not necessary with the proposed fix for #1503.

Therefore, all other things being equal, i'm going to strongly prefer syntactic formulations that do not cause this sort of spread upwards through the hierarchy.

As described, it would not.

Also, the approach here means i now need to understand yet another way of describing what seems to be the same thing to me. When i have, for example (var v), that "var v" is not the same as the "var v" i would have gotten nearly every other place**

Indeed, it is not, and it needs to be treated differently!

"var v" in a field is a VariableDeclaration. "var v" in a local is a VariableDeclaration. "var v" in a "for" statement is a VariableDeclaration. "var v" in a FixedStatemnet is a VariableDeclaration. "var v" in an event is a "VariableDeclaration" By adding another place where "var v" is not a VariableDeclaration you add another tax to my mental model and you make it so i need to add more code that attempts to paper over this syntactic decision so that the IDE code can treat that uniformly with all the other VariableDeclarations out there.

But a deconstruction declaration is fundamentally different. All of those other contexts have a list of declarators-with-initializer, and the deconstruction doesn't. The "initializer" in a deconstruction isn't associated with any particular variable, and it isn't converted to something. The initializer is part of the deconstruction statemnet, and it is subject to deconstruction.

With a unified VariableDeclaration form, i only need to learn the distinction once

I think you're confusing the aggregation of unlike things with unification. In the old syntax, it isn't unified, they are represented by mutully exclusive sets of subtrees. You need to learn that distinction once for this proposed form too, except that the shape of the syntax helps you instead of hindering you.

So, going back to my original statement. I want to be able to ignore the different forms of variable declaration until i'm doing something that would actually care about that distinction. For code that doesn't care about the distinction, i don't want to have to see how this type bled upwards. And for code that does care about the distinction, i also don't want to have to care how it bled upwards :)

With this proposal, that is exactly what you would be able to do. You'd be able to handle one kind of foreach loop, for example, and ignore what it declares. One kind of field. One kind of for loop. One kind of fixed statement. And for each of them, the shape of the syntax tree would be helpful in understanding what can appear in the code. That is particularly useful for producers of syntax trees, which are heavily burdened by the old model.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jul 24, 2016

it feels like we're talking in circles. We fundamentally see things differently and that affects how words like "simple" and "intuition" play out.

If we take the proposed solution for #1503, we'd only ever generate the new foreach statement, and you'd only have one syntax form to handle.

I am loath to solve our versioning story by just no longer returning old constructs and instead generating new ones. This means IDE code now needs to be bifurcated to handle older trees (which may be returned by existing clients) as well as the new trees that the current parser would produce.

As described, it would not.

I'm not understanding how that's the case. In this very review i see DeconstructionDeclarationStatementSyntax added. The deconstruction has bled up through to higher levels.

But a deconstruction declaration is fundamentally different.

I could say that about so many other parts of the API that we've provided our current syntactic model on.

Furthermore, any time we add something new to the language, we're adding something "fundamentally different", hence why we're adding it in the first place. Because the language clearly does not support it and thus it must be different from what we have.

That does not ever mean htat we need our syntactic model to be versioned in the manner presented here. Indeed, that was the very argument made about the 'ref' feature. The language feature introduced things that were 'fundamentally' different, and initially we had a syntactic model that really pushed that 'different' view. However, we still finalized on a syntactic incorporation that was still very localized and minimal.

The syntactic model is ours to do with as we please. We could introduce a massively new type of feature in the language with massively different semantics from what we have today (i.e. switches, ref, etc.), but that doesn't preclude us from reusing existing syntactic elements. There is no requirement or necessity that this be the case.

All of those other contexts have a list of declarators-with-initializer, and the deconstruction doesn't.

You have my argument reversed. My argument is that allowing VariableDeclaration to represent both the 'declarators-with-initializer' and the 'deconstruction' means that those existing constructs can now support both these forms without bifurcation. It may be the case that today we won't allow Deconstruction. But if we ever relax that then that's trivial to do.

These existing constructs use VariableDeclaration already. Now they're capable of representing deconstructions without any need to do anything to them. The impact on the syntactic model is kept very narrow.

I think you're confusing the aggregation of unlike things with unification

You keep stating opinion as fact. You feel they are unlike. I feel that they are alike. These are both opinions. The fact that i don't feel the same way as you does not make me confused. I'm very familiar with these features and i've spent a lot of time involved on the discussions about them as well as experimenting with using them. I'm also familiar with this whole family of features thanks to experience in other languages that provide them. I see and accept how you view them, but that doesn't mean that how you view them is the only true way to do so.

To get into specifics: To me both the forms are about declaring variables. One declares by taking a value, deconstructing (possibly recursively), and storing into variables. The other takes a value and stores into a variable. These are extremely similar to me. I say that as both a user of the language and someone deeply involved in consuming and writing features that sit on top of our APIs.

To me, these are incredibly close, and that's one of the reasons (IMO) it was so trivial to incorporate all the different syntaxes together just by performing a very small augmentation to the syntactic model.

With this proposal, that is exactly what you would be able to do. You'd be able to handle one kind of foreach loop, for example, and ignore what it declares

We can not stop supporting our features on older trees that our clients are producing. We will need to maintain our code for that.

@CyrusNajmabadi
Copy link
Member

You have my argument reversed. My argument is that allowing VariableDeclaration to represent both the 'declarators-with-initializer' and the 'deconstruction' means that those existing constructs can now support both these forms without bifurcation.

Note: this part is important, and why i so regret that we did not use VariableDeclaration in many more places in the language.

Were we to have used VariableDeclaration in places like foreach the incorporating 'deconstructors' would have been easy without any work on the foreach side. It woudl also have meant we'd need a smaller SemanticModel surface, with less entrypoints to map from variable declarations to symbols. And it would have allowed more sharing of code at the higher layers that depend on the semantic model.

I'd like us to move away from the model where we say "the syntactic node represents a super-set of what is allowed here. Hence we need a specific node that is more restrictive". I'd much rather say "the syntactic node may allow for more, though the parser may not ever produce nodes in that super-set".

These restrictive nodes end up causing a lot more work to need to be done on the consumption side. And, as demonstrated here, mean that absorbing new language changes ends up costing much more because each specific area needs a specific solution to address the limitation.

@CyrusNajmabadi
Copy link
Member

We have error declarations, and we can parse and represent anything we want with them.

You are now suggesting a new approach to errors that has not been tried before.

Furthermore, the cases where we have done something akin to that (i.e. Incomplete-Whatever) have traditionally been very under-served at both the compiler and IDE layers. I do not want us to introduce a new unproven mechanism when we have an existing system that works.

In the proposed syntax, because we have abstract nodes that enable us to extend the representation, we can introduce error deconstruction, error declarators, etc for whatever we find to be common and useful scenarios, if we are willing to put in the work in the parser, binder, IDE, etc.

Great. Now we will need an error deconstruction/declarator that adds more nodes into the system. Which means more cases to handle at the IDE layer. Which requires more binding code. Which may require more semantic model entrypoints.

This makes consuming this feature more expensive at hte higher layers. The semantic model exists to serve the compiler and IDE layers first and foremost. It should be implemented to make us efficient and to not introduce unnecessary taxes on it. I do not approach these discussions from a perspective of "we should have a pure translation of the spec into code form" i come from them directly from a perspective of "i need to consume this and i need to absorb the costs of these API decisions". As such, i'm going to push back on APIs that increase costs without buying me as the consumer literally anything.

I want to repeat that last point. This isn't a situation with pros/cons where i can at least see some positives of moving to this new model from my consumptive purposes. This is a situations where the new model appears to be, across the board, worse to consume and maintain for us. Without substantial demonstrated benefits for our use cases that's not going to be something i'm going to support.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jul 25, 2016

The current syntax does not merely "omit" a type in the places you propose to use that pattern... in most of those contexts the syntax forbids them.

To repeat: i really do not care what the syntax forbids :-) . The syntax forbids interface members from having block bodies. And yet we allow it. The syntax forbids in/out modifiers on class type parameters. And yet we allow it.

We allow it because the purpose of the syntactic model is to not serve the syntax of the language or the language specific. The purpose of the syntactic model is to serve tooling. That's it. That is it's sole purpose. While it is nice when it can be more in line with the language-specification, that is a complete non-requirement.

@CyrusNajmabadi
Copy link
Member

The previous revisions made changes to permit the auto-generated factory methods to be internal rather than public. That is being reversed.

It sounds like the argument there is:

"As part of the original model we also thought we might make some factory changes. But we're deciding against making those factory changes."

That seems to be an argument against those factory changes (which i would have also likely been against as i see no need for factory changes). But it's not an argument against the syntactic model proposed.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jul 25, 2016

I think one thing we need to come to a conclusion on first:

Can we deprecate existing nodes and move entirely to new, better shaped ones? If we do what is our deprecation story allowed to be?

A lot the angst here is over the cost of still needing to maintain support for old nodes as we then have to go support new nodes. If, for example, we were allowed to deprecate/remove and then no longer support these old nodes at all, that would drive down the costs.

If, however, we still need to understand and support the old nodes, then i'm going to push for approaches that have us continuing to use them over approaches that introduce new replacement nodes.

@gafter
Copy link
Member Author

gafter commented Jul 25, 2016

This is not the model i proposed. My model would produce:

  Modifiers: SyntaxList<SyntaxToken> = default
  Declaration: VariableDeclarationSyntax - Kind DeconstructionDeclaration // <-- note the kind.

DeconstructionDeclaration is not a subtype of VariableDeclarationSyntax so that does not work. We cannot change the type of existing fields.

There is also no such thing as OmittedTypeSyntax.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jul 25, 2016

DeconstructionVariableDeclaration.

@gafter
Copy link
Member Author

gafter commented Jul 25, 2016

DeconstructionVariableDeclaration is not a subtype of VariableDeclarationSyntax. The latter is a sealed class, so there are no subtypes aside from itself.

@gafter
Copy link
Member Author

gafter commented Jul 25, 2016

I think one thing we need to come to a conclusion on first:

Can we deprecate existing nodes and move entirely to new, better shaped ones? If we do what is our deprecation story allowed to be?

I agree we need to decide this in order to decide how to handle the foreach loop, and that is true whatever we do with local-variable-declaration. It has no impact on the syntax model for deconstruction.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jul 25, 2016

DeconstructionVariableDeclaration is the kind, not the type. That's why i said "Not the kind". The type will remain VariableDeclaration. However, in my original proposal i stated that it would be valuable to have different kinds to make it easy for people to consume without deep introspection. For example, i expected we would have at least:

  1. SyntaxKind.VariableDeclaration. For: existing declarations.
  2. SyntaxKind.WildcardVariableDeclaration. For: *
  3. SyntaxKind.DeconstructionVariableDeclaration. For: (int x, int y).
  4. Optionally: SyntaxKind.VarDeconstructionDeclaration. For: var (x, y)

Note: the naming isn't finalized.

The last one is optional because i'm not certain i want a separate kind for the (int x, int y) case or the var (x,y) case. I could be ok with them using the same kind. (Though i'm leaning toward having separate kinds personally).

@gafter
Copy link
Member Author

gafter commented Jul 25, 2016

It sounds like the argument there is:

"As part of the original model we also thought we might make some factory changes. But we're deciding against making those factory changes."

No, the argument was that the unpleasant side-effects of the previous model would be partially tempered by the introduction of factory changes.

@CyrusNajmabadi
Copy link
Member

No, the argument was that the unpleasant side-effects of the previous model would be partially tempered by the introduction of factory changes.

I really need more actual examples provided. There is talk about what was going on in the factory, and no actual specifics. AFAICT, there would be no need for anything but a bridge method to allow you to still call into .VariableDeclaration(Type, SeparateList<VariableDeclarator>)

Without actual specifics on the problem here, i'm going to have to reject this as a bullet point against the original proposal.

@gafter
Copy link
Member Author

gafter commented Jul 25, 2016

For example, i expected we would have at least:

SyntaxKind.VariableDeclaration. For: existing declarations.
SyntaxKind.WildcardVariableDeclaration. For: *
SyntaxKind.DeconstructionVariableDeclaration. For: (int x, int y).
Optionally: SyntaxKind.VarDeconstructionDeclaration. For: var (x, y)

That is interesting, and though I have the same problems with it, it is nothing like where we ended up. I'm proposing an alternative to the syntax model we have today.

@CyrusNajmabadi
Copy link
Member

it is nothing like where we ended up

I feel like we somehow ended up in a different place than what we originally discussed. That original place may have some problems. However, it seems like addressing those problems is much simpler than just replacing the original proposal with a new one entirely.

For example, it sounds like:

  1. we went with * is an identifier vs a Type. That is trivially solved with the introduction of a wildcard type.
  2. we had some problems with factories. But i really want to see waht they are before believing that they warrant scrapping things altogether.
  3. there is concern about consumption. But i believe kinds trivially make consumption fine. And that consumption is super simple when there is a single VariableDeclaration we can count on everywhere without needing things like new syntactic constructs, forked syntactic nodes, and new semantic APIs to support them.

The other concerns (closeness to the spec's grammar) are things i think should not be included as part of the discussion, except as some sort of tie breaker of otherwise equally good proposals. They should not be part of the initial comparison of proposals as those should be weighed entirely on their appropriateness for our tooling.

@gafter
Copy link
Member Author

gafter commented Jul 25, 2016

I feel like we somehow ended up in a different place than what we originally discussed. That original place may have some problems. However, it seems like addressing those problems is much simpler than just replacing the original proposal with a new one entirely.

Don't worry about the work involved. I'm so uncomfortable with the existing direction that I'll do a migration of the entire Roslyn code base.

  1. we went with * is an identifier vs a Type. That is trivially solved with the introduction of a wildcard type.

That is a deep mismatch with the shape of the language. See Mads' grammar. Semantically * takes the place of (what would otherwise be) an identifier, or a type and and identifier.

  1. we had some problems with factories. But i really want to see what they are before believing that they warrant scrapping things altogether.

All of the fields of a VariableDeclaration (including the new one) are now optional both for producers and consumers. That provides little guidance to either. This alone may be sufficient to scrap the previous API, but combined with everything else its just one more thing.

  1. there is concern about consumption. But i believe kinds trivially make consumption fine.

The Kinds don't affect the set of members of the type.

The other concerns (closeness to the spec's grammar) are things i think should not be included as part of the discussion, except as some sort of tie breaker of otherwise equally good proposals.

I completely disagree, but what we have is far from equally good proposals. The proposal herein models the kinds of factorings to map the language specification to the syntax model in v1.0, though the existing mapping has a few warts. The model proposed to be replaced is both incomplete and exploits warts to the maximum.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jul 26, 2016

No, the argument was that the unpleasant side-effects of the previous model would be partially tempered by the introduction of factory changes.

I can't reconcile that with: "a mismatch between the shape of the syntax API and its factories, that required making changes to the syntax generator that were rejected in design review"

Can we explicitly state what factory changes were actually being proposed. We've stated that we wanted to temper things by introducing some changes, but those changes were then rejected.

Not knowing either:

  1. the actual proposed changes.
  2. why they were rejected.

is making it impossible to actually address this point.

As i mentioned above, the only factory method i would feel we would be required to add would be a binary-compat bridge method for .VaraiableDeclaration(TypeSyntax, SeparatedList<VariableDeclarator>.

Now, i can think of other "nice to have" factory methods. But i don't think they're necessary. And i would not state the proposal could not go through because of them. These nice-to-haves could include:

  1. A helper for .WildcardVariableDeclaration() that would create the * VariableDeclaration with the correct TypeNode and the correct kind.
  2. A helper to make the var (a, b, c) form out of an (a, b, c) form. Note: that helper will be needed in any proposal as "var" is just an IdentifierName/IdentifierToken, so the SyntaxGenerator won't know that it can be elided from the helpers it auto-generates.***
  3. A helper to create the VariableDeclaration for the a node in var (a, b, c).
  4. A helper to create a VariableDeclaration out of a DeconstructionDeclaration.

(***Note that '2' is very-very-optional. We don't have helpers to make "var" more convenient in other areas AFAICT, so we could just require you to call the normal VariableDeclaration factory, passing in 'var' as the Type).

In other words, for all the cases where i mentioned wanting new SyntaxKinds for the VariableDeclaration node, i think it would be nice (not required) to have a corresponding hand-written factory helper to call to make such a node for you.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jul 26, 2016

That is a deep mismatch with the shape of the language.

You say that.. but i think that's circular. To me, it's completely reasonable to look at * and say "it means: allow for a variable of any type, which i won't even bother naming because i'm not going to use it".

At the end of the day the result for the usage of the language is the same. As such, i see no issue with thinking about it in that manner.

Not that we're also going with other syntactic formalizations which also don't necessarily 'match' the 'shape of the language.

For example, "ref Type" and "ref Expressions" seems to be where we're going with 'ref' in our syntactic model. These are likely not going to be the formilizations in the language itself. As has been mentioned by people like vlad/jared, 'ref type' is not a thing. And yet, it still provides from one of the cleanest ways of integrating "ref" into our syntactic model. Similarly, vlad does not believe 'ref expression' is a real thing. But it means we can integrate ref into so many contexts without having to bleed the ref higher.

At the end of the day, but incorporating in tihs manner, we just introduce two simple types and we get 'ref' all the places we want it today (and the places we know we want it in the future). It avoid more than a dozen other changes that would have been necessary.

This was beneficial because the syntax model serves tooling, and making these nice focused and localized changes makes tooling and maintaining our code much simpler.

I see the same as being true for VariableDeclarations. I do agree it doesn't match precisely the shape of the language. I just care very little as that's one of the lowest things on my priority list when deciding what we do with the syntactic model.

@CyrusNajmabadi
Copy link
Member

The model proposed to be replaced is both incomplete and

Where is it incomplete?

exploits warts to the maximum.

Not in my opinion. Indeed, i'm a fan of it because i don't think it exhibits barely any warts at all. The only one i'm even slightly feeling is there is that it allows for a node that can produce an empty production. But that's so minor to me, and so irrelevant in practice that i can't even must the slightest bit of caring about it :)

@gafter
Copy link
Member Author

gafter commented Jul 26, 2016

There is a difference between, on the one hand,

  1. Introducing a new node (with no optional parts) to represent something rather than including it in every containing node. Like the proposed "ref expression". That's factoring. And, on the other hand,
  2. Adding a new optional field to an existing node, making all of its members optional, and then using it to represent syntactically distinct language elements.

The former is a useful pattern we use successfully throughout the syntax models. The latter is an anti-pattern that we avoid.

@CyrusNajmabadi
Copy link
Member

Don't worry about the work involved.

It's not the work that worries me. It's that i don't think the new proposal is a good choice and i feel like it will add more cost than i want to the IDE layer :-/

I'm so uncomfortable with the existing direction that I'll do a migration of the entire Roslyn code base.

Note that part of my worry is that we can't just "migrate" to this new model. We must preserve all the code to work with the current shipped model as well. As we cannot go and fix up all the code from all our customers, then we're left in a state where even if you want to go do all the work, we now have a higher maintenance cost going forward.

@gafter
Copy link
Member Author

gafter commented Jul 26, 2016

We must preserve all the code to work with the current shipped model as well.

We always do. And so do our customers. That code will continue to work just as well tomorrow as it does today.

Both we and our customers have to support the new language features if they want to support them, no matter how we represent them. That is the cost of adding features to the language.

@CyrusNajmabadi
Copy link
Member

That code will continue to work just as well tomorrow as it does today.

At an added cost for us.

My proposal does not have that cost. Our existing code works on the same existing constructs that the parser/third-parties produce today. We only need to add support for the deconstruction cases.

In your model the parser would not produce new nodes for foreach statements. But we'd still need to maintain the old code for foreach because customers produce them even if the parser doesn't anymore.

That's a maintenance cost for us and it's not something i want to pay when there's an alternative that allows me to keep the same code working today on hte same foreach statements.

@gafter
Copy link
Member Author

gafter commented Jul 26, 2016

That code will continue to work just as well tomorrow as it does today.

At an added cost for us.

My proposal does not have that cost. Our existing code works on the same existing constructs that the parser/third-parties produce today. We only need to add support for the deconstruction cases.

Mine too!

In your model the parser would not produce new nodes for foreach statements. But we'd still need to maintain the old code for foreach because customers produce them even if the parser doesn't anymore.

I don't care as much about how we handle foreach loops as I do care about the declaration, as long as we avoid the problematic approaches I outlined. I don't care whether we use new or old nodes for foreach loops for existing source.

That's a maintenance cost for us and it's not something i want to pay when there's an alternative that allows me to keep the same code working today on the same foreach statements.

I'm fine leaving the syntax nodes for the existing foreach loop alone (at the expense of not getting the better edit-and-continue behavior that we could get with the new nodes #1503).

@CyrusNajmabadi
Copy link
Member

I think we're still talking in circles. AFAICT, the decisions that led to the original design still hold just fine. I'm not feeling that the objection to how * can be represented is a satisfactory enough reason to scrap the design. That feels like a case of throwing the baby out with the bathwater to me.

Furthermore, the only other new issue i've seen presented (i.e which wasn't already accepted when we did this originally), is that there's some problem with factory methods. But without actually demonstrating what that problem is, i can't see this as an issue that would prevent us from sticking with the original design.

Without sufficient good enough reasons, i would push back on moving from the original proposal.

@CyrusNajmabadi
Copy link
Member

The former is a useful pattern we use successfully throughout the syntax models. The latter is an anti-pattern that we avoid.

The latter is something you would like to avoid :-)

Whereas the splitting and bleeding approach is a pattern I would like to avoid.

@CyrusNajmabadi
Copy link
Member

We're continuing to go round and round. It's clear where each of us stands. As this is a syntactic model decision, i think we need to have the meeting with the necessary stakeholders to decide where to go from here.

My position (summarized) is that the existing proposal, with some small tweaks**, is the desirable course forward.

Let's see how everyone else feels and go forward based on the collective group decision.

** Tweaks are:

  1. Have a WildcardType.
  2. (Optional) Have a handful of factory methods for easily making the VariableDeclarations.
  3. (Optional) Consider having the different VariableDeclaration forms have different Kinds. Similar to how we use have things like PrefixUnaryExpression but have different kinds so you don't have to examine constituent parts. Or checked/unchecked expressions which are both represented with a single CheckedExpression node.

@gafter
Copy link
Member Author

gafter commented Jul 28, 2016

I'm going to close this and reopen a more complete implementation when ready.

@gafter gafter closed this Jul 28, 2016
@gafter gafter deleted the ddecl-model branch May 24, 2018 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants