Enhanced C#: a friendly hello #11324

Closed
qwertie opened this issue May 16, 2016 · 92 comments

Comments

@qwertie

qwertie commented May 16, 2016

I'm terribly embarrassed.

For the last few months I've been working on a tool called LeMP that adds new features to C#. I recently published its "macro" reference manual. This month I was going to start publicizing my "Enhanced C#" project when I discovered that the design of C# 7 had already started well before C# 6 was officially released - and even more shocking, that this design work was being done "in public" right on GitHub!

It kills me that I didn't realize I could have participated in this process, and that "my" C# was drifting apart from C# 7 for over a year. Oh well - it is what it is, and I hope that something useful can still be salvaged out of my work.

So, this post is to inform you about Enhanced C# - where it came from, and what it offers that C# 7 does not.

A brief history

As a class project in my final year of university, I extended a compiler with a new feature (unit type inference with implicit polymorphism), but (to make a short story shorter) the authors of the language weren't interested in adding that feature to their language. This got me thinking about our "benevolent dictatorship" model of language development and how it stopped me, as a developer, from making improvements to the languages I relied on. Since I had already been coding for 15 years by that time, I was getting quite annoyed about writing boilerplate, and finding bugs at runtime that a "sufficiently smart compiler" could have found given a better type system.

So in 2007 I thought of a concept for a compiler called "Loyc" - Language of your choice - in which I wanted to create the magical ability to compile different languages with a single compiler, and also allow users to add syntax and semantics to existing languages. This system would democratize language design, by allowing third parties to add features to existing languages, and allowing language prototypes and DSLs to seamlessly interoperate with "grown up" languages like C#. But my ideas proved too hard to flesh out. I wanted to be able to combine unrelated language extensions written by different people and have them "just work together", but that's easier said than done.

After a couple of years I got discouraged and gave up for a while (instead I worked on data structures, among other things), but in 2012 I changed course with a project that I thought would be easier and more fun: enhancing C# with all the features I thought it ought to have. I simply called it Enhanced C#. It started as a simple and very, very long wish list, with a quick design sketch of each new feature. Having done that, I reviewed all the feature requests on UserVoice and noticed a big gaping hole: I hadn't satisfied one of the most popular requests, "INotifyPropertyChanged". So at that point I finally went out and spent three weeks learning about LISP (as I should have done years ago), and some time learning about Nemerle macros. At that point (Oct. 2012) I quickly refocused my plans around a macro processor and called it EC# 2.0, even though 1.0 was never written. I realized that many of the features I wanted in C# could be accomplished with macros (and that a macro processor doesn't require a full compiler, which was nice since I didn't have one) so the macro processor became my first priority.

So "Loyc", I eventually decided, would not be a compiler anymore, but just a loose collection of concepts and libraries related to (i) interoperability, (ii) conversions between programming languages, (iii) parsing and other compiler technology, which I now call the "Loyc initiative"; I've had trouble articulating the theme of it... today I'll say the theme of Loyc is "code that applies to multiple languages", because I want to (1) write tools that are embedded in compilers for multiple languages, and (2) enable people, especially library authors, to write one piece of code that cross-compiles into many languages. One guy wants to call it acmeism but that doesn't seem like the right name - I'd call it, I dunno, multiglotism or simply, well, loyc.

EC# and Roslyn

Roslyn's timing didn't work out for me. When I conceived EC#, Roslyn was closed source. I researched it a bit and found that it would only be useful for analysis tasks - not to change C# in any way. That wasn't so bad; but I wanted to explore "radical" ideas, which might be difficult if I had to do things the "Roslyn way". That said, I was inspired by Roslyn; for instance the original implementation of "Loyc trees" - the AST of EC# - was a home-grown Red-Green tree, although I found my mutable syntax trees to be inconvenient in practice (probably I didn't design them right the first time) and rewrote them as green-trees-only (immutable - I thought I might rewrite the "red" part later, but I got used to working with immutable trees and now I don't feel a strong need for mutable ones.)

By the time MS announced they were open-sourcing Roslyn (April 2014), I had been working on Enhanced C# and related projects (LLLPG, Loyc trees and LES) for well over a year, and by that point I felt I had gone too far down my own path to consider trying to build on top of Roslyn (today I wish I could have Roslyn as a back-end, but I don't think I have time, nor a volunteer willing to work on it).

LeMP

EC# still is not a "compiler" in the traditional sense, but it's still useful and usable as-is thanks to its key feature, the Lexical Macro Processor, or LeMP for short. It is typically used as a Visual Studio extension, but is also available as a command-line tool and a Linux-compatible GUI.

Through macros, I implemented (in the past few months) several of the features that you guys have been discussing for more than a year:

  • Creating ‘out’ variables in-situ, e.g. int? Parse(string s) => int.TryParse(s, out int x) ? (int?)x : null;
  • Code Contracts via annotations on the method signature
  • Tuples (positional only) with deconstruction
  • Algebraic data types
  • Pattern matching

(They aren't as polished as the C# 7 features will be, because of technical limitations of lexical macros and because I'm just one guy.)
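
For readers who haven't seen in-situ `out` variables, here is a minimal sketch of the first bullet, written with `int.TryParse`. The `OutVarDemo` class and its method names are made up for illustration, and the second method is only the hand-written pre-C# 7 equivalent; how LeMP actually expands the construct is not shown here.

```csharp
using System;

static class OutVarDemo
{
    // In-situ form: the out variable is declared right where it is passed.
    public static int? ParseInline(string s) =>
        int.TryParse(s, out int x) ? (int?)x : null;

    // Hand-written equivalent without the feature: the declaration
    // has to be hoisted into a statement of its own.
    public static int? ParseHoisted(string s)
    {
        int x;
        return int.TryParse(s, out x) ? (int?)x : null;
    }
}
```

Both methods return the parsed number for valid input and null otherwise; the only difference is where `x` is declared.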

It also has numerous other features:

  • A maximally flexible alternative to "primary constructors"
  • Code quotations and code pattern matching (comparable to LISP and Nemerle), which is useful for writing macros, for code analysis, code generation and even (potentially) reading/writing JSON and LES files (a use case I haven't written about yet).
  • Method forwarding, for doing the decorator pattern more easily.
  • Declaring variables and writing code sequences in expressions (like the ; operator that was sadly not added to C# 6)
  • The "quick binding" operator
  • on_finally which works like Swift's defer, and related macros (on_return, on_throw)
  • replace and unroll for generating boilerplate (although after reading about Nim, I think there's a better way to do the unroll feature)
  • A with statement based on the With statement in Visual Basic
  • An LL(k) parser generator called LLLPG (massive chicken-and-egg problem there: you write LLLPG grammars in EC#, while the EC# grammar is written in LLLPG)
  • And last but not least, users can write their own macros.
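
To give a feel for what a defer-style feature such as on_finally saves, here is the plain C# it stands in for, sketched with an ordinary try/finally. The `DeferDemo` class is hypothetical, and whether LeMP generates exactly this shape is an assumption; the point is only that the cleanup runs on every exit path.

```csharp
using System;
using System.Collections.Generic;

static class DeferDemo
{
    // Records the order of events so the cleanup timing is visible.
    public static List<string> Run(bool fail)
    {
        var log = new List<string>();
        try
        {
            try
            {
                log.Add("work");
                if (fail) throw new InvalidOperationException("boom");
                log.Add("done");
            }
            finally
            {
                // Plays the role of the deferred block:
                // it runs whether the body returns or throws.
                log.Add("cleanup");
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("caught");
        }
        return log;
    }
}
```

With a defer-style construct the cleanup is written once, next to the resource it belongs to, instead of being wrapped around the rest of the method.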

The other parts of EC# that exist - the parser and "pretty printer" - support some interesting additional features such as symbols, triple-quoted string literals, attributes on any expression, etc. However, the majority of the syntactic differences between EC# and C# 6 are designed to support the macro processor.

An important theoretical innovation of Enhanced C# is the use of simple syntax trees internally, vaguely like LISP. This is intended to make it easier to (1) convert code between programming languages and (2) communicate syntax trees compactly.

What now?

Well, I'm not 100% decided about what to do now, knowing that the C# open design process exists and that C# 7 is shaping up to be really nice.

I don't intend to throw the whole thing away, especially since there are major use cases for EC# that C# 7 doesn't address. So in the coming weeks I will change the pattern matching syntax to that planned for C# 7, implement the new syntax for tuple types (minus named parameters, which cannot be well-supported in a lexical macro), and add those "record class" thingies (even though I don't think the C# team has taken the right approach on those.)

But in the long run, is it worthwhile to continue working on EC#, or should I instead devote my time to lobbying the C# team to do the features I want? (beware, I can talk a lot...)

In fact, those are far from my only options - I've been closely following the development of WebAssembly and I'd like to do something related to interoperability and WebAssembly, mainly because .NET has not turned out to be the cross-language interoperability panacea that the world needs. And I'd love to make the world's most widely useful programming language (which is not EC#, because we all know how hard it is to improve a language given the backward compatibility constraint). The main reasons to keep going with EC# are (1) that I have a large codebase already written, and (2) that after 8 years alone I finally have a volunteer that wants to help build it (hi @jonathanvdc!)

I do suspect (hope) there are some developers that would find value in EC# as a "desugaring" compiler that converts much of C# 7 to C# 5. Plus, LeMP is a neat tool for reducing boilerplate, "code find and replace" operations, and metaprogramming, so I really want to polish it up enough that I finally win some users.

There is so much more I could say, would have liked to say, and would still like to say to the C# design team... but in case this is the first you've heard of Enhanced C# or LeMP, you might find this to be a lot to take in - just like for me, C# 7 was a lot to take in! So I'll avoid rambling much longer. I hope that, in time, I can win your respect and that you will not "write me off" in a sentence or two, or without saying a word, an eventuality I have learned to emotionally brace for. I definitely have some opinions that would be opposed by the usual commentators here - but on the other hand, I think the new C# 7 features are mostly really nice and I'll be glad to have them.

So if this wasn't TLDR enough for you, I hope you'll enjoy learning about EC# - think of it as how C# 7 might have looked in a parallel universe.


@aL3891

aL3891 commented May 16, 2016

You did all this stuff by yourself? That's pretty darn impressive! I for one hope you stick around these repos, sounds like you have some good insights!

@dsaf

dsaf commented May 16, 2016

You might be interested in https://github.com/JetBrains/Nitra . I think it will eventually allow "extending" C# in an IDEA-grade IDE (https://www.jetbrains.com/rider ?).

...I've been closely following the development of WebAssembly and I'd like to do something related to interoperability and WebAssembly, mainly because .NET has not turned out to be the cross-language interoperability panacea that the world needs.

Sadly, Microsoft has not yet described the future toolchain for C# - WebAssembly development. Saying "we have LLILC" is not really an answer. Hopefully they understand that TypeScript is just a temporary work-around.

@HaloFour

But in the long run, is it worthwhile to continue working on EC#, or should I instead devote my time to lobbying the C# team to do the features I want? (beware, I can talk a lot...)

I think that depends a lot on what you want.

For acceptance in the mainstream I'd think that you'd have more impact with Roslyn, both lobbying and participating. While I believe that any feature must be championed by an LDM member to be considered for acceptance, having someone with experience in proving out the feature and who can actually develop it would reduce their burden and likely lower the barrier a bit which may allow for a faster evolution of the language.

But you would have to endure the politics of the committee and for someone who has gone their own way for so long that might not be ideal for you. If you wanted to keep it on your terms it might be worthwhile to consider forking Roslyn. You'd have a lot to relearn but at least in theory you can keep your changes up to date with the evolution of C#.

Note that several features that you've mentioned (pattern matching, records) got punted to beyond C# 7.0, and they are very likely to change. So rather than adopting what has already been proposed here I'd suggest using EC# as a proof of concept for an existing syntax which can have an impact on how the feature will shape up for potentially C# 8.0.

@qwertie
Author

qwertie commented May 18, 2016

@aL3891 Thanks very much! Though I did it all myself, I'd stress that I didn't want to do it alone (I mean, think about your colleagues, have you learned anything from them? I've missed that by not having any).

@dsaf Thanks for the information! Nitra is an impressive project that maybe I ought to learn about (though I guess it could be hard to fit it in with the work I've already done). I wonder what Rider offers that, say, Xamarin Studio doesn't (because competing directly with VS Community seems ... impractical)

P.S. I don't really get how LLILC is different from the AOT compilation that Mono had already.

@HaloFour I'm definitely looking to have some kind of real-world impact, but I'm not sure if the C# team would be interested in replicating the main feature of EC#: a macro system or a compiler plug-in system. Plus, the design of EC#/LeMP would probably be difficult to adapt to Roslyn, so ... I'm not sure how to actually get a real-world impact. 😕

@aL3891

aL3891 commented May 18, 2016

I suggest you open issues for the individual features of EC# that you'd like to see in C# and reference the work you've done in each area, and then the discussions can go from there :) It may not always be possible to adapt your implementation directly, but I'm sure the team will find it interesting nonetheless. As @MadsTorgersen (I think it was) said on Channel 9 one time, there aren't a whole lot of people out there designing languages, so it's nice to stay together!

@dsaf

dsaf commented May 19, 2016

@qwertie

...wonder what Rider offers that, say, Xamarin Studio doesn't...

Built-in ReSharper obviously :).

@qwertie
Author

qwertie commented Nov 23, 2016

It did not escape my notice that no one from Microsoft was interested. I took my leave, tail between legs... progress on EC# since then has been minimal, but it's not cancelled, I'm still working on it.

qwertie closed this as completed Nov 23, 2016

@CyrusNajmabadi
Member

i'm interested :)

But, as Halo pointed out, the entirety of what's going on in this issue is enormous. It's simply too large to do anything with in its current state. Extracting out useful pieces and working toward getting them implemented is likely the best path forward.

--

Note that this bit concerns me:

They aren't as polished as the C# 7 features will be, because of technical limitations of lexical macros

We've looked into areas like this before, and a large issue is that things often work well for more 'toy' scenarios, but fall over when you need to really deal with the full complexity of the language. For us to do anything it really needs to be designed so that it will work well in that context.

Thanks!

@qwertie
Author

qwertie commented Nov 23, 2016

@CyrusNajmabadi first of all, thank you very much for saying something (and also thanks to aL3891, dsaf & HaloFour - I appreciated your replies; it's just that I really had my heart set on some kind of response from an 'insider'.)

I am curious what you mean that things "fall over when you need to really deal with the full complexity of the language"? I have found that macros work well for much more than just 'toy' scenarios. Let's see...

  • I have noticed some difficulty in composability and conflict resolution of macros written by different people that operate on the same construct (e.g. two macros modify a method - what order should they run in?), but at least a set of "standard" macros can be designed together and compose in the right way.
  • I'm also aware of the challenge of integrating macros with refactoring, but it seems solvable. Some operations could fail when using macros that do fancy things, though, and renames probably shouldn't be done in realtime like VS2015 does.
  • I expect that a macro system would be much harder to implement in Roslyn than in Enhanced C# due to the complexity of syntax trees in the former. [EDIT: Hmm... there's a good chance I'm wrong about that.] An alternative to actually implementing a macro system would be some sort of change to the compiler to allow alternate front-ends. This could allow interested parties to use my existing macro system by switching the extension on a source file to 'ecs'. I bet someone would also write a VB front-end that converts to a C# syntax tree so that you could mix languages in one project (albeit not seamlessly - if the front end can only deal with syntax, the VB code would end up being case-sensitive).

@dsaf

dsaf commented Nov 24, 2016

@qwertie

...I really had my heart set on some kind of response from an 'insider'.)

You have actually received a response from Gafter straight away - marking something as "Discussion" means that a suggestion is being rejected on the spot.

My opinion on this topic:

  1. EC# cannot be widely popular because C# doesn't suck. The situation with TypeScript - JavaScript for example is entirely different and even then TypeScript is kind of "meh" unless a front end is predicted to be quite complex.

  2. It's important to point out that C# is open-source but not community-driven. The only viable way of directly contributing to C# design is reduced to this:

https://github.com/dotnet/roslyn/issues?q=is%3Aopen+is%3Aissue+label%3A%22Up+for+Grabs%22+label%3A%22Feature+Request%22+label%3A%22Area-Language+Design%22

  3. Alternatively consider this (not sure if this one is still alive):

https://careers.microsoft.com/jobdetails.aspx?ss=&pg=0&so=&rw=3&jid=208941&jlang=EN&pp=SS

@CyrusNajmabadi
Member

How do you define an 'insider'?

@CyrusNajmabadi
Member

CyrusNajmabadi commented Nov 24, 2016

I am curious what you mean that things "fall over when you need to really deal with the full complexity of the language"

i mean things like properly working in complex constructs like async/await or 'yield'. Or in constructs where variables are captured into display classes. Or with constructs that need to understand the intricacies of reference/value types, especially across complex generic constraints. etc. etc.
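
To make the display-class point concrete, here is a minimal sketch (the `CaptureDemo` name is made up for illustration). A local that is captured by a lambda is hoisted by the compiler into a generated closure class, so the lambda and the enclosing method share one storage location:

```csharp
using System;

static class CaptureDemo
{
    public static int Run()
    {
        int counter = 0;

        // 'counter' is captured, so the compiler moves it into a
        // generated closure ("display") class shared with this method.
        Action increment = () => counter++;

        increment();
        increment();

        // The method observes the lambda's writes: counter is now 2.
        return counter;
    }
}
```

A transformation that moves, copies, or wraps code around such a local has to preserve that sharing, which a purely syntactic rewrite cannot check by itself.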

After this, you also have to figure out how this impacts the IDE/editing-cycle. We put many months into exploring a system that would do nothing but just allow tree transforms, along with generating the results of those into files that could be introspected, and the problem space was still enormous. How does debugging work? How do IDE features (like 'rename') work? How do safe transformations of code work?

Think about it this way:

We want intellisense to be extremely accurate and very fast. How do you accomplish that in systems that allow arbitrary transformation without bounds on transformation cost?

Finally, arbitrary extensibility is also a major concern for us in terms of being able to rev the language ourselves. Now, anything we do in the language has the potential to stomp on someone's arbitrary extensibility plugin. What if some company internally created their own 'async/await' plugin. What happens now when the next version of C# comes out?

@qwertie
Author

qwertie commented Nov 24, 2016

How do you define an 'insider'?

Someone on one of the Roslyn teams. But I would have been happy with any Microsoftie.

constructs that need to understand the intricacies of reference/value types, especially across complex generic constraints. etc. etc.

Well, the beauty of user-defined stuff is that it doesn't have to be perfect because MS isn't responsible for supporting it. Also, many macros do something simple enough that there's little that could go wrong and few feature interactions to consider. Plus, a lot of macros would be one-off things made by one user for one project; those things need not work beyond that one little context they were made for.

We put many months into exploring a system that would do nothing but just allow tree transforms

Interesting. Are discussions about it available to read?

We want intellisense to be extremely accurate and very fast. How do you accomplish that in systems that allow arbitrary transformation without bounds on transformation cost?

In general you can't, but note that we technically have this problem already with WinForms controls. In theory they can misbehave on the design surface; in practice most people are happy, and happier than they would be if the design surface didn't run custom code. There are mitigations:

  • Decouple updating the program tree (the directory of classes, methods, etc.) from most user-facing operations (this is done already, I think)
  • Provide a hint to macros (or other units of custom transformation) that they are running in an IntelliSense context, to help slow macros avoid expensive parts (I'm thinking of my parser generator, which could skip grammar analysis and generate methods without bodies in that case.)
  • Measure the running time of all macros: per-macro aggregate time and slowest single invocation. If there's a performance problem, the IDE can put up tips like "FooMacro is slowing down Intellisense" so VS doesn't take the blame. And of course you'd need to inject a thread abort if a macro enters an infinite loop. You'd want to watch their memory usage too (is there a mechanism in the CLR for that?) The build process would also need some way of informing users about performance problems.
  • Have a dialog box for "intellisense performance" which, in addition to a profile of built-in intellisense, would summarize macro performance and allow users to disable badly-behaved macros at design time.
  • Typically a slow macro would only be used in one or two files, so the IDE could learn to process those files last for the purpose of passive look-up (e.g. dot-completion). Refactoring does require full processing though.
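
The measurement idea in the list above (per-macro aggregate time plus slowest single invocation) could be sketched roughly like this. `MacroProfiler` and all of its members are hypothetical names for illustration; nothing like this is claimed to exist in LeMP, Roslyn, or Visual Studio.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: tracks how long each macro takes to run.
sealed class MacroProfiler
{
    public struct Stats
    {
        public TimeSpan Total;   // aggregate time across all invocations
        public TimeSpan Slowest; // slowest single invocation
        public int Calls;
    }

    private readonly Dictionary<string, Stats> _stats =
        new Dictionary<string, Stats>();

    // Runs one macro invocation and records its wall-clock time,
    // even if the invocation throws.
    public T Measure<T>(string macroName, Func<T> invoke)
    {
        var watch = Stopwatch.StartNew();
        try { return invoke(); }
        finally
        {
            watch.Stop();
            Record(macroName, watch.Elapsed);
        }
    }

    public void Record(string macroName, TimeSpan elapsed)
    {
        Stats s;
        _stats.TryGetValue(macroName, out s); // default(Stats) if unseen
        s.Total += elapsed;
        if (elapsed > s.Slowest) s.Slowest = elapsed;
        s.Calls++;
        _stats[macroName] = s;
    }

    public Stats Get(string macroName)
    {
        Stats s;
        _stats.TryGetValue(macroName, out s);
        return s;
    }

    // Macros whose slowest invocation exceeded the budget: candidates
    // for a "FooMacro is slowing down IntelliSense" style tip.
    public IEnumerable<string> SlowMacros(TimeSpan budget)
    {
        foreach (var pair in _stats)
            if (pair.Value.Slowest > budget)
                yield return pair.Key;
    }
}
```

This only covers the bookkeeping. Aborting a runaway macro and watching memory usage are separate problems that the sketch does not attempt.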

Roslyn doesn't do incremental parsing, does it? I wouldn't know how to mix that with a macro system.

Refactoring is the biggest challenge I know of. Perhaps some refactorings like "extract method" should just operate on the original source, macros be damned. Others (renames and parameter reorder) could update the final output, then map those changes to the original source code. It seems doable, but it requires the transformation be performed again immediately in order to find side effects (things that changed other than the requested thing) and failures (where the requested refactoring didn't work properly) and those problems would have to be brought to the user's attention.

What if some company internally created their own 'async/await' plugin. What happens now when the next version of C# comes out?

Then that company would have two ways to do async, I guess? Sorry for being naïve, but so far I'm not seeing a major practical problem. To me it's like the problem of "what if we allow users to define their own classes, and then we add a new class to the BCL with the same name? Hello ambiguity errors!" I knew that was a risk back when I defined my own WeakReference<T>, but I did it anyway. It seems to me it should be the user's decision whether to take that risk. (BTW my macro system has a prioritization feature for some scenarios like this.)

@CyrusNajmabadi
Member

Someone on one of the Roslyn teams.

That would be me :)

@CyrusNajmabadi
Member

Well, the beauty of user-defined stuff is that it doesn't have to be perfect because MS isn't responsible for supporting it.

One of the arguments i thought you were making was that by implementing this, we could then provide many of the features we've been working on for C# 7 and onwards by layering on this system. That's only true if this subsystem is capable enough to handle all the complexity that we'd need to manage with all our features.

@CyrusNajmabadi
Member

Roslyn doesn't do incremental parsing, does it? I wouldn't know how to mix that with a macro system.

Yes, Roslyn does fairly extremely incremental parsing. It tries to reuse, down to the token level, all the data it can :)

@CyrusNajmabadi
Member

Refactoring is the biggest challenge I know of. Perhaps some refactorings like "extract method" should just operate on the original source, macros be damned. Others (renames and parameter reorder) could update the final output, then map those changes to the original source code. It seems doable, but it requires the transformation be performed again immediately in order to find side effects (things that changed other than the requested thing) and failures (where the requested refactoring didn't work properly) and those problems would have to be brought to the user's attention.

Yes. And you've now taken a system that should take a few seconds max, and made it potentially take minutes (depending on how many transformations are being done, and how costly they all are). :)

@CyrusNajmabadi
Member

Then that company would have two ways to do async, I guess? Sorry for being naïve, but so far I'm not seeing a major practical problem.

We've now released a new version of C# that they can't use. Or which may break their code.

"what if we allow users to define their own classes, and then we add a new class to the BCL with the same name? Hello ambiguity errors!"

We've actually implemented language features to help avoid that. Both through things like namespaces, as well as aliasing (::) (which people do use to ensure that names won't collide).
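
As a small illustration of the two mechanisms just mentioned, the sketch below defines a user type that shares its name with `System.WeakReference<T>` and uses `global::` to reach the BCL type anyway. The `MyLib` namespace and its members are invented for the example.

```csharp
using System;

namespace MyLib
{
    // A user-defined type that happens to share its name
    // with System.WeakReference<T>.
    public class WeakReference<T> where T : class
    {
        public string Origin => "MyLib";
    }

    public static class AliasDemo
    {
        public static string Describe()
        {
            // Inside MyLib, the simple name binds to MyLib.WeakReference<T>,
            // because the enclosing namespace is searched first.
            var mine = new WeakReference<string>();

            // global:: forces the lookup to start at the global namespace,
            // so this is always the BCL type.
            var bcl = new global::System.WeakReference<string>("hello");

            string target;
            bool alive = bcl.TryGetTarget(out target);
            return mine.Origin + "/" + (alive ? target : "collected");
        }
    }
}
```

The collision is resolved explicitly at the use site, so adding a same-named type to the BCL later does not silently change which type the code refers to.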

--

Allowing for arbitrary new syntax to be introduced is problematic. Consider that you introduced something like "out-vars" before we did. But perhaps you did it with different semantics than the ones we're putting in the language. Now, what happens when someone upgrades? Does the core language take precedence? Could we subtly change code without anything catching it?

@CyrusNajmabadi
Member

Perhaps some refactorings like "extract method" should just operate on the original source, macros be damned.

The problem with this is that features themselves are complex. Extract method, for example, needs a fine-grained understanding of data flow and control flow to make appropriate decisions. How does it do this over code that may change arbitrarily because of macros?

Consider just something simple:

void Foo()
{
    <SomeMacro1>

    Normal...
    CSharp...
    var result = Code...

    <SomeMacro2>
}

The user wants to extract out the code in the middle. But maybe <SomeMacro2> ends up using 'result'. Normally extract method would see that 'result' was unused after the extracted region, and it would pull it entirely into the new method. Now, it would need to know that the value was actually used by <SomeMacro2> in order to make sure the value got passed out.

And that's just a simple case :)
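
The same situation in ordinary, macro-free C#, as a minimal sketch (class and method names are made up): the refactoring only produces correct code because it can see that 'result' is still read after the extracted region.

```csharp
static class ExtractDemo
{
    // Before: 'result' is computed in the middle and read afterwards.
    public static int Before(int a, int b)
    {
        int sum = a + b;
        int result = sum * 2;
        return result + 1; // stands in for <SomeMacro2> reading 'result'
    }

    // After "extract method": data-flow analysis saw that 'result' is
    // live after the region, so the new method returns it.
    public static int After(int a, int b)
    {
        int result = Middle(a, b);
        return result + 1;
    }

    private static int Middle(int a, int b)
    {
        int sum = a + b;
        return sum * 2;
    }
}
```

If the later read were hidden inside an unexpanded macro, the analysis would conclude that 'result' is dead after the region and could drop the return value.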

@qwertie
Author

qwertie commented Nov 24, 2016

@dsaf Thanks. What's the importance of the 'up for grabs' tag?

consider this (not sure if this one is still alive)

Thanks for the heads up; too bad it has no date on it. If it's more than a few months old, I probably applied for it already.

@CyrusNajmabadi
Member

There are mitigations:

For certain. But now the problem space has gotten much larger.

--

This is a primary concern here: the value produced by this has to warrant the enormous amount of work that needs to happen here. And it has to justify all that work and not have major downsides that we would have to absorb.

Or, in other words, there are limited people resources to be able to do all of this. A suggestion like this would take a massive amount of effort to thread through the compiler (just the infrastructure), and then would have all that additional work to get working properly in the IDE. Just the testing would be massively difficult, as each feature would now have to deal with not only arbitrary code, but arbitrary macros.

--

To give an example: we did something like 'Analyzers', and that was vastly smaller in scope than what you're discussing. Analyzers themselves took several devs an entire product cycle to fit into Roslyn. And it's still getting tons of work because of the deep impact it has on the system, and all the perf issues we need to address.

--

In order for us to take on this work, we'd need clear understanding of exactly what value we'd be getting once we finished. Right now that value isn't clear. For example, as mentioned earlier, we likely would not be able to use this system for our own language features. That would mean we'd be investing in something with very little payoff for our own selves. It also means we wouldn't be directly utilizing (dogfooding) our own features. Which means ensuring a high enough level of quality would be quite difficult. etc. etc.

@CyrusNajmabadi
Member

Are discussions about it available to read?

Work was done here: https://github.com/dotnet/roslyn/blob/master/docs/features/generators.md
#5292

@CyrusNajmabadi
Member

What's the importance of the 'up for grabs' tag?

It means we're happy with anyone taking it on and providing a solution.

Technically, anything is 'up for grabs', but the ones with that particular label are things we think non-full-time developers could reasonably take on.

@qwertie
Author

qwertie commented Nov 24, 2016

Yes, Roslyn does fairly extremely incremental parsing.

Wow! Somehow I overlooked the incrementalness of the parser when I looked at its code.

And you've now taken a [refactoring] system that should take a few seconds max, and made it potentially take minutes.

Hmm. If the solution is big enough and the macros are slow enough, yes. But no one has to use macros, and if the user is informed of what's slowing down the process, they will be encouraged to do something about slow macros.

One of the arguments i thought you were making was that by implementing this, we could then provide many of the features we've been working on for C# 7 and onwards by layering on this system.

Ah, I see why you would think that, since I had done exactly that with my system. And if C# were a new language then yes, you'd want to design it so that core features would be part of some grand extensibility scheme. But in the case of EC#, part of the reason I did so many features as macros was so that I'd have a payoff without the trouble of writing an actual compiler! Plus I wanted to explore just how much can be accomplished with lexical macros (= syntax-tree-processor macros) alone. And it's a lot.

While some built-in features of C# could be done as macros, I see a macro system more as

  • an incubator for ideas - just to see what power users do with it
  • a way of reducing pressure to add new features to the language - you prioritize the features that are served least well by macros
  • a way to give developers features that will never meet the team's famous threshold for adding features to C#, or that don't have a single best solution. Classic examples: things that auto-implement INotifyPropertyChanged; parser generators; and since you mentioned dogfooding, macros for code analysis and generation, which should be handy in Roslyn itself.
  • a replacement for T4 templates that is far more convenient to use.

We've actually implemented language features to help avoid that. Both through things like namespaces, as well as aliasing (::) (which people do use to ensure that names won't collide).

My macro system uses namespaces pretty much the same way (if it had more users, I'd add support for :: too.)

Allowing for arbitrary new syntax to be introduced is problematic.

I agree; Enhanced C# does not allow new syntax. I edited C#'s grammar to make it flexible enough that new syntax wouldn't be needed in most cases. For example, there are several macros now that have the syntax of a method definition, like replace Square($x) => $x * $x;.

Consider that you introduced something like "out-vars" before we did. But perhaps you did it with different semantics than the ones we're putting in the language. Now, what happens when someone upgrades? Does the core language take precedence? Could we subtly change code without anything catching it?

Yeah... I recognize the tension. Probably it's better to show an "ambiguity" error rather than risk subtly changing the meaning of existing code. If the macro author knows the new feature is coming (and has the same semantics) he could mark it as having a low priority so that the new feature takes priority when it becomes available; and for end-users there could be another mechanism to prioritize, or at least import selectively.
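
To make that prioritization idea concrete, here is a minimal Python sketch. It is not LeMP's actual API: `Candidate`, `resolve`, `AmbiguityError` and the priority constants are hypothetical names invented for illustration, and a built-in language feature is modeled as just another candidate handler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A handler (built-in feature or macro) that claims a construct."""
    name: str
    priority: int  # higher value wins


class AmbiguityError(Exception):
    """Raised when two handlers claim a construct at the same priority."""


def resolve(candidates):
    """Return the single highest-priority candidate, or report an ambiguity."""
    if not candidates:
        raise LookupError("no handler for this construct")
    top = max(c.priority for c in candidates)
    winners = [c for c in candidates if c.priority == top]
    if len(winners) > 1:
        names = ", ".join(sorted(c.name for c in winners))
        raise AmbiguityError(f"ambiguous: {names}")
    return winners[0]


NORMAL, LOW = 0, -1  # hypothetical priority levels

# The macro author anticipated the built-in feature and marked the macro LOW,
# so the built-in feature takes over once it exists.
chosen = resolve([Candidate("builtin out-var", NORMAL),
                  Candidate("macro out-var", LOW)])

# Equal priorities: refuse to guess rather than silently change meaning.
try:
    resolve([Candidate("builtin out-var", NORMAL),
             Candidate("macro out-var", NORMAL)])
    ambiguous = False
except AmbiguityError:
    ambiguous = True
```

The point of the sketch is only the policy: a tie is an error, never a silent choice.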

Now, it would need to know that the value was actually used by in order to make sure the value got passed out.

True, there would be cases where 'extract method' might do the wrong thing... although in this example, if SomeMacro2 does something with result without the variable having been passed to it explicitly, it's probably either a badly designed macro (because why would it do that?) or one for which the dev doesn't need/want the refactoring engine to care, because the change in behavior is expected, like some debug/logging/profiling macro that doesn't affect user-facing behavior.

I understand MS has high standards... but I think if a feature provides a lot of value, it should be done even if interactions with other features are imperfect. I suspect you're looking at this as "if the UX is not 100% rock-solid, we can't do it." Whereas I'm looking at it more like "few things have a worse user experience than generating C# with T4 templates. Let's make something akin to T4 that's pleasant, if not quite perfect, see what people do with it, and learn from that experience when we make our next new language in 10 years." To me, as a 'power user', I hate how repetitive my code often is, and wonder if I'd be happier switching to Rust (though Rust drops OOP and GC, both of which I'd rather have than not have) or Nemerle (which, er, I can't recall why I didn't. Maybe because I wanted so much to write a self-hosting compiler!)

So to me, it would be enough to put up a warning. It could detect if any macros are used within the body of a method and say "Caution: this method uses user-defined macro(s). In the presence of certain macros, 'extract method' could produce code that is invalid, or that behaves differently. You may need to verify manually that the refactored code is correct."

@qwertie
Author

qwertie commented Nov 24, 2016

Having said all that, point taken, any kind of compile-time metaprogramming is a big, difficult feature.

I just thought of something that I never think about, because I don't use ASP.NET. You know how you can write blocks of C# code in <% %> in an aspx file and intellisense works in there? How do they do that? Is the solution necessarily tied to the Roslyn C# parser, or could I somehow write a VS plugin that would work like aspx, but use my EC# parser instead? And if so, who out there has the knowledge of how to do that - and may be willing to share it with me?

@jnm2
Contributor

jnm2 commented Nov 25, 2016

Before I speak bluntly, LeMP is jaw-droppingly impressive. It has features that pull me, from method forwarding to the accessible implementation of the build-your-own-language philosophy. Even though it's clearly not possible for Roslyn to adopt the same methodology as EC#, I absolutely think that it's worth examining all the concepts that Roslyn can take away from the project. The work you've done is cool and highly intelligent.

One thing does bother me. As a consumer of C# and Visual Studio, who dreams of the ability to add my own pet language features like await? in a similar way to writing Roslyn analyzers, I have always imagined hooking into the parser and then transforming an already-parsed syntax tree. The thought of having to implement the language extension as a text preprocessor horrifies me. Text processing is full of edge cases that are factorially hard to foresee and harder to get right in a maintainable way. I want to deal with the purest semantic level possible.
I was similarly frustrated every time I tried to use ReSharper's Custom Pattern search or code analysis. I don't want to operate on text, I have experienced that to be brittle and dangerous and at best a workaround, but rather on a semantic model of the C# language which goes straight to the IL compiler.

@CyrusNajmabadi
Member

And you've now taken a [refactoring] system that should take a few seconds max, and made it potentially take minutes.
Hmm. If the solution is big enough and the macros are slow enough, yes. But no one has to use macros, and if the user is informed of what's slowing down the process, they will be encouraged to do something about slow macros.

I think this is oversimplifying things. One doesn't need a slow macro for this. Just a macro that may have a wide effect.

they will be encouraged to do something about slow macros.

This necessitates two things. First, we need a system to present this to the user. That has to be designed and built into the entire product. Second, you are adding features now that can take away from the experience and force users into unpleasant choices. Say, for example, a team takes a dependency on some macro that they find really useful. They're using it for months as they grow their codebase. Then, at some point they find that things have gotten slower and slower and it's the fault of this macro. What do they do now? Removing the macro is devastating for them, as they'll have to go and change all of that code in their system that depends on it. And giving up on all these features they care about is equally disappointing for them.

@CyrusNajmabadi
Member

CyrusNajmabadi commented Nov 27, 2016

Final note: It's unclear to me what value these macros have over our original SourceGenerator proposals. The benefit of the SourceGenerator approach was that you could take in C# code, manipulate it (using normal Roslyn APIs) and just produce new trees that the rest of the pipeline would operate on. There was no need for a new 'macro language' for manipulating trees. The macro language was just any .net code that wanted to operate on Roslyn's object model.

Such an approach was possible without adding any new syntax to C# at all. Your proposal seems to indicate that you would be able to do things that would traditionally require new syntax (like primary-constructors, or out-vars), but it's still unclear to me how that would work. And, if your approach does not allow for new syntax, it's unclear to me what value your system would have over what we were looking at.

@qwertie
Author

qwertie commented Nov 28, 2016

How did you parse the actual code that contains an 'out var'. i don't care how you transformed it. I just care how you actually parsed it. You said that your system required adding no new syntax.

I feel like you must have missed the message in which I talked about the fact that I added lots of new syntax to EC#. Some of that syntax would make sense without a macro system; some of it would not.

@CyrusNajmabadi
Member

CyrusNajmabadi commented Nov 28, 2016

I'm definitely quite confused (as several messages seem contradictory)**. But, for now, i'm going to go with the explicit claim that new syntax is required and that you introduced new syntax to support these features.

If that's the case, and you required syntax changes to be able to support 'out-var' then why would i need macros in order to support out-var? What do macros buy me? Since i had to introduce the new syntax for out-var in the first place... why would i then use macros to implement out-var?

--

** (Again, this is why i'd like a new thread that starts with precisely the set of syntactic changes you want in the language to support your proposal).

@qwertie
Author

qwertie commented Nov 28, 2016

I probably confused you by saying "the parser knows nothing about macros". Sorry about that. In my own mind the syntax is independent, because the parser can do whatever, it's just making a tree, and whether there's a macro system running after it or some other system doesn't matter to the parser. But understandably you don't think about it the same way - you think of C# as a single integrated thing, where certain changes to the parser were designed for the macro system and therefore the parser "knows" about macros. So, sorry for that. Still, note that in principle the macro processor could work (but not support things like "out var") without changes to the parser. Edit: e.g. one of the things I'd like to do someday is take various other parsers - Python, C++ - and hook them up to the macro processor.

@CyrusNajmabadi
Member

CyrusNajmabadi commented Nov 28, 2016

Still, note that in principle the macro processor could work (but not support things like "out var") without changes to the parser.

HOW? If the parser does not change, then how do you handle things like your INotifyPropertyChanged example?

The syntax you presented would be rejected by the C# parser. And if it was rejected any sort of 'processor' would have a heck of a time trying to do anything with the tree we produced.

@CyrusNajmabadi
Member

CyrusNajmabadi commented Nov 28, 2016

In my own mind the syntax is independent

How can the syntax be independent? If Macros run on the tree the parser produces, then the parser has to understand some sort of Macro syntax so it can generate the right sort of nodes that the Macro processor will run on. If it doesn't, then the tree is going to be massively broken, and it will be enormously painful for any sort of processor to have to work on that tree.

@qwertie
Author

qwertie commented Nov 28, 2016

Without changes to the parser, you'd have to make do and write it with syntax that already exists, maybe something like

class ImplementNotifyPropertyChanged {
	public string CustomerName { get; set; }
	public object AdditionalData { get; set; }
	public string CompanyName { get; set; }
	public string PhoneNumber { get; set; }
}

The replace macro would similarly have to be designed to "make do". It would be pretty ugly, but doable.

@qwertie
Author

qwertie commented Nov 28, 2016

To make an analogy, List<T> can hold objects of type Foo without having any awareness of Foo. The macro processor doesn't have generic type parameters, but it does process LNode objects, which are language-independent. So in that sense it processes C# without knowing anything about C#.

@CyrusNajmabadi
Member

What are LNode objects? What information do they contain? How does one get one?

@qwertie
Author

qwertie commented Nov 28, 2016

LNode is the .NET implementation of Loyc trees. The API is described here.

@iam3yal

iam3yal commented Nov 28, 2016

@qwertie I tried to read this post a few times myself, and while I understand most of what you're saying, I strongly recommend that you start fresh and create a new issue, explaining things in the following manner:

  1. This is the problem.

  2. This is the solution.

  3. This is the syntax.

  4. This is an example of a macro.

  5. This is how it's used at the callsite.

  6. This is the generated code.

In my opinion you shouldn't even think about EC# when describing this at all. This would make it a lot easier to understand and allow @CyrusNajmabadi and others to see how this fits within Roslyn, if ever.

@CyrusNajmabadi
Member

I'm confused again. Are you saying Roslyn would be translating nodes into some other API and calling into that to do work? That sounds quite expensive. Trees can be huge, and we already do a tremendous amount of work to not realize them, and to be able to throw large parts of them away when possible.

@CyrusNajmabadi
Member

Agreed. I'm getting high level ideas and concepts. But when i try to dive deeper, i'm seeing contradictions and not-fully-fleshed-out ideas.

Many of your ideas also seem predicated on a whole host of assumptions. i.e. "we could do X, (with the implication that Y and Z are also done). And to do Y and Z, we'd need these other things as well." I can't wrap my head around a clear set of concepts and work items that you're actually proposing, and how each one of them would work.

Most of this feels like you have grand ideas in your head, and you're giving quick sketches based on assumptions that are scattered around in a whole host of places :)

Condensing and focusing would make this conversation much simpler.

@qwertie
Author

qwertie commented Nov 28, 2016

I've been switching back and forth between two tasks - if I thought you were asking me about how EC#/LeMP works then I described EC#/LeMP. But you've also been asking about the IDE experience and things like that, so for those questions I've switched gears and tried to figure out (mostly on the fly) how one would, in broad strokes, translate concepts from LeMP to Roslyn. So this conversation is sort-of two conversations interleaved, which would be bewildering if you're not mentally distinguishing the two or if you haven't understood the EC#/LeMP side of things. Probably at certain points I didn't explain some things well enough, and I'm sorry about that. This got pretty long so I think we should start a new thread, but right now I need to go on an anniversary trip with my wife.

@CyrusNajmabadi
Member

I've been switching back and forth between two tasks

I think that switch was not clear enough for me :D And it would be better to just discuss specifically what we would want to do with Roslyn and C# here.

but right now I need to go on an anniversary trip with my wife.

Congrats! I look forward to hearing from you once you get back!

@jonathanvdc

Hi everyone. I'm a small-time EC# contributor, and I'm currently working on ecsc, a command-line EC# compiler. I'm not as knowledgeable about EC# and LeMP as @qwertie, but I thought I'd try and shed some light on how macros work in EC# – perhaps a different perspective can be helpful. I'll try to explain what LNodes are, what the parser does, and what the macro processor (LeMP) does.

LNodes

EC#'s syntax trees are represented as LNode instances. An LNode can be one of the following:

  • An Id node, which represents an identifier. An identifier can be any string (technically, identifiers are encoded as Symbol instances, but that's not very relevant here). x and foo are valid Id nodes, but so are things like #class, #interface and #import. Identifiers that are prefixed by hashtags are called special identifiers. They don't get special treatment per se, but they are used (by convention) to encode language constructs as call nodes. More on that in the next bullet.
  • A call node. Call nodes are conceptually just a simple call. They consist of a call target LNode, and a list of argument LNodes. For example, f(x) is a valid call node. But call nodes only get really interesting when a special identifier is used as the call target; they are used to represent all C# language constructs other than identifiers and literals. For example, using System; is represented as #import(System): a call to the #import Id node with the System Id node as its argument.
  • A literal node. These are simple literals, such as 1.0, 0, '\n' and "Hello, world!".

Every LNode also has a list of attributes, which are also encoded as LNode instances. Attribute lists are empty most of the time, though.

It is worth noting at this point that there is no such thing as an "invalid" LNode. For example, #if(f(x)) makes no sense – it's an if statement with neither a 'then' nor an 'else' clause – but it's a perfectly legal LNode, because an LNode is just a data structure. It does not have some implicit meaning.

In ecsc, nonsensical syntax trees like #if(f(x)) are only caught by the semantic analysis/IRgen phase. This differs from how C# traditionally operates, i.e., every statement has well-defined semantics from the get-go.
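
For readers who think better in code, here is a rough Python model of the three node kinds. It is only a sketch of the idea, not the Loyc API: the class names, the `show` printer and the `semantic_errors` check are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Id:
    name: str
    attrs: Tuple["Node", ...] = ()


@dataclass(frozen=True)
class Literal:
    value: object
    attrs: Tuple["Node", ...] = ()


@dataclass(frozen=True)
class Call:
    target: "Node"
    args: Tuple["Node", ...] = ()
    attrs: Tuple["Node", ...] = ()


Node = Union[Id, Literal, Call]


def show(node):
    """Print a node in a prefix notation loosely modeled on the dumps in this thread."""
    prefix = ""
    if node.attrs:
        prefix = "@[" + ", ".join(show(a) for a in node.attrs) + "] "
    if isinstance(node, Id):
        return prefix + node.name
    if isinstance(node, Literal):
        return prefix + repr(node.value)
    args = ", ".join(show(a) for a in node.args)
    return f"{prefix}{show(node.target)}({args})"


def semantic_errors(node):
    """Toy semantic check: this is where a malformed #if would be rejected."""
    if isinstance(node, Call) and node.target == Id("#if") and len(node.args) < 2:
        return ["#if needs a condition and a 'then' clause"]
    return []


using_system = Call(Id("#import"), (Id("System"),))
public_var = Call(Id("#var"), (Id("T"), Id("Value")), attrs=(Id("#public"),))
bad_if = Call(Id("#if"), (Call(Id("f"), (Id("x"),)),))  # builds without complaint
```

Constructing `bad_if` succeeds because the tree is just data; only the separate `semantic_errors` pass objects to it.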

The parser

Let me get this out of the way first: you seem to be under the impression that the EC# parser is aware of which macros have been defined. That is not the case; there is no such magic.

The EC# parser is a relatively simple tool. It takes source code as input, and produces a list of LNodes as output. It does this according to a number of rules. These make the statement below legal (though they don't assign any semantics to it).

ImplementNotifyPropertyChanged
{
    public string CustomerName { get; set; }
    public object AdditionalData { get; set; }
    public string CompanyName { get; set; }
    public string PhoneNumber { get; set; }
}

ecsc has a pair of options (-E -syntax-format=les) that can be used to coerce it to print the syntax tree. Technically speaking -E will expand macros first, and then print the syntax tree. But I haven't defined ImplementNotifyPropertyChanged in this context, so it won't get expanded.

$ ecsc ImplementNotifyPropertyChanged.ecs -platform clr -E -syntax-format=les -fsyntax-only
'ImplementNotifyPropertyChanged.ecs' after macro expansion: 
ImplementNotifyPropertyChanged({
    @[#public] #property(#string, CustomerName, @``, {
        get;
        set;
    });
    @[#public] #property(#object, AdditionalData, @``, {
        get;
        set;
    });
    @[#public] #property(#string, CompanyName, @``, {
        get;
        set;
    });
    @[#public] #property(#string, PhoneNumber, @``, {
        get;
        set;
    });
});


ImplementNotifyPropertyChanged.ecs:1:1: error: unknown node: syntax node 'ImplementNotifyPropertyChanged' cannot be analyzed because its node type is unknown. (in this context)

    ImplementNotifyPropertyChanged
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Again, let me stress that ImplementNotifyPropertyChanged gets parsed fine. The compiler only flags it as an error when it notices that all macros have been expanded and it doesn't know what an ImplementNotifyPropertyChanged node's semantics are.

The macro processor, LeMP

LeMP takes a list of LNodes as input, and produces a list of LNodes as output. Macros are used to do this transformation, but it might as well be a black box from a compiler pipeline perspective – it's not tied to any other component in the compiler.

Anyway, the basic idea is that LeMP's input contains nodes which the semantic analysis pass doesn't understand, and macros then transform those nodes. LeMP's output (hopefully) consists of nodes that semantic analysis understands completely. So the way it works is: the parser produces a syntax tree which need not have fixed semantics, and macro expansion is that syntax tree's one and only chance to get its act together before the semantic analysis pass converts it into compiler IR.
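
As a very rough illustration of that pipeline stage, here is a Python sketch. Everything in it is an assumption made for the example: nodes are plain Python values (a str is an identifier, a `(target, args)` tuple is a call node), `unless` is a made-up macro, children are expanded before their parent purely for simplicity, and the `understood` set stands in for whatever semantic analysis accepts. It does not claim to mirror LeMP's real expansion order or API.

```python
def expand(nodes, macros):
    """Map a list of nodes to a list of nodes, applying macros until none match."""
    out = []
    for node in nodes:
        if isinstance(node, tuple):
            target, args = node
            args = expand(args, macros)          # expand children first (sketch choice)
            macro = macros.get(target)
            if macro is not None:
                # A macro returns replacement nodes, which may contain more macros.
                out.extend(expand(macro(args), macros))
                continue
            node = (target, args)
        out.append(node)
    return out


def unknown_targets(nodes, understood):
    """Toy semantic analysis: list call targets it has no meaning for."""
    found = []
    for node in nodes:
        if isinstance(node, tuple):
            target, args = node
            if target not in understood:
                found.append(target)
            found.extend(unknown_targets(args, understood))
    return found


# Hypothetical macro: unless(cond, body)  =>  #if(!(cond), body)
macros = {"unless": lambda args: [("#if", [("!", [args[0]]), args[1]])]}
understood = {"#if", "!", "work"}

program = [("unless", ["done", ("work", [])])]
expanded = expand(program, macros)
```

Before expansion the toy analysis would reject `unless`; after expansion every remaining node is something it understands.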

I'd love to show you the expanded version of @qwertie's ImplementNotifyPropertyChanged example, but I can't do that at the moment because ecsc relies on the Loyc NuGet package instead of the EC# master branch; replace inline macro definitions are a relatively new feature in LeMP. Sorry about that.

I can show you how an ADT is expanded though. Consider the following example:

public abstract alt class Option<T>
{
    public alt None<T>();
    public alt Some<T>(T Value);
}

Without macro expansion, this gets parsed as:

@[#public, #abstract, @[#trivia_wordAttribute] #alt] #class(#of(Option, T), #(), {
    @[#public] #fn(alt, #of(None, T), #());
    @[#public] #fn(alt, #of(Some, T), #(#var(T, Value)));
});

We can force macro expansion by adding using LeMP; to the top of the file. That'll make LeMP import its standard macros. The resulting syntax tree is

#import(LeMP);
@[#public, #abstract] #class(#of(Option, T), #(), {
    @[#public] #cons(@``, Option, #(), {
        });
});
@[#public] #class(#of(None, T), #(#of(Option, T)), {
    @[#public] #cons(@``, None, #(), {
        });
});
@[#public] #class(#of(Some, T), #(#of(Option, T)), {
    @[#public] #cons(@``, Some, #(#var(T, Value)), {
        #this.Value = Value;
    });
    @[#public] #property(T, Value, @``, {
        get;
        @[#private] set;
    });
    @[#public] #fn(#of(Some, T), WithValue, #(#var(T, newValue)), {
        #return(#new(#of(Some, T)(newValue)));
    });
    @[System.ComponentModel.EditorBrowsable(System.ComponentModel.EditorBrowsableState.Never), #public] #property(T, Item1, @``, {
        get({
            #return(Value);
        });
    });
});
@[#public, #static, @[#trivia_wordAttribute] #partial] #class(Some, #(), {
    @[#public, #static] #fn(#of(Some, T), #of(New, T), #(#var(T, Value)), {
        #return(#new(#of(Some, T)(Value)));
    });
});

How is this in any way relevant to Roslyn?

¯\_(ツ)_/¯

I just thought I'd give you some background. That's all. :)

@CyrusNajmabadi
Member

Let me get this out of the way first: you seem to be under the impression that the EC# parser is aware of which macros have been defined. That is not the case; there is no such magic.

I can't reconcile this with the code examples given. If the parser is unaware of 'macros' how could it successfully parse:

ImplementNotifyPropertyChanged
{
    public string CustomerName { get; set; }
    public object AdditionalData { get; set; }
    public string CompanyName { get; set; }
    public string PhoneNumber { get; set; }
}

This is not legal C#. If you tried to parse this today then the parser would 'go off the rails', creating tons of skipped tokens and missing tokens. If that's the case, then the transformation step would have a heck of a time trying to figure out what happened. If you want a good tree, then the parser is going to need to know about macros.

Or, alternatively, we can use the approach we took with SourceGenerators. Namely that we used an existing piece of syntax (i.e. '[attributes]'), to mark where we wanted generators to run. But if it isn't an existing piece of syntax, then i'm not sure how the system can work without the parser having to know about the syntax of these guys.

@jonathanvdc

Right. So the thing is that the EC# parser doesn't think about what it's parsing in the same way a traditional parser – like Roslyn's C# parser – does.

IIRC, the EC# grammar defines something called block-calls, and what you're seeing is really just an example of that. Basically, anything that looks like identifier { ... } gets parsed as a call node: identifier({ ... }). The parser doesn't stop and consider if the syntax tree is meaningful: only macros and semantic analysis can define a syntax tree's semantics.

Macros don't define new syntax. They merely transform the parse tree in a way that assigns semantics to constructs that don't have semantics yet. The EC# parser was designed with macros in mind – which is exactly why it successfully parses source code that is meaningless without a macro processor – but it doesn't interact with the macros. It just builds a syntax tree, and leaves the task of transforming said tree to the macros.

So the EC# parser will parse the example you listed as exactly this.

ImplementNotifyPropertyChanged({
    @[#public] #property(#string, CustomerName, @``, {
        get;
        set;
    });
    @[#public] #property(#object, AdditionalData, @``, {
        get;
        set;
    });
    @[#public] #property(#string, CompanyName, @``, {
        get;
        set;
    });
    @[#public] #property(#string, PhoneNumber, @``, {
        get;
        set;
    });
});

And that will work even if no macro called ImplementNotifyPropertyChanged is defined – in fact, the parser isn't even aware of which macros are defined when it is parsing away at the source code.

I understand that this can be hard to wrap your head around. But you should really try to think of the EC# parser as something that parses data rather than code, akin to an XML parser. An XML parser will happily parse <CompilerOption key="out" value="bin/Program.exe" />, despite the fact that it has no idea of what a CompilerOption node's semantics are. It's entirely up to the program that runs the XML parser to make sense of what a CompilerOption node is.

Similarly, the EC# grammar defines legal constructs whose semantics are to be defined by the user, in macro form. The parser mindlessly parses its input according to the grammar, and then hands the syntax tree off to the macro processing phase. That's all there is to it, really. Conceptually, it's a pretty dumb system, but it works beautifully.
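
To show how little the parser needs to know, here is a toy Python parser for a drastically reduced grammar (only `identifier;` and `identifier { ... }`). This is not the EC# grammar or the real parser. The tuple encoding and the `"{}"` target used for braced blocks are conventions I made up for this sketch.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[{};])")


def tokenize(src):
    """Split source text into identifiers, braces and semicolons."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_statements(tokens, i, closer):
    """Parse statements until `closer` (None means end of input)."""
    stmts = []
    while i < len(tokens) and tokens[i] != closer:
        name = tokens[i]
        if name in ("{", "}", ";"):
            raise SyntaxError(f"expected identifier, got {name!r}")
        i += 1
        if i < len(tokens) and tokens[i] == "{":
            body, i = parse_statements(tokens, i + 1, "}")
            if i >= len(tokens):
                raise SyntaxError("missing '}'")
            i += 1  # skip the closing brace
            # identifier { ... }  becomes the call node  identifier({ ... })
            stmts.append((name, [("{}", body)]))
        elif i < len(tokens) and tokens[i] == ";":
            i += 1
            stmts.append(name)
        else:
            raise SyntaxError(f"expected '{{' or ';' after {name!r}")
    return stmts, i


def parse(src):
    stmts, _ = parse_statements(tokenize(src), 0, None)
    return stmts


# No macro table is consulted anywhere: Foo and Baz are just identifiers.
tree = parse("Foo { bar; Baz { qux; } }")
```

The parser never asks whether `Foo` or `Baz` mean anything. It applies one grammar rule and builds the tree, which is the behavior described above.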

@CyrusNajmabadi
Member

IIRC, the EC# grammar defines something called block-calls

...

Ok. So there is new syntax defined, and the parser does need to be aware of this :)


Macros don't define new syntax.

I don't understand. You just said the syntax for macros was: identifier { ... }. That's new syntax. C# doesn't have that syntax today.

@CyrusNajmabadi
Member

CyrusNajmabadi commented Nov 28, 2016

The parser doesn't stop and consider if the syntax tree is meaningful:

By and large, neither does Roslyn's parser**. But the parser still needs to know what syntax is valid or not. It needs to know what language constructs are in the language. And so it needs to know what the syntax is for macros. Otherwise, it will be completely thrown off when it sees these constructs. I mean, you don't have to take my word for it. Just toss the above syntax into a file and you'll get errors like:

Severity	Code	Description	Project	File	Line	Suppression State
Error	CS1022	Type or namespace definition, or end-of-file expected	
Error	CS1022	Type or namespace definition, or end-of-file expected	
Error	CS0116	A namespace cannot directly contain members such as fields or methods	
Error	CS0116	A namespace cannot directly contain members such as fields or methods	
Error	CS0116	A namespace cannot directly contain members such as fields or methods	

--

** Technically not true. But all the cases where Roslyn's parser does this should be moved out to higher layers. This is what i did when i wrote the TS parser. There's no need for that stuff to live in the parser. It's just there for legacy reasons.

@jonathanvdc

jonathanvdc commented Nov 28, 2016

Ok. So there is new syntax defined, and the parser does need to be aware of this :)

Yes, absolutely. EC# defines new syntax. But that has nothing to do with the ImplementNotifyPropertyChanged macro in particular.


I don't understand. You just said the syntax for macros was: identifier { ... }. That's new syntax. C# doesn't have that syntax today.

Yeah. As far as I can tell, @qwertie crafted the identifier { ... } syntax specifically for macros. But macros can operate on any syntax node. Heck, a macro can even transform syntax nodes that already have well-defined semantics today. In fact, ecsc implements foreach as a macro.

So I'd much rather say that identifier { ... } is a syntax to make using macros easier, but it's not the syntax, because EC# macros can operate on any syntax.

Does that clarify things a little? :)

@qwertie
Author

qwertie commented Nov 29, 2016

Macros don't define new syntax.

I don't understand. You just said the syntax for macros was:

Well, "macros" is not what defined new syntax. It was "Enhanced C#" (a.k.a. me) that defined the new syntax.

As far as I can tell, @qwertie crafted the identifier { ... } syntax specifically for macros.

That's basically true, but indirectly. So here's the whole story.

I decided that, unlike some existing languages with LISP-style macro systems, I wanted a macro system in which macros would not add new syntax, because I believed parsers should be able to succeed without awareness of macros. Also, as a C++ programmer I was well aware that the C++ parser was linked to the symbol table - in general, C++ is ambiguous and requires a symbol table to resolve those ambiguities. Even if C++ didn't have #define macros, the situation would be analogous to languages where macros define syntax. For example, the statement X * Y; may be a multiplication or a pointer declaration depending on whether X is a type. This has at least two disadvantages:

  • An inefficient linear parsing system, where all #include files must be parsed before the machine can parse the main file. Moreover if one source file says #include "X", and another says #include "W" then #include "X", the parser must parse "X" twice, since the contents of "W" can affect the interpretation of "X". (cf. EC#'s includeFile macro, where the included file is parsed after the main file)
  • If the included files are not available, parsing can't be done properly. In practice IDEs will try to "fake it", but parsing must be repeated if the included files are discovered later, and I was concerned that if macros became an important core feature (unlike in C++ where macros are of limited use and the ambiguity I mentioned above only happens occasionally), their use of custom syntax would be a serious problem for the IDE.
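
A tiny Python sketch of the X * Y problem, just to make the dependency visible. The `classify` function and the idea of passing the known type names as a set are simplifications invented here. A real C++ front end is far more involved.

```python
def classify(tokens, known_types):
    """Decide what `X * Y ;` means, which requires knowing whether X names a type."""
    x, star, y, semi = tokens
    if star != "*" or semi != ";":
        raise ValueError("this sketch only handles statements of the form X * Y ;")
    return "pointer declaration" if x in known_types else "multiplication"


stmt = ["X", "*", "Y", ";"]

# Same tokens, different parse, depending on what earlier includes declared.
without_header = classify(stmt, known_types=set())
with_header = classify(stmt, known_types={"X"})
```

The token stream is identical in both calls. Only the symbol table differs, which is why the included files have to be processed first.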

Also, if macros can define new syntax then their meaning can be slightly harder to guess. By analogy, we could view unknown macros the way we view foreign languages. Consider trying to read Spanish vs Tagalog. You don't really understand either language, but Spanish has both words and grammar that are more similar to English, so you can glean more information from a Spanish text than a Tagalog text - perhaps you can even guess the meaning correctly. If macros can add arbitrary syntax, then when you look at an unknown macro you don't even know to what extent custom syntax has been added. So if you see something like "myMacro foo + bar;" then probably the macro accepts an expression, but you can't be sure; it's really just a list of tokens, and usually in these systems, you can't even know whether the semicolon marks the end of the macro or if it keeps going after that.

So instead I decided to preserve C#'s tradition of "context-free" parsing by ensuring every source file can be parsed without knowledge of macros. However, if macros weren't allowed to add syntax, then the language itself would need changes so that its built-in syntax was usually sufficient for them. This new syntax should be useful for multiple unforeseen purposes, and also consistent with the existing flavor of C#.

My main strategy was to "generalize" C#. Part of this generalization was taking the existing syntactic ideas of C# and extending their patterns in a logical way. Here are some examples:

  • Numerous statements can begin with "modifiers" like public, abstract, readonly (you can also think of ref and out as being in this category). I noticed that, upon seeing a modifier, the parser cannot know what kind of statement it is attached to. So it is forced to skip past all modifiers, examine whatever comes after, and only then decide whether the modifiers are valid on the construct to which they are applied. Simply by not checking "is this modifier valid on this construct?" (i.e. waiting until semantic analysis) the parser can accept any modifier on any construct (and since my syntax tree has an attribute list on every node, the parser always has a place to save the modifiers.)
  • Properties have get {...} and set {...} while events have add {...} and remove {...}. Generalizing this pattern, we get any_identifier {...}
  • Consider the built-in constructs if (...) {...}, lock (...) {...}, while (...) {...}. Generalizing, we get any_identifier (...) {...}
  • Given contextual keywords like partial and yield, I observed that really any identifier could act as a contextual keyword, so EC# does treat any identifier as a contextual keyword if possible. (However, I actually made a mistake - I didn't notice that, for example, partial public class Foo was illegal. I incorrectly thought of contextual keywords as a kind of modifier rather than as something that comes after modifiers; thus my parser currently accepts partial public class Foo.)
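To illustrate the first bullet, here is a minimal sketch in plain C#. It is not the actual EC# parser: Node, ParseStatement, Check and the table of allowed modifiers are hypothetical names made up for this example. The parser collects modifiers without judging them, and a later pass decides whether each one is valid on the construct it is attached to.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical syntax node: every node has a place to store modifiers.
class Node
{
    public List<string> Modifiers = new List<string>();
    public string Kind = "";   // e.g. "class" or "int"
    public string Name = "";
}

static class Sketch
{
    static readonly HashSet<string> KnownModifiers = new HashSet<string>
        { "public", "private", "abstract", "readonly", "static", "partial" };

    // Assumed rules for the later pass; a real compiler has many more.
    static readonly Dictionary<string, string[]> Allowed = new Dictionary<string, string[]>
    {
        { "class", new[] { "public", "private", "abstract", "static", "partial" } },
        { "int",   new[] { "public", "private", "static", "readonly" } },
    };

    // Parsing: skip past all modifiers, saving them, then read the construct.
    // No "is this modifier valid here?" question is asked at this stage.
    public static Node ParseStatement(string source)
    {
        var tokens = source.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var node = new Node();
        int i = 0;
        while (i < tokens.Length && KnownModifiers.Contains(tokens[i]))
            node.Modifiers.Add(tokens[i++]);
        if (i < tokens.Length) node.Kind = tokens[i++];
        if (i < tokens.Length) node.Name = tokens[i++];
        return node;
    }

    // Semantic analysis: only now are the modifiers checked against the construct.
    public static List<string> Check(Node node)
    {
        string[] allowed;
        if (!Allowed.TryGetValue(node.Kind, out allowed))
            allowed = new string[0];
        return node.Modifiers
            .Where(m => !allowed.Contains(m))
            .Select(m => "'" + m + "' is not valid on " + node.Kind)
            .ToList();
    }

    static void Main()
    {
        var node = ParseStatement("readonly public class Foo");
        // prints: readonly public | class Foo
        Console.WriteLine(string.Join(" ", node.Modifiers) + " | " + node.Kind + " " + node.Name);
        // prints: 'readonly' is not valid on class
        foreach (var error in Check(node))
            Console.WriteLine(error);
    }
}
```

The point of the split is that the parse step never fails on an unexpected modifier, so a macro (or a future compiler feature) can give meaning to a combination that the checker would otherwise reject.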

As I designed this, I had very few actual macros in mind. For instance, remember alt class BinaryTree<T>? I generalized "contextual keywords" long before I thought of creating alt class. The historical precedent seemed compelling enough by itself, e.g. partial, yield, async (not to mention add, remove, etc.) demonstrate the value of contextual keywords. And obviously, the C# team would always design new syntax in a way that is consistent with old syntax, so it made sense to "entrench" any obvious patterns that were developing - making them available both to future features in the compiler itself, and macro authors as well.

Another part of "generalizing C#" was "squashing" multiple grammar productions together. In part this was to give macros flexibility, but I also wanted to make the EC# parser simpler, or at least no more complex, than the C# parser. (Currently it totals 2500 lines including about 500 lines of comments - or 5600 lines including 800 comments after LeMP expands it. Roslyn's C# parser is about 10,000 lines with 800 comments, though it's not fair to directly compare since, for example, my parser still lacks LINQ, while Roslyn has more blank lines and is more paranoid due to its use in an IDE.)

  • The existing C# grammar defines "top level of a file", "contents of a class", "contents of a property" and "contents of a method" as separate contexts, each with their own grammar. I observed that I could squash those contexts together so that you could write a class with if statements in it, a property with a statement directly inside it (not inside get { }), or a method with another method inside it.
  • All the "space" constructs - namespace, class, struct, interface, enum - have similar syntax, so I combined them.
  • I figured that macros might want unusual syntax inside the "formal" argument list of a method. I also thought it would be neat if you could define variables in expressions, like in C++ where you can write if (Foo x = y) - note that this is completely unrelated to macros, it's a separate thing that seemed useful in its own right, except that in C# it would have to be if ((Foo x = y) != null) instead. At first I thought this feature wasn't possible in C# because Foo(Dictionary<K,V> x) would be ambiguous (is it a variable declaration or two separate arguments?) But then I realized that if the variable is assigned a value, like Foo(Dictionary<K,V> x = null), it's not "really" ambiguous since the expression V> x = null could never compile. So, I decided to squash the "expression" syntax together with the "formal parameter list" syntax. There would still formally be two different kinds of expressions - one that requires variables to be assigned with =, and another that does not, but a single expression parser can handle both situations. This gave me four birds with one stone:
    • arbitrary expressions in formal argument lists (potentially useful for macros)
    • attributes on any expression (potentially useful for macros)
    • variable declarations in expressions
    • out-variable-declarations like TryParse(s, out int x) (these last two are meant to be directly implemented in a compiler, but are implemented as a macro since there's no complete EC# compiler.)
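The ambiguity mentioned above can be seen in ordinary C#. The following is a small sketch with made-up names (Pair, Run, D, K, V, x): the token sequence D < K, V > x looks like a generic variable declaration, but in a normal argument list the compiler should, as far as I know, read it as two comparison expressions.

```csharp
using System;

static class AmbiguityDemo
{
    // Hypothetical helper that simply reports the two arguments it received.
    public static string Pair(bool first, bool second)
    {
        return first + ", " + second;
    }

    public static string Run()
    {
        int D = 1, K = 2, V = 3, x = 4;
        // Same token shape as "Dictionary<K,V> x", but here it means
        // two separate arguments: (D < K) and (V > x).
        return Pair(D < K, V > x);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints: True, False
    }
}
```

Adding an initializer, as in Foo(Dictionary<K,V> x = null), removes the second reading because V > x = null cannot be a valid expression, which is the observation the design relies on.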

Finally, I realized that "generalized C#" by itself isn't sufficient for all macros, so I added a few more things:

  • The substitution operator $, which is crucial for macros like replace
  • Token literals: @{ tree of tokens (parens, [square brackets] and {braces} must be balanced.) }. This is not the same as letting macros create syntax, since the whole file is parsed before any macros get involved. It's a literal, like a string.
  • Custom unary and binary operators in `backticks` (e.g. x `<=>` y or perhaps x `X` y for cross-products). I saw the "backquote operator" both as an extension point for macros, and as a generalization of the existing concept of overloaded operators (thus, the parser also accepts static int operator`plus`(int x, int y) { return x+y; }).

@CyrusNajmabadi
Member

Macros don't define new syntax.
I don't understand. You just said the syntax for macros was:
Well, "macros" is not what defined new syntax. It was "Enhanced C#" (a.k.a. me) that defined the new syntax.

...

If you defined new syntax for macros... then macros did indeed define new syntax.

c# does not contain this grammar production. In order for the c# parser to parse out macros, it would need to understand this new syntax. I do not see how we can do macros (like you do them) without defining new syntax here.

@CyrusNajmabadi
Member

All the "space" constructs - namespace, class, struct, interface, enum - have similar syntax, so I combined them.

There is a tension here. We've avoided overlapping things when there are significant deviations between the forms. For example, namespaces can have dotted names. The rest can't. If the node supports dotted names here, that means that all downstream consumers need to figure out what to do when they encounter a dotted name in any of these other entities. Alternatively, the parser might never accept dotted names for the rest, but now everyone needs to know that they should assume the name is never dotted. The node is no longer a source for confident information about what you might get.

There's the question of 'when does this end' as well? After all, methods/properties create 'spaces' (i.e. where locals and whatnot live). Should we merge methods with the above list? You could just have the above list then have an optional parameter list before the braces...

At the end of the day, you could try to merge everything into one type (i've seen systems that do this). Pros are that you only ever deal with one type. Cons are the amount of information you need to handle.

@CyrusNajmabadi
Member

Finally, i guess i'm just not seeing what purpose macros actually serve over the SourceGenerator proposal. As you've mentioned, they cannot introduce new syntax. So all they can do is take existing syntax and manipulate it, to produce new syntax. But that's what SourceGenerators did. That's something Roslyn is optimized for, as it allows very extensible transformation of Syntax.

The problem was not in making it possible for people to manipulate syntax (we have plenty of experience and features that do that today). The problems stemmed from how you make a cohesive, fast, and trustworthy set of tools when this is a fundamental building block of your system.

Because source-transformation is now a core primitive, we have to assume it will be used pervasively by many. And that means every single feature we build into the product needs to work well with these features.

@gafter
Member

gafter commented Mar 24, 2017

We are now taking language feature discussion in other repositories:

Features that are under active design or development, or which are "championed" by someone on the language design team, have already been moved either as issues or as checked-in design documents. For example, the proposal in this repo "Proposal: Partial interface implementation a.k.a. Traits" (issue 16139 and a few other issues that request the same thing) are now tracked by the language team at issue 52 in https://github.com/dotnet/csharplang/issues, and there is a draft spec at https://github.com/dotnet/csharplang/blob/master/proposals/default-interface-methods.md and further discussion at issue 288 in https://github.com/dotnet/csharplang/issues. Prototyping of the compiler portion of language features is still tracked here; see, for example, https://github.com/dotnet/roslyn/tree/features/DefaultInterfaceImplementation and issue 17952.

In order to facilitate that transition, we have started closing language design discussions from the roslyn repo with a note briefly explaining why. When we are aware of an existing discussion for the feature already in the new repo, we are adding a link to that. But we're not adding new issues to the new repos for existing discussions in this repo that the language design team does not currently envision taking on. Our intent is to eventually close the language design issues in the Roslyn repo and encourage discussion in one of the new repos instead.

Our intent is not to shut down discussion on language design - you can still continue discussion on the closed issues if you want - but rather we would like to encourage people to move discussion to where we are more likely to be paying attention (the new repo), or to abandon discussions that are no longer of interest to you.

If you happen to notice that one of the closed issues has a relevant issue in the new repo, and we have not added a link to the new issue, we would appreciate you providing a link from the old to the new discussion. That way people who are still interested in the discussion can start paying attention to the new issue.

Also, we'd welcome any ideas you might have on how we could better manage the transition. Comments and discussion about closing and/or moving issues should be directed to #18002. Comments and discussion about this issue can take place here or on an issue in the relevant repo.

You may find that the original/replace code generation feature tracked at dotnet/csharplang#107 is related to this proposal.
