[Proposal] enable code generating extensions to the compiler #5561
Looking forward to seeing where this discussion goes. The last time I remember metaprogramming being discussed, or more generally compile-time hooks, it sounded like the team wanted a little more time to see how things shook out (#98 (comment)). Hopefully, now that it's 8 months later and the library and tooling are in the wild, it's a good time to revisit. I personally really like the idea of using code diagnostics and fixes to drive this functionality. We already have great tooling for developing and debugging them, and Roslyn already knows how to apply them. Once developed, there's also already a mechanism for explicitly applying them to target code to see how they'll look. I can envision a variety of use cases for this concept, from simple one-off fixes to entire AOP frameworks based on applying code fixes in the presence of attributes that guide their behavior.
I wrote the code generation portions for the C# targets of ANTLR 3 and ANTLR 4 using a pattern similar to XAML. Two groups will shudder at this statement: developers working on MSBuild itself, and the ReSharper team. For end users working with Visual Studio, the experience is actually quite good. There are some interesting limitations in the current strategy.

Reference limitations

It is possible for C# code to reference types and members defined in the files generated from ANTLR grammars. In fact, from the moment rules are added to the grammar (even before the file is saved), the C# IntelliSense engine is already aware of the members that will be generated for those rules. However, the code generation step itself cannot use information from the other C# files in the project. Fortunately for ANTLR, we don't need that ability, because the information required to generate a parser is entirely contained within the grammar files.

Undocumented build and IntelliSense integrations

The specific manner in which a code generator integrates with the IntelliSense engine (using the XAML generator pattern) is undocumented. This has led to a complete lack of support for the code completion functionality described in the previous section in other IDEs, and even in ReSharper.
Let's deal with the problem of the undefined order of such transformations. In normal OO code, this pattern is conceptually similar to the decorator pattern. Take a look at this code:
vs
The nice thing is that we have to specify the order of wrapping the class manually. This seems like the most obvious answer: let the programmer specify the order. I think this would be simple to do cleanly with attributes that would somehow be wired up to cause these transformations.
They could be specified at the assembly, class, or member level to represent every kind of transformation scope. The problem is that until now attributes have served only as metadata; now they would modify source code instead. Maybe user-defined class modifiers would be better:
Where
@ghord That's a good idea. If we limit the code generators to only working on symbols with custom attributes explicitly specified, we can order the code generators by the order of the attributes as specified in the source.
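To make the decorator analogy concrete, here is a minimal C# sketch showing why wrapping order is observable. All type names are hypothetical and do not come from the thread. When the authorization check sits outside the cache, it always runs. When the cache sits outside the authorization check, it can keep serving a result after access has been revoked.

```csharp
using System.Collections.Generic;

// Hypothetical service and decorators, used only to illustrate ordering.
public interface IOrderService { string Get(string user); }

public sealed class OrderService : IOrderService
{
    public string Get(string user) => $"orders for {user}";
}

// Allows the call only if the user is currently in the allowed set.
public sealed class AuthorizationDecorator : IOrderService
{
    private readonly IOrderService _inner;
    private readonly ISet<string> _allowed;

    public AuthorizationDecorator(IOrderService inner, ISet<string> allowed)
    {
        _inner = inner;
        _allowed = allowed;
    }

    public string Get(string user) => _allowed.Contains(user) ? _inner.Get(user) : "denied";
}

// Caches the inner result per user, so later calls never reach the inner service.
public sealed class CachingDecorator : IOrderService
{
    private readonly IOrderService _inner;
    private readonly Dictionary<string, string> _cache = new();

    public CachingDecorator(IOrderService inner) => _inner = inner;

    public string Get(string user)
    {
        if (!_cache.TryGetValue(user, out var result))
        {
            result = _inner.Get(user);
            _cache[user] = result;
        }
        return result;
    }
}
```

With `new AuthorizationDecorator(new CachingDecorator(...), allowed)`, revoking a user takes effect immediately. With `new CachingDecorator(new AuthorizationDecorator(...))`, the stale cached result bypasses the check. That is the kind of ordering dependency an injector pipeline would need to make explicit.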
@mattwar @ghord While I think the use of attributes to guide the code generation process could work (it's worked well for PostSharp, for example), I'd love to see a more general solution that isn't directly tied to a specific language syntax or feature. That's why I mentioned being able to apply analyzers and code fixes as a possible approach. The way I envision this working is that the compiler would be supplied with a list of analyzers and code fixes to apply automatically just before actual compilation. It would work as if the user had gone through the code and applied all of the specified code fixes by hand before compiling.

Benefits

I suspect this could be achieved with a minimal amount of change to the existing Roslyn codebase, at least functionality-wise (though it may take some serious refactoring; I have no idea). Of course, the compiler would need a mechanism for specifying the analyzers and code fixes and applying them during compilation. Note the following:
There would also be some synergy between existing authors of conventional analyzers and code fixes and those writing them for code generation. Existing code fixes could also be adapted, or possibly applied wholesale, during the code generation stage (if specified). The tooling and process would be the same, so skills could be leveraged for either.

Challenges

I do see the following questions or complications with this approach:
An example that uses attributes

Getting back to the use of attributes, one of the big examples of this approach that I've been thinking about is using it to build out a full AOP framework similar to what PostSharp does. In this case, an analyzer would be written that looks for specific attributes defined in a referenced support library. When it finds them, it would output diagnostics that a code fix would then act on. The code fix would then apply whatever code generation is appropriate for the attribute. My favorite PostSharp aspect is …

You could potentially build an entire AOP framework by creating analyzers and code fixes that act on predefined attributes and their derivatives. The point, though, is that you wouldn't have to. The code generation capability could be as flexible and general as analyzers and code fixes themselves, which, because they directly manipulate the syntax tree, can do just about anything.
Very happy to see a proposal for this on the table. Are attributes how you envision applying a Code Injector? I think the proposal needs to specify the syntax for applying them. Is the idea that code injections all take place prior to build, so that you can see and possibly interact with the members they generate? If so, I think being able to interact with the generated code is another huge benefit that you should mention in your proposal. When using PostSharp, anything you have it generate doesn't exist until build time, so you can't reference any of it in your code. @ghord The problem with your proposal on ordering is that you might not define the injections in the same place. For example, you could have a code-injection attribute on a class
Using a property on an attribute to specify order is not new to the framework: DataMemberAttribute.Order Property. But I think that if order is important, then something is wrong. Property change notification is something the consumers of an object expect, not something the object expects of itself. So, as long as it's done, the order doesn't matter. Logging is the same. If you want to log the notification, then that is not logging the object but logging the notification extension. Is there any compelling example where one extension influences another, order matters, and it can still be considered good architecture?
@paulomorgado Yes, there are any number of use cases. For example, you might want some authorization code to run before some caching code. PostSharp has several documentation pages about ordering.
Rather than ordering at the use site, why not let injectors specify their dependencies using something akin to those PostSharp attributes @MgSam is linking to? (Or OrderAttribute in VS.) Depending on the order of the attributes at the use site seems very brittle to me, and it prevents using them at different scopes.
There are some issues with attributes which we will have to overcome for this to work:
We could make the order alphabetical according to file names. I'm pretty sure that in 99% of cases the order won't matter, but leaving undefined behavior like this in the language is very dangerous: an application could crash or not depending on the order in which the transformations are applied.
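One way to reconcile declared dependencies with the determinism concern above is a topological sort (Kahn's algorithm) with an alphabetical tie-breaker. The sketch assumes a hypothetical `InjectorInfo` record in which each injector lists the injectors it must run after. It is not an API from the proposal or from PostSharp.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical descriptor: an injector and the injectors that must run before it.
public sealed record InjectorInfo(string Name, IReadOnlyList<string> RunsAfter);

public static class InjectorOrdering
{
    // Kahn's topological sort. Ties are broken by ordinal name comparison,
    // so the resulting order is deterministic regardless of input order.
    public static List<string> Order(IEnumerable<InjectorInfo> injectors)
    {
        var all = injectors.ToDictionary(i => i.Name);
        var inDegree = all.Keys.ToDictionary(n => n, _ => 0);
        var dependents = all.Keys.ToDictionary(n => n, _ => new List<string>());

        foreach (var inj in all.Values)
        {
            foreach (var dep in inj.RunsAfter)
            {
                if (!all.ContainsKey(dep))
                    throw new InvalidOperationException($"{inj.Name} depends on unknown injector {dep}");
                dependents[dep].Add(inj.Name);
                inDegree[inj.Name]++;
            }
        }

        // Injectors with no unmet dependencies, kept sorted by name.
        var ready = new SortedSet<string>(
            inDegree.Where(kv => kv.Value == 0).Select(kv => kv.Key), StringComparer.Ordinal);
        var result = new List<string>();

        while (ready.Count > 0)
        {
            var next = ready.Min!;
            ready.Remove(next);
            result.Add(next);
            foreach (var d in dependents[next])
                if (--inDegree[d] == 0) ready.Add(d);
        }

        // Anything left over is part of a dependency cycle.
        if (result.Count != all.Count)
            throw new InvalidOperationException("Cyclic injector dependencies");
        return result;
    }
}
```

Here the alphabetical fallback only decides between injectors that have no declared relationship. A cycle is reported as an error rather than silently resolved.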
@ghord, what in this proposal influences assembly attributes?
I think code generation support in the compiler would be fantastic. I'd love to be able to do something similar to what PostSharp provides. PostSharp's more limited free version, and the requirement to submit an application to get an open source project license, make me unwilling to look at it for anything but larger projects at work that we would invest money in. I'd like to have great AOP tools for everyday/hobby projects without additional hassle. @daveaglick For debugging, if code generation only happens after things are sent to the compiler, wouldn't inserting line directives into the syntax tree preserve the integrity of the debugging experience? I made a syntax tree rewriter for Roslyn to implement a simple method boundary aspect, and used a Roslyn fork to hook it in at compile time. Line directives ensured there was no issue with debugging. It was an interesting experience and an example of something I'd like to be able to do without jumping through hoops. One issue I had, though, was that working at the syntax tree stage deprived me of information from the bound tree stage that I needed. Is there a way to know about type relationships at this point? When you see an attribute on a class, how will you know that it subclasses MethodBoundaryAspect or whatever?
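As a rough illustration of the line-directive idea, the hypothetical helper below builds the text of a method body wrapped with entry and exit logging. The injected statements sit under `#line hidden` so the debugger skips them. The user's original statements are mapped back to their file and line with `#line N "file"`, and `#line default` restores normal numbering afterward. The helper's name and its string-based approach are assumptions made for this sketch. A real rewriter would work on syntax trees rather than strings.

```csharp
using System.Text;

// Hypothetical helper: wraps an original method body in a simple method-boundary aspect.
public static class BoundaryAspectSketch
{
    public static string Wrap(string methodName, string originalBody, string filePath, int firstBodyLine)
    {
        var sb = new StringBuilder();
        sb.AppendLine("#line hidden"); // injected lines are not steppable in the debugger
        sb.AppendLine($"System.Console.WriteLine(\"Entering {methodName}\");");
        sb.AppendLine("try");
        sb.AppendLine("{");
        sb.AppendLine($"#line {firstBodyLine} \"{filePath}\""); // map the user's code back to its source
        sb.AppendLine(originalBody);
        sb.AppendLine("#line hidden");
        sb.AppendLine("}");
        sb.AppendLine("finally");
        sb.AppendLine("{");
        sb.AppendLine($"System.Console.WriteLine(\"Leaving {methodName}\");");
        sb.AppendLine("}");
        sb.AppendLine("#line default"); // restore normal line numbering for the rest of the file
        return sb.ToString();
    }
}
```

For example, `Wrap("Save", "Persist();", "Customer.cs", 42)` yields code in which stepping into `Persist();` shows line 42 of Customer.cs, while the logging statements are skipped.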
Is this like F#'s type providers?
It's great to see this being proposed; I remember asking a while back whether this was being considered. I think it should be possible to have the modified source code written out to a temp folder to make debugging easier, either by default or controlled via a flag. I also think that having to apply an attribute to the parts of the source code that can be rewritten is a nice idea, as it makes the feature less magical and easier to reason about.
@AdamSpeight2008 I don't think so. I see this feature more as a compiler step that lets you modify code before it's compiled. Crucially, though, this isn't meant to be seen by the person who wrote the code; it happens in the background when the compiler runs. My understanding of type providers is that they integrate more into the IDE and help you when you are writing code that works against a particular data source (by providing IntelliSense, generating types that match the contents of a live database, etc.).
@Inverness https://github.com/StackExchange/StackExchange.Precompilation already does just that.
Debugging of generated code inside the IDE is important. As far as I understand, this will be unavailable in the case of MSBuild.
I would start with that but still have a hook at the csc level (similar to analyzers). That makes it run at the "right" time. This is essentially what compile modules did (they ignored the IDE 😄 because it was hard).
^^ That said, you could debug them by throwing in a
That should work fine, actually. The MSBuild approach would augment the compilation with additional files. Since this isn't done in the compiler, these files would need to reside physically on disk (likely in the obj folder). Hence debugging would just work.
Could something like CallerMemberNameAttribute be implemented by generators? I believe caller info attributes are just a matter of code generation and can be done outside of the compiler. However, that requires "inspecting" the code, rather than adding a compilation unit and replacing members at the declaration site, i.e. replace/original. I'm sure many more interesting AOP scenarios could be implemented with generators if such an API existed.
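For reference, here is the behavior being discussed. When no argument is passed, the compiler fills in the optional parameter with the calling member's name as a constant at each call site. An explicit argument takes precedence. `CallerMemberNameAttribute` is real and lives in `System.Runtime.CompilerServices`. The class and method names below are hypothetical.

```csharp
using System.Runtime.CompilerServices;

public static class CallerInfoDemo
{
    // The compiler supplies the caller's member name when no argument is given.
    public static string WhoCalled([CallerMemberName] string memberName = "") => memberName;

    // Compiled as WhoCalled("Checkout"): the call site changes, not WhoCalled itself.
    public static string Checkout() => WhoCalled();

    // An explicit argument takes precedence over the caller info.
    public static string Explicit() => WhoCalled("Custom");
}
```

Because the substitution happens at every call site rather than at the declaration, a generator that emulates it has to rewrite the caller's code, not just add new files.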
There are two types of generators to consider: augmenting generators, which can only add new source files to a compilation, and modifying generators, which can also change the code the user wrote.
A modifying generator would be able to implement CallerMemberName. An augmenting generator would not be able to: it can only add source files, so it can't modify the user-authored file where the call appears. Note that when I've discussed generators on this thread, I've mostly been talking about augmenting generators. The IDE problems I've discussed for augmenting generators pale in comparison to the challenges faced by a modifying generator. How, for instance, do you design a rational IDE experience around a plugin that can virtually erase the keystroke you are currently typing in the emitted binary? It's quite daunting, and there is likely no sane experience possible. These problems are why the compiler team eventually took on a two-pronged solution: augmenting generators plus language features that make generators more powerful. The latter has been done in the past (think partial types and methods). The original/replace model extended that to allow a lot more flexibility.
It modifies the AST that is handed to the emitter to generate the final binary. I don't think this should go back and forth within the same assembly boundary. For this particular scenario, modifying the invocation wouldn't even affect other parts of the code, so there is no need to know what has been changed. I agree that in other cases that require dramatic changes to member declarations, replace/original can do a better job.
Sure, but what does IntelliSense say? How does debugging work? The final syntax tree may be very different from what lives in your source repo. When you press F5 and step into that file, what happens?
@jaredpar Right, the only observable thing for CallerMemberNameAttribute usages in debugging is the value passed to the method. However, if that value were not a constant, it wouldn't work well in debugging.
@alrz it also affects all call sites, e.g.
where
So when you're debugging the first snippet, and your breakpoint is on the
Visual Studio would highlight

@jaredpar I think having something like a structured representation of the source map when rewriting SourceTrees (either on the SourceTree or on the entire Compilation) would solve a lot of the problems we face regarding proper debugging support. Having dealt with it in both C# and JavaScript, I must say I think the JS SourceMap approach is superior to what we currently have in C#. I think it shouldn't be too difficult to adjust the map automatically for unmodified parts of the SourceTree on modification, since we already have structured representations of changes.
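Here is a minimal sketch of the kind of structured map described above. It assumes line-level granularity and zero-based line numbers. The `LineSourceMap` type is hypothetical, and real JS source maps work at column level with an encoded format. Unmodified runs of lines are stored as segments, so inserting generated lines only shifts or splits the segments the insertion touches.

```csharp
using System.Collections.Generic;

// Hypothetical line-level source map (zero-based line numbers).
public sealed class LineSourceMap
{
    // Each segment maps a run of generated lines back to a run of original lines.
    private readonly List<(int GeneratedStart, int OriginalStart, int Length)> _segments = new();

    // Initially every original line maps to itself.
    public LineSourceMap(int originalLineCount) => _segments.Add((0, 0, originalLineCount));

    // Record that `count` generated lines were inserted before generated line `at`.
    public void InsertGeneratedLines(int at, int count)
    {
        for (int i = 0; i < _segments.Count; i++)
        {
            var (g, o, len) = _segments[i];
            if (at <= g)
            {
                _segments[i] = (g + count, o, len); // whole segment shifts down
            }
            else if (at < g + len)
            {
                // Split the segment around the insertion point.
                int head = at - g;
                _segments[i] = (g, o, head);
                _segments.Insert(i + 1, (at + count, o + head, len - head));
                i++; // the tail segment is already shifted; skip it
            }
        }
    }

    // Original line for a generated line, or null if the line was injected.
    public int? ToOriginal(int generatedLine)
    {
        foreach (var (g, o, len) in _segments)
            if (generatedLine >= g && generatedLine < g + len)
                return o + (generatedLine - g);
        return null;
    }
}
```

A debugger integration could use `ToOriginal` to decide where to show a breakpoint or current statement, and treat `null` as "generated code with no user source".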
@jaredpar
Where
@mattwar There is/was an interesting but little-known project called Genuilder, which hooked into a pre-compile event in MSBuild and passed the source tree to custom C# generators (in standalone DLLs), which could then output extra source code. Here's a repo using it: Magic. The classes would get picked up by VS, and you had working IntelliSense for generated code within the same project (R# IntelliSense wouldn't pick up the generated classes, though).
Just an idea, regarding modifying generators as @jaredpar classified them. What if there were a "special" type of project item that the user could edit like a regular C# source file, with IntelliSense, code analyzers, syntax highlighting, etc., but that would run generators to produce the actual source code to compile? It would be a sort of "advanced T4 template". Generators must not be allowed to modify the source code the user typed into this project item directly; modifications must be applied to the output. Also, the debugger must step into the actual, modified source code. This could solve the IDE experience problems, since there wouldn't be any "on-the-fly" modifications to user-typed code. Users could also see what they get from generators, which would reduce the level of "magic" brought by any given generator. There are some things to think about regarding breakpoints: a user could put a breakpoint on a line that is absent from the output, but this could be solved by disabling such breakpoints. What do you think about that?
Which is more or less what this does: https://github.com/AArnott/CodeGeneration.Roslyn /cc @AArnott
There's also Scripty, which is similar: https://github.com/daveaglick/Scripty
I'm almost sure there are a number of similar third-party tools. The basic problem is that they are third-party tools, and the probability of them being abandoned is rather high. Moreover, when they are non-commercial side projects for their contributors, I'd be afraid to bring them into real projects. For example, both of the projects mentioned have fewer than 70 commits. That is negligibly small compared with the commit counts of active, popular projects like Autofac: https://github.com/autofac/Autofac. Remember Code Contracts? IMHO, tools with an impact like this should be an official part of the .NET ecosystem.
Thanks for all the time you spent this December patiently explaining your POV on this. It's certainly of value for stakeholders such as me and my company, who are trying to assess the likelihood of source generators being realized some time soon. If you could find a minute or two to spare at some point, I'd highly appreciate your view on my recent question here.
Wow, I didn't even have time to post that before you did. 😄 🚀 Thanks a lot.
This is now tracked at dotnet/csharplang#107. It is championed by @mattwar.
Often when writing software, we find ourselves repeatedly typing similar logic over and over again, each time just different enough from the last to make generalizing it into an API impractical. We refer to this type of code as boilerplate, the code we have to write around the actual logic we want to have, just to make it work with the language and environment that we use.
One way of avoiding writing boilerplate code is to have the computer generate it for us. After all, computers are really good at that sort of thing. But in order for the computer to generate code for us, it has to have some input to base it on. Typical code generators are design-time tools that we work with outside of our codebase, generating source that we include in it. These tools usually prefer their input to be XML or JSON files that we either edit by hand or drag, drop, and click into existence with some WYSIWYG editor. Other tools run at build time, invoked by our build system just before our project is built, but they too are driven by external inputs like XML and JSON files that we must manipulate separately from our code.
These solutions have their merits, but they are often intrusive, requiring us to structure our code in particular ways so that the generated code merges well with what we’ve written. The biggest drawback is that these tools require entire facets of our codebase to be defined in a language other than the one we use to write our primary logic.
Some solutions, like post-build rewriters, do a little better in this regard, because they operate directly on the code we’ve written, adding new logic directly into the assembly. However, they too have their drawbacks. For instance, post-build rewriters can never introduce new types and APIs for our code to reference, because they come too late in the process. So they can only change the code we wrote to do something else. Even worse, assembly rewriters are very difficult to build because they must work at the level of IL or assembly language, doing the heavy lifting to re-derive the context of our code that was lost during compilation, and generating new code as IL and metadata without the luxury of a compiler to do it. For most folks, choosing this technique to build tools that reduce boilerplate code is a non-starter.
Yet the biggest sin of all is that all of these solutions require us to manipulate our nearly unfathomable build system, and in fact require us to have a build system in the first place, and who really wants to do that? Am I right?
Proposal: Code Injectors
Code injectors are source code generators that are extensions to the compiler you are using, as you are using it. When the compiler is instructed to compile the source code you wrote, code injectors are given a chance to examine your code and add new code that gets compiled along with it.
When you type your code into an editor or IDE, the compiler can be engaged to provide feedback that includes the new code added by the code generators. Thus, it is possible to have the compiler respond to your work and introduce new code as you type that you can directly make use of.
You write a code injector similarly to how you write a C# and VB diagnostic analyzer today. You may choose to think of code injectors as analyzers that instead of reporting new diagnostics after examining the source code, augment the source code by adding new declarations.
You define a class in an assembly that gets loaded by the compiler when it is run to compile your code. This could easily be the same assembly you have used to supply analyzers. This class is initialized by the compiler with a context that you can use to register callbacks into your code when particular compilation events occur.
For example, ignoring namespaces for a moment, this contrived code injector gives every class defined in source a new constant field called ClassName that is a string containing the name of the class.
This works because of the existence of the C# and VB partial class language feature.
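To illustrate the effect described above, the sketch below shows a user-written partial class, the extra partial declaration such an injector would contribute, and a hypothetical string-based generator for that declaration. It does not reproduce the injector API itself, whose code is not shown here. Because both parts are partial, the compiler merges them, so user code can reference `Customer.ClassName` directly.

```csharp
// User-written part (e.g. Customer.cs).
public partial class Customer
{
    public string Name { get; set; } = "";
}

// Part the contrived injector would add in a separate, generated compilation unit.
public partial class Customer
{
    public const string ClassName = "Customer";
}

// Hypothetical generator logic: produce the injected declaration as source text.
public static class ClassNameInjectorSketch
{
    public static string Generate(string className) =>
        $"partial class {className}\n{{\n    public const string ClassName = \"{className}\";\n}}\n";
}
```

The generated part omits an accessibility modifier. This is allowed because the other part of the partial class already declares it.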
Of course, not all code injectors need to be in the business of adding members to the classes you wrote, and especially not adding members to all the classes you wrote indiscriminately. Code injectors can add entirely new declarations, new types and APIs that are meant simply to be used by your code, not to modify it.
Yet, the prospect of having code injectors modify the code you wrote enables many compelling scenarios that wouldn’t be possible otherwise. A companion proposal for the C# and VB languages #5292 introduces a new feature that makes it possible to have code generators not only add new declarations/members to your code, but also to augment the methods and properties you wrote too.
Now you can get rid of boilerplate logic like all that INotifyPropertyChanged code you need just to make data binding work. (Or is that so last decade that I need a better example?)
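For readers who haven't written it recently, this is roughly the per-property boilerplate in question. It uses the real `INotifyPropertyChanged` interface from `System.ComponentModel`. The `PersonViewModel` class is a hypothetical example. A code injector combined with the companion replace/original proposal would aim to generate the setter logic and the `OnPropertyChanged` helper for you.

```csharp
using System.ComponentModel;
using System.Runtime.CompilerServices;

public class PersonViewModel : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler? PropertyChanged;

    private string _name = "";

    // Every bindable property repeats this pattern: compare, assign, notify.
    public string Name
    {
        get => _name;
        set
        {
            if (_name == value) return; // avoid redundant notifications
            _name = value;
            OnPropertyChanged();        // CallerMemberName supplies "Name"
        }
    }

    protected void OnPropertyChanged([CallerMemberName] string? propertyName = null) =>
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
}
```

Multiply the `Name` property by every property on every view model, and the appeal of generating it becomes clear.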
Subjects not covered in this proposal but open for discussion too