Macros extended course. Part 2

Alex Zimin edited this page Aug 19, 2011 · 4 revisions

Expression macro creation

The main thing you have to realize to work with macros is that macros execute not with the main program, but rather at the moment of its compilation. In essence, a macro is a compiler extension. It is an extension that can add anything you want (and, what is important, do it simply).

A macro, being, in essence, a library, can use the compiler's API, as well as external libraries (since it is a library itself). This makes it an incredibly powerful tool.

To demonstrate this claim, let's create a simple macro, PrintTime, which prints the time at which it is called:

using System;
using Nemerle.Compiler;

namespace MacroExamples
{
  macro PrintTime()
  {
    Message.Hint($"Now $(DateTime.Now)");
    <[ () ]>
  }
}

Place this macro inside a macro library, create a test project, add a reference to the macro library (the first part of this article describes how to do this), and place a call to the macro in the code:

using System;
using System.Console;
using Nemerle.Utility;
using MacroExamples;

module Program
{
  Main() : void
  {
    PrintTime();    // macro invocation
    _ = ReadLine(); // to prevent the window from closing after program execution
  }
}

If you then build the solution, you will see the following message among the rest of the compiler output:

 ...\Main.n(10,5):Warning: hint: Now 19.06.2007 14:42:31

As you can see, macro code was called during source code compilation and within the compiler process, too, since the macro successfully used the compiler's API to output a warning message directly into the IDE console.

The code "<[()]>" might seem mysterious to many. The thing is that an expression level macro must return something to be placed inside the code in place of the macro. The "()" construct is viewed by the compiler as a code fragment that does nothing. This placeholder's type is void. The compiler simply ignores this code when it is not placed within another code block, and will generate an error message if it is. In general, returning the value "<[()]>" from a macro is equivalent to not returning anything at all. In this way, our first macro generates no code, but only performs a service function.

To make this macro not only output a message to the IDE console, but also generate code that prints the time of compilation whenever the program is run (or, more exactly, when program execution reaches the place where the macro was applied), we need to slightly modify the macro body:

def now = DateTime.Now;
<[ WriteLine("The macro was called at {0}", $(now.ToString() : string)) ]>

For this to work, the macro library must contain a using directive for the System.Console class.

Now, whenever the test application is run, we will see the time at which the project was compiled (or, more precisely, the time at which the macro was called).

Examination of this code might raise the following questions:

  1. What does the construct "$(now.ToString() : string)" mean?
  2. Why did we have to convert the value of the variable "now" to a string?
  3. What would have happened if the variable now in the quotation were replaced with DateTime.Now (i.e. a property access)?

The answer to the first question is splicing. It allows us to insert something from the macro into a quotation. This can be an expression (generated by the macro or, as happens most often, received as a parameter) or a value of an ordinary (built-in) type such as int or string.

The answer to the second question is that DateTime is not a built-in data type and cannot be expressed with a literal (that is, a constant). Therefore, it cannot be placed in the generated program just like that. It could be done by constructing a DateTime right in the quotation with one of its numerous constructors, such as the one taking a tick count in Int64 format. However, this is entirely unnecessary for our purposes. All we need to do is display the date as a string.

The answer to the third is that the program would display the current time, since the code would have contained a property access, rather than a cached value. This way, we are free to choose whether to calculate a value or cache the result in the macro, if necessary.
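
The difference between a cached value and a property access can be sketched with a hypothetical macro, PrintBothTimes (the macro name and the message texts are illustrative and are not part of the original example). The first call embeds a string computed inside the compiler; the second leaves a property access in the generated code, so it is evaluated when the program runs.

```nemerle
using System;
using Nemerle.Compiler;

namespace MacroExamples
{
  // Hypothetical macro, for illustration only.
  macro PrintBothTimes()
  {
    // Evaluated once, inside the compiler process.
    def now = DateTime.Now;

    <[
      {
        // The spliced string becomes a constant in the generated program.
        System.Console.WriteLine("Compiled at {0}", $(now.ToString() : string));
        // No splice here: DateTime.Now is read each time this code executes.
        System.Console.WriteLine("Running at {0}", System.DateTime.Now);
      }
    ]>
  }
}
```

Running the compiled program twice should show the same first line and a different second line each time.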

Macro parameters

A macro can have several separate parameters and can terminate with a parameter array of arbitrary length (like a function with a variable number of parameters). Notice the last remark. This makes it impossible to overload macros by the number of parameters. Since almost all macro parameters are expressions (have type PExpr), macros cannot be overloaded by parameter types.

An example of a macro with a variable number of parameters:

macro printf (format : string, params parms : array[expr])
{
  def (evals, refs) = make_evaluation_exprs (parse_format(format), parms);
  def seq = evals.Append(refs.Map(x => <[ Console.Write ($x) ]>));
  <[ {..$seq } ]>
}

Notice that parameter types are mainly expressions (PExpr). Besides PExpr, a macro can accept parameters of built-in types like int, double, and string. However, this is nothing more than help from the compiler, which folds constants for us. We cannot pass anything other than a PExpr or a constant to a macro, because the application does not yet exist. It is only being compiled.

This means that a macro operates on expressions. It receives a list of expressions as input and returns an expression as output (or executes some useful side effect, such as printing the time of compilation in the example above).
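
As a small illustration of mixing the two kinds of parameters, here is a sketch of a hypothetical macro, Repeat, which takes a constant and an expression. The name and the helper function build are assumptions made for this example only; the sketch also assumes that count is at least 1.

```nemerle
using Nemerle.Compiler;

namespace MacroExamples
{
  // Hypothetical macro: expands into `count` copies of `body`.
  macro Repeat(count : int, body)
  {
    // Inside the macro, count is an ordinary int: the compiler has
    // already turned the literal argument into a value for us.
    def build(n, acc)
    {
      if (n <= 0) acc else build(n - 1, body :: acc)
    }
    def copies = build(count, []);

    // body is a PExpr; it is only copied into the result, never evaluated here.
    <[ { ..$copies } ]>
  }
}
```

A call such as Repeat(3, WriteLine("hello")) would expand into three WriteLine calls. Passing a variable as the first argument is not possible, since its value does not exist at compile time.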

Context

We are, of course, unable to interact with the program from a macro, but we can call the compiler itself. To access the compiler's structures, we need to use the special macro Nemerle.Macros.ImplicitCTX(). It returns a reference to a Typer class instance (remember that expression level macros are expanded during the typing process).

With the Typer class we can manually type an expression. We can also gain access to important things such as the GlobalEnv instance associated with the current method (via the Env property) and, through it, the ManagerClass object (Manager property), which controls the build process of the whole project and holds almost all information about the types in the project. In this article we will repeatedly encounter the context and the data that we can receive through it.
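
The chain of objects described above can be sketched as follows. ShowContext is a hypothetical macro name; the only compiler members it relies on are the ones named in the text (ImplicitCTX, Env, Manager), and it merely reports the run-time class of each object.

```nemerle
using Nemerle.Compiler;

namespace MacroExamples
{
  // Hypothetical macro: reports which compiler objects the context exposes.
  macro ShowContext()
  {
    def typer   = Nemerle.Macros.ImplicitCTX(); // Typer for the current method
    def env     = typer.Env;                    // GlobalEnv
    def manager = env.Manager;                  // ManagerClass

    Message.Hint($"typer: $(typer.GetType().Name), "
                  + $"env: $(env.GetType().Name), "
                  + $"manager: $(manager.GetType().Name)");

    <[ () ]>
  }
}
```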

Error recovery mode

If we make a mistake in the above test example, the expression we output to the IDE console will be displayed twice. This happens, because by default the Nemerle compiler works in the so-called fast mode. In this mode it does not keep track of various situations, rationally assuming that the project will be compiled successfully. If the compiler encounters an error, it switches to the error recovery mode. In this mode the compiler is able to produce more informative error diagnostics.

Unfortunately, the compiler processes the code again when it switches to the error recovery mode, which can lead to repeated macro code execution and, therefore, to duplicate messages and other unpleasant effects. To avoid this, we should check the IsMainPass property of the Typer object (the reference to which is retrieved by the macro Nemerle.Macros.ImplicitCTX()). This property is "true" during the first compiler pass. To be more precise, this property always has the value "true" in IntelliSense mode (that is, when used by the IDE integration engine). If the compiler is running in the standalone mode, the value of this property depends on the property InErrorMode (IsMainPass is "true" when InErrorMode is "false", and vice versa).

As such, to prevent the PrintTime macro from duplicating messages, it should be rewritten in the following manner:

macro PrintTime()
{
  def now = DateTime.Now;

  when (Nemerle.Macros.ImplicitCTX().IsMainPass)
    Message.Hint($"Compilation. Now $now");

  <[ System.Console.WriteLine(
      "The macro was called at {0}", $(now.ToString() : string)) ]>
}

WARNING
The property IsMainPass is missing in the compiler version 0.9.3. Therefore, you need a more recent compiler version to run this example.

Syntax extension macros

Macros that modify syntax are expression level macros extended with a construct defining syntax.

To avoid frightening you with the complexities, let's begin with simple examples located in the file https://github.com/rsdn/nemerle/tree/master/macros/core.n. The macro "if" is one of the simplest:

macro @if (cond, e1, e2)
syntax ("if", "(", cond, ")", e1, Optional (";"), "else", e2)
{
  <[
    match ($cond)
    {
      | true => $e1
      | _    => $e2
    }
  ]>
}

Even the uninitiated can easily see that the macro simply rewrites "if" as a "match". But, at the same time, the macro "if" introduces unique syntax. Using "if" is no longer like a function call. It looks like using an operator from a C-like language.

Here is how application of this macro looks:

if (a == 123)
{
  def msg = " 'a' has value 123";
  MessageBox.Show(msg);
}
else
{
  def msg = " 'a' has value other than 123";
  MessageBox.Show(msg);
}

Or, another (functional) method of application:

def msg = if (a == 123) " 'a' has value 123"
          else          " 'a' has value other than 123";
MessageBox.Show(msg);

We cannot use the given macro without an "else" construct, as we can in C, since "else" is a required keyword, and it is not possible to make it non-required, because it would introduce ambiguity (after all, "if" could be used in an expression!).

Syntax definition

First, let's figure out how exactly to define syntax. The special keyword "syntax" serves this purpose. It is followed by the list of accepted tokens in round brackets. These can include macro parameters (declared in the macro header above) and text literals. In addition, some tokens can be placed inside the construct Optional(), which makes it possible to omit them at the call site.
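
A minimal sketch of such a definition, modelled on the "if" macro shown earlier, might look like this. The macro IfNot and its keyword "ifnot" are hypothetical names chosen for the example; the sketch assumes that the body has type void, so that both match branches agree.

```nemerle
namespace MacroExamples
{
  // Hypothetical macro: runs body only when cond is false.
  // "ifnot" becomes a keyword wherever the MacroExamples namespace is open.
  macro IfNot (cond, body)
  syntax ("ifnot", "(", cond, ")", body)
  {
    <[
      match ($cond)
      {
        | false => $body
        | _     => ()
      }
    ]>
  }
}
```

With this macro in scope, a call would be written as ifnot (args.Length > 0) WriteLine("no arguments"); rather than as a function call.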

In essence, syntax descriptions allow only "flat" syntax without recursive calls, iteration, and other facilities present in EBNF and parser builders. However, one can get around this by using the fact that separate macro parameters can contain expressions of any complexity (the only requirement is that they have to consist of finished token groups, i.e. a parameter cannot contain a construct missing a closing or an opening bracket). These expressions can be parsed already inside the macro by using pattern matching.

To demonstrate this technique, let's look at the following example.

Example: macro forindex

To demonstrate the technique of syntax recognition inside a macro, let's take a look at the implementation of the macro forindex, which is more complex than "if", but still relatively simple. The macro allows iterating through indices specified as "min < x < max", "min <= x < max", "min < x <= max", "min <= x <= max", where "min" is the minimum value of the index, "max" is the maximum, and "x" is the name of the variable whose value will increase at each iteration step by a given value (one, by default). Expressions of any complexity can be substituted in place of "min" and "max". For simplicity, we will not process special cases (like negative ranges leading to infinite loops).

Notice that we cannot define the syntax of this macro linearly, without expression analysis, since inside it there can be four different comparison operator combinations (</<; <=/<; </<=; <=/<=). And we cannot write four macros to avoid this situation, due to programmer laziness. Also, other macros might have orders of magnitude more combinations.

If you still do not understand the meaning of this macro, take a look at the following example:

forindex (0 <= i < 10)
  WriteLine($"i=$i");

This construct is analogous to the following:

for (mutable i = 0; i < 10; i++)
  WriteLine($"i=$i");

Besides this, the macro will support the optional keyword "step", which can be used to specify the increment step.

For instance, the following code:

forindex (0 <= i < 10 step 2)
  forindex (0 <= k <= 2)
    WriteLine($"i=$i  k=$k");

outputs this to the console:

 i=0  k=0
 i=0  k=1
 i=0  k=2
 i=2  k=0
 i=2  k=1
 i=2  k=2
 i=4  k=0
 i=4  k=1
 i=4  k=2
 i=6  k=0
 i=6  k=1
 i=6  k=2
 i=8  k=0
 i=8  k=1
 i=8  k=2

Let's formalize the problem. We need to recognize correct expression syntax and rewrite the corresponding code into the appropriate "for" code. By syntax correctness I mean checking that the expression in brackets contains exactly three expressions (lower bound, variable name, upper bound), separated by two operators "<" or "<=". Any other stunts by the user-programmer should produce only sympathetic error messages from the compiler.

The time to act is now! :)

Here is how the syntax definition for our as yet hypothetical macro looks:

macro ForIndex (expr, step, body)
syntax ("forindex", "(", expr, Optional ("step", step), ")", body)
{
  //...
}

WARNING
Note that all literals in our macro description automatically become keywords (in effect where the namespace containing the macro definition is open). As such, we will not be able to create a variable named "forindex" or "step". Be careful when choosing names!

As you can see, forindex syntax is significantly different from what you see when using the macro. It contains no description for the loop condition syntax. As I mentioned earlier, this has to do with the fact that the compiler's macro subsystem cannot recognize very complicated syntax definitions. Instead of trying to define the expression "expr1, oper1, index, oper2, expr2" I left the loop condition as a single parameter "expr". Here is the full definition of the macro:

macro ForIndex (expr, step, body)
syntax ("forindex", "(", expr, Optional ("step", step), ")", body)
{
  def step = if (step == null) <[ 1 ]> else step;

  match (expr)
  {
    | <[ $minExpr <= $i <= $maxExpr ]> => <[ for (mutable $i = $minExpr;     $i <= $maxExpr; $i += $step) $body ]>
    | <[ $minExpr <  $i <= $maxExpr ]> => <[ for (mutable $i = $minExpr + 1; $i <= $maxExpr; $i += $step) $body ]>
    | <[ $minExpr <  $i <  $maxExpr ]> => <[ for (mutable $i = $minExpr + 1; $i <  $maxExpr; $i += $step) $body ]>
    | <[ $minExpr <= $i <  $maxExpr ]> => <[ for (mutable $i = $minExpr;     $i <  $maxExpr; $i += $step) $body ]>
    | _                                => Message.Error(expr.Location, $"Syntax error '$expr'"); <[ () ]>
  }
}

The first line:

def step = if (step == null) <[ 1 ]> else step;

checks whether the "step" parameter was used (it is declared as optional). If the parameter was omitted (in which case it contains the value "null"), the literal "1" is used instead. In this way, if the step parameter is omitted, the step will be equal to one.

WARNING
Macro parameters can be of two types: 1) expressions 2) constants of built-in types (int, double, string, etc.). But, at least the integer constants cannot have null values. This prevents their use as optional parameters.

From there, the value of the parameter "expr" undergoes parsing by pattern matching.

It is important that, when the macro receives an incorrect expression (for instance, containing operators ">" or ">=" instead of the ones needed), the match branch producing an error message will be executed. The function Message.Error() is used to produce error messages:

  | _ => Message.Error(expr.Location, $"Syntax error '$expr'"); <[ () ]>

With this, the macro returns the empty expression "()". As a result, the code does nothing (in fact, empty expressions are simply dropped by the compiler).

As you can see, we achieved high declarativity and, at the same time, efficient code.

We also managed to create a macro matching a fairly complex condition, despite the limited macro syntax definition. Quotation use for matching complex syntax constructs, in combination with their use for forming new code, grants extraordinary flexibility.
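
To see how the four match branches behave, consider the following usage sketch. It assumes that the ForIndex macro above is compiled into a macro library under the MacroExamples namespace (the namespace is an assumption carried over from the earlier examples). The comments list the values each loop variable takes according to the macro definition.

```nemerle
using System.Console;
using MacroExamples;

module Program
{
  Main() : void
  {
    forindex (0 <  i <  5) Write($"$i ");         // i takes 1, 2, 3, 4
    WriteLine();
    forindex (0 <= j <  5) Write($"$j ");         // j takes 0, 1, 2, 3, 4
    WriteLine();
    forindex (0 <  k <= 5) Write($"$k ");         // k takes 1, 2, 3, 4, 5
    WriteLine();
    forindex (0 <= m <= 6 step 3) Write($"$m ");  // m takes 0, 3, 6
    WriteLine();

    // A condition such as (5 > n > 0) matches none of the four patterns,
    // so the last branch reports "Syntax error" at compile time.
  }
}
```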

Quotation

Quotation allows code generation by assembly from ready-made pieces. Pieces of code (or even all of it) can be created directly in AST (without using quotation). Separate expressions can be joined into expression lists. Expression lists can be "expanded" inside other expressions with the special notation "..$variable". This notation is only permitted inside brackets (round, square, or curly). Like any other list, the expression list can contain zero or more elements. For instance, the following code creates a list of expressions, after which it expands it inside a code block (curly brackets).

mutable exps = [ <[ printf ("%d ", x) ]>, <[ printf ("%d ", y) ]> ];
exps = <[ def x = 1 ]> :: <[ def y = 2 ]> :: exps;
<[ {.. $exps } ]>

NOTE
Note that the definitions are added to the list after the printf calls that use them. This is necessary because the :: operator adds elements at the beginning of the list.

This code could be replaced by the following:

<[
  def x = 1;
  def y = 2;
  printf ("%d ", x);
  printf ("%d ", y);
]>

This approach exists because code fragments can be generated by separate functions or be supplied by external code (through macro parameters, for example), and then have to be assembled into a single expression.
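
As a sketch of assembling code from fragments supplied through macro parameters, here is a hypothetical macro, PrintAll, which generates one WriteLine call per argument. The macro name is an assumption made for this example; the parameter declaration follows the printf macro shown earlier.

```nemerle
using Nemerle.Compiler;

namespace MacroExamples
{
  // Hypothetical macro: prints every argument on its own line.
  macro PrintAll(params args : array[expr])
  {
    mutable calls = [];

    // Each new call is prepended, so the list ends up in reverse order.
    foreach (arg in args)
      calls = <[ System.Console.WriteLine($arg) ]> :: calls;

    // Restore the order in which the arguments were written.
    def ordered = calls.Rev();

    // Expand the whole list inside a code block.
    <[ { ..$ordered } ]>
  }
}
```

A call such as PrintAll(1, "two", 3.0) would then expand into a block with three WriteLine calls in the original argument order.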

Access to compiler internals

I have already mentioned that macros can access the compiler context (the Typer object, specifically). Besides this, macros can access the list of types in the project, and much more (almost anything). I described the compiler internals in the first part of this article for a reason. This knowledge can help us squeeze the maximum out of the compiler.

Let's begin with a simple example.

using Nemerle.Compiler;
using PT = Nemerle.Compiler.Parsetree;
...
macro PrintVisibleLocalVariables(msg)
{
  def typer = Nemerle.Macros.ImplicitCTX();

  when (typer.IsMainPass)
  {
    Message.Hint("");
    Message.Hint($"Variables in $msg");

    def locals = typer.LocalContext.GetLocals();
    mutable count = 0;

    foreach ((name : PT.Name, value : LocalValue) in locals)
    {
      Message.Hint($"   Variable: $(name.Id) Type: '$(value.Type)' "
        "Kind: $(value.ValKind)");
      count++;
    }

    Message.Hint($"$count variables are visible at this point.");
  }

  <[ () ]>
}

If you add this macro to the previous ones and change the calling code like this:

Main() : void
{
  PrintTime();

  forindex (0 <= i < 10 step 2)
    forindex (0 <= k <= 2)
    {
      PrintVisibleLocalVariables("inside two loops");
      WriteLine($"i=$i  k=$k");
    }

  PrintVisibleLocalVariables("at the end of code");

  _ = ReadLine();
}

then the VS console will show the following messages:

 ------ Build started: Project: Test, Configuration: Debug Any CPU ------
 ...: Now 28.06.2007 20:05:06
 ...:
 ...: Variables in "inside two loops"
 ...:    Variable: _N_break Type: '?' Kind: a return from a block
 ...:    Variable: _N_continue Type: 'void' Kind: a return from a block
 ...:    Variable: _N_return Type: 'void' Kind: a return from a block
 ...:    Variable: i Type: 'int' Kind: a local value
 ...:    Variable: k Type: 'int' Kind: a local value
 ...:    Variable: _N_for_2262 Type: 'void -> void' Kind: a local function
 ...:    Variable: _N_for_2268 Type: 'void -> void' Kind: a local function
 ...: 7 variables are visible at this point.
 ...:
 ...: Variables in "at the end of code"
 ...:    Variable: _N_return Type: 'void' Kind: a return from a block
 ...: 1 variables are visible at this point.
 Build succeeded -- 15 warnings. Build took: 00:00:00.5442024.

As you can see, besides the variables declared by us explicitly, the context also contains variables generated by macros. Macro "for" is converted into a set of functions and blocks. It is their names that we see. Just in case, here is the "for" loop code (which is also an interesting example in itself):

macro @for (init, cond, change, body)
syntax ("for", "(", Optional (init), ";", Optional (cond), ";",
    Optional (change), ")", body)
{
  def init   = if (init != null)   init   else <[ () ]>;
  def cond   = if (cond != null)   cond   else <[ true ]>;
  def change = if (change != null) change else <[ () ]>;

  def loop = Nemerle.Macros.Symbol(Util.tmpname ("for_"));

  <[
    $init;
  $("_N_break" : global):
  {
    def $(loop : name) () : void
    {
      when ($cond)
      {
        $("_N_continue" : global):
        {
          $body : void
        }
        $change;
        $(loop : name)()
      }
    }
    $(loop : name) ();
  }
  ]>
}

Retrieving information about the current method

A description of the method in which the macro is expanded can easily be retrieved through the context. A method is described by a MethodBuilder object, which is available through the Typer's CurrentMethodBuilder property. Here is an example of a macro that prints information about the method signature.

using TT = Nemerle.Compiler.Typedtree;
...
macro DisplayMethodInfo()
{
  def typer = Nemerle.Macros.ImplicitCTX();

  when (typer.IsMainPass)
  {
    def mb        = typer.CurrentMethodBuilder;
    def name      = mb.Name;
    def parms     = mb.GetParameters().Map(
      (p : TT.TParameter) => $"$(p.Name) : $(p.ty)");
    def modifiers = mb.Modifiers.Attributes.ToString().ToLower();

    Message.Hint($"$modifiers $name(..$parms) : $(mb.ReturnType)");
  }

  <[ () ]>
}

When run in the function Main:

...
Main(args : array[string]) : void
{
  DisplayMethodInfo();
  ...
}

the IDE console shows:

 Main.n(11,5):Warning: hint: static Main(args : array [string]) : void

Accessing types

The DeclaringType property of MethodBuilder can be used to access the type in which the method is declared. Furthermore, the method GetTypeBuilder() can be used to get a reference to a TypeBuilder with which the current type can be modified. This can be used, for instance, when you need to create a hidden helper method implementing some complicated logic to be used in your macro. This might be needed when the logic of the generated code depends on some features of the type that declares the method in which the macro is expanded.
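
A minimal sketch of reading the declaring type is shown below. ShowDeclaringType is a hypothetical macro name, and the sketch assumes that the object returned by DeclaringType exposes a FullName property.

```nemerle
using Nemerle.Compiler;

namespace MacroExamples
{
  // Hypothetical macro: reports the type that declares the current method.
  macro ShowDeclaringType()
  {
    def typer = Nemerle.Macros.ImplicitCTX();

    when (typer.IsMainPass)
    {
      def mb = typer.CurrentMethodBuilder;
      // FullName is assumed to be available on the declaring type's descriptor.
      Message.Hint($"$(mb.Name) is declared in $(mb.DeclaringType.FullName)");
    }

    <[ () ]>
  }
}
```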

If you need to access other types in the project, you can use the property Env of the Typer object (the context). This property has type GlobalEnv. As the previous part of the article shows, GlobalEnv describes the current context (the list of keywords, open namespaces, etc.), but, besides this, it contains a reference to the namespace tree (property NameTree). This tree contains the hierarchy of namespaces, types, and macros included in the project or imported into it from libraries. Once we have access to them, we can get information on any type used in the project, or even modify the tree itself (i.e. add new types), as well as the types in it.

NOTE
Type information can be used for generation of various generic methods, for including cross-cutting concerns (like AOP), and various other things.

Modification of existing types and addition of new ones allows us to create types and helper functions, as well as automate implementation of some patterns, such as the design patterns. Design pattern automation examples can be found in the file https://github.com/rsdn/nemerle/tree/master/macros/DesignPatterns.n.

Using information about the project's types will be described in detail in the next part of this article. Right now I will only say that this information is similar to reflection in .NET.

Typing

Sometimes, a macro developer faces the problem of typing a reference to a type given in untyped AST form. For this purpose, Typer contains the method MonoBindType. It accepts a reference to a PExpr and returns a reference to a FixedType. FixedType is a variant describing a reference to a type in Nemerle. The word Mono and the prefix M mean that this method works only with fully defined and specific types (and has no relation to the Mono project :) ). Because of this property of the function, variables and other expressions whose types have not yet been determined cannot be passed to it.

This function could be used when, for some reason, you have only untyped AST, but still want to calculate the types. For instance, this situation can occur during development of some macro attribute working at the stage BeforeTypedMembers. At this stage, information about some method parameters, fields, and properties is not yet accessible, but can be calculated with MonoBindType. Of course, it would be simpler to use the macro at the stage WithTypedMembers, but then many facilities are limited. For instance, at this stage, type modification (especially anything to do with interface implementation and virtual method overriding) often leads to problems. It would be ideal to add one more stage, at which types would be accessible, but all dependencies would be calculated again, but, at least today, there is no such opportunity, and using the method MonoBindType at the stage BeforeTypedMembers is the only legal method for type modification based on information about the types.
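
A call to MonoBindType can be sketched as follows. ShowBoundType is a hypothetical macro name, and the sketch follows the text in assuming that the method is available on the Typer object and accepts the PExpr directly; in a given compiler version it may live elsewhere or have a different signature.

```nemerle
using Nemerle.Compiler;

namespace MacroExamples
{
  // Hypothetical macro: binds a type path given as an expression
  // and reports the resulting type descriptor.
  macro ShowBoundType(typeExpr)
  {
    def typer = Nemerle.Macros.ImplicitCTX();

    when (typer.IsMainPass)
    {
      // Assumption: Typer.MonoBindType(PExpr) returns a FixedType, as described above.
      def ty = typer.MonoBindType(typeExpr);
      Message.Hint($"'$typeExpr' is bound to $ty");
    }

    <[ () ]>
  }
}
```

A call such as ShowBoundType(System.DateTime) would pass a fully specified type path, which is the only kind of input this method is meant for.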

Expression typing

Typing expressions is an even more complicated problem. Strictly speaking, in the general case it is not merely difficult, but impossible. The problem is that Nemerle infers types from their use. Therefore, if the compiler is made to type a separate expression, it may not be able to infer the types correctly (since the code fragment may not contain enough information). However, on the one hand, even such local type analysis is often sufficient; on the other hand, it is possible to defer processing until the moment when the types become fully known.

To type an expression, one can use the following Typer functions:

TypeExpr(e : PT.PExpr) : TExpr
TypeExpr(e : PT.PExpr, expected : TypeVar) : TExpr
TypeExpr(e : PT.PExpr, expected : TypeVar, is_toplevel_in_seq : bool) : TExpr

Typer itself can be retrieved with the function for getting the context mentioned above, or by building it manually. In the latter case, the Typer's constructor must be given a MethodBuilder instance as a parameter. Manual Typer construction makes it possible to type expressions in the contexts of other methods (other than the one currently processed by the compiler).

The second and the third TypeExpr() function overloads make it possible to specify the expected type.

One must understand the core difference between MonoBindType and TypeExpr. The former method allows typing an expression containing a path to a type used in the project (the path might be relative and specified in a tricky way, but it is only a path). The result of MonoBindType's work is merely a reference to a type descriptor. The TypeExpr method converts an untyped AST (PExpr) into a typed AST (TExpr). Typed AST also contains references to types and other information, but loses, as a consequence of numerous code transformations performed by the compiler, some information retrieved from the code (which sometimes makes it difficult to use TExpr). An interesting feature is that the typing process adds references to the typed AST into untyped AST branches. This makes it possible to avoid messing around with the TExpr after typing, and work with the same PExpr, but referencing type information through the property TypedObject of the PExpr object.
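
The simplest use of TypeExpr can be sketched like this. ShowExprType is a hypothetical macro name; the sketch assumes that TExpr exposes the inferred type through a Type property and that the splice form $(x : typed) is available in your compiler version for returning an already typed tree.

```nemerle
using Nemerle.Compiler;

namespace MacroExamples
{
  // Hypothetical macro: reports the locally inferred type of an expression
  // and then leaves the expression in the code unchanged.
  macro ShowExprType(e)
  {
    def typer = Nemerle.Macros.ImplicitCTX();
    def typed = typer.TypeExpr(e);   // PExpr -> TExpr

    when (typer.IsMainPass)
      Message.Hint($"'$e' has type $(typed.Type)");

    // Return the typed tree so that the expression is not typed a second time.
    <[ $(typed : typed) ]>
  }
}
```

If the expression depends on types that are inferred later in the method, the reported type may still be incomplete at this point, which is exactly the limitation described above.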

Below we show a code fragment from the StringTemplate library that I developed in parallel with writing this article.

def pExpr = MainParser.ParseExpr (env, expr, loc);
match (pExpr)
{
  // The given pattern matches code consisting of three subexpressions.
  | <[ $seqExpr; $sep; $cnvFuncExpr; ]> =>
    // If cnvFuncExpr is a reference to a StringTemplate method,
    // replace it with a link to method <methodName>__StImpl that
    // works more efficiently (faster).
  
    // We find the __StImpl method corresponding to the current method
    // (if it is in Ast.UserData).
    def corespStImplMethod = (mb.Ast.UserData :> ClassMember.Function).Builder;
    // Type the "fake" expression to determine which type cnvFuncExpr
    // has.
    def expr = <[ NCollectionsUtils.MapLazy($seqExpr, $cnvFuncExpr) ]>;
    // Create Typer for the method described by the MethodBuilder
    // corespStImplMethod
    def typer = Typer(corespStImplMethod);

    _ = typer.TypeExpr(expr); // type the expression

    // After typing, the TypedObject parameter is searched for the
    // corresponding TExpr, that is, typed AST.
    match (cnvFuncExpr.TypedObject)
    {
      // TExpr is a reference to a static method  vvvvvvv   Declared in the same type...
      | TExpr.StaticRef(_, m is MethodBuilder, _) when m.DeclaringType.Equals(mb.DeclaringType) => 

        // If Ast.UserData contains...
        match (m.Ast.UserData)
        {
          // A reference to a function...
          | coresp is ClassMember.Function =>
            // then rewrite the code in such a way that it uses the
            // automatically generated method (named coresp.Name)
            // instead of the given method.
            <[ SB.AppendSeq(_builder, $seqExpr, $sep, _indent, this.$(coresp.Name : usesite)); ]>

           | _ => <[ SB.AppendSeq(_builder, $seqExpr, $sep,
            _indent, $cnvFuncExpr); ]>
        }
      | _ => <[ SB.AppendSeq(_builder, $seqExpr, $sep,
          _indent, $cnvFuncExpr); ]>
    }

This code matches expressions consisting of three subexpressions (statements) separated by semicolons. Then it tries to determine whether the cnvFuncExpr subexpression contains a reference to a method declared in the same class, and whether this method is a StringTemplate method (such a method would have a reference to the automatically generated method in its Ast.UserData).

To find out whether cnvFuncExpr contains a reference to the required method, I form "fake" code that uses the method reference in the way I require, and type this expression in the context of the automatically generated method connected to the current method's Ast.UserData. After typing, I examine the contents of the cnvFuncExpr.TypedObject property (into which typing places a reference to the description of the real method). If this is the required method, then the expression in TypedObject necessarily contains a reference to a static method declared in the same class as the current method. After this, it only remains to determine what is contained in the method's Ast.UserData. If it holds a reference to a function (ClassMember.Function), the code of the call is rewritten the way I require (substituting the auto-generated method for the one specified) and returned. In all other cases the specified method is used (the code still gets slightly rewritten, but the method is not replaced).

I understand that the intricacies of this algorithm can be difficult for the uninitiated to follow, especially since the code is taken out of its context. It is not necessary to understand all the intricacies at this point. One only needs to see a real example of how type information helps to choose the more efficient code alternative. In this case, the auto-generated method with which the original reference is replaced uses the very same StringBuilder as the main method, which makes text generation linear and avoids unnecessary calculation.

Deferring macro execution until the moment, at which type information becomes available

It would not be an exaggeration to say that today Nemerle possesses the most sophisticated type inference, at least, among the hybrid languages; that is, programming languages supporting both OOP and FP.

The type inference system in Nemerle can infer types from their use. The use can even be the very last expression in a method. It might even be indirect. The classic example of type inference power in Nemerle involves Dictionary[K, V]:

using System;
using System.Console;
using System.Collections.Generic;

def dic = Dictionary();
WriteLine(dic.GetType().FullName);
dic.Add("Current date", DateTime.Now);

This code outputs the following to the console:

 System.Collections.Generic.Dictionary`2[[System.String, mscorlib, Version=2.0.0.0, 
 Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.DateTime, mscorlib,
 Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]

Notice that, even though the example does not specify a single type parameter for Dictionary, the compiler easily infers the variable's type.

This example is still very simple. In practice, the variable might be passed into an overloaded method even before any initialization. To determine which of the overloads should be called, the compiler has to know the type of the variable passed as the argument. This makes the compiler infer variable types recursively. Typer keeps a list of expressions that await typing (those containing the so-called unresolved type variables). When Typer finishes the first pass over a method, it looks through this list, and if the list is not empty but has changed since the previous pass, it runs the typing process again. At each typing pass, the types of more variables can be inferred, and the following passes can use this information to infer the types of other variables connected to them. In essence, the compiler builds a set of relations between variables (or, more exactly, their types) and tries to resolve the typing puzzle iteratively.

If the code is correct, then sooner or later the list of expressions awaiting typing becomes empty, and the typing process completes. If the program contains code in which it is impossible to determine all types (because it contains cyclic dependencies or just errors), the compiler shows error messages and stops the typing process.
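The multi-pass process described above can be illustrated with a toy model. The sketch below is written in Python rather than Nemerle, because the real logic lives inside the compiler's Typer. The function infer_types and its representation of constraints are hypothetical and greatly simplified (only "same type as" relations, no conflict detection). It shows only the idea of repeating passes for as long as the pending list keeps changing.

```python
# Toy model of multi-pass typing.
# All names are hypothetical; this is NOT the Nemerle compiler API.

def infer_types(known, constraints):
    """Resolve variable types by repeated passes over pending constraints.

    known       -- dict: variable name -> type name that is already inferred
    constraints -- list of (a, b) pairs meaning "a and b have the same type"
    Returns (types, unresolved_constraints, number_of_passes).
    """
    types = dict(known)
    pending = list(constraints)
    passes = 0
    while pending:
        passes += 1
        still_pending = []
        for a, b in pending:
            if b in types:
                types[a] = types[b]           # propagate b's type to a
            elif a in types:
                types[b] = types[a]           # propagate a's type to b
            else:
                still_pending.append((a, b))  # nothing known yet, retry later
        if len(still_pending) == len(pending):
            break                             # list unchanged: give up
        pending = still_pending
    return types, pending, passes


# "a" depends on "b", which depends on "c"; only "c" is known up front.
types, unresolved, passes = infer_types({"c": "int"}, [("a", "b"), ("b", "c")])
print(types, unresolved, passes)

# A cyclic dependency with no known type can never be resolved.
_, stuck, _ = infer_types({}, [("x", "y"), ("y", "x")])
print(stuck)
```

In the first call, the first pass can only resolve "b", and the second pass then resolves "a". In the second call, the list does not change after a pass, which in this toy model corresponds to the situation where the real compiler reports an error.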

What does all of this have to do with macro developers? Everything. The thing is that macro expansion occurs during the typing process. The compiler iteratively expands all untyped AST (PExpr variant chain) branches, types them, and, should it encounter a PExpr.MacroCall, tries to expand the macro and type the expression resulting from its execution (again having the type PExpr). If this expression also contains references to macros, they are also expanded.
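The recursive nature of expansion (a macro's result may itself contain macro calls) can be sketched with a toy model. The Python code below is only an illustration and ignores typing entirely. The tuple-based tree and the function expand are hypothetical and have nothing to do with the real PExpr type or the compiler API.

```python
# Toy model of recursive macro expansion.
# Hypothetical names; NOT the Nemerle compiler API.

def expand(expr, macros):
    """Expand macro calls in a tiny expression tree.

    expr   -- a str (leaf) or a tuple ("call", name, arg1, arg2, ...)
    macros -- dict: macro name -> function(list_of_args) returning a new expr
    """
    if isinstance(expr, str):
        return expr
    tag, name, *args = expr
    if name in macros:
        # The macro receives unexpanded arguments; its result may contain
        # further macro calls, so the result is expanded again.
        return expand(macros[name](args), macros)
    # An ordinary call: keep it and expand its arguments.
    return (tag, name, *[expand(arg, macros) for arg in args])


macros = {
    "twice": lambda args: ("call", "add", args[0], args[0]),
    "inc": lambda args: ("call", "add", args[0], "1"),
}

result = expand(("call", "twice", ("call", "inc", "x")), macros)
print(result)
```

Here the "twice" macro produces code that still contains calls to "inc", and those are expanded in turn.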

For expressions preceding the macro being expanded, the compiler attempts to infer the types, but, as has been said, this is not always possible. Therefore, while the macro is expanded, some type information is unavailable.

To demonstrate this, let's create a macro that gathers type information and outputs it to the IDE console. Here is the code for it:

macro PrintExpressionType(expr)
{
  def typer = Nemerle.Macros.ImplicitCTX();
  def tExpr = typer.TypeExpr(expr);

  def msg = $"The type '$tExpr' is "
    + match (tExpr.Type.Hint)
      {
        | Some(ty) => $"known as $(ty)"
        | None     =>  "unknown"
      }
    + " during macro expansion.";

  Message.Hint(msg);
  <[ () ]> // an expression macro must return code; this generates nothing
}

Now let's apply it. In the following case:

mutable x = array[0];
PrintExpressionType(x);

The macro will print to the IDE console:

 The type 'x' is known as array [int-] during macro expansion.

But should the example be modified slightly:

mutable x = null;
PrintExpressionType(x);
x = array[0];

The macro "fails":

 The type 'x' is unknown during macro expansion.

What can we do?

It is possible to force the compiler to defer macro evaluation until, so to say, better times. This is done using the method DelayMacro from the same old macro context (the Typer object).

Let's modify the macro in the following manner:

macro PrintExpressionType(expr)
{
  def typer = Nemerle.Macros.ImplicitCTX();
  def tExpr = typer.TypeExpr(expr);

  def msg = $"The type '$tExpr' is "
    + match (tExpr.Type.Hint)
      {
        | Some(ty) => $"known as $(ty)"
        | None     =>  "unknown"
      }
    + " during macro expansion.";

  Message.Hint(msg);

  def result = typer.DelayMacro(fun (fail_loudly)
  {
    def tExpr = tExpr;
    match (tExpr.Type.Hint)
    {
      | Some(ty) =>
        // do something with the type
        Message.Hint($"The type '$tExpr' is known as $(ty) inside the macro.");
        Some(<[ () ]>)

      | None =>
        when (fail_loudly)
          Message.Error(expr.loc, $"The type for '$expr' cannot be inferred.");
        None()
    }
  });
  result
}

Now, execute the code again:

mutable x = null;
PrintExpressionType(x);
x = array[0];

This time, the macro outputs the following into the IDE console:

 The type 'x' is unknown during macro expansion.
 The type 'x' is known as array [int-] inside the macro.

The DelayMacro method has the following signature:

DelayMacro(resolve : bool -> option[PT.PExpr],
           expected : TypeVar = null) : PT.PExpr

It takes the function that will be called whenever the compiler performs subsequent typing passes and, optionally, a type variable describing which type is expected (if it is not specified, a "fresh" type variable is used, allowing any type).

In essence, the method DelayMacro returns a kind of layer cake:

PExpr.Typed(TExpr.Delayed(...))

which holds a reference to the Typer, the function we passed in, the local context, the type variable, and the other necessary data. The compiler, when it finds a PExpr.Typed, simply unwraps its contents and places them in the typed AST it is building up. At the next typing pass, the compiler discovers TExpr.Delayed and attempts to call the function that was passed in. If the function returns None(), the compiler tries to call it again at the next pass, and so on. If the function returns Some(PExpr(...)), then the PExpr is expanded and typed. Of course, we are free to use the available type information when constructing this PExpr.

This way, DelayMacro allows us to wait until the necessary type information becomes available and to generate code based on it. This is a very powerful capability, and lambdas and closures make it simple to use.
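The retry behavior behind DelayMacro can be modeled with a small sketch. It is written in Python, since the real mechanism is internal to the compiler, and every name in it (run_delayed, the passes list, the types dictionary) is hypothetical. It also assumes that fail_loudly is set only on the final attempt, which the text above does not state explicitly.

```python
# Toy model of a delayed macro being retried on each typing pass.
# Hypothetical names; NOT the Nemerle compiler API.

def run_delayed(resolve, passes):
    """Call `resolve` once per typing pass until it produces code.

    resolve -- function(fail_loudly: bool) returning generated code,
               or None to ask for another attempt (like None() in Nemerle)
    passes  -- list of callables, each simulating the work of one typing pass
    Returns the generated code, or None if it never became available.
    """
    for index, do_pass in enumerate(passes):
        do_pass()                                # a pass may infer more types
        last_try = index == len(passes) - 1      # assumption: loud only at the end
        result = resolve(last_try)
        if result is not None:
            return result
    return None


types = {}          # what the "compiler" knows so far
errors = []         # collected error messages

def resolve(fail_loudly):
    ty = types.get("x")
    if ty is None:
        if fail_loudly:
            errors.append("The type for 'x' cannot be inferred.")
        return None                              # not ready yet, retry later
    return f"code specialised for {ty}"          # closure sees the new type

# Pass 1 learns nothing; pass 2 infers the type of x from a later assignment.
code = run_delayed(resolve, [lambda: None,
                             lambda: types.update(x="array[int]")])
print(code, errors)
```

As in the macro above, the closure captures state that is incomplete when the macro is first expanded and reads it again once a later pass has filled it in.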

Conclusion to the second part

In this part I covered the general principles of expression-level macro development. I hope that this will allow you to create beautiful macro-based solutions.

In the following parts (to be honest, I do not know how many there will be), I will touch on problems of meta-attribute creation and lexical macros. I will also continue to talk about interaction with the compiler's subsystems. I hope that when this article series is over, studying it will enable you to create macros no less capable than the ones written by the Nemerle compiler developers.

If you did not understand something in this article series, do not hesitate to write to me about it (by email or on the forum). I will make an effort to consider all your comments and to make the information accessible to the widest possible audience.

References

This text is based on an article from RSDN Magazine #2-2007 by Vlad Chistiakov (VladD2).
