From 5c7d3aa0cd4cb977c817884950d6097c72ab2077 Mon Sep 17 00:00:00 2001
From: Mark Shannon
Date: Sat, 26 Sep 2020 12:01:31 +0100
Subject: [PATCH] PEP 638: Syntactic macros (#1616)

Syntactic macro PEP
---
 pep-0638.rst | 586 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 586 insertions(+)
 create mode 100644 pep-0638.rst

diff --git a/pep-0638.rst b/pep-0638.rst
new file mode 100644
index 00000000000..c77a1462768
--- /dev/null
+++ b/pep-0638.rst
@@ -0,0 +1,586 @@
+PEP: 638
+Title: Syntactic Macros
+Author: Mark Shannon
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 24-Sep-2020
+
+Abstract
+========
+
+This PEP adds support for syntactic macros to Python.
+A macro is a compile-time function that transforms
+a part of the program to allow functionality that cannot be
+expressed cleanly in normal library code.
+
+The term "syntactic" means that this sort of macro operates on the program's
+syntax tree. This reduces the chance of mistranslation that can happen
+with text-based substitution macros, and allows the implementation
+of `hygienic macros`__.
+
+__ https://en.wikipedia.org/wiki/Hygienic_macro
+
+Syntactic macros allow libraries to modify the abstract syntax tree during compilation,
+providing the ability to extend the language for specific domains without
+adding complexity to the language as a whole.
+
+Motivation
+==========
+
+New language features can be controversial, disruptive and sometimes divisive.
+Python is now sufficiently powerful and complex that many proposed additions
+are a net loss for the language due to the additional complexity.
+
+Although a language change may make certain patterns easy to express,
+it will have a cost. Each new feature makes the language larger,
+harder to learn and harder to understand.
+Python was once described as `Python Fits Your Brain`__,
+but that becomes less and less true as more and more features are added.
+
+Because of the high cost of adding a new feature,
+it is very difficult or impossible to add a feature that would benefit only
+some users, regardless of how many users, or how beneficial that feature would
+be to them.
+
+The use of Python in data science and machine learning has grown very rapidly
+over the last few years.
+However, most of the core developers of Python do not have a background in
+data science or machine learning.
+This makes it extremely difficult for the core developers to determine if a
+language extension for machine learning is worthwhile.
+
+By allowing language extensions to be modular and distributable, like libraries,
+domain-specific extensions can be implemented without negatively impacting
+users outside of that domain.
+A web developer is likely to want a very different set of extensions from
+a data scientist.
+We need to let the community develop their own extensions.
+
+Without some form of user-defined language extensions,
+there will be a constant battle between those wanting to keep the
+language compact and fitting their brains, and those wanting a new feature
+that suits their domain or programming style.
+
+__ https://www.linuxjournal.com/article/4731
+
+
+Improving the expressiveness of libraries for specific domains
+''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+Many domains see repeated patterns that are difficult or impossible
+to express as a library.
+Macros can allow those patterns to be expressed in a more concise and less
+error-prone way.
+
+Trialing new language features
+''''''''''''''''''''''''''''''
+
+It is possible to demonstrate potential language extensions using macros.
+For example, macros would have enabled the ``with`` statement and
+``yield from`` expression to have been trialed.
+Doing so might well have led to a higher quality implementation
+at first release, by allowing more testing
+before those features were included in the language.
+
+It is nearly impossible to make sure that a new feature is completely reliable
+before it is released; bugs relating to the ``with`` and ``yield from``
+features were still being fixed many years after they were released.
+
+Long term stability for the bytecode interpreter
+''''''''''''''''''''''''''''''''''''''''''''''''
+
+Historically, new language features have been implemented by naive compilation
+of the AST into new, complex bytecode instructions.
+Those bytecodes have often had their own internal flow control, performing
+operations that could, and should, have been done in the compiler.
+
+For example,
+until recently flow control within the ``try``-``finally`` and ``with``
+statements was managed by complicated bytecodes with context-dependent semantics.
+The control flow within those statements is now implemented in the compiler, making
+the interpreter simpler and faster.
+
+By implementing new features as AST transformations, the existing compiler can
+generate the bytecode for a feature without having to modify the interpreter.
+
+A stable interpreter is necessary if we are to improve the performance and
+portability of the CPython VM.
+
+Rationale
+=========
+
+Python is both expressive and easy to learn;
+it is widely recognized as the easiest to learn widely-used programming language.
+However, it is not the most flexible. That title belongs to Lisp.
+
+Because Lisp is homoiconic, meaning that Lisp programs are Lisp data structures,
+Lisp programs can be manipulated by Lisp programs.
+Thus much of the language can be defined in itself.
+
+We would like that ability in Python,
+without the many parentheses that characterize Lisp.
+Fortunately, homoiconicity is not needed for a language to be able to
+manipulate itself; all that is needed is the ability to manipulate programs
+after parsing, but before translation to an executable form.
+
+Python already has the components needed.
+The syntax tree of Python is available through the ``ast`` module.
+All that is needed is a marker to tell the compiler that a macro is present,
+and the ability for the compiler to call back into user code to manipulate the AST.
+
+Specification
+=============
+
+Syntax
+''''''
+
+Lexical analysis
+~~~~~~~~~~~~~~~~
+
+Any sequence of identifier characters followed by an exclamation point
+(exclamation mark, UK English) will be tokenized as a ``MACRO_NAME``.
+
+Statement form
+~~~~~~~~~~~~~~
+
+::
+
+    macro_stmt = MACRO_NAME testlist [ "import" NAME ] [ "as" NAME ] [ ":" NEWLINE suite ]
+
+Expression form
+~~~~~~~~~~~~~~~
+
+::
+
+    macro_expr = MACRO_NAME "(" testlist ")"
+
+Resolving ambiguity
+~~~~~~~~~~~~~~~~~~~
+
+The statement form of a macro takes precedence, so that the code
+``macro_name!(x)`` will be parsed as a macro statement,
+not as an expression statement containing a macro expression.
+
+Semantics
+'''''''''
+
+Compilation
+~~~~~~~~~~~
+
+Upon encountering a macro during translation to bytecode,
+the code generator will look up the macro processor registered for the macro,
+and pass the AST, rooted at the macro, to the processor function.
+The returned AST will then be substituted for the original tree.
+
+For macros with multiple names,
+several trees will be passed to the macro processor,
+but only one will be returned and substituted,
+shortening the enclosing block of statements.
+
+This process can be repeated,
+to enable macros to return AST nodes including other macros.
+
+The compiler will not look up a macro processor until that macro is reached,
+so that inner macros do not need to have processors registered.
+For example, in a ``switch`` macro, the ``case`` and ``default`` macros wouldn't
+need processors registered as they would be eliminated by the ``switch`` processor.
+
+To enable macro definitions to be imported,
+the macros ``import!`` and ``from!`` are predefined.
+They support the following syntax:
+
+::
+
+    "import!" dotted_name "as" name
+
+    "from!" dotted_name "import" name [ "as" name ]
+
+The ``import!`` macro performs a compile-time import of ``dotted_name``
+to find the macro processor, then registers it under ``name``
+for the scope currently being compiled.
+
+The ``from!`` macro performs a compile-time import of ``dotted_name.name``
+to find the macro processor, then registers it under ``name``
+(using the ``name`` following "as", if present)
+for the scope currently being compiled.
+
+Note that, since ``import!`` and ``from!`` only define the macro for the
+scope in which the import is present, all uses of a macro must be preceded by
+an explicit ``import!`` or ``from!`` to improve clarity.
+
+For example, to import the macro "compile" from "my.compiler":
+
+::
+
+    from! my.compiler import compile
+
+
+Defining macro processors
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A macro processor is defined by a four-tuple, consisting of
+``(func, kind, version, additional_names)``:
+
+* ``func`` must be a callable that takes ``len(additional_names)+1`` arguments, all of which are abstract syntax trees, and returns a single abstract syntax tree.
+* ``kind`` must be one of the following:
+
+  * ``macros.STMT_MACRO``: A statement macro where the body of the macro is indented. This is the only form which is allowed to have additional names.
+  * ``macros.SIBLING_MACRO``: A statement macro where the body of the macro is the next statement in the same block. The following statement is moved into the macro as its body.
+  * ``macros.EXPR_MACRO``: An expression macro.
+
+* ``version`` is used to track versions of macros, so that generated bytecodes can be correctly cached. It must be an integer.
+* ``additional_names`` are the names of the additional parts of the macro, and must be a tuple of strings.
+
+::
+
+    # (func, macros.STMT_MACRO, VERSION, ())
+    stmt_macro!:
+        multi_statement_body
+
+    # (func, macros.SIBLING_MACRO, VERSION, ())
+    sibling_macro!
+    single_statement_body
+
+    # (func, macros.EXPR_MACRO, VERSION, ())
+    x = expr_macro!(...)
+
+    # (func, macros.STMT_MACRO, VERSION, ("subsequent_macro_part",))
+    multi_part_macro!:
+        multi_statement_body
+    subsequent_macro_part!:
+        multi_statement_body
+
+The compiler will check that the syntax used matches the declared kind.
+
+For convenience, the decorator ``macro_processor`` is provided in the ``macros`` module to mark a function as a macro processor:
+
+::
+
+    def macro_processor(kind, version, *additional_names):
+        def deco(func):
+            return func, kind, version, additional_names
+        return deco
+
+This can be used to help declare macro processors, for example:
+
+::
+
+    @macros.macro_processor(macros.STMT_MACRO, 1_08)
+    def switch(astnode):
+        ...
+
+
+AST extensions
+~~~~~~~~~~~~~~
+
+Two new AST nodes will be needed to express macros, ``macro_stmt`` and ``macro_expr``.
+
+::
+
+    class macro_stmt(_ast.stmt):
+
+        _fields = "name", "args", "importname", "asname", "body"
+
+    class macro_expr(_ast.expr):
+
+        _fields = "name", "args"
+
+In addition, macro processors will need a means to express control flow or side-effecting code that produces a value.
+To support this, a new AST node, called ``stmt_expr``, that combines a statement and an expression will be added.
+This new AST node will be a subtype of ``expr``, but will include a statement to allow side effects.
+It will be compiled to bytecode by compiling the statement, then compiling the value.
+
+::
+
+    class stmt_expr(_ast.expr):
+
+        _fields = "stmt", "value"
+
+Hygiene and debugging
+~~~~~~~~~~~~~~~~~~~~~
+
+Macro processors will often need to create new variables.
+Those variables need to be named in such a way as to avoid contaminating the original code and other macros.
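+
+For example, consider a processor for a hypothetical ``swap!(a, b)`` macro
+whose expansion assigns through a temporary named ``tmp``.
+That expansion would silently overwrite any ``tmp`` already used by the
+surrounding code.
+
+The sketch below simulates such an expansion with today's ``ast`` module.
+It is illustrative only: the ``swap!`` macro and the ``expand_swap`` and ``run``
+helpers are assumptions, not part of this proposal.
+It also relies on CPython accepting, in a hand-built AST, a name such as
+``$$swap_4_0``.
+Such a name cannot appear in source code, so it cannot collide with user variables:
+
+::
+
+    import ast
+
+    def expand_swap(a, b, temp):
+        # Hypothetical expansion of ``swap!(a, b)`` into
+        # ``temp = a; a = b; b = temp``.
+        def assign(target, source):
+            return ast.Assign(targets=[ast.Name(id=target, ctx=ast.Store())],
+                              value=ast.Name(id=source, ctx=ast.Load()))
+        return [assign(temp, a), assign(a, b), assign(b, temp)]
+
+    def run(temp):
+        # User code that happens to have its own variable called ``tmp``,
+        # followed by the expansion of ``swap!(x, y)`` (line 4, column 0).
+        module = ast.parse("tmp = 'user data'\nx = 1\ny = 2\n")
+        module.body.extend(expand_swap("x", "y", temp))
+        ast.fix_missing_locations(module)
+        namespace = {}
+        exec(compile(module, "<swap-demo>", "exec"), namespace)
+        return namespace
+
+    naive = run("tmp")            # expansion reuses a name the user can write
+    hygienic = run("$$swap_4_0")  # expansion uses a name the user cannot write
+    assert naive["tmp"] == 1                   # user data was overwritten
+    assert hygienic["tmp"] == "user data"      # user data survives
+    assert (hygienic["x"], hygienic["y"]) == (2, 1)
+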
+No rules for naming will be enforced, but to ensure hygiene and help debugging, the following naming scheme is recommended:
+
+* All generated variable names should start with a ``$``.
+* Purely artificial variable names should start with ``$$mname``, where ``mname`` is the name of the macro.
+* Variables derived from real variables should start with ``$vname``, where ``vname`` is the name of the variable.
+* All variable names should include the line number and the column offset, separated by an underscore.
+
+Examples:
+
+* Purely generated name: ``$$macro_17_0``
+* Name derived from a variable for an expression macro: ``$var_12_5``
+
+
+Examples
+''''''''
+
+Compile time checked data structures
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is common to encode tables of data in Python as large dictionaries.
+However, these can be hard to maintain and error-prone.
+Macros allow this data to be written in a more readable format.
+Then, at compile time, it can be verified and converted to an efficient format.
+
+For example, suppose we have two dictionary literals mapping codes to names,
+and vice versa.
+This is error-prone, as the dictionaries may have duplicate keys,
+or one table may not be the inverse of the other.
+A macro could generate the two mappings from a single table and,
+at the same time, verify that no duplicates are present.
+
+::
+
+    color_to_code = {
+        "red": 1,
+        "blue": 2,
+        "green": 3,
+    }
+
+    code_to_color = {
+        1: "red",
+        2: "blue",
+        3: "yellow", # error
+    }
+
+would become:
+
+::
+
+    bijection! color_to_code, code_to_color:
+        "red" = 1
+        "blue" = 2
+        "green" = 3
+
+Domain-specific extensions
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Where I see macros having real value is in specific domains, not in general-purpose language features.
+
+For example, parsers.
+Here's part of a parser definition for Python, using macros:
+
+::
+
+    choice! single_input:
+        NEWLINE
+        simple_stmt
+        sequence!:
+            compound_stmt
+            NEWLINE
+
+Compilers
+~~~~~~~~~
+
+Runtime compilers, such as ``numba``, have to reconstitute the Python source, or attempt to analyze the bytecode.
+It would be simpler and more reliable for them to get the AST directly:
+
+::
+
+    from! my.jit.library import jit
+
+    jit!
+    def func():
+        ...
+
+Matching symbolic expressions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When matching something representing syntax, such as a Python ``ast`` node or a ``sympy`` expression,
+it is convenient to match against the actual syntax, not the data structure representing it.
+For example, a calculator could be implemented using a domain-specific macro for matching syntax:
+
+::
+
+    from! ast_matcher import match
+
+    def calculate(node):
+        if isinstance(node, Num):
+            return node.n
+        match! node:
+            case! a + b:
+                return calculate(a) + calculate(b)
+            case! a - b:
+                return calculate(a) - calculate(b)
+            case! a * b:
+                return calculate(a) * calculate(b)
+            case! a / b:
+                return calculate(a) / calculate(b)
+
+Which could be converted to:
+
+::
+
+    def calculate(node):
+        if isinstance(node, Num):
+            return node.n
+        $$match_4_0 = node
+        # Binary operations are ``BinOp`` nodes; the operator is in ``.op``.
+        if isinstance($$match_4_0, _ast.BinOp) and isinstance($$match_4_0.op, _ast.Add):
+            a, b = $$match_4_0.left, $$match_4_0.right
+            return calculate(a) + calculate(b)
+        elif isinstance($$match_4_0, _ast.BinOp) and isinstance($$match_4_0.op, _ast.Sub):
+            a, b = $$match_4_0.left, $$match_4_0.right
+            return calculate(a) - calculate(b)
+        elif isinstance($$match_4_0, _ast.BinOp) and isinstance($$match_4_0.op, _ast.Mult):
+            a, b = $$match_4_0.left, $$match_4_0.right
+            return calculate(a) * calculate(b)
+        elif isinstance($$match_4_0, _ast.BinOp) and isinstance($$match_4_0.op, _ast.Div):
+            a, b = $$match_4_0.left, $$match_4_0.right
+            return calculate(a) / calculate(b)
+
+Zero cost markers and annotations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Annotations, either decorators or PEP 3107 function annotations, have a runtime cost
+even if they serve only as markers for checkers or as documentation.
+
+::
+
+    @do_nothing_marker
+    def foo(...):
+        ...
+
+can be replaced with the zero-cost macro:
+
+::
+
+    do_nothing_marker!:
+        def foo(...):
+            ...
+
+Prototyping language extensions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Although macros would be most valuable for domain-specific extensions, it is possible to
+demonstrate possible language extensions using macros.
+
+f-strings:
+..........
+
+The f-string ``f"..."`` could be implemented as a macro, ``f!("...")``.
+This is not quite as nice to read, but would still be useful for experimenting.
+
+Try finally statement:
+......................
+
+::
+
+    try_!:
+        body
+    finally!:
+        closing
+
+Would be translated roughly as:
+
+::
+
+    try:
+        body
+    except:
+        closing
+        raise
+    else:
+        closing
+
+Note:
+    Care must be taken to handle returns, breaks and continues correctly.
+    The above code is merely illustrative.
+
+With statement:
+...............
+
+::
+
+    with! open(filename) as fd:
+        return fd.read()
+
+The above would require handling ``open`` specially.
+An alternative that would be more explicit would be:
+
+::
+
+    with! open!(filename) as fd:
+        return fd.read()
+
+Macro definition macros
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Languages that have syntactic macros usually provide a macro for defining macros.
+This PEP intentionally does not do that, as it is not yet clear what a good design
+would be, and we want to allow the community to define their own macros.
+
+One possible form could be:
+
+::
+
+    macro_def! name:
+        input:
+            ... # input pattern, defining meta-variables
+        output:
+            ... # output pattern, using meta-variables
+
+
+Backwards Compatibility
+=======================
+
+This PEP is fully backwards compatible.
+
+Performance Implications
+========================
+
+For code that doesn't use macros, there will be no effect on performance.
+
+For code that does use macros and has already been compiled to bytecode,
+there will be some slight overhead to check that the version
+of macros used to compile the code matches the imported macro processors.
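+
+The following is a rough sketch of that check.
+It assumes, hypothetically, that cached bytecode records a ``{name: version}``
+mapping for the macros it was compiled with; this PEP does not specify the cache format.
+The cache would then be reused only if each of those macros is still registered
+with the same ``version``, the third element of its processor four-tuple:
+
+::
+
+    def macros_match(recorded, registered):
+        # Hypothetical helper, not a proposed API.
+        # recorded:   {macro name: version} saved with the cached bytecode
+        # registered: {macro name: (func, kind, version, additional_names)}
+        for name, version in recorded.items():
+            processor = registered.get(name)
+            if processor is None or processor[2] != version:
+                return False  # stale cache: recompile from source
+        return True
+
+    # The string "STMT_MACRO" stands in for ``macros.STMT_MACRO``.
+    registered = {"switch": (lambda node: node, "STMT_MACRO", 1_08, ())}
+    assert macros_match({"switch": 1_08}, registered)
+    assert not macros_match({"switch": 1_07}, registered)
+    assert not macros_match({"case": 1}, registered)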
+
+For code that has not been compiled, or has been compiled with different versions
+of the macro processors, there would be the usual overhead of bytecode
+compilation, plus any additional overhead of macro processing.
+
+It is worth noting that the speed of source to bytecode compilation
+is largely irrelevant for Python performance.
+
+Implementation
+==============
+
+In order to allow transformation of the AST at compile time by Python code,
+all AST nodes in the compiler will have to be Python objects.
+
+Doing that efficiently will mean making all the nodes in the ``_ast`` module
+immutable, so as not to degrade performance by much.
+They will need to be immutable to guarantee that the AST remains a *tree*,
+to avoid having to support cyclic GC.
+Making them immutable means they will not have a
+``__dict__`` attribute, making them compact.
+
+AST nodes in the ``ast`` module will remain mutable.
+
+Currently, all AST nodes are allocated using an arena allocator.
+Changing to use the standard allocator might slow compilation down a little,
+but has advantages in terms of maintenance, as much code can be deleted.
+
+Reference Implementation
+''''''''''''''''''''''''
+
+None as yet.
+
+Copyright
+=========
+
+This document is placed in the public domain or under the
+CC0-1.0-Universal license, whichever is more permissive.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End: