Code Generation

I'd initially wanted to hide the code generation aspects of this project and just commit the generated .c and .h files in the src/ directory. However, that would be disingenuous and would probably hinder other people playing with the code, so I've made it official.

There always was a generated/ directory, created by make, that hosts the generated flex and bison output, so it was simple enough to generate the additional code there instead of into src/, and to remove those generated files from git.

So what's the code generation for? It removes the need to maintain a ton of boilerplate code around the structures used by the project. A number of .yaml files in the src/ directory declare C structs and their typedefs. At the time of writing they are:

  • anf.yaml A-Normal form structures input to the bytecode compiler, generated from the lambda structures.
  • ast.yaml The abstract syntax tree generated by the parser.
  • builtins.yaml Support for registering built-ins.
  • lambda.yaml Lambda calculus-like structures generated from the AST.
  • tc.yaml Type checking support for Algorithm W.
  • tpmc.yaml Term Pattern Matching Compiler support structures, part of lambda conversion.
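To make the idea concrete, here is a minimal Python sketch of turning one declaration into a C struct and its typedef. The dictionary layout, the `emit_struct` helper and the `AstPair` example with its fields are all hypothetical. The real schema of the .yaml files is not shown here and may well differ.

```python
# Hypothetical declaration, standing in for one entry parsed from a .yaml file.
# The project's real schema may look quite different.
DECLARATION = {
    "name": "AstPair",
    "fields": {
        "car": "struct AstExpr *",
        "cdr": "struct AstPair *",
    },
}


def emit_struct(decl):
    """Return C source text for a struct and its typedef."""
    name = decl["name"]
    lines = [f"typedef struct {name} {{"]
    for field, ctype in decl["fields"].items():
        # Pointer types already end in '*', so no separating space is needed.
        sep = "" if ctype.endswith("*") else " "
        lines.append(f"    {ctype}{sep}{field};")
    lines.append(f"}} {name};")
    return "\n".join(lines)


print(emit_struct(DECLARATION))
```

The point of the sketch is only that a few lines of declaration are enough to drive the struct definition, and, by extension, every function that has to know the struct's fields.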

For example, ast.yaml contains the declarations for the abstract syntax tree generated by the parser. A Python script, makeAST.py, is given each of those yaml files and generates the same set of .c and .h files for each. Continuing with the ast.yaml example, the following files are generated from it:

  • generated/ast.c a number of different functions for each structure:
    • new<struct>() functions that allocate memory and populate the allocated structs with argument values.
    • copy<struct>() functions that will make a deep copy of the struct.
    • push<struct>() functions that will push data onto any declared 1-dimensional arrays.
    • mark<struct>() functions that will recursively mark the structures as part of garbage collection.
    • a generic mark function that will switch on the type and call the correct mark function.
    • free<struct>() functions that will release unused memory when requested by the garbage collection system.
    • a generic free function that dispatches to the correct free<struct> function.
    • a typename function that will return the name of a struct for debugging etc.
  • generated/ast_debug.c debugging utilities, namely:
    • print<struct>() functions that will recursively display a representation of the struct for debugging.
    • eq<struct>() functions that perform deep comparisons for testing and debugging.
  • generated/ast_debug.h header for ast_debug.c
  • generated/ast.h header for ast.c, which includes the structure declarations themselves.
  • generated/ast_objtypes.h macros collecting the enums and case statements that can then easily be incorporated into the memory management system.
  • docs/ast.md A mermaid graph of the structs and their relationships (WiP, occasionally useful).
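As a rough illustration of how the per-struct functions and the generic dispatcher listed above can all be derived from one set of declarations, here is a Python sketch that emits C text for new<struct>(), mark<struct>() and a generic mark function. Everything named here is an assumption made for the example: the `STRUCTS` table, the helper functions, and the `allocate()`, `markObj`, `struct Header` and `OBJTYPE_...` names in the emitted C. It is not a description of what makeAST.py actually does.

```python
# Hypothetical declarations: struct name -> {field name: type name}.
# A field whose type is itself a declared struct is treated as a pointer
# to a garbage-collected object; anything else is a plain C type.
STRUCTS = {
    "AstPair": {"car": "AstExpr", "cdr": "AstPair"},
    "AstExpr": {"value": "int"},
}


def c_type(type_name, structs):
    """C spelling of a field type, including any trailing space."""
    if type_name in structs:
        return f"struct {type_name} *"
    return f"{type_name} "


def emit_new(name, fields, structs):
    """Emit a constructor that allocates a struct and fills in its fields."""
    params = ", ".join(f"{c_type(t, structs)}{f}" for f, t in fields.items())
    lines = [
        f"struct {name} *new{name}({params}) {{",
        # allocate() is a placeholder for whatever allocator the project uses.
        f"    struct {name} *x = allocate(sizeof(*x));",
    ]
    lines += [f"    x->{f} = {f};" for f in fields]
    lines += ["    return x;", "}"]
    return "\n".join(lines)


def emit_mark(name, fields, structs):
    """Emit a mark function that recurses into struct-typed fields only.

    A real collector would also set and test a mark bit to stop on cycles;
    that detail is left out of this sketch.
    """
    lines = [
        f"void mark{name}(struct {name} *x) {{",
        "    if (x == NULL) return;",
    ]
    lines += [f"    mark{t}(x->{f});" for f, t in fields.items() if t in structs]
    lines.append("}")
    return "\n".join(lines)


def emit_mark_dispatch(structs):
    """Emit a generic mark function that switches on a type tag."""
    lines = ["void markObj(struct Header *h) {", "    switch (h->type) {"]
    for name in structs:
        lines.append(f"        case OBJTYPE_{name.upper()}:")
        lines.append(f"            mark{name}((struct {name} *) h);")
        lines.append("            break;")
    lines += ["    }", "}"]
    return "\n".join(lines)


for struct_name, struct_fields in STRUCTS.items():
    print(emit_new(struct_name, struct_fields, STRUCTS))
    print(emit_mark(struct_name, struct_fields, STRUCTS))
print(emit_mark_dispatch(STRUCTS))
```

Adding a field to a declaration changes the constructor signature, the mark function and anything else derived from it in one step, which is the kind of edit that is tedious and error-prone to do by hand.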

This all means that it's relatively easy to make fairly sweeping changes to the various trees without tediously rewriting all of the above by hand.