Compiler restructuring #1

Open
wants to merge 30 commits into
base: main

Commits on Jan 13, 2022

  1. Update README

    Jackojc committed Jan 13, 2022
    763de2e
  2. 42c237a
  3. 9c443c1
  4. Remove old includes

    Jackojc committed Jan 13, 2022
    48124f4
  5. Update examples

    Jackojc committed Jan 13, 2022
    9d8132d
  6. Major restructuring

    I've reworked most of the compiler to make it more
    maintainable and less complex.
    
    The new compiler has a text-based IR, which allows for a
    modular architecture. Users can plug in their own passes
    and choose the ordering of existing passes.
    
    There is currently no x86-64 codegen; this commit is
    just a foundation for the compiler going forward.
    
    The plan is to implement another IR in three-address-code form
    for backends to consume, which should ease register allocation
    and lowering to assembly.
    
    - New modular architecture
    - Simplified implementation
    - Stack-based IR with textual format
    Jackojc committed Jan 13, 2022
    d1fcad1

Commits on Jan 16, 2022

  1. Renamed core words to cp, mv & rm and added stdlib.klx.

    Renamed the core words to avoid too many naming
    conflicts and to describe their intent better.
    
    Added a very basic stdlib file which provides some
    common arithmetic and stack manipulation words.
    
    Updated the syntax highlighting file accordingly
    and added a new highlighting file for the
    klaxon IR format (KIR).
    
    Renamed the compile.sh script to klx. It now runs
    the m4 preprocessor on the source file before passing
    it to klaxon, which enables includes and macros.
    
    Fixed an issue when parsing type annotations.
    Previously, annotations with no out values caused
    the compiler to report that it expected an identifier,
    even though returning no values is valid.
    
    Added a dedicated locale string for type annotation errors.
    
    Removed extraneous space being printed in the KIR serialiser.
    Jackojc committed Jan 16, 2022
    d3ce1bc

Commits on Jan 17, 2022

  1. Add CFG visualiser

    Jackojc committed Jan 17, 2022
    8b231c9
  2. Update KIR highlighter

    Jackojc committed Jan 17, 2022
    e1ee9f3
  3. Fix a bug in print formatting

    Print formatting previously would not escape
    sequences of closing braces correctly.
    
    For example: `printlnfmt("{}}}", "foo");` _should_
    have produced `foo}` but instead produced `foo`.
    Jackojc committed Jan 17, 2022
    1c6f083
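
The escaping rule described in the commit above can be sketched as follows. This is a minimal illustration, not the project's `printlnfmt`: the function name `format_braces` and the rule that `{{` and `}}` stand for literal braces are assumptions made for the example.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not the project's printlnfmt: substitutes "{}" with the
// next argument and treats "{{" / "}}" as escaped literal braces (assumed rule).
std::string format_braces(std::string_view fmt, const std::vector<std::string>& args) {
    std::string out;
    std::size_t next_arg = 0;

    for (std::size_t i = 0; i < fmt.size(); ++i) {
        const char c = fmt[i];
        const bool has_next = i + 1 < fmt.size();

        if (c == '{' && has_next && fmt[i + 1] == '}') {
            // Placeholder: emit the next argument (or nothing if we ran out).
            if (next_arg < args.size())
                out += args[next_arg++];
            ++i;  // skip the '}' that closes the placeholder
        }
        else if ((c == '{' || c == '}') && has_next && fmt[i + 1] == c) {
            // Escaped brace: "{{" -> "{", "}}" -> "}".
            out += c;
            ++i;  // skip the second brace of the pair
        }
        else {
            out += c;
        }
    }
    return out;
}
```

With this sketch, `format_braces("{}}}", {"foo"})` consumes the placeholder first and then the escaped pair, giving `foo}` as the commit message expects.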

Commits on Jan 18, 2022

  1. Add new ops to stdlib

    Jackojc committed Jan 18, 2022
    5f6fc82
  2. Remove arg & out instructions and fixed loop code gen

    Removed the arg and out instructions in favour of
    just keeping the cp, mv and rm instructions around
    for the backend to work with.
    
    Fixed incorrect code generation for loops.
    Jackojc committed Jan 18, 2022
    856464d
  3. Add indirect branch elimination pass

    Added a new optimisation pass to collapse indirect
    jumps to a block which then unconditionally jumps
    to another block.
    
    This is very useful for heavily nested if-else chains.
    
    Give up when trying to do constant folding beyond
    a function call. The problem with folding past the
    bounds of a call is that the function may produce
    values that are only known at run time, in which
    case there is nothing for the constant folder to
    reduce at compile time.
    
    Removed any notion of arg and out instructions.
    Jackojc committed Jan 18, 2022
    0e3a088
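
The indirect branch elimination described in the commit above might look roughly like this. It is a sketch over a hypothetical, simplified block model (`Block`, `only_jumps`, `targets` are invented for illustration and are not the KIR data structures): any branch to a block that only jumps onward is redirected to where control finally lands.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified block model (not the KIR data structures).
struct Block {
    bool only_jumps = false;           // body is a single unconditional jump
    std::vector<std::size_t> targets;  // branch targets; exactly one if only_jumps
};

// Follow a chain of jump-only blocks to the block where control finally lands.
// The hop limit guards against cycles of jump-only blocks.
std::size_t resolve_target(const std::vector<Block>& blocks, std::size_t id) {
    for (std::size_t hops = 0; hops < blocks.size() && blocks[id].only_jumps; ++hops)
        id = blocks[id].targets.front();
    return id;
}

// Returns a rewritten copy in which every branch skips intermediate
// jump-only blocks. The input is left untouched.
std::vector<Block> eliminate_indirect_branches(const std::vector<Block>& blocks) {
    std::vector<Block> out = blocks;
    for (Block& b : out)
        for (std::size_t& target : b.targets)
            target = resolve_target(blocks, target);
    return out;
}
```

After this rewrite the jump-only blocks are typically unreachable, so a later dead code pass could drop them. Nested if-else chains tend to produce many such blocks, which is presumably why the pass helps there.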
  4. Update klx script

    Jackojc committed Jan 18, 2022
    c9acdb6
  5. 5d1540b
  6. 41980ee
  7. ee4ba78
  8. 878c39e

Commits on Jan 19, 2022

  1. d6aca50

Commits on Jan 22, 2022

  1. e088654
  2. Restructuring optimiser

    Jackojc committed Jan 22, 2022
    eab1444
  3. Update klx runner script

    Jackojc committed Jan 22, 2022
    9f28215

Commits on Feb 1, 2022

  1. Cleanup and simplification of lib.hpp & stack effect annotations in the IR

    Unified Tokens, Ops and IR Tokens into a single enum class
    so we no longer need to do pesky mappings between them. We
    can just use the same enum value right through from the lexer
    to the IR generation.
    
    Merged the lexer implementation for the IR and source
    representations into the same class and now just use
    a templated flag to pick the implementation we want
    which reduces a lot of code duplication.
    
    Added some constructor overloads for Op so that
    we can construct instructions that need both a
    string view and integer field.
    
    Blocks, calls and definitions now have a stack effect
    annotation in the IR for simplifying consumption by
    a backend.
    
    Added instruction_block and instruction_end functions to
    simplify annotating blocks with their stack effect during
    code generation.
    
    Block numbers are now function local and start from zero
    instead of being globally numbered like before.
    
    Use more consistent naming for library functions and types.
    
    Renamed EOF to TERMINATOR to avoid conflicting with
    the standard macro of the same name.
    Jackojc committed Feb 1, 2022
    5bf745b
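
The idea of one lexer class with a templated flag choosing between the source and IR implementations could be sketched like this. Everything here is hypothetical: the names `Mode` and `Lexer`, and in particular the comment markers (`#` for source, `;` for IR), are invented purely to give the two modes something to differ on.

```cpp
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical flag selecting which representation the lexer accepts.
enum class Mode { Source, IR };

template <Mode M>
class Lexer {
public:
    explicit Lexer(std::string_view src) : src_(src) {}

    // Splits the input into whitespace-separated words, skipping comments.
    std::vector<std::string> words() const {
        std::vector<std::string> out;
        std::string current;
        bool in_comment = false;

        for (const char c : src_) {
            if (in_comment) {
                in_comment = (c != '\n');  // comments run to the end of the line
                continue;
            }
            if (is_comment_start(c) || std::isspace(static_cast<unsigned char>(c))) {
                if (!current.empty()) {
                    out.push_back(current);
                    current.clear();
                }
                in_comment = is_comment_start(c);
                continue;
            }
            current += c;
        }
        if (!current.empty())
            out.push_back(current);
        return out;
    }

private:
    // The only mode-specific code: resolved at compile time, so the shared
    // loop above is written once for both representations.
    static constexpr bool is_comment_start(char c) {
        if constexpr (M == Mode::Source)
            return c == '#';  // assumed source comment marker
        else
            return c == ';';  // assumed IR comment marker
    }

    std::string_view src_;
};
```

The shared loop is written once, and only the small mode-specific hook differs between `Lexer<Mode::Source>` and `Lexer<Mode::IR>`, which is the kind of duplication the commit describes removing.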
  2. Rename main.cpp to klx.cpp

    Jackojc committed Feb 1, 2022
    58d2857
  3. Major restructuring of optimisation passes

    Function inlining and dead code elimination were previously
    broken and the code was awkward to work with due to having
    to try and preserve consistency in the same buffer.
    
    Three issues have been addressed in this commit:
    1. Function inlining now renumbers blocks correctly
    2. Dead code elimination now only retains functions with "main" as an ancestor
    3. Iterators to the IR are now stable due to the use of a double buffer approach
    
    Function inlining was previously broken because blocks were not
    renumbered after being inlined. Multiple calls to the same
    inlined function would therefore result in duplicate blocks,
    which would break the control flow of the program.
    
    The new inliner also doesn't count block/def/end/ret instructions.
    
    Dead code elimination was previously broken because it preserved
    any function that was called, even when the call was not reachable
    from a common ancestor ("main" in this case). All it would take to
    preserve a function was to call it _anywhere_ in the program, even
    if the parent function of that call was itself dead.
    
    We now use a double-buffering-like approach to optimisation passes.
    The input IR is passed in and remains unchanged, while the output
    IR is mutated and becomes the input buffer for the next pass.
    This gives us some rather nice properties, like stability of
    references, which makes inlining in particular very easy.
    
    Constant folding and indirect branch elimination have yet to be
    moved over to the new architecture but should be fairly easy.
    Jackojc committed Feb 1, 2022
    280df65
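
Two of the points in the commit above, keeping only functions with "main" as an ancestor and leaving the input buffer untouched while writing a separate output, can be illustrated together. This is a sketch over a hypothetical call-graph model (`Function`, `IR` and `eliminate_dead_functions` are invented names), not the project's pass.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified IR: each function is a name plus the names it calls.
struct Function {
    std::string name;
    std::vector<std::string> calls;
};
using IR = std::vector<Function>;

// Dead function elimination in a double-buffer style: `in` is read-only and
// the surviving functions are copied into a fresh output buffer.
IR eliminate_dead_functions(const IR& in, const std::string& root = "main") {
    std::map<std::string, const Function*> by_name;
    for (const Function& f : in)
        by_name[f.name] = &f;

    // Depth-first walk of the call graph starting from the root.
    std::set<std::string> live;
    std::vector<std::string> worklist{root};
    while (!worklist.empty()) {
        const std::string name = worklist.back();
        worklist.pop_back();

        const auto it = by_name.find(name);
        if (it == by_name.end() || !live.insert(name).second)
            continue;  // unknown function or already visited

        for (const std::string& callee : it->second->calls)
            worklist.push_back(callee);
    }

    // Copy live functions into the output buffer, preserving original order.
    IR out;
    for (const Function& f : in)
        if (live.count(f.name))
            out.push_back(f);
    return out;
}
```

Because the pass never mutates `in`, pointers into it stay valid for the whole walk. A function called only from a dead function is never reached from the root, so it is dropped along with its caller.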
  4. Update control flow graph generator to use relative blocks

    Updated the CFG generator to work with relative block numbers
    by concatenating the function name to the block ID.
    
    Also added weights to the nodes to try and make the generated
    graphs look a bit nicer.
    Jackojc committed Feb 1, 2022
    d233401
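
Since block numbers restart at zero in every function, a graph node needs the function name to stay unique. A minimal sketch of that naming, assuming Graphviz DOT output and an underscore separator (neither is confirmed by the commit; `node_id` and `dot_edge` are hypothetical helpers):

```cpp
#include <cstddef>
#include <string>

// Hypothetical naming scheme: the separator and format are assumptions.
std::string node_id(const std::string& function, std::size_t block) {
    return function + "_" + std::to_string(block);
}

// Emits one Graphviz DOT edge between two blocks of the same function.
std::string dot_edge(const std::string& function, std::size_t from, std::size_t to) {
    return "\"" + node_id(function, from) + "\" -> \"" + node_id(function, to) + "\";";
}
```

Block 0 of `main` and block 0 of another function then map to distinct nodes, so their edges no longer collide in the generated graph.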
  5. Simplified lexing greatly and avoid hard-coding strings

    Using a simpler architecture for the lexer which allows us
    to specify token strings in a single place and have it work
    everywhere. This is in contrast to the previous lexer which
    required updating multiple unrelated pieces of the code in
    order to change tokens.
    
    Switch from shorthand names `cp`, `mv` and `rm` to
    `copy`, `move` and `remove`.
    
    Fixed issue where the IR parser would only accept non-keyword
    identifiers for user defined functions. This meant functions
    named "block" for example would result in a parsing error.
    
    "copy", "move" and "remove" are now considered proper keywords
    and as such cannot be used as identifiers.
    
    Removed any hard-coding of strings in calls to error functions;
    the appropriate string representation of a token is now
    looked up instead.
    Jackojc committed Feb 1, 2022
    48688f3
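
One way to keep token strings in a single place, as the commit above describes, is a table indexed by the token enum that both the lexer and the error reporting consult. The sketch below is an assumption about the shape of such a table, not the project's code; only the spellings `copy`, `move` and `remove` come from the commit message.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical token set; `Count` is a sentinel giving the table size.
enum class Token : std::size_t { Copy, Move, Remove, Count };

// Single source of truth: the spelling of each token, indexed by enum value.
inline constexpr std::array<std::string_view, static_cast<std::size_t>(Token::Count)>
    token_strings = {"copy", "move", "remove"};

// Used by error reporting: no hard-coded strings at the call site.
constexpr std::string_view to_string(Token t) {
    return token_strings[static_cast<std::size_t>(t)];
}

// Used by the lexer: maps a word back to its keyword token, if it is one.
constexpr std::optional<Token> keyword_from(std::string_view text) {
    for (std::size_t i = 0; i < token_strings.size(); ++i)
        if (token_strings[i] == text)
            return static_cast<Token>(i);
    return std::nullopt;
}
```

Renaming a keyword then means editing one table entry, and both lookup directions pick up the change.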
  6. 37a28dc
  7. 2684e78

Commits on Feb 2, 2022

  1. Better symbol names

    Jackojc committed Feb 2, 2022
    bd65f1a