
feat: implement new IR for vyper (venom IR) #3659

Merged
merged 488 commits into from
Dec 1, 2023


@harkal (Collaborator) commented Oct 24, 2023

What I did

  • Designed a new IR for Vyper, intended to replace the current s-expression IR. The new IR (Venom) is in SSA form and is oriented towards the EVM and other stack-based machines.
  • Created a basic adapter that translates the s-expression IR (SIR) to Venom IR.
  • Implemented an EVM code generator that takes Venom IR as input.
  • The new path is activated with the --experimental-codegen command-line flag.

How I did it

This branch contains the ongoing work to replace Vyper's current IR implementation with a new one. Venom departs from the original s-expression format in favour of a basic-block-oriented representation with strictly enforced SSA, which makes later optimisations easier to implement.

The implementation takes the original s-expression IR as input; it is not generated directly from the AST.
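The two structural rules described above can be illustrated with a small Python sketch. Every basic block ends in exactly one terminator, and every variable is assigned exactly once. Mutable source variables therefore get a fresh name per assignment. The names `Instruction`, `BasicBlock` and `check_ssa` are hypothetical and do not mirror the actual classes under `./vyper/venom/`. The terminator set is taken from the opcodes that end blocks in the example listing of this PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Union

Operand = Union[int, str]  # integer literal, or a variable / label name such as "%74"

# opcodes that end a basic block in the example listing (assumed set)
TERMINATORS = {"jmp", "jnz", "ret", "return", "revert"}


@dataclass
class Instruction:
    opcode: str
    operands: List[Operand]
    output: Optional[str] = None  # the SSA variable this instruction defines, if any


@dataclass
class BasicBlock:
    label: str
    instructions: List[Instruction] = field(default_factory=list)

    def is_terminated(self) -> bool:
        # a block must end in exactly one control-flow instruction
        return bool(self.instructions) and self.instructions[-1].opcode in TERMINATORS


def check_ssa(blocks: List[BasicBlock]) -> Set[str]:
    """Return all defined variables; raise ValueError on a reassignment
    or on a block that does not end in a terminator."""
    defined: Set[str] = set()
    for bb in blocks:
        if not bb.is_terminated():
            raise ValueError(f"block {bb.label!r} does not end in a terminator")
        for inst in bb.instructions:
            if inst.output is None:
                continue
            if inst.output in defined:
                raise ValueError(f"{inst.output} is assigned more than once")
            defined.add(inst.output)
    return defined


# shaped like the internal_foo2 block of the listing, abbreviated (overflow check omitted)
foo2 = BasicBlock("internal_foo2", [
    Instruction("param", ["x"], "%112"),
    Instruction("param", ["y"], "%113"),
    Instruction("param", ["return_pc"], "%115"),
    Instruction("mul", ["%112", "%113"], "%116"),
    Instruction("ret", ["%115", "%116"]),
])
```

In the listing, `res` in `foo1` is assigned several times in the source. As a result, it appears as several distinct variables, and the two versions that reach block `14` are merged with `select`.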

In its current state of implementation, the following code:

@internal
def foo1(x: uint256, y: uint256) -> uint256:
    res: uint256 = x + y
    a: uint256 = x
    res = res * y + (x + y)
    a = (res - y) * 16
    if a > 10:
        a = 10 + y
        res = res - 10
    return res

@internal
def foo2(x: uint256, y: uint256) -> uint256:
    return x * y

@external
def bar(one: Bytes[16], two: Bytes[16], y: uint256) -> Bytes[16]:
    a: uint256 = self.foo2(self.foo1(10, y), 30)
    if a > 10:
        return one
    return two

compiles down to the following SSA-based IR:

Venom IR
IRFunction: global
global:  IN=[] OUT=[selector_bucket_0, fallback] 
    %1 = calldataload 0
    %2 = shr 224, %1
    jmp label %selector_bucket_0

selector_bucket_0:  IN=[global] OUT=[1, 2] 
    %3 = xor %2, 1579456981
    %4 = iszero %3
    jnz label %1, label %2, %4

1:  IN=[selector_bucket_0] OUT=[9] 
    jmp label %9

2:  IN=[selector_bucket_0] OUT=[external_bar__Bytes_16__Bytes_16__uint256__common] 
    %5 = callvalue 
    %6 = calldatasize 
    %7 = lt %6, 164
    %8 = or %5, %7
    %9 = iszero %8
    assert %9
    jmp label %external_bar__Bytes_16__Bytes_16__uint256__common

external_bar__Bytes_16__Bytes_16__uint256__common:  IN=[2] OUT=[4, 5, internal_foo1__uint256_uint256__runtime, internal_foo2__uint256_uint256__runtime] 
    %10 = calldataload 4
    %11 = add 4, %10
    %12 = calldataload %11
    %13 = gt %12, 16
    %14 = iszero %13
    assert %14
    %15 = calldataload %11
    %16 = add %15, 32
    %17 = alloca 192, 64
    calldatacopy %17, %11, %16
    %18 = calldataload 36
    %19 = add 4, %18
    %20 = calldataload %19
    %21 = gt %20, 16
    %22 = iszero %21
    assert %22
    %23 = calldataload %19
    %24 = add %23, 32
    %25 = alloca 256, 64
    calldatacopy %25, %19, %24
    %26 = calldataload 68
    %27 = invoke 352, %26, 10, label %internal_foo1__uint256_uint256__runtime
    %28 = invoke 384, 30, %27, label %internal_foo2__uint256_uint256__runtime
    %29 = %28
    %30 = lt %29, 11
    %31 = iszero %30
    jnz label %4, label %5, %31

4:  IN=[external_bar__Bytes_16__Bytes_16__uint256__common] OUT=[7] 
    jmp label %7

5:  IN=[external_bar__Bytes_16__Bytes_16__uint256__common] OUT=[] 
    %32 = 32
    mstore 352, %32
    %33 = add 352, %32
    %34 = alloca 192, 64
    %35 = mload %34
    mstore %33, %35
    %36 = add %33, 32
    %37 = alloca 192, 64
    %38 = add 32, 192
    %39 = mload %38
    mstore %36, %39
    %40 = mload %33
    %41 = add %33, 32
    %42 = add %41, %40
    %43 = calldatasize 
    %44 = sub 0, %40
    %45 = and %44, 31
    calldatacopy %42, %43, %45
    %46 = mload %33
    %47 = add 32, %46
    %48 = ceil32 %47
    %49 = add %32, %48
    %50 = %49
    return 352, %50

7:  IN=[4] OUT=[] 
    %51 = 32
    mstore 352, %51
    %52 = add 352, %51
    %53 = alloca 256, 64
    %54 = mload %53
    mstore %52, %54
    %55 = add %52, 32
    %56 = alloca 256, 64
    %57 = add 32, 256
    %58 = mload %57
    mstore %55, %58
    %59 = mload %52
    %60 = add %52, 32
    %61 = add %60, %59
    %62 = calldatasize 
    %63 = sub 0, %59
    %64 = and %63, 31
    calldatacopy %61, %62, %64
    %65 = mload %52
    %66 = add 32, %65
    %67 = ceil32 %66
    %68 = add %51, %67
    %69 = %68
    return 352, %69

9:  IN=[1] OUT=[fallback] 
    jmp label %fallback

fallback:  IN=[global, 9] OUT=[] 
    revert 0, 0

internal_foo1__uint256_uint256__runtime:  IN=[external_bar__Bytes_16__Bytes_16__uint256__common] OUT=[12, 13] 
    %70 = param  <x>
    %71 = param  <y>
    %72 = param  <return_buffer>
    %73 = param  <return_pc>
    %74 = add %70, %71
    %75 = lt %74, %70
    %76 = iszero %75
    assert %76
    %77 = %74
    %79 = mul %77, %71
    %80 = div %79, %71
    %81 = eq %80, %77
    %82 = iszero %71
    %83 = or %81, %82
    assert %83
    %84 = add %70, %71
    %85 = lt %84, %70
    %86 = iszero %85
    assert %86
    %87 = add %79, %84
    %88 = lt %87, %79
    %89 = iszero %88
    assert %89
    %90 = %87
    %91 = sub %90, %71
    %92 = gt %91, %90
    %93 = iszero %92
    assert %93
    %94 = shl 4, %91
    %95 = shr 4, %94
    %96 = xor %95, %91
    %97 = iszero %96
    assert %97
    %98 = %94
    %99 = lt %98, 11
    %100 = iszero %99
    jnz label %12, label %13, %100

12:  IN=[internal_foo1__uint256_uint256__runtime] OUT=[14] 
    jmp label %14

13:  IN=[internal_foo1__uint256_uint256__runtime] OUT=[14] 
    %101 = add 10, %71
    %102 = lt %101, 10
    %103 = iszero %102
    assert %103
    %105 = sub %90, 10
    %106 = gt %105, %90
    %107 = iszero %106
    assert %107
    %108 = %105
    jmp label %14

14:  IN=[12, 13] OUT=[] 
    %110 = select %90, label %12, %108, label %13
    mstore %72, %110
    %111 = mload %72
    ret %73, %111

internal_foo2__uint256_uint256__runtime:  IN=[external_bar__Bytes_16__Bytes_16__uint256__common] OUT=[] 
    %112 = param  <x>
    %113 = param  <y>
    %114 = param  <return_buffer>
    %115 = param  <return_pc>
    %116 = mul %112, %113
    %117 = div %116, %113
    %118 = eq %117, %112
    %119 = iszero %113
    %120 = or %118, %119
    assert %120
    mstore %114, %116
    %121 = mload %114
    ret %115, %121

This IR in turn results in the following opcodes:


PUSH0 CALLDATALOAD PUSH1 0xE0 SHR PUSH4 0x5E2499D5 SWAP1 XOR ISZERO PUSH2 0x015 JUMPI PUSH0 PUSH0 SWAP1 REVERT JUMPDEST PUSH1 0xA4 CALLDATASIZE LT CALLVALUE OR PUSH2 0x1B9 JUMPI PUSH1 0x10 PUSH1 0x4 CALLDATALOAD PUSH1 0x4 ADD DUP1 CALLDATALOAD SWAP2 SWAP1 SWAP2 GT PUSH2 0x1B9 JUMPI PUSH1 0x20 DUP2 CALLDATALOAD ADD PUSH1 0xC0 DUP1 SWAP3 SWAP1 CALLDATACOPY PUSH1 0x10 PUSH1 0x24 CALLDATALOAD PUSH1 0x4 ADD DUP1 CALLDATALOAD SWAP2 SWAP1 SWAP2 GT PUSH2 0x1B9 JUMPI PUSH1 0x20 DUP2 CALLDATALOAD ADD PUSH2 0x10 DUP1 SWAP3 SWAP1 CALLDATACOPY PUSH1 0xA PUSH1 0x44 CALLDATALOAD PUSH2 0x160 PUSH2 0x06A PUSH2 0x110 JUMP JUMPDEST PUSH1 0x1E PUSH2 0x180 PUSH2 0x077 PUSH2 0x19C JUMP JUMPDEST PUSH1 0xB SWAP1 LT ISZERO PUSH2 0x0C8 JUMPI PUSH1 0x20 PUSH2 0x160 DUP2 SWAP1 MSTORE PUSH2 0x160 DUP2 SWAP1 ADD SWAP2 MLOAD DUP3 MSTORE PUSH1 0x20 DUP3 ADD PUSH2 0x10 PUSH1 0x20 ADD MLOAD SWAP1 MSTORE DUP2 MLOAD PUSH1 0x1F PUSH0 DUP3 SWAP1 SUB AND CALLDATASIZE PUSH1 0x20 DUP6 ADD SWAP3 SWAP1 SWAP3 ADD SWAP1 SWAP2 SWAP1 CALLDATACOPY SWAP1 MLOAD PUSH1 0x20 ADD PUSH1 0x1F ADD PUSH1 0x1F NOT AND SWAP1 ADD PUSH2 0x160 RETURN JUMPDEST PUSH1 0x20 PUSH2 0x160 DUP2 SWAP1 MSTORE PUSH2 0x160 DUP2 SWAP1 ADD SWAP3 MLOAD DUP4 MSTORE PUSH1 0x20 DUP4 ADD PUSH1 0xC0 PUSH1 0x20 ADD MLOAD SWAP1 MSTORE DUP3 MLOAD PUSH1 0x1F PUSH0 DUP3 SWAP1 SUB AND CALLDATASIZE PUSH1 0x20 DUP7 ADD SWAP3 SWAP1 SWAP3 ADD SWAP1 SWAP2 SWAP1 CALLDATACOPY SWAP2 MLOAD PUSH1 0x20 ADD PUSH1 0x1F ADD PUSH1 0x1F NOT AND SWAP1 SWAP2 ADD PUSH2 0x160 RETURN JUMPDEST DUP3 DUP5 ADD DUP5 DUP2 LT PUSH2 0x1B9 JUMPI DUP4 ISZERO SWAP1 DUP5 DUP2 MUL DUP6 DUP2 DIV SWAP2 SWAP1 SWAP2 EQ SWAP2 SWAP1 SWAP2 OR ISZERO PUSH2 0x1B9 JUMPI DUP4 DUP6 ADD DUP1 SWAP6 SWAP1 LT PUSH2 0x1B9 JUMPI DUP1 SWAP5 SWAP1 ADD DUP1 SWAP5 SWAP1 LT PUSH2 0x1B9 JUMPI SWAP3 DUP3 DUP2 SUB DUP2 DUP2 GT PUSH2 0x1B9 JUMPI PUSH1 0x4 DUP2 SWAP1 SHL PUSH1 0x4 DUP2 SWAP1 SHR SWAP2 SWAP1 SWAP2 XOR PUSH2 0x1B9 JUMPI PUSH1 0xB SWAP1 LT ISZERO SWAP4 SWAP2 SWAP1 SWAP4 PUSH2 0x17B JUMPI SWAP1 SWAP2 
SWAP3 JUMPDEST DUP2 MSTORE MLOAD SWAP1 JUMP JUMPDEST PUSH1 0xA PUSH1 0xA SWAP4 SWAP1 ADD SWAP3 SWAP1 SWAP3 LT PUSH2 0x1B9 JUMPI PUSH1 0xA DUP4 SUB DUP1 SWAP4 SWAP1 GT PUSH2 0x1B9 JUMPI SWAP2 PUSH2 0x175 JUMP JUMPDEST DUP3 ISZERO DUP4 DUP6 MUL DUP1 SWAP5 SWAP1 DIV SWAP5 SWAP1 SWAP5 EQ SWAP4 SWAP1 SWAP4 OR ISZERO PUSH2 0x1B9 JUMPI SWAP1 DUP2 MSTORE MLOAD SWAP1 JUMP JUMPDEST PUSH0 DUP1 REVERT

The bytecode above is 446 bytes; compiling the same contract with the original codegen produces 530 bytes, so for this example the new path is about 16% smaller.
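Most of the Venom listing above is built from two recurring patterns. The first is the selector dispatch at the top of `global`. The second is checked arithmetic, in which each `add`, `sub`, `mul` or shift is followed by a comparison and an `assert`.

The Python model below mirrors those patterns instruction by instruction, with the corresponding listing lines in the comments. `Revert`, `check` and the `checked_*` functions are hypothetical helpers for illustration only. A failed `assert` is modelled as a raised exception rather than an actual EVM revert.

```python
MASK = (1 << 256) - 1  # EVM words are 256 bits; arithmetic wraps modulo 2**256


class Revert(Exception):
    """Stands in for a failed `assert`, which reverts the EVM call."""


def check(cond: bool) -> None:
    if not cond:
        raise Revert()


def selector(calldata: bytes) -> int:
    # %1 = calldataload 0 ; %2 = shr 224, %1  (calldataload zero-pads past the end)
    word = int.from_bytes(calldata[:32].ljust(32, b"\x00"), "big")
    return word >> 224


def checked_add(x: int, y: int) -> int:
    # %74 = add %70, %71 ; %75 = lt %74, %70 ; assert (iszero %75)
    s = (x + y) & MASK
    check(not s < x)
    return s


def checked_sub(x: int, y: int) -> int:
    # %91 = sub %90, %71 ; %92 = gt %91, %90 ; assert (iszero %92)
    d = (x - y) & MASK
    check(not d > x)
    return d


def checked_mul(x: int, y: int) -> int:
    # %79 = mul %77, %71 ; %80 = div %79, %71 ; assert (eq %80, %77) or (iszero %71)
    p = (x * y) & MASK
    check(y == 0 or p // y == x)
    return p


def checked_mul16(x: int) -> int:
    # %94 = shl 4, %91 ; %95 = shr 4, %94 ; assert iszero (xor %95, %91)
    p = (x << 4) & MASK
    check((p >> 4) == x)
    return p


def foo1(x: int, y: int) -> int:
    # mirrors the Vyper source of foo1; `a = 10 + y` is computed only for its overflow check
    res = checked_add(x, y)
    res = checked_add(checked_mul(res, y), checked_add(x, y))
    a = checked_mul16(checked_sub(res, y))
    if a > 10:                      # %99 = lt %98, 11 ; %100 = iszero %99
        checked_add(10, y)
        res = checked_sub(res, 10)
    return res
```

For example, `foo1(10, 2)` returns 26; this is the call `bar` makes when `y` is 2. By contrast, `checked_add(2**256 - 1, 1)` raises `Revert`. The selector constant `1579456981` in the listing is `0x5E2499D5`, the same value pushed by `PUSH4` in the opcode output.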

The basic-block-based IR above is then passed to the next stage, which emits EVM assembly; that assembly is in turn passed to the original compiler's assembler to produce the final opcodes.
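The last stage described above, turning EVM assembly into bytes, is mostly a table lookup. The toy assembler below is only an illustration of that step. It covers the opcodes that appear in the opcode listing above and takes push immediates literally as written there. It does not resolve labels or compute jump offsets, which the real Vyper assembler handles. Opcode byte values follow the EVM specification; the names `opcode` and `assemble` are hypothetical.

```python
from typing import Tuple

# byte values of the non-parameterised opcodes used in the listing
OPCODES = {
    "ADD": 0x01, "MUL": 0x02, "SUB": 0x03, "DIV": 0x04,
    "LT": 0x10, "GT": 0x11, "EQ": 0x14, "ISZERO": 0x15,
    "AND": 0x16, "OR": 0x17, "XOR": 0x18, "NOT": 0x19,
    "SHL": 0x1B, "SHR": 0x1C,
    "CALLVALUE": 0x34, "CALLDATALOAD": 0x35, "CALLDATASIZE": 0x36,
    "CALLDATACOPY": 0x37, "MLOAD": 0x51, "MSTORE": 0x52,
    "JUMP": 0x56, "JUMPI": 0x57, "JUMPDEST": 0x5B, "PUSH0": 0x5F,
    "RETURN": 0xF3, "REVERT": 0xFD,
}


def opcode(name: str) -> Tuple[int, int]:
    """Return (opcode byte, number of immediate bytes that follow it)."""
    if name.startswith("PUSH") and name != "PUSH0":
        n = int(name[4:])
        if not 1 <= n <= 32:
            raise ValueError(f"bad push width: {name}")
        return 0x5F + n, n          # PUSH1 = 0x60 ... PUSH32 = 0x7F
    if name.startswith("DUP"):
        return 0x7F + int(name[3:]), 0   # DUP1 = 0x80
    if name.startswith("SWAP"):
        return 0x8F + int(name[4:]), 0   # SWAP1 = 0x90
    return OPCODES[name], 0


def assemble(listing: str) -> bytes:
    """Encode a whitespace-separated opcode listing into bytecode."""
    tokens = listing.split()
    out = bytearray()
    i = 0
    while i < len(tokens):
        op, width = opcode(tokens[i])
        out.append(op)
        if width:
            # immediates are written in hex, e.g. "0xE0"; pad to the push width
            out += int(tokens[i + 1], 16).to_bytes(width, "big")
            i += 2
        else:
            i += 1
    return bytes(out)
```

For instance, the first instructions of the listing, `PUSH0 CALLDATALOAD PUSH1 0xE0 SHR`, encode to the five bytes `5f 35 60 e0 1c`.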

The compiler now performs a full liveness analysis on the IR to obtain the information it needs to optimise its assembly output. Stack scheduling is performed by traversing the data-flow graph (DFG) derived from the IR, which minimises the stack manipulation needed to put operands in place. Further scheduling optimisations are still to come.
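The liveness analysis described above can be sketched as the textbook iterative backward data-flow analysis. It is not Venom's actual implementation. The model below is hypothetical and abbreviated. It encodes the control-flow graph of `internal_foo1` from the listing: blocks `12` and `13` both jump to `14`, which joins the two versions of `res` with `select`. It collapses the long arithmetic chains into single instructions. Each `select` operand is treated as used at the end of the predecessor it comes from, which is the usual convention for phi-like instructions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# (output variable or None, variables used) -- literals and labels are left out
Inst = Tuple[Optional[str], List[str]]


@dataclass
class Block:
    label: str
    succs: List[str]
    insts: List[Inst]
    # for a block that starts with `select`: predecessor label -> variables taken along that edge
    phi_uses: Dict[str, Set[str]] = field(default_factory=dict)


def liveness(blocks: List[Block]):
    """live_out(B) = union over successors S of live_in(S) plus S's select operands for edge B -> S;
    live_in(B) = uses(B) | (live_out(B) - defs(B)), computed by walking B bottom-up."""
    by_label = {b.label: b for b in blocks}
    live_in: Dict[str, Set[str]] = {b.label: set() for b in blocks}
    live_out: Dict[str, Set[str]] = {b.label: set() for b in blocks}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for b in reversed(blocks):
            out: Set[str] = set()
            for s in b.succs:
                out |= live_in[s] | by_label[s].phi_uses.get(b.label, set())
            live = set(out)
            for output, uses in reversed(b.insts):
                live.discard(output)    # a definition ends the live range above it
                live |= set(uses)
            if out != live_out[b.label] or live != live_in[b.label]:
                live_out[b.label], live_in[b.label] = out, live
                changed = True
    return live_in, live_out


blocks = [
    Block("foo1", ["12", "13"], [
        ("%70", []), ("%71", []), ("%72", []), ("%73", []),  # params x, y, return_buffer, return_pc
        ("%90", ["%70", "%71"]),   # res, arithmetic chain collapsed
        ("%100", ["%90"]),         # a > 10, chain collapsed
        (None, ["%100"]),          # jnz
    ]),
    Block("12", ["14"], []),       # just `jmp label %14`
    Block("13", ["14"], [("%101", ["%71"]), ("%105", ["%90"]), ("%108", ["%105"])]),
    Block("14", [], [("%110", []), (None, ["%72", "%110"]), ("%111", ["%72"]), (None, ["%73", "%111"])],
          phi_uses={"12": {"%90"}, "13": {"%108"}}),
]
live_in, live_out = liveness(blocks)
```

The result shows, for instance, that `%108` is live only on the edge from block `13` into block `14`, while `%90` must survive into both branches. This is the kind of information a stack scheduler can use to decide which values to keep on the stack and for how long.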

Currently, we are seeing code-size reductions on the order of 20%-30% relative to the original SIR path, which we expect to translate into a comparable reduction in gas usage.

More optimization passes will be applied to the IR, such as dead code elimination, constant propagation and scalar evolution. Additionally, the code generator's stack scheduler can be optimized further to minimize gas usage and code size.
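On SSA code, passes like these can be quite small. The sketch below shows constant folding, a simple form of constant propagation, followed by dead code elimination over a single straight-line block. It uses a hypothetical `(output, opcode, operands)` tuple encoding and is not the design of the planned Venom passes, which also have to handle control flow; scalar evolution is not covered.

The dead-code pass must keep `assert` and other side-effecting instructions. As a result, an overflow check like the one guarding the otherwise unused `a = 10 + y` in `foo1` is preserved.

```python
from typing import List, Optional, Tuple, Union

Operand = Union[int, str]
Inst = Tuple[Optional[str], str, List[Operand]]  # (output, opcode, operands)

MASK = (1 << 256) - 1  # 256-bit wrap-around, as on the EVM

# pure opcodes this sketch can evaluate at compile time
FOLDABLE = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "mul": lambda a, b: (a * b) & MASK,
    "lt": lambda a, b: int(a < b),
    "gt": lambda a, b: int(a > b),
    "iszero": lambda a: int(a == 0),
}
# instructions that must never be deleted, even if their output is unused
SIDE_EFFECTS = {"assert", "mstore", "calldatacopy", "invoke",
                "jmp", "jnz", "ret", "return", "revert"}


def fold_constants(insts: List[Inst]) -> List[Inst]:
    consts = {}
    result: List[Inst] = []
    for out, op, args in insts:
        args = [consts.get(a, a) for a in args]      # substitute known constants
        if op in FOLDABLE and all(isinstance(a, int) for a in args):
            consts[out] = FOLDABLE[op](*args)        # later uses see the constant
            continue
        if op == "assert" and isinstance(args[0], int) and args[0] != 0:
            continue                                 # statically true assertion
        result.append((out, op, args))
    return result


def eliminate_dead_code(insts: List[Inst]) -> List[Inst]:
    used = set()
    kept: List[Inst] = []
    for out, op, args in reversed(insts):            # in SSA, uses follow definitions
        if op in SIDE_EFFECTS or out in used:
            kept.append((out, op, args))
            used.update(a for a in args if isinstance(a, str))
    return kept[::-1]


block = [
    ("%1", "add", [10, 20]),
    ("%2", "lt", ["%1", 10]),     # overflow check on a constant add
    ("%3", "iszero", ["%2"]),
    (None, "assert", ["%3"]),
    ("%4", "mul", ["%x", 2]),     # %x: hypothetical value unknown at compile time
    ("%5", "add", ["%1", "%x"]),  # never used afterwards
    (None, "mstore", [352, "%4"]),
    (None, "return", [352, 32]),
]
optimized = eliminate_dead_code(fold_constants(block))
```

Here the constant overflow check folds away entirely, the unused `%5` is removed, and only the multiplication, the store and the `return` remain.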

NOTES:

  • Variables that the previous compiler stage initially allocates in memory are all promoted to IR variables. With full context available, the IR -> bytecode passes then assign each of them to the stack or to memory again, or eliminate them altogether.
  • There are two scopes: global and function level.
  • The codegen rewrite also aims to change the calling convention used for internal calls.
  • The SIR-to-Venom conversion is neither complete nor final, and it is not the main aim of this PR. It is a stepping stone for implementing and testing the IR, with the eventual goal of having the frontend emit Venom directly.
  • The SIR-to-Venom translator currently translates enough code for the following tests to pass: tests/examples/company/test_company.py, tests/examples/tokens/test_erc20.py, tests/examples/auctions/test_simple_open_auction.py, tests/examples/storage/test_storage.py, tests/examples/storage/test_advanced_storage.py, tests/examples/name_registry/test_name_registry.py, tests/examples/crowdfund/test_crowdfund_example.py
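Promoting memory-allocated variables to IR variables, as in the first note, relies on replacing a memory round trip with the SSA value that was stored. A minimal store-to-load forwarding sketch within one basic block might look like the one below. It rests on these hypothetical assumptions:

- instructions are `(output, opcode, operands)` tuples;
- `mstore` operands are `address, value`, as printed in the Venom listing;
- every access is a 32-byte word;
- any opcode outside a small pure set may write memory.

The sketch only forwards loads. Deciding whether the stores themselves can be dropped needs knowledge of later reads, which it does not attempt.

```python
from typing import Dict, List, Optional, Tuple, Union

Operand = Union[int, str]
Inst = Tuple[Optional[str], str, List[Operand]]  # (output, opcode, operands)

# opcodes assumed not to touch memory; anything else conservatively may write it
PURE = {"add", "sub", "mul", "div", "lt", "gt", "eq", "iszero",
        "or", "and", "xor", "shl", "shr", "param"}


def _disjoint(a: Operand, b: Operand) -> bool:
    # two 32-byte words are provably disjoint only when both addresses
    # are constants at least 32 bytes apart
    return isinstance(a, int) and isinstance(b, int) and abs(a - b) >= 32


def forward_loads(insts: List[Inst]) -> List[Inst]:
    mem: Dict[Operand, Operand] = {}   # address -> value last stored there
    subst: Dict[str, Operand] = {}     # loaded variable -> forwarded value
    result: List[Inst] = []
    for out, op, args in insts:
        args = [subst.get(a, a) if isinstance(a, str) else a for a in args]
        if op == "mload" and args[0] in mem:
            subst[out] = mem[args[0]]  # the load becomes a plain SSA value
            continue
        if op == "mstore":
            addr, val = args           # operand order as in the listing: address, value
            mem = {k: v for k, v in mem.items() if _disjoint(k, addr)}
            mem[addr] = val
        elif op not in PURE and op != "mload":
            mem = {}                   # unknown effect: forget everything
        result.append((out, op, args))
    return result


# tail of internal_foo1 block 14 in the listing
block = [
    (None, "mstore", ["%72", "%110"]),
    ("%111", "mload", ["%72"]),
    (None, "ret", ["%73", "%111"]),
]
print(forward_loads(block))
# [(None, 'mstore', ['%72', '%110']), (None, 'ret', ['%73', '%110'])]
```

The same SSA name always denotes the same address, so the load from `%72` can be replaced by `%110`. Stores to possibly overlapping addresses, by contrast, make the sketch forget what it knew.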

Commit Message

feat: implement new IR for vyper (venom IR)

this commit implements a new IR for the vyper compiler. most of the
implementation is self-contained in the `./vyper/venom/` directory.

Venom IR is LLVM-"inspired", although we do not use LLVM on account of:

1) not wanting to introduce a large external dependency
2) no EVM backend exists for LLVM, so we would have to write one
   ourselves. see prior work at https://github.com/etclabscore/evm_llvm.
   fundamentally, LLVM is architected to target register machines; an
   EVM backend could conceivably be implemented, but it would always
   feel "bolted" on.
3) integration with LLVM would invariably be very complex
4) one advantage of using LLVM is getting multiple backends "for free",
   but in our case, none of the backends we are interested in
   (particularly EVM) have LLVM implementations.

that being said, Venom is close enough to LLVM that it would seem fairly
straightforward to pass "in-and-out" of LLVM, converting to LLVM to take
advantage of its optimization passes and/or analysis utilities, and then
converting back to Venom for final EVM emission, if that becomes
desirable down the line. it could even be provided as an "extra" -- if LLVM
is installed on the system and enabled for the build, pass to LLVM for
extra optimization, but otherwise keep the compiler self-contained.

for more details about the design and architecture of Venom IR, see
`./vyper/venom/README.md`.

note that this commit specifically focuses on the architecture, design
and implementation of Venom. that is, more focus was spent on
architecting the Venom compiler itself. the Vyper frontend does not emit
Venom natively yet; Venom emission is implemented as a translation step
from the current s-expr based IR to Venom. the translation is not
feature-complete, and may have bugs. that being said, vyper compilation
via Venom is experimentally available by passing the
`--experimental-codegen` flag to vyper on the CLI. incrementally
refactoring the codegen to use Venom instead of the earlier s-expr IR
will be the next area of focus of development.

---------

Co-authored-by: Charles Cooper <[email protected]>


@charles-cooper changed the title from "feat: new IR for vyper (venom IR)" to "feat: implement new IR for vyper (venom IR)" on Nov 30, 2023
Comment on lines +230 to +231
# if self.liveness:
# return f"{s: <30} # {self.liveness}"

Code scanning / CodeQL notice: Commented-out code

This comment appears to contain commented-out code.
@charles-cooper merged commit cbac5ab into vyperlang:master on Dec 1, 2023
84 checks passed