Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(avm): encode TS AVM instructions as bytecode, especially for testing #4115

Merged
merged 20 commits into from
Jan 18, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
refresh
dbanks12 committed Jan 17, 2024
commit 177c26ad4212a663a3f3de59ef7a70babf0734d2
18 changes: 9 additions & 9 deletions yellow-paper/docs/public-vm/avm.md
Original file line number Diff line number Diff line change
@@ -22,7 +22,7 @@ A **caller** is a contract call's initiator. The caller of an initial contract c

- [**Public contract bytecode**](#public-contract-bytecode) (aka AVM bytecode)
- [**Execution context**](#execution-context), outlining the AVM's environment and state
- [**Execution**](#execution), outlining control flow, gas tracking, halting, and reverting
- [**Execution**](#execution), outlining control flow, gas tracking, normal halting, and exceptional halting
- [**Initial contract calls**](#initial-contract-calls), outlining the initiation of a contract call from a public execution request
- [**Nested contract calls**](#nested-contract-calls), outlining the initiation of a contract call from an instruction as well as the processing of nested execution results, gas refunds, and state reverts

@@ -120,7 +120,7 @@ WorldState {
contracts: AztecAddress => {bytecode, portalAddress}, // read-only from within AVM
blockHeaders: Vector<BlockHeader>, // read-only from within AVM
publicStorage: (AztecAddress, field) => value, // read/write
l1ToL2Messages: (AztecAddress, field) => message, // read-only from within AVM
l1ToL2Messages: field => message, // read-only from within AVM
l2ToL1Messages: Vector<[field; <msg-length>]>, // append-only (no reads) from within AVM
noteHashes: Vector<field>, // append-only (no reads) from within AVM
nullifiers: Vector<field>, // append-only (no reads) from within AVM
@@ -216,7 +216,7 @@ machineState.daGasLeft = 0

An instruction's gas cost is meant to reflect the computational cost of generating a proof of its correct execution. For some instructions, this computational cost changes based on inputs. Here are some examples and important notes:
- [`JUMP`](./instruction-set/#isa-section-jump) is an example of an instruction with constant gas cost. Regardless of its inputs, the instruction always incurs the same `l1GasCost`, `l2GasCost`, and `daGasCost`.
- The [`SET`](./instruction-set/#isa-section-set) instruction operates on a different sized constant (based on its `dst-type`). Therefore, this instruction's gas cost increases with the size of its input.
- The [`SET`](./instruction-set/#isa-section-set) instruction operates on a different sized constant (based on its `dstTag`). Therefore, this instruction's gas cost increases with the size of its input.
- Instructions that operate on a data range of a specified "size" scale in cost with that size. An example of this is the [`CALLDATACOPY`](./instruction-set/#isa-section-calldatacopy) argument which copies `copySize` words from `environment.calldata` to `machineState.memory`.
- The [`CALL`](./instruction-set/#isa-section-call)/[`STATICCALL`](./instruction-set/#isa-section-call)/`DELEGATECALL` instruction's gas cost is determined by its `*Gas` arguments, but any gas unused by the nested contract call's execution is refunded after its completion ([more on this later](#updating-the-calling-context-after-nested-call-halts)).
- An instruction with "offset" arguments (like [`ADD`](./instruction-set/#isa-section-add) and many others), has increased cost for each offset argument that is flagged as "indirect".
@@ -225,11 +225,11 @@ An instruction's gas cost is meant to reflect the computational cost of generati
> An instruction's gas cost takes into account the costs of associated downstream computations. An instruction that triggers accesses to the public data tree (`SLOAD`/`SSTORE`) incurs a cost that accounts for state access validation in later circuits (public kernel or rollup). A contract call instruction (`CALL`/`STATICCALL`/`DELEGATECALL`) incurs a cost accounting for the nested call's complete execution as well as any work required by the public kernel circuit for this additional call.
## Halting
### Halting

A context's execution can end with a **normal halt** or **exceptional halt**. A halt ends execution within the current context and returns control flow to the calling context.

### Normal halting
#### Normal halting

A normal halt occurs when the VM encounters an explicit halting instruction ([`RETURN`](./instruction-set#isa-section-return) or [`REVERT`](./instruction-set#isa-section-revert)). Such instructions consume gas normally and optionally initialize some output data before finally halting the current context's execution.

@@ -245,7 +245,7 @@ results.output = machineState.memory[instr.args.retOffset:instr.args.retOffset+i
> `results.output` is only relevant when the caller is a contract call itself. In other words, it is only relevant for [nested contract calls](#nested-contract-calls). When an [initial contract call](#initial-contract-calls) (initiated by a public execution request) halts normally, its `results.output` is ignored.
### Exceptional halting
#### Exceptional halting

An exceptional halt is not explicitly triggered by an instruction but instead occurs when an exceptional condition is met.

@@ -351,7 +351,7 @@ context = AvmContext {
worldState = <latest world state>,
journal = INITIAL_JOURNAL,
accruedSubstate = INITIAL_ACCRUED_SUBSTATE,
results = INITIAL_MESSAGE_CALL_RESULTS,
results = INITIAL_CONTRACT_CALL_RESULTS,
}
```
@@ -400,7 +400,7 @@ INITIAL_ACCRUED_SUBSTATE = AccruedSubstate {
unencryptedLogs = [], // initialized as empty
}

INITIAL_MESSAGE_CALL_RESULTS = ContractCallResults {
INITIAL_CONTRACT_CALL_RESULTS = ContractCallResults {
reverted = false,
output = [], // initialized as empty
}
@@ -421,7 +421,7 @@ nestedContext = AvmContext {
worldState: callingContext.worldState,
journal: callingContext.journal,
accruedSubstate: INITIAL_ACCRUED_SUBSTATE,
results: INITIAL_MESSAGE_CALL_RESULTS,
results: INITIAL_CONTRACT_CALL_RESULTS,
}
```
434 changes: 254 additions & 180 deletions yellow-paper/docs/public-vm/gen/_InstructionSet.mdx

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions yellow-paper/docs/public-vm/instruction-set.mdx
Original file line number Diff line number Diff line change
@@ -1,5 +1,13 @@
# Instruction Set

This page lists all of the instructions supported by the Aztec Virtual Machine (AVM).

The following notes are relevant to the table and sections below:
- `M[offset]` notation is shorthand for `context.machineState.memory[offset]`
- Any instruction whose description does not mention a program counter change simply increments it: `context.machineState.pc++`
- All instructions update `context.machineState.*GasLeft` as detailed in ["Gas limits and tracking"](./avm#gas-limits-and-tracking)
- Any instruction can lead to an exceptional halt as specified in ["Exceptional halting"](./avm#exceptional-halting)

import GeneratedInstructionSet from "./gen/_InstructionSet.mdx";

<GeneratedInstructionSet />
20 changes: 10 additions & 10 deletions yellow-paper/docs/public-vm/state-model.md
Original file line number Diff line number Diff line change
@@ -52,11 +52,11 @@ Memory addresses must be tagged to be a `u32` type.
- `M[X]`: main memory cell at offset `X`
- `tag`: a value referring to a memory cell's type (its maximum potential value)
- `T[X]`: the tag associated with memory cell at offset `X`
- `in-tag`: an instruction's tag to check input operands against. Present for many but not all instructions.
- `dst-tag`: the target type of a `CAST` instruction, also used to tag the destination memory cell
- `ADD<X>`: shorthand for an `ADD` instruction with `in-tag = X`
- `ADD<X> aOffset bOffset dstOffset`: an full `ADD` instruction with `in-tag = X`. See [here](./instruction-set#isa-section-add) for more details.
- `CAST<X>`: a `CAST` instruction with `dst-tag`: `X`. `CAST` is the only instruction with a `dst-tag`. See [here](./instruction-set#isa-section-cast) for more details.
- `inTag`: an instruction's tag to check input operands against. Present for many but not all instructions.
- `dstTag`: the target type of a `CAST` instruction, also used to tag the destination memory cell
- `ADD<X>`: shorthand for an `ADD` instruction with `inTag = X`
- `ADD<X> aOffset bOffset dstOffset`: an full `ADD` instruction with `inTag = X`. See [here](./instruction-set#isa-section-add) for more details.
- `CAST<X>`: a `CAST` instruction with `dstTag`: `X`. `CAST` is the only instruction with a `dstTag`. See [here](./instruction-set#isa-section-cast) for more details.

### Tags and tagged memory

@@ -80,7 +80,7 @@ The purpose of a tag is to inform the VM of the maximum possible length of an op

#### Checking input operand tags

Many AVM instructions explicitly operate over range-constrained input parameters (e.g. `ADD<in-tag>`). The maximum allowable value for an instruction's input parameters is defined via an `in-tag` (instruction/input tag). Two potential scenarios result:
Many AVM instructions explicitly operate over range-constrained input parameters (e.g. `ADD<inTag>`). The maximum allowable value for an instruction's input parameters is defined via an `inTag` (instruction/input tag). Two potential scenarios result:

1. A VM instruction's tag value matches the input parameter tag values
2. A VM instruction's tag value does _not_ match the input parameter tag values
@@ -95,8 +95,8 @@ It is required that all VM instructions that write into main memory explicitly d

```
# ADD<u32> aOffset bOffset dstOffset
assert T[aOffset] == T[bOffset] == u32 // check inputs against in-tag, revert on mismatch
T[dstOffset] = u32 // tag destination with in-tag
assert T[aOffset] == T[bOffset] == u32 // check inputs against inTag, revert on mismatch
T[dstOffset] = u32 // tag destination with inTag
M[dstOffset] = M[aOffset] + M[bOffset] // perform the addition
```

@@ -110,7 +110,7 @@ T[dstOffset] = T[srcOffset] // preserve tag
M[dstOffset] = M[srcOffset] // perform the move
```

Note that `MOV` does not have an `in-tag` and therefore does not need to make any assertions regarding the source memory cell's type.
Note that `MOV` does not have an `inTag` and therefore does not need to make any assertions regarding the source memory cell's type.

#### `CAST` and tag conversions

@@ -125,7 +125,7 @@ Case 2 is trivial as no additional consistency checks must be performed between

```
# CAST<u64> srcOffset dstOffset
T[dstOffset] = u64 // tag destination with dst-tag
T[dstOffset] = u64 // tag destination with dstTag
M[dstOffset] = cast<to: u64>(M[srcOffset]) // perform cast
```

291 changes: 94 additions & 197 deletions yellow-paper/src/preprocess/InstructionSet/InstructionSet.js

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion yellow-paper/src/preprocess/InstructionSet/genMarkdown.js
Original file line number Diff line number Diff line change
@@ -86,9 +86,10 @@ function markdownInstructionSetSection(pathToGenDir) {
for (let i = 0; i < INSTRUCTION_SET.length; i++) {
const instr = INSTRUCTION_SET[i];
const name = instr['Name'];
let subsection = `### <a id='isa-section-${instr['id']}'/>${name} (${toOpcode(i)})\n`;
let subsection = `### <a id='isa-section-${instr['id']}'/>${name}\n`;
subsection += `${instr['Summary']}\n\n`;
subsection += `[See in table.](#isa-table-${instr['id']})\n\n`;
subsection += `- **Opcode**: ${toOpcode(i)}\n`;
for (let t = 0; t < TOPICS_IN_SECTIONS.length; t++) {
const topic = TOPICS_IN_SECTIONS[t];
let field = instr[topic];