Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(yellowpaper): finish AVM Context definitions #3709

Merged
merged 3 commits into from
Dec 15, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 84 additions & 2 deletions yellow-paper/docs/public-vm/avm.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ A contract's public bytecode is a series of execution instructions for the AVM.
Many terms and definitions here are borrowed from the [Ethereum Yellow Paper](https://ethereum.github.io/yellowpaper/paper.pdf).
:::

An "Execution Context" includes the information necessary to trigger AVM execution along with the state maintained by the AVM throughout execution:
An "Execution Context" includes the information necessary to initiate AVM execution along with the state maintained by the AVM throughout execution:
```
AVMContext {
environment: ExecutionEnvironment,
Expand All @@ -51,12 +51,13 @@ AVMContext {

The first two entries, "Execution Environment" and "Machine State", share the same lifecycle. They contain information pertaining to a single message call and are initialized prior to the start of a call's execution.

> When a nested message call is made, a new environment and machine state are initialized by the caller. In other words, a nested message call has its own environment and machine state, although their initialization will be partially derived from the caller's context.
> When a nested message call is made, a new environment and machine state are initialized by the caller. In other words, a nested message call has its own environment and machine state which are _partially_ derived from the caller's context.

The "Execution Environment" is fully specified by a message call's execution agent and remains constant throughout a call's execution.
```
ExecutionEnvironment {
address,
storageAddress,
origin,
l1GasPrice,
l2GasPrice,
Expand All @@ -68,6 +69,7 @@ ExecutionEnvironment {
globalVariables: PublicGlobalVariables,
messageCallDepth,
isStaticCall,
isDelegateCall,
dbanks12 marked this conversation as resolved.
Show resolved Hide resolved
}
```

Expand All @@ -77,6 +79,86 @@ MachineState {
l1GasLeft,
l2GasLeft,
pc,
memory: offset => value, // uninitialized at start
dbanks12 marked this conversation as resolved.
Show resolved Hide resolved
}
```

"World State" contains persistable VM state. If a message call succeeds, its world state updates are applied to the calling context (whether that be a parent call's context or the transaction context). If a message call fails, its world state updates are rejected by its caller. When a _transaction_ succeeds, its world state updates persist into future transactions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The world state structure here is a memory snapshot of persistent data. Could you mention in this lifecycle when the snapshot is made, where modifications happen in memory and when the memory modifications are executed persistently? Maybe no need to describe all details on how we prevent conflicts, etc, but just the model when read and write are performed persistently vs. in memory.

Re-reading what you wrote, I realize it is actually pretty good. Maybe enhancing the above text with explicit mentions about memory changes vs persistent would be great.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeanmon I don't totally follow 🤔 World State is separate from memory. Something like SSTORE takes a word from memory and stores it into the world state. If a message call fails, any such modifications it made will be rolled back.

So when a caller encounters a nested call, it will take a world state snapshot. If the message call reverts, the caller will reject its world state updates by rolling back to that snapshot.

The circuit doesn't really care about any of this. The circuit just has a storage access trace. Even a reverted message call will have access trace entries for its SSTOREs, but the entire message call will be flagged as reverted, so the kernel knows not to consider those updates for when processing later reads/writes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snapshots of world state will not be stored in VM memory. In fact they won't even be handled by the VM circuit at all. They are really a higher-level concept that the AVM simulator will use to populate the state traces for each call. And the kernel or rollup circuits will need to know that when a revert happens it shouldn't consider any reverted state updates when validating state accesses that happen later (with higher sideEffectCounter).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I used the word "memory" I meant a memory snapshot of the world state not the memory space of the VM circuit. Sorry for the misunderstanding, the discussion deviated far beyond the original scope. My point was to emphasize a bit more when the memory snapshot is made and when it is written persistently.
Let us align on a call.

```
WorldState {
publicStorage: (address, slot) => value, // read/write
noteHashes: (address, index) => noteHash, // read & append only
nullifiers: (address, index) => nullifier, // read & append only
l1l2messageHashes: (address, key) => messageHash, // read only
contracts: (address) => bytecode, // read only
}
```

> Note: the notation `key => value` describes a mapping from `key` to `value`.

> Note: each member of the world state is implemented as an independent merkle tree with different properties.

The "Accrued Substate", as coined in the [Ethereum Yellow Paper](https://ethereum.github.io/yellowpaper/paper), contains information that is accrued throughout transaction execution to be "acted upon immediately following the transaction." These are append-only arrays containing state that is not relevant to other calls or transactions. Similar to world state, if a message call succeeds, its substate is appended to its calling context, but if it fails its substate is dropped by its caller.
```
AccruedSubstate {
logs: [], // append-only
l2toL1Messages: [], // append-only
}
```

Finally, when a message call halts, it sets the context's "Message Call Results" to communicate results to the caller.
```
MessageCallResults {
reverted: boolean,
output: [] | undefined,
}
```

### Context initialization for initial call
This section outlines AVM context initialization specifically for a **public execution request's initial message call** (_i.e._ not a nested message call). Context initialization for nested message calls will be explained in a later section.

When AVM execution is initiated for a public execution request, the AVM context is initialized as follows:
```
context = AVMContext {
environment: INITIAL_EXECUTION_ENVIRONMENT,
machineState: INITIAL_MACHINE_STATE,
accruedSubstate: empty,
worldState: <latest world state>,
results: INITIAL_MESSAGE_CALL_RESULTS,
}
```
> Note: Since world state persists between transactions, the latest state is injected into a new AVM context.

Given a `PublicCallRequest` and its parent `TxRequest`, these above-listed "`INITIAL_*`" entries are defined as follows:
```
INITIAL_EXECUTION_ENVIRONMENT = ExecutionEnvironment {
address: PublicCallRequest.contractAddress,
storageAddress: PublicCallRequest.CallContext.storageContractAddress,
origin: TxRequest.origin,
l1GasPrice: TxRequest.l1GasPrice,
l2GasPrice: TxRequest.l2GasPrice,
calldata: PublicCallRequest.args,
sender: PublicCallRequest.CallContext.msgSender,
portal: PublicCallRequest.CallContext.portalContractAddress,
bytecode: worldState.contracts[PublicCallRequest.contractAddress],
blockHeader: <latest block header>,
globalVariables: <latest global variable values>
messageCallDepth: 0,
isStaticCall: PublicCallRequest.CallContext.isStaticCall,
isDelegateCall: PublicCallRequest.CallContext.isDelegateCall,
}

INITIAL_MACHINE_STATE = MachineState {
l1GasLeft: TxRequest.l1GasLimit,
l2GasLeft: TxRequest.l2GasLimit,
pc: 0,
memory: uninitialized,
}

INITIAL_MESSAGE_CALL_RESULTS = MessageCallResults {
reverted: false,
output: undefined,
}
```

> Note: unlike memory in the Ethereum Virtual Machine, uninitialized memory in the AVM is not readable! A memory cell must be written (and therefore [type-tagged](./state-model#types-and-tagged-memory)) before it can be read.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the VM circuit I can constrain that memory is initialized with 0 and flag type 0, so that any initial read at memory location (on which we did not write to) returns to zero. If Noir compiler can ensure that we never read from uninitialized memory, we can simply constrain the memory trace that the first memory operation per address is always a write. My point is that we can support both ways. We should just be on the same page with the noir team.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flag type 0 means uninitialized, and I thought that any "loads" from an uninitialized cell would fail. Is that true? Could we get rid of the concept of uninitialized and just say that all cells start tagged as field and initialized to 0?

Copy link
Contributor

@jeanmon jeanmon Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think enforcing type field initialized with 0 is a bad trade-off. I would consider 2 options:

  1. Forbid any uninitialized read
  2. Allow read on memory of "uninitialized" words which work with any type and read 0

In the current VM circuit implementation, variant 2) is implemented.
My point is that the circuit should not have any difficulty to support these variants. That's why I would check with Noir team what is preferable.