I provide this annotated version of the Ethereum yellow paper as a tool to help onboard engineers onto Ethereum development. Some sections of the original paper would be clearer with further context and/or with more practical examples.
The format follows the sections as written in the original paper, but with more explanations around equations and more practical examples around complicated concepts. This annotated version aims to be a single document that gives a good overview of the Ethereum protocol, minus the hard work of deciphering the equations and subtleties present in the original.
Ethereum, taken as a whole, can be viewed as a transaction-based state machine: we begin with a genesis state and incrementally execute transactions to morph it into some current state.
Figure 1: A visual representation of the transaction transition function
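Mathematically:
$$\boldsymbol{\sigma}_{t+1} \equiv \Upsilon(\boldsymbol{\sigma}_t, T) \tag{1}$$
where $\Upsilon$ is the Ethereum state transition function, $\boldsymbol{\sigma}_t$ is the state before transaction $T$ is applied and $\boldsymbol{\sigma}_{t+1}$ is the state after it.
In Ethereum, $\Upsilon$, together with $\boldsymbol{\sigma}$, are considerably more powerful than any existing comparable system; $\Upsilon$ allows components to carry out arbitrary computation, while $\boldsymbol{\sigma}$ allows components to store arbitrary state between transactions.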
Ethereum is a general purpose computer; hence, broadly speaking, transactions come in two kinds:
- Simple value transfer - these are, as the name suggests, simple value transfers between two addresses
- Smart contract execution - these transactions execute code that lives on the blockchain. The code may hold state; as such, through the execution of the associated code, the developer can mutate the global state $\boldsymbol{\sigma}$ in any way the developer wishes.
Additionally, $\boldsymbol{\sigma}$ can hold arbitrary data**, for example:
- Names
- Counters
- Mappings
- Custom data structures
** On top of a list of all the account states (more on this later).
Transactions are collated into blocks; blocks are chained together using a cryptographic hash as a means of reference
Figure 2: Block contents and how blocks are chained together
Blocks are split into 3 sections:
- The block header - $B_H$
- The transaction list - $B_T$
- The ommer list - $B_U$
The block header,
Figure 3: Orphaned blocks and blockchains of various lengths
Even though Block #1 has 3 children (the red, blue, and green block #2), only red block #2 is followed by more blocks. In essence, blue block #2 and green block #2 have been abandoned. Transactions contained only in them are not part of the canonical chain and can be ignored (they may still be picked up in a later block). Miners will choose the path with the most accumulated work (in practice, usually the one with the most blocks) to work on, that is, to append their blocks to.
… They (blocks) also punctuate the transaction series with incentives for nodes to mine. This incentivisation takes place as a state-transition function, adding value to a nominated account.
For one's transaction to end up being mined (put in a block that is appended to the longest path), a miner must pick up that transaction and choose to include it in their block. There are many more transactors than miners. Hence, transactors can include a fee to be paid to the miner, which incentivises the miner to include that specific transaction. The bigger the fee, the more miners are likely to include your transaction, and hence the faster it will end up being mined. (Note that the miner only receives your fee at the point where their block is appended (mined). If 5 miners are working on blocks that include your transaction, only the miner that succeeds in mining their block will receive your fee.)
The paper gives the following equations to formally define the blockchain paradigm.
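$$\boldsymbol{\sigma}_{t+1} \equiv \Pi(\boldsymbol{\sigma}_t, B) \tag{2}$$
$$B \equiv (..., (T_0, T_1, ...), ...) \tag{3}$$
A block contains
- A list of transactions $T_0, T_1, ...$
- A block header $B_H$
- A list of ommer block headers $B_U$
$$\Pi(\boldsymbol{\sigma}, B) \equiv \Omega(B, \Upsilon(\Upsilon(\boldsymbol{\sigma}, T_0), T_1)...) \tag{4}$$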
The Ethereum block-level state transition function $\Pi$ is defined by recursively applying the Ethereum transaction state transition function $\Upsilon$ to all the transactions contained in the block, starting with the initial state and the first transaction. Finally, the resulting state is fed into $\Omega$, the block-finalisation state transition function, which rewards the nominated beneficiary (the miner).
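To make equation (4) concrete, here is a minimal Python sketch of $\Pi$ as a left fold of $\Upsilon$ over the block's transactions, followed by $\Omega$. It assumes a toy world state that is just a dict of balances, and the functions upsilon and omega are hypothetical stand-ins (no gas, nonces, signatures, balance checks or contract code), not the real transition rules:

```python
from functools import reduce

# Hypothetical toy world state: address -> balance in Wei
State = dict[str, int]
Tx = tuple[str, str, int]  # (sender, recipient, value)

def upsilon(state: State, tx: Tx) -> State:
    """Hypothetical stand-in for Υ: apply one simple value transfer (no validity checks)."""
    sender, recipient, value = tx
    new_state = dict(state)  # return a new state rather than mutating the input
    new_state[sender] = new_state.get(sender, 0) - value
    new_state[recipient] = new_state.get(recipient, 0) + value
    return new_state

def omega(beneficiary: str, reward: int, state: State) -> State:
    """Hypothetical stand-in for Ω: credit a block reward to the beneficiary."""
    new_state = dict(state)
    new_state[beneficiary] = new_state.get(beneficiary, 0) + reward
    return new_state

def pi(state: State, txs: list[Tx], beneficiary: str, reward: int) -> State:
    """Π(σ, B) ≡ Ω(B, Υ(Υ(σ, T0), T1)...): fold Υ over the transactions, then finalise."""
    return omega(beneficiary, reward, reduce(upsilon, txs, state))
```

For example, pi({"alice": 10}, [("alice", "bob", 3), ("bob", "carol", 1)], "miner", 2) yields {"alice": 7, "bob": 2, "carol": 1, "miner": 2}. The fold makes the ordering explicit: each transaction sees the state left behind by the one before it.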
This (equations (2), (3), and (4)) is the basis of the blockchain paradigm, a model that forms the backbone of not only Ethereum, but all decentralised consensus-based transaction systems to date
The following sub-denominations of Ether exist:
| Multiplier | Name |
|---|---|
| $10^0$ | Wei |
| $10^{12}$ | Szabo |
| $10^{15}$ | Finney |
| $10^{18}$ | Ether |
Transactions are, at their lowest level, always sending a value quoted in Wei. When someone sends 1 Ether, they are sending $10^{18}$ Wei.
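Because every amount is ultimately an integer number of Wei, converting between denominations is plain integer arithmetic. A small sketch (the function name to_wei is hypothetical) that uses Python's decimal module so that fractional inputs are not rounded the way floats would be:

```python
from decimal import Decimal, localcontext

# Multipliers from the sub-denomination table, expressed in Wei
DENOMINATIONS = {
    "wei": 10**0,
    "szabo": 10**12,
    "finney": 10**15,
    "ether": 10**18,
}

def to_wei(amount: str, unit: str) -> int:
    """Convert a decimal amount (given as a string) in `unit` to an integer number of Wei."""
    with localcontext() as ctx:
        ctx.prec = 100  # plenty of digits: 2**256 has 78 decimal digits
        wei = Decimal(amount) * DENOMINATIONS[unit]
        if wei != wei.to_integral_value():
            raise ValueError("amount is not a whole number of Wei")
    return int(wei)
```

So to_wei("1", "ether") gives 10**18 and to_wei("0.5", "finney") gives 5 * 10**14, while anything below 1 Wei is rejected rather than silently truncated.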
The system is decentralised. Participants in the protocol may choose different paths from root to leaf to consider as the canonical blockchain. Consider Figure 4:
Figure 4: Participants may choose to follow different paths from “Genesis Block” to “Block #6” when working out the system state. The path each user takes is their canonical blockchain and is their view on what the state of the system may be.
User A may choose to follow the path starting from “Genesis Block” and ending at “Red Block #6”, whereas user B may choose to follow the path starting from “Genesis Block” but ending at “Green Block #6”; the path a user chooses is their canonical blockchain. All of the assets (coins, NFTs, contracts etc…) created before block #4 was mined will exist on both forks. However, the different transactions on the 2 forks will interact differently with those existing assets, leading to completely different states of the blockchain.
The paper concludes Section 2.2 by listing the checkpoints, in terms of block number, at which protocol changes have been made. Hence, when travelling through the history of the canonical Ethereum blockchain, one should consult this table to understand how transactions were validated, and what information has been added or removed, at each point.
For example, the GrayGlacier protocol change (activated at block number 15050000) delayed the difficulty bomb by pushing it back 700 000 blocks (roughly 100 days). The difficulty bomb is a mechanism added to push the Ethereum blockchain from Proof-of-Work to Proof-of-Stake: once it kicks in, the mining difficulty grows exponentially, making it increasingly computationally expensive to mine new blocks. By pushing it back 700 000 blocks, the Ethereum developers gave themselves additional time to complete the work related to migrating the blockchain to Proof-of-Stake.
Note that protocol changes cannot be enforced: nodes/miners may choose to ignore them. If a node/miner ignores a protocol change, its blocks will most likely appear invalid to the rest of the protocol-following community, which will therefore follow a different canonical blockchain than the nodes/miners that abstained from the change. This is what the community refers to as a hard fork between two factions.
This section is crucial when it comes to understanding the rest of the paper. It introduces all the mathematical language and slang that will be used throughout the rest of the paper to formally define the Ethereum protocol. Here are the rules:
- The paper will make reference to only 2 bold lowercase Greek letters. These symbols represent highly structured, top-level state values (think tuple or list of tuples):
  - $\boldsymbol{\sigma}$ - the World State - introduced in Section 4.1 (for now, think of this as a list of tuples. The individual tuple is the account tuple, which contains things such as the nonce of the account, the balance etc…)
  - $\boldsymbol{\mu}$ - the Machine State - introduced in Section 9.4.1 (for now, think of this as a tuple. It contains things such as the program counter, the available gas for execution etc…)
- The paper will use upper-case Greek letters when referring to functions that operate on highly structured values (e.g. $\Upsilon$ - the Ethereum state transition function).
- The paper will use uppercase (non-Greek) letters when referring to most functions (e.g. $C$ - the general cost function, which returns the gas cost of executing certain instructions in the Ethereum ecosystem). Note that some functions might be subscripted to denote specialised variants.
- The paper will use typewriter text for externally defined functions (e.g. KEC - the Keccak-256 hash function). Note that this annotated version will just use plain text, but aims to explain all equations/flows as they occur.
- The paper will use uppercase letters to denote tuples: $T$ represents a transaction. Subscripting a tuple returns a value stored in the tuple: $T_n$ returns the nonce associated with a transaction (the nonce value in the transaction tuple).
- The paper will use normal lower-case letters to denote scalars (plain values) and/or arrays (special values/arrays are represented by lower-case Greek letters). Scalars are always non-negative integers; in mathematical notation, any scalar value satisfies $v \in \mathbb{N}$. Additionally, we write $v \in \mathbb{N}_{256}$ if the scalar value $v$ is a non-negative integer less than $2^{256}$, and $o \in \mathbb{B}_{32}$ if the byte array $o$ has exactly 32 bytes.
- The paper will use bold lower-case letters to denote arbitrary-length arrays (arrays whose size is determined by executing code or just general input). For example, $\boldsymbol{o}$ is used to represent the output data of a message call; its size is not known beforehand, as it is a function of the input.
- The paper will use square brackets to index into arrays: $\boldsymbol{o}[0]$ represents the first byte in the $\boldsymbol{o}$ array. The paper also allows for range notation, which is inclusive: $\boldsymbol{o}[0..31]$ represents the first 32 bytes of the byte array $\boldsymbol{o}$. The paper mentions a single special case for the World State $\boldsymbol{\sigma}$: one can write $\boldsymbol{\sigma}[a]$ to refer to the account tuple corresponding to the address $a$.
- Given a value $\square$, intermediate values (i.e. values that result from performing computation on $\square$, but are insignificant enough not to warrant a symbol of their own) may be denoted by $\square^{'}$, $\square^{''}$, $\square^{'''}$. On certain occasions, the paper may instead denote intermediate values by alpha-numeric subscripts: $\square_{1}$, $\square_{2}$ etc…
- Given a function $f$, the paper may denote by $f^{\star}$ the element-wise version of the function. This is slightly confusing, so here is some code to illustrate the notion ($f^{\star}$ is simply the same function, applied to each element of an array):
def f(a: int) -> int:
    return a * 3
def f_star(list_of_a: list[int]) -> list[int]:
    # Apply f to every element of the list
    return [f(a) for a in list_of_a]
The world state (state) is a mapping between addresses (160-bit identifiers $\equiv$ 20 bytes) and account states:
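from dataclasses import dataclass
from typing import Mapping
# Simplified stand-ins for the original's Bytes20 / Uint / U256 types
Bytes20 = bytes  # exactly 20 bytes
Uint = int  # unbounded non-negative integer
U256 = int  # non-negative integer below 2**256
Address = Bytes20
@dataclass
class Account:
    """
    State associated with an address.
    """
    nonce: Uint
    balance: U256
    code: bytes
world_state: Mapping[Address, Account] = {}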
…it is assumed that the implementation will maintain this mapping in a modified Merkle Patricia tree
Let’s take a quick detour here and explore the Merkle Patricia tree (here and here for some wonderful breakdowns). We aim to take a quick look, just to understand why they are useful and, very briefly, how they work. Think of a Merkle Patricia tree as an implementation of a map. For example, java.util.HashMap is an implementation of the abstract java.util.AbstractMap. One could think of Merkle Patricia trees as another implementation of java.util.AbstractMap, with certain properties that prove particularly advantageous for the purpose of building a blockchain.
Let’s assume we want to represent the following mapping:
my_mapping: dict[str, int] = { "Ant": 1, "Apple": 2, "Cold": 3, "Car": 4, "Cat": 5}
Visually it will look like this
Figure 5 - A visual representation of the tree holding the mapping { "Ant": 1, "Apple": 2, "Cold": 3, "Car": 4, "Cat": 5}
Leaves contain the hash of the value they store, and nodes contain the hash of the node to their left and the node to their right. As a result, passing the root hash (h9 in this example) is enough to verify the entire mapping. That is not to say that one can derive the mapping from the root hash, but if two people have the underlying mapping, agreeing on the root hash is enough for both parties to trust that the other does indeed have the same mapping (it is extremely unlikely that two different mappings share the same hash). Changing any of the values of the underlying mapping will change the hash held by the corresponding leaf. That change propagates upwards until the root hash itself changes. Here is a visualisation of the change propagating upwards when we change the value of Cat to 10.
Figure 6 - A visual representation of the changes propagating to the root node
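The root-hash behaviour described above can be sketched with a plain binary Merkle tree. This is a simplification, not Ethereum's Merkle Patricia tree: it uses SHA-256 from Python's standard library instead of Keccak-256, hashes each "key:value" entry into a leaf, orders leaves by key and duplicates the last node on odd-sized levels, all of which are assumptions made for illustration:

```python
import hashlib

def h(data: bytes) -> bytes:
    # SHA-256 stand-in for illustration; Ethereum itself uses Keccak-256
    return hashlib.sha256(data).digest()

def merkle_root(mapping: dict[str, int]) -> bytes:
    """Root hash of a simplified binary Merkle tree over the mapping's entries."""
    # Leaves: one hash per "key:value" entry, in sorted key order so the root is deterministic
    level = [h(f"{k}:{v}".encode()) for k, v in sorted(mapping.items())]
    if not level:
        return h(b"")  # arbitrary convention for the empty mapping
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        # Each parent holds the hash of its left and right children concatenated
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

my_mapping = {"Ant": 1, "Apple": 2, "Cold": 3, "Car": 4, "Cat": 5}
before = merkle_root(my_mapping)
after = merkle_root({**my_mapping, "Cat": 10})  # change Cat to 10
print(before != after)  # True: the change propagates all the way to the root
```

Rebuilding the tree from the same mapping always reproduces the same 32-byte root, which is exactly why two parties only need to compare roots.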
Note that it also makes sense to store lists in Merkle Patricia trees. The only difference with the above is that each element is paired with its position in the list. Consider the following list:
my_list = ["Ant", "Apple", "Cold", "Car", "Cat"]
We can convert the list to a map where each element points to its position (note that we consider the first index to be 1, conveniently arriving at the same mapping that figure 5 shows):
my_list_mapping = {"Ant": 1, "Apple": 2, "Cold": 3, "Car": 4, "Cat": 5}
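Hence, if two nodes agree on the root hash of an encoded list, they can be confident that the other party not only has all the elements they do, but has them in the same order. (Ethereum's own tries use the opposite direction: the key is the position in the list, counted from 0, and the value is the element; the ordering guarantee is the same.)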
Let’s talk specifics when it comes to advantages:
- If one represents the world state (a mapping between account addresses and account states) as a Merkle Patricia tree, then it is trivial to verify that all participating nodes agree on the current world state: they only need to verify that they all agree on the root hash. (A related benefit is that comparing mappings between two parties is very efficient: a fixed-size 32-byte root hash stands in for a mapping of arbitrary size. Of course, the disadvantage is that you cannot derive the underlying mapping from the hash yourself.)
- Alice wants to verify that a transaction T has indeed been included in a block. (The block header contains the root hash of a Merkle Patricia tree encoding the list of transactions present in the block - $H_t$.) To verify the transaction's inclusion in the block, she does not need to fetch all of the transactions and compute the root hash; she only needs the $O(\log N)$ sibling hashes along the path from her transaction's leaf to the root, and can work her way up the tree to verify. Visually:
Alice wants to verify her transaction - whose hash is h1 (she knows this hash) - has been included in the given block (she also knows the transaction root,
** Here is another brilliant visualisation of verifying transaction inclusion
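Alice's check can be sketched in the same simplified setting as above: a binary Merkle tree with SHA-256 standing in for Keccak-256, rather than a real Merkle Patricia proof. The helper names build_levels, prove and verify are hypothetical. The proof is just the list of sibling hashes on the path from her leaf to the root, so its length grows with $\log_2 N$ rather than with $N$:

```python
import hashlib

def h(data: bytes) -> bytes:
    # SHA-256 stand-in for illustration; Ethereum itself uses Keccak-256
    return hashlib.sha256(data).digest()

def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Every level of a binary Merkle tree, from the leaf hashes up to the root."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels in place so every node has a sibling
        levels.append([h(level[i] + level[i + 1]) for i in range(0, len(level), 2)])
    return levels

def prove(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag says whether the sibling sits on the left."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1  # the other child of the same parent
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path to the root using only the leaf and its proof."""
    node = leaf
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

txs = [f"tx{i}".encode() for i in range(5)]
leaves = [h(tx) for tx in txs]
levels = build_levels(leaves)
root = levels[-1][0]  # what Alice reads from the block header
proof = prove(levels, 1)  # what a full node would send her
print(verify(leaves[1], proof, root), len(proof))  # True 3
```

Alice only handles 3 hashes for 5 transactions here; for a block of a few hundred transactions the proof still stays in the single digits.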
The only way to compute the current world state (in a trustless manner) is to follow all transactions from the genesis block to the tip of the blockchain. Nodes/miners store the world state (which is only a mapping, as discussed earlier) in a local database. When new transactions arrive, nodes and miners can confirm their validity by checking against their local database. Upon confirmation, the local database is updated to reflect the new world state.
The paper formally defines what the account state $\boldsymbol{\sigma}[a]$ comprises:
- Nonce
A scalar value equal to the number of transactions sent from this address or, in the case of accounts with associated code, the number of contract-creations made by this account. For account of address $a$ in state $\boldsymbol{\sigma}$, this would be formally denoted $\boldsymbol{\sigma}[a]_n$.
Simply put, the number of transactions an account has issued. (Transactions can send value from one account to the other, they can interact with smart-contracts, and/or they can create contracts).
Smart contracts (accounts with associated code, in the words of the paper) have a nonce equal to the number of contracts they have deployed (since EIP-161, a newly created contract starts with a nonce of 1, so in practice the nonce is one more than that count). If a human account sends a transaction to a smart contract that interacts with another smart contract, the nonce of the smart-contract account is kept the same. Only contract creations (via the CREATE/CREATE2 opcodes) increment the nonce of contract accounts. (Contracts only interact with other contracts through message calls; these message calls do not change the nonce.)
- Balance
A scalar value equal to the number of Wei owned by this address. Formally denoted $\boldsymbol{\sigma}[a]_b$.
- Storage Root
A 256-bit hash of the root node of a Merkle Patricia tree that encodes the storage contents of the account (a mapping between 256-bit integer values), encoded into the trie as a mapping from the Keccak 256-bit hash of the 256-bit integer keys to the RLP-encoded 256-bit integer values. The hash is formally denoted $\boldsymbol{\sigma}[a]_s$.
The storage root is only relevant for contract accounts; human accounts have the hash of the empty trie as their storage root. Contract account creation will be discussed in Section 7, but it is important to understand that contracts can have persistent storage. The persistent storage model can be pictured as an array where each element (slot) is 256 bits wide. The array size is not fixed in advance: users can store as much persistent content as they choose, restrained only by the associated gas costs (discussed later). Visually, the persistent storage model looks as follows:
Figure 8 - The persistent storage model is a mapping between an index and a value.
Small values are packed together into a single 32-byte slot, and a value that would not fit into the space remaining in the current slot starts a new slot. For example:
Figure 9
The storage root
Figure 10
The key is the Keccak-256 hash of the slot index, and the value is the RLP encoding of the 256-bit value held in the storage slot corresponding to that key. (At index 0, we'd take the int128 and int64 values packed together and padded with 0's to make up the 256-bit (32-byte) value stored in slot 0.)
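The packing rule behind Figure 9 can be sketched as follows. This assumes non-negative values and Solidity's documented convention that the first variable in a slot occupies its lowest-order bytes; pack_slot is a hypothetical helper, and signed types such as int128 would additionally need two's-complement encoding:

```python
def pack_slot(values: list[tuple[int, int]]) -> bytes:
    """Pack (value, size_in_bytes) pairs into one 32-byte storage slot.

    Assumes non-negative values; the first pair lands in the lowest-order bytes.
    """
    slot = 0
    offset = 0  # bit offset from the low-order end of the slot
    for value, size in values:
        if offset + size * 8 > 256:
            raise ValueError("values do not fit in a single slot")
        if not 0 <= value < 2 ** (size * 8):
            raise ValueError("value does not fit in its declared size")
        slot |= value << offset
        offset += size * 8
    return slot.to_bytes(32, "big")  # big-endian 32-byte word, as stored in the slot

# A 16-byte (128-bit) value 1 followed by an 8-byte (64-bit) value 2
print(pack_slot([(1, 16), (2, 8)]).hex())
```

Reading the hex from left to right, the result is 15 zero bytes, then 0x02, then 15 zero bytes, then 0x01: the second variable sits above the first, and the top 8 bytes remain free for another small variable.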
RLP (Recursive Length Prefix) takes as input a byte array, or an arbitrarily nested list of byte arrays, and outputs a byte array. RLP has been developed specifically for Ethereum. The reason for its use is that its implementation is simple and consistent. These two qualities arise from the fact that it treats every byte array as opaque, regardless of what that byte array might mean. For example, if a list of numbers has the same byte representation as a list of booleans, then the output will be the same. This is not usually the case with other encodings, which usually take the types of the inputs into consideration.
Figure 11
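To show how little RLP needs to know about its input, here is a minimal encoder sketch following the rules in the paper's RLP appendix: a single byte below 0x80 encodes as itself, other byte strings get a length prefix starting at 0x80, and lists get a length prefix starting at 0xc0, with a length-of-length form for payloads longer than 55 bytes. Integers are first turned into big-endian bytes with no leading zeros. It is a teaching sketch, not a replacement for a tested library:

```python
def _length_prefix(length: int, offset: int) -> bytes:
    """Short form: offset + length; long form: offset + 55 + len(length bytes), then the length."""
    if length <= 55:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes

def rlp_encode(item) -> bytes:
    """Minimal RLP encoder for byte strings, non-negative ints and (nested) lists."""
    if isinstance(item, int):
        if item < 0:
            raise ValueError("RLP only encodes non-negative integers")
        # Integers become big-endian bytes without leading zeros (0 becomes b"")
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, (bytes, bytearray)):
        if len(item) == 1 and item[0] < 0x80:
            return bytes(item)  # a single byte below 0x80 is its own encoding
        return _length_prefix(len(item), 0x80) + bytes(item)
    if isinstance(item, (list, tuple)):
        # Lists: concatenate the encodings of the items, then prefix the total length
        payload = b"".join(rlp_encode(x) for x in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")
```

For instance, rlp_encode(b"dog") gives b"\x83dog" and rlp_encode([b"cat", b"dog"]) gives b"\xc8\x83cat\x83dog". Nothing in the encoder asks what the bytes mean, which is the consistency property described above.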
- Code Hash
The hash of the Ethereum Virtual Machine (EVM) code of this account; this is the code that gets executed should this address receive a message call. All such code fragments are contained in the state database under their corresponding hashes for later retrieval. This hash is formally denoted $\boldsymbol{\sigma}[a]_c$, and thus the code may be denoted as $b$, given that $KEC(b) = \boldsymbol{\sigma}[a]_c$.
Human accounts do not have any code, and hence they store the hash of the empty byte array as their code hash. Smart contracts store the hash of the code that runs every time an interaction occurs as their code hash. Note that the hashing function used is Keccak-256. Note also the subtlety the paper mentions: “All such code fragments are contained in the state database under their corresponding hashes for later retrieval”. We mentioned earlier that the world state is stored in a local database. We know that an account (which is part of the world state) contains a code hash; the code itself is stored separately, keyed by that hash.
A brilliant visualisation of the world state and the backend databases required is given in figure 12.
Figure 12 - A visualisation of the world state and the associated backend databases needed to hold the world state. Adapted from Ethereum EVM illustrated by Takenobu T. See here
The state consists of many accounts. The mapping from the 160-bit address identifiers to the account state is held in a database by every running node. (For human accounts the address is derived from the account's public key; it is not the public key itself.) The storage hash and code hash (not relevant for non-contract accounts) are hashes which themselves act as keys into other key-value stores, whose values are the storage and the code of the account respectively.
Section 4.1 concludes with some definitions and mathematical notations that will help explain trickier concepts later down the paper. Let’s examine them:
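$$\mathtt{TRIE}\big(L_I^{\star}(\boldsymbol{\sigma}[a]_{\mathbf{s}})\big) \equiv \boldsymbol{\sigma}[a]_s \tag{7}$$
To understand equation 7, we must understand equation 8, as $L_I^{\star}$ is simply the element-wise version of $L_I$, applied to the account's storage contents $\boldsymbol{\sigma}[a]_{\mathbf{s}}$:
$$L_I\big((k, v)\big) \equiv \big(\mathtt{KEC}(k), \mathtt{RLP}(v)\big) \tag{8}$$
What equation 8 tells us is that $L_I$ takes a storage key-value pair and returns the Keccak-256 hash of the key paired with the RLP encoding of the value. Equation 7 then says that building a trie from all of these transformed pairs must yield the storage root $\boldsymbol{\sigma}[a]_s$. For example, if an account's storage is
storage = {0: 12, 1: 13, 2: 100}
Then we'd actually be feeding the list of key-value pairs
storage = [(0,12),(1,13),(2,100)]
Into $L_I^{\star}$, which produces
[(KEC(0), RLP(12)),(KEC(1), RLP(13)),(KEC(2), RLP(100))]
Finally, we feed this into TRIE, which builds the Merkle Patricia tree for the mapping
{
KEC(0): RLP(12),
KEC(1): RLP(13),
KEC(2): RLP(100)
}
and whose root hash must equal $\boldsymbol{\sigma}[a]_s$.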
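The same walk-through as a runnable sketch, with two loudly flagged assumptions. First, Python's standard library has no Keccak-256, so hashlib.sha3_256 (standardised SHA-3, which pads differently and therefore gives different digests) stands in for KEC. Second, keys are encoded as 32-byte big-endian slot indices before hashing. Building the actual trie from the resulting pairs is left out:

```python
import hashlib

def kec(data: bytes) -> bytes:
    # Stand-in only: hashlib.sha3_256 is FIPS-202 SHA-3, whose padding differs from
    # Ethereum's Keccak-256, so these digests will NOT match real clients
    return hashlib.sha3_256(data).digest()

def rlp_uint(value: int) -> bytes:
    """RLP of a non-negative integer below 2**256 (always a short string)."""
    raw = value.to_bytes((value.bit_length() + 7) // 8, "big")  # no leading zeros
    if len(raw) == 1 and raw[0] < 0x80:
        return raw  # a single low byte encodes as itself
    return bytes([0x80 + len(raw)]) + raw

def l_i(pair: tuple[int, int]) -> tuple[bytes, bytes]:
    """L_I((k, v)) = (KEC(k), RLP(v)), with k encoded as a 32-byte big-endian slot index."""
    k, v = pair
    return kec(k.to_bytes(32, "big")), rlp_uint(v)

def l_i_star(storage: dict[int, int]) -> dict[bytes, bytes]:
    """Element-wise L_I over every (key, value) pair: the input TRIE would consume."""
    return dict(l_i(pair) for pair in storage.items())

collapsed = l_i_star({0: 12, 1: 13, 2: 100})
```

Each value in collapsed is a tiny RLP string (12 becomes b"\x0c", 100 becomes b"d" since 100 is 0x64), while each key is a 32-byte hash, so the keys are spread uniformly across the trie regardless of which slot indices a contract uses.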
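If the codeHash field is the Keccak-256 hash of the empty string, i.e. $\boldsymbol{\sigma}[a]_c = \mathtt{KEC}(())$, then the node represents a simple account, sometimes referred to as a “non-contract” account … we may define a world-state collapse function $L_S$:
$$L_S(\boldsymbol{\sigma}) \equiv \{ p(a) : \boldsymbol{\sigma}[a] \neq \varnothing \} \tag{10}$$
Again, to understand equation 10, we must understand equation 11:
$$p(a) \equiv \big(\mathtt{KEC}(a), \mathtt{RLP}\big((\boldsymbol{\sigma}[a]_n, \boldsymbol{\sigma}[a]_b, \boldsymbol{\sigma}[a]_s, \boldsymbol{\sigma}[a]_c)\big)\big) \tag{11}$$
So $p(a)$ pairs the Keccak-256 hash of the address with the RLP encoding of the account tuple, and $L_S$ collects these pairs for every existing account, ready to be fed into TRIE (exactly as $L_I$ does for storage). The paper also assumes:
$$\forall a: \boldsymbol{\sigma}[a] = \varnothing \lor \big(a \in \mathbb{B}_{20} \land v(\boldsymbol{\sigma}[a])\big) \tag{12}$$
Equation 12 tells us that if an address $a$
- is not a 20 byte array ($a \notin \mathbb{B}_{20}$), OR
- does not map to a valid account state ($\neg v(\boldsymbol{\sigma}[a])$)
then $\boldsymbol{\sigma}[a] = \varnothing$, i.e. no such account exists in the world state.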
The next question to ask is: What does it mean for an account to be valid? Equation 13 helps us understand this exact question
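$$v(x) \equiv x_n \in \mathbb{N}_{256} \land x_b \in \mathbb{N}_{256} \land x_s \in \mathbb{B}_{32} \land x_c \in \mathbb{B}_{32} \tag{13}$$
From equation 13 we know that, for a valid account:
- The nonce is a non-negative integer less than $2^{256}$
- The balance is a non-negative integer less than $2^{256}$
- The storage root is a byte array of exactly 32 bytes
- The code hash is a byte array of exactly 32 bytes
Hence the validity function is quite dumb, in that the only thing it checks is that the format is correct.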
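A direct translation of equation 13 into Python, assuming the account tuple is represented by a hypothetical AccountState dataclass. Note how it only inspects ranges and lengths, never the meaning of the values:

```python
from dataclasses import dataclass

@dataclass
class AccountState:
    """Hypothetical account tuple (nonce, balance, storage root, code hash)."""
    nonce: int
    balance: int
    storage_root: bytes
    code_hash: bytes

def is_valid(x: AccountState) -> bool:
    """v(x) from equation 13: a pure format check."""
    return (
        0 <= x.nonce < 2**256  # x_n in N_256
        and 0 <= x.balance < 2**256  # x_b in N_256
        and len(x.storage_root) == 32  # x_s in B_32
        and len(x.code_hash) == 32  # x_c in B_32
    )
```

Any 32-byte value passes as a storage root or code hash, even one that points at nothing; whether the hashes are meaningful is checked elsewhere in the protocol, not by $v$.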
An account is empty when it has no code, zero nonce and zero balance
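$$\mathtt{EMPTY}(\boldsymbol{\sigma}, a) \equiv \boldsymbol{\sigma}[a]_c = \mathtt{KEC}(()) \land \boldsymbol{\sigma}[a]_n = 0 \land \boldsymbol{\sigma}[a]_b = 0 \tag{14}$$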
Even callable precompiled contracts can have an empty account state. This is because their account states do not usually contain the code describing its behavior
I’ll refer you here for an excellent explanation of the above statement. In words, precompiled contracts are contracts that exist virtually. They have not been deployed like other normal contracts, but when calls are made to them, the implementation understands what to do. As such, because they have not been deployed, the account state associated with their address is usually empty.
An account is dead when its account state is non-existent or empty
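$$\mathtt{DEAD}(\boldsymbol{\sigma}, a) \equiv \boldsymbol{\sigma}[a] = \varnothing \lor \mathtt{EMPTY}(\boldsymbol{\sigma}, a) \tag{15}$$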
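Equations 14 and 15 translate just as directly. In this sketch, which again uses a hypothetical AccountState dataclass, the world state is a dict from address to account, a missing key plays the role of $\boldsymbol{\sigma}[a] = \varnothing$, and the constant is the well-known Keccak-256 hash of the empty byte array:

```python
from dataclasses import dataclass

# KEC(()): the Keccak-256 hash of the empty byte array
EMPTY_CODE_HASH = bytes.fromhex(
    "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"
)

@dataclass
class AccountState:
    """Hypothetical account tuple (nonce, balance, storage root, code hash)."""
    nonce: int
    balance: int
    storage_root: bytes
    code_hash: bytes

def is_empty(state: dict[bytes, AccountState], a: bytes) -> bool:
    """EMPTY(σ, a): no code, zero nonce and zero balance (equation 14)."""
    acc = state[a]
    return acc.code_hash == EMPTY_CODE_HASH and acc.nonce == 0 and acc.balance == 0

def is_dead(state: dict[bytes, AccountState], a: bytes) -> bool:
    """DEAD(σ, a): the account is non-existent or empty (equation 15)."""
    return a not in state or is_empty(state, a)
```

Note that the storage root plays no part in either check, exactly as in the equations.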
As of EIP-161, no transaction will finalise with SUICIDE with the beneficiary of the SUICIDE being an account that was not in existence. At the end of such transactions, the state contained an account that was
Note the slight caveat here.
A transaction (formally, T ) is a single cryptographically-signed instruction constructed by an actor externally to the scope of Ethereum. The sender of a transaction cannot be a contract. While it is assumed that the ultimate external actor will be human in nature, software tools will be used in its construction and dissemination. EIP-2718 by Zoltu [2020] introduced the notion of different transaction types. As of the Berlin version of the protocol, there are two transaction types: 0 (legacy) and 1 (EIP-2930 by Buterin and Swende [2020b]). Further, there are two subtypes of transactions: those which result in message calls and those which result in the creation of new accounts with associated code (known informally as ‘contract creation’)
Although EIP-2718 introduced the concept of different transaction types, only 2 exist at the moment of writing. There is the legacy transaction of type 0, and the EIP-2930 type 1 transaction.
The transaction tuple $T$ contains the following fields:
- Type - $T_x \in \{0, 1\}$
- Nonce - $T_n$ - equal to the sender's nonce - $T_n \in \mathbb{N}_{256}$
- Gas Price - $T_p$ - amount paid in Wei per unit of gas - $T_p \in \mathbb{N}_{256}$
- Gas Limit - $T_g$ - maximum amount of gas to be used by the transaction. Remember that Ethereum is Turing complete (we cannot know in advance when, or even whether, execution will stop). The fee paid in Wei will hence be $gas \cdot T_p$, where $gas \leq T_g$ is the gas actually consumed, so the fee can never exceed $T_g \cdot T_p$ - $T_g \in \mathbb{N}_{256}$
- To - $T_t$ - note that $T_t = \emptyset$ if the transaction is a contract-creating transaction - $T_t \in \mathbb{B}_{20}$ if $T_t \neq \emptyset$, otherwise $T_t \in \mathbb{B}_0$
- Value - $T_v$ - in the case of a contract-creating transaction, the newly created contract will hold this amount of Wei after (and only if) it has been successfully created - $T_v \in \mathbb{N}_{256}$
- r, s - $T_r, T_s$ - used to verify that the transaction has been signed correctly - $T_r \in \mathbb{N}_{256} \land T_s \in \mathbb{N}_{256}$
EIP-2930 (type 1) transactions also have
- Access List - $T_A$ - a list of elements of the form $E \equiv (E_a, E_s)$. Accounts $E_a$ and storage keys $E_s$ that are part of this list at the beginning of the transaction allow certain opcodes that touch them to execute at a discount. (The rationale is that clients can load these addresses and storage keys up front, as opposed to cold-fetching them during execution, which is a heavy IO operation.) To understand this better, consider Figure 9. A storage key is nothing more than the index of the slot in which a variable resides. If I wanted to efficiently access the int64 from storage key 0, I'd provide as $E_a$ the address of the contract in question, and as $E_s$ the list of size 1 containing the single storage key 0 (encoded as a 32-byte value). I can repeat this to produce a list of $E \equiv (E_a, E_s)$ for all the contracts I want to access and all the storage keys I want to access per contract. Here is the type annotation for the access list:
access_list: list[tuple[Address, list[Bytes32]]]
- Chain Id - $T_c$
- y Parity - $T_y \in \mathbb{N}_1$
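As an illustration of the access list type annotation above, here is a sketch for the Figure 9 example. The contract address is made up, and the per-entry prices (2400 gas per address and 1900 gas per storage key, paid up front) are the ones specified in EIP-2930; in exchange, the first access to a listed slot during execution is charged at the cheaper warm rate rather than the cold rate:

```python
# Simplified stand-ins for the Address / Bytes32 types in the annotation
Address = bytes  # 20 bytes
Bytes32 = bytes  # 32 bytes

# Per-entry intrinsic gas costs specified in EIP-2930
ACCESS_LIST_ADDRESS_COST = 2400
ACCESS_LIST_STORAGE_KEY_COST = 1900

def access_list_cost(access_list: list[tuple[Address, list[Bytes32]]]) -> int:
    """Extra intrinsic gas charged up front for declaring the access list."""
    return sum(
        ACCESS_LIST_ADDRESS_COST + ACCESS_LIST_STORAGE_KEY_COST * len(keys)
        for _, keys in access_list
    )

# Hypothetical address of the contract holding the Figure 9 layout
contract = bytes.fromhex("00" * 19 + "01")
# Storage key 0, encoded as a 32-byte value
slot_zero = (0).to_bytes(32, "big")
access_list = [(contract, [slot_zero])]
```

Here access_list_cost(access_list) comes to 2400 + 1900 = 4300 gas. Whether declaring an entry pays off depends on how the transaction actually uses it, so the list is an optimisation, not a requirement.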
Legacy transactions do not have an accessList ($T_A = ()$), while chainId and yParity for legacy transactions are combined into a single value:
- w - $T_w \in \mathbb{N}_{256}$
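How yParity and chainId fold into w, as a sketch: the paper's rule is $T_w = 27 + T_y$ for transactions without replay protection, or $T_w = 2\beta + 35 + T_y$ with chain ID $\beta$ under EIP-155. The helper names legacy_w and split_w are hypothetical:

```python
from typing import Optional

def legacy_w(y_parity: int, chain_id: Optional[int] = None) -> int:
    """Combine T_y (and optionally the chain ID) into a legacy transaction's w value."""
    if y_parity not in (0, 1):
        raise ValueError("y_parity must be 0 or 1")
    if chain_id is None:
        return 27 + y_parity  # no replay protection
    return 2 * chain_id + 35 + y_parity  # EIP-155 replay protection

def split_w(w: int) -> tuple[int, Optional[int]]:
    """Recover (y_parity, chain_id) from w; chain_id is None without replay protection."""
    if w in (27, 28):
        return w - 27, None
    if w < 35:
        raise ValueError("not a valid legacy w value")
    y_parity = (w - 35) % 2
    return y_parity, (w - 35 - y_parity) // 2
```

For chain ID 1, w is therefore 37 or 38, which is why those two values are so common on mainnet legacy transactions. Because the encoding is reversible, nothing is lost by packing both fields into one number.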
Additionally, a contract creation transaction (regardless of whether it is legacy or EIP-2930) contains:
- init - $T_i$ - EVM code that gets executed once, during contract creation. $T_i$ gets discarded after the contract has been created, but $T_i$ returns another fragment of EVM code (the contract's body) that gets executed every time the contract receives a message call. In other words, $T_i$ is the constructor code responsible for returning all the code that correctly routes function calls to the appropriate code - $T_i \in \mathbb{B}$
In contrast, a message call transaction contains:
- data - $T_d$ - an unlimited-size byte array specifying the input data of the message call - $T_d \in \mathbb{B}$
The block in Ethereum is the collection of relevant pieces of information (known as the block header), $H$, together with information corresponding to the comprised transactions, $T$, and a set of other block headers $U$ that are known to have a parent equal to the present block’s parent’s parent (such blocks are known as ommers).
Mathematically, we describe the block as
$$B \equiv (B_H, B_T, B_U) \tag{21}$$
Visually, we have
Figure 13 - Note that ommer is just a gender-neutral term. The set of qualifying blocks can also be referred to as uncles
In terms of the ommer blocks, there is no better explanation than a diagram
Figure 14 - A diagram explaining the relationship between a block and its ommer list
Note that not all blocks that satisfy this condition need to be included in the ommer list. A block may include at most 2 ommer headers, and the block creator is incentivised to include them, as each included ommer header earns the miner an extra reward. The decision to reward blocks that satisfy this condition was made to compensate miners that end up mining at the same time. (This is a more common occurrence in Ethereum due to the shorter block time. Note that including ommer blocks also has the added benefit that up to 3 blocks' worth of work can be included at the same level: 1 main block and 2 ommers.)
The block header contains the following fields:
- Parent Hash - $H_p$ - The Keccak 256-bit hash of the parent block’s header, in its entirety. A block header is nothing more than a tuple whose keys are described in this section (one can obtain the $KEC$ of a tuple by hashing its RLP encoding). Every block contains $H_p$ to correctly identify its parent.
- Ommers Hash - $H_o$ - The Keccak 256-bit hash of the ommers list portion of this block. As mentioned before, miners are incentivised to include ommer blocks in their block, as they receive rewards for doing so. One can obtain the $RLP$ of a list, and $H_o$ is the $KEC$ of the $RLP$ of the list of ommer block headers.
- Beneficiary - $H_c$ - The 160-bit address to which all fees collected from the successful mining of this block be transferred. This is usually the miner of the block, but the miner might not be the beneficiary if the miner is part of a pool; in this case, the beneficiary would be the master wallet of the pool.
- State Root - $H_r$ - The Keccak 256-bit hash of the root node of the state trie, after all transactions are executed and finalisations applied. Remember that $\boldsymbol{\sigma}$ represents the world state. $H_r$ is therefore the root hash of the world state $\boldsymbol{\sigma}$ as it stands at the end of this block, after all of its transactions have been executed.
- Transactions Root - $H_t$ - The Keccak 256-bit hash of the root node of the trie structure populated with each transaction in the transactions list portion of the block. Remember that we can construct tries from lists by considering the key to be the position in the list.
- Receipts Root - $H_e$ - The Keccak 256-bit hash of the root node of the trie structure populated with the receipts of each transaction in the transactions list portion of the block. We will talk about receipts in more detail in section 4.3.1. For now, it suffices to say that every transaction generates a transaction receipt, which contains useful information about the state changes and the events a transaction has generated.
- Logs Bloom - $H_b$ - The Bloom filter composed from indexable information (logger address and log topics) contained in each log entry from the receipt of each transaction in the transactions list. We will discuss bloom filters at length in section 4.3.1. For now, it suffices to say that $H_b$ is present in the block header to allow client applications to efficiently search for events across the entire history of the blockchain. Remember what events look like in Solidity: event Transfer(address from, address to, uint256 value)
- Difficulty - $H_d$ - A scalar value corresponding to the difficulty level of this block. This can be calculated from the previous block’s difficulty level and the timestamp. The Proof-of-Work scheme incorporated into the blockchain can be summarised as trying to find a value that, when combined with the contents of the block, hashes to a value less than a certain target. The smaller the target, the more work you need to put in. $H_d$ is a formal way to define that target: the higher the difficulty, the lower the target. A more formal explanation will be given in section 4.3.4.
- Number - $H_i$ - A scalar value equal to the number of ancestor blocks. The genesis block has a number of zero.
- Gas Limit - $H_l$ - A scalar value equal to the current limit of gas expenditure per block.
- Gas Used - $H_g$ - A scalar value equal to the total gas used in transactions in this block.
- Timestamp - $H_s$ - A scalar value equal to the reasonable output of Unix’s time() at this block’s inception.
- Extra Data - $H_x$ - An arbitrary byte array containing data relevant to this block. This must be 32 bytes or fewer. This can be anything that the miner wishes to put on the blockchain; some mining pools use this field to tag their blocks.
- Mix Hash - $H_m$ - A 256-bit hash which, combined with the nonce, proves that a sufficient amount of computation has been carried out on this block.
- Nonce - $H_n$ - A 64-bit value which, combined with the mix-hash, proves that a sufficient amount of computation has been carried out on this block.
See here for why we need both the
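The difficulty condition can be illustrated with a toy miner. The paper's criterion is that the Proof-of-Work output $n$ satisfies $n \le 2^{256} / H_d$; in this sketch SHA-256 over a header hash and an 8-byte nonce stands in for Ethash (which also produces the mix hash $H_m$), so it shows the shape of the search, not real mining. The function names are hypothetical:

```python
import hashlib

def pow_value(header_hash: bytes, nonce: int) -> int:
    # SHA-256 stand-in for Ethash; the nonce is 8 bytes, like the 64-bit H_n
    digest = hashlib.sha256(header_hash + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def meets_difficulty(header_hash: bytes, nonce: int, difficulty: int) -> bool:
    """The paper's condition: the PoW output must be at most 2**256 / H_d."""
    return pow_value(header_hash, nonce) <= 2**256 // difficulty

def mine(header_hash: bytes, difficulty: int) -> int:
    """Try nonces in order until one satisfies the difficulty."""
    for nonce in range(2**64):
        if meets_difficulty(header_hash, nonce, difficulty):
            return nonce
    raise RuntimeError("nonce space exhausted")
```

With a difficulty of 1000, mine(b"\x00" * 32, 1000) needs on the order of a thousand attempts; doubling $H_d$ halves the target and so doubles the expected work, which is exactly the knob the difficulty field turns.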
In order to encode information about a transaction concerning which it may be useful to form a zero-knowledge proof, or index and search, we encode a receipt of each transaction containing certain information from its execution. Each receipt, denoted $B_R[i]$ for the $i$th transaction, is placed in an index-keyed trie and the root recorded in the headers as $H_e$.
Each transaction receipt is a tuple as follows:
$$R \equiv (R_x, R_z, R_u, R_b, R_l) \qquad (22)$$
- $R_x$ - the type of transaction - see $T_x$
- $R_z$ - the status code of the transaction - $R_z \in \mathbb{N}$
- $R_u$ - the cumulative gas used in the block, up to and including this transaction. So if the transaction with index 0 uses 50 gas, and the transaction with index 1 uses 100 gas, then $B_R[0]_u = 50$ and $B_R[1]_u = 50 + 100 = 150$
- $R_b$ - logs bloom - $R_b \in \mathbb{B}_{256}$ - see below
- $R_l$ - series of log entries $(O_0, O_1, ...)$ where a log entry is of the form:
$$O \equiv (O_a, (O_{t_0}, O_{t_1}, ...), O_d) \qquad (26)$$
with
$$O_a \in \mathbb{B}_{20} \land \forall x \in O_{\boldsymbol{t}} : x \in \mathbb{B}_{32} \land O_d \in \mathbb{B} \qquad (27)$$
From equation 27 we deduce that:
- $O_a$ - logger address - a 20-byte array. This corresponds to what we already know: both human (externally owned) accounts and contract accounts are identified by a 20-byte address. Note that only contracts produce logs, as part of the execution of a transaction sent (directly or indirectly) to said contract. $O_a$ is the address of the contract that produced the log.
- $O_{\boldsymbol{t}}$ - series of log topics - a list (possibly empty) of 32-byte log topics
- $O_d$ - un-indexable data - an arbitrary-size byte array
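In code we can represent it as follows (the type aliases at the top are simplified stand-ins for the dedicated integer and fixed-size byte types a full client would use):
```python
from dataclasses import dataclass

# Simplified stand-ins for fixed-size types.
Uint = int
Address = bytes   # 20 bytes
Hash32 = bytes    # 32 bytes
Bytes256 = bytes  # 256 bytes
Bloom = Bytes256


@dataclass
class Log:
    """
    Data record produced during the execution of a transaction.
    """
    address: Address
    topics: list[Hash32]
    data: bytes


# Receipt is defined after Log so that its `logs` annotation resolves.
@dataclass
class Receipt:
    """
    Result of a transaction.
    """
    succeeded: Uint
    cumulative_gas_used: Uint
    bloom: Bloom
    logs: list[Log]
```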
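Since $R_u$ is cumulative, the gas used by a single transaction is not stored directly in its receipt; it is the difference between two consecutive receipts. A minimal sketch of both directions (the function names are illustrative, not taken from any client):

```python
from itertools import accumulate


def cumulative_gas_used(gas_per_tx: list[int]) -> list[int]:
    """R_u for each receipt: gas used by all transactions up to and including this one."""
    return list(accumulate(gas_per_tx))


def gas_used_by_tx(cumulative: list[int], i: int) -> int:
    """Recover the gas used by transaction i alone from two consecutive receipts."""
    return cumulative[i] - (cumulative[i - 1] if i > 0 else 0)


cumulative = cumulative_gas_used([50, 100])
print(cumulative)                     # [50, 150]
print(gas_used_by_tx(cumulative, 1))  # 100
```

This reproduces the $B_R[0]_u = 50$, $B_R[1]_u = 150$ example from the receipt definition.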
So what are the log topics? Consider the following Solidity event, whose first three parameters are declared `indexed`:
```solidity
event CustomEvent(address indexed from, address indexed to, uint256 indexed val, bool flag, uint256 extra);
```
And consider that we emit it as follows:
```solidity
emit CustomEvent(
    0x1dD2C14FE654653AA64575cDb8073e96C9E8e1AC,
    0xA4FaB302791C00B21884880446a7Ea4B8Bb570ca,
    123456789,
    true,
    11
);
```
The first topic is the Keccak-256 hash of the event’s canonical signature (parameter types only, without parameter names, the `indexed` keyword or spaces):
```
CustomEvent(address,address,uint256,bool,uint256)
```
The hash is:
```
ee40736e546f59e06859e69a8f70c89ca4efe416415e84b7df851d188d791e64
```
The next log topic is the first indexed argument, the `from` address, left-padded with zeros to 32 bytes:
```
0x0000000000000000000000001dD2C14FE654653AA64575cDb8073e96C9E8e1AC
```
We continue like this for every indexed argument (note that we convert the uint256 value 123456789 into its hex equivalent, 0x75BCD15, before padding it). The resulting list of topics is what the LOG1 to LOG4 set of instructions consumes (see appendix H); with 4 topics, the compiler uses LOG4:
```json
{
  "address": "0xbb9bc244d798123fde783fcc1c72d3bb8c189413",
  "topics": [
    "0xee40736e546f59e06859e69a8f70c89ca4efe416415e84b7df851d188d791e64",
    "0x0000000000000000000000001dD2C14FE654653AA64575cDb8073e96C9E8e1AC",
    "0x000000000000000000000000A4FaB302791C00B21884880446a7Ea4B8Bb570ca",
    "0x00000000000000000000000000000000000000000000000000000000075BCD15"
  ]
}
```
So what happens to the non-indexed arguments `bool flag` and `uint256 extra`? They are ABI-encoded, each zero-padded to 32 bytes, and concatenated into the data field $O_d$: `true` becomes 0x0000000000000000000000000000000000000000000000000000000000000001 and 11 becomes 0x000000000000000000000000000000000000000000000000000000000000000B. If we had an additional non-indexed argument, its 32-byte word would simply be appended after the one for `extra`. We can then represent the whole log entry as follows:
```json
{
  "address": "0xbb9bc244d798123fde783fcc1c72d3bb8c189413",
  "topics": [
    "0xee40736e546f59e06859e69a8f70c89ca4efe416415e84b7df851d188d791e64",
    "0x0000000000000000000000001dD2C14FE654653AA64575cDb8073e96C9E8e1AC",
    "0x000000000000000000000000A4FaB302791C00B21884880446a7Ea4B8Bb570ca",
    "0x00000000000000000000000000000000000000000000000000000000075BCD15"
  ],
  "data": "0x0000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000B"
}
```
Please note that the description above corresponds to how the Solidity compiler uses logs. The yellow paper makes no note of what happens to Solidity code; it only defines what happens for the LOG0, LOG1, LOG2, LOG3 and LOG4 opcodes (defined in the appendix). The Solidity compiler makes use of these opcodes to achieve the behaviour described above. Because the opcode that accepts the most topics is LOG4, a log entry can carry at most 4 topics; since a (non-anonymous) Solidity event spends its first topic on the signature hash, at most 3 of its parameters can be declared `indexed`.
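The padding rules for topics and data can be reproduced with the standard library alone. This sketch assumes the event from the example, with `from`, `to` and `val` declared `indexed` and `flag`, `extra` left non-indexed; the helper names are hypothetical. Topic 0 is copied from the text rather than computed, because computing it needs a Keccak-256 implementation (Python's `hashlib.sha3_256` is a different function).

```python
# Topic 0 is taken as given from the text (Keccak-256 of the canonical signature).
TOPIC0 = bytes.fromhex("ee40736e546f59e06859e69a8f70c89ca4efe416415e84b7df851d188d791e64")


def word(value: int) -> bytes:
    """Left-pad an unsigned integer (bools included) to a 32-byte ABI word."""
    return value.to_bytes(32, "big")


def address_word(address: str) -> bytes:
    """Left-pad a 20-byte hex address with 12 zero bytes to fill 32 bytes."""
    hex_part = address[2:] if address.startswith("0x") else address
    return bytes(12) + bytes.fromhex(hex_part)


topics = [
    TOPIC0,
    address_word("0x1dD2C14FE654653AA64575cDb8073e96C9E8e1AC"),  # indexed from
    address_word("0xA4FaB302791C00B21884880446a7Ea4B8Bb570ca"),  # indexed to
    word(123456789),                                             # indexed val
]
data = word(True) + word(11)  # non-indexed flag and extra, concatenated

for t in topics:
    print("0x" + t.hex())
print("0x" + data.hex())
```

Note that `bytes.hex()` prints lowercase, whereas the JSON above keeps the mixed-case (checksummed) spelling of the addresses; the bytes are identical.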
With the log entry clearly defined and explained by example, we proceed to take a log entry and reduce it to a 256-byte (2048-bit) bloom filter value:
$$M(O) \equiv \bigvee_{x \in \{O_a\} \cup O_{\boldsymbol{t}}} \big( M_{3:2048}(x) \big) \qquad (28)$$
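Equation 28 states that the logger’s address and each of the log’s topics are passed separately through $M_{3:2048}$, and the resulting bit sequences are combined with a bitwise OR ($\lor$). To illustrate the mechanics of $M_{3:2048}$ on a single input, the walkthrough below feeds it one example byte string $x$, built by concatenating the logger’s address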
0xbb9bc244d798123fde783fcc1c72d3bb8c189413
concatenate with 0xee40736e546f59e06859e69a8f70c89ca4efe416415e84b7df851d188d791e64
concatenate with 0x0000000000000000000000001dD2C14FE654653AA64575cDb8073e96C9E8e1AC
concatenate with 0x000000000000000000000000A4FaB302791C00B21884880446a7Ea4B8Bb570ca
concatenate with 0x00000000000000000000000000000000000000000000000000000000075BCD15
concatenate with 0x0000000000000000000000000000000000000000000000000000000000000001
This corresponds with
0xbb9bc244d798123fde783fcc1c72d3bb8c189413ee40736e546f59e06859e69a8f70c89ca4efe416415e84b7df851d188d791e640000000000000000000000001dD2C14FE654653AA64575cDb8073e96C9E8e1AC000000000000000000000000A4FaB302791C00B21884880446a7Ea4B8Bb570ca00000000000000000000000000000000000000000000000000000000075BCD150000000000000000000000000000000000000000000000000000000000000001
This is the example input $x$ fed into $M_{3:2048}$:
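$$M_{3:2048}(x : x \in \mathbb{B}) \equiv y : y \in \mathbb{B}_{256} \quad \text{where:} \qquad (29)$$
$$y = (0, 0, ..., 0) \quad \text{except:} \qquad (30)$$
$$\forall i \in \{0, 2, 4\} : \mathcal{B}_{2047 - m(x, i)}(y) = 1 \qquad (31)$$
$$m(x, i) \equiv \text{KEC}(x)[i, i + 1] \bmod 2048 \qquad (32)$$
where $\mathcal{B}_j(y)$ is the bit at index $j$ (indexed from 0) of the byte array $y$.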
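Following through to equations 30 and 31, the result of applying $M_{3:2048}$ to $x$ is a 2048-bit sequence of zeros, except at the following indices, which are set to 1 (fewer than three if two of them coincide):
- $2047 - m(x,0)$
- $2047 - m(x,2)$
- $2047 - m(x,4)$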
The Keccak-256 hash of $x$, $\text{KEC}(x)$, is e0b060be26a18398d0191f7969c0c819b29fe76fc73fbbed242d413ec70cf7c1
To obtain this, install the web3.py package (`pip3 install web3`) and run the following (recent web3.py versions use snake_case helpers such as `to_bytes` and `to_int`; older versions called them `toBytes` and `toInt`):
```python
>>> from web3 import Web3
>>> x = Web3.to_bytes(hexstr="0xbb9bc244d798123fde783fcc1c72d3bb8c189413ee40736e546f59e06859e69a8f70c89ca4efe416415e84b7df851d188d791e640000000000000000000000001dD2C14FE654653AA64575cDb8073e96C9E8e1AC000000000000000000000000A4FaB302791C00B21884880446a7Ea4B8Bb570ca00000000000000000000000000000000000000000000000000000000075BCD150000000000000000000000000000000000000000000000000000000000000001")
>>> kec = Web3.keccak(x)
>>> kec
HexBytes('0xe0b060be26a18398d0191f7969c0c819b29fe76fc73fbbed242d413ec70cf7c1')
```
Remember that Python slicing excludes the end index, so we need `[i:i+2]` to replicate the inclusive byte range $[i, i+1]$ of equation 32:
```python
>>> kec[0:1]
HexBytes('0xe0')
>>> kec[0:2]
HexBytes('0xe0b0')
```
Hence:
```python
>>> Web3.to_int(kec[0:2]) % 2048
176
>>> Web3.to_int(kec[2:4]) % 2048
190
>>> Web3.to_int(kec[4:6]) % 2048
1697
```
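So the three bit indices are:
- 2047 - 176 = 1871
- 2047 - 190 = 1857
- 2047 - 1697 = 350
This means that $M_{3:2048}(x)$ is a 2048-bit sequence in which bits 1871, 1857 and 350 are set to 1 and every other bit is 0.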
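The same arithmetic can be checked without web3.py by starting from the digest printed above. This sketch implements only equations 31 and 32 on a given 32-byte Keccak-256 digest (the function name is illustrative); producing the digest itself still needs a Keccak library.

```python
def bloom_bit_indices(digest: bytes) -> list[int]:
    """Equations 31-32: bit positions set by M_{3:2048}, given KEC(x)."""
    indices = []
    for i in (0, 2, 4):
        m = int.from_bytes(digest[i:i + 2], "big") % 2048  # bytes i and i+1
        indices.append(2047 - m)
    return indices


kec = bytes.fromhex("e0b060be26a18398d0191f7969c0c819b29fe76fc73fbbed242d413ec70cf7c1")
print(bloom_bit_indices(kec))  # [1871, 1857, 350]
```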
So what was the use of all that? We still need to explain bloom filters.
Bloom filters are a specialised data structure that allows the representation of a set of values for the purpose of membership queries. A bloom filter data structure is nothing more than a series of bits (the length does not need to be equal to the length of the set that it encodes). The series starts off by having all the bits set to 0. Through an iterative process, every single element in the underlying set is passed through a function that outputs a number of bit indices that should be turned to 1. For example, consider the following set of bits
Figure 15 - Initial state of the bits when constructing a bloom filter
Consider that we want to represent the following set [1, 2, 5]
Consider that the function that outputs the indices for each element looks like this
```python
def f(elem: int) -> tuple[int, int, int]:
    return (elem + 1, elem + 2, elem + 3)
```
This means
```python
>>> f(1)
(2, 3, 4)
>>> f(2)
(3, 4, 5)
>>> f(5)
(6, 7, 8)
```
Setting the bits to 1 for all the indices returned results in the following
Figure 16 - Bloom filter representation after setting all bit indices associated with the set stored to 1
Now for the membership queries: a bloom filter will return true for the inclusion of an element if all the indices associated with that element are set to 1. However, this may result in false positives. Consider
```python
>>> f(4)
(5, 6, 7)
```
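Note that indices 5, 6 and 7 are already set to 1 by the elements 2 and 5, so the filter reports 4 as a member even though it was never added. Hence, bloom filters test for inclusion with a certain probability of false positives (dictated by the function of choice, the number of bits in the filter and the number of elements stored), but they never produce false negatives. They pay off when an occasional false positive is an acceptable price for storing far less data than an exact representation of the set would need. The exercise presented earlier, where we turned a byte string into 3 bit indices, plays the role of $f$ for Ethereum’s bloom filter. By passing the logger address and every topic of every log entry through $M_{3:2048}$, Ethereum builds a bit series of size 2048 (due to the mod we take) in which each of these items sets up to 3 bits to 1. The topics are the values supplied to the LOG1, LOG2, LOG3 and LOG4 opcodes, and together with the logger address they are the only parts of a log entry that determine which bits are set; the data field $O_d$ plays no part. That is why topics are referred to as indexed: they can be searched for through the bloom filter, while the data cannot.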
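The toy example above can be turned into a tiny runnable filter. This sketch reuses the illustrative index function $f$ from the text; the class name and the filter size of 16 bits are arbitrary choices for the example, and a real bloom filter would derive its indices from a hash function instead.

```python
def f(elem: int) -> tuple[int, int, int]:
    # Toy index function from the text; real bloom filters use hash functions.
    return (elem + 1, elem + 2, elem + 3)


class ToyBloom:
    def __init__(self, size: int = 16) -> None:
        self.bits = [0] * size  # every bit starts at 0

    def add(self, elem: int) -> None:
        for i in f(elem):
            self.bits[i] = 1

    def might_contain(self, elem: int) -> bool:
        # True means "possibly in the set"; False means "definitely not in the set".
        return all(self.bits[i] for i in f(elem))


bloom = ToyBloom()
for x in [1, 2, 5]:
    bloom.add(x)

print(bloom.might_contain(2))  # True  (really in the set)
print(bloom.might_contain(4))  # True  (false positive: bits 5, 6, 7 were set by 2 and 5)
print(bloom.might_contain(0))  # False (bit 1 was never set)
```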
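Using this paper as reference (specifically section 2.1), one can obtain an approximation for the probability of false positives.
We have
- k = 3 - we set (up to) 3 bits to 1 per element
- n = 70 - an assumed number of elements added to the filter, based on the average number of transactions per block (strictly, n counts every logger address and topic inserted, not transactions)
- m = 2048 - we take the result $\bmod 2048$, so the filter has 2048 bits
Plugging all this into the standard approximation results in a false positive probability of
$$p \approx \left(1 - e^{-kn/m}\right)^k = \left(1 - e^{-210/2048}\right)^3 \approx 9.3 \times 10^{-4}$$
that is, roughly 0.09%.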
A block is valid if the following conditions (all part of equation 33) hold:
$$H_r \equiv \text{TRIE}(L_S(\Pi(\boldsymbol{\sigma}, B))) \qquad (33)$$
The block header’s state root should be the hash of the root node of the state trie after all transactions have been executed and finalizations applied (rewarding the miner etc…)
$$H_o \equiv \text{KEC}(\text{RLP}(L_H^*(B_U))) \qquad (33)$$
The block header’s ommer hash is the hash of the RLP encoding of the block’s list of ommer headers, each serialised with $L_H$:
(37)
Knowing the definition of
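$$H_t \equiv \text{TRIE}(\{\forall i < \lVert B_T \rVert, i \in \mathbb{N} : p(i, B_T[i])\}) \qquad (33)$$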
The transactions root is the hash of the root node of the trie structure populated with the transactions contained in the block. To understand this equation, we must consult equation 34:
(34)
Equation 34 simply returns a tuple where the first element is the RLP encoding of its first argument, and the second element is either the RLP encoding of the tuple made from all the elements of the transaction (if the transaction type is 0), or the concatenation of the transaction type with the RLP encoding of the tuple made from all the elements of the transaction. Note that
(16)
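$$H_e \equiv \text{TRIE}(\{\forall i < \lVert B_R \rVert, i \in \mathbb{N} : p(i, B_R[i])\}) \qquad (33)$$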
The receipt root is similar to the transaction root, except we use the transaction receipts $B_R$ instead of the transactions $B_T$.
$$H_b \equiv \bigvee_{\boldsymbol{r} \in B_R} (\boldsymbol{r}_b) \qquad (33)$$
The logs bloom of a block, as mentioned in section 4.3.1, is nothing more than a series of 2048 bits. They are all initially set to 0, and the logger address and each topic of every log entry yield up to 3 bit indices whose bits are turned to 1 (if a bit is already 1 from a previous log entry, we do nothing). In mathematical terms, we use the bitwise OR ($\lor$) between a set of elements. The set is given by the logs bloom $\boldsymbol{r}_b$ of every transaction receipt $\boldsymbol{r}$ in the block.
In simpler terms, a block has many transaction receipts. A single transaction receipt has a series of logs. Each log contributes up to 3 bit indices for its logger address and up to 3 more for each of its topics. Hence, the logs bloom of a block is produced by setting to 1 all the bit indices derived from all the log entries of all its associated transaction receipts. This is what equation 33 above says.
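The block-level OR can be sketched end to end, with one loudly labelled assumption: Python's standard library has no Keccak-256, so `hashlib.sha3_256` (which uses different padding) stands in for it, and the resulting bits will not match real block headers. Swapping in a real Keccak-256 function (for example web3.py's `Web3.keccak`, used earlier) through the `kec` parameter is all it would take. The 2048 bits are held in a Python `int`; all function names are hypothetical.

```python
import hashlib
from typing import Callable

HashFn = Callable[[bytes], bytes]
Log = tuple[bytes, list[bytes]]  # (logger address O_a, topics O_t); data is ignored


def stand_in_kec(data: bytes) -> bytes:
    # ASSUMPTION: sha3_256 is NOT Ethereum's Keccak-256; bits will differ from mainnet.
    return hashlib.sha3_256(data).digest()


def m3_2048(item: bytes, kec: HashFn) -> int:
    """M_{3:2048} (equations 29-32) with the 2048 bits held in a Python int."""
    digest = kec(item)
    bits = 0
    for i in (0, 2, 4):
        m = int.from_bytes(digest[i:i + 2], "big") % 2048
        # Setting bit 2047 - m counted from the most significant end of the
        # 256-byte array is the same as setting integer bit m.
        bits |= 1 << m
    return bits


def log_bloom(log: Log, kec: HashFn) -> int:
    """Equation 28: OR of M_{3:2048} over the address and every topic."""
    address, topics = log
    bloom = m3_2048(address, kec)
    for topic in topics:
        bloom |= m3_2048(topic, kec)
    return bloom


def block_bloom(receipts: list[list[Log]], kec: HashFn) -> int:
    """H_b: OR of the blooms of every log of every receipt in the block."""
    bloom = 0
    for logs in receipts:
        for log in logs:
            bloom |= log_bloom(log, kec)
    return bloom


def might_contain(bloom: int, item: bytes, kec: HashFn) -> bool:
    """False means definitely absent; True means possibly present."""
    bits = m3_2048(item, kec)
    return (bloom & bits) == bits


address = bytes.fromhex("bb9bc244d798123fde783fcc1c72d3bb8c189413")
topic = bytes(31) + b"\x01"
bloom = block_bloom([[(address, [topic])]], stand_in_kec)
print(might_contain(bloom, address, stand_in_kec))  # True
print(len(bloom.to_bytes(256, "big")))              # 256
```

A client searching for events from a given contract checks `might_contain` against each block header first and only fetches the receipts of blocks that pass.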
This section offers a formal definition of what a valid block looks like. To understand the requirements, we must first understand how to formally define the difficulty associated with each block.
We begin by defining the function that takes as input a block header, and outputs the parent block of the block whose header we fed in
(43)
We define the canonical difficulty of a block of header
(45)
The canonical difficulty is an integer value that starts off as
(46)
We lower bound the canonical difficulty to be
(47)
We define
(48)
(49)
Let’s try to understand how
If we put
This is because
- $\lfloor \frac{P(H)_{H_d}}{2048} \rfloor$ is a constant: the previous block has been mined, so its canonical difficulty is known
- The term $y$ is not significant as it is either 1 or 2
- The floor function applied around the timestamp difference only makes the function discontinuous
- The division of the timestamp difference by 9 only stretches the function
Figure 17 - Visual representation of the scaled Homestead difficulty parameter in the canonical difficulty equation
The knee of the function is at (-99, -99 * A) - in this case I set
The final term that contributes to the canonical difficulty is the exponential difficulty symbol
(50)
(51)
With
This term adds an exponential component to the canonical difficulty: it doubles with every period of 100,000 blocks (counted from a fork-dependent delay). Its effect is that the canonical difficulty eventually becomes so high that mining is almost impossible (the so-called difficulty bomb). It was added as an incentive to transition to PoS (which has happened at the time of writing): post-transition, the canonical difficulty is unused, as it is only needed in the PoW scheme.
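Putting the pieces together, here is a minimal sketch of the canonical difficulty $D(H)$ for a non-genesis block, assuming the Byzantium-style rules discussed above: the adjustment $x \times \varsigma_2$ with $x = \lfloor P(H)_{H_d} / 2048 \rfloor$ and $\varsigma_2 = \max(y - \lfloor (H_s - P(H)_{H_s}) / 9 \rfloor, -99)$, the lower bound $D_0 = 131072$, and the bomb term $\epsilon$. The function and parameter names are hypothetical; `bomb_delay` stands for the fork-dependent delay $\kappa$, and in this sketch the bomb is added after the lower bound is applied.

```python
D0 = 131072  # D_0, the lower bound on the canonical difficulty


def canonical_difficulty(parent_difficulty: int, parent_timestamp: int,
                         timestamp: int, number: int,
                         parent_has_ommers: bool, bomb_delay: int) -> int:
    """Sketch of D(H) for a non-genesis block (Byzantium-style rules)."""
    x = parent_difficulty // 2048
    y = 2 if parent_has_ommers else 1
    # Blocks that come quickly push sigma2 up; slow blocks push it down, capped at -99.
    sigma2 = max(y - (timestamp - parent_timestamp) // 9, -99)
    # Difficulty bomb: doubles every 100,000 blocks past the delayed block number H'_i.
    fake_number = max(number - bomb_delay, 0)
    period = fake_number // 100_000
    epsilon = 2 ** (period - 2) if period >= 2 else 0  # floor(2 ** (period - 2))
    return max(D0, parent_difficulty + x * sigma2) + epsilon


# A block mined 9 seconds after its parent, with the bomb far away: no change.
print(canonical_difficulty(2_048_000, 1000, 1009, 100, False, 10_000_000))  # 2048000
```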
In order to avoid issues of network abuse and to sidestep the inevitable questions stemming from Turing completeness, all programmable computation in Ethereum is subject to fees.
The Halting problem is undecidable over Turing machines. In other words, there is no general procedure that can predict whether an arbitrary program running on a Turing-complete machine (which the EVM is, up to the limits imposed by gas) will finish running. As a result, one cannot compute the cost of a computation ahead of time.
Every transaction has a specific amount of gas associated with it: gasLimit. This is the amount of gas which is implicitly purchased from the sender’s account balance. The purchase happens at the according gasPrice, also specified in the transaction
As we cannot predict if a program will finish, the only way to charge for computation is during execution: if the gas provided by the transactor is lower than the amount needed to perform the computation, execution halts with an out-of-gas exception.
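A minimal sketch of charging during execution: every step is paid for just before it runs, and execution stops as soon as the next step cannot be paid for. The program is a hypothetical list of `(opcode, cost)` pairs rather than real bytecode; PUSH1 and ADD are given the 3-gas cost of the "very low" tier.

```python
class OutOfGas(Exception):
    """Raised when the next step costs more gas than is left."""


def run_metered(steps: list[tuple[str, int]], gas: int) -> int:
    """Charge each (opcode name, gas cost) step before running it; return gas left."""
    for opcode, cost in steps:
        if cost > gas:
            raise OutOfGas(f"{opcode} needs {cost} gas but only {gas} is left")
        gas -= cost
        # ... a real interpreter would execute the opcode here ...
    return gas


program = [("PUSH1", 3), ("PUSH1", 3), ("ADD", 3)]
print(run_metered(program, gas=10))  # 1
```

The interpreter never needs to know in advance whether the program halts: a program that loops forever simply runs out of gas.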
The execution of a transaction is the most complex part of the Ethereum protocol: it defines the state transition function $\Upsilon$. It is assumed that any transactions executed first pass the initial tests of intrinsic validity.
Most of the conditions are clear. The following need some clarification:
- The sender account has no contract code deployed: in simpler terms, contracts cannot issue transactions; only human (externally owned) accounts can
- The gas limit is no smaller than the intrinsic gas, $g_0$, used by the transaction: every transaction must pay some gas up front, based on factors such as the number of addresses and storage keys in its access list and the number of zero and non-zero bytes in its data (this will be defined formally in Section 6.2, equation 60)
- The sender account balance contains at least the cost, $v_0$, required in up-front payment: during a transaction’s construction, the gas limit $T_g$ and the gas price $T_p$ are specified, as is the endowment $T_v$. This condition checks that the sender account has enough balance to cover the maximum gas cost plus the endowment, $T_g \times T_p + T_v$
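The validity conditions can be sketched as a single predicate. This is a simplified model, assuming a legacy (type 0) transaction and hypothetical `Account`/`LegacyTx` classes; the intrinsic gas $g_0$ is passed in precomputed, and signature recovery of the sender is left out. The empty code hash is $\text{KEC}(())$, the Keccak-256 hash of the empty string.

```python
from dataclasses import dataclass
from typing import Optional

# KEC(()): the code hash of an account that has no code.
EMPTY_CODE_HASH = bytes.fromhex(
    "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"
)


@dataclass
class Account:
    nonce: int
    balance: int
    code_hash: bytes = EMPTY_CODE_HASH


@dataclass
class LegacyTx:
    nonce: int
    gas_limit: int  # T_g
    gas_price: int  # T_p
    value: int      # T_v (endowment)


def is_valid(tx: LegacyTx, sender: Optional[Account], g0: int,
             block_gas_limit: int, gas_used_in_block: int) -> bool:
    if sender is None:                       # the sender must exist
        return False
    if sender.code_hash != EMPTY_CODE_HASH:  # contracts cannot send transactions
        return False
    if tx.nonce != sender.nonce:             # nonces must match
        return False
    if g0 > tx.gas_limit:                    # intrinsic gas must fit in T_g
        return False
    v0 = tx.gas_limit * tx.gas_price + tx.value  # up-front cost of a legacy tx
    if v0 > sender.balance:
        return False
    # The whole gas limit must still fit in what is left of the block.
    return tx.gas_limit <= block_gas_limit - gas_used_in_block


alice = Account(nonce=3, balance=10**18)
tx = LegacyTx(nonce=3, gas_limit=21_000, gas_price=10**9, value=10**17)
print(is_valid(tx, alice, 21_000, 30_000_000, 0))  # True
```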
As seen before, the state transition function satisfies the following relation
$$\boldsymbol{\sigma}' = \Upsilon(\boldsymbol{\sigma}, T) \qquad (57)$$
Where, alongside the post-transaction state $\boldsymbol{\sigma}'$, the paper defines:
- $\Upsilon^g$ - amount of gas used in the execution of the transaction
- $\Upsilon^l$ - the logs generated by the transaction
- $\Upsilon^z$ - the status code resulting from the transaction
Throughout transaction execution, we accrue certain information that is acted upon immediately following the transaction
The paper defines the accrued substate as follows:
$$A \equiv (A_{\boldsymbol{s}}, A_{\boldsymbol{l}}, A_{\boldsymbol{t}}, A_r, A_{\boldsymbol{a}}, A_{\boldsymbol{K}}) \qquad (58)$$
- $A_s$ - self destruct set
- $A_l$ - logs generated by the transaction
- $A_t$ - set of touched accounts
- $A_r$ - refund balance
- $A_a$ - set of accessed account addresses
- $A_K$ - set of accessed storage keys. Each element is a tuple whose first element is a 20-byte address and whose second is a 32-byte storage slot.
Transaction execution is a multi-step process. As things change across the steps, the execution model needs a way to keep track of these changes and act upon them when necessary. For example, it makes sense to keep track of all the logs generated across the execution model - hence we have $A_l$.
Note: think of the substate as the pieces of data that one needs to keep track of throughout the execution model. It is not entirely fair to say that the information is acted upon immediately following the transaction: in the case of the access list, the information is acted upon immediately after the modification to the access list is made.
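The paper defines the empty accrued substate $A^0$ as
$$A^0 \equiv (\varnothing, (), \varnothing, 0, \pi, \varnothing) \qquad (59)$$
where $\pi$ is the set of all precompiled addresses. The empty accrued substate has: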
- No accounts in the self destruct set
- No logs
- No touched accounts
- No gas to be refunded
- The set of all precompiles in the access list
- No accessed storage keys
The set of precompiles refers to a set of contracts that are virtually deployed. This means that no EVM code is stored at their addresses; they virtually exist at a range of low addresses, 0x01 to 0x09 as of the Berlin version (later forks have added more at subsequent addresses). When a node sees a call to one such address, it is responsible for executing certain code that is implemented outside of the EVM abstraction. The code can be implemented in C, Python or whatever the node wishes; because the gas charged for a precompile is set by the protocol rather than by the implementation, a node is incentivised to have the most efficient implementation, as less efficient ones still cost the same amount of gas. The reasoning behind the existence of precompiles is that some common functionality is better implemented without the overhead (and possible limitations) that an EVM implementation would introduce. Here is a list of the available precompiles in the GRAYGLACIER version (note that back then there were only 9 precompiles). Since the Berlin version, all precompiles are warm. This means that transactions may omit these addresses from their access list without being penalised later for using them. The inclusion of all precompiles in the access list of the substate does not mean that they are free to use. It just means that their address never needs to be warmed up: one will never pay the more expensive gas fee of accessing a cold address for any of the available precompiles.
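A small sketch of what "all precompiles are warm" means in practice, assuming the Berlin/London-era set of precompiles at addresses 0x01 to 0x09 and the EIP-2929 account access costs (2600 gas for a cold account, 100 for a warm one). The function names are hypothetical, and the access cost shown is only one component of the gas charged by opcodes such as CALL or BALANCE.

```python
# ASSUMPTION: Berlin/London-era precompiles occupy addresses 0x01..0x09.
PRECOMPILES = {i.to_bytes(20, "big") for i in range(1, 10)}

WARM_ACCESS_COST = 100   # EIP-2929: account already in the accessed set
COLD_ACCESS_COST = 2600  # EIP-2929: first access to an account


def empty_substate_accessed_addresses() -> set[bytes]:
    """A^0_a: the empty substate already contains every precompile."""
    return set(PRECOMPILES)


def account_access_cost(address: bytes, accessed: set[bytes]) -> int:
    """Charge the cold cost on first access, the warm cost afterwards."""
    if address in accessed:
        return WARM_ACCESS_COST
    accessed.add(address)  # from now on the address is warm
    return COLD_ACCESS_COST


accessed = empty_substate_accessed_addresses()
sha256_precompile = (2).to_bytes(20, "big")
print(account_access_cost(sha256_precompile, accessed))  # 100: warm from the start
```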
Note: contrary to the name of this section, this part does not explain how a transaction is executed at its lowest levels (opcodes, program counters etc…). It describes the high-level design of transaction execution. The lower-level implementation comes in section 9.
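We define intrinsic gas $g_0$, the amount of gas this transaction requires to be paid prior to execution, as follows:
$$\begin{aligned} g_0 \equiv {} & \sum_{i \in T_{\boldsymbol{i}}, T_{\boldsymbol{d}}} \begin{cases} G_{\text{txdatazero}} & \text{if } i = 0 \\ G_{\text{txdatanonzero}} & \text{otherwise} \end{cases} \\ & + \begin{cases} G_{\text{txcreate}} & \text{if } T_t = \varnothing \\ 0 & \text{otherwise} \end{cases} \\ & + G_{\text{transaction}} \\ & + \sum_{j=0}^{\lVert T_{\boldsymbol{A}} \rVert - 1} \left( G_{\text{accesslistaddress}} + \lVert T_{\boldsymbol{A}}[j]_{\boldsymbol{s}} \rVert \, G_{\text{accessliststorage}} \right) \end{aligned} \qquad (60)$$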
Let’s look at each line individually
The paper defines
When creating contracts, the sender of one’s transaction is
Every transaction has a base fee
This term charges a base fee of
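Equation 60 translates almost line by line into code. This sketch assumes the Berlin/London fee schedule ($G_{\text{txdatazero}} = 4$, $G_{\text{txdatanonzero}} = 16$, $G_{\text{txcreate}} = 32000$, $G_{\text{transaction}} = 21000$, $G_{\text{accesslistaddress}} = 2400$, $G_{\text{accessliststorage}} = 1900$); later forks (for example Shanghai's per-word charge on init code) are not modelled. The access list is a hypothetical list of `(address, storage_keys)` pairs.

```python
G_TXDATAZERO = 4
G_TXDATANONZERO = 16
G_TXCREATE = 32000
G_TRANSACTION = 21000
G_ACCESSLISTADDRESS = 2400
G_ACCESSLISTSTORAGE = 1900


def intrinsic_gas(data: bytes, is_creation: bool,
                  access_list: list[tuple[bytes, list[bytes]]]) -> int:
    """g_0 from equation 60 (Berlin/London rules)."""
    # Line 1: every byte of the data (or init code) is charged; zero bytes are cheaper.
    gas = sum(G_TXDATAZERO if b == 0 else G_TXDATANONZERO for b in data)
    # Line 2: contract creation (T_t empty) costs extra.
    if is_creation:
        gas += G_TXCREATE
    # Line 3: the base cost every transaction pays.
    gas += G_TRANSACTION
    # Line 4: each access-list address and each of its storage keys.
    for _address, storage_keys in access_list:
        gas += G_ACCESSLISTADDRESS + len(storage_keys) * G_ACCESSLISTSTORAGE
    return gas


print(intrinsic_gas(b"", False, []))  # 21000: a plain value transfer
```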
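The up-front cost $v_0$ is calculated as:
$$v_0 \equiv \begin{cases} T_g T_p + T_v & \text{if } T_x = 0 \lor T_x = 1 \\ T_g T_m + T_v & \text{if } T_x = 2 \end{cases} \qquad (61)$$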
One way to think about their difference is that
With these concepts, the paper defines the validity of a transaction
(62)
With
The sender of the transaction cannot be the empty address
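The sender’s code hash must be the hash of the empty string, $KEC(())$, i.e. the sender has no code
The nonce of the transaction matches the sender’s nonce
The intrinsic gas $g_0$ must not exceed the transaction’s gas limit $T_g$
The up-front cost $v_0$ must not exceed the sender’s balance
The transaction’s maximum gas, $T_g$, must not exceed the block’s gas limit minus the gas already used by the previous transactions in the block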
With the above definitions, the paper describes the multi-part process that is the execution of a transaction.
Part 1
- Validate the transaction according to equation 62
- Increment the nonce of the sender
- Reduce the sender’s balance by the cost of the purchased gas, $T_g \times p$, where $p$ is the gas price paid (the endowment $T_v$, although covered by the up-front cost $v_0$ checked during validation, is transferred later, in Part 2)
- Set the available gas for execution as $T_g - g_0$
- Add the sender of the transaction to the substate’s $A_a$, as well as all the addresses in the transaction’s access list $T_A$
- Add the addresses and the associated storage slots from the transaction’s access list $T_A$ to the set of accessed storage keys $A_K$ (note that we ignore entries in the transaction’s access list $T_A$ that have no storage slot - we add those in $A_a$ in step 5)
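A snapshot of the world state after these steps are taken is given by state $\boldsymbol{\sigma}_0$:
$$\boldsymbol{\sigma}_0 \equiv \boldsymbol{\sigma} \quad \text{except:} \qquad (63)$$
$$\boldsymbol{\sigma}_0[S(T)]_b \equiv \boldsymbol{\sigma}[S(T)]_b - T_g p \qquad (64)$$
$$\boldsymbol{\sigma}_0[S(T)]_n \equiv \boldsymbol{\sigma}[S(T)]_n + 1 \qquad (65)$$
Note that equation 64 deducts only the cost of the gas, not the endowment $T_v$ that is part of the up-front cost. This is not an omission: the endowment is moved from the sender to the recipient later, by the message-call or contract-creation function (see, for example, the reduction of the sender’s balance by $v$ in the contract-creation section).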
For the substate, the modifications are reflected in
(67)
(68)
Notice how we add all the addresses in the transaction’s access list.
(69)
Notice how equation 69 only adds to
Part 2
Pass
- $\boldsymbol{\sigma}_P$ - the world state after executing the transaction
- $g'$ - remaining gas after executing the transaction
- $A$ - the substate after executing the transaction
- $z$ - the status code of the transaction ($z \in \{0, 1\}$: 0 if it failed, 1 if it succeeded)
All quantities above will be fully defined in sections 7, 8 and 9 where the framework for executing the transaction is defined.
Part 3
- Refund the transaction sender for any unspent gas
- Pay out the spent gas to the miner
Mathematically, the state after executing part 3 is
(73)
(74)
Refund the transactor
(75)
Pay out the miner
(76)
The miner is labelled in the block header as the beneficiary, $H_c$
(71)
The paper defines
(72)
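Part 3 boils down to a few lines of arithmetic. This sketch assumes the London rules: the refund counter $A_r$ is capped at one fifth of the gas used (EIP-3529; earlier forks used one half), the sender is refunded at the effective gas price, and the miner is paid a per-gas fee that equals the gas price before London and only the priority fee after it (the base fee being burned). All names are hypothetical.

```python
def settle(gas_limit: int, gas_left: int, refund_counter: int,
           effective_gas_price: int, miner_fee_per_gas: int,
           refund_quotient: int = 5) -> tuple[int, int, int]:
    """Return (g*, wei refunded to the sender, wei paid to the miner)."""
    gas_used = gas_limit - gas_left
    # g*: unspent gas plus the (capped) refund counter.
    g_star = gas_left + min(gas_used // refund_quotient, refund_counter)
    sender_refund = g_star * effective_gas_price
    miner_payment = (gas_limit - g_star) * miner_fee_per_gas
    return g_star, sender_refund, miner_payment


print(settle(100_000, 40_000, 50_000, effective_gas_price=10, miner_fee_per_gas=2))
# (52000, 520000, 96000)
```

Here 60,000 gas was used, so at most 12,000 of the 50,000 refund counter can be returned; the sender gets back 52,000 gas worth of wei and the miner is paid for the remaining 48,000.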
Part 4
- Delete all accounts that appear in the self-destruct set
- Delete all accounts that appear in the touched set and are empty (note that this step was added with EIP-161, preventing $EMPTY$ accounts from ever coming into existence)
Mathematically, the paper expresses the final state (the one that is the result state after the transaction execution) as
(77)
(78)
Delete all accounts that are in the self-destruct set
(79)
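Delete all accounts that are in the touched set and are empty. For example, SELFDESTRUCT may declare the beneficiary to be an account that is not in existence. That account will end up in the touched set, but it will be deleted at this step if it is still empty (for instance, if the self-destructing contract had no balance to send it).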
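Finally, the paper defines 3 quantities that are useful in future sections
$$\Upsilon^g(\boldsymbol{\sigma}, T) \equiv T_g - g^\star \qquad (80)$$
The gas used by transaction T: its gas limit minus the gas left (after refunds) once T has been executed
$$\Upsilon^l(\boldsymbol{\sigma}, T) \equiv A_{\boldsymbol{l}} \qquad (81)$$
The logs accumulated after executing transaction T
$$\Upsilon^z(\boldsymbol{\sigma}, T) \equiv z \qquad (82)$$
The status code resulting from executing transaction T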
Note that contract creation transactions are just a specialised variant of a transaction. Not to be confused with a message call!
Inputs:
- sender, $s$: in Solidity this would be `msg.sender`
- original transactor, $o$: an example of when the original transactor, $o$, is different to the sender, $s$, is when a transaction calls a contract whose code in turn makes a message call to another contract. For that inner call, the sender, $s$, will be the first contract, but the original transactor will remain the human address that issued the transaction; in Solidity this would be `tx.origin`
- available gas, $g$
- gas price, $p$
- endowment, $v$
- initialization EVM code, $\boldsymbol{i}$: the initialization code is the collection of opcodes and operands that together construct the contract. The execution of $\boldsymbol{i}$ is responsible for returning a program output $\boldsymbol{o}$ that represents the contract code
- present depth of the message-call/contract-creation stack, $e$: every time execution spawns a nested message call or contract creation, the depth $e$ is increased. There is a maximum depth of 1024
- salt, $\zeta$: used with the opcode $\text{CREATE2}$. Traditionally, the address of a new contract created by an account was fully determined by the account’s address and nonce. With the salt, $\zeta$, the address is instead derived from the creator’s address, the salt and the hash of the initialization code, so it no longer depends on the nonce and can be known before deployment
- permission to make modifications to the state, $w$: a call made without $w$ is a static call (the STATICCALL opcode). The flag does exactly what its name suggests: if it is not set, the execution cannot modify state (world state and/or contract state)
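Contract-creating transactions determine the address at which the deployment of the contract will happen, $a$ (the nonce is decremented because the sender’s nonce has already been incremented by this point):
$$a \equiv \text{ADDR}(s, \boldsymbol{\sigma}[s]_n - 1, \zeta, \boldsymbol{i}) \qquad (85)$$
$$\text{ADDR}(s, n, \zeta, \boldsymbol{i}) \equiv \mathcal{B}_{96..255}\big(\text{KEC}(L_A(s, n, \zeta, \boldsymbol{i}))\big) \qquad (86)$$
The address is the rightmost 160 bits of the KEC of what $L_A$ returns:
$$L_A(s, n, \zeta, \boldsymbol{i}) \equiv \begin{cases} \text{RLP}\big((s, n)\big) & \text{if } \zeta = \varnothing \\ (255) \cdot s \cdot \zeta \cdot \text{KEC}(\boldsymbol{i}) & \text{otherwise} \end{cases} \qquad (87)$$
If the creation was caused by $CREATE2$, then $\zeta \neq \emptyset$.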
As already mentioned, if the salt
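The two preimages of $L_A$ can be built with a tiny subset of RLP. This sketch stops before hashing: the preimage must be hashed with Keccak-256 (for example `Web3.keccak(preimage)` from the web3.py package used earlier), which the standard library does not provide, and `address_from_digest` then keeps the last 20 bytes. The RLP helpers handle only the short strings and lists needed here, and all function names are hypothetical.

```python
def rlp_bytes(b: bytes) -> bytes:
    """RLP-encode a short byte string (up to 55 bytes)."""
    if len(b) == 1 and b[0] < 0x80:
        return b  # a single low byte is its own encoding
    if len(b) <= 55:
        return bytes([0x80 + len(b)]) + b
    raise ValueError("long strings are not needed in this sketch")


def rlp_uint(n: int) -> bytes:
    """RLP-encode an integer as big-endian bytes without leading zeros (0 -> b'')."""
    return rlp_bytes(n.to_bytes((n.bit_length() + 7) // 8, "big"))


def rlp_short_list(encoded_items: list[bytes]) -> bytes:
    """RLP-encode a list whose encoded payload is at most 55 bytes."""
    payload = b"".join(encoded_items)
    if len(payload) > 55:
        raise ValueError("long lists are not needed in this sketch")
    return bytes([0xC0 + len(payload)]) + payload


def create_preimage(sender: bytes, nonce: int) -> bytes:
    """L_A when zeta is empty (CREATE): RLP((s, n))."""
    return rlp_short_list([rlp_bytes(sender), rlp_uint(nonce)])


def create2_preimage(sender: bytes, salt: bytes, init_code_hash: bytes) -> bytes:
    """L_A for CREATE2: (255) . s . zeta . KEC(i), salt and hash being 32 bytes each."""
    return b"\xff" + sender + salt + init_code_hash


def address_from_digest(digest: bytes) -> bytes:
    """B_{96..255}: keep the rightmost 160 bits (20 bytes) of the 32-byte digest."""
    return digest[-20:]


sender = bytes.fromhex("bb9bc244d798123fde783fcc1c72d3bb8c189413")
print(create_preimage(sender, 0).hex())  # d694bb9bc244d798123fde783fcc1c72d3bb8c18941380
```

The CREATE2 preimage contains no nonce at all, which is why its address can be computed before the contract is deployed.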
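We define the creation function formally as the function $\Lambda$, which evaluates from these values, together with the state $\boldsymbol{\sigma}$ and the accrued substate $A$, to the tuple containing the new state, remaining gas, new accrued substate, status code and output:
$$(\boldsymbol{\sigma}', g', A', z, \boldsymbol{o}) \equiv \Lambda(\boldsymbol{\sigma}, A, s, o, g, p, v, \boldsymbol{i}, e, \zeta, w) \qquad (84)$$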
The creation function
Part 1
- Add the address computed using equation 85 to the set of accessed accounts:
$$A^\star \equiv A \quad \text{except} \quad A^\star_{\boldsymbol{a}} \equiv A_{\boldsymbol{a}} \cup \{a\} \qquad (88)$$
Note that $A$ is the input accrued substate and $A^\star$ is the accrued substate as of the end of part 1.
- Initialise the newly created contract account, with $\boldsymbol{\sigma}^\star = \boldsymbol{\sigma}$ except (89):
  - Nonce equal to 1: $\boldsymbol{\sigma}^\star[a]_n = 1$
  - Balance equal to the value passed: $\boldsymbol{\sigma}^\star[a]_b = v + v'$. Note that $v'$ is the balance of the account if it was previously in existence. The assumption here is that the account already existed but had no code and a zero nonce (otherwise the creation fails, see equation 109). Consider a smart contract initialization code that first deletes some outdated contracts (effectively calling SUICIDE, the former name of SELFDESTRUCT, on them) before initializing a contract at the exact address of the beneficiary of all the SUICIDE instructions from before. All the contract deletions will transfer their balance to the beneficiary. Hence, when the contract-creating transaction is running, that address will already hold a balance from all the SUICIDE calls from before.
  - Empty storage: $\boldsymbol{\sigma}^\star[a]_s = TRIE(\emptyset)$
  - Code hash equal to the $KEC$ of the empty string: $\boldsymbol{\sigma}^\star[a]_c = KEC(())$. Note how at this instant, the contract account looks like a normal human account: it has no code yet.
  The paper summarises all the above as
  $$\boldsymbol{\sigma}^\star[a] = (1, v + v', TRIE(\emptyset), KEC(())) \qquad (90)$$
- Reduce the sender’s balance if the sender exists:
$$\boldsymbol{\sigma}^\star[s] = \begin{cases} \emptyset & \text{if } \boldsymbol{\sigma}[s]=\emptyset \land v = 0 \\ \boldsymbol{a}^\star & \text{otherwise} \end{cases} \qquad (91)$$
where
$$\boldsymbol{a}^\star \equiv (\boldsymbol{\sigma}[s]_n, \boldsymbol{\sigma}[s]_b - v, \boldsymbol{\sigma}[s]_s, \boldsymbol{\sigma}[s]_c) \qquad (92)$$
Note that the yellow paper hints at the possibility of the sender not existing. This is interesting, because the validity function clearly forbids this.
Part 2
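Feed the computed state from part 1 into the code execution function $\Xi$:
$$(\boldsymbol{\sigma}^{\star\star}, g^{\star\star}, A^{\star\star}, \boldsymbol{o}) \equiv \Xi(\boldsymbol{\sigma}^\star, g, A^\star, I) \qquad (94)$$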
Note that the code execution function will be discussed at length in section 9. It is nothing more than a function responsible for taking a series of opcodes and operands and running them according to a well-defined set of rules. Certain opcodes allow the permanent storage of the account to be modified. The code itself is a byte array in which each byte is interpreted either as an opcode or as an operand of a preceding PUSH opcode.
One way to think about
Note that there are 3 possible scenarios after the code execution function finishes:
- Out-of-gas exception
This is thrown:
  - when the first opcode that costs more than the available gas (at that point in the execution flow) is encountered
  - when there is not enough gas left to pay for the storage of the contract’s code, whose cost is proportional to the size of the contract body code:
  $$c \equiv G_{\text{codedeposit}} \times \lVert \boldsymbol{o} \rVert \qquad (104)$$
  - in several other exceptional cases
- Execution reverted
This happens if code execution ever encounters the REVERT opcode. One example where one would include the REVERT opcode in code is condition checking: if the message sender is not the owner of the account, then revert (the concept behind Solidity modifier guards).
- Execution finished successfully
This happens when the initialization code completes successfully.
The paper proceeds to present equations 105, 106, 107, and 108 - defining the output of the code execution function. The output is defined conditionally on
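$$F \equiv \big(\boldsymbol{\sigma}[a] \neq \emptyset \land (\boldsymbol{\sigma}[a]_c \neq KEC(()) \lor \boldsymbol{\sigma}[a]_n \neq 0)\big) \lor (\boldsymbol{\sigma}^{\star\star} = \emptyset \land \boldsymbol{o} = \emptyset) \lor g^{\star\star} < c \lor \lVert \boldsymbol{o} \rVert > 24576 \qquad (109)$$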
Let’s break equation 109 in simple terms
$F \equiv (\boldsymbol{\sigma}[a] \neq \emptyset \space \land (\boldsymbol{\sigma}[a]_c \neq KEC(()) \space \lor \boldsymbol{\sigma}[a]_n\neq 0)) \space$
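The account at address $a$ already exists and has either code or a non-zero nonce: deploying there would collide with an existing account, so the creation fails.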
$(\boldsymbol{\sigma}^{\star\star} = \emptyset \space \land \boldsymbol{o}=\emptyset)$
This is the scenario of an out-of-gas exception being thrown. The output of the code execution function will be well defined in section 9.
$g^{\star\star} < c$
The remaining gas is less than the contract-creation cost
$||\boldsymbol{o}|| > 24576$
The contract code is bigger than the maximum code size
With
(105)
The remaining gas after the contract-creation flow completes is 0 if $F$ holds (an out-of-gas exception or any of the other failure conditions above). Otherwise, it is the remaining gas after the code execution function minus the contract-creation cost $c$ (a cost proportional to the size of the contract’s body code). Note that if the code reverted, the gas is not depleted.
(106)
If we study section 9,
(107)
If we encounter an out-of-gas exception or the execution reverts, keep the same accrued substate. Otherwise, take the code execution function’s output accrued substate
(108)
If we encounter an out-of-gas exception or the execution reverts, the status code is 0 (failure). Otherwise 1 (success)
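The outcome of a contract creation can be summarised as a small decision procedure over the cases above. This is a sketch under stated assumptions: the boolean flags are hypothetical inputs standing for "the address collides with an existing account", "execution ended with an out-of-gas exception" and "execution ended with REVERT"; $G_{\text{codedeposit}} = 200$ and the 24,576-byte limit are taken as given; and, following the note that a revert does not deplete the gas, the revert branch returns the remaining gas without charging the code deposit.

```python
G_CODEDEPOSIT = 200
MAX_CODE_SIZE = 24576


def creation_result(gas_left: int, code: bytes, out_of_gas: bool,
                    reverted: bool, address_collision: bool) -> tuple[int, int]:
    """Return (g', z): remaining gas and status code after a contract creation."""
    if address_collision or out_of_gas:
        return 0, 0                  # F holds: all remaining gas is lost
    if reverted:
        return gas_left, 0           # status 0, but the unused gas is kept
    c = G_CODEDEPOSIT * len(code)    # equation 104
    if len(code) > MAX_CODE_SIZE or gas_left < c:
        return 0, 0                  # F holds: code too big or deposit unaffordable
    return gas_left - c, 1           # success: the deposit is paid from the gas left


print(creation_result(100_000, bytes(100), False, False, False))  # (80000, 1)
```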
The message call section is similar to the contract creation section. Its inputs are largely the same, except:
- The initialization code $\boldsymbol{i}$ is replaced by the input data of the call $\boldsymbol{d}$
- No salt $\zeta$ is passed
Its outputs are the same, except:
- $\boldsymbol{o}$ - the output data - is not ignored (as is the case in contract-creating transactions or, more generally, in any transaction):
(… $\boldsymbol{o}$) is ignored when executing transactions, however message calls can be initiated due to VM-code execution and in this case this information is used
Here is an excellent explanation of the difference between transactions (simple or contract-creating) and message calls. Visually, consider figure 18
Figure 18 - Difference between message calls and transactions
In brief, message calls outside the context of a transaction (see the first two items in figure 18) do not alter the state of the blockchain. They serve a read-only purpose. On the other hand, message calls in the context of a transaction may change the state of the blockchain. Contracts always communicate between themselves using message calls and they only cause state to be changed if the message call is triggered by a transaction. A message call is a dry-run unless called from a transaction; then it becomes an actual real run that consumes gas and triggers state changes.
Message calls always return an output, whereas transactions never return an output. It is helpful to imagine the interactions of a transaction and a message call to understand the reasoning behind this. A transaction will eventually get included and executed whereas a message call must happen immediately. If person A issues a transaction to a contract, the transaction will eventually get executed and hence, it will eventually produce some logs. However, if a transaction calls contract A that calls contract B, then the call to contract B must return within the context of execution of the originating transaction. The transaction execution cannot wait until the eventual transaction to contract B is executed. As such, contract A does not issue a transaction to contract B, but a message call. The execution of this message call is run in the context of the transaction and its output is returned immediately for the originating transaction to proceed.
Note that we need to differentiate between the value that is to be transferred, $v$, from the value apparent in the execution context, $\tilde{v}$, for the DELEGATECALL instruction
This statement refers to the subtle change in the behaviour of a DELEGATECALL instruction. DELEGATECALL is used to execute the code of another account in the context of the current account. As such, no value is transferred ($v = 0$), yet the executed code sees the value of the original call, $\tilde{v}$ (i.e. `msg.value`), in its execution context.
Part 1
The first step in executing a message call is to reduce the sender’s balance by the value being sent and increase the recipient’s balance by that same amount. Note that the implementation allows for the recipient to not initially exist. If the recipient address does not exist an
(112)
(113)
Again, it is interesting that the paper defines the behaviour for a non-existing sender, as the validity function prohibits it. However, if the sender does exist then its balance is reduced by the value being sent
(114)
The paper defines
(115)
(116)
The first clause of equation 116 shows that the recipient becomes an
(117)
The state
Part 2
In essence, part 2 is executing the account code of the message call account (or the delegated code for the DELEGATECALL and CALLCODE instructions) using the flow explained in section 9. It is important to notice the existence of precompiles. These are, as explained before, virtual contracts that are executed outside of the EVM abstraction.
The execution model specifies how the system state is altered given a series of bytecode instructions and a small tuple of environmental data
Both contract creation and message calls use the execution layer to run code on the EVM. Apart from the preparatory steps described above (such as the value transfer), every state change made while running code happens within this layer.
The EVM is a simple stack-based architecture. The word size of the machine (and thus size of stack items) is 256-bit… The stack has a maximum size of 1024 … The memory model is a simple word-addressed byte array … The machine also has an independent storage model; this is similar in concept to the memory but rather than a byte array, it is a word-addressable word array. Unlike memory, which is volatile, storage is non volatile and is maintained as part of the system state… All locations in both storage and memory are well-defined initially as zero.
Visually, we have the stack:
Figure 19 - The stack model of the EVM
It has a maximum size of 1024 words (a word is a 32-byte element). EVM opcodes consume items from the stack as input and push their results back onto the stack. For example, the opcode ADD pops the first two items from the stack (positions 0 and 1), adds them modulo $2^{256}$, and pushes the result, which then sits at position 0.
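A minimal Python model of the stack and ADD, assuming a list whose index 0 is the top of the stack so that it lines up with the paper's $\boldsymbol{\mu}_s[0]$. The helpers `push`, `pop` and `op_add` are hypothetical; in the paper, underflow and overflow are detected before an instruction runs, which `op_add` imitates by checking the stack size first.

```python
WORD_MODULUS = 2**256   # EVM words are 256-bit
MAX_STACK = 1024

def push(stack: list[int], value: int) -> None:
    """Push onto the top of the stack (index 0 is the top, as in the paper)."""
    if len(stack) >= MAX_STACK:
        raise OverflowError("stack overflow")
    stack.insert(0, value % WORD_MODULUS)

def pop(stack: list[int]) -> int:
    """Remove and return the top item."""
    if not stack:
        raise IndexError("stack underflow")
    return stack.pop(0)

def op_add(stack: list[int]) -> None:
    """ADD: pop positions 0 and 1, push their sum modulo 2**256."""
    if len(stack) < 2:
        raise IndexError("stack underflow")
    a = pop(stack)
    b = pop(stack)
    push(stack, a + b)

stack: list[int] = []
push(stack, 2)
push(stack, 3)
op_add(stack)
print(stack)  # [5]
```

The modulo in `push` is what makes `(2**256 - 1) + 1` wrap around to 0, as EVM arithmetic does.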
Visually, we have the memory model:
Figure 20 - The memory of the EVM
The memory is a word-addressed byte array. It is unbounded in size, but gas must be paid for all memory used. The cost is roughly linear up to 724 bytes and grows quadratically after that. Note also that memory is volatile: anything stored in it is discarded once execution finishes.
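A small sketch of the paper's memory cost function, $C_{mem}(a) = G_{memory} \cdot a + \lfloor a^2 / 512 \rfloor$, where $a$ is the number of active 32-byte words and $G_{memory} = 3$ comes from the fee schedule. The function names are hypothetical; the gas charged when memory grows is the difference between the cost of the new size and the cost of the old one.

```python
G_MEMORY = 3          # gas per active word (fee schedule)
QUAD_DIVISOR = 512    # divisor of the quadratic term

def words_for(num_bytes: int) -> int:
    """Number of 32-byte words needed to cover num_bytes of memory."""
    return (num_bytes + 31) // 32

def memory_cost(active_words: int) -> int:
    """C_mem(a) = G_memory * a + floor(a**2 / 512)."""
    return G_MEMORY * active_words + active_words ** 2 // QUAD_DIVISOR

def expansion_cost(old_words: int, new_words: int) -> int:
    """Gas charged when the active memory grows from old_words to new_words."""
    return memory_cost(max(new_words, old_words)) - memory_cost(old_words)
```

With these definitions, 704 bytes (22 words) cost 66 gas and 736 bytes (23 words) cost 70 gas, the first size at which the quadratic term contributes; 32 KiB (1024 words) cost 5120 gas, of which 2048 comes from the quadratic term. The 724-byte figure corresponds to $32\sqrt{512} \approx 724$ bytes.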
Visually, we have the storage model:
Figure 21 - The storage of the EVM
The storage is a non-volatile word-addressable word-array that is kept as part of the system state. The storage is also infinite in size, but gas must be paid to permanently store items in the system state.
The machine can have exceptional execution for several reasons, including stack underflows and invalid instructions. Like the out-of-gas exception, they do not leave state changes intact. Rather, the machine halts immediately and reports the issue to the execution agent (either the transaction processor or, recursively, the spawning execution environment) which will deal with it separately
It is interesting to note that reverting state changes when an exception is thrown is not part of the code-executing logic itself. It is the responsibility of what the paper refers to as the execution agent, which can be thought of as whatever invoked the code-executing function. Trivially, this may be the transaction processor (when a transaction directly calls the code-executing function). However, transactions can result in contracts calling other contracts. Such calls are dealt with recursively: all of the input data is defined, and the output is returned as if the execution spawned by the contract call were itself an independent transaction. As such, the paper states that it is the responsibility of the calling execution environment to ensure the state is restored to what it was before the call if the call itself fails. In other words, if person A calls contract B, which issues a call to contract C that fails, person A can assume that the state resulting from contract B's execution has not been modified by the failed call to contract C.
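The division of labour described above can be sketched as follows. This is an illustrative Python model, not how any client is implemented: the world state is a flat dict, `ExceptionalHalt`, `run_child_call`, `contract_b` and `contract_c` are hypothetical, and the snapshot is a full deep copy (clients typically journal individual changes instead). The point is that `contract_c` never undoes its own write; its caller restores the snapshot.

```python
import copy

class ExceptionalHalt(Exception):
    """Raised by the (hypothetical) executor when Z reports a halt."""

def run_child_call(state: dict, child) -> bool:
    """Run `child` against `state`, restoring the state if it halts exceptionally.

    Plays the execution agent: the caller keeps a snapshot and puts it back
    on failure. Returns True on success, False on failure.
    """
    snapshot = copy.deepcopy(state)
    try:
        child(state)
        return True
    except ExceptionalHalt:
        state.clear()
        state.update(snapshot)
        return False

def contract_c(state: dict) -> None:
    state["C.counter"] = state.get("C.counter", 0) + 1   # a write...
    raise ExceptionalHalt("invalid instruction")          # ...then a failure

def contract_b(state: dict) -> None:
    state["B.calls"] = state.get("B.calls", 0) + 1
    state["B.last_call_ok"] = run_child_call(state, contract_c)

world: dict = {}
run_child_call(world, contract_b)   # person A's transaction calls contract B
print(world)  # {'B.calls': 1, 'B.last_call_ok': False}
```

Contract B's own write survives, while contract C's write is gone because B's call site restored the snapshot.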
This subsection describes when gas is charged.
- Fee intrinsic to the computation of an operation - this is the fee associated with an opcode such as ADD. The fees for all available opcodes are given in appendix G of the paper.
- Payment for a subordinate message call or contract creation - if an opcode encountered during code execution results in a message call or contract creation, the call itself is charged, as issuing calls and contract creations costs gas.
- Increase in the usage of memory - this applies to memory only: storage is paid for through the fee of the SSTORE instruction itself, and the stack carries no separate charge beyond the fees of the instructions that use it. Note that clearing storage (setting a non-zero slot back to zero) results in gas being refunded.
The input to the code-executing function
- $I_a$ - the address of the account which owns the code that is executing. Note that, with instructions such as DELEGATECALL, the code of another account can execute in the context of $I_a$.
- $I_o$ - the sender address of the transaction that originated this execution. Note that this is different from $I_s$: if person A calls contract B, which calls contract C, then the code-executing function, in the context of the final call to contract C, will have $I_o$ as person A but $I_s$ as contract B.
- $I_p$ - the price of gas in the transaction that originated this execution.
- $I_{\boldsymbol{d}}$ - the byte array that is the input data to this execution.
- $I_s$ - the address of the account which caused the code to be executing (note the difference with $I_o$).
- $I_v$ - the value, in $Wei$, passed to this account.
- $I_{\boldsymbol{b}}$ - the byte array that is the machine code to be executed.
- $I_H$ - the block header of the present block.
- $I_e$ - the depth of the present message call or contract creation. If person A calls contract B, which calls contract C, then contract B's code runs at depth 0 and the code-executing function, in the context of the final call to contract C, will have $I_e$ equal to 1.
- $I_w$ - the permission to make modifications to the state. It is false inside a STATICCALL (Solidity, for example, uses STATICCALL when calling external view and pure functions). No call that has $I_w$ as false may modify any persistent state, so such calls are read-only.
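The environment can be modelled as an immutable record. In the Python sketch below, the field names are hypothetical stand-ins for the paper's symbols, and `child_environment` shows how we understand a plain CALL to derive the callee's environment from the caller's: the origin is inherited, the caller becomes the sender, and the depth grows by one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExecutionEnvironment:
    """The tuple I (field names are illustrative)."""
    address: str      # I_a
    origin: str       # I_o: sender of the originating transaction
    gas_price: int    # I_p
    data: bytes       # I_d: input data
    sender: str       # I_s: account that directly caused this execution
    value: int        # I_v: value in Wei
    code: bytes       # I_b: machine code to execute
    header: object    # I_H: block header (left opaque here)
    depth: int        # I_e
    can_modify: bool  # I_w

def child_environment(parent: ExecutionEnvironment, callee: str, code: bytes,
                      data: bytes, value: int,
                      static: bool = False) -> ExecutionEnvironment:
    """Environment for a call made from `parent` (hypothetical helper)."""
    return ExecutionEnvironment(
        address=callee,
        origin=parent.origin,        # the transaction sender never changes
        gas_price=parent.gas_price,
        data=data,
        sender=parent.address,       # the caller becomes the sender
        value=value,
        code=code,
        header=parent.header,
        depth=parent.depth + 1,
        can_modify=parent.can_modify and not static,
    )

# Person A's transaction runs contract B at depth 0; B then calls contract C.
env_b = ExecutionEnvironment("B", "A", 1, b"", "A", 0, b"", None, 0, True)
env_c = child_environment(env_b, "C", b"", b"", 0)
print(env_c.origin, env_c.sender, env_c.depth)  # A B 1
```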
The code-executing function
- $O$ - this function defines the result of a single cycle of the EVM.
- $Z$ - this function determines if the present state is an exceptional halting state of the machine.
- $H$ - this function determines the output if and only if the present state is a normal halting state of the machine. Note that this function evaluates to the empty set $\emptyset$ if the current state is not a normal halting state, and to an array (possibly empty), $()$, if the present state is indeed a normal halting state. The key take-away is that $() \neq \emptyset$: an output of $()$ represents a normal halt that produced no data, whereas $\emptyset$ means the machine has not halted normally.
Using the definitions from above, the code-executing function
Before formalising
Quantities such as
- $\boldsymbol{\mu}_g$ - gas available for execution.
- $\boldsymbol{\mu}_{pc}$ - the program counter, used to index $I_{\boldsymbol{b}}$.
- $\boldsymbol{\mu}_{\boldsymbol{m}}$ - the memory.
- $\boldsymbol{\mu}_i$ - the active number of words in memory. Since memory is initially defined as all zeros, $\boldsymbol{\mu}_i$ distinguishes the region that has actually been used (which may itself hold zeros) from the untouched zeros beyond it.
- $\boldsymbol{\mu}_{\boldsymbol{s}}$ - the stack contents.
- $\boldsymbol{\mu}_{\boldsymbol{o}}$ - the output data of the most recent call. When contract A calls contract B, the output of the call is stored in contract A's $\boldsymbol{\mu}_{\boldsymbol{o}}$. Contract A can then use the RETURNDATACOPY instruction to copy the contents of $\boldsymbol{\mu}_{\boldsymbol{o}}$ into $\boldsymbol{\mu}_{\boldsymbol{m}}$ and act on the data returned by contract B.
The machine state
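$\boldsymbol{\mu}_g \equiv g$ (137)
The available gas for execution is the remaining gas fed into the code-executing function.
$\boldsymbol{\mu}_{pc} \equiv 0$ (138)
The program counter starts at the first instruction of the code.
$\boldsymbol{\mu}_{\boldsymbol{m}} \equiv (0, 0, \dots)$ (139)
The memory is defined as all 0s.
$\boldsymbol{\mu}_i \equiv 0$ (140)
There are currently no active words in memory.
$\boldsymbol{\mu}_{\boldsymbol{s}} \equiv ()$ (141)
The stack is empty.
$\boldsymbol{\mu}_{\boldsymbol{o}} \equiv ()$ (142)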
The output is empty.
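The initial machine state translates directly into a Python dataclass. This is a sketch with assumed names: memory is stored as a `bytearray` that only holds the bytes touched so far, and reads past its end return zeros, which is one way to represent an infinite array that starts out as all zeros.

```python
from dataclasses import dataclass, field

@dataclass
class MachineState:
    """Machine state mu with the initial values of equations 137-142."""
    gas: int                                              # mu_g = g
    pc: int = 0                                           # mu_pc = 0
    memory: bytearray = field(default_factory=bytearray)  # mu_m: conceptually all zeros
    active_words: int = 0                                 # mu_i = 0
    stack: list[int] = field(default_factory=list)        # mu_s = ()
    output: bytes = b""                                   # mu_o = ()

    def read_memory(self, offset: int, length: int) -> bytes:
        """Read memory without expanding it; bytes never written read as zero."""
        chunk = bytes(self.memory[offset:offset + length])
        return chunk + b"\x00" * (length - len(chunk))

mu = MachineState(gas=21_000)
print(mu.pc, mu.stack, mu.read_memory(0, 4))  # 0 [] b'\x00\x00\x00\x00'
```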
To understand
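$w \equiv \begin{cases} I_{\boldsymbol{b}}[\boldsymbol{\mu}_{pc}] & \text{if} \quad \boldsymbol{\mu}_{pc} < ||I_{\boldsymbol{b}}|| \\ STOP & \text{otherwise} \end{cases}$ (148)
In simple terms, the current instruction is the instruction at index $\boldsymbol{\mu}_{pc}$ of the code $I_{\boldsymbol{b}}$, or STOP if the program counter has moved past the end of the code.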
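Equation 148 is a one-liner in Python. The opcode is an integer byte (STOP is `0x00` in the EVM encoding) and the function name is hypothetical. Treating a program counter past the end of the code as STOP is what makes running off the end of the code a normal halt rather than an error.

```python
STOP = 0x00  # EVM encoding of STOP

def current_instruction(code: bytes, pc: int) -> int:
    """w: the opcode at index pc of the code, or STOP once pc runs past the end."""
    return code[pc] if pc < len(code) else STOP
```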
With the definition of the current instruction in place, the paper formalizes the remaining ideas:
-
$Z$ - exceptional halting function
The function is defined in equation 149
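$Z(\boldsymbol{\sigma}, \boldsymbol{\mu}, A, I) \equiv \begin{aligned}[t] & \boldsymbol{\mu}_g < C(\boldsymbol{\sigma}, \boldsymbol{\mu}, A, I) \; \lor \\ & \delta_w = \emptyset \; \lor \\ & ||\boldsymbol{\mu}_s|| < \delta_w \; \lor \\ & (w = JUMP \land \boldsymbol{\mu}_s[0] \notin D(I_b)) \; \lor \\ & (w = JUMPI \land \boldsymbol{\mu}_s[1] \neq 0 \land \boldsymbol{\mu}_s[0] \notin D(I_b)) \; \lor \\ & (w = RETURNDATACOPY \land \boldsymbol{\mu}_s[1] + \boldsymbol{\mu}_s[2] > ||\boldsymbol{\mu}_o||) \; \lor \\ & ||\boldsymbol{\mu}_s|| - \delta_w + \alpha_w > 1024 \; \lor \\ & (\neg I_w \land W(w, \boldsymbol{\mu})) \; \lor \\ & (w = SSTORE \land \boldsymbol{\mu}_g \leq G_{\text{callstipend}}) \end{aligned}$ (149)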
Let’s go through the lines one by one. The current state is an exceptional halting state if any of the following holds:
- $\boldsymbol{\mu}_g < C(\boldsymbol{\sigma}, \boldsymbol{\mu},A,I)$ - the gas available is less than the gas required to execute the next instruction.
- $\delta_w = \emptyset$ - every instruction $w$ has associated with it the following two quantities:
  - $\delta_w$ - the number of items it will pop from the stack
  - $\alpha_w$ - the number of items it will push onto the stack

  The number of items an instruction pops from the stack, $\delta_w$, corresponds to the arguments the instruction expects. For example, the ADD instruction expects two arguments to add. If an instruction $w$ has $\delta_w = \emptyset$ or $\alpha_w = \emptyset$, then the instruction is not defined: it is an invalid instruction. Hence, the current state is an exceptional halting state if the next instruction is an invalid instruction.
- $||\boldsymbol{\mu}_s|| < \delta_w$ - as mentioned above, $\delta_w$ is the number of items the instruction will pop from the stack. If the stack holds fewer items than that (a stack underflow), the current state is an exceptional halting state.
- $(w = JUMP \land \boldsymbol{\mu}_s[0] \notin D(I_b))$ - the current instruction is JUMP and the jump destination $\boldsymbol{\mu}_s[0]$, the first item on the stack, is an invalid jump destination. This is defined more formally later on, once the jump destination validity set $D(I_b)$ is defined.
- $(w = JUMPI \land \boldsymbol{\mu}_s[1] \neq 0 \land \boldsymbol{\mu}_s[0] \notin D(I_b))$ - JUMPI is a conditional jump. The current state is an exceptional halting state if the condition for jumping is true ($\boldsymbol{\mu}_s[1] \neq 0$, i.e. the second item on the stack is not 0) and the jump destination $\boldsymbol{\mu}_s[0]$, the first item on the stack, is an invalid jump destination.
- $(w = RETURNDATACOPY \land \boldsymbol{\mu}_s[1] + \boldsymbol{\mu}_s[2] > ||\boldsymbol{\mu}_o||)$ - the current instruction RETURNDATACOPY tries to copy $\boldsymbol{\mu}_s[2]$ bytes, starting at index $\boldsymbol{\mu}_s[1]$ of the output $\boldsymbol{\mu}_o$, into the memory $\boldsymbol{\mu}_m$ (at offset $\boldsymbol{\mu}_s[0]$). The current state is an exceptional halting state if this would read outside the range of the array $\boldsymbol{\mu}_o$.
- $||\boldsymbol{\mu}_s|| - \delta_w + \alpha_w > 1024$ - the current instruction would cause a stack overflow.
- $(\neg I_w \land W(w,\boldsymbol{\mu}))$ - the current instruction attempts to modify the state when the execution does not have permission to do so (a static call). The function $W(w, \boldsymbol{\mu})$ returns true if the current instruction is a state-modifying instruction:
  $W(w, \boldsymbol{\mu}) \equiv w \in \{CREATE, CREATE2, SSTORE, SELFDESTRUCT\} \lor LOG0 \leq w \land w \leq LOG4 \lor (w = CALL \land \boldsymbol{\mu}_s[2] \neq 0)$ (150)
  In words: creating contracts, writing to storage, self-destructing, emitting logs and making a CALL that transfers value are all forbidden in a static context.
- $(w = SSTORE \land \boldsymbol{\mu}_g \leq G_{\text{callstipend}})$ - the current instruction tries to store a word in storage while the remaining gas is at most the call stipend $G_{\text{callstipend}}$ (2300 gas). This rule, introduced by EIP-2200, ensures that code running on nothing more than the stipend given to a value-transferring call cannot modify storage.
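The conditions above can be collected into a single predicate. The Python sketch below is a simplification under stated assumptions: opcodes are strings, the $(\delta_w, \alpha_w)$ table covers only a few instructions (anything else is treated as undefined), the instruction cost $C$ is passed in rather than computed, and the set of valid jump destinations is supplied by the caller. All names are hypothetical; `G_CALLSTIPEND` is the 2300 gas stipend from the paper's fee schedule.

```python
G_CALLSTIPEND = 2300   # stipend given to value-transferring calls
STACK_LIMIT = 1024

# (delta_w, alpha_w): items popped and pushed. Only a few opcodes are listed;
# anything missing is treated as an undefined (invalid) instruction.
STACK_ARITY = {
    "STOP": (0, 0), "ADD": (2, 1), "JUMP": (1, 0), "JUMPI": (2, 0),
    "JUMPDEST": (0, 0), "PUSH1": (0, 1), "SSTORE": (2, 0), "LOG0": (2, 0),
    "CREATE": (3, 1), "CALL": (7, 1), "RETURNDATACOPY": (3, 0),
}

def modifies_state(w: str, stack: list[int]) -> bool:
    """W(w, mu): True for instructions forbidden in a static context."""
    if w in {"CREATE", "CREATE2", "SSTORE", "SELFDESTRUCT"}:
        return True
    if w in {"LOG0", "LOG1", "LOG2", "LOG3", "LOG4"}:
        return True
    return w == "CALL" and stack[2] != 0      # a CALL that transfers value

def is_exceptional_halt(w: str, gas: int, cost: int, stack: list[int],
                        return_data: bytes, jumpdests: set[int],
                        can_modify: bool) -> bool:
    """Z: True if executing w in this state must halt exceptionally."""
    if gas < cost:
        return True
    if w not in STACK_ARITY:                  # delta_w is undefined
        return True
    delta, alpha = STACK_ARITY[w]
    if len(stack) < delta:                    # stack underflow
        return True
    if w == "JUMP" and stack[0] not in jumpdests:
        return True
    if w == "JUMPI" and stack[1] != 0 and stack[0] not in jumpdests:
        return True
    if w == "RETURNDATACOPY" and stack[1] + stack[2] > len(return_data):
        return True
    if len(stack) - delta + alpha > STACK_LIMIT:   # stack overflow
        return True
    if not can_modify and modifies_state(w, stack):
        return True
    return w == "SSTORE" and gas <= G_CALLSTIPEND
```

The underflow check comes before any condition that reads stack items, so those reads are always in range.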
As mentioned above, JUMP and JUMPI instructions may result in exceptional halting states in the event that the jump destination is not in the jump destination validity set.
$D(\boldsymbol{c}) \equiv D_J(\boldsymbol{c}, 0)$ (151)
The function
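$D_J(\boldsymbol{c}, i) \equiv \begin{cases} \{\} & \text{if} \quad i \geqslant ||\boldsymbol{c}|| \\ \{i\} \cup D_J(\boldsymbol{c}, N(i, \boldsymbol{c}[i])) & \text{if} \quad \boldsymbol{c}[i] = JUMPDEST \\ D_J(\boldsymbol{c}, N(i, \boldsymbol{c}[i])) & \text{otherwise} \end{cases}$ (152)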
The function
Figure 22 - A program code with 2 JUMPDEST instructions
Function PUSH
instructions. A JUMPDEST instruction may not reside in the data portion of a PUSH instruction. Consider the three programs given in figure 23.
Figure 23 - 3 program codes illustrating positions where a JUMPDEST instruction may not reside as a result of a PUSH instruction
A JUMPDEST may not reside in any of the red shaded positions from figure 23, because those positions hold the data of a PUSH instruction. PUSH1 pushes the single byte immediately after it onto the stack, so a JUMPDEST byte in that position is data rather than an instruction. PUSH2 pushes the next 2 bytes onto the stack, so a JUMPDEST cannot reside in either of the 2 positions after it. To handle this caveat, the paper defines the function $N$:
$N(i, w) \equiv \begin{cases} i + w - PUSH1 + 2 & \text{if} \quad w \in [PUSH1, PUSH32] \\ i + 1 & \text{otherwise} \end{cases}$ (153)
This function returns the index of the next instruction after the one at index $i$: if $w$ is a PUSH instruction, it skips over the PUSH data bytes.
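The three functions translate into a short Python routine. This sketch uses the real EVM byte values (`PUSH1` is `0x60`, `PUSH32` is `0x7F`, `JUMPDEST` is `0x5B`) and replaces the paper's recursive $D_J$ with an equivalent loop; the function names are hypothetical.

```python
PUSH1, PUSH32, JUMPDEST = 0x60, 0x7F, 0x5B

def next_index(i: int, w: int) -> int:
    """N(i, w): index of the instruction after the one at i, skipping PUSH data."""
    if PUSH1 <= w <= PUSH32:
        return i + w - PUSH1 + 2
    return i + 1

def valid_jump_destinations(code: bytes) -> set[int]:
    """D(c): positions holding a JUMPDEST opcode that is not PUSH data.

    A loop instead of the paper's recursion over D_J; the result is the same.
    """
    dests = set()
    i = 0
    while i < len(code):
        if code[i] == JUMPDEST:
            dests.add(i)
        i = next_index(i, code[i])
    return dests

# PUSH1 0x5b; JUMPDEST; PUSH2 0x5b 0x5b; JUMPDEST
code = bytes([0x60, 0x5B, 0x5B, 0x61, 0x5B, 0x5B, 0x5B])
print(sorted(valid_jump_destinations(code)))  # [2, 6]
```

In the example, the `0x5B` bytes at indices 1, 4 and 5 are PUSH data, so only indices 2 and 6 are valid jump destinations.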
-
$H$ - Normal halting function
The paper defines the normal halting function
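$H(\boldsymbol{\mu}, I) \equiv \begin{cases} H_{RETURN}(\boldsymbol{\mu}) & \text{if} \quad w \in \{RETURN, REVERT\} \\ () & \text{if} \quad w \in \{STOP, SELFDESTRUCT\} \\ \emptyset & \text{otherwise} \end{cases}$ (154)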
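The function returns the output data $H_{RETURN}(\boldsymbol{\mu})$ for the halting instructions RETURN and REVERT, the empty array $()$ for STOP and SELFDESTRUCT, and $\emptyset$ for every other instruction. For RETURN and REVERT, the returned data is
$H_{RETURN}(\boldsymbol{\mu}) \equiv \boldsymbol{\mu}_{\boldsymbol{m}}[\boldsymbol{\mu}_s[0] \dots (\boldsymbol{\mu}_s[0] + \boldsymbol{\mu}_s[1] - 1)]$
In words, we return the memory contents starting at the offset given by the first stack element, with the length given by the second stack element.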
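Equation 154 together with the RETURN/REVERT data rule can be sketched in Python. As assumptions: opcodes are strings, `None` stands in for $\emptyset$, `b""` for the empty array $()$, and reads beyond the written part of memory yield zeros (in the real machine, memory is first expanded and paid for). The function name is hypothetical.

```python
def normal_halt_output(w: str, stack: list[int], memory: bytes):
    """H(mu, I): output data, b"" for STOP/SELFDESTRUCT, None if not halting."""
    if w in ("RETURN", "REVERT"):
        offset, length = stack[0], stack[1]
        chunk = bytes(memory[offset:offset + length])
        # Memory never written reads as zero.
        return chunk + b"\x00" * (length - len(chunk))
    if w in ("STOP", "SELFDESTRUCT"):
        return b""      # the empty array (): a normal halt with no data
    return None         # the empty set: not a normal halting state
```

Note that `b"" != None`, mirroring $() \neq \emptyset$.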
-
$O$ - Single cycle function
The paper defines the single cycle function
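$O((\boldsymbol{\sigma}, \boldsymbol{\mu}, A, I)) \equiv (\boldsymbol{\sigma}', \boldsymbol{\mu}', A', I)$ (155)
$\Delta \equiv \alpha_w - \delta_w$ (156)
The overall change in items on the stack is the number of items pushed minus the number of items popped.
$||\boldsymbol{\mu}'_s|| \equiv ||\boldsymbol{\mu}_s|| + \Delta$ (157)
The size of the new stack is the size of the previous stack plus the overall change defined in equation 156.
$\forall x \in [\alpha_w, ||\boldsymbol{\mu}'_s||) : \boldsymbol{\mu}'_s[x] \equiv \boldsymbol{\mu}_s[x - \Delta]$ (158)
All the items already on the stack that the instruction does not consume are shifted by the overall stack change, so that they sit below the $\alpha_w$ newly pushed items.
$\boldsymbol{\mu}'_g \equiv \boldsymbol{\mu}_g - C(\boldsymbol{\sigma}, \boldsymbol{\mu}, A, I)$ (159)
The gas is reduced by the cost of the current instruction.
$\boldsymbol{\mu}'_{pc} \equiv \begin{cases} \boldsymbol{\mu}_s[0] & \text{if} \quad w = JUMP \\ \boldsymbol{\mu}_s[0] & \text{if} \quad w = JUMPI \land \boldsymbol{\mu}_s[1] \neq 0 \\ N(\boldsymbol{\mu}_{pc}, w) & \text{otherwise} \end{cases}$ (160)
The program counter moves to the destination on the stack for JUMP, and for JUMPI when its condition is non-zero. Otherwise it advances to the next instruction using the function $N$, which skips over the data bytes of a PUSH instruction.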
The paper also says that - in general - we may assume the following remain unchanged
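$\boldsymbol{\mu}'_{\boldsymbol{m}} \equiv \boldsymbol{\mu}_{\boldsymbol{m}}$ (161)
The memory contents remain unchanged
$\boldsymbol{\mu}'_i \equiv \boldsymbol{\mu}_i$ (162)
The number of active words in memory remains unchanged
$A' \equiv A$ (163)
The accrued substate remains unchanged
$\boldsymbol{\sigma}' \equiv \boldsymbol{\sigma}$ (164)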
The world state remains unchanged
Exceptions to this general rule are instructions that change the state, write to memory, or produce logs.
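The stack and program-counter updates of equations 156 to 160 can be sketched in Python as follows, again with index 0 as the top of the stack. The function names and the `push_bytes` parameter are hypothetical; the caller is assumed to supply the $\alpha_w$ results already computed.

```python
def apply_stack_change(stack: list[int], pushed: list[int], delta: int) -> list[int]:
    """New stack per equations 156-158 (index 0 is the top of the stack).

    Drops the `delta` consumed items, then places the alpha_w = len(pushed)
    results on top. Each remaining item moves from index x - Delta to x,
    which is mu'_s[x] = mu_s[x - Delta] for x >= alpha_w.
    """
    new_stack = pushed + stack[delta:]
    # ||mu'_s|| = ||mu_s|| + Delta, with Delta = alpha_w - delta_w
    assert len(new_stack) == len(stack) + len(pushed) - delta
    return new_stack

def next_pc(w: str, pc: int, stack: list[int], push_bytes: int = 0) -> int:
    """mu'_pc per equation 160, using the stack as it was *before* the instruction.

    push_bytes is the number of data bytes following a PUSH (0 otherwise).
    """
    if w == "JUMP":
        return stack[0]
    if w == "JUMPI" and stack[1] != 0:
        return stack[0]
    return pc + 1 + push_bytes   # N(pc, w)
```

For ADD on the stack `[7, 8, 9]`, the two top items are consumed and `15` is pushed, giving `[15, 9]`.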
With the exceptional halting state function
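$X((\boldsymbol{\sigma}, \boldsymbol{\mu}, A, I)) \equiv \begin{cases} (\emptyset, \boldsymbol{\mu}, A, I, \emptyset) & \text{if} \quad Z(\boldsymbol{\sigma}, \boldsymbol{\mu}, A, I) \\ (\emptyset, \boldsymbol{\mu}', A, I, \boldsymbol{o}) & \text{if} \quad w = REVERT \\ O(\boldsymbol{\sigma}, \boldsymbol{\mu}, A, I) \cdot \boldsymbol{o} & \text{if} \quad \boldsymbol{o} \neq \emptyset \\ X(O(\boldsymbol{\sigma}, \boldsymbol{\mu}, A, I)) & \text{otherwise} \end{cases}$ (143)
where $\boldsymbol{o} \equiv H(\boldsymbol{\mu}, I)$, and $\boldsymbol{\mu}' \equiv \boldsymbol{\mu}$ except $\boldsymbol{\mu}'_g \equiv \boldsymbol{\mu}_g - C(\boldsymbol{\sigma}, \boldsymbol{\mu}, A, I)$.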
Let’s break it down line by line
$(\emptyset, \boldsymbol{\mu}, A, I, \emptyset) \quad \text{if} \quad Z(\boldsymbol{\sigma}, \boldsymbol{\mu}, A,I)$
In the case of an exceptional halting state, $X$ returns $\emptyset$ in place of the world state (so all of its changes are discarded) and no output.
$(\emptyset, \boldsymbol{\mu}',A,I,\boldsymbol{o}) \quad \text{if} \quad w = REVERT$
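In the case of a REVERT, $X$ also returns $\emptyset$ in place of the world state, so all state changes are discarded. Unlike an exceptional halt, however, it returns the output data $\boldsymbol{o}$ and the machine state $\boldsymbol{\mu}'$, whose gas has only been reduced by the cost of the REVERT instruction itself, so the remaining gas is returned to the caller rather than consumed.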
$O(\boldsymbol{\sigma}, \boldsymbol{\mu}, A, I) \cdot \boldsymbol{o} \quad \text{if} \quad \boldsymbol{o} \neq \emptyset$
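In the case that the normal halting function returns something other than $\emptyset$ (the current state is a normal halting state), we apply one final cycle and return its result together with the output $\boldsymbol{o}$: the data returned by RETURN, or the empty array for STOP and SELFDESTRUCT.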
$X(O(\boldsymbol{\sigma},\boldsymbol{\mu}, A, I)) \quad \text{otherwise}$
In all other scenarios, we are not at a halting state (normal or exceptional) so we keep executing the code.
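Putting $Z$, $H$ and $O$ together, the following toy interpreter mirrors the structure of $X$, with a loop in place of the recursion. It is a deliberately small sketch: it understands only six opcodes (using their real EVM byte values), charges an assumed flat 3 gas per instruction instead of the paper's cost function $C$, ignores memory gas, and never touches storage; `execute` is a hypothetical name.

```python
STOP, ADD, MSTORE8, PUSH1, RETURN, REVERT = 0x00, 0x01, 0x53, 0x60, 0xF3, 0xFD
# (delta_w, alpha_w) for the handful of opcodes this toy understands
ARITY = {STOP: (0, 0), ADD: (2, 1), MSTORE8: (2, 0), PUSH1: (0, 1),
         RETURN: (2, 0), REVERT: (2, 0)}
FLAT_COST = 3  # assumption: every instruction costs 3 gas; the real C is far richer

def execute(code: bytes, gas: int, state: dict):
    """Toy version of X. Returns (state or None, remaining gas, output or None)."""
    pc, stack, memory = 0, [], bytearray()
    while True:
        w = code[pc] if pc < len(code) else STOP          # equation 148
        arity = ARITY.get(w)
        # Z: exceptional halt. No state, no output (the caller treats all gas as used).
        if (gas < FLAT_COST or arity is None or len(stack) < arity[0]
                or len(stack) - arity[0] + arity[1] > 1024):
            return None, gas, None
        gas -= FLAT_COST
        # H: normal halt with output data (RETURN/REVERT) or the empty array (STOP).
        if w in (RETURN, REVERT):
            offset, length = stack[0], stack[1]
            memory.extend(b"\x00" * max(0, offset + length - len(memory)))
            out = bytes(memory[offset:offset + length])
            return (None if w == REVERT else state), gas, out
        if w == STOP:
            return state, gas, b""
        # O: one ordinary cycle; looping plays the role of the recursive X(O(...)).
        if w == PUSH1:
            stack.insert(0, code[pc + 1] if pc + 1 < len(code) else 0)
            pc += 2
        elif w == ADD:
            a, b = stack.pop(0), stack.pop(0)
            stack.insert(0, (a + b) % 2**256)
            pc += 1
        elif w == MSTORE8:
            offset, value = stack.pop(0), stack.pop(0)
            memory.extend(b"\x00" * max(0, offset + 1 - len(memory)))
            memory[offset] = value % 256
            pc += 1
```

For example, the program PUSH1 2, PUSH1 3, ADD, PUSH1 0, MSTORE8, PUSH1 1, PUSH1 0, RETURN returns the single byte `0x05` with 76 of 100 gas left. Replacing the final RETURN with REVERT yields the same output and gas but `None` in place of the state, while running it with only 10 gas halts exceptionally.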
With all the definitions above, the paper has formalised the code execution of the EVM.
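Block finalisation is a 4 step process:
- validate (or, if mining, determine) the ommers;
- validate (or, if mining, determine) the transactions;
- apply the rewards;
- verify (or, if mining, compute) a valid state and block nonce.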
As mentioned before, blocks contain a list of ommer block headers. Miners are rewarded for including such ommer block headers, as will be seen in section 11.3. To determine whether the ommer headers included in a block are valid, one must check two things, given in equation 167:
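$||B_{\boldsymbol{U}}|| \leqslant 2 \land \bigwedge_{U \in B_{\boldsymbol{U}}} \{V(U) \land k(U, P(B_H)_H, 6)\}$ (167)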
In words, at most 2 ommer block headers may be included, and the headers must indeed be valid. To determine their validity, the is-kin function is given in equation 168:
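$k(U, H, n) \equiv \begin{cases} false & \text{if} \quad n = 0 \\ s(U, H) \lor k(U, P(H)_H, n - 1) & \text{otherwise} \end{cases}$ (168)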
In words, the is-kin function is a function that checks the is-sibling function
$s(U, H) \equiv (P(H) = P(U) \land H \neq U \land U \notin B(H)_{\boldsymbol{U}})$ (169)
The is-sibling function returns true if the parent of block
Figure 24 - A visualization of the possible Ommer blocks for a given block as determined by the is-sibling function
By crawling up 6 levels, it determines that any of the green shaded blocks may be present in the ommer list of the latest block. However, any blocks above 6 levels are not valid - see the red shaded blocks.
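Equations 167 to 169 can be sketched in Python over a map from block hash to header. This is an illustration under stated assumptions: headers carry only their own hash and their parent's hash, the ommer list of each past block is passed in as `ommers_of`, the header validity check $V(U)$ is left out, and every name is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Header:
    block_hash: str
    parent_hash: Optional[str]  # None for the genesis block

def is_sibling(u: Header, h: Header, ommers_of: dict[str, list[str]]) -> bool:
    """s(U, H): same parent, not the same block, and U not already an ommer of H."""
    return (u.parent_hash == h.parent_hash and u.block_hash != h.block_hash
            and u.block_hash not in ommers_of.get(h.block_hash, []))

def is_kin(u: Header, h: Optional[Header], n: int,
           headers: dict[str, Header], ommers_of: dict[str, list[str]]) -> bool:
    """k(U, H, n): U is a sibling of H or of one of H's ancestors, up to n levels."""
    if n == 0 or h is None:
        return False
    parent = headers.get(h.parent_hash) if h.parent_hash is not None else None
    return is_sibling(u, h, ommers_of) or is_kin(u, parent, n - 1, headers, ommers_of)

def ommers_valid(parent: Header, ommers: list[Header],
                 headers: dict[str, Header], ommers_of: dict[str, list[str]]) -> bool:
    """Equation 167 without V(U): at most 2 ommers, each kin within 6 levels."""
    return len(ommers) <= 2 and all(
        is_kin(u, parent, 6, headers, ommers_of) for u in ommers)
```

Here `parent` is the header of the new block's parent, matching $P(B_H)_H$ in equation 167. An ancestor of the new block is never accepted as an ommer, because it is never a sibling of itself or of another ancestor.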
The transaction validation part of the block finalization check is a simple one
(170)
The total gas used in the block
Reward is given to the block beneficiary
(171)
(172)
The block beneficiary is rewarded
(173)
(174)
For the ommer block beneficiaries, a reward
(175)
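Interestingly, we see that the closer an ommer is to the current block number, the bigger the reward paid to its beneficiary: since the ommer's number $U_i$ is smaller than the current block number $H_i$, the factor $1 + \frac{1}{8}(U_i - H_i)$ ranges from $\frac{7}{8}$ for an ommer one generation back down to $\frac{2}{8}$ for one six generations back. Consider figure 25.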
Figure 25 - Including Ommer B is more valuable than including Ommer A
It is more valuable to a miner to include ommer B than ommer A, as the difference in block number
The quantity
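$R_{block} = 10^{18} \times \begin{cases} 5 & \text{if} \quad H_i < F_{Byzantium} \\ 3 & \text{if} \quad F_{Byzantium} \leqslant H_i < F_{Constantinople} \\ 2 & \text{if} \quad H_i \geqslant F_{Constantinople} \end{cases}$ (176)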
We see that the reward per block has decreased across the periods of existence.
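The reward rules can be sketched as integer arithmetic in Wei. This assumes the Ethereum mainnet activation blocks for Byzantium (4,370,000) and Constantinople (7,280,000); the function names are hypothetical. Following the paper, an ommer's beneficiary receives $(1 + \frac{1}{8}(U_i - H_i)) R_{block}$, while the including block's beneficiary receives $(1 + \frac{||B_U||}{32}) R_{block}$. These rules only applied while Ethereum used proof-of-work.

```python
ETHER = 10**18
FORK_BYZANTIUM = 4_370_000        # mainnet activation block (assumed here)
FORK_CONSTANTINOPLE = 7_280_000   # mainnet activation block (assumed here)

def block_reward(number: int) -> int:
    """R_block in Wei for a block at height `number` (equation 176)."""
    if number < FORK_BYZANTIUM:
        return 5 * ETHER
    if number < FORK_CONSTANTINOPLE:
        return 3 * ETHER
    return 2 * ETHER

def ommer_reward(ommer_number: int, block_number: int) -> int:
    """(1 + (U_i - H_i) / 8) * R_block, paid to the ommer's beneficiary."""
    return (8 + ommer_number - block_number) * block_reward(block_number) // 8

def beneficiary_reward(block_number: int, ommer_count: int) -> int:
    """(1 + ||B_U|| / 32) * R_block, paid to the including block's beneficiary."""
    return (32 + ommer_count) * block_reward(block_number) // 32
```

For a block at height 100, an ommer from height 99 earns 4.375 ETH, one from height 94 earns 1.25 ETH, and a block including two ommers pays its own beneficiary 5.3125 ETH.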
One reason for having ommer blocks is that their inclusion adds to the total amount of work done on the canonical chain.
The nonce validation is left out of scope for this annotation, as Ethereum has transitioned to proof-of-stake (PoS). (These are equations 178 - 181.)
For state validation the paper defines a handful of indexable quantities.
(182)
Equation 182 defines an indexable state array - indexable for each transaction included in the block. For example,
(183)
Equation 183 gives us an indexable cumulative gas usage - indexable for each transaction included in the block. (we remember that
(184)
Equation 184 gives us an indexable log array - indexable for each transaction included in the block. For example,
(185)
Equation 185 gives us an indexable transaction status array - indexable for each transaction included in the block. For example,
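This section is called state validation because the quantities defined above let one check that executing the block produces what the block header claims: for example, the cumulative gas used after the last transaction must equal the gas used recorded in the header, and the final state (after the transactions and the rewards) must match the header's state root.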
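The cumulative gas quantity of equation 183 and the corresponding check against the header can be sketched in Python. The per-transaction gas figures below are made-up inputs and the function names are hypothetical.

```python
from itertools import accumulate

def cumulative_gas(gas_per_tx: list[int]) -> list[int]:
    """Entry n is the gas used by transactions 0..n combined (equation 183)."""
    return list(accumulate(gas_per_tx))

def gas_used_matches_header(gas_per_tx: list[int], header_gas_used: int) -> bool:
    """The last cumulative value (0 for an empty block) must equal the header's gas used."""
    totals = cumulative_gas(gas_per_tx)
    return (totals[-1] if totals else 0) == header_gas_used
```

For three transactions using 21000, 50000 and 21000 gas, the indexable array is `[21000, 71000, 92000]`, so the header must record 92000.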
This annotated version of the Ethereum yellow paper aims to clarify concepts that may, at first, call for a deeper dive. If anything is wrong, or the reader feels that certain parts could be clarified differently, I encourage the reader to contribute such improvements. Understanding the Ethereum Yellow Paper is an important step in furthering our ecosystem.