
perf(anvil): enhance block mining performance in Anvil node for high throughput and efficiency #7039

Closed
Tracked by #8269
mshakeg opened this issue Feb 7, 2024 · 15 comments
Labels
C-anvil Command: anvil T-perf Type: performance

Comments

@mshakeg

mshakeg commented Feb 7, 2024

Component

Anvil

Describe the feature you would like

I propose a performance enhancement for the Anvil node, specifically targeting the efficiency of block mining. In my tests, Anvil executes transactions quickly, but overall throughput drops noticeably when blocks are mined frequently, which I attribute primarily to the time spent mining blocks. This feature request seeks optimizations in Anvil's block mining to reduce execution time, thereby increasing overall transactions-per-second (TPS) throughput and making the node more suitable for applications that require both high transaction processing speeds and frequent block mining.

Additional context

Anvil version: 0.2.0 (2cf84d9 2024-02-07T00:15:49.622159000Z)

To illustrate the current performance characteristics and provide a basis for this request, I conducted a test using a Uniswap V3 transaction replay script. The findings highlight significant potential for performance gains in block mining. For instance, when increasing nullSwapsPerBlock from 1 to 2000, the average TPS improved dramatically (by a factor of roughly 7x), indicating that the node spends a significant portion of its time mining blocks rather than executing transactions. To replicate this test:

  1. clone the anvil-backtester repo and install dependencies (pnpm i)
  2. start the anvil node: pnpm anvil:start
  3. run the test script (pnpm test:anvil-memory) once with nullSwapsPerBlock set to 1 and again with it set to 2000, and observe results similar to the following, which indicate significant overhead in mining blocks:
{
  blocksToMine: 25,
  nullSwapsPerBlock: 1,
  totalTxs: 50,
  executionTime: 0.084,
  averageTPS: 595.2380952380952,
  averageTimePerTx: 1.6800000000000002
}
{
  blocksToMine: 25,
  nullSwapsPerBlock: 2000,
  totalTxs: 100000,
  executionTime: 24.747,
  averageTPS: 4040.8938457186728,
  averageTimePerTx: 0.24747000000000002
}
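The reported metrics can be recomputed from the raw numbers, and a simple linear cost model makes the block-mining overhead explicit. This sketch assumes the script defines averageTPS as totalTxs / executionTime (seconds) and averageTimePerTx in milliseconds, which matches the printed values; the model executionTime = blocks × perBlock + txs × perTx is an assumption, not a profile-based breakdown.

```javascript
// Assumed metric definitions (they reproduce the printed values):
//   averageTPS       = totalTxs / executionTime          (executionTime in s)
//   averageTimePerTx = executionTime * 1000 / totalTxs   (milliseconds)
function metrics({ blocksToMine, totalTxs, executionTime }) {
  return {
    blocksToMine,
    totalTxs,
    executionTime,
    averageTPS: totalTxs / executionTime,
    averageTimePerTx: (executionTime * 1000) / totalTxs,
  };
}

// Assumed linear cost model:
//   executionTime = blocks * perBlockSec + txs * perTxSec
// Two runs that mine the same number of blocks but with different tx counts
// are enough to solve for both unknowns.
function fitCostModel(a, b) {
  if (a.blocksToMine !== b.blocksToMine) {
    throw new Error("runs must mine the same number of blocks");
  }
  const perTxSec = (b.executionTime - a.executionTime) / (b.totalTxs - a.totalTxs);
  const perBlockSec = (a.executionTime - a.totalTxs * perTxSec) / a.blocksToMine;
  return { perBlockMs: perBlockSec * 1000, perTxMs: perTxSec * 1000 };
}

const small = metrics({ blocksToMine: 25, totalTxs: 50, executionTime: 0.084 });
const large = metrics({ blocksToMine: 25, totalTxs: 100000, executionTime: 24.747 });
const model = fitCostModel(small, large);

console.log((large.averageTPS / small.averageTPS).toFixed(2)); // 6.79
console.log(model.perBlockMs.toFixed(2), model.perTxMs.toFixed(4)); // 2.87 0.2468
```

Under this model, the two runs imply roughly 2.9 ms of fixed cost per mined block versus roughly 0.25 ms per transaction, so with 50 transactions spread over 25 blocks most of the 84 ms goes to mining.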
@mshakeg mshakeg added the T-feature Type: feature label Feb 7, 2024
@gakonst gakonst added this to Foundry Feb 7, 2024
@github-project-automation github-project-automation bot moved this to Todo in Foundry Feb 7, 2024
@mattsse
Member

mattsse commented Feb 7, 2024

it likely spends most of the time cleaning up / updating old state

could you try with --prune-history and see if you notice any difference?

There's definitely room for significant improvements here

@mshakeg
Author

mshakeg commented Feb 7, 2024

@mattsse I am using --prune-history in the anvil command as shown below

https://github.com/mshakeg/anvil-backtester/blob/main/shell/anvil.sh

Removing --prune-history and --transaction-block-keeper 4 from the above command does not result in any noticeable changes in performance.

@mattsse
Member

mattsse commented Feb 7, 2024

hmm, could you perhaps run this with samply https://github.com/mstange/samply and see if anything sticks out

I'll try to investigate shortly
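For an A/B comparison like the one described above, the anvil command lines can be generated programmatically. This is a hypothetical sketch: only --prune-history and --transaction-block-keeper come from this thread, the remaining flags in anvil.sh are omitted, and the optional wrapper uses samply's `samply record <command> [args...]` form to capture a profile of the same run.

```javascript
// Hypothetical helper: build anvil argv for runs with and without pruning.
// Only the two flags mentioned in this thread are modeled.
function anvilArgs({ pruneHistory = false, txBlockKeeper } = {}) {
  const args = [];
  if (pruneHistory) args.push("--prune-history");
  if (txBlockKeeper !== undefined) {
    args.push("--transaction-block-keeper", String(txBlockKeeper));
  }
  return args;
}

// Optionally prefix the command with `samply record` to profile the run.
function command(args, { profile = false } = {}) {
  const base = ["anvil", ...args];
  return profile ? ["samply", "record", ...base] : base;
}

const withPruning = command(anvilArgs({ pruneHistory: true, txBlockKeeper: 4 }));
const withoutPruning = command(anvilArgs());
const profiled = command(anvilArgs({ pruneHistory: true, txBlockKeeper: 4 }), { profile: true });

console.log(withPruning.join(" "));    // anvil --prune-history --transaction-block-keeper 4
console.log(withoutPruning.join(" ")); // anvil
console.log(profiled.join(" "));       // samply record anvil --prune-history --transaction-block-keeper 4
```

Each array could then be started with Node's `child_process.spawn(cmd[0], cmd.slice(1))`, keeping the benchmark script identical between runs so that only the node configuration changes.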

@mshakeg
Author

mshakeg commented Feb 7, 2024

@mattsse thanks. I don't really know what to make of the profile, but I've attached the evm_mine trace; maybe GPT-4 could be a source of inspiration :)

Based on this call trace, here are a few points to consider for profiling and improving performance:

  1. Database Interactions: The evm_mine operation involves interactions with an in-memory database. Optimizations here could involve reducing the number of reads and writes, caching frequently accessed data, or improving the database's data structures.

  2. State Trie Manipulation: There are multiple calls to trie_db functions, which indicate manipulation of the state trie. This is an area that typically has a significant impact on performance. Optimizing trie algorithms or using a more efficient trie structure could yield performance improvements.

  3. Hash Calculations: The keccak_hasher and tiny_keccak functions suggest that Keccak hashing is part of the operation. Optimizing hashing or reducing the number of hash calculations required could improve performance.

  4. EVM Execution: The revm-specific calls such as run_interpreter and preverified_inner imply that EVM bytecode execution is part of the process. Profiling the EVM's interpreter loop, opcode execution, and context switching could reveal bottlenecks.

  5. Smart Contract Calls: Calls to inspect_call_instruction and Host::call suggest that smart contract function calls are being made. Optimizing the way smart contracts are called and executed, possibly by reducing the overhead of call setup and teardown, could improve performance. This could include minimizing the overhead associated with setting up the environment for a contract call and efficiently handling the stack and memory operations.

  6. Parallelism and Concurrency: Evaluate if any parts of the evm_mine process can be executed in parallel. Some operations, especially state-independent ones, may benefit from concurrent execution.

  7. Memory Management: Functions like drop_in_place suggest that there is active management of memory, possibly with data structures being de-allocated. Improving memory allocation strategies, avoiding unnecessary allocations, and reusing memory buffers could reduce overhead and improve performance.

  8. Opcode Optimization: Within the EVM execution, certain opcodes may be used more frequently or may be more resource-intensive. Profiling at the opcode level could help identify if specific opcodes are bottlenecks and could be optimized.

  9. Caching Strategies: For repetitive operations, especially within the EVM interpreter, caching results of expensive computations could be beneficial if they're likely to be repeated with the same inputs.

  10. Profiling and Instrumentation Tools: Utilize profiling tools that can provide granular insights into CPU and memory usage. Tools that work well with Rust binaries, such as perf on Linux or DTrace on BSD/macOS, can help identify hot paths and functions that are taking the most time or consuming the most resources.

  11. Algorithmic Efficiency: Review the algorithms used in the trie manipulation and hashing to ensure they are the most efficient for the use case. Sometimes, algorithmic improvements can yield better performance gains than low-level optimizations.

  12. Code Review and Refactoring: There might be opportunities to refactor the code for efficiency. This could involve combining functions, inlining functions to reduce call overhead, or simplifying complex logic.

  13. Batch Processing: If the evm_mine operation can be batched (i.e., processing multiple transactions or blocks in a single operation), it could reduce the per-operation overhead and take advantage of more efficient bulk processing techniques.

  14. Asynchronous Processing: Look into asynchronous processing where applicable to avoid blocking operations, particularly for I/O bound tasks.

evm_mine profile

@mattsse
Member

mattsse commented Feb 8, 2024

thanks!

will investigate, but it looks like the state root computation

@mshakeg
Author

mshakeg commented Feb 8, 2024

@mattsse thanks, might be a good idea to have flags that disable logic not really needed on a local node, similar to how the eth_sendUnsignedTransaction method can be used to send an unsigned transaction.
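The suggested flag could look roughly like this on a local node: the state-root step becomes a pluggable, skippable stage of mining. This is a purely hypothetical sketch; mineBlock, computeStateRoot, skipStateRoot and ZERO_ROOT are illustrative names and do not correspond to anvil's actual code or CLI, and the "state" is a toy balance map.

```javascript
// Hypothetical placeholder root reported when the computation is skipped.
const ZERO_ROOT = "0x" + "00".repeat(32);

// Toy miner: apply transactions to an in-memory balance map, then either
// compute the state root (the expensive step on a real node) or skip it.
function mineBlock(pendingTxs, state, { computeStateRoot, skipStateRoot = false }) {
  for (const tx of pendingTxs) {
    state.set(tx.to, (state.get(tx.to) ?? 0) + tx.value);
  }
  const stateRoot = skipStateRoot ? ZERO_ROOT : computeStateRoot(state);
  return { txCount: pendingTxs.length, stateRoot };
}

// Usage with a stand-in root function that counts how often it is called.
let rootCalls = 0;
const fakeRoot = (s) => {
  rootCalls += 1;
  return "0x" + String(s.size).padStart(64, "0");
};

const state = new Map();
const txs = [{ to: "0xaa", value: 1 }, { to: "0xbb", value: 2 }];
const full = mineBlock(txs, state, { computeStateRoot: fakeRoot });
const fast = mineBlock(txs, state, { computeStateRoot: fakeRoot, skipStateRoot: true });

console.log(rootCalls);                    // 1
console.log(fast.stateRoot === ZERO_ROOT); // true
```

The trade-off is that blocks mined in such a mode would carry a placeholder state root, so anything that relies on a correct root (for example, verifying account proofs against block headers) would no longer work.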

@zerosnacks zerosnacks added T-perf Type: performance C-anvil Command: anvil and removed T-feature Type: feature labels Jul 11, 2024
@zerosnacks zerosnacks changed the title Enhance Block Mining Performance in Anvil Node for High Throughput and Efficiency perf(anvil): enhance block mining performance in Anvil node for high throughput and efficiency Jul 11, 2024
@zerosnacks
Member

zerosnacks commented Jul 11, 2024

Relevant conversation in #7546: #7546 (comment)

@zerosnacks zerosnacks added this to the v1.0.0 milestone Jul 26, 2024
@jenpaff jenpaff removed this from the v1.0.0 milestone Sep 26, 2024
@grandizzy
Collaborator

grandizzy commented Oct 19, 2024

@mshakeg I retried your test driver with the latest anvil and got the following results

  • with nullSwapsPerBlock=1
{
  blocksToMine: 10,
  nullSwapsPerBlock: 1,
  totalTxs: 20,
  executionTime: 0.024,
  averageTPS: 833.3333333333334,
  averageTimePerTx: 1.2
}
  • with nullSwapsPerBlock=2000, consistently getting values around
{
  blocksToMine: 10,
  nullSwapsPerBlock: 2000,
  totalTxs: 40000,
  executionTime: 5.987,
  averageTPS: 6681.142475363287,
  averageTimePerTx: 0.149675
}

best result in a couple of tries (after reinstalling anvil)

{
  blocksToMine: 10,
  nullSwapsPerBlock: 2000,
  totalTxs: 40000,
  executionTime: 4.806,
  averageTPS: 8322.929671244277,
  averageTimePerTx: 0.12015
}
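Applying the same assumed linear cost model (executionTime = blocks × perBlock + txs × perTx) to the newer runs gives a rough comparison with the original report. This sketch uses the "consistently around" run (5.987 s) rather than the best one, and it treats both sets of runs as comparable even though they used different block counts and possibly different hardware, so the results are indicative only.

```javascript
// Assumed model: solve for per-block and per-tx cost from two runs that mine
// the same number of blocks with different tx counts.
function fitCostModel(a, b) {
  const perTxSec = (b.executionTime - a.executionTime) / (b.totalTxs - a.totalTxs);
  const perBlockSec = (a.executionTime - a.totalTxs * perTxSec) / a.blocksToMine;
  return { perBlockMs: perBlockSec * 1000, perTxMs: perTxSec * 1000 };
}
const tps = (run) => run.totalTxs / run.executionTime;

// Numbers copied from the original report (Feb 2024) and the retest (Oct 2024).
const feb = {
  small: { blocksToMine: 25, totalTxs: 50, executionTime: 0.084 },
  large: { blocksToMine: 25, totalTxs: 100000, executionTime: 24.747 },
};
const oct = {
  small: { blocksToMine: 10, totalTxs: 20, executionTime: 0.024 },
  large: { blocksToMine: 10, totalTxs: 40000, executionTime: 5.987 },
};

const febFit = fitCostModel(feb.small, feb.large);
const octFit = fitCostModel(oct.small, oct.large);

console.log(febFit.perBlockMs.toFixed(2), febFit.perTxMs.toFixed(3)); // 2.87 0.247
console.log(octFit.perBlockMs.toFixed(2), octFit.perTxMs.toFixed(3)); // 2.10 0.149
console.log((tps(oct.large) / tps(feb.large)).toFixed(2));            // 1.65
```

Under these assumptions, the per-transaction cost fell from about 0.25 ms to about 0.15 ms, while the fixed per-block cost moved only from about 2.9 ms to about 2.1 ms, which would suggest that mining still dominates when blocks contain few transactions.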

Note that I have to restart anvil between runs because otherwise the test driver locks on the subsequent createPool call, ref ethers-io/ethers.js#4224

      const tx = await uniswapV3Factory.createPool(token0, token1, FeeAmount.LOW);
      const rc = await tx.wait(); // locks here, even though the tx is mined and included
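Until the underlying ethers.js issue is resolved, a test driver can at least avoid hanging indefinitely by bounding how long it waits for the receipt. The helper below is a generic sketch built on Promise.race; withTimeout is a hypothetical name, and it does not fix the hang itself, it only turns it into an error the driver can handle.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds. The timer is
// always cleared so a fast result does not keep the process alive.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

In the driver this could wrap the call above, e.g. `await withTimeout(tx.wait(), 30000, "createPool receipt")`, after which a fallback could query the receipt directly with `provider.getTransactionReceipt(tx.hash)`.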

Would this be a reasonable enhancement within the scope of this ticket? Thank you

Cc @klkvr re #7546 comment

@grandizzy
Collaborator

bump @mshakeg please check the comment above. thanks!

@mshakeg
Author

mshakeg commented Nov 17, 2024

@grandizzy are you referring to the locking issue? if so then I wouldn't say it's related to this issue, so maybe open a new issue if there isn't already one?

@grandizzy
Collaborator

@mshakeg not to the locking issue (that's not in our control but in ethers.js, see also #7275 (comment) and #4399 (comment), most probably ethers-io/ethers.js#4224), but to the new numbers I posted above using your test driver (which look better than the originally posted ones). Thank you

@mshakeg
Author

mshakeg commented Nov 17, 2024

@grandizzy sure, though have you done some profiling to determine the cause of the large TPS discrepancy? If much time is spent mining blocks (and computing Merkle roots when mined, for example) that could be disabled in a local node, then I agree with adding an option to skip these computations for an even more performant local node.

@grandizzy
Collaborator

grandizzy commented Nov 17, 2024

I just retested the original issue with the latest anvil version and noticed different numbers, hence my question whether this is still a problem and I should continue investigating, or whether it could be closed

@grandizzy
Collaborator

> @grandizzy sure, though have you done some profiling to determine the cause of the large TPS discrepancy? If much time is spent mining blocks (and computing Merkle roots when mined, for example) that could be disabled in a local node, then I agree with adding an option to skip these computations for an even more performant local node.

I think this explains it: #5499 (comment)

@grandizzy
Copy link
Collaborator

going to merge this one into #5499 to track potential Hardhat behavior; @mshakeg please reopen if you think they should be addressed differently. thank you!

Projects
Archived in project
Development

No branches or pull requests

5 participants