Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow execution (in general) #2076

Open
namiloh opened this issue Mar 15, 2024 · 8 comments
Open

Slow execution (in general) #2076

namiloh opened this issue Mar 15, 2024 · 8 comments
Labels

Comments

@namiloh
Copy link

namiloh commented Mar 15, 2024

Earlier, nimbus-eth1 was one of the fastest evms, but in the last months, it has been by far the slowest.
Here are some stats from a 90 hours of fuzzing

INFO [03-15|07:40:39.579] Executing                                tests=5,968,164 time=91h0m40.009s     test/s=18.2 "avg steps"=3098.1  global=5,968,164
INFO [03-15|07:40:39.579] Stats gethbatch-0                        execSpeed=121.2ms longest=40.549808945s    count=1,985,566
INFO [03-15|07:40:39.579] Stats nethbatch-0                        execSpeed=94.7ms  longest=42.301876462s    count=2,275,150
INFO [03-15|07:40:39.579] Stats besubatch-0                        execSpeed=148.2ms longest=38.015906585s    count=2,173,890
INFO [03-15|07:40:39.579] Stats erigonbatch-0                      execSpeed=181ms   longest=56.35799268s     count=1,624,566
INFO [03-15|07:40:39.579] Stats nimbus-0                           execSpeed=298ms   longest=21m44.533083944s count=600,659
INFO [03-15|07:40:39.579] Stats evmone-0                           execSpeed=126.8ms longest=1m2.163421388s   count=2,124,651
INFO [03-15|07:40:39.579] Stats revm-0                             execSpeed=232.3ms longest=1m32.551078339s  count=1,151,849

All other clients performed over 1M tests, nimbus-eth1 only 600K. The slowest of the other clients was nearly double as fast as nimbus-eth1.

I suspect that some regression has been introduced (possibly some kzg initialization?) which adds overhead on startup.

@jangko
Copy link
Contributor

jangko commented Mar 22, 2024

how it s fuzzed? can it be profiled?

@holiman
Copy link

holiman commented Mar 22, 2024

$ docker run -it -v /home/user/workspace/goevmlab/evms/testdata/cases:/testdata holiman/omnifuzz

Running one of the sample tests, 1000 times:

root@37b7711197ca:/# time for i in {1..1000}; do /nimbvm --json --nomemory --noreturndata --nostorage /testdata/00000936-mixed-1.json >/dev/null 2>&1 ; done

real	0m18.159s
user	0m14.998s
sys	0m2.918s

However, if I take that test, and change Shanghai to Cancun, then :

root@37b7711197ca:/# time for i in {1..1000}; do /nimbvm --json --nomemory --noreturndata --nostorage /testdata/00000936-mixed-1.json.cancun >/dev/null 2>&1 ; done

real	4m1.767s
user	3m53.658s
sys	0m4.958s

What geth, besu, nethermind and now eels have implemented, is a batch-mode. Where client client is fed paths via stdin and executes the test back to back.

To demonstrate what I mean: if you had that, I could rewrite the for-loop above as:

yes /testdata/00000936-mixed-1.json | head -n 1000 | /nimbvm --json --nomemory --noreturndata --nostorage >/dev/null 2>&1

It was initially implemented to get around virtual-machine boostrap times (nethermind / besu), but with kzg init, I added it to geth too.

@holiman
Copy link

holiman commented Mar 22, 2024

But there's something more too, I think. Because even if I don't use the batch-mode in geth, it's still a lot faster:

root@37b7711197ca:/# time for i in {1..1000}; do /gethvm --json --nomemory --noreturndata statetest /testdata/00000936-mixed-1.json.cancun  >/dev/null 2>&1 ; done

real	1m8.533s
user	0m54.986s
sys	0m21.417s

Of course, batch-mode is faster still:

root@37b7711197ca:/# time yes /testdata/00000936-mixed-1.json.cancun | head -n1000 | /gethvm --json --nomemory --noreturndata statetest  >/dev/null 2>&1

real	0m1.109s
user	0m1.264s
sys	0m0.101s

@jangko
Copy link
Contributor

jangko commented Mar 22, 2024

Can I have both the Shanghai and Cancun test file? I want to try to see what kind of regression and how to improve it.

@holiman
Copy link

holiman commented Mar 22, 2024

It's in here: https://github.com/holiman/goevmlab/tree/master/evms/testdata/cases .
And the cancun one:

 diff 00000936-mixed-1.json 00000936-mixed-1.json.cancun
53c53
<       "Shanghai": [
---
>       "Cancun": [

@jangko jangko added the EL label Jul 6, 2024
@holiman
Copy link

holiman commented Nov 20, 2024

This still very much happening, btw. Here's a stacktrace from when I ctrl-c:ed an execution,

  | Traceback (most recent call last, using override)
  | /go/nimbus-eth1/nimbus/evm/state.nim(575) main
  | /go/nimbus-eth1/nimbus/evm/state.nim(563) NimMain
  | /go/nimbus-eth1/tools/evmstate/evmstate.nim(267) _ZN8evmstate4mainE
  | /go/nimbus-eth1/tools/evmstate/evmstate.nim(215) _ZN8evmstate13prepareAndRunE3varIN8evmstate12StateContextEEN6config9StateConfE
  | /go/nimbus-eth1/tools/evmstate/evmstate.nim(120) _ZN8evmstate12runExecutionE3varIN8evmstate12StateContextEEN6config9StateConfE3refIN4json11JsonNodeObjEE
  | /go/nimbus-eth1/nimbus/core/eip4844.nim(218) _ZN7eip484419loadKzgTrustedSetupE
  | /go/nimbus-eth1/vendor/nim-kzg4844/kzg4844/csources/bindings/nim/kzg.nim(160) _ZN3kzg26loadTrustedSetupFromStringE6string25range09223372036854775807.constprop.0
  | /go/nimbus-eth1/vendor/nim-kzg4844/kzg4844/csources/bindings/nim/kzg.nim(101) _ZN3kzg16loadTrustedSetupE9openArrayI5uInt8E9openArrayI5uInt8E9openArrayI5uInt8E25range09223372036854775807
  | /go/nimbus-eth1/vendor/nim-kzg4844/kzg4844/csources/src/setup/setup.c(433) load_trusted_setup
  | /go/nimbus-eth1/vendor/nim-blscurve/vendor/blst/src/e1.c(294) blst_p1_uncompress
  | /go/nimbus-eth1/vendor/nim-blscurve/vendor/blst/src/e1.c(279) POINTonE1_Uncompress_Z
  | /go/nimbus-eth1/vendor/nim-blscurve/vendor/blst/src/e1.c(253) POINTonE1_Uncompress_BE
  | /go/nimbus-eth1/vendor/nim-blscurve/vendor/blst/src/sqrt.c(79) sqrt_fp
  | /go/nimbus-eth1/vendor/nim-blscurve/vendor/blst/src/sqrt.c(50) recip_sqrt_fp_3mod4
  | /go/nimbus-eth1/vendor/nim-blscurve/vendor/blst/src/sqrt.c(30) sqr_n_mul_fp
  | /go/nimbus-eth1/vendor/nim-blscurve/vendor/blst/build/elf/mulx_mont_384-x86_64.s(2623) sqrx_n_mul_mont_383
  | /go/nimbus-eth1/vendor/nim-blscurve/vendor/blst/build/elf/mulx_mont_384-x86_64.s(2800) 
  | /go/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(648) signalHandler
  | /go/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(330) _ZN6system18rawWriteStackTraceE3varI6stringE
  | /go/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(59) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI6stringE

So, most definitely the loading of the trusted setup.

@jangko
Copy link
Contributor

jangko commented Nov 26, 2024

Ok. now I know what is the problem. Turn out not only the trusted setup loading contribute to the slowness but the culprit is the json tracer. If the json tracer is disabled, nimbus evm run twice as fast as geth evm without tracer.

And geth evm appear faster for non batch mode because geth lazily load the trusted setup. test vector 00000936-mixed-1.json.cancun never trigger kzg4844 functions call, therefore the trusted setup also never loaded.

Another area of optimization is replacing stdlib/json with something faster for both reader and writer.

@holiman
Copy link

holiman commented Nov 26, 2024

test vector 00000936-mixed-1.json.cancun never trigger kzg4844 functions call, therefore the trusted setup also never loaded.

Hm, not sure what you mean. As I reported here, by changing Shanghai to Cancun, the time bumped from 18 seconds to taking 4 minutes instead.

EDIT: Ah, you were talking about geth. Ok, nevermind!

@jangko jangko added the tools Issues related to tools label Nov 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants