Investigate the performance of the history replay tool #5707
Comments
@olonho do you want to investigate this yourself?
I'm looking at that, but a second pair of eyes wouldn't hurt.
Curious stats from about a day of playback:
So we see that playback speed is not uniform and sometimes reaches 305 blocks per second.
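The per-block rate figures above can be derived from per-block completion timestamps. A minimal sketch (not the actual tool's code; playback_rate is a hypothetical helper):

```python
def playback_rate(block_times, window=60.0):
    """Blocks per second over the trailing `window` seconds.

    `block_times` are monotonically increasing per-block completion
    timestamps in seconds; only blocks inside the trailing window count,
    which is why the rate fluctuates between samples.
    """
    now = block_times[-1]
    recent = [t for t in block_times if now - t <= window]
    return len(recent) / window

# 300 blocks completed 0.2s apart: 5.0 blocks/sec over the trailing minute.
rate = playback_rate([i * 0.2 for i in range(300)])
```

Sampling this over a day of playback would produce exactly the kind of non-uniform rate curve described above.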
Pure neard top looks like this: [top output omitted]
Kernel top looks like this: [top output omitted]
so it seems a suspicious amount of time is spent in the kernel.
Interestingly, when using a DWARF-based stack unwinder the profile looks much saner: [profile omitted]
With dominators like: [dominator tree omitted]
@olonho so basically it is within RocksDB?
Yes, it seems so.
Further investigations show pretty mediocre IO performance on the system.
So we generally get around 12 MB/sec of disk read throughput. Compared to https://www.anandtech.com/show/7173/samsung-ssd-840-evo-review-120gb-250gb-500gb-750gb-1tb-models-tested/8 this hints that we get performance 3x worse than the slowest SSD drive in that list. For reference, the Samsung T5 external SSD in my box gives around 150 MB/sec (queue depth 64, using AmorphousDiskMark), 15-20 MB/sec with queue depth 1, and 118 MB/sec with queue depth 8. The internal SSD shows a beefy 331 MB/sec.
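For a quick queue-depth-1 sequential read sanity check on a suspect instance, a rough microbenchmark like the following can be used (a sketch only; the OS page cache will inflate the number for a freshly written file, so dedicated tools such as fio or AmorphousDiskMark are more trustworthy):

```python
import os
import tempfile
import time

def read_throughput_mb_s(size_mb=64, block_size=1 << 20):
    """Write a scratch file of `size_mb` MB, then time a sequential
    re-read at queue depth 1 and return MB/sec.

    Note: without dropping caches the re-read may hit page cache, so
    treat the result as an upper bound on real disk throughput.
    """
    chunk = os.urandom(block_size)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reaches disk
    try:
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_size):  # sequential 1 MB reads until EOF
                pass
        elapsed = time.perf_counter() - start
    finally:
        os.unlink(path)
    return size_mb / elapsed
```

Running this on the replay instance and on a known-good machine would show whether the ~12 MB/sec figure is a property of the instance's disk rather than of the replay tool itself.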
https://cloud.google.com/compute/docs/disks/performance suggests that for a GCP instance the expected read throughput per instance is between 200 MB/sec and 9 GB/sec.
And some nodes (archive backups) show actual peak read performance around 500 MB/sec per the gcloud console.
Interesting. @olonho do you think the issue is specific to the instance you use to run the history playback tool? Is there some disk IO benchmark we can run to see whether the instance itself is problematic?
Moving to @marcelo-gonzalez for the OKR ownership transfer.
I don't think this is in the OKR?
According to #5697 (comment), we can only play back 10 blocks per second even with multiple threads. This does not align with our expectations, and it blocks the contract runtime team from validating that wasmer2 and wasmer0 give the exact same results on the entire mainnet history, which would allow us to completely remove wasmer0 support.
@olonho did some initial investigation and found that we spend a suspicious amount of time in the kernel (see the profiles above).
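The equivalence check that this issue blocks can be sketched as a simple differential replay: run every block through both runtimes and stop at the first divergence. This is a hypothetical outline, not the real nearcore API; the run_wasmer0 and run_wasmer2 callables stand in for the actual runtime entry points:

```python
def find_divergence(blocks, run_wasmer0, run_wasmer2):
    """Replay `blocks` through both runtimes; return the first
    (height, wasmer0_output, wasmer2_output) that disagrees, or None
    if the two runtimes agree on the entire history."""
    for height, block in enumerate(blocks):
        out0 = run_wasmer0(block)
        out2 = run_wasmer2(block)
        if out0 != out2:
            return height, out0, out2
    return None

# Dummy executors that disagree starting at block 3:
divergence = find_divergence(range(5), lambda b: b, lambda b: b if b < 3 else -b)
# divergence == (3, 3, -3)
```

At 10 blocks/sec, replaying the full mainnet history through such a check is impractically slow, which is why the playback throughput investigated here matters.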