Perf/NoWAL during OldBodies #6227

Merged: 9 commits merged into master from perf/oldbodies-with-nowal on Oct 28, 2023
Conversation

@asdacap (Contributor) commented on Oct 26, 2023

  • Most of the write IO during old bodies is actually due to the WAL (Write Ahead Log) file, which is the mechanism by which RocksDB recovers from an unclean shutdown.
  • This PR disables WAL writes for bodies written by OldBodies.
  • For recovery, the LowestInsertedBlockNumber is no longer updated on every request; instead it is updated every 100,000 blocks, at which point an explicit flush is also triggered to make sure the memtables are persisted at that point (see the sketch after this list).
  • This reduces total writes during OldBodies by about 60-70%. The saving is more than 50% because the WAL file is not compressed, while the bodies are compressed to about 60-70% of their original size on flush.
  • No change in bodies sync time, unless your SSD could not sustain about 350 MB/s of writes before and you have a really fast internet connection.
  • The graph shows after, before, after, before.
    [Screenshot: disk write throughput during sync, 2023-10-26]
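
A minimal sketch of the checkpointing idea described above, assuming a RocksDB-style store. The type and member names here (IBodiesStore, OldBodiesInserter, CheckpointInterval, PersistLowestInsertedBlockNumber) are hypothetical stand-ins for illustration, not the actual Nethermind API:

```csharp
using System;

// Sketch only: bodies are written with the WAL disabled, and every
// CheckpointInterval blocks the memtables are flushed explicitly before
// the recovery pointer (LowestInsertedBlockNumber) is advanced.
public enum WriteFlags
{
    None = 0,
    DisableWAL = 1, // skip the write-ahead log for this write
}

public interface IBodiesStore
{
    void PutSpan(ReadOnlySpan<byte> key, ReadOnlySpan<byte> value, WriteFlags flags);
    void Flush(); // force memtables down to SST files
}

public class OldBodiesInserter
{
    private const long CheckpointInterval = 100_000;

    private readonly IBodiesStore _store;
    private long _checkpoint; // last persisted LowestInsertedBlockNumber

    public OldBodiesInserter(IBodiesStore store, long startBlock)
    {
        _store = store;
        _checkpoint = startBlock;
    }

    // OldBodies downloads backwards, from the pivot towards genesis,
    // so block numbers decrease as sync progresses.
    public void Insert(long blockNumber, byte[] key, byte[] body)
    {
        // No WAL: on an unclean shutdown anything written after the last
        // checkpoint may be lost, and sync simply redoes that range.
        _store.PutSpan(key, body, WriteFlags.DisableWAL);

        if (_checkpoint - blockNumber >= CheckpointInterval)
        {
            _store.Flush(); // make the un-WAL-ed writes durable first
            _checkpoint = blockNumber;
            PersistLowestInsertedBlockNumber(blockNumber);
        }
    }

    private void PersistLowestInsertedBlockNumber(long blockNumber)
    {
        // Written durably (with WAL) so a restart knows where to resume.
    }
}
```

Losing up to one checkpoint interval of writes on a crash is acceptable here because old bodies can always be re-downloaded; the flush before advancing the pointer is what keeps the recovery pointer honest.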

Changes

  • Add WriteFlags to PutSpan.
  • Add WriteFlags to BlockTree.Insert.
  • Set NoWAL during old bodies (illustrated below).
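
A rough illustration of how the flag could be threaded from BlockTree.Insert down to the database write, reusing the sketch types above; the signatures are assumptions for the sketch, not the exact ones from this PR:

```csharp
// Illustrative plumbing only.
public class BlockTreeSketch
{
    private readonly IBodiesStore _blockDb;

    public BlockTreeSketch(IBodiesStore blockDb) => _blockDb = blockDb;

    // Insert gains an optional WriteFlags parameter: callers that can
    // tolerate losing un-flushed writes (OldBodies) opt out of the WAL,
    // while every other caller keeps the safe default.
    public void Insert(byte[] blockHash, byte[] encodedBody,
        WriteFlags writeFlags = WriteFlags.None)
    {
        _blockDb.PutSpan(blockHash, encodedBody, writeFlags);
    }
}

// In the OldBodies download step:
//   blockTree.Insert(hash, rlpBody, WriteFlags.DisableWAL);
```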

Types of changes

What types of changes does your code introduce?

  • Optimization

Testing

Requires testing

  • Yes

If yes, did you write tests?

  • Yes

Notes on testing

  • Manually ran kill -s KILL on nethermind a couple of times during sync, then ran a custom Python script to verify that all expected blocks are present. Verified: sync resumed slightly before the point where the process was killed.

@LukaszRozmej (Member) left a comment


Why 100k? What would happen if we increase or decrease this number?
What is the current bottleneck? CPU? Network?

@asdacap (Contributor, Author) commented on Oct 26, 2023

The 100k is somewhat arbitrary. The only consideration is not to split the write buffer too often, as that would make the files small and increase the number of files. Each write buffer is 256 MB with the blob-file tune, if I'm not mistaken, so that's probably about 2.5k blocks per buffer. So if we set the interval to 5k, probably half of the write buffers would get split, which we don't want.
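
(Working through those numbers as a sanity check, assuming they are roughly right: 256 MB per write buffer across ~2.5k blocks is on the order of 100 KB of body data per block, so a 100k-block checkpoint spans roughly 40 full write buffers and only the last, partially filled one is flushed early, about a 2.5% split rate. With a 5k interval, each checkpoint would span only ~2 buffers, so roughly every second buffer would be split by a checkpoint flush.)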

I'm not sure what the current limit is. I'm guessing it's CPU, but it's not using 100% CPU. If it were the network, well, I've set up 4 geth nodes running locally as static peers, so that's probably not it. I know that per connection there is a single-thread limit due to devp2p decoding, but I've already set the network processing thread count to 32, so that's not it either.

@asdacap force-pushed the perf/oldbodies-with-nowal branch from e8c7704 to bb8b894 on October 28, 2023
@asdacap merged commit 9e3aa25 into master on Oct 28, 2023
@asdacap deleted the perf/oldbodies-with-nowal branch on October 28, 2023