Trie: Rework Checkpointing Mechanism #1030

Merged (10 commits) on Jan 7, 2021

Member

@holgerd77 holgerd77 commented Jan 4, 2021

While working on the VM execution in the client (#1028), it became pretty clear that the current trie checkpointing mechanism is one of our main bottlenecks (if not THE bottleneck). Removing the checkpointing in VM.runBlock() for blocks with zero transactions in 6f64fa6 (so where there is no additional tx checkpointing, no checkpointing is applied at all) increased processing performance by a factor of 10-100, and rather towards the upper bound. In fact, processing log messages had to be batched in 50-block chunks in 45c9e9d (before: 1 log msg per block), and log output is still coming at a somewhat faster pace than before.

Currently, the trie checkpointing mechanism copies the whole state DB on checkpointing, which is extremely resource-intensive and not sustainable. This PR experiments with a more fine-grained approach: an operations stack which can be reverted on a trie.revert() and simply deleted on a trie.commit().
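
To make the operations-stack idea concrete, here is a minimal sketch of how such a journal could look. This is an illustration only, not the code in this PR: the class name JournaledStore, the Undo record type, and the in-memory Map standing in for the DB are all hypothetical, and merging a committed journal into its parent for nested checkpoints is an assumption about how nesting might be handled.

```typescript
// Hypothetical sketch: JournaledStore and Undo are illustrative names,
// not the CheckpointTrie / DB API from this PR.
type Undo = { key: string; previous: string | undefined }

class JournaledStore {
  // In-memory Map standing in for the underlying key-value DB (assumption).
  private data = new Map<string, string>()
  // One journal (list of undo records) per open checkpoint.
  private journals: Undo[][] = []

  get(key: string): string | undefined {
    return this.data.get(key)
  }

  put(key: string, value: string): void {
    this.record(key)
    this.data.set(key, value)
  }

  del(key: string): void {
    this.record(key)
    this.data.delete(key)
  }

  checkpoint(): void {
    this.journals.push([])
  }

  commit(): void {
    const journal = this.journals.pop()
    if (journal === undefined) throw new Error('no checkpoint to commit')
    // Outermost checkpoint: the journal is simply dropped.
    // Nested checkpoint (assumption): the parent must still be able to
    // undo these writes, so the records are appended to its journal.
    const parent = this.journals[this.journals.length - 1]
    if (parent !== undefined) parent.push(...journal)
  }

  revert(): void {
    const journal = this.journals.pop()
    if (journal === undefined) throw new Error('no checkpoint to revert')
    // Undo in reverse order so the oldest recorded value is applied last.
    for (let i = journal.length - 1; i >= 0; i--) {
      const { key, previous } = journal[i]
      if (previous === undefined) this.data.delete(key)
      else this.data.set(key, previous)
    }
  }

  // Remember the value a key had before it is overwritten or deleted.
  private record(key: string): void {
    const journal = this.journals[this.journals.length - 1]
    if (journal !== undefined) journal.push({ key, previous: this.data.get(key) })
  }
}
```

The point of the design is that the cost of a checkpoint is proportional to the number of writes made after it, rather than to the size of the whole state.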

This first push includes only the first commit, 2926256, which removes all the ScratchDB-related logic from CheckpointTrie (so basically all the checkpointing functionality itself). It is a first test to see which tests fail in this constellation.

Interestingly enough, ALL (!!) tests from checkpoint.spec.ts are still passing, lol. 😜 I couldn't believe this at first glance either, but I double-checked and it really seems to be the case; even the one single revert-related test is not triggering anything.

ARgh.

@codecov

codecov bot commented Jan 4, 2021

Codecov Report

Merging #1030 (fa36c17) into master (ec0f059) will decrease coverage by 0.01%.
The diff coverage is 95.00%.

Flag Coverage Δ
block 77.65% <ø> (ø)
blockchain 77.92% <ø> (ø)
client 88.40% <95.00%> (+0.17%) ⬆️
common 91.87% <ø> (-0.25%) ⬇️
devp2p 82.34% <ø> (-0.27%) ⬇️
ethash 82.08% <ø> (ø)
tx 86.25% <ø> (ø)
vm 83.05% <ø> (ø)

@holgerd77 holgerd77 force-pushed the more-performant-trie-checkpointing-mechanism branch from 13ae04e to e2c10ce on January 4, 2021 15:16
@holgerd77
Member Author

Woohoo, this is working and it was surprisingly easy, just one day of work. 😄

All tests are passing, and this brings speed increases (10-50x) similar to those seen in the processing of 0-tx blocks.

This is a client run before (with 1 block at a time):

CP_before.mov

And this is after (with 50 blocks at a time per log msg):

CP_after.mov

@holgerd77
Member Author

Depends on #1028

@jochem-brouwer
Member

So a general question here - if you remove the checkpointing logic, then the checkpointing tests still pass? (I got the feeling I am missing something here). That implies that checkpointing is not tested thoroughly, right?

packages/trie/src/db.ts (outdated, resolved review thread)
@holgerd77 holgerd77 force-pushed the client-add-vm-execution-rebase branch from 8f506ae to fb2826d on January 5, 2021 10:18
@holgerd77 holgerd77 force-pushed the more-performant-trie-checkpointing-mechanism branch from 2373dc2 to b6d8e40 on January 5, 2021 10:19
@holgerd77
Member Author

Rebased this.

@holgerd77 holgerd77 force-pushed the more-performant-trie-checkpointing-mechanism branch from b6d8e40 to e3b4c45 on January 5, 2021 11:25
@holgerd77
Member Author

Have rebased this and done the fixes.

@jochem-brouwer that's correct, there was not a single checkpoint test which would fail when the functionality was removed, so these tests did not have much of an effect (apart from maybe testing that the added functionality does not introduce additional failures in the base functionality).

A bit strange, but as one can see, these things can also happen. 😛

Checkpointing is now better covered with the new DB-related checkpointing tests. To further increase trust in the mechanism I've now also expanded these tests to run in a Trie context with the last commit.
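
For illustration, this is the kind of check that does fail once checkpointing is removed. It is a hedged sketch, not one of the tests added in this PR: the Checkpointable interface and both classes are hypothetical stand-ins, with NoOpCheckpoints modelling "functionality removed" and SnapshotCheckpoints being a naive copy-everything reference.

```typescript
// Hypothetical interface; not the trie package's real API.
interface Checkpointable {
  put(key: string, value: string): void
  get(key: string): string | undefined
  checkpoint(): void
  revert(): void
}

// Deliberately broken: checkpoint() and revert() do nothing,
// which models a trie with the checkpointing logic stripped out.
class NoOpCheckpoints implements Checkpointable {
  protected data = new Map<string, string>()
  put(key: string, value: string): void {
    this.data.set(key, value)
  }
  get(key: string): string | undefined {
    return this.data.get(key)
  }
  checkpoint(): void {}
  revert(): void {}
}

// Naive but correct reference: copies the whole map on each checkpoint.
class SnapshotCheckpoints extends NoOpCheckpoints {
  private snapshots: Map<string, string>[] = []
  checkpoint(): void {
    this.snapshots.push(new Map(this.data))
  }
  revert(): void {
    const snapshot = this.snapshots.pop()
    if (snapshot === undefined) throw new Error('no checkpoint to revert')
    this.data = snapshot
  }
}

// Returns true only if revert() really restores the pre-checkpoint state:
// an overwritten key gets its old value back and a new key disappears.
function revertRestoresState(store: Checkpointable): boolean {
  store.put('k', 'before')
  store.checkpoint()
  store.put('k', 'after')
  store.put('new', 'x')
  store.revert()
  return store.get('k') === 'before' && store.get('new') === undefined
}
```

The important part is that the assertion reads state after revert() that was written after checkpoint(); a test which only checks values written before the checkpoint passes against the no-op version too.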

@holgerd77 holgerd77 force-pushed the more-performant-trie-checkpointing-mechanism branch from e3b4c45 to 078c005 on January 5, 2021 12:34
@holgerd77 holgerd77 force-pushed the client-add-vm-execution-rebase branch from a8d7cca to 1d4ca4f on January 5, 2021 14:25
@holgerd77 holgerd77 force-pushed the more-performant-trie-checkpointing-mechanism branch from 078c005 to 83b2fcc on January 5, 2021 14:44
@holgerd77
Member Author

@jochem-brouwer oh yeah, a cache here is a great idea. I had this in mind when we were brainstorming performance improvement ideas, but funnily enough I didn't make the connection when working on this here. Feel free to rework or throw away everything I've done here (not in this PR though! 😋 ), I am not attached to this at all and would be glad if this PR triggered further optimizations. The switch here is definitely a big improvement, but thinking about it, an in-memory cache is the better solution. I think we should give this a really high priority, since these optimizations on the MPT are felt so much throughout the whole ecosystem (at least within EthereumJS).

@jochem-brouwer
Member

Yep, the idea is to do this in a follow-up PR. Besides a few questions, I am fine with this one in general 😄

jochem-brouwer previously approved these changes Jan 5, 2021
Member

@jochem-brouwer jochem-brouwer left a comment

After your comments, LGTM! I'll start with the cache design probably tomorrow.

@holgerd77
Member Author

@jochem-brouwer thanks, I would want to merge #1028 first though 😄 .

@holgerd77
Member Author

(this one is targeted towards that branch)

@holgerd77 holgerd77 force-pushed the client-add-vm-execution-rebase branch from 1d4ca4f to dc2c99e on January 6, 2021 17:56
@holgerd77 holgerd77 force-pushed the more-performant-trie-checkpointing-mechanism branch from 83b2fcc to 0517e40 on January 6, 2021 18:01
Base automatically changed from client-add-vm-execution-rebase to master January 7, 2021 16:02
@holgerd77 holgerd77 dismissed stale reviews from jochem-brouwer and ryanio January 7, 2021 16:02

The base branch was changed.

@holgerd77 holgerd77 force-pushed the more-performant-trie-checkpointing-mechanism branch from 0517e40 to fa36c17 on January 7, 2021 16:03
@holgerd77
Member Author

Ah, the base branch changed here not on the merge of #1028 but at the moment I deleted the branch over there, that's interesting.

The process has dismissed the reviews here though, so this would need a renewed approval. //cc @jochem-brouwer @ryanio or anyone else

@jochem-brouwer
Member

Your force push is due to the rebase (because of some changes of the VM full sync branch), right?

@holgerd77
Member Author

@jochem-brouwer not 100% sure, after merging #1028 there was still an "Update branch" button here. So I rebased the branch locally onto master and then force-pushed to be sure that everything (hopefully) is correct.

@holgerd77
Member Author

@jochem-brouwer scrolled through all the changes, everything looks correct though.

Member

@jochem-brouwer jochem-brouwer left a comment

Also looks correct to me!

@holgerd77 holgerd77 merged commit b1bcb03 into master Jan 7, 2021
@holgerd77 holgerd77 deleted the more-performant-trie-checkpointing-mechanism branch January 7, 2021 16:52