Trie: Rework Checkpointing Mechanism #1030

holgerd77 · 2021-01-04T10:15:46Z

While working on VM execution in the client (#1028) it became pretty clear that the current trie checkpointing mechanism is one of our main bottlenecks (if not THE bottleneck). Removing the checkpointing in VM.runBlock() for blocks with zero transactions in 6f64fa6 (so where there is no additional tx checkpointing, no checkpointing is applied at all) increased processing performance by a factor of 10-100, and rather towards the upper bound. In fact, processing log messages had to be batched in 50-block chunks (before: 1 log msg per block) in 45c9e9d, and log output is still coming somewhat faster-paced than before.

Currently the trie checkpointing mechanism copies the whole state DB on every checkpoint, which is extremely resource-intensive and not sustainable. This PR will experiment with a more fine-grained approach by creating an operations stack which can be reverted on a trie.revert() and - simply - deleted on a trie.commit().
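To make the operations-stack idea concrete, here is a minimal sketch over a plain in-memory key/value store. It is not the code from this PR: `JournaledStore`, `UndoEntry` and all method bodies are hypothetical, and the real trie DB is asynchronous and works on Buffers. The assumption is that each write made under an open checkpoint records the previous value of the key, so a revert replays those records backwards and a commit just drops them.

```typescript
// Hypothetical illustration of an undo journal; not the implementation in this PR.
interface UndoEntry {
  key: string
  previous: string | undefined // undefined means the key did not exist before
}

class JournaledStore {
  private data = new Map<string, string>()
  // One undo list per open checkpoint, innermost checkpoint last.
  private checkpoints: UndoEntry[][] = []

  get(key: string): string | undefined {
    return this.data.get(key)
  }

  put(key: string, value: string): void {
    this.record(key)
    this.data.set(key, value)
  }

  del(key: string): void {
    this.record(key)
    this.data.delete(key)
  }

  checkpoint(): void {
    this.checkpoints.push([])
  }

  commit(): void {
    const undo = this.checkpoints.pop()
    if (undo === undefined) throw new Error('no checkpoint to commit')
    // Outermost commit: the journal is simply dropped.
    // Nested commit: hand the entries to the parent so it can still revert them.
    const parent = this.checkpoints[this.checkpoints.length - 1]
    if (parent !== undefined) parent.push(...undo)
  }

  revert(): void {
    const undo = this.checkpoints.pop()
    if (undo === undefined) throw new Error('no checkpoint to revert')
    // Undo newest-first so the oldest recorded value of each key wins.
    for (const { key, previous } of [...undo].reverse()) {
      if (previous === undefined) this.data.delete(key)
      else this.data.set(key, previous)
    }
  }

  // Remember the current value of `key` if a checkpoint is open.
  private record(key: string): void {
    const current = this.checkpoints[this.checkpoints.length - 1]
    if (current !== undefined) {
      current.push({ key, previous: this.data.get(key) })
    }
  }
}
```

In this sketch the cost of a checkpoint is proportional to the number of writes made under it rather than to the size of the whole store, which is the property the approach above is after.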

This first push includes just the first commit, 2926256, which removes all the ScratchDB-related logic from CheckpointTrie (so basically all the checkpointing functionality itself). It is a first test to see which tests fail in this constellation.

Interestingly enough ALL (!!) tests from checkpoint.spec.ts are still passing, lol. 😜 Couldn't believe this either at first glance, but I double-checked and it really seems to be the case; even the one single revert-related test is not triggering anything.

ARgh.

codecov · 2021-01-04T10:17:32Z

Codecov Report

Merging #1030 (fa36c17) into master (ec0f059) will decrease coverage by 0.01%.
The diff coverage is 95.00%.

| Flag | Coverage Δ | |
|---|---|---|
| block | `77.65% <ø> (ø)` | |
| blockchain | `77.92% <ø> (ø)` | |
| client | `88.40% <95.00%> (+0.17%)` | ⬆️ |
| common | `91.87% <ø> (-0.25%)` | ⬇️ |
| devp2p | `82.34% <ø> (-0.27%)` | ⬇️ |
| ethash | `82.08% <ø> (ø)` | |
| tx | `86.25% <ø> (ø)` | |
| vm | `83.05% <ø> (ø)` | |

Flags with carried forward coverage won't be shown.

holgerd77 · 2021-01-04T17:03:11Z

Woohoo, this is working and it was surprisingly easy, just one day of work. 😄

All tests are passing and this brings similar speed increases (10-50x) as seen in the 0-tx blocks processing.

This is a client run before (with 1 block at a time):

CP_before.mov

And this is after (with 50 blocks at a time per log msg):

CP_after.mov

holgerd77 · 2021-01-04T17:14:00Z

Depends on #1028

packages/vm/lib/state/stateManager.ts

packages/trie/test/db.spec.ts

jochem-brouwer · 2021-01-04T19:35:29Z

So a general question here - if you remove the checkpointing logic, then the checkpointing tests still pass? (I got the feeling I am missing something here). That implies that checkpointing is not tested thoroughly, right?

packages/trie/src/checkpointTrie.ts

packages/trie/src/db.ts

packages/trie/test/stream.spec.ts

holgerd77 · 2021-01-05T10:20:26Z

Rebased this.

holgerd77 · 2021-01-05T11:29:04Z

Have rebased this and done the fixes.

@jochem-brouwer that's correct, not a single checkpoint test failed when the functionality was removed, so these tests did not have much of an effect (apart from maybe testing that the added functionality does not introduce additional failures in the base functionality).

A bit strange, but as one can see, these things can also happen. 😛

Checkpointing is now better covered with the new DB-related checkpointing tests. To further increase trust in the mechanism I've now also expanded these tests to run in a Trie context with the last commit.
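As an illustration of what a checkpoint test needs in order to fail when the functionality is removed, here is a small sketch. It is not one of the tests added in this PR: the `Checkpointable` interface and both fake stores are hypothetical, and everything is synchronous for brevity whereas the real trie methods are asynchronous. The point is only that the check must observe a value that differs between "reverted" and "not reverted".

```typescript
// Hypothetical minimal interface, simplified to synchronous string keys/values.
interface Checkpointable {
  put(key: string, value: string): void
  get(key: string): string | null
  checkpoint(): void
  commit(): void
  revert(): void
}

// Returns true only if writes made after checkpoint() are undone by revert().
function revertRestoresPreviousState(store: Checkpointable): boolean {
  store.put('k1', 'v1')
  store.checkpoint()
  store.put('k1', 'v2') // overwrite an existing key inside the checkpoint
  store.put('k2', 'v3') // add a new key inside the checkpoint
  store.revert()
  return store.get('k1') === 'v1' && store.get('k2') === null
}

// Reference fake: copies the whole map on checkpoint (simple, but O(size)).
class SnapshotStore implements Checkpointable {
  private data = new Map<string, string>()
  private snapshots: Map<string, string>[] = []

  put(key: string, value: string): void {
    this.data.set(key, value)
  }
  get(key: string): string | null {
    const value = this.data.get(key)
    return value === undefined ? null : value
  }
  checkpoint(): void {
    this.snapshots.push(new Map(this.data))
  }
  commit(): void {
    this.snapshots.pop()
  }
  revert(): void {
    const snapshot = this.snapshots.pop()
    if (snapshot !== undefined) this.data = snapshot
  }
}

// Simulates a store whose checkpointing logic has been stripped out.
class NoCheckpointStore implements Checkpointable {
  private data = new Map<string, string>()

  put(key: string, value: string): void {
    this.data.set(key, value)
  }
  get(key: string): string | null {
    const value = this.data.get(key)
    return value === undefined ? null : value
  }
  checkpoint(): void {}
  commit(): void {}
  revert(): void {}
}
```

Running `revertRestoresPreviousState` against `SnapshotStore` returns `true` and against `NoCheckpointStore` returns `false`; a test that only checks values which are identical in both cases would pass either way.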

holgerd77 · 2021-01-05T22:33:11Z

@jochem-brouwer oh yeah, a cache here is a great idea. I had this in mind when we were brainstorming around performance improvement ideas, but funnily enough I didn't make the connection when working on this here. Feel free to rework or throw away everything I've done here (not in this PR though! 😋 ), I am not attached to this at all and would be glad if this PR triggered further optimizations. The switch here is definitely a big improvement, but on thinking about it an in-memory cache is definitely the better solution. I think we should give this a really high priority, since these optimizations on the MPT are felt so much throughout the whole (at least EthereumJS) ecosystem.

jochem-brouwer · 2021-01-05T22:40:12Z

Yep, the idea is to do this in a follow up PR. Besides a few questions I am fine with this one in general 😄

jochem-brouwer

After your comments, LGTM! I'll start with the cache design probably tomorrow.

holgerd77 · 2021-01-05T22:47:03Z

@jochem-brouwer thanks, I would want to merge #1028 first though 😄 .

holgerd77 · 2021-01-05T22:47:29Z

(this one is targeted towards that branch)

The base branch was changed.

…ferentiation towards stateDB

…eckpointTrie, basic follow-up VM test fixes

…to DB

… batch, added DB checkpointing tests

…g msg combining zero and non-zero tx blocks in client

…files and tests

…g to the DB checkpoint tests

…() test coverage

holgerd77 · 2021-01-07T16:07:07Z

Ah, the base branch changed here not on merge of #1028 but at the moment I deleted the branch over there, that's interesting.

The process has dismissed the reviews here though, so this would need a renewed approval. //cc @jochem-brouwer @ryanio or anyone else

jochem-brouwer · 2021-01-07T16:09:44Z

Your force push is due to the rebase (because of some changes of the VM full sync branch), right?

holgerd77 · 2021-01-07T16:22:58Z

@jochem-brouwer not 100% sure, after merging #1028 there was still an "Update branch" button here. So I rebased the branch locally onto master and then force-pushed to be sure that everything (hopefully) is correct.

holgerd77 · 2021-01-07T16:23:51Z

@jochem-brouwer scrolled through all the changes, everything looks correct though.

jochem-brouwer

Also looks correct to me!

holgerd77 added PR state: WIP type: refactor package: vm package: mpt labels Jan 4, 2021

holgerd77 force-pushed the more-performant-trie-checkpointing-mechanism branch from 13ae04e to e2c10ce Compare January 4, 2021 15:16

holgerd77 requested review from ryanio, jochem-brouwer and cgewecke January 4, 2021 17:13

holgerd77 added PR/Issue state: blocked PR state: needs review and removed PR state: WIP labels Jan 4, 2021