core/state/snapshot: write snapshot generator in batch #22163

Merged
merged 4 commits into from
Jan 18, 2021

Conversation

rjl493456442
Member

@rjl493456442 rjl493456442 commented Jan 13, 2021

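This PR fixes a flaw in the snapshot: a crash while a stale snapshot is being wiped can leave an invalid snapshot on disk that is treated as complete after a restart.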

Scenario Description

0. Have a good snapshot
1. Start geth without --snapshot, import a few blocks
2. Start geth with --snapshot; it starts wiping the now-stale snapshot
3. Crash
4. Restart Geth

WARN [01-12|20:47:02.323] Loaded snapshot journal                  diskroot="fc6ca8…0440f3" diffs=unmatched
[...]
BAD BLOCK

Analysis

In the snapshot, we have three different components:

  • snapshot root: indicates the state hash of the disk layer
  • snapshot generator: indicates the status of the snapshot disk layer (generating, wiping, done)
  • snapshot diff layer journal

In step 0, we have a complete snapshot, so the snapshot generator is marked as "DONE".
In step 1, we introduce a gap between the snapshot and the chain, so the snapshot becomes stale and unusable.
In step 2, we try to wipe the entire stale snapshot.

The critical code in step 2:

    // Wipe any previously existing snapshot from the database if no wiper is
    // currently in progress.
    if wiper == nil {
        wiper = wipeSnapshot(diskdb, true)
    }
    // Create a new disk layer with an initialized state marker at zero
    rawdb.WriteSnapshotRoot(diskdb, root)

    base := &diskLayer{
        diskdb:     diskdb,
        triedb:     triedb,
        root:       root,
        cache:      fastcache.New(cache * 1024 * 1024),
        genMarker:  []byte{}, // Initialized but empty!
        genPending: make(chan struct{}),
        genAbort:   make(chan chan *generatorStats),
    }
    go base.generate(&generatorStats{wiping: wiper, start: time.Now()})
    log.Debug("Start snapshot generation", "root", root)
    return base

Here the NEW ROOT is written as the snapshot root marker without updating the snapshot generator.

What's more, during this long wiping procedure, it is possible that no new diff layer is created, which means the snapshot generator is never updated. The generator on disk is then still the old one, which is marked as DONE.

Then a crash happens.

After the restart, we have an invalid snapshot: the snapshot root is aligned with the chain, but the status of the snapshot generator is still DONE.
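
The failure mode can be reproduced in a toy model. The following is only a minimal sketch, not the code in this PR: `kvStore`, `batch`, `rebuildUnsafe`, `rebuildBatched`, `looksComplete` and the key names are hypothetical stand-ins for the real database and rawdb accessors, and the batch is assumed to be applied atomically. It shows why writing the root marker on its own is dangerous, and why putting the root and the generator into one batch avoids the mismatch.

```go
package main

import "fmt"

// kvStore is a toy in-memory stand-in for the node database (assumption).
type kvStore map[string]string

// batch buffers writes and applies them together in Write, mimicking an
// atomic database batch (assumption: the real batch is all-or-nothing).
type batch struct {
	db      kvStore
	pending map[string]string
}

func (db kvStore) NewBatch() *batch {
	return &batch{db: db, pending: map[string]string{}}
}

func (b *batch) Put(k, v string) { b.pending[k] = v }

func (b *batch) Write() {
	for k, v := range b.pending {
		b.db[k] = v
	}
}

// Hypothetical key names, chosen only for this illustration.
const (
	rootKey      = "SnapshotRoot"
	generatorKey = "SnapshotGenerator"
)

// rebuildUnsafe writes the new root directly. If crash is true, the process
// "dies" before the generator status is ever persisted.
func rebuildUnsafe(db kvStore, newRoot string, crash bool) {
	db[rootKey] = newRoot
	if crash {
		return
	}
	db[generatorKey] = "wiping"
}

// rebuildBatched stages the root and the generator status in one batch, so a
// crash before Write leaves both old values untouched.
func rebuildBatched(db kvStore, newRoot string, crash bool) {
	b := db.NewBatch()
	b.Put(rootKey, newRoot)
	b.Put(generatorKey, "wiping")
	if crash {
		return
	}
	b.Write()
}

// looksComplete reports whether a restart would accept the snapshot as a
// finished one for the given chain root.
func looksComplete(db kvStore, chainRoot string) bool {
	return db[rootKey] == chainRoot && db[generatorKey] == "done"
}

func main() {
	unsafeDB := kvStore{rootKey: "old", generatorKey: "done"}
	rebuildUnsafe(unsafeDB, "new", true)
	fmt.Println(looksComplete(unsafeDB, "new")) // true: stale data accepted

	batchedDB := kvStore{rootKey: "old", generatorKey: "done"}
	rebuildBatched(batchedDB, "new", true)
	fmt.Println(looksComplete(batchedDB, "new")) // false: root still "old"
}
```

In the batched variant a crash leaves the old root in place, so the mismatch with the chain is still visible on restart and the stale snapshot is not mistaken for a complete one.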

Contributor

@holiman holiman left a comment

LGTM, and it looks like it's backed by quite rigorous testing as well 👍

@holiman holiman merged commit 5e9f5ca into ethereum:master Jan 18, 2021
@holiman holiman added this to the 1.10.0 milestone Jan 18, 2021
bulgakovk pushed a commit to bulgakovk/go-ethereum that referenced this pull request Jan 26, 2021
* core/state/snapshot: write snapshot generator in batch

* core: refactor the tests

* core: update tests

* core: update tests
tony-ricciardi pushed a commit to tony-ricciardi/go-ethereum that referenced this pull request Jan 20, 2022
Cherry pick bug fixes from upstream for snapshots, which will enable higher transaction throughput. It also enables snapshots by default (which is one of the commits pulled from upstream).

Upstream commits included:

68754f3 cmd/utils: grant snapshot cache to trie if disabled (ethereum#21416)
3ee91b9 core/state/snapshot: reduce disk layer depth during generation
a15d71a core/state/snapshot: stop generator if it hits missing trie nodes (ethereum#21649)
43c278c core/state: disable snapshot iteration if it's not fully constructed (ethereum#21682)
b63e3c3 core: improve snapshot journal recovery (ethereum#21594)
e640267 core/state/snapshot: fix journal recovery from generating old journal (ethereum#21775)
7b7b327 core/state/snapshot: update generator marker in sync with flushes
167ff56 core/state/snapshot: gethring -> gathering typo (ethereum#22104)
d2e1b17 snapshot, trie: fixed typos, mostly in snapshot pkg (ethereum#22133)
c4deebb core/state/snapshot: add generation logs to storage too
5e9f5ca core/state/snapshot: write snapshot generator in batch (ethereum#22163)
18145ad core/state: maintain one more diff layer (ethereum#21730)
04a7226 snapshot: merge loops for better performance (ethereum#22160)
994cdc6 cmd/utils: enable snapshots by default
9ec3329 core/state/snapshot: ensure Cap retains a min number of layers
52e5c38 core/state: copy the snap when copying the state (ethereum#22340)
a31f6d5 core/state/snapshot: fix panic on missing parent
61ff3e8 core/state/snapshot, ethdb: track deletions more accurately (ethereum#22582)
c79fc20 core/state/snapshot: fix data race in diff layer (ethereum#22540)

Other changes
Commit f9b5530 (not from upstream) fixes an incorrect default DatabaseCache value due to an earlier bad merge.

Tested
Automated tests
Testing on a private testnet
Backwards compatibility
Enabling snapshots by default is a breaking change in terms of the CLI flags, but will not cause backwards incompatibility between the node and other nodes.

Co-authored-by: Péter Szilágyi
Co-authored-by: gary rong
Co-authored-by: Melvin Junhee Woo
Co-authored-by: Martin Holst Swende
Co-authored-by: Edgar Aroutiounian