core, eth/protocols/snap, trie: implement gentrie #29313
Merged
7 commits
c17ee5d core, eth/protocols/snap, trie: implement gentrie (rjl493456442)
0863aea eth/protocols/snap: add test (rjl493456442)
0befeaa eth/protocols/snap, trie: improve comments (rjl493456442)
ebd5fc7 eth/protocols/snap: improve code comment (rjl493456442)
490a664 eth, trie: improve code comment (rjl493456442)
e24280f eth/protocols/snap: improve comment (rjl493456442)
e635c8c eth/protocols/snap: improve tests (rjl493456442)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,287 @@ | ||
// Copyright 2024 The go-ethereum Authors | ||
// This file is part of the go-ethereum library. | ||
// | ||
// The go-ethereum library is free software: you can redistribute it and/or modify | ||
// it under the terms of the GNU Lesser General Public License as published by | ||
// the Free Software Foundation, either version 3 of the License, or | ||
// (at your option) any later version. | ||
// | ||
// The go-ethereum library is distributed in the hope that it will be useful, | ||
// but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
// GNU Lesser General Public License for more details. | ||
// | ||
// You should have received a copy of the GNU Lesser General Public License | ||
// along with the go-ethereum library. If not, see <http://www.gnu.org/licenses/>. | ||
|
||
package snap | ||
|
||
import ( | ||
"bytes" | ||
|
||
"github.com/ethereum/go-ethereum/common" | ||
"github.com/ethereum/go-ethereum/core/rawdb" | ||
"github.com/ethereum/go-ethereum/ethdb" | ||
"github.com/ethereum/go-ethereum/trie" | ||
) | ||
|
||
// genTrie interface is used by the snap syncer to generate merkle tree nodes | ||
// based on a received batch of states. | ||
type genTrie interface { | ||
// update inserts the state item into generator trie. | ||
update(key, value []byte) error | ||
|
||
// commit flushes the right boundary nodes if complete flag is true. This | ||
// function must be called before flushing the associated database batch. | ||
commit(complete bool) common.Hash | ||
} | ||
|
||
// pathTrie is a wrapper over the stackTrie, incorporating additional logic | ||
// to handle the partially completed trie and potential leftover dangling | ||
// nodes in the database. It is utilized for constructing the merkle tree nodes | ||
// in path mode during the snap sync process. | ||
type pathTrie struct { | ||
owner common.Hash // identifier of trie owner, empty for account trie | ||
tr *trie.StackTrie // underlying raw stack trie | ||
first []byte // the path of first committed node by stackTrie | ||
last []byte // the path of last committed node by stackTrie | ||
|
||
// This flag indicates whether nodes on the left boundary are skipped for | ||
// committing. If set, the left boundary nodes are considered incomplete | ||
// due to potentially missing left children. | ||
skipLeftBoundary bool | ||
db ethdb.KeyValueReader | ||
batch ethdb.Batch | ||
} | ||
|
||
// newPathTrie initializes the path trie. | ||
func newPathTrie(owner common.Hash, skipLeftBoundary bool, db ethdb.KeyValueReader, batch ethdb.Batch) *pathTrie { | ||
tr := &pathTrie{ | ||
owner: owner, | ||
skipLeftBoundary: skipLeftBoundary, | ||
db: db, | ||
batch: batch, | ||
} | ||
tr.tr = trie.NewStackTrie(tr.onTrieNode) | ||
return tr | ||
} | ||
|
||
// onTrieNode is invoked whenever a new node is committed by the stackTrie. | ||
// | ||
// As the committed nodes might be incomplete if they are on the boundaries | ||
// (left or right), this function detects the incomplete ones and excludes | ||
// them from being committed. | ||
// | ||
// Additionally, the assumption is made that there may exist leftover dangling | ||
// nodes in the database. This function has the ability to detect the dangling | ||
// nodes that fall within the path space of committed nodes (specifically on | ||
// the path covered by internal extension nodes) and remove them from the | ||
// database. This property ensures that the entire path space is uniquely | ||
// occupied by committed nodes. | ||
// | ||
// Furthermore, all leftover dangling nodes along the path from committed nodes | ||
// to the trie root (left and right boundaries) should be removed as well; | ||
// otherwise, they might potentially disrupt the state healing process. | ||
func (t *pathTrie) onTrieNode(path []byte, hash common.Hash, blob []byte) { | ||
// Filter out the nodes on the left boundary if skipLeftBoundary is | ||
// configured. A node is considered to be on the left boundary if | ||
// it is the first one to be committed, or a parent/ancestor of the | ||
// first committed node. | ||
if t.skipLeftBoundary && (t.first == nil || bytes.HasPrefix(t.first, path)) { | ||
if t.first == nil { | ||
// Memorize the path of first committed node, which is regarded | ||
// as left boundary. Deep-copy is necessary as the path given | ||
// is volatile. | ||
t.first = append([]byte{}, path...) | ||
|
||
// The left boundary can be uniquely determined by the first committed node | ||
// from stackTrie (e.g., N_1), as the shared path prefix between the first | ||
// two inserted state items is deterministic (the path of N_3). The path | ||
// from trie root towards the first committed node is considered the left | ||
// boundary. The potential leftover dangling nodes on left boundary should | ||
// be cleaned out. | ||
// | ||
// +-----+ | ||
// | N_3 | shared path prefix of state_1 and state_2 | ||
// +-----+ | ||
// /- -\ | ||
// +-----+ +-----+ | ||
// First committed node | N_1 | | N_2 | latest inserted node (contain state_2) | ||
// +-----+ +-----+ | ||
// | ||
// The node with the path of the first committed one (e.g., N_1) is not | ||
// removed because it's a sibling of the nodes we want to commit, not | ||
// the parent or ancestor. | ||
for i := 0; i < len(path); i++ { | ||
t.delete(path[:i], false) | ||
} | ||
} | ||
return | ||
} | ||
// If boundary filtering is not configured, or the node is not on the left | ||
// boundary, commit it to database. | ||
// | ||
// Note: If the current committed node is an extension node, then the nodes | ||
// falling within the path between itself and its standalone (not embedded | ||
// in parent) child should be cleaned out so that the extension node | ||
// exclusively occupies the inner path. | ||
// | ||
// This is essential in snap sync to avoid leaving dangling nodes within | ||
// this range covered by extension node which could potentially break the | ||
// state healing. | ||
// | ||
// An extension node is detected if its path is a prefix of the last committed | ||
// one and the path gap is larger than one. If the path gap is only one byte, | ||
// the current node could either be a full node, or an extension with a | ||
// single-byte key. In either case, no gaps will be left in the path. | ||
if t.last != nil && bytes.HasPrefix(t.last, path) && len(t.last)-len(path) > 1 { | ||
for i := len(path) + 1; i < len(t.last); i++ { | ||
t.delete(t.last[:i], true) | ||
} | ||
} | ||
t.write(path, blob) | ||
|
||
// Update the last flag. Deep-copy is necessary as the provided path is volatile. | ||
if t.last == nil { | ||
t.last = append([]byte{}, path...) | ||
} else { | ||
t.last = append(t.last[:0], path...) | ||
} | ||
} | ||
|
||
// write commits the node write to provided database batch in path mode. | ||
func (t *pathTrie) write(path []byte, blob []byte) { | ||
if t.owner == (common.Hash{}) { | ||
rawdb.WriteAccountTrieNode(t.batch, path, blob) | ||
} else { | ||
rawdb.WriteStorageTrieNode(t.batch, t.owner, path, blob) | ||
} | ||
} | ||
|
||
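// deleteAccountNode schedules the deletion of the account trie node at the | ||
// given path in the batch, but only if the node exists in the database. The | ||
// inner flag marks paths covered by an extension node, as opposed to paths | ||
// on the left or right boundary; it only selects which metrics are updated. | ||
func (t *pathTrie) deleteAccountNode(path []byte, inner bool) { | ||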
if inner { | ||
accountInnerLookupGauge.Inc(1) | ||
} else { | ||
accountOuterLookupGauge.Inc(1) | ||
} | ||
if !rawdb.ExistsAccountTrieNode(t.db, path) { | ||
return | ||
} | ||
if inner { | ||
accountInnerDeleteGauge.Inc(1) | ||
} else { | ||
accountOuterDeleteGauge.Inc(1) | ||
} | ||
rawdb.DeleteAccountTrieNode(t.batch, path) | ||
} | ||
|
||
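// deleteStorageNode schedules the deletion of the storage trie node at the | ||
// given path (under the trie owner) in the batch, but only if the node exists | ||
// in the database. The inner flag only selects which metrics are updated. | ||
func (t *pathTrie) deleteStorageNode(path []byte, inner bool) { | ||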
if inner { | ||
storageInnerLookupGauge.Inc(1) | ||
} else { | ||
storageOuterLookupGauge.Inc(1) | ||
} | ||
if !rawdb.ExistsStorageTrieNode(t.db, t.owner, path) { | ||
return | ||
} | ||
if inner { | ||
storageInnerDeleteGauge.Inc(1) | ||
} else { | ||
storageOuterDeleteGauge.Inc(1) | ||
} | ||
rawdb.DeleteStorageTrieNode(t.batch, t.owner, path) | ||
} | ||
|
||
// delete commits the node deletion to provided database batch in path mode. | ||
func (t *pathTrie) delete(path []byte, inner bool) { | ||
if t.owner == (common.Hash{}) { | ||
t.deleteAccountNode(path, inner) | ||
} else { | ||
t.deleteStorageNode(path, inner) | ||
} | ||
} | ||
|
||
// update implements genTrie interface, inserting a (key, value) pair into the | ||
// stack trie. | ||
func (t *pathTrie) update(key, value []byte) error { | ||
return t.tr.Update(key, value) | ||
} | ||
|
||
// commit implements genTrie interface, flushing the right boundary if it's | ||
// considered as complete. Otherwise, the nodes on the right boundary are | ||
// discarded and cleaned up. | ||
// | ||
// Note, this function must be called before flushing database batch, otherwise, | ||
// dangling nodes might be left in database. | ||
func (t *pathTrie) commit(complete bool) common.Hash { | ||
// If the right boundary is claimed as complete, flush its nodes out. | ||
// The nodes on both left and right boundary will still be filtered | ||
// out if left boundary filtering is configured. | ||
if complete { | ||
// Commit all inserted but not yet committed nodes (on the right | ||
// boundary) in the stackTrie. | ||
hash := t.tr.Hash() | ||
if t.skipLeftBoundary { | ||
return common.Hash{} // hash is meaningless if left side is incomplete | ||
} | ||
return hash | ||
} | ||
// Discard nodes on the right boundary as it's claimed as incomplete. These | ||
// nodes might be incomplete due to missing children on the right side. | ||
// Furthermore, the potential leftover nodes on right boundary should also | ||
// be cleaned out. | ||
// | ||
// The right boundary can be uniquely determined by the last committed node | ||
// from stackTrie (e.g., N_1), as the shared path prefix between the last | ||
// two inserted state items is deterministic (the path of N_3). The path | ||
// from trie root towards the last committed node is considered the right | ||
// boundary (root to N_3). | ||
// | ||
// +-----+ | ||
// | N_3 | shared path prefix of last two states | ||
// +-----+ | ||
// /- -\ | ||
// +-----+ +-----+ | ||
// Last committed node | N_1 | | N_2 | latest inserted node (contain last state) | ||
// +-----+ +-----+ | ||
// | ||
// Another interesting scenario occurs when the trie is committed due to | ||
// too many items being accumulated in the batch. To flush them out to | ||
// the database, the path of the last inserted node (N_2) is temporarily | ||
// treated as an incomplete right boundary, and nodes on this path are | ||
// removed (e.g. from root to N_3). | ||
// However, this path will be reclaimed as an internal path by inserting | ||
// more items after the batch flush. New nodes on this path can be committed | ||
// with no issues as they are actually complete. Also, from a database | ||
// perspective, first deleting and then rewriting is a valid data update. | ||
for i := 0; i < len(t.last); i++ { | ||
t.delete(t.last[:i], false) | ||
} | ||
return common.Hash{} // the hash is meaningless for incomplete commit | ||
} | ||
|
||
// hashTrie is a wrapper over the stackTrie for implementing genTrie interface. | ||
type hashTrie struct { | ||
tr *trie.StackTrie | ||
} | ||
|
||
// newHashTrie initializes the hash trie. | ||
func newHashTrie(batch ethdb.Batch) *hashTrie { | ||
return &hashTrie{tr: trie.NewStackTrie(func(path []byte, hash common.Hash, blob []byte) { | ||
rawdb.WriteLegacyTrieNode(batch, hash, blob) | ||
})} | ||
} | ||
|
||
// update implements genTrie interface, inserting a (key, value) pair into | ||
// the stack trie. | ||
func (t *hashTrie) update(key, value []byte) error { | ||
return t.tr.Update(key, value) | ||
} | ||
|
||
// commit implements genTrie interface, committing the nodes on right boundary. | ||
func (t *hashTrie) commit(complete bool) common.Hash { | ||
if !complete { | ||
return common.Hash{} // the hash is meaningless for incomplete commit | ||
} | ||
return t.tr.Hash() // return hash only if it's claimed as complete | ||
} |
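The two path checks in `onTrieNode` can be tried in isolation. The following is a minimal sketch using only the standard library; `onLeftBoundary` and `innerGapPaths` are hypothetical helper names introduced for illustration and are not part of the pull request. They mirror the `bytes.HasPrefix` conditions from the diff and return the paths that would be passed to `delete`, without touching any database.

```go
package main

import (
	"bytes"
	"fmt"
)

// onLeftBoundary mirrors the left-boundary test in onTrieNode: a committed
// node is on the left boundary if nothing has been committed yet (first is
// nil), or if its path is a prefix of the first committed path.
// Hypothetical helper, for illustration only.
func onLeftBoundary(first, path []byte) bool {
	return first == nil || bytes.HasPrefix(first, path)
}

// innerGapPaths mirrors the extension-node check: if path is a prefix of the
// last committed path and the gap is larger than one, it returns the paths
// strictly between the two, which are the candidates for inner cleanup.
// Hypothetical helper, for illustration only.
func innerGapPaths(last, path []byte) [][]byte {
	if last == nil || !bytes.HasPrefix(last, path) || len(last)-len(path) <= 1 {
		return nil
	}
	var gaps [][]byte
	for i := len(path) + 1; i < len(last); i++ {
		gaps = append(gaps, append([]byte{}, last[:i]...))
	}
	return gaps
}

func main() {
	first := []byte{0x1, 0x2, 0x3}
	fmt.Println(onLeftBoundary(first, []byte{0x1, 0x2})) // true: ancestor of first
	fmt.Println(onLeftBoundary(first, []byte{0x1, 0x4})) // false: different subtree

	// A node at path [1] committed right after a node at [1 2 3 4] leaves
	// the paths [1 2] and [1 2 3] uncovered by any committed node.
	fmt.Println(innerGapPaths([]byte{0x1, 0x2, 0x3, 0x4}, []byte{0x1})) // [[1 2] [1 2 3]]
}
```

With a gap of exactly one, `innerGapPaths` returns nil, which matches the comment in the diff that a full node or a single-key extension leaves no gap.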
Previously, the `last` was
Is this `last`, here, using the same semantics, or is there some difference?
You use "last committed", not "last inserted". I find that bit confusing, because the actual last element will never be committed by the stacktrie, unless an explicit commit is triggered.
Hence, it's better to track the 'last inserted', since the 'last committed' will for the most part be just a regular node in the middle of the range.
I really don't get it. Won't every single value at some point be `last`, and thus the filtering will prevent every single parent-node from being written??
Previously, `last` was the key of the last inserted entry.
In this PR, `last` is the key of the last committed entry, namely it's not on the right boundary.
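As a side note on how `last` is maintained: the diff copies the path on every callback because the slice handed over by the stack trie is volatile. A minimal standalone sketch of that idiom follows; the `tracker` type is hypothetical and only stands in for the `last` field of `pathTrie`.

```go
package main

import "fmt"

// tracker is a hypothetical stand-in for the `last` bookkeeping in pathTrie.
type tracker struct {
	last []byte
}

// onNode records the path of the most recently committed node. The path is
// copied because the caller may reuse its backing array afterwards.
func (t *tracker) onNode(path []byte) {
	if t.last == nil {
		// First call: allocate. This also keeps last non-nil when the
		// path is empty, so a nil check still means "nothing committed".
		t.last = append([]byte{}, path...)
	} else {
		// Later calls: reuse the existing buffer to avoid reallocating.
		t.last = append(t.last[:0], path...)
	}
}

func main() {
	buf := []byte{0x1, 0x2}
	tr := &tracker{}
	tr.onNode(buf)
	buf[1] = 0x9         // the caller overwrites its buffer
	fmt.Println(tr.last) // [1 2]: the recorded path is unaffected

	tr.onNode(buf[:1])
	fmt.Println(tr.last) // [1]
}
```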
The right boundary filtering logic is changed a bit.
Previously, we invoked the `Commit` function regardless of whether the right side was complete or not, so the nodes on the path of the "last inserted" node had to be filtered out.
In this pull request, we won't invoke the `Commit` function if the trie is regarded as right-incomplete. The last inserted node, along with its parents/ancestors on the right boundary, is dropped without being committed.
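To make the right-incomplete case concrete, here is a rough sketch of which paths get cleaned up, written against plain byte slices rather than the real database. `rightBoundaryCleanup` is a hypothetical name; it only reproduces the loop over `t.last[:i]` from `pathTrie.commit` and returns the paths instead of deleting them.

```go
package main

import "fmt"

// rightBoundaryCleanup models the cleanup half of pathTrie.commit. When the
// range is complete, nothing is cleaned up. Otherwise every proper prefix of
// the last committed path (root down to, but excluding, that node) is
// returned as a deletion candidate. Hypothetical helper, for illustration.
func rightBoundaryCleanup(last []byte, complete bool) [][]byte {
	if complete {
		return nil
	}
	var paths [][]byte
	for i := 0; i < len(last); i++ {
		paths = append(paths, append([]byte{}, last[:i]...))
	}
	return paths
}

func main() {
	last := []byte{0xa, 0xb, 0xc}
	for _, p := range rightBoundaryCleanup(last, false) {
		fmt.Printf("delete path %x (len %d)\n", p, len(p))
	}
	fmt.Println(len(rightBoundaryCleanup(last, true))) // 0
}
```

The node at `last` itself is not in the returned list, which matches the diagram in the diff: it is a sibling of the discarded right-boundary nodes, not one of their ancestors.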