Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trie - async walk generator #2904

Merged
merged 11 commits into from
Jul 26, 2023
Merged

Trie - async walk generator #2904

merged 11 commits into from
Jul 26, 2023

Conversation

ScottyPoi
Copy link
Contributor

@ScottyPoi ScottyPoi commented Jul 20, 2023

Trie -- walkTrieIterable

QuickStart:

from: packages/trie
run: npx vitest test/util/asyncWalk.spec.ts

observe console logs to see details of walk.
test includes walk of "Sparse" Trie and comparison to walk controller API.

Intention:

  • To establish more user friendly API for walking a Trie
    • And do so without altering the walk controller
  • To enable users to walk a Sparse* Trie
    • *Sparse meaning a valid merkle trie built from a proof
    • "Missing" nodes in a sparse trie cause the walk controller to throw an Error, and end the walk.
    • So users cannot use the walk controller to navigate sparse Tries.

Feature

This PR introduces a new approach to Trie walking, via AsyncIterable

This is a new method, independent from the existing WalkController class.

**It does not presume to replace the WalkController in any active context at this time. **
details on why in comments

Main differences from WalkController:

  1. Iterator API is more straightforward and intuitive. User create the async iterator object, and then proceed via for...await
  2. Sparse Trie walking
  3. "CurrentKey" is yielded by the iterator along with each node, instead of just _nibbles
  4. Inner and Outer OnFound functions, combined with a NodeFilter parameter enable very customizable walks.

Async Generator / Async Iterable / Async Iterator

Uses the iteration protocols described in the MDN docs here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Iteration_protocols

Function

the _walkTrie function is exported form trie/src/util/asyncWalk.ts, but also internalized as a bound method for the Trie class: trie.walkTrieIterablle

Do not fear the bind

This is not bound to an interface, just to an exported function. This is so that the new code will not add length to the already very long trie.ts file. Hyperlinks is VSCode, etc. will link to the function implementation iteself with one jump. You will not have to search for it. The class method should inherit the typedoc comments as well.

binding the function context to this allows the external function to access private class methods.
For example:

  • trie.hash() is a private method, that can only be used inside the class itself.
  • by binding the _walkTrie function to a Trie instance, _walkTrie can access this.hash()
  • This avoids patterns like (trie as any).hash() and other hacks

_walkTrie

_walkTrie generates the Async Iterable:

export async function* _walkTrie(
  this: Trie,
  nodeHash: Uint8Array = this.root(),
  currentKey: number[] = [],
  onFound: OnFound = async (_trieNode: TrieNode, _key: number[]) => {},
  filter: NodeFilter = async (_trieNode: TrieNode, _key: number[]) => true,
  visited: Set<string> = new Set<string>()
): AsyncIterable<{ node: TrieNode; currentKey: number[] }>

OnFound

This is a callback that will be called on every node encountered on the walk ("pre-filter").
Useful when NodeFilter is defined, as it will be called on all nodes, not only the filtered nodes that are yielded.

NodeFilter

Pass a filter function that will return true for desired nodes. The walk will actually yield only nodes that pass the filter.

NodeHash / CurrentKey / Visited

With these params left undefined, the walk will begin at the root, and later use these parameters to call itself recursively during the walk. If defined from the start, they can be used to begin the walk from another place.

Use

trie: Trie

const walk = trie.walkTrieIterable(trie.root())

for await ( const { node, currentKey } of walk )  {
  // ... do something i.e. console.log( { node, currentKey } )
}

Helpers

Included are 2 examples of helper functions that setup and execute the walk iterator for specific common uses. These are simply two of many possible wrapper functions that could be designed for common uses. I have listed a few more ideas below.

walkAllNodes
Walk the Trie from the root, and execute onFound on every node in the Trie.
This will run through the whole walk with default parameters and no filter.

trie.walkAllNodes(onFound: OnFound) => Promise<void>

walkAllValueNodes
Walk the Trie from the root, and execute onFound on all Trie nodes that store a VALUE
This will run through the whole walk from the root with a pre-set filter for VALUE_NODES ( LeafNodes and BranchNodes with non-null values )

trie.walkAllValueNodes(onFound: OnFound) => Promise<void>

additional filter-based helpers could include:

  • allLeafNodes
  • allBranchNodes
  • allExtensionNodes
  • getSubTree

possible OnFound-based helpers could include:

  • logNodes
  • copyNodes
  • sortNodes
  • garbageCollect
  • getAllPaths
  • getAllKeys

@codecov
Copy link

codecov bot commented Jul 20, 2023

Codecov Report

Merging #2904 (4be9931) into master (191faf5) will increase coverage by 0.01%.
The diff coverage is 81.72%.

Additional details and impacted files

Impacted file tree graph

Flag Coverage Δ
block 88.66% <ø> (ø)
blockchain 92.58% <ø> (ø)
client 88.40% <ø> (+0.12%) ⬆️
common 98.64% <ø> (ø)
ethash ∅ <ø> (∅)
evm 69.95% <ø> (ø)
rlp ∅ <ø> (∅)
statemanager 85.16% <ø> (ø)
trie 89.88% <81.72%> (-0.69%) ⬇️
tx 95.96% <ø> (ø)
util 86.77% <ø> (ø)
vm 79.21% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

@acolytec3
Copy link
Contributor

Do not fear the bind

It's not fear but great dislike. If this behaves as described, I fear no longer.

@acolytec3
Copy link
Contributor

Just to make sure I understand how this works, this isn't changing the walk strategy, which is depth first, left to right, correct? It's more or less the same as the current walk controller, except that it's an async generator that allows the library consumer to trigger the next step, right?

Copy link
Contributor

@acolytec3 acolytec3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far. One error message I think needs fixing (see comment)

@ScottyPoi
Copy link
Contributor Author

ScottyPoi commented Jul 21, 2023

Just to make sure I understand how this works, this isn't changing the walk strategy, which is depth first, left to right, correct? It's more or less the same as the current walk controller, except that it's an async generator that allows the library consumer to trigger the next step, right?

yes exactly. same walk pattern, different API.
the walk controller API can be hard to grasp, and this should be slightly more intuitive to use.

The walk controller method is designed to throw Errors, and certain parts of the Trie and other libraries that use the Trie rely on those Errors. Those expectations run deep, so the walk controller needs to stay as it is.

This way, we can build up a more user friendly Trie walking API without touching the walk controller

@ScottyPoi
Copy link
Contributor Author

Looks good so far. One error message I think needs fixing (see comment)

where is comment? it's not showing up for me

}
if (node instanceof BranchNode) {
for (const [nibble, childNode] of node._branches.entries()) {
if (childNode === null) {
Copy link
Contributor Author

@ScottyPoi ScottyPoi Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  - reported as untouched by coverage


    // if (childNode === null) {
    //   continue
    // }

TODO:
conditional removed
see comment below

  • empty branches seem to be empty Uint8Array, not null.
  • why does union type for node._branches include null?
  • a new branch node's ._branches array is initiallly all null values:
const branch = new BranchNode()
branch._branches = [null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null]
  • when a branch node is serialized, the empty branches are included as a single byte each, basically the RLP encoding of [].
  • deserialization does not convert that byte back to null, of course. it deserializes to the empty array ([]).
  • so while null is a possible type for the _branches array of a branch node, in practice, the walk function should not encounter the null type, and this conditional should be unneccesary.

@acolytec3
Copy link
Contributor

Looks good so far. One error message I think needs fixing (see comment)

where is comment? it's not showing up for me

Sorry, was commenting on the wrong PR 🤦

Copy link
Contributor

@acolytec3 acolytec3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fantastic as an addition. The one thing I think would be helpful would be to add some of the commentary from the PR to the docs for trie so people are aware of this option. I know we have library consumers who use trie as is and this will definitely be helpful for them and we need to surface it so they know it exists.

@holgerd77
Copy link
Member

Yeah, cool, also a big fan of such API additions to the Trie library! 🤩 Would be great if the originator of this feature request could also have a look before we merge, so this would be @TimDaub and this #2856 issue, right?

@holgerd77
Copy link
Member

Updated this via UI

@ScottyPoi
Copy link
Contributor Author

Yeah, cool, also a big fan of such API additions to the Trie library! star_struck Would be great if the originator of this feature request could also have a look before we merge, so this would be @TimDaub and this #2856 issue, right?

Yep, @TimDaub -- This does not explicitly offer the feature that you suggested, but may offer a simpler approach to your use case from #2856

@TimDaub
Copy link
Contributor

TimDaub commented Jul 24, 2023

I‘ll try to take a look in an hour or so

const nextKey = [...currentKey, ...node._nibbles]
yield* _walkTrie.bind(this)(childNode, nextKey, onFound, filter, visited)
}
} catch (e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why apply this try catch on the entire function and not log the error. Don't want to evolve myself into coding standards of ethereumjs too much but IMO a good practice is to do 1 statement per try catch and then match the statement's respective errors or allow errors to bubble up further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really the only potential error here is something like
"DB: Node Missing". We want the walk to ignore these errors and keep going, and I would prefer it happened silently.

I don't want the error to throw, because that stops the walk from happening. I want it to simply continue on.

The reason here is that we want to walk through a sparse trie -- a Trie that has been built from a proof.
A walk through a trie like this will find a hash on the branch of a proof node for a node that is not in the database,

said node was not part of the proof, but it's hash was part of the RLP encoding of one of the proof nodes. So the walk will look for that hash in the DB, and the DB will throw an error. Which in turn would stop the walk prematurely, even though it is a valid Trie with nodes remaining.

*/
async walkAllValueNodes(onFound: OnFound): Promise<void> {
for await (const { node, currentKey } of this.walkTrieIterable(
this.root(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to make this user-definable so that e.g. we could start pass in a branch root on a lower level to skip a bunch of upper-level walking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes -- this is one example of a method that can be built using walkTrieIterable

The walk iterator can be set up to start from any node. However note that the currentKey value returned for each node will be rooted from the starting node. so to string together a node's "full key", you would still need the nibbles from the path to the starting node somehow.

if (node === undefined || visited.has(toHex(this.hash(node!.serialize())))) {
return
}
visited.add(toHex(this.hash(node!.serialize())))
Copy link
Contributor

@TimDaub TimDaub Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From working with trie.ts, I think it'd be awesome if there were a canonical representation of the trie's and node's hash values and if trie.ts would provide this serialization of values directly instead of having "users" implement their own thing. (not necessarily actionable in this PR specifically)

@TimDaub
Copy link
Contributor

TimDaub commented Jul 24, 2023

Hey, so I left some high-level feedback that may not be super actionable right now. It's more what came to mind when I was reading the code. Thank you so much for improving the library!

This module is interesting to me, especially providing a reliable way to get back keys per node and walkAllValueNodes. I could imagine replacing (and hence) simplifying the current logic of my reconciliation algorithm with it!

I won't be able to integrate it e.g. this week, and I'm expecting to raise more issues once I'm implementing it as I learn more about it, but if this finds its way into a release, I'd be more than happy to integrate it and report back how it went!

@ScottyPoi
Copy link
Contributor Author

@TimDaub

Thanks for the feedback. I responded to a couple comments, but would love to continue the conversation.

Re:

From working with trie.ts, I think it'd be awesome if there were a canonical representation of the trie's and node's hash values and if trie.ts would provide this serialization of values directly instead of having "users" implement their own thing. (not necessarily actionable in this PR specifically)

Can you be more specific? This is certainly for a different PR, but if you can help me understand what you mean, I can look into it.

Copy link
Contributor

@acolytec3 acolytec3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm!

@ScottyPoi ScottyPoi merged commit e984704 into master Jul 26, 2023
@holgerd77 holgerd77 deleted the trie-walk branch July 31, 2023 10:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants