-
Notifications
You must be signed in to change notification settings - Fork 778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trie - async walk generator #2904
Conversation
Codecov Report
Additional details and impacted files
Flags with carried forward coverage won't be shown. Click here to find out more. |
It's not fear but great dislike. If this behaves as described, I fear no longer. |
Just to make sure I understand how this works, this isn't changing the walk strategy, which is depth first, left to right, correct? It's more or less the same as the current walk controller, except that it's an async generator that allows the library consumer to trigger the next step, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far. One error message I think needs fixing (see comment)
yes exactly. same walk pattern, different API. The This way, we can build up a more user friendly Trie walking API without touching the |
where is comment? it's not showing up for me |
packages/trie/src/util/asyncWalk.ts
Outdated
} | ||
if (node instanceof BranchNode) { | ||
for (const [nibble, childNode] of node._branches.entries()) { | ||
if (childNode === null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- reported as untouched by coverage
// if (childNode === null) {
// continue
// }
TODO:
conditional removed
see comment below
- empty branches seem to be empty Uint8Array, not null.
- why does union type for node._branches include null?
- a new branch node's
._branches
array is initiallly all null values:
const branch = new BranchNode()
branch._branches = [null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null]
- when a branch node is serialized, the empty branches are included as a single byte each, basically the RLP encoding of
[]
. - deserialization does not convert that byte back to
null
, of course. it deserializes to the empty array ([]
). - so while
null
is a possible type for the_branches
array of a branch node, in practice, the walk function should not encounter thenull
type, and this conditional should be unneccesary.
Sorry, was commenting on the wrong PR 🤦 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fantastic as an addition. The one thing I think would be helpful would be to add some of the commentary from the PR to the docs for trie
so people are aware of this option. I know we have library consumers who use trie
as is and this will definitely be helpful for them and we need to surface it so they know it exists.
Updated this via UI |
Yep, @TimDaub -- This does not explicitly offer the feature that you suggested, but may offer a simpler approach to your use case from #2856 |
I‘ll try to take a look in an hour or so |
const nextKey = [...currentKey, ...node._nibbles] | ||
yield* _walkTrie.bind(this)(childNode, nextKey, onFound, filter, visited) | ||
} | ||
} catch (e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why apply this try catch on the entire function and not log the error. Don't want to evolve myself into coding standards of ethereumjs too much but IMO a good practice is to do 1 statement per try catch and then match the statement's respective errors or allow errors to bubble up further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really the only potential error here is something like
"DB: Node Missing". We want the walk to ignore these errors and keep going, and I would prefer it happened silently.
I don't want the error to throw, because that stops the walk from happening. I want it to simply continue on.
The reason here is that we want to walk through a sparse trie -- a Trie that has been built from a proof.
A walk through a trie like this will find a hash on the branch of a proof node for a node that is not in the database,
said node was not part of the proof, but it's hash was part of the RLP encoding of one of the proof nodes. So the walk will look for that hash in the DB, and the DB will throw an error. Which in turn would stop the walk prematurely, even though it is a valid Trie with nodes remaining.
*/ | ||
async walkAllValueNodes(onFound: OnFound): Promise<void> { | ||
for await (const { node, currentKey } of this.walkTrieIterable( | ||
this.root(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to make this user-definable so that e.g. we could start pass in a branch root on a lower level to skip a bunch of upper-level walking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes -- this is one example of a method that can be built using walkTrieIterable
The walk iterator can be set up to start from any node. However note that the currentKey
value returned for each node will be rooted from the starting node. so to string together a node's "full key", you would still need the nibbles from the path to the starting node somehow.
if (node === undefined || visited.has(toHex(this.hash(node!.serialize())))) { | ||
return | ||
} | ||
visited.add(toHex(this.hash(node!.serialize()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From working with trie.ts, I think it'd be awesome if there were a canonical representation of the trie's and node's hash values and if trie.ts would provide this serialization of values directly instead of having "users" implement their own thing. (not necessarily actionable in this PR specifically)
Hey, so I left some high-level feedback that may not be super actionable right now. It's more what came to mind when I was reading the code. Thank you so much for improving the library! This module is interesting to me, especially providing a reliable way to get back keys per node and I won't be able to integrate it e.g. this week, and I'm expecting to raise more issues once I'm implementing it as I learn more about it, but if this finds its way into a release, I'd be more than happy to integrate it and report back how it went! |
Thanks for the feedback. I responded to a couple comments, but would love to continue the conversation. Re:
Can you be more specific? This is certainly for a different PR, but if you can help me understand what you mean, I can look into it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm!
Trie --
walkTrieIterable
QuickStart:
observe console logs to see details of walk.
test includes walk of "Sparse" Trie and comparison to
walk controller
API.Intention:
walk controller
walk controller
to throw an Error, and end the walk.walk controller
to navigate sparse Tries.Feature
This PR introduces a new approach to Trie walking, via
AsyncIterable
This is a new method, independent from the existing
WalkController
class.**It does not presume to replace the
WalkController
in any active context at this time. **details on why in comments
Main differences from
WalkController
:for...await
_nibbles
OnFound
functions, combined with aNodeFilter
parameter enable very customizable walks.Async Generator / Async Iterable / Async Iterator
Uses the iteration protocols described in the MDN docs here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Iteration_protocols
Function
the
_walkTrie
function is exported formtrie/src/util/asyncWalk.ts
, but also internalized as a bound method for the Trie class:trie.walkTrieIterablle
Do not fear the
bind
This is not bound to an interface, just to an exported function. This is so that the new code will not add length to the already very long
trie.ts
file. Hyperlinks is VSCode, etc. will link to the function implementation iteself with one jump. You will not have to search for it. The class method should inherit the typedoc comments as well.binding the function context to
this
allows the external function to access private class methods.For example:
trie.hash()
is a private method, that can only be used inside the class itself._walkTrie
function to a Trie instance,_walkTrie
can accessthis.hash()
(trie as any).hash()
and other hacks_walkTrie
_walkTrie
generates theAsync Iterable
:OnFound
This is a callback that will be called on every node encountered on the walk ("pre-filter").
Useful when
NodeFilter
is defined, as it will be called on all nodes, not only the filtered nodes that are yielded.NodeFilter
Pass a filter function that will return true for desired nodes. The walk will actually
yield
only nodes that pass the filter.NodeHash / CurrentKey / Visited
With these params left
undefined
, the walk will begin at the root, and later use these parameters to call itself recursively during the walk. If defined from the start, they can be used to begin the walk from another place.Use
trie: Trie
Helpers
Included are 2 examples of helper functions that setup and execute the walk iterator for specific common uses. These are simply two of many possible wrapper functions that could be designed for common uses. I have listed a few more ideas below.
walkAllNodes
Walk the Trie from the root, and execute
onFound
on every node in the Trie.This will run through the whole walk with default parameters and no filter.
trie.walkAllNodes(onFound: OnFound) => Promise<void>
walkAllValueNodes
Walk the Trie from the root, and execute
onFound
on all Trie nodes that store a VALUEThis will run through the whole walk from the root with a pre-set filter for VALUE_NODES ( LeafNodes and BranchNodes with non-null values )
trie.walkAllValueNodes(onFound: OnFound) => Promise<void>
additional filter-based helpers could include:
possible
OnFound
-based helpers could include: