Improve proof size #23
I spoke with @drskalman about this: I incorrectly estimated the savings here because the trie frequently gets used sparsely, so many descendants get omitted entirely. We should consider whether more modern sparse Merkle tree designs would provide better performance. In particular, there are designs with clever compromises on collision resistance, like h(x,0) = x || "L" and h(0,y) = y || "R", but I am not entirely sure about their security properties. We should also be careful about the actual sparsity assumptions, since maybe some hybrid design exists that auto-magically gives us an almost optimal approach under almost all conditions.
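To make the trick above concrete, here is a minimal sketch of the "free" combine rule for a sparse binary Merkle tree: when one child of a node is empty, no hash round is run at all; the surviving child's hash is just tagged with "L" or "R". The function names (`h`, `combine`) are hypothetical, and std's non-cryptographic `DefaultHasher` stands in for a real cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Placeholder hash; a real implementation would use a cryptographic hash
// such as blake2 or a snark-friendly hash.
fn h(data: &[u8]) -> Vec<u8> {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish().to_le_bytes().to_vec()
}

// Combine two optional child commitments. When one side is empty we skip
// the hash round entirely and only tag the surviving hash, mirroring
// h(x, 0) = x || "L" and h(0, y) = y || "R".
fn combine(left: Option<Vec<u8>>, right: Option<Vec<u8>>) -> Option<Vec<u8>> {
    match (left, right) {
        (None, None) => None, // fully empty subtree contributes nothing
        (Some(mut x), None) => {
            x.push(b'L'); // no hash round needed
            Some(x)
        }
        (None, Some(mut y)) => {
            y.push(b'R'); // no hash round needed
            Some(y)
        }
        (Some(x), Some(y)) => {
            // Both children present: pay for one real hash round.
            let mut buf = x;
            buf.extend_from_slice(&y);
            Some(h(&buf))
        }
    }
}
```

This is exactly where the security question in the comment arises: the tagged outputs grow by one byte per empty level instead of staying fixed-width, so collision resistance depends on the tagging being unambiguous.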
It is probably out of scope for this library, because the library is about the base-16 trie. Totally agree about SMTs in general though; we should just calculate all possible scenarios and what theoretical benefits we might get, then I can implement it in a separate library and experiment in Substrate.
I was thinking that for a sparse trie (a branch where the number of children is less than or equal to 4) we could use the existing proof, and only apply the binary form when the number of children makes it more efficient; without giving it much thought, it does not look insecure. I guess 4 as a threshold is probably not the best choice (depending on how we value computational cost), but it is just a fixed threshold to define.
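The hybrid rule described above could be sketched as a per-branch decision: a radix-16 proof of one child carries up to 15 sibling hashes, while a binary sub-tree over the 16 slots carries at most 4, so the cut-over depends on how many children are actually occupied. The names `ProofEncoding` and `choose_encoding`, and the threshold constant, are illustrative, with 4 taken as the fixed threshold the comment suggests.

```rust
/// Fixed cut-over suggested in the discussion: branches with at most
/// this many occupied children keep the existing radix-16 encoding.
const THRESHOLD: usize = 4;

#[derive(Debug, PartialEq)]
enum ProofEncoding {
    /// Existing proof: list the occupied siblings of the radix-16 node.
    Radix16,
    /// Hash the 16 child slots as a binary tree; proof carries log2(16) = 4
    /// sibling hashes regardless of occupancy.
    Binary,
}

fn choose_encoding(occupied_children: usize) -> ProofEncoding {
    if occupied_children <= THRESHOLD {
        ProofEncoding::Radix16
    } else {
        ProofEncoding::Binary
    }
}
```

As the comment notes, the right threshold also depends on hashing cost, not just proof bytes, so a real implementation would likely tune this constant (or vary it by depth) rather than hard-code 4.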
Oh, h(x,0) = x || "L" may be about replacing part of the hash to encode L, R, or 0, and would not involve an additional round of hashing; that would make it interesting indeed.
If your hash inputs enough before running another round then yes, the h(x,0) = x || "L" and h(0,y) = y || "R" approach can be "free". If your hash prefers …
We occasionally pay for hashing 128 bytes when 64 might have sufficed, but that is rare, and more often we hash only 64. In fact, we expect the "L" and "R" chains to be much shorter, so we should optimize this somewhat differently. We could even alter the behavior at different depths if that improved performance, presumably meaning that close to the root it assumes a full binary tree. We have not yet analyzed whether this is secure, but if your hash has strong enough collision resistance then sure, it looks fine. I think Vitalik has some blog post about wanting separation between the storage and the hashing, but regardless we still need storage of intermediate nodes for performance reasons, meaning the radix-16 storage. In fact, if you were using a snark-friendly hash for a chain like Zcash, then you might want more frequent storage, because your snark-friendly hash is extremely slow.
Also @cheme, you make a good point about changing the threshold dynamically. We might have some situations where the trie starts dense, then goes sparse, and then goes denser again after entering some particular area?
It can happen if/when concatenating two hashed keys without hashing afterward (the latest Substrate storage change allows this in double-key storage, iirc); this introduces some imbalance in the trie, and there might be a plan to switch to a child trie in such cases.
At some point, I think the trie produced inclusion proofs which spelled out all 15 other nodes of each radix-16 tree layer, which wastes roughly half the space in PoV blocks. We instead want Merkle roots to be computed as a binary tree, so that proofs can be sent in the binary-tree form, while the tree can still be stored on disk as a radix-16 tree.
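The separation described above, storing radix-16 on disk but hashing the 16 child slots as a 4-level binary tree, could be sketched as follows. The function names (`h2`, `binary_root`) are hypothetical, std's non-cryptographic `DefaultHasher` stands in for the real hash, and this sketch assumes all 16 slots carry a commitment (empty slots would need a placeholder or the sparse short-circuit discussed earlier).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Placeholder two-input hash; a real implementation would use a
// cryptographic hash over the concatenation of both children.
fn h2(a: &[u8], b: &[u8]) -> Vec<u8> {
    let mut hasher = DefaultHasher::new();
    hasher.write(a);
    hasher.write(b);
    hasher.finish().to_le_bytes().to_vec()
}

/// Hash the 16 child commitments of a radix-16 branch as a 4-level binary
/// tree. An inclusion proof for one child then needs only 4 sibling
/// hashes instead of the 15 siblings a flat radix-16 node would expose.
fn binary_root(children: &[Vec<u8>]) -> Vec<u8> {
    assert_eq!(children.len(), 16, "a radix-16 branch has 16 child slots");
    let mut level = children.to_vec();
    while level.len() > 1 {
        // Pair up adjacent nodes: 16 -> 8 -> 4 -> 2 -> 1.
        level = level.chunks(2).map(|p| h2(&p[0], &p[1])).collect();
    }
    level.pop().unwrap()
}
```

The on-disk radix-16 node is untouched by this: only the root-computation and proof format change, which is exactly the storage/hashing separation mentioned in the comment.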