feat: refactor the node key as version + local nonce(seq id) #676

Merged 52 commits on Mar 13, 2023

cool-develope
Collaborator

No description provided.
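
Since the PR has no description, here is a minimal sketch of the idea in the title: a node key made of a version plus a sequence id (nonce) local to that version. The `nodeKey` type, the 12-byte big-endian layout, and the helper names are assumptions for illustration, not the code merged here.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// nodeKey is a hypothetical stand-in for the node key described in the title.
type nodeKey struct {
	version int64 // tree version in which the node was created
	nonce   int32 // sequence id, local to that version
}

// encode packs the key big-endian (assumed layout: 8 bytes version, 4 bytes
// nonce) so that byte order matches (version, nonce) order for non-negative values.
func (k nodeKey) encode() []byte {
	buf := make([]byte, 12)
	binary.BigEndian.PutUint64(buf[:8], uint64(k.version))
	binary.BigEndian.PutUint32(buf[8:], uint32(k.nonce))
	return buf
}

// decode reverses encode.
func decode(buf []byte) nodeKey {
	return nodeKey{
		version: int64(binary.BigEndian.Uint64(buf[:8])),
		nonce:   int32(binary.BigEndian.Uint32(buf[8:12])),
	}
}

// less reports whether a sorts before b when compared as raw bytes.
func less(a, b nodeKey) bool {
	return bytes.Compare(a.encode(), b.encode()) < 0
}

func main() {
	a := nodeKey{version: 7, nonce: 2}
	b := nodeKey{version: 8, nonce: 1}
	fmt.Println(decode(a.encode()) == a) // round trip
	fmt.Println(less(a, b))              // version 7 sorts before version 8
}
```

Under this layout, nodes created in the same version sit next to each other in a sorted key-value store, which a hash-based key cannot offer.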

@@ -105,6 +106,13 @@ func (kf *KeyFormat) Key(args ...interface{}) []byte {
return kf.KeyBytes(segments...)
}

func (kf *KeyFormat) NodeKey(nodeKey int64) []byte {
Member

I can't find usages of this function anywhere, am I missing something?

Collaborator Author

good catch

@alexanderbez alexanderbez self-requested a review March 2, 2023 14:59
@catShaark
Contributor

catShaark commented Mar 3, 2023

This may be out of scope for this PR, but I think we should change the key format for leaf nodes to version | l | key, where key is the value's key. That way, we can fetch a key/value pair from iavl.Store and iterate iavl.Store without tree traversal.

@ValarDragon
Contributor

ValarDragon commented Mar 3, 2023

This may be out of scope for this PR, but I think we should change the key format for leaf nodes to version | l | key, where key is the value's key. That way, we can fetch a key/value pair from iavl.Store and iterate iavl.Store without tree traversal.

Shouldn't hardware prefetching mostly eliminate the tree traversal as a time overhead?

@yihuang
Collaborator

yihuang commented Mar 3, 2023

This may be out of scope for this PR, but I think we should change the key format for leaf nodes to version | l | key, where key is the value's key. That way, we can fetch a key/value pair from iavl.Store and iterate iavl.Store without tree traversal.

Sounds like a very clever idea, but it may not do what you want, because you don't know the exact version in which the key was modified unless you also maintain some other indexes.

@catShaark
Contributor

This may be out of scope for this PR, but I think we should change the key format for leaf nodes to version | l | key, where key is the value's key. That way, we can fetch a key/value pair from iavl.Store and iterate iavl.Store without tree traversal.

Sounds like a very clever idea, but it may not do what you want, because you don't know the exact version in which the key was modified unless you also maintain some other indexes.

But we always know the version, since we always specify it when we load iavl.Store.

@yihuang
Collaborator

yihuang commented Mar 3, 2023

This may be out of scope for this PR, but I think we should change the key format for leaf nodes to version | l | key, where key is the value's key. That way, we can fetch a key/value pair from iavl.Store and iterate iavl.Store without tree traversal.

Shouldn't hardware prefetching mostly eliminate the tree traversal as a time overhead?

Tree nodes are scattered across different versions, so I don't think prefetching helps much.

@yihuang
Collaborator

yihuang commented Mar 3, 2023

This may be out of scope for this PR, but I think we should change the key format for leaf nodes to version | l | key, where key is the value's key. That way, we can fetch a key/value pair from iavl.Store and iterate iavl.Store without tree traversal.

Sounds like a very clever idea, but it may not do what you want, because you don't know the exact version in which the key was modified unless you also maintain some other indexes.

But we always know the version, since we always specify it when we load iavl.Store.

That is the version you want to query at, not the version in which the key was last modified. The version contained in the node is the version the node was created in, i.e. when the key was modified or inserted.

@cool-develope
Collaborator Author

This may be out of scope for this PR, but I think we should change the key format for leaf nodes to version | l | key, where key is the value's key. That way, we can fetch a key/value pair from iavl.Store and iterate iavl.Store without tree traversal.

Sounds like a very clever idea, but it may not do what you want, because you don't know the exact version in which the key was modified unless you also maintain some other indexes.

But we always know the version, since we always specify it when we load iavl.Store.

@yihuang is right: nodes have different versions, so how would a key-iteration query be possible when the version prefix differs?
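
The objection in this thread can be illustrated with a small sketch. It assumes a plain map as the store and a made-up `leafKey` helper for the proposed version | l | key format; neither comes from the PR.

```go
package main

import "fmt"

// leafKey builds a hypothetical "version | l | key" storage key.
func leafKey(version int64, key string) string {
	return fmt.Sprintf("%020d|l|%s", version, key)
}

// lookup attempts a direct point read using only the version being queried.
func lookup(store map[string]string, queryVersion int64, key string) (string, bool) {
	v, ok := store[leafKey(queryVersion, key)]
	return v, ok
}

func main() {
	store := map[string]string{}

	// "foo" was last written at version 3, so its leaf carries version 3.
	store[leafKey(3, "foo")] = "bar"

	// Querying the tree as of version 5 misses, even though "foo" is part
	// of version 5: the prefix encodes the creation version, not the
	// queried one.
	_, ok := lookup(store, 5, "foo")
	fmt.Println("query at 5 found:", ok)

	// The read only succeeds when the creation version is already known.
	_, ok = lookup(store, 3, "foo")
	fmt.Println("query at 3 found:", ok)
}
```

A direct read would therefore need an extra index from key to last-modified version, which is the "other indexes" caveat raised above.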

Member

@kocubinski kocubinski left a comment

Logically things look OK to me. Maybe it's a big nit, but I wonder if this diff and implementation would be cleaner and more readable without the NodeKey struct, with the key just represented as []byte (the same way the hash is now).

To that end, I did a little refactoring PoC to see what it looks like. Let me know what you think, since you've spent more time on this problem than I have.

https://github.com/cosmos/iavl/compare/592/refactor-nonce-new...kocubinski/592?expand=1

}
}

// MakeNode constructs an *Node from an encoded byte slice.
//
// The new node doesn't have its hash saved or set. The caller must set it
// afterwards.
-func MakeNode(buf []byte) (*Node, error) {
-// Read node header (height, size, version, key).
+func MakeNode(nodeKey *NodeKey, buf []byte) (*Node, error) {
Member

Do we really want/need an API breaking change here?

Collaborator Author

I think this API is not exposed

Member

It is exported, see: https://go.dev/tour/basics/3

Collaborator Author

Absolutely, it is exported. I believe there is no external use case for this API.

if node.subtreeHeight == 0 {
if bytes.Equal(node.key, key) {
return node, nil
}
return node, errors.New("key does not exist")
}

nodeVersion := version
Member

Why don't we just use the version field on the Node? There is no valid case where a node does not have a db key, right?

Collaborator Author

@cool-develope cool-develope Mar 8, 2023

This is related to the new design: nodes newly added during the uncommitted stage have no node key or version yet, so we need to indicate the current version for those nodes.
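
A rough sketch of the fallback described in this reply, assuming hypothetical `nodeKey` and `node` types and an `effectiveVersion` helper that do not appear in the diff:

```go
package main

import "fmt"

// nodeKey and node are simplified, hypothetical types for illustration only.
type nodeKey struct {
	version int64
	nonce   int32
}

type node struct {
	key *nodeKey // nil while the node is still uncommitted
}

// effectiveVersion returns the version recorded in the node key, or falls
// back to the working version for nodes that have not been committed yet.
func effectiveVersion(n *node, workingVersion int64) int64 {
	if n.key != nil {
		return n.key.version
	}
	return workingVersion
}

func main() {
	committed := &node{key: &nodeKey{version: 3, nonce: 1}}
	pending := &node{} // added in the current, uncommitted working set

	fmt.Println(effectiveVersion(committed, 7)) // version stored in the key
	fmt.Println(effectiveVersion(pending, 7))   // falls back to the working version
}
```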


The version of a node is the first version of the IAVL tree that the node gets added in. Future versions of the IAVL may point to this node if they also contain the node, however the node's version itself does not change.

Size is the number of leaves under a given node. With a full subtree, `node.size = 2^(node.height)`.
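
As a quick check of the size relation, the following sketch builds a full binary tree of a given height and counts its leaves. The `node` type and helper names are hypothetical and much simpler than the real IAVL node.

```go
package main

import "fmt"

// node is a stripped-down, hypothetical tree node used only for counting.
type node struct {
	left, right *node
}

// fullTree builds a perfectly balanced tree; height 0 is a single leaf.
func fullTree(height int) *node {
	if height == 0 {
		return &node{}
	}
	return &node{left: fullTree(height - 1), right: fullTree(height - 1)}
}

// leafCount returns the number of leaves under n, i.e. its size.
func leafCount(n *node) int64 {
	if n.left == nil && n.right == nil {
		return 1
	}
	return leafCount(n.left) + leafCount(n.right)
}

func main() {
	for h := 0; h <= 4; h++ {
		// The two numbers on each line should match: size and 2^height.
		fmt.Println(h, leafCount(fullTree(h)), int64(1)<<h)
	}
}
```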

### Marshaling

-Every node is persisted by encoding the key, version, height, size and hash. If the node is a leaf node, then the value is persisted as well. If the node is not a leaf node, then the leftHash and rightHash are persisted as well.
+Every node is persisted by encoding the key, height, and size. If the node is a leaf node, then the value is persisted as well. If the node is not a leaf node, then the hash, leftNodeKey and rightNodeKey are persisted as well.
Member

I think there should be some explanation of why hash is now written for inner nodes since this is a key departure from the previous design.
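
For readers following the new marshaling sentence, here is a minimal sketch of what such an encoding could look like. The field order, the length-prefix scheme, and all type and function names are assumptions; the merged code may lay the bytes out differently.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// nodeKey and node are hypothetical, simplified types.
type nodeKey struct {
	version int64
	nonce   int32
}

type node struct {
	key         []byte
	value       []byte   // leaf nodes only
	hash        []byte   // persisted for inner nodes only in this sketch
	height      int8     // 0 means leaf
	size        int64    // number of leaves under this node
	left, right *nodeKey // inner nodes only
}

// putVarint appends v as a signed varint.
func putVarint(w *bytes.Buffer, v int64) {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutVarint(buf[:], v)
	w.Write(buf[:n])
}

// putBytes appends b with an unsigned varint length prefix.
func putBytes(w *bytes.Buffer, b []byte) {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], uint64(len(b)))
	w.Write(buf[:n])
	w.Write(b)
}

// encode writes height, size and key for every node, then the value for a
// leaf, or the hash plus both child node keys for an inner node.
func encode(n *node) []byte {
	var w bytes.Buffer
	putVarint(&w, int64(n.height))
	putVarint(&w, n.size)
	putBytes(&w, n.key)
	if n.height == 0 {
		putBytes(&w, n.value)
		return w.Bytes()
	}
	putBytes(&w, n.hash)
	for _, child := range []*nodeKey{n.left, n.right} {
		putVarint(&w, child.version)
		putVarint(&w, int64(child.nonce))
	}
	return w.Bytes()
}

func main() {
	leaf := &node{key: []byte("k"), value: []byte("v"), size: 1}
	inner := &node{
		key: []byte("k"), hash: make([]byte, 32), height: 1, size: 2,
		left:  &nodeKey{version: 1, nonce: 1},
		right: &nodeKey{version: 1, nonce: 2},
	}
	fmt.Println(len(encode(leaf)), len(encode(inner)))
}
```

In this sketch an inner node stores one 32-byte hash and two short varint pairs in place of two 32-byte child hashes.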

}
-cause = encoding.EncodeBytes(w, node.leftHash)
+cause = encoding.EncodeVarint(w, int64(node.leftNodeKey.nonce))
Member

If we're going to cast this to int64 anyway on write why not type it as int64?

Collaborator Author

EncodeVarint only cares about the numeric value, not the integer type of the variable.
The encoded result is the same for 2 as an int64 and 2 as an int32, and an int32 nonce reduces memory usage.
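
The point about the cast can be checked with the standard library. This sketch assumes the encoder behaves like `binary.PutVarint`; `encodeVarint` and `sameEncoding` are illustrative helpers, not functions from this repository.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encodeVarint is a stand-in for an EncodeVarint-style helper: it takes an
// int64 and returns the standard library's signed varint encoding.
func encodeVarint(v int64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutVarint(buf, v)
	return buf[:n]
}

// sameEncoding reports whether an int32 and an int64 encode to identical
// bytes once the int32 is widened to int64.
func sameEncoding(a int32, b int64) bool {
	return bytes.Equal(encodeVarint(int64(a)), encodeVarint(b))
}

func main() {
	var nonce32 int32 = 2
	var nonce64 int64 = 2
	fmt.Println(sameEncoding(nonce32, nonce64))    // true
	fmt.Println(len(encodeVarint(int64(nonce32)))) // 1
}
```

The in-memory field can stay 4 bytes wide while the bytes written to disk are unchanged.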

@cool-develope cool-develope requested a review from kocubinski March 8, 2023 14:41
@cool-develope
Collaborator Author

cool-develope commented Mar 8, 2023

Here are the benchmarking results, including RocksDB; there is not much improvement on RocksDB.

iavl: 0.20.0-alpha1-23-gde1532f
goos: linux
goarch: amd64
pkg: github.com/cosmos/iavl/benchmarks
cpu: DO-Premium-Intel
                                                                     │   old.txt          │   new.txt    │    
                                                                     │    sec/op          │    sec/op    │ 
Larger/memdb-1000000-100-16-40/query-no-in-tree-guarantee-fast-4       3.818µ ±  3%         4.010µ ±  3% 
Larger/memdb-1000000-100-16-40/query-no-in-tree-guarantee-slow-4       20.66µ ±  6%         18.34µ ±  5% 
Larger/memdb-1000000-100-16-40/query-hits-fast-4                       5.650µ ±  7%         5.913µ ±  4% 
Larger/memdb-1000000-100-16-40/query-hits-slow-4                       25.17µ ±  8%         22.22µ ±  2% 
Larger/memdb-1000000-100-16-40/iteration-fast-4                        592.3m ±  3%         553.8m ±  5% 
Larger/memdb-1000000-100-16-40/iteration-slow-4                         12.78 ±  4%          6.023 ±  2%
Larger/memdb-1000000-100-16-40/update-4                                317.9µ ±  5%         212.5µ ± 10%
Larger/memdb-1000000-100-16-40/block-4                                 34.30m ±  6%         24.11m ±  3%
Larger/goleveldb-1000000-100-16-40/query-no-in-tree-guarantee-fast-4   4.123µ ±  7%         4.169µ ±  2%
Larger/goleveldb-1000000-100-16-40/query-no-in-tree-guarantee-slow-4   32.94µ ± 12%         26.92µ ±  6%
Larger/goleveldb-1000000-100-16-40/query-hits-fast-4                   12.36µ ±  3%         12.22µ ±  4%
Larger/goleveldb-1000000-100-16-40/query-hits-slow-4                   41.00µ ±  5%         33.70µ ±  4%
Larger/goleveldb-1000000-100-16-40/iteration-fast-4                    780.4m ±  3%         819.9m ± 10%
Larger/goleveldb-1000000-100-16-40/iteration-slow-4                     30.43 ±  3%          13.92 ±  2%
Larger/goleveldb-1000000-100-16-40/update-4                            373.6µ ±  9%         276.1µ ± 10%
Larger/goleveldb-1000000-100-16-40/block-4                             50.60m ±  3%         35.58m ±  5%
Larger/rocksdb-1000000-100-16-40/query-no-in-tree-guarantee-fast-4     6.655µ ±  6%         6.695µ ±  8%
Larger/rocksdb-1000000-100-16-40/query-no-in-tree-guarantee-slow-4     22.06µ ±  9%         25.07µ ±  7%
Larger/rocksdb-1000000-100-16-40/query-hits-fast-4                     8.097µ ±  4%         8.287µ ±  3%
Larger/rocksdb-1000000-100-16-40/query-hits-slow-4                     29.97µ ±  4%         28.99µ ±  4%
Larger/rocksdb-1000000-100-16-40/iteration-fast-4                       4.352 ±  2%          4.359 ±  3%
Larger/rocksdb-1000000-100-16-40/iteration-slow-4                       16.81 ±  2%          15.28 ±  2%
Larger/rocksdb-1000000-100-16-40/update-4                              436.1µ ± 10%         281.0µ ± 10%
Larger/rocksdb-1000000-100-16-40/block-4                               45.70m ±  6%         32.38m ±  4%
geomean                                                                1.356m               1.135m

@tac0turtle tac0turtle merged commit e46665c into master Mar 13, 2023
@tac0turtle tac0turtle deleted the 592/refactor-nonce-new branch March 13, 2023 11:52
@ValarDragon
Contributor

🎉

larry0x added a commit to larry0x/iavl that referenced this pull request May 11, 2023
Since cosmos#676, nodes are indexed by an integer nonce instead of their hash