Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[move][stdlib] Add efficient BigOrderedMap implementation #14753

Merged
merged 20 commits into from
Jan 13, 2025
Merged

Conversation

igor-aptos
Copy link
Contributor

@igor-aptos igor-aptos commented Sep 25, 2024

Description

Having efficient and concurrent large-sized "Map" and "OrderedMap" implementations is useful across variety of needs.

Current SmartTable implementation has various limitations, from it being sequential, to it not having smart ways to deal with collisions. It cannot be improved - as the structs are unmodifiable, and so should be deprecated fully.

In this PR:

  • provide efficient big "OrderedMap" implementation. BPlusTreeMap is chosen as it is best suited for the onchain storage layout - where majority of cost comes from loading and writing to storage items, and there is no partial read/write of them. It also rebalances in a way that reduces amount writes needed
  • writing to keys that are not close enough, to a BPlusTreeMap with multiple layers, is generally parallel. More elements in the BPlusTreeMap, the more operations will be parallel.
  • it is an enum, so can be evolved in the future as needed
  • has an option to pre-pay and pre-allocate storage slots, and to reuse them, achieving predictable gas charges.
    • defined potentially generally useful StorageSlotsAllocator, to manage indexed storage slots for datastructures that need it. Easier to use and more flexible/configurable than directly using Table. (which is also an enum, and so can be evolved / new strategies can be added)
  • keeps root note directly inside the resource, to reduce number of resources needed by 1, and optimizes operations when root is the leaf node.
  • whenever key or value is inserted/modified, we check the size to make sure it is allowed (i.e. that it is smaller than max_node_size / max_degree). this requires bcs::serialized_size() call on every insert/upsert.
    • in case types have constant sizes, check is performed once, on construction, and then it’s skipped on insert/upsert

How Has This Been Tested?

provided extensive unit tests. will probably consolidate and remove some before committing (to not make CI slower/more expensive), but for development those were useful.

For performance, we measured two things.

At large scale

Most relevantly - we measured performance at scale, in comparison to SmartTable. So adding 1M keys into each, with making entries be 4KB on each. we get:

metric SmartTable, 4KB nodes SmartTable, 1KB nodes BigOrderedMap BPlusTreeMap, 4KB nodes BigOrderedMap BPlusTreeMap, 1KB nodes
tps (u64 -> None) 1300 1968 1899 2516
tps (u256 -> None) 2166 3152 2313 2219
gas/txn (u64 -> None) 15 11 9 9
gas/txn (u256 -> None) 10 8 9 10
storage fee/txn (u64 -> None) 1147 1342 652 977
storage fee /txn (u256 -> None) 2814 3926 2177 3701

This shows BigOrderedMap being more storage-efficient, especially when keys/values are small, due to SmartTable storing hash in addition to key,value. It is also more performant even on 1M keys, when we are optimizing for storage costs (i.e. more elements in a single bucket). As we reduce the size of the bucket, SmartTable becomes more competitive, but the storage costs increase significantly.

Note: Test is compared on the single thread to compare apples to apples, as SmartTable has no parallelism. BigOrderedMap is parallel, and running on more threads gives order of magnitude higher throughput.

At small scale

We also measured performance at small scale, and how much overhead is it to use BigOrderedMap, instead of just OrderedMap, when it is unknown if data will be too large to be stored in a single resource.
Here we measure nanoseconds taken for a single pair of insert + remove operation, into a map of varied size.

num elements SimpleMap OrderedMap SortedVectorMap BigOrderedMap BPlusTreeMap, all inlined BigOrderedMap BPlusTreeMap, max_degree=16 SmartTable, 1 bucket SmartTable, preallocated, 1 per bucket
10 61 65 123 123 80 62
100 219 85 146 455 229 62
1000 1508 105 168 567 1458 75
10000 14835 142 210 656 15726 80

Here we can see that inlining of the root node makes the overhead be less than 2 times, and even splitting small maps to achieve higher parallelism - keeps the overhead reasonable (i.e. another 2-3 times). But in all cases it scales extremely well as dataset increases in size.

Key Areas to Review

  • Original implementation is @grao1991 's (and that is the first commit in the stack), on top of it:
  • inline values in the leaf nodes. make max degree of inner nodes separately configurable from max degree of leaf nodes.
  • only modify necessary nodes, to reduce costs and achieve parallelism.
  • made keys generic. removed drop+copy requirement on the values
  • fixed indexing of nodes, and extracted (potentially generally useful SlotsStorage), allowing preallocating/reusing storage slots.

Type of Change

  • New feature

Which Components or Systems Does This Change Impact?

  • Aptos Framework

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

Copy link

trunk-io bot commented Sep 25, 2024

⏱️ 2h 1m total CI duration on this PR
Slowest 15 Jobs Cumulative Duration Recent Runs
execution-performance / single-node-performance 23m 🟩
rust-move-unit-coverage 14m 🟩
rust-targeted-unit-tests 13m 🟥
rust-move-unit-coverage 12m 🟩
rust-move-unit-coverage 10m 🟩
rust-move-unit-coverage 10m 🟩
rust-cargo-deny 7m 🟩🟩🟩🟩
general-lints 7m 🟩🟩🟩🟩
check-dynamic-deps 5m 🟩🟩🟩🟩
rust-move-tests 4m 🟥
rust-move-tests 4m 🟥
rust-move-tests 4m 🟥
rust-move-tests 4m 🟥
semgrep/ci 2m 🟩🟩🟩🟩🟩
file_change_determinator 45s 🟩🟩🟩🟩

settingsfeedbackdocs ⋅ learn more about trunk.io

Copy link

codecov bot commented Sep 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 57.4%. Comparing base (76e0b76) to head (e5ee36d).

Additional details and impacted files
@@                Coverage Diff                @@
##           igor/ordered_map   #14753   +/-   ##
=================================================
  Coverage              57.4%    57.4%           
=================================================
  Files                   859      859           
  Lines                211663   211663           
=================================================
  Hits                 121527   121527           
  Misses                90136    90136           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@igor-aptos igor-aptos force-pushed the igor/vector_utilities branch from 4ae8d66 to 6e1add6 Compare September 25, 2024 20:31
@igor-aptos igor-aptos force-pushed the igor/btree_map branch 2 times, most recently from 3ffc3e4 to 95aaf51 Compare September 25, 2024 23:15
@msmouse
Copy link
Contributor

msmouse commented Sep 26, 2024

The world with Enums is beautiful 🚀

Copy link
Contributor

@lightmark lightmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't fully review the tree code with some comments first.

/// Destroys the tree if it's empty, otherwise aborts.
public fun destroy_empty<K: store, V: store>(self: BTreeMap<K, V>) {
let BTreeMap::V1 { nodes, root_index, min_leaf_index: _, max_leaf_index: _, inner_max_degree: _, leaf_max_degree: _ } = self;
aptos_std::debug::print(&nodes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

enum SlotsStorage<T: store> has store {
Simple {
slots: Table<u64, Link<T>>,
new_slot_index: u64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need the non-reuse version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if one app does all inserts and removals, they might want to get the refunds back, instead of worrying about amortization

/// Returns a newly allocated vector containing the elements in the range [at, len).
/// After the call, the original vector will be left containing the elements [0, at)
/// with its previous capacity unchanged.
public fun split_off<Element>(self: &mut vector<Element>, at: u64): vector<Element> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deserves a native functions, so does the next one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented in the stack

@@ -87,7 +87,7 @@ module aptos_std::table {
drop_unchecked_box<K, V, Box<V>>(self)
}

public(friend) fun destroy<K: copy + drop, V>(self: Table<K, V>) {
public fun destroy_empty<K: copy + drop, V>(self: Table<K, V>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, this function does NOT check the Table is empty...
Could we confirm with @wrwg ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no way to check if a table is empty or not, that's why we didn't expose destroy_empty as a public function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

/// An iterator to iterate all keys in the BTreeMap.
enum Iterator<K> has copy, drop {
End,
Some {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name: None and Next?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important it is End.

}

// Returns true iff the iterator is an end iterator.
public fun is_end_iter<K: store, V: store>(_tree: &BTreeMap<K, V>, iter: &Iterator<K>): bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels more like inline func than actual func?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cannot be done publicly, as enum variants are not public. also some implementation might actually need to do something here.

@@ -87,7 +87,7 @@ module aptos_std::table {
drop_unchecked_box<K, V, Box<V>>(self)
}

public(friend) fun destroy<K: copy + drop, V>(self: Table<K, V>) {
public fun destroy_empty<K: copy + drop, V>(self: Table<K, V>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no way to check if a table is empty or not, that's why we didn't expose destroy_empty as a public function

Inner {
// The max key of its child, or the key of the current node if it is a leaf node.
max_key: K,
// The node index of it's child, or NULL_INDEX if the current node is a leaf node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This enum is somehow confusing to me. Could you draw a ascii diagram to give an example?

NULL_INDEX if the current node is a leaf node

If this is a leaf node, it should be Leaf, right?

public fun new_with_config<K: store, V: store>(inner_max_degree: u16, leaf_max_degree: u16, reuse_slots: bool, num_to_preallocate: u64): BTreeMap<K, V> {
assert!(inner_max_degree == 0 || inner_max_degree >= DEFAULT_INNER_MIN_DEGREE, E_INVALID_PARAMETER);
assert!(leaf_max_degree == 0 || leaf_max_degree >= DEFAULT_LEAF_MIN_DEGREE, E_INVALID_PARAMETER);
let nodes = if (reuse_slots) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason we don't want to reuse slots?

Comment on lines 562 to 567
let new_node_children = children.split_off(target_size - 1);
children.insert(l, child);
new_node_children
} else {
children.insert(l, child);
children.split_off(target_size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need two cases? Just

children.insert(l, child);
children.split_off(target_size)

is good. unless it's for perf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was for perf. since it is only on rebalancing, cleaned it up.

let BTreeMap::V1 { nodes, root_index, min_leaf_index: _, max_leaf_index: _, inner_max_degree: _, leaf_max_degree: _ } = self;
aptos_std::debug::print(&nodes);
nodes.remove(root_index).destroy_empty_node();
nodes.destroy_empty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed, how to gc?

self.nodes.borrow_mut(*prev).next = left_node_index;
};

if (!*is_leaf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments... update the non-leaf children parents pointer to the left node, which was the original node with new index.

let left_node_slot = self.nodes.create_transient_slot();
let left_node_index = left_node_slot.get_index();
right_node.next = *next;
*next = node_index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please comment
// node_index == right_node_index;

};

// # of children in the current node exceeds the threshold, need to split into two nodes.
let (right_node_slot, node) = self.nodes.transiently_remove(node_index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not reuse this node as the left node?

// The brother node has enough elements, borrow an element from the brother node.
brother_size = brother_size - 1;
if (brother_index == next) {
let borrowed_element = brother_children.remove(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a native remove too.

};
let borrowed_max_key = borrowed_element.max_key;
children.push_back(borrowed_element);
current_size = current_size + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary? just make current_size - 1 at the following line?

@igor-aptos igor-aptos force-pushed the igor/vector_utilities branch from 9e1d6af to 0d2ea8f Compare October 3, 2024 20:19
@igor-aptos igor-aptos changed the base branch from igor/vector_utilities to main October 3, 2024 20:20
@igor-aptos igor-aptos requested a review from vgao1996 as a code owner October 4, 2024 20:29
@igor-aptos igor-aptos changed the title [move][stdlib] Add efficient BTreeMap implementation [move][stdlib] Add efficient BigOrderedMap implementation Oct 4, 2024
@igor-aptos igor-aptos changed the base branch from main to igor/ordered_map October 4, 2024 20:31

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

✅ Forge suite compat success on 6593fb81261f25490ffddc2252a861c994234c2a ==> 294b435745dd07f206e1a93bae92c5022e9341d0

Compatibility test results for 6593fb81261f25490ffddc2252a861c994234c2a ==> 294b435745dd07f206e1a93bae92c5022e9341d0 (PR)
1. Check liveness of validators at old version: 6593fb81261f25490ffddc2252a861c994234c2a
compatibility::simple-validator-upgrade::liveness-check : committed: 17691.18 txn/s, latency: 1957.84 ms, (p50: 2100 ms, p70: 2100, p90: 2200 ms, p99: 3300 ms), latency samples: 586040
2. Upgrading first Validator to new version: 294b435745dd07f206e1a93bae92c5022e9341d0
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 7735.59 txn/s, latency: 3871.78 ms, (p50: 4500 ms, p70: 4700, p90: 4800 ms, p99: 4900 ms), latency samples: 143700
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 7579.93 txn/s, latency: 4364.76 ms, (p50: 4700 ms, p70: 4800, p90: 4900 ms, p99: 4900 ms), latency samples: 258280
3. Upgrading rest of first batch to new version: 294b435745dd07f206e1a93bae92c5022e9341d0
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 7975.00 txn/s, latency: 3713.72 ms, (p50: 4200 ms, p70: 4400, p90: 4900 ms, p99: 5100 ms), latency samples: 143540
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 8032.87 txn/s, latency: 4188.03 ms, (p50: 4500 ms, p70: 4700, p90: 4800 ms, p99: 5000 ms), latency samples: 270300
4. upgrading second batch to new version: 294b435745dd07f206e1a93bae92c5022e9341d0
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 12341.09 txn/s, latency: 2343.09 ms, (p50: 2600 ms, p70: 2800, p90: 3000 ms, p99: 3100 ms), latency samples: 215640
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 12400.42 txn/s, latency: 2637.20 ms, (p50: 2800 ms, p70: 2900, p90: 3100 ms, p99: 3200 ms), latency samples: 401740
5. check swarm health
Compatibility test for 6593fb81261f25490ffddc2252a861c994234c2a ==> 294b435745dd07f206e1a93bae92c5022e9341d0 passed
Test Ok

Copy link
Contributor

✅ Forge suite realistic_env_max_load success on 294b435745dd07f206e1a93bae92c5022e9341d0

two traffics test: inner traffic : committed: 15145.64 txn/s, latency: 2625.14 ms, (p50: 2700 ms, p70: 2700, p90: 2700 ms, p99: 3000 ms), latency samples: 5758700
two traffics test : committed: 100.00 txn/s, latency: 1323.30 ms, (p50: 1300 ms, p70: 1300, p90: 1500 ms, p99: 1500 ms), latency samples: 1680
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 1.552, avg: 1.493", "ConsensusProposalToOrdered: max: 0.285, avg: 0.284", "ConsensusOrderedToCommit: max: 0.302, avg: 0.293", "ConsensusProposalToCommit: max: 0.587, avg: 0.576"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.51s no progress at version 1004 (avg 0.19s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.51s no progress at version 2932774 (avg 0.51s) [limit 16].
Test Ok

@igor-aptos igor-aptos merged commit 623156d into main Jan 13, 2025
43 of 46 checks passed
@igor-aptos igor-aptos deleted the igor/btree_map branch January 13, 2025 15:22
rahxephon89 pushed a commit that referenced this pull request Jan 13, 2025
## Description
Having efficient and concurrent large-sized "Map" and "OrderedMap" implementations is useful across variety of needs.

Current SmartTable implementation has various limitations, from it being sequential, to it not having smart ways to deal with collisions. It cannot be improved - as the structs are unmodifiable, and so should be deprecated fully.

In this PR:
* provide efficient big "OrderedMap" implementation. BPlusTreeMap is chosen as it is best suited for the onchain storage layout - where majority of cost comes from loading and writing to storage items, and there is no partial read/write of them. It also rebalances in a way that reduces amount writes needed 
* writing to keys that are not close enough, to a BPlusTreeMap with multiple layers, is generally parallel. More elements in the BPlusTreeMap, the more operations will be parallel. 
* it is an enum, so can be evolved in the future as needed
* has an option to pre-pay and pre-allocate storage slots, and to reuse them, achieving predictable gas charges.
  * defined potentially generally useful StorageSlotsAllocator, to manage indexed storage slots for datastructures that need it. Easier to use and more flexible/configurable than directly using Table. (which is also an enum, and so can be evolved / new strategies can be added)
* keeps root note directly inside the resource, to reduce number of resources needed by 1, and optimizes operations when root is the leaf node.
* whenever key or value is inserted/modified, we check the size to make sure it is allowed (i.e. that it is smaller than max_node_size / max_degree). this requires bcs::serialized_size() call on every insert/upsert.
  * in case types have constant sizes, check is performed once, on construction, and then it’s skipped on insert/upsert
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants