Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(store): Add new TreeMap implementation #665

Merged
merged 27 commits into from
Jan 13, 2022
Merged

Conversation

austinabell
Copy link
Contributor

@austinabell austinabell commented Dec 9, 2021

Closes #588

So this is somewhat based on the original impl, but I changed quite a bit, so probably would have been better to just start from scratch. That was incredibly painful to make the old impl more efficient and under the new API of the underlying data structures.

All the APIs are changed to match std::collections::BTreeMap. I'll make a pass-through tonight or tomorrow to make sure everything we can match we do, and then I will open this up. Otherwise, all current functionality will remain the same if anyone wants to start understanding the changes. I tried doing changes in place, to more clearly see differences (586f0f0) but this ended up not being very helpful because a lot of things changed.

@austinabell austinabell marked this pull request as ready for review December 10, 2021 02:20
@austinabell austinabell requested a review from itegulov December 10, 2021 02:21
Copy link
Contributor

@itegulov itegulov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have taken just a quick look through the code so far. Will try to dive deeper later this week. Let's finish these collections 😉

near-sdk/src/store/mod.rs Outdated Show resolved Hide resolved
near-sdk/src/store/tree_map/mod.rs Show resolved Hide resolved
near-sdk/src/store/tree_map/mod.rs Outdated Show resolved Hide resolved
key: K, // key stored in a node
lft: Option<FreeListIndex>, // left link of a node
rgt: Option<FreeListIndex>, // right link of a node
ht: u32, // height of a subtree at a node
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you have inherited the AVL implementation from the old collection and this optimization is probably not worth the trouble, but instead of storing tree height, you can actually just store the balance factor (-1, 0, 1). It is quite easy to see that it would be enough by looking at all current usages of ht. From what I remember upholding the balance factor invariant is a bit tricky though and the gain is minuscule (if any at all), but I thought I would share my thoughts just in case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's tackle this in a following PR. This is under unstable so let's try to refactor and optimize once we have a framework to compare the optimizations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't mind. Just thought I would point this out before I forget :)

/// - `insert`/`remove`: O(log(N))
/// - `min`/`max`: O(log(N))
/// - `above`/`below`: O(log(N))
/// - `range` of K elements: O(Klog(N))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar thing here - if we make iterators actually hold Node instances, we would be able to answer range queries with a better complexity of O(K + log(N)). Every time you need next element, you would just follow a deterministic algorithm of finding the next/previous value in a binary tree. This might be a little more worthwhile than my last point, but I leave it to your judgement.

I am not 100% sure on my assessment of complexity as each next-element-lookup takes O(log(N)), so if we judge the complexity naively it would still be O(Klog(N)). However, if we actually do amortized analysis here, I think in a BST with inner nodes it should be O(K + log(N)). I can uncover some existing formal proof on this if you want or, if not, probably can come up with one myself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely possible and should be done, but requires a decent refactor as with the current impl we don't have a way to traverse up to the parent for these operations. Perhaps like the previous point we should implement this optimization in a future PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Member

@ChaoticTempest ChaoticTempest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a mini pass on this, with some minor comments. I don't understand enough about AVL-Trees so will do another pass later on when I up my understanding

near-sdk/src/store/tree_map/entry.rs Outdated Show resolved Hide resolved
near-sdk/src/store/tree_map/entry.rs Outdated Show resolved Hide resolved
near-sdk/src/store/tree_map/mod.rs Outdated Show resolved Hide resolved
near-sdk/src/store/tree_map/mod.rs Outdated Show resolved Hide resolved
Copy link
Member

@ChaoticTempest ChaoticTempest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good to go. The interface mainly looks the same as the older one, and we should be able to refine it based on what Daniyar mentioned

Comment on lines 167 to 168
let prefix = prefix.into_storage_key();
let mut vec_key = prefix.into_storage_key();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the extra into_storage_key() intended?

Copy link
Contributor

@itegulov itegulov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some documentation nitpicks and a potential thing to look into

K: Borrow<Q>,
Q: BorshSerialize + ToOwned<Owned = K> + Ord,
{
self.values.contains_key(k)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would personally rename values to something else as lines like this one look a little odd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't personally see the issue, but don't have a preference, so I'm happy to hear if you have a name that might be more clear?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just meant that it looks weird that the map has a field named values that also hold keys. Feel free to ignore my comment, I don't really mind either

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, technically it doesn't hold the keys, it just is generic on the key type to do lookups

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point too

near-sdk/src/store/tree_map/mod.rs Outdated Show resolved Hide resolved
near-sdk/src/store/tree_map/mod.rs Outdated Show resolved Hide resolved
near-sdk/src/store/tree_map/mod.rs Outdated Show resolved Hide resolved
} else {
if key.lt(&node.key) {
let idx = match node.lft {
Some(lft) => self.insert_at(expect(self.node(lft)).clone(), lft, key),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you explored whether it is possible to get rid of all these clones by writing another auxiliary method like this one:

fn node_mut(&mut self, id: FreeListIndex) -> Option<&mut Node<K>> {
    self.nodes.get_mut(id)
}

And then just operating with &mut Node<K> in this method (and maybe some others too)?

Copy link
Contributor Author

@austinabell austinabell Jan 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, absolutely did, the issue with the current impl is you can't borrow this mutably and borrow other parts of the vec immutably. Yes we know there isn't overlap, but would need unsafe for this to happen. This is why clones currently happen in the implementation, but is still an improvement over the existing implementation because it was loading and deserializing the value where the clone was.

This can be optimized in the future without breaking changes with the current API, but would require almost a full re-write.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I thought something like that might be the case, thanks for confirming.

@austinabell austinabell merged commit 44d1976 into master Jan 13, 2022
@austinabell austinabell deleted the austin/store/treemap branch January 13, 2022 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

v2 TreeMap
3 participants