Support customizing the rebalance threshold #422
The intention might be to set it as
I think we can add a field
The rebalance threshold should be
Line 375 in bd7d6e9
Can we make the rebalance threshold configurable, defaulting to 25% to keep it backward compatible? It can be set on the Bucket similar to
I think this is a simpler change to make.
More flexibility doesn't necessarily mean better. I think we had better not expose the
Similarly, I think it makes sense to set the
I am trying to maximize page utilization for etcd's use case. Compaction deletes revisions from the "key" bucket: it deletes revisions up to a given revision number, but it skips a revision if it is the only revision of a key. Because revision numbers are incremental, there is no point in having free space in any page other than the last page at the end of the B+ tree. If a page becomes empty, it is removed from the tree and added to the freelist. However, half-filled pages can stay forever and waste disk and memory space. If my understanding is correct and we can solve this issue, we can eliminate the need to defrag completely.
Write keys to etcd, filling roughly 3.5 pages. With the KV size below, one page holds up to ~60 items.
After writing the data, the pages are:
Pages of the “key” bucket:
Pages 2, 4, 3, and 5 have 62, 63, 61, and 24 items, respectively. The first 30 keys are unique; the rest (180 keys) are revisions of the same key. Get the latest revision:
Compact to the middle of the 3rd page:
After compaction, the “key” bucket contains these pages:
Pages 7, 9, and 5 have 30, 38, and 24 items, respectively.
The 30 unique keys in the 1st page will never be deleted unless they get a new revision, because etcd never deletes the latest revision of a key. Since revision numbers are incremental, the items in the first page will occupy that page forever unless its utilization drops below 25%, which is not configurable.
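A quick check of the numbers from the experiment against a fixed 25% threshold, as a sketch. It assumes the ~60 items per page reported above and uses item counts as a stand-in for byte sizes; `itemsPerPage`, `rebalanceThreshold`, and `wouldRebalance` are hypothetical names, and the real bbolt check also looks at the minimum key count. Note as well that bbolt only attempts to rebalance nodes that had a key deleted in the current transaction, so a page of unique keys that sees no further deletes is not even considered.

```go
package main

import "fmt"

// itemsPerPage is an assumption taken from the experiment (~60 items per
// page for that KV size); bbolt itself compares node sizes in bytes.
const itemsPerPage = 60

// rebalanceThreshold mirrors the fixed 25% threshold discussed in the thread.
const rebalanceThreshold = 0.25

// wouldRebalance reports whether a page holding the given number of items
// is at or below the threshold (item counts stand in for byte sizes).
func wouldRebalance(items int, threshold float64) bool {
	return float64(items) <= threshold*itemsPerPage
}

func main() {
	// Item counts of pages 7, 9 and 5 after compaction.
	for _, n := range []int{30, 38, 24} {
		fmt.Printf("%d items: %.0f%% full, rebalance=%v\n",
			n, 100*float64(n)/itemsPerPage, wouldRebalance(n, rebalanceThreshold))
	}
}
```

All three pages sit between roughly 40% and 63% utilization, so none of them crosses the 25% line, even though together they would fit in about two pages.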
The compacted/removed keys (revisions in etcd) may be scattered throughout the db pages, but the key point is that etcd always adds new k/v pairs at the end of the last page. So for etcd, we should set a high value for both
The downside is that there will be more rebalance operations and, accordingly, more page writes and syncs in each etcd compaction. So we need to evaluate the performance impact.
The other concern is that there are other buckets (e.g. lease, auth*) which are not append-only. Since K8s doesn't use auth, that isn't a problem. I am not sure about the exact usage of leases in K8s, e.g. how often they are created and deleted. Please get this clarified as well.
Note: It only makes sense to set a high value for
Note that it still makes sense to set the
Now that #518 is merged, users can set the rebalance percent to values between 0.0 and 0.5. However, it's still not possible to set it to values larger than 0.5. Could we reconsider introducing a new field that is set to 0.25 by default?
type Bucket struct {
	FillPercent float64
	// RebalancePercent is the proposed new field; it would default to 0.25.
	RebalancePercent float64
}
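A sketch of how the proposed field might be consulted, with the existing 25% behavior kept as the default for backward compatibility. `bucketOptions`, `defaultRebalancePercent`, `shouldSkipRebalance`, the "zero means default" convention, and the 4096-byte page size are all hypothetical; this is not bbolt's actual code, only the shape of the check discussed in the thread (skip a node that is above the threshold and still has more than the minimum number of keys).

```go
package main

import "fmt"

// bucketOptions is a hypothetical stand-in for the proposed Bucket fields.
type bucketOptions struct {
	FillPercent      float64
	RebalancePercent float64 // proposed; zero means "use the default"
}

// defaultRebalancePercent keeps today's 25% behavior for callers that never
// set the new field.
const defaultRebalancePercent = 0.25

// shouldSkipRebalance returns true when a node is large enough (and has
// enough keys) to be left alone during rebalance.
func shouldSkipRebalance(b bucketOptions, nodeSize, pageSize, numKeys, minKeys int) bool {
	pct := b.RebalancePercent
	if pct <= 0 {
		pct = defaultRebalancePercent
	}
	threshold := int(float64(pageSize) * pct)
	return nodeSize > threshold && numKeys > minKeys
}

func main() {
	const pageSize = 4096 // assumed page size in bytes
	node := 2048          // a half-full leaf node (bytes)
	fmt.Println(shouldSkipRebalance(bucketOptions{}, node, pageSize, 30, 1))
	fmt.Println(shouldSkipRebalance(bucketOptions{RebalancePercent: 0.75}, node, pageSize, 30, 1))
}
```

A half-full node is skipped under the 25% default but becomes a merge candidate once the threshold is raised above 50%, which is exactly the range #518 does not allow.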
If you read my comment above and the new title of this issue "
Cool. I'll try to come up with a way to evaluate the performance impact.
@cenkalti are you still working on this? Please feel free to let me know if you don't have the bandwidth; I'm happy to take over.
No, I am not working on this anymore.
It seems that it isn't that simple. Setting a bigger
Lines 230 to 246 in 7eb39a6
The default values for
What do you guys want to optimize for? Tighter packing of pages? Fewer memory allocations?
The goal was to
Ah I see, so we have different thresholds, we wanted to make them configurable, and now you've found a new parameter that needs tweaking alongside them :)
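One way to see why the fill and rebalance thresholds interact (a hedged reading of the discussion, not something the thread spells out): a page produced by a split holds roughly `FillPercent` of a page, so if the rebalance threshold is at or above that level, a single deletion can make the page eligible for merging, and the merged node may be split again on a later commit. The model below uses item counts instead of bytes, and `mergesAfterOneDelete` and the 60-item capacity are hypothetical.

```go
package main

import "fmt"

// mergesAfterOneDelete asks whether a page left at about fillPercent of
// capacity by a split would fall to or below rebalancePercent of capacity
// after losing a single item (simplified item-count model).
func mergesAfterOneDelete(capacity int, fillPercent, rebalancePercent float64) bool {
	afterSplit := int(float64(capacity) * fillPercent)
	afterDelete := afterSplit - 1
	return float64(afterDelete) <= rebalancePercent*float64(capacity)
}

func main() {
	fmt.Println(mergesAfterOneDelete(60, 0.5, 0.25)) // false: rebalance well below fill
	fmt.Println(mergesAfterOneDelete(60, 0.5, 0.6))  // true: rebalance above fill
}
```

Under this model, raising the rebalance threshold only pays off if `FillPercent` is raised with it; otherwise freshly split pages are merge candidates almost immediately.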
bbolt/node.go
Lines 420 to 424 in 838586a
How was this number chosen?
What happens if we make it configurable and increase or decrease it?