Avoid rebalancing from many sources into a single target node/store #82759

Closed
lidorcarmel opened this issue Jun 10, 2022 · 1 comment
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@lidorcarmel
Contributor

lidorcarmel commented Jun 10, 2022

This came up in a conversation with @nvanbenschoten; the idea was proposed by @kvoli.

Problem
When a new node is added while rebalancing activity is high, many stores may choose to rebalance ranges into that newly added node at the same time. For example, when decommissioning a node and adding a new one: #79560.
This thundering herd problem can slow down decommissioning.

Proposed Solution
Stores should be aware of the number of pending rebalance snapshots coming into other stores, so that a store with a long queue will not be considered as a target for rebalancing. We can gossip the queue length of incoming rebalance snapshot requests to all stores.
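
A minimal sketch of this idea, to make the filtering step concrete. Everything here is hypothetical and not taken from the CockroachDB code base: `StoreInfo`, its `PendingSnapshots` field (standing in for the gossiped queue length), the `maxPending` threshold, and the "fewest ranges wins" ranking are all assumptions made for illustration.

```go
package main

import "sort"

// StoreInfo is a hypothetical view of what one store knows about another
// store via gossip. None of these names come from the real code.
type StoreInfo struct {
	ID               int
	RangeCount       int // ranges currently held by the store
	PendingSnapshots int // gossiped length of its incoming rebalance snapshot queue
}

// PickTarget returns the candidate with the fewest ranges among those whose
// gossiped snapshot queue is no longer than maxPending. ok is false when
// every candidate is over the limit, in which case the caller would skip
// or retry the rebalance later.
func PickTarget(candidates []StoreInfo, maxPending int) (target StoreInfo, ok bool) {
	eligible := make([]StoreInfo, 0, len(candidates))
	for _, s := range candidates {
		if s.PendingSnapshots <= maxPending {
			eligible = append(eligible, s)
		}
	}
	if len(eligible) == 0 {
		return StoreInfo{}, false
	}
	// Assumed ranking: prefer the emptiest store that is not already swamped.
	sort.SliceStable(eligible, func(i, j int) bool {
		return eligible[i].RangeCount < eligible[j].RangeCount
	})
	return eligible[0], true
}
```

With this filter, a freshly added empty store that already has a long snapshot queue is passed over in favor of the next-best store until its queue drains. Gossiped values are stale by some amount, so the filter can only reduce the herd, not eliminate it.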

Alternatively, we can pick a "good enough" target instead of the best target, for example by picking a random (or round-robin) store to rebalance to out of the valid stores. The downside is that it might take a long time to fill the newly added node (though we did not test that), and it is undesirable to leave a node underutilized for too long (days instead of hours?).

Jira issue: CRDB-16645

@lidorcarmel lidorcarmel added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-distribution Relating to rebalancing and leasing. labels Jun 10, 2022
@lidorcarmel
Contributor Author

Closing; we are doing other work in #85445 instead.
