qa: Load-based rebalancing of leases and replicas #30007
Comments
@vilterp you've been randomly assigned this issue. The goal here is to take this feature for a spin and find problems.
Any estimate of when this will be QA'ed? Sooner is obviously better if it manages to turn up any problems.
Hey @a-robinson, looking at it today. It's taken me a bit to wrap my head around how this is different than the old stats-based rebalancing approach (seems the docs aren't written yet) but I'm getting a handle on it.
Thanks! Let me know if you want to chat at all about it.
At this point I've just done the basics: I ran the roachtests for this and observed various metrics while they were running, and looked at the unit test to understand how the allocator is making this decision. I think we should chat about it though, since I still don't fully grok how this interacts with "follow the workload" (is that different than the prior "stats-based rebalancing"?). I.e. if a lot of QPS is coming from nodes in a certain locality, we want to move leases there, but we also want to balance leaseholders. These goals seem like they could be competing in some scenarios; a more thorough QA should probably explore that.
Is there any followup work remaining on this issue?
I took it for a spin and things seemed to work as promised. Wasn't able to find the time to construct more complicated scenarios that would cause failure modes like thrashing, conflict between different rebalancing methods, etc. Would be good to do that testing to shake out any issues, but not sure who has the bandwidth.
Yeah, from my perspective this never really got the adversarial testing I was hoping for. Including it in the QA rotation would be ideal, although I understand if other QA issues come first.
@vilterp thanks for taking it so far. I'll keep this issue open and unassign you. Thanks!
Original issues:
#17979 for replicas
#21419 for leases
PRs:
#28340
#28852
Docs issue:
cockroachdb/docs#2051