kvserver: load imbalance not observable from logs and debug.zip #107694
Labels
A-kv-distribution
Relating to rebalancing and leasing.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
I was investigating #106140 (comment), and it is obvious that there is a load imbalance which the store rebalancer is unable to fix. From the roachtest artifacts, I was unable to determine which replicas were driving the load during the test; the store rebalancer logging stops just shy of being useful for this purpose.
The debug.zip shows some replicas with high CPU per second; however, they are quiesced, and they don't seem to have enough write activity to suggest that they drove the load over the tens of minutes the test was running.
Printing, e.g., the top-k drivers of cpu/sec whenever lease rebalancing can no longer make progress, or something along those lines, would have been helpful here.
Of course historical hot ranges would be even better. As far as I know, there is no way to get that from a debug.zip.
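To make the top-k logging idea concrete, here is a rough sketch of what such a log line could look like. Everything in it is hypothetical: `replicaLoad`, `topKByCPU`, and `formatTopK` are illustrative names rather than existing kvserver types or functions, and the message format is only a suggestion. It assumes CPU is tracked in nanoseconds per second per replica.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// replicaLoad is a hypothetical per-replica load summary; it is not an
// existing CockroachDB type.
type replicaLoad struct {
	RangeID   int64
	CPUPerSec float64 // assumed unit: CPU nanoseconds consumed per second
	Quiescent bool
}

// topKByCPU returns the k replicas with the highest CPU per second,
// highest first. The input slice is left unmodified.
func topKByCPU(loads []replicaLoad, k int) []replicaLoad {
	sorted := append([]replicaLoad(nil), loads...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].CPUPerSec > sorted[j].CPUPerSec
	})
	if k < 0 {
		k = 0
	}
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

// formatTopK renders one log line naming the top-k CPU drivers on a store.
// The intent is to emit it when the rebalancer finds no further lease
// transfer that would reduce the imbalance.
func formatTopK(storeID int32, loads []replicaLoad, k int) string {
	var b strings.Builder
	fmt.Fprintf(&b, "s%d: no further lease transfers possible; top-%d replicas by cpu/sec:", storeID, k)
	for _, l := range topKByCPU(loads, k) {
		fmt.Fprintf(&b, " r%d=%.1fms/s", l.RangeID, l.CPUPerSec/1e6)
		if l.Quiescent {
			b.WriteString("(quiescent)")
		}
	}
	return b.String()
}
```

Including the quiescence flag in the line would also surface the oddity noted above, where replicas reported as hot were quiesced.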
Jira issue: CRDB-30162