
kvserver: redesign load-based splitter to be consistent with new rebalancing signals #90574

Closed
Tracked by #90582
KaiSun314 opened this issue Oct 24, 2022 · 0 comments · Fixed by #93838
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Milestone

Comments

KaiSun314 (Contributor) commented Oct 24, 2022

Problem:

The current load-based splitter is tuned to find a split point that would evenly divide a range's QPS if the range were split. However, we want to experiment with new rebalancing signals, and a signal should be symmetric between the rebalancer and the load-based splitter to avoid the inconsistency where a range does not split yet cannot be rebalanced because it is too large. CPU is one example: the load-based splitter cannot find a split key when the magnitude of a recorded span is not 1. A simple workaround is to re-record the span against the decider once per unit of magnitude, but due to the sampling method this would not be fairly weighted when finding a split key.

Solution should support:

  • Recording a positive integer n against a span [start, end] at a specific time ts, where n is some measure of load recorded for the span, e.g. Record(ts, n, span).
  • Finding a split key such that the load on the resulting split ranges would be as equal as possible according to the loads recorded above, e.g. Key().
  • Heavier weighting toward more recent recorded spans.
  • Splitting the data structure itself, so that once the current range is split at the chosen key, each resulting range immediately has knowledge of the load distribution for potential future splits.
  • Being fast and memory-efficient.
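As a rough illustration of the desired Record/Key semantics, here is a naive, exact Go sketch (all names are hypothetical; a production splitter would sample rather than track every key, and would weight recent spans more heavily):

```go
package main

import (
	"fmt"
	"sort"
)

// naiveSplitter is an exact (non-sampling) reference for the semantics above:
// it records weighted load per key and picks the split key that most evenly
// divides the total recorded weight. Timestamps and recency weighting are
// omitted for brevity.
type naiveSplitter struct {
	weight map[string]int
}

func newNaiveSplitter() *naiveSplitter {
	return &naiveSplitter{weight: map[string]int{}}
}

// Record adds load n against a point key.
func (s *naiveSplitter) Record(key string, n int) {
	s.weight[key] += n
}

// Key returns the split key minimizing |leftWeight - rightWeight|, where
// keys < split go left and keys >= split go right.
func (s *naiveSplitter) Key() string {
	keys := make([]string, 0, len(s.weight))
	total := 0
	for k, w := range s.weight {
		keys = append(keys, k)
		total += w
	}
	sort.Strings(keys)
	best, bestDiff, left := "", total+1, 0
	for i := 1; i < len(keys); i++ {
		left += s.weight[keys[i-1]]
		diff := (total - left) - left
		if diff < 0 {
			diff = -diff
		}
		if diff < bestDiff {
			bestDiff, best = diff, keys[i]
		}
	}
	return best
}

func main() {
	s := newNaiveSplitter()
	s.Record("a", 1)
	s.Record("b", 9)
	s.Record("c", 10)
	fmt.Println(s.Key()) // left {a, b} = 10, right {c} = 10
}
```

Tracking every key makes this O(n) in memory, which is exactly why the last requirement above pushes the real design toward sampling.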

Jira issue: CRDB-20841

Epic: CRDB-20845

@KaiSun314 KaiSun314 added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-distribution Relating to rebalancing and leasing. labels Oct 24, 2022
@KaiSun314 KaiSun314 added this to the 23.1 milestone Oct 24, 2022
@KaiSun314 KaiSun314 self-assigned this Oct 24, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Oct 24, 2022
@irfansharif irfansharif changed the title Redesign Load-Based Splitter Consistent With New Rebalancing Signals kvserver: redesign load-based splitter consistent with new rebalancing signals Oct 28, 2022
@irfansharif irfansharif changed the title kvserver: redesign load-based splitter consistent with new rebalancing signals kvserver: redesign load-based splitter to be consistent with new rebalancing signals Oct 28, 2022
craig bot pushed a commit that referenced this issue Nov 14, 2022
91636: split: Add a testing framework to benchmark load splitter approaches r=kvoli a=KaiSun314

Informs: #90574

Here, we create a testing framework to benchmark load-based splitters, enabling experimentation with and comparison of different designs and settings.

This testing framework inputs:
1. Generator settings to generate the requests for the load splitter to record
- Start key generator type (zipfian or uniform) and iMax
- Span length generator type (zipfian or uniform) and iMax
- Weight generator type (zipfian or uniform) and iMax
2. Range request percent (percent of range requests [startKey, endKey) as opposed to point requests with just a start key)
3. Load-based splitter constructor
4. Random seed

This testing framework performs the following work:
1. Generates the requests for the load splitter to record
2. Calculates the optimal split key and its left / right weights
3. Constructs the load splitter and invokes the load splitter's Record function on all the generated requests
4. Invokes the load splitter's Key function to get the split key and calculates its left / right weights
5. Maintains the times it took to execute the load splitter's Record and Key functions

This testing framework supports repeating the test multiple times with different random numbers, reporting the average and maximum percentage difference of the left / right weights and of the execution times. It can also run tests across different settings and print the results for each setting.
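The flow above might be sketched as follows in Go (all names are illustrative, not the framework's actual API; span-length and weight generation, and the optimal-split comparison, are omitted):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// splitter is a hypothetical stand-in for the load-based splitter interface
// exercised by the framework.
type splitter interface {
	Record(key uint64, weight float64)
	Key() uint64
}

// benchmark generates zipfian point requests from a seeded RNG, times the
// splitter's Record calls, then times its Key call.
func benchmark(s splitter, n int, seed int64) (recordDur, keyDur time.Duration, key uint64) {
	rng := rand.New(rand.NewSource(seed))
	zipf := rand.NewZipf(rng, 1.1, 1, 1<<20) // zipfian start-key generator, iMax = 2^20
	start := time.Now()
	for i := 0; i < n; i++ {
		s.Record(zipf.Uint64(), 1)
	}
	recordDur = time.Since(start)
	start = time.Now()
	key = s.Key()
	keyDur = time.Since(start)
	return recordDur, keyDur, key
}

// midpointSplitter is a trivial stand-in implementation for demonstration:
// it splits halfway to the maximum key seen.
type midpointSplitter struct{ max uint64 }

func (m *midpointSplitter) Record(k uint64, w float64) {
	if k > m.max {
		m.max = k
	}
}
func (m *midpointSplitter) Key() uint64 { return m.max / 2 }

func main() {
	recordDur, keyDur, key := benchmark(&midpointSplitter{}, 10000, 42)
	fmt.Printf("record=%v key=%v splitKey=%d\n", recordDur, keyDur, key)
}
```

Seeding the RNG, as the framework's "random seed" input suggests, makes runs reproducible across splitter designs.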

Example of testing framework output:
[Image: Load-Based Splitter Testing Framework sample output]

Release note: None

Co-authored-by: Kai Sun <[email protected]>
KaiSun314 added a commit to KaiSun314/cockroach that referenced this issue Dec 17, 2022
…alancing signals.

Fixes: cockroachdb#90574

In the current load splitter, we find the split key that best balances
the QPS of the left and right sides. Each request is therefore
unweighted: one request contributes one to the QPS. In particular, the
load splitter does not differentiate between kinds of requests, how
heavy a request is, or what resources it consumes, which can result in
scenarios where QPS is balanced but one side has far more work due to a
few heavy requests. Moreover, the current load splitter treats requests
whose span contains a candidate split key as “contained”. When
optimizing for QPS, contained requests are bad, since splitting at a
point inside a contained request does not lower the QPS of either side.
When optimizing for other signals like CPU, however, splitting inside a
contained request is beneficial, as each side then gets part of the
work of processing that request. This motivates a redesign of the load
splitter: one that records weighted requests and considers contained
requests in the weight balancing for splitting.

In this PR, we redesign the load-based splitter with the following
interface:
1. Record a point key “start” or span “[start, end)” with a weight “w”
at a specific time “ts”, where “w” is some measure of load recorded for
a span e.g. Record(ts, start, w) or Record(ts, [start, end), w)
2. Find a split key such that the load (i.e. total weight) on the
resulting split ranges would be as equal as possible according to the
recorded loads above e.g. Key()

To make the current load-based splitter (Finder) weighted, we make the
following modifications:
1. Instead of using reservoir sampling, we use weighted reservoir
sampling (a simplified version of A-Chao)
2. Remove the contained counter
3. Increment the left and right counters by the weight of the request
rather than just 1
4. Treat a weighted range request ([start, end), w) as two weighted
point requests (start, w/2) and (end, w/2)
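Modifications 1 and 4 might look roughly like this in Go (an illustrative sketch of simplified A-Chao weighted reservoir sampling, not the actual CockroachDB code):

```go
package main

import (
	"fmt"
	"math/rand"
)

// weightedReservoir maintains a reservoir of k keys. Once the reservoir is
// full, a new item of weight w replaces a uniformly random slot with
// probability k*w/W, where W is the total weight seen so far (a simplified
// A-Chao scheme).
type weightedReservoir struct {
	k           int
	totalWeight float64
	samples     []string
	rng         *rand.Rand
}

func newWeightedReservoir(k int, seed int64) *weightedReservoir {
	return &weightedReservoir{k: k, rng: rand.New(rand.NewSource(seed))}
}

// Record records a weighted point request (key, w).
func (r *weightedReservoir) Record(key string, w float64) {
	r.totalWeight += w
	if len(r.samples) < r.k {
		r.samples = append(r.samples, key)
		return
	}
	if p := float64(r.k) * w / r.totalWeight; r.rng.Float64() < p {
		r.samples[r.rng.Intn(r.k)] = key
	}
}

// RecordSpan applies modification 4: a weighted range request
// ([start, end), w) is treated as two point requests of weight w/2 each.
func (r *weightedReservoir) RecordSpan(start, end string, w float64) {
	r.Record(start, w/2)
	r.Record(end, w/2)
}

func main() {
	r := newWeightedReservoir(2, 42)
	r.Record("a", 1)
	r.RecordSpan("b", "c", 4)
	fmt.Println(len(r.samples), r.totalWeight)
}
```

Unlike unweighted reservoir sampling, a key's chance of surviving in the reservoir is proportional to its recorded weight, so heavy spans dominate the sample used to choose the split key.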

For more details, see the design doc:
https://docs.google.com/document/d/1bdSxucz-xFzwnxL3fFXNZsRc9Vsht0oO0XuZrz5Iw84/edit#bookmark=id.xjc41tm3jx3x.

Release note (ops change): The load-based splitter has been redesigned
to be more consistent with CPU-based rebalancing rather than QPS-based
rebalancing to improve range splits.
KaiSun314 added a commit to KaiSun314/cockroach that referenced this issue Dec 22, 2022
KaiSun314 added a commit to KaiSun314/cockroach that referenced this issue Dec 22, 2022
KaiSun314 added a commit to KaiSun314/cockroach that referenced this issue Jan 2, 2023
craig bot pushed a commit that referenced this issue Jan 3, 2023
91593: kvserver: add load vec r=nvanbenschoten,andrewbaptist a=kvoli

Previously, disparate types were used when comparing `StoreCapacity`,
`XThreshold` and `RangeUsageInfo`. This patch introduces a uniform
intermediate type `Load`, which is used to perform arithmetic and
comparison between types representing load.

The purpose of this change is to decouple the load inputs, enabling
existing dimensions to be modified and new ones to be added with less
code modification.

The only load dimension added in this patch is `Queries`, which is then
used in place of `QueriesPerSecond` within the rebalancing logic.

Additionally, `RangeUsageInfo` is uniformly passed around in place of
any specific calls to the underlying tracking data structure
`ReplicaStats`.

Part of #91152

Release note: None
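A uniform load-vector type of the kind described above might be sketched as follows (the dimension set and method names here are illustrative, not the actual CockroachDB API):

```go
package main

import "fmt"

// Dimension identifies one load signal tracked per range or store.
type Dimension int

const (
	Queries Dimension = iota // QPS-style load; more dimensions could follow
	numDimensions
)

// Load is a vector of load dimensions supporting uniform arithmetic and
// comparison, so callers need not know which concrete signal they handle.
type Load [numDimensions]float64

// Add returns the element-wise sum of two load vectors.
func (l Load) Add(o Load) Load {
	for d := range l {
		l[d] += o[d]
	}
	return l
}

// Exceeds reports whether l is greater than o in the given dimension.
func (l Load) Exceeds(o Load, d Dimension) bool {
	return l[d] > o[d]
}

func main() {
	a := Load{10}
	b := Load{5}
	fmt.Println(a.Add(b)[Queries], a.Exceeds(b, Queries)) // 15 true
}
```

With this shape, adding a CPU dimension is a one-line constant addition rather than a change threaded through every comparison site.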

93838: split: Redesign the load-based splitter to be consistent with new rebalancing signals. r=kvoli a=KaiSun314

Fixes: #90574

Release note (ops change): The load-based splitter has been redesigned to be more consistent with CPU-based rebalancing rather than QPS-based rebalancing to improve range splits.

Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Kai Sun <[email protected]>
@craig craig bot closed this as completed in 09adcc0 Jan 3, 2023