
kvserver: redesign load-based splitter to be consistent with new rebalancing signals #90574

Closed
Tracked by #90582
KaiSun314 opened this issue Oct 24, 2022 · 0 comments · Fixed by #93838
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Milestone

Comments

KaiSun314 (Contributor) commented Oct 24, 2022

Problem:

The current load-based splitter is tuned to find a split point that would evenly divide a range's QPS if the range were split. However, we want to experiment with new rebalancing signals, and a signal should be symmetric between the rebalancer and the load-based splitter to avoid the inconsistency where a range does not split yet cannot be rebalanced because it is too large. CPU is one example: the load-based splitter cannot find a split key when the magnitude of a recorded span is not 1. A simple workaround is to re-record the span against the decider once per unit of magnitude, but due to the sampling method this would not be fairly weighted when finding a split key.

Solution should support:

  • Recording a positive integer n against a span [start, end] at a specific time ts, where n is some measure of load recorded for the span, e.g. Record(ts, n, span).
  • Finding a split key such that the load on the resulting split ranges would be as equal as possible according to the loads recorded above, e.g. Key().
  • Heavier weighting toward more recent recorded spans.
  • Splitting the data structure itself, so that once the current range is split at the chosen key, each resulting range immediately has knowledge of the load distribution for potential future splits.
  • Being fast and memory-efficient.
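As a rough illustration of the desired Record/Key semantics, here is a naive, exact Go sketch (all names are hypothetical; a production splitter would sample rather than track every key, and would weight recent spans more heavily):

```go
package main

import (
	"fmt"
	"sort"
)

// naiveSplitter is an exact (non-sampling) reference for the semantics above:
// it records weighted load per key and picks the split key that most evenly
// divides the total recorded weight. Timestamps and recency weighting are
// omitted for brevity.
type naiveSplitter struct {
	weight map[string]int
}

func newNaiveSplitter() *naiveSplitter {
	return &naiveSplitter{weight: map[string]int{}}
}

// Record adds load n against a point key.
func (s *naiveSplitter) Record(key string, n int) {
	s.weight[key] += n
}

// Key returns the split key minimizing |leftWeight - rightWeight|, where
// keys < split go left and keys >= split go right.
func (s *naiveSplitter) Key() string {
	keys := make([]string, 0, len(s.weight))
	total := 0
	for k, w := range s.weight {
		keys = append(keys, k)
		total += w
	}
	sort.Strings(keys)
	best, bestDiff, left := "", total+1, 0
	for i := 1; i < len(keys); i++ {
		left += s.weight[keys[i-1]]
		diff := (total - left) - left
		if diff < 0 {
			diff = -diff
		}
		if diff < bestDiff {
			bestDiff, best = diff, keys[i]
		}
	}
	return best
}

func main() {
	s := newNaiveSplitter()
	s.Record("a", 1)
	s.Record("b", 9)
	s.Record("c", 10)
	fmt.Println(s.Key()) // left {a, b} = 10, right {c} = 10
}
```

Tracking every key makes this O(n) in memory, which is exactly why the last requirement above pushes the real design toward sampling.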

Jira issue: CRDB-20841

Epic: CRDB-20845

@KaiSun314 KaiSun314 added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-distribution Relating to rebalancing and leasing. labels Oct 24, 2022
@KaiSun314 KaiSun314 added this to the 23.1 milestone Oct 24, 2022
@KaiSun314 KaiSun314 self-assigned this Oct 24, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Oct 24, 2022
@irfansharif irfansharif changed the title Redesign Load-Based Splitter Consistent With New Rebalancing Signals kvserver: redesign load-based splitter consistent with new rebalancing signals Oct 28, 2022
@irfansharif irfansharif changed the title kvserver: redesign load-based splitter consistent with new rebalancing signals kvserver: redesign load-based splitter to be consistent with new rebalancing signals Oct 28, 2022
craig bot pushed a commit that referenced this issue Nov 14, 2022
91636: split: Add a testing framework to benchmark load splitter approaches r=kvoli a=KaiSun314

Informs: #90574

Here, we create a testing framework to benchmark load-based splitters, enabling experimentation with and comparison of different designs and settings.

This testing framework inputs:
1. Generator settings to generate the requests for the load splitter to record
- Start key generator type (zipfian or uniform) and iMax
- Span length generator type (zipfian or uniform) and iMax
- Weight generator type (zipfian or uniform) and iMax
2. Range request percent (percent of range requests [startKey, endKey) as opposed to point requests with just a start key)
3. Load-based splitter constructor
4. Random seed

This testing framework performs the following work:
1. Generates the requests for the load splitter to record
2. Calculates the optimal split key and its left / right weights
3. Constructs the load splitter and invokes the load splitter's Record function on all the generated requests
4. Invokes the load splitter's Key function to get the split key and calculates its left / right weights
5. Maintains the times it took to execute the load splitter's Record and Key functions

This testing framework supports repeating the test multiple times with different random numbers, reporting the average and maximum percentage difference of the left / right weights and of the execution times. It can also run tests across different settings and print the results for each setting.
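The flow above might be sketched as follows in Go (all names are illustrative, not the framework's actual API; span-length and weight generation, and the optimal-split comparison, are omitted):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// splitter is a hypothetical stand-in for the load-based splitter interface
// exercised by the framework.
type splitter interface {
	Record(key uint64, weight float64)
	Key() uint64
}

// benchmark generates zipfian point requests from a seeded RNG, times the
// splitter's Record calls, then times its Key call.
func benchmark(s splitter, n int, seed int64) (recordDur, keyDur time.Duration, key uint64) {
	rng := rand.New(rand.NewSource(seed))
	zipf := rand.NewZipf(rng, 1.1, 1, 1<<20) // zipfian start-key generator, iMax = 2^20
	start := time.Now()
	for i := 0; i < n; i++ {
		s.Record(zipf.Uint64(), 1)
	}
	recordDur = time.Since(start)
	start = time.Now()
	key = s.Key()
	keyDur = time.Since(start)
	return recordDur, keyDur, key
}

// midpointSplitter is a trivial stand-in implementation for demonstration:
// it splits halfway to the maximum key seen.
type midpointSplitter struct{ max uint64 }

func (m *midpointSplitter) Record(k uint64, w float64) {
	if k > m.max {
		m.max = k
	}
}
func (m *midpointSplitter) Key() uint64 { return m.max / 2 }

func main() {
	recordDur, keyDur, key := benchmark(&midpointSplitter{}, 10000, 42)
	fmt.Printf("record=%v key=%v splitKey=%d\n", recordDur, keyDur, key)
}
```

Seeding the RNG, as the framework's "random seed" input suggests, makes runs reproducible across splitter designs.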

Example of testing framework output:
[Image: Load-Based Splitter Testing Framework sample output]

Release note: None

Co-authored-by: Kai Sun <[email protected]>
KaiSun314 added a commit to KaiSun314/cockroach that referenced this issue Dec 17, 2022
…alancing signals.

Fixes: cockroachdb#90574

In the current load splitter, we find the split key that best balances
the QPS of the left and right sides. Each request is therefore
unweighted: one request contributes one to the QPS. In particular, the
load splitter does not differentiate between kinds of requests, how
heavy a request is, or what resources it consumes, which can result in
scenarios where QPS is balanced but one side has far more work due to a
few heavy requests. Moreover, the current load splitter treats requests
whose span contains a candidate split key as “contained”. When
optimizing for QPS, contained requests are bad, since splitting at a
point inside a contained request does not lower the QPS of either side.
When optimizing for other signals like CPU, however, splitting inside a
contained request is beneficial, as each side then gets part of the
work of processing that request. This motivates a redesign of the load
splitter: one that records weighted requests and considers contained
requests in the weight balancing for splitting.

In this PR, we redesign the load-based splitter with the following
interface:
1. Record a point key “start” or span “[start, end)” with a weight “w”
at a specific time “ts”, where “w” is some measure of load recorded for
a span e.g. Record(ts, start, w) or Record(ts, [start, end), w)
2. Find a split key such that the load (i.e. total weight) on the
resulting split ranges would be as equal as possible according to the
recorded loads above e.g. Key()

To make the current load-based splitter (Finder) weighted, we make the
following modifications:
1. Instead of using reservoir sampling, we use weighted reservoir
sampling (a simplified version of A-Chao)
2. Remove the contained counter
3. Increment the left and right counters by the weight of the request
rather than just 1
4. Treat a weighted range request ([start, end), w) as two weighted
point requests (start, w/2) and (end, w/2)
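Modifications 1 and 4 might look roughly like this in Go (an illustrative sketch of simplified A-Chao weighted reservoir sampling, not the actual CockroachDB code):

```go
package main

import (
	"fmt"
	"math/rand"
)

// weightedReservoir maintains a reservoir of k keys. Once the reservoir is
// full, a new item of weight w replaces a uniformly random slot with
// probability k*w/W, where W is the total weight seen so far (a simplified
// A-Chao scheme).
type weightedReservoir struct {
	k           int
	totalWeight float64
	samples     []string
	rng         *rand.Rand
}

func newWeightedReservoir(k int, seed int64) *weightedReservoir {
	return &weightedReservoir{k: k, rng: rand.New(rand.NewSource(seed))}
}

// Record records a weighted point request (key, w).
func (r *weightedReservoir) Record(key string, w float64) {
	r.totalWeight += w
	if len(r.samples) < r.k {
		r.samples = append(r.samples, key)
		return
	}
	if p := float64(r.k) * w / r.totalWeight; r.rng.Float64() < p {
		r.samples[r.rng.Intn(r.k)] = key
	}
}

// RecordSpan applies modification 4: a weighted range request
// ([start, end), w) is treated as two point requests of weight w/2 each.
func (r *weightedReservoir) RecordSpan(start, end string, w float64) {
	r.Record(start, w/2)
	r.Record(end, w/2)
}

func main() {
	r := newWeightedReservoir(2, 42)
	r.Record("a", 1)
	r.RecordSpan("b", "c", 4)
	fmt.Println(len(r.samples), r.totalWeight)
}
```

Unlike unweighted reservoir sampling, a key's chance of surviving in the reservoir is proportional to its recorded weight, so heavy spans dominate the sample used to choose the split key.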

For more details, see the design doc:
https://docs.google.com/document/d/1bdSxucz-xFzwnxL3fFXNZsRc9Vsht0oO0XuZrz5Iw84/edit#bookmark=id.xjc41tm3jx3x.

Release note (ops change): The load-based splitter has been redesigned
to be more consistent with CPU-based rebalancing rather than QPS-based
rebalancing to improve range splits.
KaiSun314 added a commit to KaiSun314/cockroach that referenced this issue Dec 22, 2022
KaiSun314 added a commit to KaiSun314/cockroach that referenced this issue Dec 22, 2022
KaiSun314 added a commit to KaiSun314/cockroach that referenced this issue Jan 2, 2023
craig bot pushed a commit that referenced this issue Jan 3, 2023
91593: kvserver: add load vec r=nvanbenschoten,andrewbaptist a=kvoli

Previously, disparate types were used when comparing `StoreCapacity`,
`XThreshold` and `RangeUsageInfo`. This patch introduces a uniform
intermediate type `Load`, which is used to perform arithmetic and
comparison between types representing load.

The purpose of this change is to decouple the load inputs, enabling
existing dimensions to be modified and new ones to be added with less
code modification.

The only load dimension added in this patch is `Queries`, which is then
used in place of `QueriesPerSecond` within the rebalancing logic.

Additionally, `RangeUsageInfo` is uniformly passed around in place of
any specific calls to the underlying tracking data structure
`ReplicaStats`.

Part of #91152

Release note: None
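A uniform load-vector type of the kind described above might be sketched as follows (the dimension set and method names here are illustrative, not the actual CockroachDB API):

```go
package main

import "fmt"

// Dimension identifies one load signal tracked per range or store.
type Dimension int

const (
	Queries Dimension = iota // QPS-style load; more dimensions could follow
	numDimensions
)

// Load is a vector of load dimensions supporting uniform arithmetic and
// comparison, so callers need not know which concrete signal they handle.
type Load [numDimensions]float64

// Add returns the element-wise sum of two load vectors.
func (l Load) Add(o Load) Load {
	for d := range l {
		l[d] += o[d]
	}
	return l
}

// Exceeds reports whether l is greater than o in the given dimension.
func (l Load) Exceeds(o Load, d Dimension) bool {
	return l[d] > o[d]
}

func main() {
	a := Load{10}
	b := Load{5}
	fmt.Println(a.Add(b)[Queries], a.Exceeds(b, Queries)) // 15 true
}
```

With this shape, adding a CPU dimension is a one-line constant addition rather than a change threaded through every comparison site.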

93838: split: Redesign the load-based splitter to be consistent with new rebalancing signals. r=kvoli a=KaiSun314

Fixes: #90574

Release note (ops change): The load-based splitter has been redesigned to be more consistent with CPU-based rebalancing rather than QPS-based rebalancing to improve range splits.

Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Kai Sun <[email protected]>
@craig craig bot closed this as completed in 09adcc0 Jan 3, 2023