Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: support different load dimensions in allocation code and cleanup #91152

Closed
Tracked by #90582
kvoli opened this issue Nov 2, 2022 · 1 comment
Closed
Tracked by #90582
Assignees
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Milestone

Comments

@kvoli
Copy link
Collaborator

kvoli commented Nov 2, 2022

The allocation code currently hardcodes many references to QPS which makes it difficult to swap out the dimension, without duplicating significant amounts of code.

This is unfortunate as the control mechanism for converging a single load dimension across stores is effective and not limited to just QPS.

The solution is to refactor several sections of the allocation code to support registering and using a different load dimension.

The solution should also involve clearing up interactions between the components in the following areas:

  • pkg/kv/kvserver/allocator/*
  • pkg/kv/kvserver/replicate_queue.go
  • pkg/kv/kvserver/store_rebalancer.go
  • pkg/kv/kvserver/replica_rankings.go
  • pkg/kv/kvserver/replicastats/*
  • pkg/kv/kvserver/replica_load.go
  • pkg/kv/kvserver/replica_metrics.go
  • pkg/kv/kvserver/store.go.Capacity()
  • pkg/kv/kvserver/store.go.updateReplicationGuages()
  • pkg/kv/kvserver/store.go.recordNewPerSecondStats()
  • pkg/kv/kvserver/store.go.GossipStore()

Jira issue: CRDB-21126

Epic CRDB-20845

@kvoli kvoli added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-distribution Relating to rebalancing and leasing. labels Nov 2, 2022
@kvoli kvoli self-assigned this Nov 2, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Nov 2, 2022
@kvoli kvoli added this to the 23.1 milestone Nov 2, 2022
@kvoli kvoli changed the title kvserver: support different load dimensions in allocation code kvserver: support different load dimensions in allocation code and cleanup Nov 2, 2022
kvoli added a commit to kvoli/cockroach that referenced this issue Nov 9, 2022
Previously, disparate types were used when comparing `StoreCapacity`,
`XThreshold` and `RangeUsageInfo`. This patch introduces a uniform
intermediate type `Load`, which is used to perform arithmetic and
comparison between types representing load.

The purpose of this change is to decouple changes to the underlying
inputs in these components to enable modifying existing load dimensions
and adding new ones with less code modification.

Additionally, `RangeUsageInfo` is uniformly passed around in place of
any specific calls to the underlying accounting `ReplicaStats`, to a
tight dependency.

Part of cockroachdb#91152

Release note: None
craig bot pushed a commit that referenced this issue Nov 9, 2022
90604: ui: correct tooltip message for circuit breaker events chart r=koorosh a=koorosh

This patch corrects the message on circuit breaker events chart to highlight that the chart displays the total number of events per aggregated interval of time instead of "per second".

Release note: None

Epic: none

91158: kvserver: misc cleanup of allocation code r=aayushshah15 a=kvoli

This patch removes the deprecated method `rangeRebalanceThreshold` which
was used for mixed version compatibility between 21.2 and 22.1.

This patch adds a metric
`kv.allocator.load_based_lease_transfers.follow_the_workload` which
tracks the number of times the allocator returned a range lease transfer
target based based on access locality.

Previously there was no insight into how often this condition was hit.

Part of #91152

Co-authored-by: Andrii Vorobiov <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 21, 2022
Previously, disparate types were used when comparing `StoreCapacity`,
`XThreshold` and `RangeUsageInfo`. This patch introduces a uniform
intermediate type `Load`, which is used to perform arithmetic and
comparison between types representing load.

The purpose of this change is to decouple changes to inputs, enable
modifying existing dimensions and adding new ones with less code
modification.

The only load dimension added in this patch is `Queries`, which is then
used in place of `QueriesPerSecond` within the rebalancing logic.

Additionally, `RangeUsageInfo` is uniformly passed around in place of
any specific calls to the underlying tracking datastructure
`ReplicaStats`.

Part of cockroachdb#91152

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 27, 2022
Previously, disparate types were used when comparing `StoreCapacity`,
`XThreshold` and `RangeUsageInfo`. This patch introduces a uniform
intermediate type `Load`, which is used to perform arithmetic and
comparison between types representing load.

The purpose of this change is to decouple changes to inputs, enable
modifying existing dimensions and adding new ones with less code
modification.

The only load dimension added in this patch is `Queries`, which is then
used in place of `QueriesPerSecond` within the rebalancing logic.

Additionally, `RangeUsageInfo` is uniformly passed around in place of
any specific calls to the underlying tracking datastructure
`ReplicaStats`.

Part of cockroachdb#91152

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 29, 2022
Previously, disparate types were used when comparing `StoreCapacity`,
`XThreshold` and `RangeUsageInfo`. This patch introduces a uniform
intermediate type `Load`, which is used to perform arithmetic and
comparison between types representing load.

The purpose of this change is to decouple changes to inputs, enable
modifying existing dimensions and adding new ones with less code
modification.

The only load dimension added in this patch is `Queries`, which is then
used in place of `QueriesPerSecond` within the rebalancing logic.

Additionally, `RangeUsageInfo` is uniformly passed around in place of
any specific calls to the underlying tracking datastructure
`ReplicaStats`.

Part of cockroachdb#91152

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 29, 2022
Previously, disparate types were used when comparing `StoreCapacity`,
`XThreshold` and `RangeUsageInfo`. This patch introduces a uniform
intermediate type `Load`, which is used to perform arithmetic and
comparison between types representing load.

The purpose of this change is to decouple changes to inputs, enable
modifying existing dimensions and adding new ones with less code
modification.

The only load dimension added in this patch is `Queries`, which is then
used in place of `QueriesPerSecond` within the rebalancing logic.

Additionally, `RangeUsageInfo` is uniformly passed around in place of
any specific calls to the underlying tracking datastructure
`ReplicaStats`.

Part of cockroachdb#91152

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Jan 3, 2023
Previously, disparate types were used when comparing `StoreCapacity`,
`XThreshold` and `RangeUsageInfo`. This patch introduces a uniform
intermediate type `Load`, which is used to perform arithmetic and
comparison between types representing load.

The purpose of this change is to decouple changes to inputs, enable
modifying existing dimensions and adding new ones with less code
modification.

The only load dimension added in this patch is `Queries`, which is then
used in place of `QueriesPerSecond` within the rebalancing logic.

Additionally, `RangeUsageInfo` is uniformly passed around in place of
any specific calls to the underlying tracking datastructure
`ReplicaStats`.

Part of cockroachdb#91152

Release note: None
craig bot pushed a commit that referenced this issue Jan 3, 2023
91593: kvserver: add load vec r=nvanbenschoten,andrewbaptist a=kvoli

Previously, disparate types were used when comparing `StoreCapacity`,
`XThreshold` and `RangeUsageInfo`. This patch introduces a uniform
intermediate type `Load`, which is used to perform arithmetic and
comparison between types representing load.

The purpose of this change is to decouple changes to inputs, enable
modifying existing dimensions and adding new ones with less code
modification.

The only load dimension added in this patch is `Queries`, which is then
used in place of `QueriesPerSecond` within the rebalancing logic.

Additionally, `RangeUsageInfo` is uniformly passed around in place of
any specific calls to the underlying tracking datastructure
`ReplicaStats`.

Part of #91152

Release note: None

93838: split: Redesign the load-based splitter to be consistent with new rebalancing signals. r=kvoli a=KaiSun314

Fixes: #90574

In the current load splitter, we find the split key that best balances the QPS of the left and right sides. As a result, each request is unweighted, since one request contributes one to the QPS. In particular, the load splitter does not differentiate between what kinds of requests they are, how heavy the request is, and what resources these requests consume, which can result in scenarios where QPS is balanced but one side has a lot more work due to a few heavy requests. Moreover, the current load splitter treats requests that contain a split key as “contained”. Optimizing for QPS, contained requests are bad since splitting at a point in a contained request will not help lower the QPS of either side. However, optimizing for other signals like CPU, splitting at a point in a contained request is great as each side will get part of the work of processing that request. This motivates a redesign of the load splitter, one that enables recording weighted requests and considers contained requests in the weight balancing for splitting.

In this PR, we redesign the load-based splitter with the following interface:
1. Record a point key “start” or span “[start, end)” with a weight “w” at a specific time “ts”, where “w” is some measure of load recorded for a span e.g. Record(ts, start, w) or Record(ts, [start, end), w)
2. Find a split key such that the load (i.e. total weight) on the resulting split ranges would be as equal as possible according to the recorded loads above e.g. Key()

To make the current load-based splitter (Finder) weighted, we make the following modifications:
1. Instead of using reservoir sampling, we use weighted reservoir sampling (a simplified version of A-Chao)
2. Remove the contained counter
3. Increment the left and right counters by the weight of the request rather than just 1
4. Treat a weighted range request ([start, end), w) into two weighted point requests (start, w/2) and (end, w/2)

For more details, see (internal)
https://docs.google.com/document/d/1bdSxucz-xFzwnxL3fFXNZsRc9Vsht0oO0XuZrz5Iw84/edit#bookmark=id.xjc41tm3jx3x.

Release note (ops change): The load-based splitter has been redesigned to be more consistent with CPU-based rebalancing rather than QPS-based rebalancing to improve range splits.

Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Kai Sun <[email protected]>
kvoli added a commit to kvoli/cockroach that referenced this issue Jan 3, 2023
Previously, disparate types were used when comparing `StoreCapacity`,
`XThreshold` and `RangeUsageInfo`. This patch introduces a uniform
intermediate type `Load`, which is used to perform arithmetic and
comparison between types representing load.

The purpose of this change is to decouple changes to inputs, enable
modifying existing dimensions and adding new ones with less code
modification.

The only load dimension added in this patch is `Queries`, which is then
used in place of `QueriesPerSecond` within the rebalancing logic.

Additionally, `RangeUsageInfo` is uniformly passed around in place of
any specific calls to the underlying tracking datastructure
`ReplicaStats`.

Part of cockroachdb#91152

Release note: None
@kvoli
Copy link
Collaborator Author

kvoli commented Jan 3, 2023

completed on kvoli@eb764b9

@kvoli kvoli closed this as completed Jan 3, 2023
kvoli added a commit to kvoli/cockroach that referenced this issue Jan 12, 2023
Previously, only bytes written at the leaseholder were accounted for on
a replica. Now, follower replicas will record the bytes written.

Informs: cockroachdb#91152

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
None yet
Development

No branches or pull requests

1 participant