-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: increase the number of ranges in rebalance/by-load tests #102801
Labels
A-kv-distribution
Relating to rebalancing and leasing.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
Comments
kvoli
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-kv-distribution
Relating to rebalancing and leasing.
labels
May 5, 2023
kvoli
added a commit
to kvoli/cockroach
that referenced
this issue
May 5, 2023
The `rebalance/by-load/replicas` roachtests periodically flake due to a known limitation, where a cluster with a small number of ranges may not be properly balanced due to heterogeneous localities (including multi-store) cockroachdb#88829. This commit updates the total number of ranges from 1 per-store, to 5 per-store for `rebalance/by-load/*` roachtests. The roachtest asserted on the lease count, as a proxy for load, assuming that the load evenly hits each lease for the created ranges. However, the principal indicator of load in a read heavy workload is CPU. This commit updates the test assertion to require that every store's CPU is within 10% of the cluster mean. The test assertion previously required that the max-min lease count delta was 0, when no outside splits occurred; or 1 when the number of ranges was greater than the number of stores. The logging format is updated for easier debugging: ``` cpu outside bounds mean=23.9 tolerance=10.0% (±2.4) bounds=[21.5, 26.3] below = [] within = [s1: 22 (-6.7%), s2: 22 (-6.3%)] above = [s3: 26 (+13.0%)] ... cpu within bounds mean=25.7 tolerance=10.0% (±2.6) bounds=[23.2, 28.3] stores=[s1: 24 (-3.0%), s2: 24 (-2.9%), s3: 27 (+5.9%)] ``` resolves: cockroachdb#102801 resolves: cockroachdb#102823 Release note: None
craig bot
pushed a commit
that referenced
this issue
May 9, 2023
102824: roachtest: assert on CPU in rebalance/by-load r=aliher1911 a=kvoli The `rebalance/by-load/replicas` roachtests periodically flake due to a known limitation, where a cluster with a small number of ranges may not be properly balanced due to heterogeneous localities (including multi-store) #88829. This PR updates the total number of ranges from 1 per-store, to 5 per-store for `rebalance/by-load/*` roachtests. The roachtest asserted on the lease count, as a proxy for load, assuming that the load evenly hits each lease for the created ranges. However, the principal indicator of load in a read heavy workload is CPU. This PR updates the test assertion to require that every store's CPU is within 10% of the cluster mean. The test assertion previously required that the max-min lease count delta was 0, when no outside splits occurred; or 1 when the number of ranges was greater than the number of stores. The logging format is updated for easier debugging: ``` cpu outside bounds mean=23.9 tolerance=10.0% (±2.4) bounds=[21.5, 26.3] below = [] within = [s1: 22 (-6.7%), s2: 22 (-6.3%)] above = [s3: 26 (+13.0%)] ... cpu within bounds mean=25.7 tolerance=10.0% (±2.6) bounds=[23.2, 28.3] stores=[s1: 24 (-3.0%), s2: 24 (-2.9%), s3: 27 (+5.9%)] ``` resolves: #102801 resolves: #102823 Release note: None Co-authored-by: Austen McClernon <[email protected]>
blathers-crl bot
pushed a commit
that referenced
this issue
May 9, 2023
The `rebalance/by-load/replicas` roachtests periodically flake due to a known limitation, where a cluster with a small number of ranges may not be properly balanced due to heterogeneous localities (including multi-store) #88829. This commit updates the total number of ranges from 1 per-store, to 5 per-store for `rebalance/by-load/*` roachtests. The roachtest asserted on the lease count, as a proxy for load, assuming that the load evenly hits each lease for the created ranges. However, the principal indicator of load in a read heavy workload is CPU. This commit updates the test assertion to require that every store's CPU is within 10% of the cluster mean. The test assertion previously required that the max-min lease count delta was 0, when no outside splits occurred; or 1 when the number of ranges was greater than the number of stores. The logging format is updated for easier debugging: ``` cpu outside bounds mean=23.9 tolerance=10.0% (±2.4) bounds=[21.5, 26.3] below = [] within = [s1: 22 (-6.7%), s2: 22 (-6.3%)] above = [s3: 26 (+13.0%)] ... cpu within bounds mean=25.7 tolerance=10.0% (±2.6) bounds=[23.2, 28.3] stores=[s1: 24 (-3.0%), s2: 24 (-2.9%), s3: 27 (+5.9%)] ``` resolves: #102801 resolves: #102823 Release note: None
kvoli
added a commit
to kvoli/cockroach
that referenced
this issue
Jun 27, 2023
The `rebalance/by-load/replicas` roachtests periodically flake due to a known limitation, where a cluster with a small number of ranges may not be properly balanced due to heterogeneous localities (including multi-store) cockroachdb#88829. This commit updates the total number of ranges from 1 per-store, to 5 per-store for `rebalance/by-load/*` roachtests. The roachtest asserted on the lease count, as a proxy for load, assuming that the load evenly hits each lease for the created ranges. However, the principal indicator of load in a read heavy workload is CPU. This commit updates the test assertion to require that every store's CPU is within 10% of the cluster mean. The test assertion previously required that the max-min lease count delta was 0, when no outside splits occurred; or 1 when the number of ranges was greater than the number of stores. The logging format is updated for easier debugging: ``` cpu outside bounds mean=23.9 tolerance=10.0% (±2.4) bounds=[21.5, 26.3] below = [] within = [s1: 22 (-6.7%), s2: 22 (-6.3%)] above = [s3: 26 (+13.0%)] ... cpu within bounds mean=25.7 tolerance=10.0% (±2.6) bounds=[23.2, 28.3] stores=[s1: 24 (-3.0%), s2: 24 (-2.9%), s3: 27 (+5.9%)] ``` resolves: cockroachdb#102801 resolves: cockroachdb#102823 Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv-distribution
Relating to rebalancing and leasing.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
In the
rebalance/by-load/...
roachtest, a KV table is initialized and split into ranges so that the number of ranges is equal to the number of stores.cockroach/pkg/cmd/roachtest/tests/rebalance_load.go
Line 63 in 9189237
These tests periodically fail due to #88829. Since this is a known limitation, we should update the roachtests to use a greater number of ranges, so that we can still assert on load being rebalanced but without the noise from this limitation.
Jira issue: CRDB-27661
The text was updated successfully, but these errors were encountered: