roachtest: perturbation/metamorphic/decommission failed #135241

Open
cockroach-teamcity opened this issue Nov 15, 2024 · 17 comments
Labels
A-testing Testing tools and infrastructure branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. P-3 Issues/test failures with no fix SLA T-kv KV Team

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Nov 15, 2024

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 6610d705724a21c836f3521f75972e65d9e9e2d4:

(assertions.go:363).Fail: 
	Error Trace:	pkg/cmd/roachtest/tests/perturbation/framework.go:493
	            				pkg/cmd/roachtest/test_runner.go:1307
	            				src/runtime/asm_amd64.s:1695
	Error:      	Should be true
	Test:       	perturbation/metamorphic/decommission
	Messages:   	FAILURE: follower-read  : Increase 9.6046 > 5.0000 BASE: 3.850596ms SCORE: 36.983429ms
	            	
	            	FAILURE: read           : Increase 9.2015 > 5.0000 BASE: 3.917431ms SCORE: 36.046384ms
	            	
	            	FAILURE: write          : Increase 14.2758 > 5.0000 BASE: 3.817288ms SCORE: 54.494675ms
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=32
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=2
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

Jira issue: CRDB-44411

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team labels Nov 15, 2024
@andrewbaptist andrewbaptist added T-storage Storage Team T-admission-control Admission Control and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Nov 15, 2024
@blathers-crl blathers-crl bot added the A-storage Relating to our storage engine (Pebble) on-disk storage. label Nov 15, 2024
@andrewbaptist andrewbaptist added the P-3 Issues/test failures with no fix SLA label Nov 15, 2024
@andrewbaptist
Collaborator

This was caused by CPU admission control on n10 with the system waiting on the kv-regular-store-queue for up to 5 seconds during the decommission process. I'm not sure why this kicked in during this test.

Here is a link to the stats at the time.

The CPU was reasonable (~50%), the goroutine count was low, and the goroutine scheduler latency was also low.

Attached is a profile of a slow operation:

2024-11-15T07_15_40Z-UPSERT-5796.155ms.zip

I think the reason for this slowdown is that the system was so "overprovisioned" for this workload.

The config is:

2024/11/15 06:57:22 framework.go:404: test variations are: seed: 6732599062455550972, fillDuration: 10m0s, maxBlockBytes: 1, perturbationDuration: 10s, validationDuration: 5m0s, ratioOfMax: 0.500000, splits: 10000, numNodes: 12, numWorkloadNodes: 1, vcpu: 32, disks: 2, memory: standard, leaseType: epoch, cloud: gce, perturbation: {drain:true}

@kvoli kvoli removed the T-kv KV Team label Nov 15, 2024
@exalate-issue-sync exalate-issue-sync bot removed the T-storage Storage Team label Nov 15, 2024
@aadityasondhi
Collaborator

I am not familiar with this test, so I am providing a purely AC overload analysis below. Hopefully this helps in investigating the issue.

This was caused by CPU admission control on n10

What evidence did we have for CPU AC kicking in?

system waiting on the kv-regular-store-queue for up to 5 seconds during the decommission process

I can see that in the metrics as well. Note that the store work queue is for stores (i.e., IO overload), not CPU.

Observations from the metrics (screenshots attached in the original comment):

  • P99 SQL latency spike near the failure; correlates well with IO overload.
  • Store work queue has waiting requests.
  • Substantial spike in L0 sublevels.
  • Compaction queue building up.
  • Read + write bandwidth, for reference in case it turns out to be useful later.

Now looking into logs.

IO load listener

I241115 07:15:25.876713 654 util/admission/io_load_listener.go:665 ⋮ [T1,Vsystem,n10,s20] 52223 IO overload: compaction score 0.650 (466 ssts, 40 sub-levels), L0 growth 32 MiB (write 32 MiB (ignored 0 B) ingest 118 KiB (ignored 30 MiB)): requests 127093 (88146 bypassed) with 5.6 MiB acc-write (3.9 MiB bypassed) + 0 B acc-ingest (0 B bypassed) + 32 MiB adjusted-LSM-writes + 690 MiB adjusted-disk-writes + write-model 3.00x+124 B (smoothed 2.85x+62 B) + ingested-model 0.00x+0 B (smoothed 0.75x+1 B) + write-amp-model 21.63x+1 B (smoothed 33.12x+1 B) + at-admission-tokens 188 B, compacted 28 MiB [≈21 MiB], flushed 469 MiB [≈0 B] (mult 1.00); admitting 16 MiB (rate 1.1 MiB/s) (elastic 1 B rate 0 B/s) due to L0 growth (used total: 15 MiB elastic 0 B); write stalls 0; diskBandwidthLimiter (tokenUtilization 0.00, tokensUsed (elastic 0 B, snapshot 0 B, regular 0 B) tokens (write 0 B (prev 0 B)), writeBW 0 B/s, readBW 0 B/s, provisioned 0 B/s)


I241115 07:15:25.876561 654 util/admission/io_load_listener.go:665 ⋮ [T1,Vsystem,n10,s19] 52222 IO overload: compaction score 0.600 (304 ssts, 37 sub-levels), L0 growth 30 MiB (write 29 MiB (ignored 14 KiB) ingest 92 KiB (ignored 20 MiB)): requests 125912 (87037 bypassed) with 5.5 MiB acc-write (3.9 MiB bypassed) + 0 B acc-ingest (0 B bypassed) + 29 MiB adjusted-LSM-writes + 741 MiB adjusted-disk-writes + write-model 3.00x+106 B (smoothed 2.84x+53 B) + ingested-model 0.00x+0 B (smoothed 0.75x+1 B) + write-amp-model 25.18x+1 B (smoothed 34.41x+1 B) + at-admission-tokens 179 B, compacted 26 MiB [≈20 MiB], flushed 531 MiB [≈0 B] (mult 1.00); admitting 16 MiB (rate 1.1 MiB/s) (elastic 1 B rate 0 B/s) due to L0 growth (used total: 15 MiB elastic 0 B); write stalls 0; diskBandwidthLimiter (tokenUtilization 0.00, tokensUsed (elastic 0 B, snapshot 0 B, regular 0 B) tokens (write 0 B (prev 0 B)), writeBW 0 B/s, readBW 0 B/s, provisioned 0 B/s)

My take here is that we are seeing IO overload due to the growth of L0 (see metrics above), and the reason for the high growth seems to be replication writes that are bypassing AC (from the logs). Example: requests 125912 (87037 bypassed) with 5.5 MiB acc-write (3.9 MiB bypassed).
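
To make that imbalance concrete, here is a back-of-the-envelope reading of the n10/s20 line; the figures come from the log, but the arithmetic and the ~15s adjustment-interval assumption are mine:

    // Net L0 delta per IO load listener interval, using the figures from the
    // n10/s20 log line above. L0 sublevels keep stacking up whenever inflow
    // exceeds what compactions move out. (Toy arithmetic, not CockroachDB code.)
    package main

    import "fmt"

    func main() {
        const (
            l0GrowthMiB  = 32.0 // "L0 growth 32 MiB" into L0 per ~15s interval
            compactedMiB = 28.0 // "compacted 28 MiB" out of L0 in the same interval
        )
        fmt.Printf("net L0 growth: %+.0f MiB per interval\n", l0GrowthMiB-compactedMiB) // +4 MiB
    }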

I'm not familiar enough with the test to know whether we expect so many bypassed writes. But if these writes are bypassing AC, either there is an issue with the RAC integration, or the workload is overloading the system.

One side thing to rule out would be bandwidth saturation, but I doubt that is the case, given this is a GCP cluster (high provisioned bandwidth by default) and we have evidence of high bypassed writes.

@andrewbaptist Let me know if this is helpful. It seems to be the classic case of replicated writes overloading the store.

@andrewbaptist
Collaborator

Thanks for the clarification on the AC metric; I had misread it. You are correct that it is due to IO overload.

To clarify what this test does: it determines what "50% usage" of the cluster is and then runs a constant workload at that rate, expecting consistent throughput and latency while it makes a change. The rate for this cluster was 134,832 requests/sec.

It then runs a decommission of the node.

The decommission runs from 07:15:06 to 07:15:43. During this ~37s window, a lot of background snapshots are sent, but the regular traffic load does not change (or at least should not change).

However, here we see throughput drop by about a factor of 4, because some requests experience significant AC delay.
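
For context, here is a minimal sketch of the pass/fail check implied by the failure messages above; the real logic lives in pkg/cmd/roachtest/tests/perturbation/framework.go, and the names below are illustrative, not the actual code:

    // Hedged sketch of the acceptance check: a latency "score" measured during
    // the perturbation must stay within 5x of the measured baseline.
    package main

    import (
        "fmt"
        "time"
    )

    const maxLatencyIncrease = 5.0 // the "> 5.0000" threshold in the failures

    func checkLatency(op string, base, score time.Duration) error {
        increase := float64(score) / float64(base)
        if increase > maxLatencyIncrease {
            return fmt.Errorf("FAILURE: %-14s: Increase %.4f > %.4f BASE: %s SCORE: %s",
                op, increase, maxLatencyIncrease, base, score)
        }
        return nil
    }

    func main() {
        // Numbers from the first failure report in this issue:
        // 54.494675ms / 3.817288ms ≈ 14.28 > 5.0, so the test fails.
        fmt.Println(checkLatency("write", 3817288*time.Nanosecond, 54494675*time.Nanosecond))
    }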

What I don't understand is why this test is causing IO overload at all; I expected the snapshots to go straight to L6.

If the system can't handle the incoming rate, the expectation would be for it to either:

  1. Slow down snapshot transfers
  2. Increase compaction rate to prevent overload.

I will take a bit more of a look this afternoon to see why the LSM is getting so inverted.

@cockroach-teamcity
Member Author

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ e83bc46aa42f2476b4b11b9703b8038c660dc980:

(assertions.go:363).Fail: 
	Error Trace:	pkg/cmd/roachtest/tests/perturbation/framework.go:645
	            				pkg/cmd/roachtest/test_runner.go:1307
	            				src/runtime/asm_amd64.s:1695
	Error:      	Should be true
	Test:       	perturbation/metamorphic/decommission
	Messages:   	FAILURE: follower-read  : Increase 11.6611 > 5.0000 BASE: 20.186537ms SCORE: 235.397432ms
	            	
	            	FAILURE: read           : Increase 11.9632 > 5.0000 BASE: 19.833571ms SCORE: 237.273606ms
	            	
	            	FAILURE: write          : Increase 14.5329 > 5.0000 BASE: 17.030372ms SCORE: 247.500542ms
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=2
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@aadityasondhi
Collaborator

Following up on our slack discussion.

I want to wrap up the open thread here. The TL;DR is that we saw L0 sublevel growth simply because bytes into L0 > compacted bytes out of L0. We see from the metrics and logs that no snapshots were being ingested into L0, so pacing snapshots would do nothing to help here. A quick mitigation is to increase the compaction concurrency for this test using the env variable.

Some more details about the above:

  • In the IO load listener logs above we see L0 growth 32 MiB (write 32 MiB (ignored 0 B)); note that no ingests are landing in L0, so slowing down snapshot ingest would do nothing to help here. Comparing the growth (bytes into L0) with the compacted bytes out of L0 (compacted 28 MiB [≈21 MiB]) shows a positive delta.
  • AC kicks in because the overload threshold (>0.5 in this case) is crossed, but it cannot pace replication traffic that is regular priority; RACv2 should address that. requests 127093 (88146 bypassed) shows a high number of bypassed requests.
  • What you see happen here is: AC sees that L0 is growing at the rates mentioned above, so it dials down the tokens to avoid that growth. Since it cannot pace regular replication traffic (the bypassed bytes), it limits only the locally admitted writes, leading to the AC queuing that this test saw (see the sketch after this list).
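
To illustrate that last bullet, here is a toy model (my own sketch, not CockroachDB code) using the s20 figures from the log above:

    // IO tokens gate only locally admitted writes, so bypassed replication
    // bytes land in L0 regardless of how far AC dials the tokens down.
    package main

    import "fmt"

    func main() {
        const (
            accWriteMiB = 5.6 // "5.6 MiB acc-write": accounted write bytes this interval
            bypassedMiB = 3.9 // "(3.9 MiB bypassed)": replication writes exempt from tokens
        )
        paceable := accWriteMiB - bypassedMiB // only ~1.7 MiB is actually throttleable
        fmt.Printf("throttleable: %.1f MiB, bypassed: %.1f MiB per interval\n", paceable, bypassedMiB)
        // Cutting tokens queues the ~1.7 MiB of foreground writes (the latency this
        // test measures) while the bypassed bytes keep growing L0 anyway.
    }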

Ideally, we will have replication AC for regular traffic. In the meantime, my recommendation is to increase the compaction concurrency for this test, since we have ample CPU.
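
For reference, a sketch of how such an override is typically plumbed; the COCKROACH_COMPACTION_CONCURRENCY variable name is my assumption, so verify the authoritative knob in pkg/storage/pebble.go before relying on it:

    // Hedged sketch of an env-var override for Pebble's max concurrent
    // compactions, mirroring CockroachDB's envutil-style plumbing. The variable
    // name is an assumption; check pkg/storage/pebble.go for the real knob.
    package main

    import (
        "fmt"
        "os"
        "strconv"
    )

    func compactionConcurrency() int {
        if v, err := strconv.Atoi(os.Getenv("COCKROACH_COMPACTION_CONCURRENCY")); err == nil && v > 0 {
            return v
        }
        return 3 // keep the base low to avoid over-utilizing the CPU
    }

    func main() {
        fmt.Println("max concurrent compactions:", compactionConcurrency())
    }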

I am going to remove the AC assignment from this issue. There is nothing actionable on the AC side here, other than having RACv2 for regular traffic, which is tracked separately.

@aadityasondhi aadityasondhi removed their assignment Nov 19, 2024
@aadityasondhi aadityasondhi added T-kv KV Team and removed A-storage Relating to our storage engine (Pebble) on-disk storage. T-admission-control Admission Control labels Nov 19, 2024
@kvoli kvoli added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure labels Nov 19, 2024
@kvoli
Collaborator

kvoli commented Nov 19, 2024

Marking as C-bug/A-testing based on the above comment, which suggests increasing the compaction concurrency (configuration related).

@exalate-issue-sync exalate-issue-sync bot removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure labels Nov 19, 2024
@kvoli kvoli added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure labels Nov 19, 2024
@andrewbaptist
Collaborator

Adding a link to #74697 since I don't see a more general story for auto-tuning compaction concurrency. Also a link to the current guidance: https://www.cockroachlabs.com/docs/stable/architecture/storage-layer#compaction

This would be hard to do metamorphically unless there were clearer guidance on how this should be tuned. And if we have that guidance, why wouldn't we encode it in the system rather than in the test framework?

I'll watch for additional failures of this test and try to get a set of workarounds for different hardware configurations.

@aadityasondhi
Collaborator

And if we have that guidance why wouldn't we encode this in the system rather than in the test framework.

It's like manually tuning any rate: if there is ample room, the guidance is to keep increasing until the desired effect is reached and you don't over-utilize the CPU. We keep the base low to avoid over-utilizing the CPU.

Adding a link to #74697

Thanks! We have actively started working on a possible solution to this problem. @itsbilal, you might find this interesting, as you are starting to look at a design and prototype for such a case.

@aadityasondhi
Collaborator

I think this is a more current version of it: cockroachdb/pebble#1329.

@cockroach-teamcity
Member Author

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 844d7637f3b4dc1275e8aa05c1cf3bbb1f59f8eb:

(assertions.go:363).Fail: 
	Error Trace:	pkg/cmd/roachtest/tests/perturbation/framework.go:653
	            				pkg/cmd/roachtest/test_runner.go:1314
	            				src/runtime/asm_amd64.s:1695
	Error:      	Should be true
	Test:       	perturbation/metamorphic/decommission
	Messages:   	FAILURE: follower-read  : Increase 18.5366 > 5.0000 BASE: 22.185752ms SCORE: 411.248419ms
	            	
	            	FAILURE: read           : Increase 18.9731 > 5.0000 BASE: 22.053385ms SCORE: 418.420916ms
	            	
	            	FAILURE: write          : Increase 19.4849 > 5.0000 BASE: 15.302167ms SCORE: 298.160995ms
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=2
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ d18eb683b2759fd8814dacf0baa913f596074a17:

(assertions.go:363).Fail: 
	Error Trace:	pkg/cmd/roachtest/tests/perturbation/framework.go:671
	            				pkg/cmd/roachtest/test_runner.go:1322
	            				src/runtime/asm_amd64.s:1695
	Error:      	Should be true
	Test:       	perturbation/metamorphic/decommission
	Messages:   	FAILURE: follower-read  : Increase 247149.3892 > 5.0000 BASE: 14.566089ms SCORE: 1h0m0s
	            	
	            	FAILURE: read           : Increase 249905.4871 > 5.0000 BASE: 14.405446ms SCORE: 1h0m0s
	            	
	            	FAILURE: write          : Increase 320901.2333 > 5.0000 BASE: 11.218405ms SCORE: 1h0m0s
	            	
	            	FAILURE: follower-read  : Increase 247149.3892 > 5.0000 BASE: 14.566089ms SCORE: 1h0m0s
	            	
	            	FAILURE: read           : Increase 249905.4871 > 5.0000 BASE: 14.405446ms SCORE: 1h0m0s
	            	
	            	FAILURE: write          : Increase 320901.2333 > 5.0000 BASE: 11.218405ms SCORE: 1h0m0s
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1

Parameters:

  • acMode=fullNormalElasticRepl
  • arch=amd64
  • blockSize=1
  • cloud=gce
  • coverageBuild=false
  • cpu=8
  • diskBandwidthLimit=350MiB
  • disks=1
  • encrypted=false
  • fillDuration=10m0s
  • fs=ext4
  • leaseType=expiration
  • localSSD=true
  • mem=low
  • numNodes=12
  • numWorkloadNodes=1
  • perturbationDuration=30m0s
  • ratioOfMax=0.5
  • runtimeAssertionsBuild=false
  • seed=7233357114014281534
  • splits=10000
  • ssd=1
  • validationDuration=5m0s
  • vcpu=8
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ e8ee6f574ddf1fce1a4cb53f392c5a9baf633b76:

(cluster.go:2481).Run: full command output in run_071511.534348395_n5_cockroach-node-drain.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1

Parameters:

  • acMode=fullBoth
  • arch=amd64
  • blockSize=1024
  • cloud=gce
  • coverageBuild=false
  • cpu=32
  • diskBandwidthLimit=350MiB
  • disks=2
  • encrypted=false
  • fillDuration=10m0s
  • fs=ext4
  • leaseType=epoch
  • localSSD=true
  • mem=standard
  • numNodes=5
  • numWorkloadNodes=1
  • perturbationDuration=30m0s
  • ratioOfMax=0.5
  • runtimeAssertionsBuild=false
  • seed=1205238940572837635
  • splits=1
  • ssd=2
  • validationDuration=5m0s
  • vcpu=32
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ b63a736c85cfc1a968b74863b7f99c89ddebc1d3:

(assertions.go:363).Fail: 
	Error Trace:	pkg/cmd/roachtest/tests/perturbation/framework.go:673
	            				pkg/cmd/roachtest/test_runner.go:1314
	            				src/runtime/asm_amd64.s:1695
	Error:      	Should be true
	Test:       	perturbation/metamorphic/decommission
	Messages:   	FAILURE: follower-read  : Increase 109384.2977 > 5.0000 BASE: 32.911488ms SCORE: 1h0m0s
	            	
	            	FAILURE: read           : Increase 101767.7194 > 5.0000 BASE: 35.374675ms SCORE: 1h0m0s
	            	
	            	FAILURE: write          : Increase 31481.4678 > 5.0000 BASE: 114.352991ms SCORE: 1h0m0s
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1

Parameters:

  • acMode=fullBoth
  • arch=amd64
  • blockSize=1024
  • cloud=gce
  • coverageBuild=false
  • cpu=32
  • diskBandwidthLimit=350MiB
  • disks=1
  • encrypted=false
  • fillDuration=10m0s
  • fs=ext4
  • leaseType=epoch
  • localSSD=true
  • mem=standard
  • numNodes=5
  • numWorkloadNodes=1
  • perturbationDuration=10m0s
  • ratioOfMax=0.5
  • runtimeAssertionsBuild=false
  • seed=5641047121427902826
  • splits=1
  • ssd=1
  • validationDuration=5m0s
  • vcpu=32
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 58422b11aa3ceb11dd5a382a2236d985517e1506:

(assertions.go:363).Fail: 
	Error Trace:	pkg/cmd/roachtest/tests/perturbation/framework.go:673
	            				pkg/cmd/roachtest/test_runner.go:1314
	            				src/runtime/asm_amd64.s:1695
	Error:      	Should be true
	Test:       	perturbation/metamorphic/decommission
	Messages:   	FAILURE: follower-read  : Increase 6.0368 > 5.0000 BASE: 12.718765ms SCORE: 76.780432ms
	            	
	            	FAILURE: read           : Increase 6.0635 > 5.0000 BASE: 12.635187ms SCORE: 76.613335ms
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1

Parameters:

  • acMode=fullNormalElasticRepl
  • arch=amd64
  • blockSize=1
  • cloud=gce
  • coverageBuild=false
  • cpu=8
  • diskBandwidthLimit=0
  • disks=2
  • encrypted=false
  • fillDuration=10m0s
  • fs=ext4
  • leaseType=epoch
  • localSSD=true
  • mem=standard
  • numNodes=30
  • numWorkloadNodes=2
  • perturbationDuration=10s
  • ratioOfMax=0.5
  • runtimeAssertionsBuild=false
  • seed=-1259536967077922990
  • splits=10000
  • ssd=2
  • validationDuration=5m0s
  • vcpu=8
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 47699f3887ad5d1b8c7c5905eb5c49628aa59bbe. A Side-Eye cluster snapshot was captured on timeout: https://app.side-eye.io/#/snapshots/496.

(test_runner.go:1363).runTest: test timed out (3h0m0s)
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1

Parameters:

  • acMode=elasticOnlyBoth
  • arch=amd64
  • blockSize=1024
  • cloud=gce
  • coverageBuild=false
  • cpu=8
  • diskBandwidthLimit=350MiB
  • disks=1
  • encrypted=false
  • fillDuration=10m0s
  • fs=ext4
  • leaseType=leader
  • localSSD=true
  • mem=low
  • numNodes=12
  • numWorkloadNodes=1
  • perturbationDuration=10s
  • ratioOfMax=0.5
  • runtimeAssertionsBuild=false
  • seed=2256456537027529277
  • splits=10000
  • ssd=1
  • validationDuration=5m0s
  • vcpu=8
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ 47699f3887ad5d1b8c7c5905eb5c49628aa59bbe. A Side-Eye cluster snapshot was captured on timeout: https://app.side-eye.io/#/snapshots/499.

(test_runner.go:1363).runTest: test timed out (3h0m0s)
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1

Parameters:

  • acMode=fullNormalElasticRepl
  • arch=amd64
  • blockSize=1024
  • cloud=gce
  • coverageBuild=false
  • cpu=32
  • diskBandwidthLimit=350MiB
  • disks=1
  • encrypted=false
  • fillDuration=10m0s
  • fs=ext4
  • leaseType=expiration
  • localSSD=true
  • mem=standard
  • numNodes=5
  • numWorkloadNodes=1
  • perturbationDuration=10s
  • ratioOfMax=0.5
  • runtimeAssertionsBuild=false
  • seed=-8141743592257623532
  • splits=100
  • ssd=1
  • validationDuration=5m0s
  • vcpu=32
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.perturbation/metamorphic/decommission failed with artifacts on master @ efacd11db5f357a69f8b8fd0b10148028d87ed36:

(assertions.go:363).Fail: 
	Error Trace:	pkg/cmd/roachtest/tests/perturbation/framework.go:673
	            				pkg/cmd/roachtest/test_runner.go:1315
	            				src/runtime/asm_amd64.s:1695
	Error:      	Should be true
	Test:       	perturbation/metamorphic/decommission
	Messages:   	FAILURE: follower-read  : Increase 99002.7263 > 5.0000 BASE: 36.362635ms SCORE: 1h0m0s
	            	
	            	FAILURE: read           : Increase 92924.8443 > 5.0000 BASE: 38.740985ms SCORE: 1h0m0s
	            	
	            	FAILURE: write          : Increase 133316.0615 > 5.0000 BASE: 27.003498ms SCORE: 1h0m0s
	            	
	            	FAILURE: follower-read  : Increase 99002.7263 > 5.0000 BASE: 36.362635ms SCORE: 1h0m0s
	            	
	            	FAILURE: read           : Increase 92924.8443 > 5.0000 BASE: 38.740985ms SCORE: 1h0m0s
	            	
	            	FAILURE: write          : Increase 133316.0615 > 5.0000 BASE: 27.003498ms SCORE: 1h0m0s
(require.go:1950).True: FailNow called
test artifacts and logs in: /artifacts/perturbation/metamorphic/decommission/run_1

Parameters:

  • acMode=elasticOnlyBoth
  • arch=amd64
  • blockSize=4096
  • cloud=gce
  • coverageBuild=false
  • cpu=8
  • diskBandwidthLimit=0
  • disks=1
  • encrypted=false
  • fillDuration=10m0s
  • fs=ext4
  • leaseType=expiration
  • localSSD=true
  • mem=high
  • numNodes=5
  • numWorkloadNodes=1
  • perturbationDuration=10m0s
  • ratioOfMax=0.5
  • runtimeAssertionsBuild=false
  • seed=-4696941036669659720
  • splits=10000
  • ssd=1
  • validationDuration=5m0s
  • vcpu=8
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!
