roachtest: admission-control/disk-bandwidth-limiter failed #131484

Closed
cockroach-teamcity opened this issue Sep 27, 2024 · 80 comments · Fixed by #134430
Assignees
Labels
  • A-storage: Relating to our storage engine (Pebble) on-disk storage.
  • branch-master: Failures and bugs on the master branch.
  • branch-release-24.3: Used to mark GA and release blockers, technical advisories, and bugs for 24.3
  • C-test-failure: Broken test (automatically or manually discovered).
  • O-roachtest
  • O-robot: Originated from a bot.
  • P-1: Issues/test failures with a fix SLA of 1 month
  • T-storage: Storage Team

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Sep 27, 2024

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 67dc7a1c9bf117046b10513c3277bf7ccf0db975:

(admission_control_disk_bandwidth_overload.go:190).3: write + read bandwidth 80.123750 (68.563359 + 11.560391) exceeded threshold of 78.750000
(cluster.go:2483).Run: context canceled
(cluster.go:2483).Run: context canceled
(monitor.go:154).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/storage

This test on roachdash | Improve this report!

Jira issue: CRDB-42564
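
For triage, the numbers in the failure line above can be pulled apart to see how far over the threshold the run was and how much of the total came from reads. This is a minimal Go sketch and not part of the roachtest itself: `parseFailure`, `sample`, and `overshootPct` are hypothetical names introduced here, and the regular expression assumes the message format shown in this report.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// failureRE matches the bandwidth failure text as it appears in this report.
var failureRE = regexp.MustCompile(
	`write \+ read bandwidth ([0-9.]+) \(([0-9.]+) \+ ([0-9.]+)\) exceeded threshold of ([0-9.]+)`)

// sample holds the four numbers from one failure line (hypothetical type).
type sample struct{ total, write, read, threshold float64 }

// parseFailure extracts total, write, read, and threshold from a failure line.
func parseFailure(line string) (sample, error) {
	m := failureRE.FindStringSubmatch(line)
	if m == nil {
		return sample{}, fmt.Errorf("no bandwidth failure in %q", line)
	}
	var v [4]float64
	for i := range v {
		f, err := strconv.ParseFloat(m[i+1], 64)
		if err != nil {
			return sample{}, err
		}
		v[i] = f
	}
	return sample{v[0], v[1], v[2], v[3]}, nil
}

// overshootPct returns how far the total is above the threshold, in percent.
func (s sample) overshootPct() float64 { return (s.total/s.threshold - 1) * 100 }

func main() {
	line := "(admission_control_disk_bandwidth_overload.go:190).3: write + read bandwidth 80.123750 (68.563359 + 11.560391) exceeded threshold of 78.750000"
	s, err := parseFailure(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("read share %.1f%%, overshoot %.2f%%\n", 100*s.read/s.total, s.overshootPct())
}
```

Running the same parser over every report in the thread would show whether failures are mostly small overshoots or are dominated by read spikes.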

@cockroach-teamcity added the branch-master, C-test-failure, O-roachtest, O-robot, release-blocker, and T-storage labels Sep 27, 2024
@blathers-crl bot added the A-storage label Sep 27, 2024
@aadityasondhi
Collaborator

😞

@aadityasondhi self-assigned this Sep 27, 2024
@aadityasondhi removed the release-blocker label Sep 27, 2024
@aadityasondhi
Collaborator

[Two images attached]

This looks like a seemingly random spike in bandwidth, which I think is likely due to a compaction. Since the bandwidth limiter doesn't react instantly, we hit the threshold. Maybe we should smooth out the assertion here so that a single spike like this doesn't cause an assertion failure. My hypothesis is that it would auto-recover by reducing the number of tokens available in the next run.
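
One way to read "smooth out the assertion" is to fail only when the threshold is exceeded for several consecutive samples rather than on a single one. The sketch below is an assumption about what such a check could look like, not the code in admission_control_disk_bandwidth_overload.go. `exceedsSustained` is a hypothetical helper, and every sample value other than 80.12375 is made up for illustration.

```go
package main

import "fmt"

// exceedsSustained reports whether at least minConsecutive back-to-back
// samples are above threshold. An isolated spike does not trip it.
func exceedsSustained(samples []float64, threshold float64, minConsecutive int) bool {
	if minConsecutive <= 0 {
		return false
	}
	run := 0
	for _, s := range samples {
		if s > threshold {
			run++
			if run >= minConsecutive {
				return true
			}
		} else {
			run = 0 // the streak is broken by any sample at or below threshold
		}
	}
	return false
}

func main() {
	const threshold = 78.75 // threshold from the failure message
	// Hypothetical write + read totals per sampling interval.
	spike := []float64{70.1, 71.3, 80.12375, 72.0, 69.8}
	sustained := []float64{70.1, 80.2, 81.5, 82.3, 79.9}
	fmt.Println(exceedsSustained(spike, threshold, 3))     // false
	fmt.Println(exceedsSustained(sustained, threshold, 3)) // true
}
```

A moving average over the same window would be another option. The consecutive-sample form is shown because it keeps the original threshold meaningful.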

It is likely that our read estimation is the problem, since we only adjust it every 15s and it is based on the past window. Either way, it should have recovered. We expect to do better (adjusting for reads at a higher frequency) once we have reads hooked up to the limiter as well.
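
To illustrate the hypothesis, here is a toy model in Go. It assumes the write budget for an interval is the total budget minus the reads observed in the previous interval, and that writes use their whole budget. The real limiter is more involved, and `writeTokens`, `totals`, the budget of 75, and the read values are all hypothetical.

```go
package main

import "fmt"

// writeTokens is a toy model: the write budget for an interval is the total
// budget minus the reads seen in the previous interval, floored at zero.
func writeTokens(budget, prevReads float64) float64 {
	if t := budget - prevReads; t > 0 {
		return t
	}
	return 0
}

// totals returns write + read for intervals 1..n-1, assuming writes consume
// their full budget. Interval 0 is skipped because it has no prior window.
func totals(budget float64, reads []float64) []float64 {
	out := make([]float64, 0, len(reads))
	for i := 1; i < len(reads); i++ {
		out = append(out, writeTokens(budget, reads[i-1])+reads[i])
	}
	return out
}

func main() {
	// Reads jump from 10 to 25 in the third interval (hypothetical values).
	fmt.Println(totals(75, []float64{10, 10, 25, 25})) // [75 90 75]
}
```

In this model the total overshoots for exactly one interval after reads jump, then returns to the budget once the estimate catches up. That matches the expectation that the limiter should recover on its own.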

@sumeerbhola what do you think?

@aadityasondhi
Collaborator

We could also lower the utilization threshold. It is currently at 0.8; maybe a value of 0.7 makes more sense. This would give regular work more headroom.
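
As a rough illustration of the headroom argument, the sketch below compares the two utilization values. It assumes the limiter's budget is simply provisioned bandwidth times the utilization target. The provisioned value of 100 is hypothetical and not taken from the test, and `budget` is a name introduced here.

```go
package main

import "fmt"

// budget is the bandwidth the limiter aims to stay under, assuming it is
// provisioned bandwidth multiplied by the utilization target.
func budget(provisioned, utilization float64) float64 {
	return provisioned * utilization
}

func main() {
	const provisioned = 100.0 // hypothetical provisioned bandwidth
	for _, u := range []float64{0.8, 0.7} {
		b := budget(provisioned, u)
		fmt.Printf("utilization %.1f: budget %.1f, headroom %.1f\n", u, b, provisioned-b)
	}
}
```

Under this assumption, moving from 0.8 to 0.7 raises headroom from 20% to 30% of provisioned bandwidth, at the cost of admitting less throttled work.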

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 67dc7a1c9bf117046b10513c3277bf7ccf0db975:

(admission_control_disk_bandwidth_overload.go:190).3: write + read bandwidth 78.956172 (69.707344 + 9.248828) exceeded threshold of 78.750000
(cluster.go:2483).Run: context canceled
(cluster.go:2483).Run: context canceled
(monitor.go:154).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 5400cb9a70e63bfe1aa2849a566c195ad63130d1:

(admission_control_disk_bandwidth_overload.go:190).3: write + read bandwidth 82.646016 (67.692578 + 14.953437) exceeded threshold of 78.750000
(cluster.go:2483).Run: context canceled
(cluster.go:2483).Run: context canceled
(monitor.go:154).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ b6c13686495bbe9ad476b28033461ef7628e18a8:

(admission_control_disk_bandwidth_overload.go:190).3: write + read bandwidth 82.313203 (69.730859 + 12.582344) exceeded threshold of 78.750000
(cluster.go:2483).Run: context canceled
(cluster.go:2483).Run: context canceled
(monitor.go:154).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ b6c13686495bbe9ad476b28033461ef7628e18a8:

(admission_control_disk_bandwidth_overload.go:190).3: write + read bandwidth 85.960859 (70.279531 + 15.681328) exceeded threshold of 78.750000
(cluster.go:2483).Run: context canceled
(cluster.go:2483).Run: context canceled
(monitor.go:154).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ b6c13686495bbe9ad476b28033461ef7628e18a8:

(admission_control_disk_bandwidth_overload.go:190).3: write + read bandwidth 80.160234 (67.242656 + 12.917578) exceeded threshold of 78.750000
(cluster.go:2483).Run: context canceled
(cluster.go:2483).Run: context canceled
(monitor.go:154).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ b6c13686495bbe9ad476b28033461ef7628e18a8:

(admission_control_disk_bandwidth_overload.go:190).3: write + read bandwidth 79.343672 (63.106953 + 16.236719) exceeded threshold of 78.750000
(cluster.go:2483).Run: context canceled
(cluster.go:2483).Run: context canceled
(monitor.go:154).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 74333311616b937fea6a995462215a1cb5962686:

(admission_control_disk_bandwidth_overload.go:190).3: write + read bandwidth 79.568672 (64.471953 + 15.096719) exceeded threshold of 78.750000
(cluster.go:2483).Run: context canceled
(cluster.go:2483).Run: context canceled
(monitor.go:154).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 74333311616b937fea6a995462215a1cb5962686:

(admission_control_disk_bandwidth_overload.go:190).3: write + read bandwidth 82.909531 (66.356328 + 16.553203) exceeded threshold of 78.750000
(cluster.go:2483).Run: context canceled
(cluster.go:2483).Run: context canceled
(monitor.go:154).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

This test on roachdash | Improve this report!

@nicktrav moved this from Incoming to Tests (failures, skipped, flakes) in [Deprecated] Storage Oct 1, 2024
@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ ec2573dc6aaeefc226440bb2c5a7c94a63989868:

(admission_control_disk_bandwidth_overload.go:138).func1: failed to set kvadmission.store.provisioned_bandwidth: dial tcp 3.16.75.82:26257: connect: connection refused
(cluster.go:2483).Run: context canceled
(cluster.go:2483).Run: context canceled
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ ec2573dc6aaeefc226440bb2c5a7c94a63989868:

(admission_control_disk_bandwidth_overload.go:190).3: write + read bandwidth 83.903906 (70.543906 + 13.360000) exceeded threshold of 78.750000
(cluster.go:2483).Run: context canceled
(cluster.go:2483).Run: context canceled
(monitor.go:154).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 0c0af9540ed3f9d63eba523bc870eeb6c7eebe90:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 94.216719 (56.696250 + 37.520469) exceeded threshold of 78.750000
(cluster.go:2478).Run: context canceled
(cluster.go:2478).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 4de315c9ca4ccf7c3bdbf53a5226e8c14c84a68e:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 79.540078 (67.169375 + 12.370703) exceeded threshold of 78.750000
(cluster.go:2478).Run: context canceled
(cluster.go:2478).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 4de315c9ca4ccf7c3bdbf53a5226e8c14c84a68e:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 81.436406 (69.104219 + 12.332187) exceeded threshold of 78.750000
(cluster.go:2478).Run: context canceled
(cluster.go:2478).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ f842c3b4b5adc040d411bd17d7d10005273fc1b6:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 79.308437 (58.347813 + 20.960625) exceeded threshold of 78.750000
(cluster.go:2478).Run: context canceled
(cluster.go:2478).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ becbd0fcdfa2e37a6ff23b33af70f2f91eca0790:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 112.706328 (48.201016 + 64.505313) exceeded threshold of 78.750000
(cluster.go:2449).Run: context canceled
(cluster.go:2449).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ f9918d8f81a1829df63ac734fd6d21c60141e338:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 96.279219 (50.173437 + 46.105781) exceeded threshold of 78.750000
(cluster.go:2451).Run: context canceled
(cluster.go:2451).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 17535c13cfed95db70cd8dfb1ba6a700686f57b1:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 96.964219 (73.436953 + 23.527266) exceeded threshold of 78.750000
(cluster.go:2451).Run: context canceled
(cluster.go:2451).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 17535c13cfed95db70cd8dfb1ba6a700686f57b1:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 82.158437 (67.003125 + 15.155313) exceeded threshold of 78.750000
(cluster.go:2451).Run: context canceled
(cluster.go:2451).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 6bb6dc96ebf0ee2f23c5c568fa0d421019dc0946:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 80.325469 (67.615313 + 12.710156) exceeded threshold of 78.750000
(cluster.go:2451).Run: context canceled
(cluster.go:2451).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 6bb6dc96ebf0ee2f23c5c568fa0d421019dc0946:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 79.225938 (63.959375 + 15.266562) exceeded threshold of 78.750000
(cluster.go:2451).Run: context canceled
(cluster.go:2451).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 27c521de897105cdeeed88c3a853380c14345a22:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 100.097813 (45.432969 + 54.664844) exceeded threshold of 78.750000
(cluster.go:2451).Run: context canceled
(cluster.go:2451).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 27c521de897105cdeeed88c3a853380c14345a22:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 85.147734 (69.234609 + 15.913125) exceeded threshold of 78.750000
(cluster.go:2451).Run: context canceled
(cluster.go:2451).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 015b2f48cf80a6d8b60d7038c8c3457d934c716a:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 88.837734 (62.676016 + 26.161719) exceeded threshold of 78.750000
(cluster.go:2451).Run: context canceled
(cluster.go:2451).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ d0e07efe30dfe64d36412363000a1b977b4d5d2e:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 83.813047 (70.314844 + 13.498203) exceeded threshold of 78.750000
(cluster.go:2451).Run: context canceled
(cluster.go:2451).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ a60d739746648922134ec3c0a22bb069bf1d283c:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 81.201562 (65.455000 + 15.746562) exceeded threshold of 78.750000
(cluster.go:2452).Run: context canceled
(cluster.go:2452).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ a44a9b1ffce25f51026b494a1dcb393cfc5361f3:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 78.887344 (66.985078 + 11.902266) exceeded threshold of 78.750000
(cluster.go:2452).Run: context canceled
(cluster.go:2452).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ a44a9b1ffce25f51026b494a1dcb393cfc5361f3:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 80.916875 (64.491094 + 16.425781) exceeded threshold of 78.750000
(cluster.go:2452).Run: context canceled
(cluster.go:2452).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 688e82e8d015350fe3aa263484416d28b232a25d:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 79.894531 (68.108828 + 11.785703) exceeded threshold of 78.750000
(cluster.go:2452).Run: context canceled
(cluster.go:2452).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 688e82e8d015350fe3aa263484416d28b232a25d:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 81.842500 (69.956484 + 11.886016) exceeded threshold of 78.750000
(cluster.go:2452).Run: context canceled
(cluster.go:2452).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 8f5366d09e6cf2144ca43f9cdda7e1128a13fbf8:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 79.893906 (67.279219 + 12.614688) exceeded threshold of 78.750000
(cluster.go:2452).Run: context canceled
(cluster.go:2452).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 8f5366d09e6cf2144ca43f9cdda7e1128a13fbf8:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 79.858750 (67.741641 + 12.117109) exceeded threshold of 78.750000
(cluster.go:2452).Run: context canceled
(cluster.go:2452).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ 8f5366d09e6cf2144ca43f9cdda7e1128a13fbf8:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 84.245536 (68.846387 + 15.399150) exceeded threshold of 78.750000
(cluster.go:2452).Run: context canceled
(cluster.go:2452).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ ea4644b040dd4503f2eb7292cfebc31a58fd16fb:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 79.088906 (65.604297 + 13.484609) exceeded threshold of 78.750000
(cluster.go:2452).Run: context canceled
(cluster.go:2452).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.admission-control/disk-bandwidth-limiter failed with artifacts on master @ ea4644b040dd4503f2eb7292cfebc31a58fd16fb:

(admission_control_disk_bandwidth_overload.go:185).3: write + read bandwidth 84.301406 (71.693359 + 12.608047) exceeded threshold of 78.750000
(cluster.go:2452).Run: context canceled
(cluster.go:2452).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/admission-control/disk-bandwidth-limiter/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=aws
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

craig bot pushed a commit that referenced this issue Nov 12, 2024
134430: roachtest: disk bandwidth limiter test should only assert on writes r=sumeerbhola a=aadityasondhi

Since we do not pace reads yet, the test will remain flaky in this assertion, as the system can see unbounded read bandwidth usage and fail the assertion even if writes are paced.

Fixes #131484

Release note: None

134527: roachtest: add debugging to gossip/chaos r=tbg a=tbg

This test has had a string of weird failures where
either a `t.L().Printf` call or `time.Sleep(1s)`
takes dozens of seconds.

This PR adds a goroutine that gets spawned right
before and, unless signaled within 2s by both
the Printf and the Sleep having completed, dumps
stacks to stderr.

See the main issue #130737.
Closes the duplicates across various branches:

Closes #132651.
Closes #134495.

Epic: none
Release note: None


134751: lease: dump stacks if TestDescriptorRefreshOnRetry fails r=rafiss a=rafiss

We added additional logging to help debug a source of flakiness in which
the acquisition counts exceed the number of release counts. For that
logging to be useful, we need to know the goroutine IDs and stacks.

Marking this as fixing the linked issue so that the next time it fails,
we are reminded to look at the logs.

fixes: #134695
Release note: None

134953: kvserver/rangefeed: rename Disconnect to SendError for stream interface r=tbg,stevendanna a=wenyihu6

This patch renames `Disconnect` to `SendError` in the
`rangefeed.Stream` interface to clarify its role for sending
errors, distinguishing it from other similarly named
functions like `registration.disconnect`.

Part of: #110432
Release note: none

Co-authored-by: Steven Danna <[email protected]>

Co-authored-by: Aaditya Sondhi <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Wenyi Hu <[email protected]>
@craig craig bot closed this as completed in 4ae299b Nov 12, 2024
@github-project-automation github-project-automation bot moved this from Tests (failures, skipped, flakes) to Done in [Deprecated] Storage Nov 12, 2024

blathers-crl bot commented Nov 12, 2024

Based on the specified backports for linked PR #134430, I applied the following new label(s) to this issue: branch-release-24.3. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added the branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 label Nov 12, 2024
blathers-crl bot pushed a commit that referenced this issue Nov 12, 2024
Since we do not pace reads yet, the test will remain flaky in this
assertion, as the system can see unbounded read bandwidth usage and fail
the assertion even if writes are paced.

Fixes #131484

Release note: None
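
To illustrate why asserting only on writes removes the flake, here is a minimal Go sketch. It is not the roachtest code: the function names `checkCombined` and `checkWritesOnly` are hypothetical, and reusing the same 78.75 threshold for the writes-only check is an assumption. The sample numbers come from one of the failure reports above.

```go
package main

import "fmt"

// threshold is the limit printed in the failure messages in this thread.
const threshold = 78.75

// checkCombined mirrors the assertion that failed: write plus read
// bandwidth must not exceed the threshold. (Hypothetical helper.)
func checkCombined(write, read float64) error {
	if total := write + read; total > threshold {
		return fmt.Errorf("write + read bandwidth %f (%f + %f) exceeded threshold of %f",
			total, write, read, threshold)
	}
	return nil
}

// checkWritesOnly asserts on write bandwidth alone, since reads are not
// paced. Keeping the same threshold here is an assumption of this sketch.
func checkWritesOnly(write float64) error {
	if write > threshold {
		return fmt.Errorf("write bandwidth %f exceeded threshold of %f", write, threshold)
	}
	return nil
}

func main() {
	// Values from one failure report above: 68.108828 written, 11.785703 read.
	write, read := 68.108828, 11.785703
	fmt.Println("combined:", checkCombined(write, read))
	fmt.Println("writes only:", checkWritesOnly(write))
}
```

With these values the combined check fails even though writes on their own stay under the threshold, which matches the explanation in the commit message: unpaced reads push the total over the limit.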

2 participants