roachtest: failover/chaos/read-only failed #106681
@renatolabs, if Test Eng isn't the right place for this, please let me know.
The error is:
which happens when the cluster setting is not propagated within the expected timeout (10s); see `pkg/sql/set_cluster_setting.go`, lines 612 to 615 at commit 8789fdd.
This looks like a test flake: in some situations, updating the cluster setting the way the test does is not guaranteed to succeed. In that case, the test owners (KV), as the authors of the test, are in the best position to fix it. If you believe this is an infrastructure flake for some reason, please let me know and provide more info.
@erikgrinaker I'll take a look at this tomorrow. It appears that we are trying to set a cluster setting while the system is only partially available. This isn't guaranteed to succeed in these chaos tests, since the range it needs may be unavailable, or the write may have to wait for a failover to complete. Since we only wait 10 seconds, it's possible this doesn't happen fast enough.
cc @cockroachdb/replication
Makes sense. I submitted a fix in #106893.
106893: failover: re-enable disk stall detector in `diskStallFailer.Ready` r=erikgrinaker a=erikgrinaker

`pauseFailer` needs to disable the disk stall detector to avoid false positives. However, it attempted to re-enable it via cluster setting during recovery. If a system range is unavailable during recovery (typically in chaos tests with concurrent failures), this can error out.

This patch instead (re-)enables the disk stall detector during `diskStallFailer.Ready`.

Touches #106681.
Touches #106752.

Epic: none
Release note: None

Co-authored-by: Erik Grinaker
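To make the change concrete, here is a simplified, hypothetical model of the failer lifecycle in Go. The `failer` interface, the `settings` store, `runCycle`, `oldPauseFailer`, and the setting key are all stand-ins, not the actual types in the roachtest failover package, and the sketch assumes `pauseFailer` disables the detector in `Ready`. It shows why a setting write in `Recover` can fail while a concurrent failure keeps the system range unavailable, whereas a write in `diskStallFailer.Ready` runs while the cluster is expected to be healthy.

```go
package main

import (
	"errors"
	"fmt"
)

// settings is a hypothetical stand-in for the cluster settings store.
// Writes fail while available is false, mimicking an unavailable system range.
type settings struct {
	available bool
	values    map[string]string
}

func (s *settings) set(key, value string) error {
	if !s.available {
		return errors.New("system range unavailable")
	}
	s.values[key] = value
	return nil
}

// stallKey is a hypothetical setting name for the disk stall detector.
const stallKey = "disk_stall_detector.enabled"

// failer is a simplified, hypothetical failure-injection hook.
type failer interface {
	Ready(s *settings) error   // runs while the cluster is healthy
	Recover(s *settings) error // may run while other failures are active
}

// pauseFailer disables the detector and no longer re-enables it in Recover.
type pauseFailer struct{}

func (pauseFailer) Ready(s *settings) error   { return s.set(stallKey, "false") }
func (pauseFailer) Recover(s *settings) error { return nil }

// oldPauseFailer models the previous behaviour: re-enabling in Recover.
type oldPauseFailer struct{ pauseFailer }

func (oldPauseFailer) Recover(s *settings) error { return s.set(stallKey, "true") }

// diskStallFailer needs the detector on, so it (re-)enables it in Ready.
type diskStallFailer struct{}

func (diskStallFailer) Ready(s *settings) error   { return s.set(stallKey, "true") }
func (diskStallFailer) Recover(s *settings) error { return nil }

// runCycle drives one failure cycle. When chaos is true, a concurrent
// failure leaves the settings store unavailable during Recover.
func runCycle(f failer, s *settings, chaos bool) error {
	s.available = true
	if err := f.Ready(s); err != nil {
		return fmt.Errorf("ready: %w", err)
	}
	s.available = !chaos
	defer func() { s.available = true }()
	if err := f.Recover(s); err != nil {
		return fmt.Errorf("recover: %w", err)
	}
	return nil
}
```

Under chaos, `runCycle(oldPauseFailer{}, s, true)` returns a recover error, while `pauseFailer` and `diskStallFailer` complete the cycle, and the detector ends up enabled after `diskStallFailer` runs.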
Resolved by #107251.
roachtest.failover/chaos/read-only failed with artifacts on release-23.1 @ 57c94bfe124fd08b948caa257acd3cb1e7cf1667:
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=gce
ROACHTEST_cpu=2
ROACHTEST_encrypted=false
ROACHTEST_fs=ext4
ROACHTEST_localSSD=false
ROACHTEST_ssd=0
Jira issue: CRDB-29675
Epic CRDB-27234