Evict-leader scheduler is not shown after PD leader recovers from failure #4707
Labels
affects-6.0
affects-6.3
affects-6.4
affects-6.6
affects-7.0
affects-7.1
This bug affects the 7.1.x(LTS) versions.
found/automation
severity/major
type/bug
The issue is confirmed as a bug.
Bug Report
What did you do?
1. Add evict-leader-scheduler for two TiKV stores.
2. Inject a "PD leader instance down" chaos fault.
3. After more than 5 minutes, show the scheduler config: evict-leader-scheduler is not found. Trying to remove it at this point also returns 404.
4. After several hours, show the schedulers again: evict-leader-scheduler exists.
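The steps above can be sketched as a small check. This is only an illustrative sketch, not the test harness that produced this report: `run_pd_ctl` and `inject_chaos` are hypothetical callables supplied by the caller (the first runs pd-ctl with the given arguments and returns its output, the second takes the PD leader down and waits for recovery). The pd-ctl subcommands and the store IDs 4 and 5 are taken from the test logs below.

```python
from typing import Callable, Sequence


def evict_leader_visible_after_failover(
    run_pd_ctl: Callable[[Sequence[str]], str],  # hypothetical pd-ctl runner
    inject_chaos: Callable[[], None],            # hypothetical chaos hook
    store_ids: Sequence[int] = (4, 5),           # store IDs seen in the logs
) -> bool:
    """Return True if evict-leader-scheduler is still visible after failover."""
    # Step 1: add evict-leader-scheduler for each store.
    for store_id in store_ids:
        out = run_pd_ctl(
            ["scheduler", "add", "evict-leader-scheduler", str(store_id)]
        )
        if "Success" not in out:
            raise RuntimeError(f"add failed for store {store_id}: {out}")

    # Step 2: take the PD leader down and let it recover.
    inject_chaos()

    # Step 3: query the scheduler config; the bug shows up as a 404 here.
    out = run_pd_ctl(["scheduler", "config", "evict-leader-scheduler"])
    return "scheduler not found" not in out
```

On an affected build, this check would be expected to return `False` at step 3, matching the behavior reported here.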
What did you expect to see?
In step 3, evict-leader-scheduler should be shown.
What did you see instead?
In step 3, evict-leader-scheduler was not shown.
What version of PD are you using (pd-server -V)?
/ # /pd-server -V
Release Version: v5.5.0-alpha-72-gcc256b5e
Edition: Community
Git Commit Hash: cc256b5
Git Branch: master
UTC Build Time: 2022-03-01 08:26:57
Test logs:
[2022/03/04 13:20:41.459 +08:00] [INFO] [pdutil.go:105] ["/pd-ctl scheduler remove evict-leader-scheduler:[404] "[PD:scheduler:ErrSchedulerNotFound]scheduler not found""]
2022-03-04T13:20:41.729+0800 INFO k8s/client.go:107 it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
[2022/03/04 13:20:42.163 +08:00] [INFO] [pdutil.go:105] ["/pd-ctl scheduler add evict-leader-scheduler 4:Success!"]
[2022/03/04 13:20:42.223 +08:00] [INFO] [check.go:471] ["current leader:"] [tc-tikv-1=4007]
[2022/03/04 13:20:52.283 +08:00] [INFO] [check.go:471] ["current leader:"] [tc-tikv-1=4007.5]
[2022/03/04 13:21:02.338 +08:00] [INFO] [check.go:471] ["current leader:"] [tc-tikv-1=4008]
[2022/03/04 13:21:12.406 +08:00] [INFO] [check.go:471] ["current leader:"] [tc-tikv-1=4008]
[2022/03/04 13:21:22.479 +08:00] [INFO] [check.go:471] ["current leader:"] [tc-tikv-1=2004]
[2022/03/04 13:21:32.553 +08:00] [INFO] [check.go:471] ["current leader:"] [tc-tikv-1=0]
2022-03-04T13:21:32.553+0800 INFO k8s/client.go:107 it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
[2022/03/04 13:21:33.108 +08:00] [INFO] [pdutil.go:105] ["/pd-ctl scheduler add evict-leader-scheduler 5:Success!"]
[2022/03/04 13:21:33.162 +08:00] [INFO] [check.go:471] ["current leader:"] [tc-tikv-3=5352]
[2022/03/04 13:21:43.236 +08:00] [INFO] [check.go:471] ["current leader:"] [tc-tikv-3=5352]
[2022/03/04 13:21:53.315 +08:00] [INFO] [check.go:471] ["current leader:"] [tc-tikv-3=5351.5]
[2022/03/04 13:22:03.380 +08:00] [INFO] [check.go:471] ["current leader:"] [tc-tikv-3=5349.5]
[2022/03/04 13:22:13.436 +08:00] [INFO] [check.go:471] ["current leader:"] [tc-tikv-3=5349.5]
[2022/03/04 13:22:23.498 +08:00] [INFO] [check.go:471] ["current leader:"] [tc-tikv-3=2674]
[2022/03/04 13:22:33.564 +08:00] [INFO] [check.go:471] ["current leader:"] [tc-tikv-3=0]
[2022/03/04 13:22:33.564 +08:00] [INFO] [chaos.go:358] ["fault will last for"] [duration=2m0s]
[2022/03/04 13:22:34.056 +08:00] [INFO] [chaos.go:86] ["Run chaos"] [name="pd leader"] [selectors="[testbed-oltp-hm-7wksp/tc-pd-0]"] [experiment="{"Duration":"","Scheduler":null}"]
[2022/03/04 13:24:34.128 +08:00] [INFO] [chaos.go:151] ["Clean chaos"] [name="pd leader"] [chaosId="ns=testbed-oltp-hm-7wksp,kind=failure,name=pod-failure-qcsfgfnq,spec=&k8s.ChaosIdentifier{Namespace:"testbed-oltp-hm-7wksp", Name:"pod-failure-qcsfgfnq", Spec:FailureExperimentSpec{Duration: "", Scheduler: }}"]
2022-03-04T13:26:34.329+0800 INFO k8s/client.go:107 it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
[2022/03/04 13:26:34.750 +08:00] [INFO] [pdutil.go:105] ["/pd-ctl scheduler config evict-leader-scheduler:[404] scheduler not found"]
2022-03-04T13:26:34.751+0800 INFO k8s/client.go:107 it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
[2022/03/04 13:26:35.188 +08:00] [INFO] [pdutil.go:105] ["/pd-ctl scheduler remove evict-leader-scheduler:[404] "[PD:scheduler:ErrSchedulerNotFound]scheduler not found""]
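The symptom in the logs above (a scheduler that was added successfully and is later reported as not found) can be spotted mechanically. The following is a minimal sketch that assumes the harness always logs pd-ctl calls in the `["/pd-ctl scheduler <action> <name>[ <store-id>]:<result>"]` shape seen here; the function name and regular expression are illustrative and not part of PD or the test harness.

```python
import re
from typing import Iterable, List

# Matches harness lines such as:
#   [...] [pdutil.go:105] ["/pd-ctl scheduler add evict-leader-scheduler 4:Success!"]
PD_CTL_RE = re.compile(
    r'\["/pd-ctl scheduler (\w+) ([\w-]+)(?: (\d+))?:(.*)"\]\s*$'
)


def find_lost_schedulers(lines: Iterable[str]) -> List[str]:
    """Return schedulers that were added successfully but later not found."""
    added = set()
    lost: List[str] = []
    for line in lines:
        match = PD_CTL_RE.search(line)
        if not match:
            continue  # not a pd-ctl scheduler line
        action, name, _store_id, result = match.groups()
        if action == "add" and result.startswith("Success"):
            added.add(name)
        elif "scheduler not found" in result and name in added:
            if name not in lost:
                lost.append(name)
    return lost
```

A "not found" result that precedes any successful add, such as the initial `scheduler remove` at 13:20:41, is ignored because nothing had been added at that point.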