br restore failed with error “attempt:0,error:rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout” when inject pd leader network partition #51553
Labels
component/br
This issue is related to BR of TiDB.
may-affects-5.4
This bug maybe affects 5.4.x versions.
may-affects-6.1
may-affects-6.5
may-affects-7.1
may-affects-7.5
severity/major
type/bug
The issue is confirmed as a bug.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
1、cluster deploy with 3 pd
2、br restore
3、inject network partition between pd leader and all other pod
2. What did you expect to see? (Required)
br restore can success
3. What did you see instead (Required)
br restore failed when inject network partition between pd leader and all other pod
Run BrRestore failed.
cmd start at 2024-03-05 12:52:08
cmd failed at 2024-03-05 12:56:17
stdout:
Detail BR log in /tmp/br.log.2024-03-05T04.52.08Z
{level:warn,ts:2024-03-05T04:53:22.013381Z,logger:etcd-client,caller:[email protected]/retry_interceptor.go:62,msg:retrying of unary invoker failed,target:etcd-endpoints://0xc000cd4e00/tc-pd.ha-test-br-restore-tps-7081896-1-63:2379,attempt:0,error:rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout}
br logs:
br.log.2024-03-05T04.52.08Z.tar.gz
4. What is your TiDB version? (Required)
./tidb-server -V
Release Version: v8.0.0-alpha
Edition: Community
Git Commit Hash: 5ac8a5b
Git Branch: heads/refs/tags/v8.0.0-alpha
UTC Build Time: 2024-03-04 11:47:38
GoVersion: go1.21.6
Race Enabled: false
Check Table Before Drop: false
Store: unistore
2024-03-05T07:19:50.565+0800
The text was updated successfully, but these errors were encountered: