
br restore failed with error “attempt:0,error:rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout” when a network partition is injected on the PD leader #51553

Closed
Lily2025 opened this issue Mar 6, 2024 · 4 comments · Fixed by #51578
Labels
component/br This issue is related to BR of TiDB. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 may-affects-7.5 severity/major type/bug The issue is confirmed as a bug.

Comments


Lily2025 commented Mar 6, 2024

Bug Report


1. Minimal reproduce step (Required)

1. Deploy a cluster with 3 PD nodes.
2. Run br restore.
3. Inject a network partition between the PD leader and all other pods.

2. What did you expect to see? (Required)

br restore succeeds.

3. What did you see instead (Required)

br restore failed after a network partition was injected between the PD leader and all other pods:

Run BrRestore failed.
cmd start at 2024-03-05 12:52:08
cmd failed at 2024-03-05 12:56:17
stdout:
Detail BR log in /tmp/br.log.2024-03-05T04.52.08Z
{level:warn,ts:2024-03-05T04:53:22.013381Z,logger:etcd-client,caller:[email protected]/retry_interceptor.go:62,msg:retrying of unary invoker failed,target:etcd-endpoints://0xc000cd4e00/tc-pd.ha-test-br-restore-tps-7081896-1-63:2379,attempt:0,error:rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout}

br logs:
br.log.2024-03-05T04.52.08Z.tar.gz

4. What is your TiDB version? (Required)

./tidb-server -V
Release Version: v8.0.0-alpha
Edition: Community
Git Commit Hash: 5ac8a5b
Git Branch: heads/refs/tags/v8.0.0-alpha
UTC Build Time: 2024-03-04 11:47:38
GoVersion: go1.21.6
Race Enabled: false
Check Table Before Drop: false
Store: unistore
2024-03-05T07:19:50.565+0800

Lily2025 added the type/bug label on Mar 6, 2024

Lily2025 commented Mar 6, 2024

/assign BornChanger


Lily2025 commented Mar 6, 2024

/type bug
/severity major


Lily2025 commented Mar 6, 2024

/assign @Leavrth

seiya-annie added the component/br label on Mar 7, 2024

Leavrth commented Mar 12, 2024

closed by #51578

Regard the gRPC error "context canceled" as a retryable error.
Once the ctx passed to pdclient.GetTS(ctx) is canceled, GetTS returns context.Canceled, so a cancellation of BR's own context surfaces as context.Canceled rather than as a gRPC status error. Therefore, for the PD backoff on the BR side, it is safe to treat the gRPC error "context canceled" as retryable.

Leavrth closed this as completed on Mar 12, 2024
Leavrth linked a pull request on Mar 12, 2024 that will close this issue