changefeed stucks after injecting TiKV failure #9595

Closed
fubinzh opened this issue Aug 17, 2023 · 5 comments · Fixed by #9597
Labels
affects-6.5: This bug affects the 6.5.x (LTS) versions.
affects-7.1: This bug affects the 7.1.x (LTS) versions.
area/ticdc: Issues or PRs related to TiCDC.
found/automation: Bugs found by automation cases.
severity/critical
type/bug: The issue is confirmed as a bug.

Comments

@fubinzh

fubinzh commented Aug 17, 2023

What did you do?

  1. Create a redo log changefeed
cdc cli changefeed create "--server=127.0.0.1:8301" "--sink-uri=mysql://root:@downstream.cdc-testbed-tps-1892534-1-706:3306" "--changefeed-id=redo-enable-cdc-all-node-restart-sync" "--config=/tmp/changefeed.toml"

$ cat /tmp/changefeed.toml 
[consistent]
level = "eventual"
storage = "s3://tmp/test-infra-redolog/redo-enable-cdc-all-node-restart-sync9ac9a259-7f35-4e58-926d-0d6d216693a9?access-key=minioadmin&secret-access-key=minioadmin&endpoint=http://minio-peer:9000&force-path-style=true"
max-log-size = 64
  2. Run the workload
  3. Inject failure for all TiKV nodes
[2023/08/16 19:45:41.280 +00:00] [INFO] [step.go:44] ["kvFailure: duration=1m0s,  allTiKV=true, failNode=0"]
[2023/08/16 19:45:41.280 +00:00] [INFO] [tikv_chaos.go:44] ["Inject TiKV failure for all nodes"]
[2023/08/16 19:46:41.385 +00:00] [INFO] [chaos.go:93] ["run chaos finished"]
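
For readers unfamiliar with the redo log storage setting in step 1, the following is a minimal sketch that splits such a URI into its parts using only the Python standard library. The helper name `describe_storage_uri` is hypothetical and is not part of TiCDC; the sketch only shows how the bucket, prefix, and query options in the URI can be read, and makes no claim about how TiCDC itself parses it.

```python
from urllib.parse import parse_qsl, urlsplit


def describe_storage_uri(uri):
    """Split a storage URI into scheme, bucket, key prefix, and query options.

    Hypothetical helper for illustration only; not a TiCDC API.
    """
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,                   # e.g. "s3"
        "bucket": parts.netloc,                   # first component after "//"
        "prefix": parts.path.lstrip("/"),         # object key prefix
        "options": dict(parse_qsl(parts.query)),  # access keys, endpoint, ...
    }


if __name__ == "__main__":
    # URI taken from the changefeed config in this report.
    uri = (
        "s3://tmp/test-infra-redolog/"
        "redo-enable-cdc-all-node-restart-sync9ac9a259-7f35-4e58-926d-0d6d216693a9"
        "?access-key=minioadmin&secret-access-key=minioadmin"
        "&endpoint=http://minio-peer:9000&force-path-style=true"
    )
    for key, value in describe_storage_uri(uri).items():
        print(key, "=", value)
```

Read this way, the config points at bucket `tmp` on a MinIO endpoint (`http://minio-peer:9000`) with path-style addressing enabled.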

What did you expect to see?

The changefeed should not get stuck after TiKV recovers from the failure.

What did you see instead?

The redo log changefeed gets stuck.


"[meta_manager.go:334] [\"Redo meta has not changed for a long time, owner may be stuck\"] [namespace=default] [changefeed=redo-enable-cdc-all-node-restart-sync] [lastFlushTime=9m21.99436182s] [meta=\"{\\\"CheckpointTs\\\":443604045771833536,\\\"ResolvedTs\\\":443604045771833536}\"]"
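
To relate the stuck CheckpointTs in the log above to wall-clock time, the timestamp can be decoded. The sketch below assumes the usual TiDB/PD TSO layout, in which the low 18 bits are a logical counter and the remaining high bits are a physical time in milliseconds since the Unix epoch. The function names `split_tso` and `tso_to_utc` are illustrative and are not taken from the TiCDC code base.

```python
from datetime import datetime, timedelta, timezone

# Assumption: TSO = (physical milliseconds << 18) | logical counter.
LOGICAL_BITS = 18


def split_tso(ts):
    """Return (physical_ms, logical) for a TSO-style timestamp."""
    physical_ms = ts >> LOGICAL_BITS
    logical = ts & ((1 << LOGICAL_BITS) - 1)
    return physical_ms, logical


def tso_to_utc(ts):
    """Convert the physical part of a TSO to a timezone-aware UTC datetime."""
    physical_ms, _ = split_tso(ts)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=physical_ms)


if __name__ == "__main__":
    checkpoint_ts = 443604045771833536  # CheckpointTs from the log line above
    print(split_tso(checkpoint_ts))
    print(tso_to_utc(checkpoint_ts).isoformat())
```

Under that assumption, the CheckpointTs decodes to 2023-08-16 19:45:40.426 UTC, about one second before the "Inject TiKV failure for all nodes" entry at 19:45:41.280. That is consistent with the checkpoint having stopped advancing when the failure was injected and not resuming after the chaos finished at 19:46:41.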

Versions of the cluster

CDC version:

"[version.go:47] [\"Welcome to Change Data Capture (CDC)\"] [release-version=v7.4.0-alpha] [git-hash=dcfcb43a99bacd7639156421993a3a95284c5f45] [git-branch=heads/refs/tags/v7.4.0-alpha] [utc-build-time=\"2023-08-16 11:36:28\"] [go-version=\"go version go1.21.0 linux/amd64\"] [failpoint-build=false]"
@fubinzh fubinzh added area/ticdc Issues or PRs related to TiCDC. type/bug The issue is confirmed as a bug. labels Aug 17, 2023
@nongfushanquan
Contributor

/assign @CharlesCheung96

@hicqu
Contributor

hicqu commented Aug 17, 2023

It can be reproduced in my local environment, even without redo log enabled.
It disappears after I revert #9519, so I guess it was introduced by #9519.

@fubinzh
Author

fubinzh commented Aug 17, 2023

This issue is also seen when running a MySQL sink changefeed and injecting TiKV failures (case: tikv_unavailable_sync).

@fubinzh
Author

fubinzh commented Aug 17, 2023

/found automation
/severity critical

@fubinzh fubinzh changed the title Redo log changefeed stucks after injecting KV failure changefeed stucks after injecting KV failure Aug 17, 2023
@fubinzh fubinzh changed the title changefeed stucks after injecting KV failure changefeed stucks after injecting TiKV failure Aug 17, 2023
@CharlesCheung96 CharlesCheung96 added affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. and removed may-affects-5.2 may-affects-5.3 may-affects-5.4 may-affects-6.1 may-affects-6.5 may-affects-7.1 labels Aug 18, 2023
@fubinzh
Author

fubinzh commented Aug 18, 2023

This issue is also seen in TiKV network partition and CDC scaling scenarios.

4 participants