
dm-worker hang when etcd watch ch closed #4548

Closed
Ehco1996 opened this issue Feb 10, 2022 · 2 comments
Assignees: Ehco1996
Labels: affects-5.3, affects-5.4 (This bug affects the 5.4.x (LTS) versions.), affects-6.0, area/dm (Issues or PRs related to DM.), severity/major, type/bug (The issue is confirmed as a bug.)

Comments

Ehco1996 (Contributor) commented Feb 10, 2022

What did you do?

One dm-worker node was running in a weak network environment,

so the WatchSourceBound goroutine exited via this

return

How did I determine that it exited on this line? In the logs we can see:

[2022/01/03 15:49:16.622 +08:00] [INFO] [server.go:581] ["receive source bound"] [bound="{\"source\":\"\",\"worker\":\"dm-xxxx\"}"] ["is deleted"=true]

Then some unbound-source operations were performed:

[2022/01/03 16:22:30.662 +08:00] [INFO] [server.go:602] ["handleSourceBound will quit now"]

Because there is no error log such as "observeSourceBound is failed and will quit now", we can be sure that handleSourceBound exited because sourceBoundCh or sourceBoundErrCh was closed.

But the keepalive goroutine is still working (keepalive and WatchSourceBound may use different gRPC streams):

[2022/01/03 16:22:31.663 +08:00] [INFO] [join.go:93] ["start to keepalive with master"]
[2022/01/03 16:24:49.471 +08:00] [INFO] [keepalive.go:139] ["keep alive channel is closed"]
[2022/01/03 16:24:51.517 +08:00] [WARN] [keepalive.go:122] ["fail to revoke lease"] []
[2022/01/03 16:24:51.517 +08:00] [WARN] [join.go:105] ["keepalive with master goroutine paused"] []
[2022/01/03 16:24:51.517 +08:00] [WARN] [server.go:558] ["worker has not been started, no need to stop"] [source=]
[2022/01/03 16:24:52.517 +08:00] [INFO] [join.go:93] ["start to keepalive with master"]

As a result, dm-master thinks this worker is alive, but the worker actually hangs forever.

How to reproduce this?

See #4553.

What did you expect to see?

The dm-worker should go offline when observeSourceBound quits.

What did you see instead?

The response of dmctl query-status was:

❯ dmctl query-status
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": false,
            "msg": "[code=40070:class=dm-worker:scope=internal:level=high], Message: no mysql source is being handled in the worker",
            "sourceStatus": {
                "source": "mysql-replica-01",
                "worker": "worker1",
                "result": null,
                "relayStatus": null
            },
            "subTaskStatus": [
            ]
        }
    ]
}

Versions of the cluster

master

current status of DM cluster (execute query-status <task-name> in dmctl)

(paste current status of DM cluster here)
Ehco1996 added the type/bug (The issue is confirmed as a bug.) and area/dm (Issues or PRs related to DM.) labels on Feb 10, 2022
Ehco1996 self-assigned this on Feb 11, 2022
Ehco1996 (Contributor, Author) commented:

@XuJianxu I am not sure about this bug's severity level; could you please add a label for me?

Ehco1996 (Contributor, Author) commented:

Closing since there are no updates for now; see more here.
