
dm-worker hang when etcd watch ch closed #4548

Closed
Ehco1996 opened this issue Feb 10, 2022 · 2 comments
Assignees: Ehco1996
Labels: affects-5.3, affects-5.4 (This bug affects the 5.4.x (LTS) versions.), affects-6.0, area/dm (Issues or PRs related to DM.), severity/major, type/bug (The issue is confirmed as a bug.)

Comments

Ehco1996 (Contributor) commented Feb 10, 2022

What did you do?

One dm-worker node was running in a weak network environment,

so the WatchSourceBound goroutine exited via this

return

How did I determine that it exited on this line? In the logs we can see:

[2022/01/03 15:49:16.622 +08:00] [INFO] [server.go:581] ["receive source bound"] [bound="{\"source\":\"\",\"worker\":\"dm-xxxx\"}"] ["is deleted"=true]

Then some unbound-source operations were performed:

[2022/01/03 16:22:30.662 +08:00] [INFO] [server.go:602] ["handleSourceBound will quit now"]

Because there is no error log such as "observeSourceBound is failed and will quit now", we can be sure that handleSourceBound exited because sourceBoundCh or sourceBoundErrCh was closed.

But the keepalive goroutine is still working (keepalive and WatchSourceBound may use different gRPC streams):

[2022/01/03 16:22:31.663 +08:00] [INFO] [join.go:93] ["start to keepalive with master"]
[2022/01/03 16:24:49.471 +08:00] [INFO] [keepalive.go:139] ["keep alive channel is closed"]
[2022/01/03 16:24:51.517 +08:00] [WARN] [keepalive.go:122] ["fail to revoke lease"] []
[2022/01/03 16:24:51.517 +08:00] [WARN] [join.go:105] ["keepalive with master goroutine paused"] []
[2022/01/03 16:24:51.517 +08:00] [WARN] [server.go:558] ["worker has not been started, no need to stop"] [source=]
[2022/01/03 16:24:52.517 +08:00] [INFO] [join.go:93] ["start to keepalive with master"]

As a result, dm-master thinks this worker is alive, but the worker actually hangs forever.

How to reproduce this?

See #4553.

What did you expect to see?

The dm-worker should go offline when observeSourceBound quits.

What did you see instead?

The response of dmctl query-status was:

❯ dmctl query-status
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": false,
            "msg": "[code=40070:class=dm-worker:scope=internal:level=high], Message: no mysql source is being handled in the worker",
            "sourceStatus": {
                "source": "mysql-replica-01",
                "worker": "worker1",
                "result": null,
                "relayStatus": null
            },
            "subTaskStatus": [
            ]
        }
    ]
}

Versions of the cluster

master

current status of DM cluster (execute query-status <task-name> in dmctl)

(paste current status of DM cluster here)
Ehco1996 added the type/bug (The issue is confirmed as a bug.) and area/dm (Issues or PRs related to DM.) labels on Feb 10, 2022
Ehco1996 self-assigned this on Feb 11, 2022
Ehco1996 (Contributor, Author) commented:

@XuJianxu I am not sure about this bug's severity level; could you please add a label for me?

Ehco1996 (Contributor, Author) commented:

Closing since there are no updates for now; see more here.
