dm-worker hang when etcd watch ch closed #4548
Labels
affects-5.3
affects-5.4
This bug affects the 5.4.x(LTS) versions.
affects-6.0
area/dm
Issues or PRs related to DM.
severity/major
type/bug
The issue is confirmed as a bug.
What did you do?
One dm-worker node was running in a weak network environment,
so the WatchSourceBound goroutine exited at tiflow/dm/pkg/ha/bound.go, line 273 in 1c1015b.
How did I determine that it exited on this line? This can be seen in the logs:
because there is no error log like "observeSourceBound is failed and will quit now", we can be sure that handleSourceBound exited because sourceBoundCh or sourceBoundErrCh was closed.
However, the keepalive goroutine was still working (keepalive and WatchSourceBound may use different gRPC streams).
As a result, dm-master thinks this worker is alive, but the worker actually hangs forever.
How to reproduce this?
See #4553.
What did you expect to see?
dm-worker should go offline when observeSourceBound quits.
What did you see instead?
so the response of dmctl query-status is:
Versions of the cluster
master
current status of DM cluster (execute query-status <task-name> in dmctl): (paste current status of DM cluster here)