[Bug]: Milvus Container Exited After few hours unexpectedly #38011
Comments
@AhsanAli1116
Check your logs and disk usage to see whether etcd is slow.
/assign @AhsanAli1116
My Milvus container becomes unhealthy without any error. I have upgraded the storage from HDD to SSD.
"Health": {
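The `"Health": {` fragment above appears to come from `docker inspect` output for the container. As a minimal sketch, assuming the standard `State.Health` layout that Docker reports (`Status`, `FailingStreak`, and a `Log` list whose entries carry `ExitCode` and `Output`), a small Python helper can summarize that JSON. The function name `summarize_health` and the sample payload are hypothetical and are not taken from this issue.

```python
import json


def summarize_health(inspect_json: str) -> dict:
    """Summarize the State.Health section of `docker inspect <container>` output.

    Assumes the usual layout: a JSON list with one object per container.
    """
    containers = json.loads(inspect_json)
    health = containers[0].get("State", {}).get("Health") or {}
    probes = health.get("Log") or []
    failed = [p for p in probes if p.get("ExitCode", 0) != 0]
    return {
        "status": health.get("Status", "unknown"),
        "failing_streak": health.get("FailingStreak", 0),
        "failed_probes": len(failed),
        # Output of the most recent failed probe, if any.
        "last_failure_output": failed[-1].get("Output", "") if failed else "",
    }


# Hypothetical payload for illustration only; not the reporter's actual output.
EXAMPLE = json.dumps([{
    "State": {"Health": {
        "Status": "unhealthy",
        "FailingStreak": 3,
        "Log": [
            {"ExitCode": 0, "Output": "OK"},
            {"ExitCode": 1, "Output": "connection refused"},
        ],
    }}
}])

summary = summarize_health(EXAMPLE)
print(summary["status"], summary["failing_streak"])
```

The real input would be the output of `docker inspect` for the standalone container; the probe output usually says more about why the container was marked unhealthy than the status alone.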
Is there an existing issue for this?
Environment
Current Behavior
My Milvus container stops unexpectedly after a few hours. I have experienced this many times on the remote machine where it is deployed. The MinIO and etcd containers stay healthy, but the Milvus standalone container shuts down.
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
[2024/11/24 20:16:48.398 +00:00] [INFO] [msgdispatcher/manager.go:219] ["merge done"] [role=datanode] [nodeID=1] [vchannel="{"by-dev-rootcoord-dml_0_454118027394023460v0":{}}"] [mergeTs=454159081932324865]
[2024/11/24 20:16:48.398 +00:00] [INFO] [msgdispatcher/dispatcher.go:206] ["begin to work"] [pchannel=by-dev-rootcoord-dml_0] [isMain=true]
[2024/11/24 20:16:48.399 +00:00] [INFO] [msgdispatcher/manager.go:198] ["start merging..."] [role=datanode] [nodeID=1] [vchannel="{"by-dev-rootcoord-dml_1_454118027394023500v0":{}}"]
[2024/11/24 20:16:48.400 +00:00] [INFO] [msgdispatcher/dispatcher.go:178] ["get signal"] [pchannel=by-dev-rootcoord-dml_1] [signal=pause] [isMain=true]
[2024/11/24 20:16:48.400 +00:00] [INFO] [msgdispatcher/dispatcher.go:211] ["stop working"] [pchannel=by-dev-rootcoord-dml_1] [isMain=true]
[2024/11/24 20:16:48.400 +00:00] [INFO] [msgdispatcher/dispatcher.go:201] ["handle signal done"] [pchannel=by-dev-rootcoord-dml_1] [signal=pause] [isMain=true]
[2024/11/24 20:16:48.400 +00:00] [INFO] [msgdispatcher/dispatcher.go:178] ["get signal"] [pchannel=by-dev-rootcoord-dml_1] [signal=pause] [isMain=false]
[2024/11/24 20:16:48.400 +00:00] [INFO] [msgdispatcher/dispatcher.go:211] ["stop working"] [pchannel=by-dev-rootcoord-dml_1] [isMain=false]
[2024/11/24 20:16:48.400 +00:00] [INFO] [msgdispatcher/dispatcher.go:201] ["handle signal done"] [pchannel=by-dev-rootcoord-dml_1] [signal=pause] [isMain=false]
[2024/11/24 20:16:48.400 +00:00] [INFO] [msgdispatcher/dispatcher.go:150] ["add new target"] [vchannel=by-dev-rootcoord-dml_1_454118027394023500v0] [isMain=true]
[2024/11/24 20:16:48.400 +00:00] [INFO] [msgdispatcher/dispatcher.go:178] ["get signal"] [pchannel=by-dev-rootcoord-dml_1] [signal=terminate] [isMain=false]
[2024/11/24 20:16:48.400 +00:00] [INFO] [msgstream/mq_msgstream.go:218] ["start to close mq msg stream"] ["producer num"=0] ["consumer num"=1]
[2024/11/24 20:16:48.400 +00:00] [INFO] [client/client_impl.go:169] ["Consumer MsgMutex closed"]
[2024/11/24 20:16:48.400 +00:00] [INFO] [msgdispatcher/dispatcher.go:201] ["handle signal done"] [pchannel=by-dev-rootcoord-dml_1] [signal=terminate] [isMain=false]
[2024/11/24 20:16:48.400 +00:00] [INFO] [msgdispatcher/dispatcher.go:178] ["get signal"] [pchannel=by-dev-rootcoord-dml_1] [signal=resume] [isMain=true]
[2024/11/24 20:16:48.400 +00:00] [INFO] [msgdispatcher/dispatcher.go:201] ["handle signal done"] [pchannel=by-dev-rootcoord-dml_1] [signal=resume] [isMain=true]
[2024/11/24 20:16:48.400 +00:00] [INFO] [msgdispatcher/manager.go:219] ["merge done"] [role=datanode] [nodeID=1] [vchannel="{"by-dev-rootcoord-dml_1_454118027394023500v0":{}}"] [mergeTs=454159081932324865]
[2024/11/24 20:16:48.400 +00:00] [INFO] [msgdispatcher/dispatcher.go:206] ["begin to work"] [pchannel=by-dev-rootcoord-dml_1] [isMain=true]
[2024/11/24 20:16:48.456 +00:00] [INFO] [server/rocksmq_impl.go:302] ["init the latest message id done"] [topicName=by-dev-rootcoord-dml_0] [msgID=454118028643398218]
[2024/11/24 20:16:48.456 +00:00] [INFO] [server/rocksmq_impl.go:302] ["init the latest message id done"] [topicName=by-dev-rootcoord-dml_1] [msgID=454118028899512901]
[2024/11/24 20:16:48.563 +00:00] [WARN] [rootcoord/root_coord.go:1639] ["failed to updateTimeTick"] [traceID=9c72edf23fdede65af8befc395b99c85] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 1"]
[2024/11/24 20:16:48.564 +00:00] [WARN] [proxy/proxy.go:378] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="skip ChannelTimeTickMsg from un-recognized session 1"]
[2024/11/24 20:16:48.661 +00:00] [WARN] [datacoord/services.go:1376] ["node is not matched with channel"] [traceID=7f2d9d0b32313c37e6f4847efec7bfe2] [channel=by-dev-rootcoord-dml_0_454118027394023460v0] [nodeID=1]
[2024/11/24 20:16:48.661 +00:00] [WARN] [datacoord/services.go:1376] ["node is not matched with channel"] [traceID=7f2d9d0b32313c37e6f4847efec7bfe2] [channel=by-dev-rootcoord-dml_1_454118027394023500v0] [nodeID=1]
[2024/11/24 20:16:48.661 +00:00] [WARN] [datacoord/services.go:1435] ["node is not matched with channel"] [traceID=1ecabb119e05e645bb5ceaed585667b2] [channel=by-dev-rootcoord-dml_0_454118027394023460v0] [sourceID=1] [ts=454159082037182465]
[2024/11/24 20:16:48.661 +00:00] [WARN] [datacoord/services.go:1435] ["node is not matched with channel"] [traceID=1ecabb119e05e645bb5ceaed585667b2] [channel=by-dev-rootcoord-dml_1_454118027394023500v0] [sourceID=1] [ts=454159082037182465]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgdispatcher/manager.go:198] ["start merging..."] [role=querynode] [nodeID=1] [vchannel="{"by-dev-rootcoord-dml_1_454118027394023500v0":{}}"]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgdispatcher/dispatcher.go:178] ["get signal"] [pchannel=by-dev-rootcoord-dml_1] [signal=pause] [isMain=true]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgdispatcher/dispatcher.go:211] ["stop working"] [pchannel=by-dev-rootcoord-dml_1] [isMain=true]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgdispatcher/dispatcher.go:201] ["handle signal done"] [pchannel=by-dev-rootcoord-dml_1] [signal=pause] [isMain=true]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgdispatcher/dispatcher.go:178] ["get signal"] [pchannel=by-dev-rootcoord-dml_1] [signal=pause] [isMain=false]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgdispatcher/dispatcher.go:211] ["stop working"] [pchannel=by-dev-rootcoord-dml_1] [isMain=false]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgdispatcher/dispatcher.go:201] ["handle signal done"] [pchannel=by-dev-rootcoord-dml_1] [signal=pause] [isMain=false]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgdispatcher/dispatcher.go:150] ["add new target"] [vchannel=by-dev-rootcoord-dml_1_454118027394023500v0] [isMain=true]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgdispatcher/dispatcher.go:178] ["get signal"] [pchannel=by-dev-rootcoord-dml_1] [signal=terminate] [isMain=false]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgstream/mq_msgstream.go:218] ["start to close mq msg stream"] ["producer num"=0] ["consumer num"=1]
[2024/11/24 20:16:48.662 +00:00] [INFO] [client/client_impl.go:169] ["Consumer MsgMutex closed"]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgdispatcher/dispatcher.go:201] ["handle signal done"] [pchannel=by-dev-rootcoord-dml_1] [signal=terminate] [isMain=false]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgdispatcher/dispatcher.go:178] ["get signal"] [pchannel=by-dev-rootcoord-dml_1] [signal=resume] [isMain=true]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgdispatcher/dispatcher.go:201] ["handle signal done"] [pchannel=by-dev-rootcoord-dml_1] [signal=resume] [isMain=true]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgdispatcher/manager.go:219] ["merge done"] [role=querynode] [nodeID=1] [vchannel="{"by-dev-rootcoord-dml_1_454118027394023500v0":{}}"] [mergeTs=454159082037182465]
[2024/11/24 20:16:48.662 +00:00] [INFO] [msgdispatcher/dispatcher.go:206] ["begin to work"] [pchannel=by-dev-rootcoord-dml_1] [isMain=true]
[2024/11/24 20:16:48.764 +00:00] [WARN] [rootcoord/root_coord.go:1639] ["failed to updateTimeTick"] [traceID=387b3ab163d1146f5aa30ea9d2648ba4] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 1"]
[2024/11/24 20:16:48.764 +00:00] [WARN] [proxy/proxy.go:378] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="skip ChannelTimeTickMsg from un-recognized session 1"]
[2024/11/24 20:16:48.856 +00:00] [INFO] [server/rocksmq_impl.go:302] ["init the latest message id done"] [topicName=by-dev-rootcoord-dml_1] [msgID=454118028899512903]
[2024/11/24 20:16:48.898 +00:00] [WARN] [sessionutil/session_util.go:553] ["fail to retry keepAliveOnce"] [serverName=querycoord] [LeaseID=7587882946848335745] [error="etcdserver: requested lease not found"]
[2024/11/24 20:16:48.898 +00:00] [WARN] [sessionutil/session_util.go:553] ["fail to retry keepAliveOnce"] [serverName=indexcoord] [LeaseID=7587882946848335687] [error="etcdserver: requested lease not found"]
[2024/11/24 20:16:48.898 +00:00] [WARN] [sessionutil/session_util.go:553] ["fail to retry keepAliveOnce"] [serverName=datacoord] [LeaseID=7587882946848335693] [error="etcdserver: requested lease not found"]
[2024/11/24 20:16:48.898 +00:00] [WARN] [sessionutil/session_util.go:882] ["connection lost detected, shuting down"]
[2024/11/24 20:16:48.898 +00:00] [WARN] [sessionutil/session_util.go:882] ["connection lost detected, shuting down"]
[2024/11/24 20:16:48.898 +00:00] [ERROR] [querycoordv2/server.go:159] ["QueryCoord disconnected from etcd, process will exit"] [serverID=1] [stack="github.com/milvus-io/milvus/internal/querycoordv2.(*Server).Register.func1.1\n\t/workspace/source/internal/querycoordv2/server.go:159"]
[2024/11/24 20:16:48.898 +00:00] [WARN] [sessionutil/session_util.go:553] ["fail to retry keepAliveOnce"] [serverName=querynode] [LeaseID=7587882946848335708] [error="etcdserver: requested lease not found"]
[2024/11/24 20:16:48.899 +00:00] [WARN] [sessionutil/session_util.go:882] ["connection lost detected, shuting down"]
[2024/11/24 20:16:48.899 +00:00] [ERROR] [querynodev2/server.go:173] ["Query Node disconnected from etcd, process will exit"] ["Server Id"=1] [stack="github.com/milvus-io/milvus/internal/querynodev2.(*QueryNode).Register.func1\n\t/workspace/source/internal/querynodev2/server.go:173"]
[2024/11/24 20:16:48.899 +00:00] [WARN] [sessionutil/session_util.go:553] ["fail to retry keepAliveOnce"] [serverName=rootcoord] [LeaseID=7587882946848335717] [error="etcdserver: requested lease not found"]
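The WARN and ERROR lines in the excerpt above are easy to miss among the INFO messages. The following is a minimal sketch, assuming every line follows the bracketed `[timestamp] [LEVEL] [source] ...` layout seen in this log, that counts non-INFO lines per source location and records when each first appeared. The names `LINE_RE` and `summarize` are hypothetical helpers, not part of Milvus; the sample lines are copied from the log above.

```python
import re

# Matches the leading "[timestamp] [LEVEL] [file.go:line]" fields of a log line.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]+)\] (?P<rest>.*)$"
)


def summarize(lines, levels=("WARN", "ERROR")):
    """Count lines at the given levels per source location, in order of first appearance."""
    summary = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match["level"] not in levels:
            continue
        key = (match["level"], match["source"])
        entry = summary.setdefault(key, {"count": 0, "first_ts": match["ts"]})
        entry["count"] += 1
    return summary


SAMPLE = [
    r'[2024/11/24 20:16:48.398 +00:00] [INFO] [msgdispatcher/dispatcher.go:206] ["begin to work"] [pchannel=by-dev-rootcoord-dml_0] [isMain=true]',
    r'[2024/11/24 20:16:48.563 +00:00] [WARN] [rootcoord/root_coord.go:1639] ["failed to updateTimeTick"] [traceID=9c72edf23fdede65af8befc395b99c85] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 1"]',
    r'[2024/11/24 20:16:48.898 +00:00] [WARN] [sessionutil/session_util.go:553] ["fail to retry keepAliveOnce"] [serverName=querycoord] [LeaseID=7587882946848335745] [error="etcdserver: requested lease not found"]',
    r'[2024/11/24 20:16:48.898 +00:00] [WARN] [sessionutil/session_util.go:553] ["fail to retry keepAliveOnce"] [serverName=indexcoord] [LeaseID=7587882946848335687] [error="etcdserver: requested lease not found"]',
    r'[2024/11/24 20:16:48.898 +00:00] [WARN] [sessionutil/session_util.go:882] ["connection lost detected, shuting down"]',
    r'[2024/11/24 20:16:48.898 +00:00] [ERROR] [querycoordv2/server.go:159] ["QueryCoord disconnected from etcd, process will exit"] [serverID=1] [stack="github.com/milvus-io/milvus/internal/querycoordv2.(*Server).Register.func1.1\n\t/workspace/source/internal/querycoordv2/server.go:159"]',
]

result = summarize(SAMPLE)
for (level, source), info in result.items():
    print(level, source, info["count"], info["first_ts"])
```

Run against the full log file, this kind of summary makes the order of events visible: the `requested lease not found` warnings from `session_util.go` appear immediately before the components report that they are disconnected from etcd and will exit.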
Anything else?
No response