Nomad allocation getting killed and transferred to another node in the cluster #3289
Comments
@smohankarthik Can you try out the Nomad 0.7 beta on the client? We have improved the client's heartbeat reliability. It would also be good to see the logs from the client around the time the TTL expired.
@dadgar I switched to the beta version and haven't hit the heartbeat issue since. I have a couple of questions.
Nomad version: Nomad v0.7.0 Beta
After upgrading the client, I am now getting a different error: a driver failure caused by an RPC failure and a timeout while the plugin starts.
There are a few heartbeat-related settings on the server: https://www.nomadproject.io/docs/agent/configuration/server.html#heartbeat_grace Increasing the grace setting is probably the most straightforward way to give clients more time to heartbeat.
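To illustrate why a larger grace period helps, here is a deliberately simplified model (an assumption for illustration, not Nomad's actual implementation): the server expects each client to heartbeat within its TTL, and only once the TTL plus `heartbeat_grace` has passed without a heartbeat does it consider the node down, at which point its allocations may be replaced on another node. The names `NodeHeartbeat` and `is_node_down` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeHeartbeat:
    # Hypothetical model of what a server tracks per client node.
    last_heartbeat: float  # time (seconds) of the last successful heartbeat
    ttl: float             # heartbeat TTL the server handed to the client


def is_node_down(node: NodeHeartbeat, now: float, grace: float) -> bool:
    """Simplified rule: the node is treated as down once more than
    TTL + grace seconds have passed since its last heartbeat."""
    return now - node.last_heartbeat > node.ttl + grace


node = NodeHeartbeat(last_heartbeat=100.0, ttl=10.0)
# The next heartbeat would arrive 25 s after the previous one.
print(is_node_down(node, now=125.0, grace=10.0))  # True: 25 > 10 + 10
print(is_node_down(node, now=125.0, grace=30.0))  # False: 25 <= 10 + 30
```

In this model, a client whose heartbeats are occasionally delayed stops being marked down as soon as the grace period covers the worst-case delay, which is the reasoning behind raising `heartbeat_grace`.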
Is there any chance you can reproduce this behavior? The logs would be very useful in debugging (including the
Did you ever figure out a fix? I seem to be having a similar problem after upgrading to Nomad 0.7.0.
Try cleaning up everything and retrying. It worked for me, so I hope it works for you too, although I am not sure what the issue was.
I am going to close this since the original issue seems to be fixed and the other one doesn't look related.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v0.6.3
Operating system and Environment details
Ubuntu 16.04 (Nomad server) / Windows Server 2016 (Nomad client nodes)
Issue
When I deploy services to the Windows nodes (a 2-node cluster), the services run normally at first, but after about 10 minutes the processes are killed and moved to the other node in the cluster.
I cross-checked the node that has this problem: when I drained the other node and deployed all services on that single node, it worked, but it does not work in a 2- or 3-node cluster.
The recent activity for the allocation shows that it is being killed.
Nomad Server logs
I found that the Nomad heartbeat TTL expired.
I'm not entirely sure, but this is what I found in the Nomad server log.