Kill Allocations when client is disconnected from servers #2185
@diptanu A related question:
Workaround till 0.6 release: Script in crontab
@drscre Not quite. It depends on how long the connection loss lasts. The clients heartbeat to the servers every 15-45 seconds depending on the size of the cluster. If a client fails a heartbeat, the server marks that node as down and will replace its allocations. When the node comes back, it will detect that it shouldn't be running those allocations and kill them. If you lose and regain the connection within a heartbeat, nothing will be restarted.
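For readers who want the servers to tolerate longer heartbeat gaps, here is a minimal agent-config sketch; it assumes the server-stanza `heartbeat_grace` setting (the heartbeat interval itself is chosen by Nomad based on cluster size and is not directly configurable):

```hcl
# Sketch only: extend how long servers wait past a client's heartbeat
# TTL before marking the node as down and replacing its allocations.
server {
  enabled         = true
  heartbeat_grace = "30s"   # assumed default is 10s
}
```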
I asked something similar but it was refused. Bad judgement IMHO. Whenever there is a lost connection, a killed agent, or any issue with Nomad that prevents the admin from terminating the machines remotely, the tasks on those nodes must kill themselves immediately. On a 100-machine cluster these issues will happen on a daily basis.
@dadgar It would be great to have a configurable timeout for the cases when the exact number of running tasks is not important and the network is overloaded/laggy. Also, it seems to me that it's better for a Nomad server to kill the allocation not at the time the node is lost, but at the time the node comes back, or after a configurable timeout.
This is really helpful for running virtual machines (qemu) in Nomad. Usually the virtual machine disk image is on shared storage, and we want to keep exactly one instance of a particular virtual machine in the whole cluster; otherwise two instances of a VM writing to the same disk image will cause data corruption. For now we have to use the exec driver with "consul lock ... qemu-kvm ..." to work around this problem.
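A rough sketch of the workaround described above, assuming the `exec` driver and Consul's `consul lock` command; the lock prefix, image path, and qemu arguments are placeholders:

```hcl
# Only the holder of the Consul lock can run qemu against the shared
# disk image, so at most one instance writes to it at a time.
task "vm" {
  driver = "exec"

  config {
    command = "consul"
    args = [
      "lock", "locks/vm-example",                 # placeholder lock prefix
      "qemu-kvm",
      "-m", "2048",
      "-drive", "file=/shared/vm-example.qcow2",  # placeholder image path
    ]
  }
}
```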
Is there any setting for the Nomad client (or will there be in the future) that will …
@jfvubiquity there currently isn't. This would be the issue to watch for that feature.
I think there may be two types of workloads: "at least N copies of an instance" and "exactly N copies of an instance". Being able to kill the allocation matters for the second use case. I think it would be helpful for Nomad to provide a semaphore-like resource declaration to solve this problem; see the sketch below.
Nomad could use Raft to implement a semaphore, and each instance of the task would consume one slot. When the client is offline for 60s, it loses the semaphore and kills the allocation, so Nomad will be able to create a new instance on another node. Another way is to integrate the Consul lock interface in the job specification.
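A purely hypothetical sketch of what such a semaphore-like declaration could look like in a job file; none of these fields exist in Nomad, and the names are invented for illustration:

```hcl
# Hypothetical syntax: the group acquires one slot in a cluster-wide
# semaphore backed by Raft; if a client cannot renew its slot within
# lock_delay, it kills the allocation and the slot is freed for
# another node.
group "vm" {
  count = 1

  semaphore {
    name       = "vm-example"
    limit      = 1        # "exactly N copies" -> N slots
    lock_delay = "60s"
  }
}
```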
@edwardbadboy your idea is just great. Currently I need to implement locks manually via a MongoDB database (going to replace that with Consul locks now), but Nomad locking might solve a lot of my problems and reduce system complexity.
I can see the benefit of the proposed changes in this thread and I would like to have them too. That said, I'd like to be able to completely opt out of such timeout/cleanup behavior. From a failure-handling PoV, if for whatever reason the Nomad clients and servers lost contact for a certain period of time, I'd like Nomad to NOT wipe out my infrastructure services, and to offer me the chance to recover from such a networking/Nomad outage without fighting all the other infra outage fires at the same time. This is one of the things I tested and like about Nomad in my destructive testing cases. On a side note, DC/OS and k8s have similar behavior of NOT wiping out existing runtimes in such cases, at least for the versions that we tested and are running in our infra. A couple of times, DC/OS not wiping out all existing runtimes upon complete master-node failure gave us the relief of only fighting the DC/OS fire while existing services running on the broken DC/OS cluster remained unaffected.
Appending some notes to this issue for CSI support: the unpublishing workflow for CSI assumes the cooperation of a healthy client that can report that allocs are terminal. On the client side we'll help to reconcile this by having a client that has been out of contact with the server for too long mark its allocs as terminal, which will call …
@langmartin while we're working through the design for this, we should consider the cases of #6212 and #7607 as well.
Nomad servers replace the allocations running on a node when the client misses heartbeats. The client may be partitioned from the servers, or it might just be dead, but a disconnected client doesn't mean the allocations are actually dead. This can be a problem when certain applications need only a fixed number of shards running.
Nomad will solve the above problem by allowing users to configure a time duration at the task group level, which will make the client kill the allocations of that task group after it has been disconnected from the servers. In cases where the client is dead too, drivers like `exec` or `raw_exec`, which use an executor for supervising the processes, will kill the process and exit.
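A minimal sketch of how such a group-level duration could be expressed, assuming a parameter along the lines of the `stop_after_client_disconnect` setting exposed by later Nomad versions (treat the exact name and placement as an assumption here):

```hcl
# Sketch: if the client is disconnected from the servers for longer
# than this duration, it stops the group's allocations locally, so
# rescheduled copies elsewhere don't run alongside the originals.
group "shard" {
  count = 3

  stop_after_client_disconnect = "1m"
}
```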