Ability to send Unix Signals to processes Nomad is running #817
Comments
@phinze We can do this for our exec-based drivers. Docker doesn't have any API, AFAIK, to send signals to a running pid inside a container.
@c4milo The kill API you referenced assumes that the user wants to kill the container by sending a signal, and the call waits until the container exits. The use case for this ticket is that a user might want to send an arbitrary signal to a pid asynchronously, so the kill API won't work in this case.
@diptanu it looks to me, from the docs, like you can send arbitrary signals via the kill API's `signal` parameter.
@diptanu, Docker has no assumption that a signal will stop the process inside a container. We use the Docker kill API extensively to send signals to processes.
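The use case in the comments above is asynchronous delivery: send a signal to a running pid and return immediately, without waiting for the process to exit. A minimal sketch in Go, using a `sleep` child as a stand-in for any task pid a driver manages:

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// signalChild starts a long-running child process, sends it SIGHUP
// asynchronously, and reports whether the child subsequently exited.
func signalChild() bool {
	// `sleep` stands in for any task pid a driver manages.
	cmd := exec.Command("sleep", "60")
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	// Signal delivery is asynchronous: this call returns immediately
	// and does not wait for the process to exit, unlike a
	// kill-and-wait style API.
	if err := cmd.Process.Signal(syscall.SIGHUP); err != nil {
		panic(err)
	}
	// SIGHUP's default disposition terminates `sleep`; reap the child.
	return cmd.Wait() != nil
}

func main() {
	fmt.Println("child exited after SIGHUP:", signalChild())
}
```

This is the shape of the feature being requested: the caller picks the signal, and whether the process exits, reloads, or drains is entirely up to the process.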
@phinze Is there any reason you can't put an HTTP API on top of the services? It is nice to limit the surface area of the scheduler to just what is required. We already send a soft kill (SIGTERM) before SIGKILL to let you do cleanup work, and the kill timeout between the two is configurable.
@dadgar Definitely understand the desire to keep the scheduler simple! So I'm coming from the opposite direction - trying to minimize the number of for-each-service changes I have to make as I lift an existing microservices architecture into Nomad, so the idea of requiring a sweep through every service to swap signal handling for HTTP endpoints is unappealing.

We might be able to make things work with the TERM/KILL window, though. (We have some very long running jobs (6-12 hours) that need to drain from some of our services.)

From where I sit, the interrupt-driven use cases - pause/resume a service, immediate config reload, and other behavior triggered by Unix signals - are core enough to warrant inclusion in Nomad. But happy to discuss further!
@dadgar, I agree with @phinze here. The ability to send signals to a task's process would be a really useful addition to Nomad. For example, Nginx, as well as a large amount of other widely-used software, uses signals for a number of essential actions, such as reloading configuration, reopening log files, and gracefully shutting down workers. One could definitely put some kind of HTTP API on top of Nginx by running a coprocess that listens for HTTP requests and communicates with Nginx using signals, but that would require non-trivial effort.
@skozin So I think in the world of cluster schedulers some of the use cases you have described change. On the topic of config reloading: if you use something like consul-template, or any other co-process that re-generates the nginx config, I would imagine the co-process is going to send a signal to the Nginx pid to reload the config, not the operator. I don't disagree that sending signals can sometimes be handy, but I agree with @dadgar that in an environment where services are run on cluster schedulers, the need to send signals to processes becomes less common.
@diptanu we're attempting to co-schedule a task group with two docker tasks: a consul-template container and an nginx container. I'm curious as to your statement:
It seems a bit tricky in this scenario for the consul-template container to send a signal to the nginx container. Might a proper Nomad HTTP API for sending a signal simplify this problem? Fictional example: Nomad injects metadata (an env var) with a unique endpoint for POSTing a signal to a specific sibling task in the group. My consul-template task might then POST to that endpoint.
@dadgar, you wrote
Could you maybe post a reference to this? I couldn't find anything in https://www.nomadproject.io/docs/drivers/docker.html. |
@JensRantil I think you can add a kill_timeout parameter on the task object. Docs can be found here: https://www.nomadproject.io/docs/jobspec/index.html#kill_timeout. It is not docker specific. |
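For reference, a minimal jobspec fragment showing where the parameter lives (the task name and image are placeholders; `kill_timeout` defaults to 5 seconds if unset):

```hcl
task "web" {
  driver = "docker"

  # Wait up to 30s between the soft kill and SIGKILL.
  kill_timeout = "30s"

  config {
    image = "nginx:stable"
  }
}
```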
@diptanu, you wrote
Any news on that? I can't find any reference in the documentation.
@maruina I think he meant that in the abstract. We haven't done this because not only is it driver-specific, it is also operating-system-specific. It requires more thought as to whether we want to support this.
We would be happy to be able to send signals to jobs/groups/individual tasks (via HTTP API as described in one of the comments above) as well! |
I'd propose adding a kill_signal parameter, analogous to the template update signal. The background is that different signals lead to different exit behaviour; in my case, for the gitlab-ci runner, I want to send SIGQUIT instead of SIGINT: https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/blob/master/docs/commands/README.md#signals
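A sketch of how the proposed parameter might look in a jobspec (a `kill_signal` task parameter was in fact later added to Nomad, defaulting to SIGINT; the task name and timeout here are illustrative):

```hcl
task "runner" {
  driver = "exec"

  # Override the soft-kill signal so gitlab-runner gets SIGQUIT and
  # can finish its in-flight builds before exiting.
  kill_signal  = "SIGQUIT"
  kill_timeout = "10m"
}
```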
Closing. Nomad v0.9.2 added the `nomad alloc signal` command. Feel free to open a new issue if there are use cases we didn't cover. Thanks, and sorry for the delay in closing this issue!
@schmichael @dadgar Sorry to comment on a closed ticket, but I have a use case that would really benefit from such a capability.

The use case is a runtime controlled by the Nomad client that benefits from a chance at a clean shutdown when its machine is being decommissioned. Specifically, it's a Kafka broker process run as a Nomad job on an AWS EC2 machine (Amazon Linux 2, with the Nomad client run as a systemd unit). When the EC2 machine gets killed without Nomad being in the loop (say because of degraded hardware, AWS scheduled maintenance, etc.), the job-level kill_signal and Nomad's migration feature can't help. Such cases would benefit from a clean/controlled shutdown process initiated by the Nomad client on the affected machine.

We could use node drain, but that requires API access to the Nomad servers. On the other hand, with the capability to send the Nomad client a signal as a trigger, each node could only trigger such draining for itself.

I understand not every system NEEDS such clean-shutdown capability (I'm a fan of crash-only systems, in fact), but our production Kafka on Nomad would really benefit from local node draining for unexpected machine shutdown cases.
@rmlsun you might be able to cover this case with the |
@rmlsun Hm, what you're describing sounds like #2052? Would a change like that cover your use case? If so, please leave a comment over on that issue. If not, please file a new issue like @tgross said.
@tgross thanks for the pointer. In most of our cases, we want the exact opposite of that.
@schmichael thanks for the pointer. Yes, I think that kind of configurable Nomad client shutdown behavior would be helpful in this particular case:
Basically what we want is: if Nomad itself runs into unexpected issues, leave the task runtime alone and confine the Nomad issue to just Nomad as much as possible (the smallest blast radius possible). On the other hand, if it's an intentional shutdown of the Nomad client, provide a way to trigger a clean shutdown of the task runtimes.
I think there might be a fine line here, @schmichael. Ideally, if the Nomad client itself crashes or shuts down for non-operator-initiated reasons, it should not trigger task shutdown. Only if it's an operator-initiated shutdown should it trigger (and wait for the completion of) a clean shutdown of all tasks. So would a signal be a good way to indicate that it's an intentional shutdown, instead of having a static config option alone?
Need additional clarification on this: does draining a node always force-quit tasks, or will it use the configured kill_timeout?
Draining a node stops tasks in the same way stopping a job does: the task's kill_signal is sent, Nomad waits up to kill_timeout, and SIGKILL is sent if the task is still running after that.

The drain command's deadline operates at a higher level: it caps how long the overall drain may take. A long kill_timeout can therefore be cut short by the drain deadline. The only way to bypass kill_timeout entirely is to force the drain (nomad node drain -force).
Many of our services respond to Unix signals (`SIGHUP`, `SIGUSR1`, etc.) to change their behavior. The most important one is to allow a service to "drain" (stop accepting new work, complete current in-flight work) before it is fully shut down.

The question is how this will work once Nomad is running our services. How can we send signals or achieve the equivalent behavior?
Happy to provide more detail and discuss as needed. 👍