Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gracefully clean up Child on drop instead of SIGKILL? #2504

Closed
Mygod opened this issue May 7, 2020 · 16 comments
Closed

Gracefully clean up Child on drop instead of SIGKILL? #2504

Mygod opened this issue May 7, 2020 · 16 comments
Labels
A-tokio Area: The main tokio crate C-feature-request Category: A feature request. M-process Module: tokio/process

Comments

@Mygod
Copy link

Mygod commented May 7, 2020

Version

master

Platform

All but mainly Android.

Subcrates

N/A

Description

Currently Child::kill simply sends SIGKILL on Unix (and similarly on Windows), which does not allow child process to perform clean ups. A more graceful solution would be to send SIGTERM first, and then SIGKILL on timeout perhaps. However, this proves to be very hard to be implement as since Runtime is also dropping.

Currently we have a busy spin but it would be great if tokio could support it officially.

See also: shadowsocks/shadowsocks-rust#234

@ipetkov
Copy link
Member

ipetkov commented May 8, 2020

Thanks for the report! Tokio attempts to match the APIs standard library's behavior to avoid surprises for anyone coming from using those APIs. The standard library sends SIGKILL on Unix platforms to ensure that a child really is killed when the caller invokes the signal. Windows does not have a distinction between SIGKILL or SIGTERM, and so the standard library chooses a common ground between the two of them, and leaves it up to libraries/applications to perform more OS specific handling such as sending specific signals.

Tokio does expose a Child::id method which could be used in conjunction with libc::kill(child.id(), libc::SIGTERM) to support sending a specific singal. Would this work for your use case?

@Mygod
Copy link
Author

Mygod commented May 8, 2020

Yes but the problem is to wait for process (with timeout) to exit actually...

@Darksonn Darksonn added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. M-process Module: tokio/process labels May 8, 2020
@Darksonn
Copy link
Contributor

Darksonn commented May 8, 2020

Perhaps you could call spawn from the destructor and have the spawned task take care of cleaning it up?

@Mygod
Copy link
Author

Mygod commented May 8, 2020

I guess @zonyitoo prefers not creating another thread to avoid the overhead.

@zonyitoo
Copy link
Contributor

zonyitoo commented May 8, 2020

Perhaps you could call spawn from the destructor and have the spawned task take care of cleaning it up?

If destructor is called because of panic!, then calling Handle::current().block_on(..) will cause another panic!.

Sample code:

    fn drop(&mut self) {
        let handle = Handle::current();
        handle.block_on(async {
            for plugin in &mut self.plugins {
                let id = plugin.id();
                if let Ok(..) = plugin.await {
                    // ...
                }
            }
        });
    }

Panics with message:

thread 'main' panicked at 'Cannot start a runtime from within a runtime. This happens because a function (like `block_on`) attempted to block the current thread while the thread is being used to drive asynchronous tasks.', /Users/zonyitoo/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.20/src/runtime/enter.rs:38:5

So what can we do if we couldn't use block_on to check if subprocesses are exited gracefully?

@Darksonn
Copy link
Contributor

Darksonn commented May 8, 2020

My suggestion was to use tokio::spawn. It does not spawn a new thread; that is the entire point of using async/await.

Rust does not support async drop, and it is not possible to block the destructor on waiting for the child.

@zonyitoo
Copy link
Contributor

zonyitoo commented May 8, 2020

It does not spawn a new thread; that is the entire point of using async/await.

Yes. I think @Mygod was talking about a solution that we wanted to make: spawning a new thread and create a new Runtime in it.

If we don't block the drop(), what would happen for that spawned task? Will that destructing Runtime wait for it to finish?

@Darksonn
Copy link
Contributor

Darksonn commented May 8, 2020

If the runtime is shutting down, the spawned task would be immediately cancelled. I guess you would have to block the executor if it is shutting down.

@zonyitoo
Copy link
Contributor

zonyitoo commented May 8, 2020

That's a way to work around. But I am hoping for a better way.

When this struct is destructing because of stack unwind, then it is unwise to create a task into the current Runtime, which may cause "panic while panicking".

@Darksonn
Copy link
Contributor

Darksonn commented May 8, 2020

To do something while the executor is shutting down, you have to block. You can't block with block_on, so you would have to do something different. For example, you could wait with thread::sleep, which would also block the executor.

@zonyitoo
Copy link
Contributor

zonyitoo commented May 8, 2020

Yup. That's what I was doing: a busy spin. I think @Mygod is hoping for a better and cleaner solution. :(

@Darksonn
Copy link
Contributor

Darksonn commented May 8, 2020

If you want to use different solutions depending on whether the runtime is shutting down, you could do this:

  1. In the Drop impl, spawn a future.
  2. Move a new child object into this future.
  3. In the future, wait for the child, and kill it through ordinary means.
  4. But if the object in the spawned future is dropped, you can block the executor with the busy spin in that destructor.

I suppose you could also detect if it was dropped outside the executor, and just do the busy spin directly.

let mut spin_loop_on_kill = SpinChild::new(self); // or whatever
if let Ok(handle) = Handle::try_current() {
    handle.spawn(async move {
        spin_loop_on_kill.kill_with_async_waiting_and_no_spin().await;
        // but if cancelled, destructor of spin child is called
    });
}
// else { run destructor of `spin_loop_on_kill` here

@zonyitoo
Copy link
Contributor

zonyitoo commented May 8, 2020

That is one of the solution that I was thinking about 10 minutes ago. What do you think @Mygod ?

@Mygod
Copy link
Author

Mygod commented May 8, 2020

Well we already have a busy spin already right? I was looking for a solution that does not involve busy spin and rely on e.g. signals... 🤣

@Darksonn
Copy link
Contributor

I'm closing this. The drop impl of Child is not going to sleep to wait for the child, and alternatives have been suggested.

@da-kami
Copy link

da-kami commented Feb 2, 2024

I stumbled over this issue and had a similar problem to solve - I found it hard deriving a solution from the comments so here's what I came up with.
I am not sure what I came up with is a good solution, but it does it's job for my test setup.


My problem is similar to one originally described: I am writing system tests where I spawn a tokio::process::Child. Now, when I use kill_on_drop on the Command then the binary I run has no chance to gracefully shut down, as the process is killed without signaling.

The binary implements handling SIGTERM to gracefully shut down, so I would like to achieve, that I can have a guaranteed SIGTERM sent to the child process no matter if the test fails or not.

What I first came up with was a wrapper for the Child that would implement Drop to send the signal.
But, depending on your test setup you might find that impracticable because you want to be able to pass the Child around (or consume it's stdin or stdout which can get tricky if you pack it into a struct that overwrites Drop).

Since we only need the process id for signalling we can just extract that id once we have the Child and just keep that around for signaling upon drop.
Just make sure you keep the struct that holds the process_id in scope until the end of the test:

/// Sends a SIGTERM to the recorded process_id upon drop
///
/// Record child process id in here and keep in scope until the end of your test.
pub struct ProcessCleanup {
    pub process_id: i32,
}

impl ProcessCleanup {
    pub fn new(process_id: i32) -> Self {
        Self { process_id }
    }
}

impl Drop for ProcessCleanup {
    fn drop(&mut self) {
        println!(
            "Cleaning up: Sending SIGTERM to process {}",
            self.process_id
        );
        unsafe { kill(self.process_id, SIGTERM) };
    }
}

note: uses nix crate to send signal

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-feature-request Category: A feature request. M-process Module: tokio/process
Projects
None yet
Development

No branches or pull requests

5 participants