Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Give interactive processes a chance to gracefully shutdown #14580

Merged
merged 6 commits into from
Feb 28, 2022
Merged

Give interactive processes a chance to gracefully shutdown #14580

merged 6 commits into from
Feb 28, 2022

Conversation

slessans
Copy link
Contributor

@slessans slessans commented Feb 23, 2022

Overview

This PR aims to add a graceful shutdown feature for interactive processes.

Currently, when pants run receives a SIGINT, it send a SIGKILL to child processes which causes them to shutdown without an opportunity to run cleanup procedures (for instance, try .. finally and context managers in python). This can cause file handles to be left open, streams to not be flushed, etc.

This PR changes the behavior to first send a SIGINT and wait for the process to gracefully terminate within a pre-determined amount of time. If the process does not shut itself down gracefully, a SIGKILL is sent.

This problem is outlined in greater detail in #13230

Approach

This PR takes approach number (2) from #13230 (comment). This is was the lowest-lift implementation. It is slightly sub-optimal since it blocks in the Drop implementation, however, this should not cause deadlocks since it blocks for a maximum of a bounded time. Furthermore, in my benchmarking, this drop rarely runs, as most of the time the cancellation happens before drop and the drop becomes essentially a noop.

An alternative is to run our own reaping process using a separate routine launched with tokio::spawn. The main loop could await this reaping process before shutting down. This would be more complex but would allow us to avoid using blocking calls where we should be awaiting.

Long term a solution for AsyncDrop would solve this problem the most elegantly.

Fixes #13230.

Scott Lessans added 3 commits February 17, 2022 12:57
wip
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@@ -78,7 +130,7 @@ impl DerefMut for ManagedChild {
impl Drop for ManagedChild {
fn drop(&mut self) {
if !self.killed.load(Ordering::SeqCst) {
let _ = self.kill_pgid();
let _ = self.graceful_shutdown();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stuhood in all of my testing I was unable to trigger this codepath -- it seems that the process will actually hang if this path is removed. Maybe we can change the drop to simply call kill instead of graceful shutdown to avoid blocking in drop. Not sure if that is better though, since it could be a race-condition

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yea. Possibly. On the other hand, the symmetry of using the same method here is a little bit better, and reduces the number of unique codepaths. So IMO, let's leave it.

We don't have direct unit tests of this code (mea culpa: sorry), but it's possible that we could cause this path to execute that way? Not a blocker for landing though, I think.

@slessans slessans marked this pull request as ready for review February 23, 2022 18:23
@slessans slessans changed the title [wip] Give interactive processes a chance to gracefully shutdown Give interactive processes a chance to gracefully shutdown Feb 23, 2022
Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great: thanks a lot!

@@ -78,7 +130,7 @@ impl DerefMut for ManagedChild {
impl Drop for ManagedChild {
fn drop(&mut self) {
if !self.killed.load(Ordering::SeqCst) {
let _ = self.kill_pgid();
let _ = self.graceful_shutdown();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yea. Possibly. On the other hand, the symmetry of using the same method here is a little bit better, and reduces the number of unique codepaths. So IMO, let's leave it.

We don't have direct unit tests of this code (mea culpa: sorry), but it's possible that we could cause this path to execute that way? Not a blocker for landing though, I think.

cleanup_wait_time=2,
)

def test_pantsd_graceful_shutdown_deadline(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌

@Eric-Arellano
Copy link
Contributor

A very impressive and high-impact first contribution 🙌 Thank you!

I was at first thinking this would be good to cherry-pick to 2.10.0rc2, but given that signal handling can be finicky, I think it'd be better that we dogfood this for some time. Then we could cherry-pick to 2.10.1. Wdyt?

slessans and others added 3 commits February 28, 2022 16:16
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Interrupting an (Interactive)Process should attempt to kill the process gracefully
3 participants