Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shell processes leaked when windows are closed #364

Closed
goonzoid opened this issue Aug 31, 2023 · 4 comments
Closed

Shell processes leaked when windows are closed #364

goonzoid opened this issue Aug 31, 2023 · 4 comments

Comments

@goonzoid
Copy link
Member

goonzoid commented Aug 31, 2023

This was a fun one to debug. I thought I was going crazy for a bit, but I think I understand what's happening now.

TL;DR: On MacOS (haven't tested linux), Ghostty leaks shell processes when closing windows because it calls waitpid too soon after sending SIGHUP.

I noticed Ghostty was leaking zsh processes when I was closing windows. Initially I suspected another zsh shell-integration issue, but it was still happening after I disabled shell-integration. I also wiped out my .zshrc so that I could be sure it wasn't anything broken in my config (my prompt, for example, seems to run an additional zsh process in the background).

I added some logging that confirmed that the process was getting signaled as intended, and that waitpid was being called on the correct pid. So... what was the issue?

Well, it turns out that the issue is related to the way that we call waipid with WNOHANG. In this case, waipid will return a pid of 0 if no processes "wish to report status". Based on my testing, it seems that in this case, it's as though wait has never been called for the child process, and so when it exits, it doesn't get reaped.

Inserting a short delay before calling waitpid fixed the issue, at which point the result returned by waitpid had a status that matched the WIFSIGNALED macro, as we'd expect.

Obviously we don't want to just add an arbitrary delay in there, but we can leverage the pid of 0 in the return value to detect whether the signal has registered yet, and that seems to work fine in my testing. Here's a fix that does that: main...goonz/fix-zomby-subprocesses

For completeness, I'll mention that simply calling waitpid without WNOHANG also fixes the issue, but I don't think that's a good idea, for exactly the reasons the flag was being used in the first place.

I haven't yet opened a PR for this for two reasons:

  1. I don't feel great about spinning around a syscall like that, and so I'm wondering if there's a better solution
  2. Someone should test this on Linux.

Let me know if I'm being overly cautious and I should just open a PR!

@mitchellh
Copy link
Contributor

Thank you for the thorough description. Open a PR. This looks good to me. Maybe lets ask the discord if anyone knows if this is a bad idea?

Spinning on the syscall seems fine because this is going to be called from termio which is always on its own dedicated thread per surface. And second, the syscall will always eventually return non-zero, right?

@goonzoid
Copy link
Member Author

Yeah, I think it's very likely fine. If the call doesn't eventually return non-zero, then something else has very likely already gone wrong. I have a tendency to be overly paranoid about that kind of thing.

The one thing that could possibly go wrong would be if someone called wait(true) and so we don't use NOHANG, at which point we're spinning on a blocking call... which would not be wonderful. Currently the only things that call that function that way are tests, though.

@mitchellh
Copy link
Contributor

The one thing that could possibly go wrong would be if someone called wait(true) and so we don't use NOHANG, at which point we're spinning on a blocking call... which would not be wonderful

Hah, coincidentally just commented on this in the PR.

@mitchellh
Copy link
Contributor

Fixed by #365

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants