Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding SIGCHLD signal handler #38

Merged
merged 9 commits into from
Jul 12, 2021
Merged

adding SIGCHLD signal handler #38

merged 9 commits into from
Jul 12, 2021

Conversation

mitnk
Copy link
Owner

@mitnk mitnk commented Jul 9, 2021

try fix #33

@mitnk mitnk changed the title WIP: adding SIGCHLD signal handler adding SIGCHLD signal handler Jul 10, 2021
@godmar
Copy link

godmar commented Jul 11, 2021

You can't just install a SIGCHLD handler and allow it to execute at arbitrary points in time. The Rust runtime is not async-signal-safe. Imagine SIGCHLD arrives in the middle of an allocation or other operation. This means your signal handler cannot safely execute any Rust code.

Instead, you need to think about when it is safe to run the SIGCHLD handler. Generally, you need to ensure that no Rust code is in progress when SIGCHLD fires and the handler executes. One strategy to do this is to block SIGCHLD (via sigprocmask) during all sections where you run Rust code.

Specifically, you could unblock it only when you call libc::read to read input from the terminal. This will handle the case where children aren't being reaped while the shell waits for input on the prompt. To handle the first case (children exiting while you're waiting for a fg job) I'd recommend to call waitpid(-1, ...)

@mitnk
Copy link
Owner Author

mitnk commented Jul 11, 2021

Thanks. I'll try find places need to block the signal.

To handle the first case (children exiting while you're waiting for a fg job) ... to call `waitpid(-1, ...

In what real cases, is waitpid(-1) needed? I don't see the necessary of using it at least for normal cases like following (the sig handler will reap the bg job):

$ sleep 5 &
$ sleep 100

@godmar
Copy link

godmar commented Jul 11, 2021

Thanks. I'll try find places need to block the signal.

To handle the first case (children exiting while you're waiting for a fg job) ... to call `waitpid(-1, ...

In what real cases, is waitpid(-1) needed? I don't see the necessary of using it at least for normal cases like following (the sig handler will reap the bg job):

$ sleep 5 &
$ sleep 100

The problem with letting the SIGCHLD handler reap this job is that you would need to unblock SIGCHLD while you're calling waitpid(). You could probably do that:

sig_unblock(SIGCHLD)
waitpid()
sig_block(SIGCHLD)

if you instead calling waitpid(-1, ...) you would save the unblock/block.
It's also more efficient to wait for all children than to poll for each bg job in a loop the way your old code did, btw.

@mitnk
Copy link
Owner Author

mitnk commented Jul 11, 2021

The problem with letting the SIGCHLD handler reap this job is that you would need to unblock SIGCHLD while you're calling waitpid()

  1. For waiting the fg job (logic of current main program code), I do not want to use wait -1, since that would change the way to organize the results of a pipeline foo | bar | baz, if bar or baz exit before foo.

  2. why I need to unblock SIGCHLD before waitpid()? I'm only expecting block and then timely unblock SIGCHLD occasionally, like:

sig_block(SIGCHLD)
sth = read(100)
sig_unblock(SIGCHLD)

i.e. main program is not blocking SIGCHLD in most time.

@mitnk
Copy link
Owner Author

mitnk commented Jul 11, 2021

Just checked some docs on Interruption of signal handlers (please search to locate section titled "Interruption of system calls and library functions by signal handlers"): There are some system calls (read() is included) that can automatically restart when interrupted by single handler (use flag SA_RESTART when installing single handler).

So I guess there may be rare places need to block/unblock SIGCHLD?

@godmar
Copy link

godmar commented Jul 11, 2021

I doesn't make sense to block around read, you need to unblock around the read.
You can't take a signal most of the time because it could arrive, for instance, when your Rust code does a Box::new() or similar and executes inside the allocator.

@mitnk
Copy link
Owner Author

mitnk commented Jul 11, 2021

You can't take a signal most of the time ...

Ah, I seem to follow you now: you mean we need to block SIGCHLD at the most time, to make Rust safe.

@godmar
Copy link

godmar commented Jul 11, 2021

See vorner/signal-hook#109 for more discussion.

The issue of restarting is orthogonal. If you unblock during the read() call and the signal handler executes, the read() will be restarted.

@mitnk
Copy link
Owner Author

mitnk commented Jul 11, 2021

@godmar Had a glance on the links you provided. Seems it's way complex-er than my expectation :( I may leave this issue/PR open before Rust address it.

Personally, I can live with both options below:
a) Keep cicada shell not having a SIGCHLD handler.
b) Assuming possibility of the signal handler breaks Rust is very very low (near not happening in real use, at least in cicada project), we can install the SIGCHLD handler without doing extra block/unblock logics.

Handling block/unblock in every corner seems over complex to me. I'll spend more time to see anything proper we have.

@godmar
Copy link

godmar commented Jul 11, 2021

Handling block/unblock in every corner seems over complex to me.

FWIW, I implemented a shell in Rust as well.
In my opinion, I think it's completely ok to leave SIGCHLD blocked all the time, except when reading input. So you don't block/unblock in every corner. You block, and you have a single site where you unblock, read, and block. (I've done this by patching rustyline).

SIGCHLD is not latency critical, and your shell will spend most of its time either waiting for a fgjob or waiting for input, so there's not much to be gained from frequent blocking/unblocking.

ps: I also don't unblock while waiting for a fgjob because I use waitpid(-1, ...)

@mitnk
Copy link
Owner Author

mitnk commented Jul 11, 2021

I implemented a shell in Rust as well

Yeah, I read that, that's nice! Any chance to make it public when you polished it maybe? :)

it's completely ok to leave SIGCHLD blocked all the time, except when reading input.

Good to know. I'll try it. Thanks

@godmar
Copy link

godmar commented Jul 11, 2021

I implemented a shell in Rust as well

Yeah, I read that, that's nice! Any chance to make it public when you polished it maybe? :)

No. I'm contemplating it for an assignment in a class I'm teaching.

@godmar
Copy link

godmar commented Jul 12, 2021

If you code:

signals::unblock_signals();
match rl.read_line() {
 ...

then a signal (SIGCHLD) could arrive right in the middle of rl.read_line(), which is not safe.

@godmar
Copy link

godmar commented Jul 12, 2021

FWIW, I'm using rustyline not linefeed. The place where I am unblocking SIGCHLD is around this function.

@mitnk
Copy link
Owner Author

mitnk commented Jul 12, 2021

a signal (SIGCHLD) could arrive right in the middle of rl.read_line(), which is not safe

@godmar correct sir. It's a tradeoff of not forking and patching lib linefeed. It's also not safe to unblock before waitpid for fg pipelines too.

Just for curious, do you ever have a Rust demo code that can produce/simulate an asycn-signal-safe issue?

@godmar
Copy link

godmar commented Jul 12, 2021

a signal (SIGCHLD) could arrive right in the middle of rl.read_line(), which is not safe

@godmar correct sir. It's a tradeoff of not forking and patching lib linefeed. It's also not safe to unblock before waitpid for fg pipelines too.

unblocking before libc::waitpid is fine because waitpid is async-signal-safe. What's not async-signal-safe is running Rust code that potentially allocates or drop.

Just for curious, do you ever have a Rust demo code that can produce/simulate an asycn-signal-safe issue?

Here's one in C: https://www.cs.cmu.edu/afs/cs/academic/class/15213-f16/www/code/15-ecf-signals/signaldeadlock.c
It'll work similarly in Rust.

@mitnk
Copy link
Owner Author

mitnk commented Jul 12, 2021

Merging PR now :)

  1. Let's track in another issue if linefeed::read_line() is a concern with signal handlers.
  2. Will handle issue of STOP/CONT status of job control in another PR.

BTW, when using waitpid(-1) for fg jobs (see last commit), the main thread has to do some sleep between waitpid()s. Not sure if this would affect commands like time <any-commands> etc. Any ideas here?

@mitnk mitnk merged commit c4f5005 into master Jul 12, 2021
@mitnk mitnk deleted the sig-handler branch July 12, 2021 16:11
@mitnk
Copy link
Owner Author

mitnk commented Jul 12, 2021

Ohh, with sleep(1ms) with waitpid(-1), now it's consuming too much CPU (~2%) with long-run commands (e.g. sleep 9999). Changing to 5ms take ~0.8% CPU and 10ms to ~0.4%. Also causing "Idle Wake Ups" (seen in Mac's Activity Monitor). (the sleep won't affect result of time foobar tho).

Seems I need to change back to sync waitpid(pid) for fg jobs.

@mitnk
Copy link
Owner Author

mitnk commented Jul 12, 2021

Oh man, I should not use WNOHANG here.. problem solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

shell fails to reap child processes in a timely manner
2 participants