
subprocess.Popen.send_signal() should poll the process #82811

Closed
vstinner opened this issue Oct 29, 2019 · 26 comments
Labels: 3.9 (only security fixes), stdlib (Python modules in the Lib dir)

Comments

@vstinner
Member

BPO 38630
Nosy @vstinner, @giampaolo, @njsmith, @SpecLad, @oconnor663
PRs
  • bpo-38630: subprocess: enhance send_signal() on Unix #16984
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-01-15.16:53:35.210>
    created_at = <Date 2019-10-29.11:18:26.776>
    labels = ['library', '3.9']
    title = 'subprocess.Popen.send_signal() should poll the process'
    updated_at = <Date 2020-12-03.17:05:56.445>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2020-12-03.17:05:56.445>
    actor = 'oconnor663'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-01-15.16:53:35.210>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2019-10-29.11:18:26.776>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38630
    keywords = ['patch']
    message_count = 25.0
    messages = ['355645', '355647', '355706', '355856', '355917', '355918', '355928', '356073', '356074', '356075', '356076', '356077', '356084', '356085', '356571', '356593', '358137', '358142', '358146', '358147', '360064', '360065', '382382', '382397', '382430']
    nosy_count = 5.0
    nosy_names = ['vstinner', 'giampaolo.rodola', 'njs', 'SpecLad', 'oconnor663']
    pr_nums = ['16984']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue38630'
    versions = ['Python 3.9']

    @vstinner
    Member Author

    subprocess.Popen.send_signal() doesn't check whether the process has exited since the last time the poll() method was called. If the process exits just before os.kill() is called, the signal can be sent to the wrong process if the process identifier is recycled.

    The attached PR simply calls poll() once to reduce the time window in which this race condition can occur, but it doesn't fully eliminate the race condition.
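
    To make that concrete, here is a simplified sketch of a send_signal() that polls first (illustrative only, not the actual patch text):

    import os

    def send_signal(self, sig):
        # poll() performs a non-blocking os.waitpid() and sets
        # self.returncode if the child has already exited.
        self.poll()
        if self.returncode is None:
            # An unreaped child (running or zombie) keeps its pid
            # reserved, so os.kill() targets the right process...
            os.kill(self.pid, sig)
        # ...unless the child exits between poll() and os.kill():
        # the window is narrowed, not closed.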

    --

    See also the new "pidfd" API, which landed very recently in the Linux kernel to prevent this issue. Illumos, OpenBSD, NetBSD, and FreeBSD have similar concepts.

    I'm not proposing to use pidfd here; I just want to highlight that this is a real issue and that kernels are evolving to provide more reliable defenses against the kill(pid, sig) race condition ;-)

    @vstinner vstinner added 3.9 only security fixes stdlib Python modules in the Lib dir labels Oct 29, 2019
    @vstinner
    Member Author

    I noticed this issue by reading subprocess code while working on bpo-38207 ("subprocess: Popen.kill() + Popen.communicate() is blocking until all processes using the pipe close the pipe") and bpo-38502 ("regrtest: use process groups").

    @vstinner
    Member Author

    subprocess.Popen.send_signal() doesn't check whether the process has exited since the last time the poll() method was called. If the process exits just before os.kill() is called, the signal can be sent to the wrong process if the process identifier is recycled.

    I'm not sure under which conditions this case happens.

    I understood that the pid is released (and so can be reused) as soon as the child process completes *and* waitpid() has been called.

    Two cases:

    • os.waitpid(pid, 0) can be called directly ("raw call")
    • Popen API used, but called from a different thread: thread A calls send_signal(), thread B calls waitpid()

    I'm mostly concerned by the multithreading case. I noticed the race condition while working on regrtest, which uses two threads: one thread responsible for the process (calling .communicate(), .wait(), etc.), and one "main" thread which sends a signal to every child process to interrupt them.

    My first implementation called .wait() after calling .send_signal() in the main thread. But it seems like Popen is not "fully" thread-safe: the "waitpid lock" only protects the os.waitpid() call; it doesn't protect the self.returncode attribute.

    I modified regrtest so that os.waitpid() is only called from the thread responsible for the process, to avoid such a race condition. The new design seems more reliable: the main thread only calls .send_signal(); it no longer calls .wait() (which calls os.waitpid()). A sketch of this design follows.
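
    A minimal sketch of that two-thread design (illustrative code, not the regrtest implementation):

    import signal
    import subprocess
    import threading

    proc = subprocess.Popen(["sleep", "60"])

    def owner():
        # Only this thread ever reaps the child (wait/communicate),
        # so self.returncode is only ever written from one thread.
        proc.wait()

    t = threading.Thread(target=owner)
    t.start()
    # The main thread only sends signals; it never calls .wait(), so
    # it cannot race with the owner thread over os.waitpid().
    proc.send_signal(signal.SIGINT)
    t.join()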

    @giampaolo
    Contributor

    -1 about the PR solution to suppress ProcessLookupError in case the process is gone. In psutil I solved the “pid reused problem” by using the process creation time to identify a process uniquely (determined on start).
    A decorator can be used to protect the sensitive methods interacting with the PID/handle (communicate(), send_signal(), terminate(), kill()) and raise an exception (maybe ProcessLookupError(“PID has been reused”)?).

    @vstinner
    Member Author

    vstinner commented Nov 4, 2019

    -1 about the PR solution to suppress ProcessLookupError in case the process is gone.

    Did someone propose a pull request to fix this issue by ignoring ProcessLookupError?

    In psutil I solved the “pid reused problem” by using process creation time to identify a process uniquely (on start).

    Well, that requires writing platform-specific code.

    I would prefer to only use platform-specific code when the platform provides something like pidfd: I understand that Linux, FreeBSD, OpenBSD, NetBSD, and Illumos are concerned. I mean reusing an existing solution, rather than writing our own "workaround" (solution?).

    @vstinner
    Member Author

    vstinner commented Nov 4, 2019

    I would prefer to only use platform-specific code when the platform provides something like pidfd: I understand that Linux, FreeBSD, OpenBSD, NetBSD, and Illumos are concerned. I mean reusing an existing solution, rather than writing our own "workaround" (solution?).

    By the way, I suggest sticking to the initial problem in this issue. If someone is interested in exploring the pidfd idea, I recommend opening a different issue ;-)

    @giampaolo
    Contributor

    Did someone propose a pull request to fix this issue by ignoring ProcessLookupError?

    I misread your PR, sorry. I thought that was the effect.

    @njsmith
    Contributor

    njsmith commented Nov 5, 2019

    It sounds like there's actually nothing to do here? (Except maybe eventually switch to pidfd or similar, but Victor says he wants to use a different issue for that.) Can this be closed?

    @vstinner
    Member Author

    vstinner commented Nov 5, 2019

    It sounds like there's actually nothing to do here? (Except maybe eventually switch to pidfd or similar, but Victor says he wants to use a different issue for that.) Can this be closed?

    I opened this issue to propose PR 16984. Did you notice the PR?

    In short, I propose to call poll() before calling os.kill() in send_signal().

    @njsmith
    Contributor

    njsmith commented Nov 5, 2019

    But then deeper in the thread it sounded like you concluded that this didn't actually help anything, which is what I would expect :-).

    @vstinner
    Member Author

    vstinner commented Nov 5, 2019

    But then deeper in the thread it sounded like you concluded that this didn't actually help anything, which is what I would expect :-).

    The PR contains a test which demonstrates one race condition. It fails without the fix.

    @vstinner
    Member Author

    vstinner commented Nov 5, 2019

    (Except maybe eventually switch to pidfd or similar, but Victor says he wants to use a different issue for that.)

    pidfd is not available on all platforms (ex: Windows).

    @njsmith
    Contributor

    njsmith commented Nov 6, 2019

    You can't solve a time-of-check-to-time-of-use race by adding another check. I guess your patch might narrow the race window slightly, but it was a tiny window before and it's a tiny window after, so I don't really get it.

    The test doesn't demonstrate a race condition. What it demonstrates is increased robustness against user code that corrupts Popen's internal state by reaping one of its processes behind its back.

    That kind of robustness might be a good motivation on its own! (I actually just did a bunch of work to make trio more robust against user code closing fds behind its back, which is a similar kind of thing.) But it's a very different motivation than what you say in this bug.

    @njsmith
    Contributor

    njsmith commented Nov 6, 2019

    Hmm, you know, on further thought, it is actually possible to close this race for real, without pidfd.

    What you do is split 'wait' into two parts: first it waits for the process to become reapable without actually reaping it. On Linux you can do this with waitid+WNOWAIT. On kqueue platforms you can do it with kqueue.

    Then, you take a lock, and use it to atomically call waitpid/waitid to reap the process + update self.returncode to record that you've done so.

    In send_signal, we use the same lock to atomically check whether the process has been reaped, and then send the signal if it hasn't.

    That doesn't fix your test, but it does fix the race condition: as long as users only interact with the process through the Popen API, it guarantees that send_signal will never signal the wrong process.
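
    A hedged sketch of that scheme (illustrative class and names, Unix-only; os.waitstatus_to_exitcode() is Python 3.9+, and concurrent wait() calls need more care, as discussed below):

    import os
    import threading

    class ReapSafeProcess:
        def __init__(self, pid):
            self.pid = pid
            self.returncode = None
            self._returncode_lock = threading.Lock()

        def wait(self):
            # Part 1: block until the child is reapable WITHOUT reaping
            # it; the pid stays reserved (zombie) until waitpid().
            os.waitid(os.P_PID, self.pid, os.WEXITED | os.WNOWAIT)
            # Part 2: atomically reap and record that we did so.
            with self._returncode_lock:
                if self.returncode is None:
                    _, status = os.waitpid(self.pid, 0)
                    self.returncode = os.waitstatus_to_exitcode(status)
            return self.returncode

        def send_signal(self, sig):
            # Atomically check "not yet reaped", then signal: through
            # this API a signal can never reach a recycled pid.
            with self._returncode_lock:
                if self.returncode is None:
                    os.kill(self.pid, sig)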

    @giampaolo
    Contributor

    To further elaborate on the creation time solution, the idea in pseudo-code is the following:

    class Popen:
    
        def __init__(self, ...):
            self._execute_child(...)
            try:
                # get_create_time() is hypothetical and platform-specific.
                self._ctime = get_create_time(self.pid)
            except ProcessLookupError:
                self._ctime = None
    
        def send_signal(self, sig):
            if self._ctime is not None:
                if self._ctime != get_create_time(self.pid):
                    raise ProcessLookupError("PID has been reused")
            os.kill(self.pid, sig)

    Technically there is still a race condition between _execute_child() and get_create_time(), but:

    1. the time between the two calls is supposed to be extremely short
    2. an OS should not reuse a PID that fast, precisely in order to minimize the chances of such a race condition occurring
    3. as a general rule, on most systems PIDs are assigned sequentially and wrap around. E.g. on Linux:

    ~$ for i in $(seq 20); do ps; done | grep ps
    3199 pts/0 00:00:00 ps
    3200 pts/0 00:00:00 ps
    3201 pts/0 00:00:00 ps
    3202 pts/0 00:00:00 ps
    3203 pts/0 00:00:00 ps
    3204 pts/0 00:00:00 ps
    3205 pts/0 00:00:00 ps
    3207 pts/0 00:00:00 ps
    3208 pts/0 00:00:00 ps
    3209 pts/0 00:00:00 ps
    3210 pts/0 00:00:00 ps
    3213 pts/0 00:00:00 ps
    3214 pts/0 00:00:00 ps
    3215 pts/0 00:00:00 ps
    3216 pts/0 00:00:00 ps
    3217 pts/0 00:00:00 ps
    3218 pts/0 00:00:00 ps
    3219 pts/0 00:00:00 ps
    3221 pts/0 00:00:00 ps
    3223 pts/0 00:00:00 ps

    As for Windows, termination is based on the process handle instead of the PID, so I assume Windows is safe in this regard. There was a prior discussion about this actually:
    https://bugs.python.org/issue36067#msg336243
    Getting the process creation time is relatively easy, even though it is platform-dependent (and should be done in C); a Linux sketch follows.
    The pidfd API would be nicer to use, but it's not portable and is relatively recent.
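
    For illustration, a Linux-only sketch of such a get_create_time() helper (hypothetical name; field 22 of /proc/<pid>/stat is the process start time in clock ticks since boot, so a recycled pid gets a different value):

    def get_create_time(pid):
        # Linux-only sketch. Raises FileNotFoundError (rather than
        # ProcessLookupError) if the pid no longer exists.
        with open(f"/proc/{pid}/stat", "rb") as f:
            data = f.read()
        # Field 2 (the command name) may contain spaces or parentheses,
        # so split after the *last* closing parenthesis.
        rest = data[data.rindex(b")") + 2:].split()
        return int(rest[19])  # fields 3..; index 19 is field 22 (starttime)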

    @vstinner
    Member Author

    To further elaborate on the creation time solution, the idea in pseudo-code is the following: (...)
    Technically there is still a race condition between _execute_child() and get_create_time(), but: (...)

    Even if the approach seems to be different, PR 16984 seems to have the same advantages and drawbacks.

    PR 16984 relies on the fact that the Linux kernel doesn't reuse a pid until its parent calls os.waitpid(): even if the child process has completed, the pid remains assigned (and the process status is "zombie").

    PR 16984 has the advantage that it reuses existing code which has been battle tested: we are sure that the code is portable and works well.

    Whereas get_create_time() will likely be platform-specific and will add a little maintenance burden.

    I know that PR 16984 is a partial solution (there is still a race condition if waitpid() is called directly, without using the Popen API), but I don't see a strong reason to prefer get_create_time().

    @SpecLad
    Mannequin

    SpecLad mannequin commented Dec 9, 2019

    What you do is split 'wait' into two parts: first it waits for the process to become reapable without actually reaping it. On Linux you can do this with waitid+WNOWAIT. On kqueue platforms you can do it with kqueue.

    Then, you take a lock, and use it to atomically call waitpid/waitid to reap the process + update self.returncode to record that you've done so.

    In send_signal, we use the same lock to atomically check whether the process has been reaped, and then send the signal if it hasn't.

    It's a good idea, but it would break the scenario where two threads call wait() concurrently. It would create this race condition:

    1. Thread A reaps the process.
    2. Thread B thinks the process is still running, so it calls waitid+WNOHANG on a stale PID, with unpredictable results.
    3. Thread A sets self.returncode.

    What is needed here is a reader-writer lock. subprocess.wait would work like this (pseudocode):

    with lock taken for reading:
        os.waitid(..., WNOHANG)
    with lock taken for writing:
        os.waitid(...)
        self.returncode = ...

    Whereas subprocess.send_signal would work like this:

    with lock taken for reading:
        os.kill(...)

    Unfortunately, it doesn't seem like Python has reader-writer locks in the standard library...
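
    A minimal reader-writer lock can be sketched from a Condition (illustrative, reader-preferring, so writer starvation is possible): send_signal() and the WNOHANG check would take the read side, while the reaping step takes the write side.

    import threading

    class RWLock:
        def __init__(self):
            self._cond = threading.Condition()
            self._readers = 0

        def acquire_read(self):
            # Blocks while a writer holds the underlying lock.
            with self._cond:
                self._readers += 1

        def release_read(self):
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

        def acquire_write(self):
            # Hold the lock, then wait until all readers have left.
            self._cond.acquire()
            while self._readers > 0:
                self._cond.wait()

        def release_write(self):
            self._cond.release()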

    @njsmith
    Copy link
    Contributor

    njsmith commented Dec 9, 2019

    Thread B thinks the process is still running, so it calls waitid+WNOHANG on a stale PID, with unpredictable results.

    I'm pretty sure you mean WNOWAIT, right? The actual reaping step (which might use WNOHANG) is already protected by a lock, but there's a danger that a thread might try to perform the blocking-until-exit step (which uses WNOWAIT) on a stale pid.

    Good catch!

    I think this can be fixed by using a second lock around all of .wait(). But this is a bit tricky because we also need to account for concurrent .poll() calls. So something like:

    def wait(self):
        # Don't take the big lock unless the process is actually still running
        # This isn't just an optimization; it's critical for poll() correctness
        if self.poll() is not None:
            return self.returncode
    
        with self._blocking_wait_lock:
            with self._returncode_lock:
                revalidate pid licenses
    
            block_until_child_exits()
    
            with self._returncode_lock:
                reap the child and mark it as reaped
    
    def poll(self):
        # if someone is already blocked waiting for the child, then it definitely
        # hasn't exited yet, so we don't need to call waitpid ourselves.
        # This isn't just an optimization; it's critical for correctness.
        if not self._blocking_wait_lock.acquire(blocking=False):
            return None
        try:
            with self._returncode_lock:
                do the actual poll + returncode updating
        finally:
            self._blocking_wait_lock.release()

    Notes:

    If there's already a wait() running and someone calls poll(), then we have to make sure the poll() can't reap the process out from under the wait(). To fix that, poll() skips reaping entirely in case wait() is already running.

    But this could introduce its own race condition: if a process has exited but we haven't noticed yet, and then someone calls wait() and poll() simultaneously, then the wait() might take the lock, then poll() notices the lock is taken and concludes that the process can't have exited yet. Of course wait() will immediately reap the process and drop the lock, but by this point poll() has already returned the wrong information.

    The problem is that poll() needs to be able to treat "the blocking wait lock is taken" as implying "the process hasn't exited yet". So to make that implication true, we add a check at the top of wait().

    Of course, if a process exits while wait() is running, and someone calls poll() in that brief interval between it exiting and wait() noticing, then poll() could again fail to report it exiting. But I think this is fine: it's OK if poll() is a fraction of a second out of date; that race condition is inherent in its API. The problem would be if poll() fails to notice a process that exited a while ago and is just sitting around waiting to be reaped.

    ... Ugh, but there is still one more race condition here. I think this fixes all the cases involving send_signal, wait, and poll interacting with each other, BUT we broke the case of two threads calling poll() at the same time. One thread will take the blocking_wait_lock, and the other will take this as evidence that someone is blocked in wait() and exit early, which isn't appropriate.

    I think we can patch up this last race condition by adding yet one more lock: a simple

    with self._poll_lock:

    around the whole body of poll(), so that only one thread can call it at a time.
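
    Putting the pieces together, a hedged sketch of the complete scheme (illustrative methods of a Popen-like class holding three locks; "revalidate pid licenses" above is rendered here as re-checking self.returncode under the lock, per the clarification below):

    import os

    def wait(self):
        # Don't take the big lock unless the child may still be running;
        # this lets poll() treat "lock taken" as "not exited yet".
        if self.poll() is not None:
            return self.returncode
        with self._blocking_wait_lock:
            with self._returncode_lock:
                if self.returncode is not None:  # reaped in the meantime
                    return self.returncode
            # Block until exit WITHOUT reaping, so the pid stays valid.
            os.waitid(os.P_PID, self.pid, os.WEXITED | os.WNOWAIT)
            with self._returncode_lock:
                if self.returncode is None:
                    _, status = os.waitpid(self.pid, 0)
                    self.returncode = os.waitstatus_to_exitcode(status)
        return self.returncode

    def poll(self):
        with self._poll_lock:  # serialize concurrent poll() callers
            # If someone is blocked in wait(), the child hasn't exited.
            if not self._blocking_wait_lock.acquire(blocking=False):
                return None
            try:
                with self._returncode_lock:
                    if self.returncode is None:
                        pid, status = os.waitpid(self.pid, os.WNOHANG)
                        if pid == self.pid:
                            self.returncode = os.waitstatus_to_exitcode(status)
                    return self.returncode
            finally:
                self._blocking_wait_lock.release()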

    @SpecLad
    Mannequin

    SpecLad mannequin commented Dec 9, 2019

    I'm pretty sure you mean WNOWAIT, right?

    Right.

    revalidate pid licenses

    What does this mean?

    I think we can patch up this last race condition by adding yet one more lock

    Wouldn't it be sufficient to add

    if self.returncode is not None:
        return self.returncode

    at the top of poll()?

    (And in fact, you don't even need to add it, since an equivalent of this check is already there.)

    I think this makes calling poll() from wait() unnecessary too.

    @njsmith
    Contributor

    njsmith commented Dec 9, 2019

            revalidate pid licenses
    

    It means autocorrect mangled the text... I don't remember what word that's supposed to be, but basically I meant "revalidate the pid hasn't been reaped"

    Wouldn't it be sufficient to add

    if self.returncode is not None:
        return self.returncode

    at the top of poll()?

    No, the race condition is:

    1. Process exits
      [some time passes]
    2. Thread 1 and Thread 2 both call poll() simultaneously
    3. Thread 1 and Thread 2 both see that the process hasn't been reaped
    4. Thread 1 attempts to take the blocking_wait_lock, to make sure no-one is blocked in wait(). It succeeds.
    5. Thread 2 attempts to take the blocking_wait_lock, and it fails (because Thread 1 already has it). Thread 2's poll() returns saying that the process hasn't exited yet (which is wrong!)
    6. Thread 1 calls waitpid(..., WNOHANG) and reaps the process, then releases the lock.

    So adding a check at the top (= step 3) doesn't help. The problem is in steps 4/5.

    @vstinner
    Member Author

    New changeset e85a305 by Victor Stinner in branch 'master':
    bpo-38630: Fix subprocess.Popen.send_signal() race condition (GH-16984)

    @vstinner
    Member Author

    While there was no strong agreement on my change, Nathaniel and Giampaolo said that it should do no harm to merge it :-) So I merged my change.

    It has been said multiple times: my change doesn't fully close the race condition; it only reduces the risk that it happens. I rephrased my change to mention that there is still room for the race condition and that polling only *reduces* the risk; it doesn't fully fix the race condition.

    --

    Changing the locking can help. But I'm not sure that it covers *all* cases. For example, what if another function calls os.waitpid() directly for whatever reason and the process has completed? The pid recycling issue can also happen in this case, no?

    Linux pidfd really fixes the issue. If someone wants to enhance subprocess to use the new os.pidfd_open(), I suggest opening a separate issue.

    Another approach is to emulate pidfd in userspace. Giampaolo explains that he compares the process creation time for that (msg356571). I'm not excited about working around an OS issue this way: it requires writing platform-specific get_create_time() code. I would prefer to reuse what the OS provides when available: Linux pidfd.

    --

    This issue's title is quite explicit on purpose: "subprocess.Popen.send_signal() should poll the process". I am now closing this issue: send_signal() does now poll :-)

    Because there is no *strong* consensus on my change, I decided not to backport it to the stable branches 3.7 and 3.8.

    Thanks everyone for this very interesting discussion!

    @oconnor663
    Mannequin

    oconnor663 mannequin commented Dec 3, 2020

    This change caused a crash in the Duct library on Python 3.9. Duct uses the waitid(WNOWAIT) strategy that Nathaniel Smith described in this thread. With this change, the child process is reaped by kill() in a spot we didn't expect, and a subsequent call to os.waitid() raises a ChildProcessError. This is a race condition that only triggers if the child happens to exit before kill() is called.

    I just pushed oconnor663/duct.py@5dfae70 to work around this, which I'll release shortly as Duct version 0.6.4.

    Broadly speaking, this change could break any program that uses Popen.kill() together with os.waitpid() or os.waitid(). Checking Popen.returncode before calling the os functions is a good workaround for the single-threaded case, but it doesn't fix the multithreaded case. Duct is going to avoid calling Popen.kill() entirely. There are some longer comments about race conditions in that commit I linked.
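
    A sketch of that single-threaded workaround (hypothetical example; assumes the Python 3.9 behavior described above, where send_signal() may reap an already-exited child):

    import os
    import signal
    import subprocess

    proc = subprocess.Popen(["sleep", "10"])
    proc.send_signal(signal.SIGTERM)  # may reap the child as a side effect
    # Only wait on the raw pid if Popen hasn't already recorded an exit;
    # otherwise os.waitpid() would raise ChildProcessError.
    if proc.returncode is None:
        os.waitpid(proc.pid, 0)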

    I don't feel strongly one way or the other about keeping this new behavior, but we should probably document clearly in Popen.send_signal() and all of its callers that these functions might reap the child.

    @vstinner
    Member Author

    vstinner commented Dec 3, 2020

    On Linux, a pidfd file descriptor can be created with os.pidfd_open() (added in Python 3.9). It would avoid even more race conditions. The best would be to request a pidfd file descriptor directly when we spawn the process, which is possible on recent Linux kernel versions.
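
    A hedged sketch of that approach (requires Python 3.9 and a recent Linux kernel; os.pidfd_open() needs Linux 5.3+):

    import os
    import signal
    import subprocess

    proc = subprocess.Popen(["sleep", "10"])
    # The file descriptor identifies this exact process, not the numeric
    # pid, so the signal can never reach a recycled pid. (pidfd_open(pid)
    # itself can race with pid reuse; obtaining the fd at spawn time,
    # e.g. via CLONE_PIDFD, would close that window too.)
    pidfd = os.pidfd_open(proc.pid)
    try:
        signal.pidfd_send_signal(pidfd, signal.SIGTERM)
    finally:
        os.close(pidfd)
    proc.wait()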

    we should probably document it clearly in Popen.send_signal() and all of its callers that these functions might reap the child.

    Feel free to propose a PR to document the behavior.

    I'm not sure of what you mean by send_signal() which "reaps" the child process.

    See also bpo-40550.

    Anyway, this issue and bpo-40550 are closed. I suggest opening a new issue to enhance the documentation.

    @oconnor663
    Mannequin

    oconnor663 mannequin commented Dec 3, 2020

    Filed separately as https://bugs.python.org/issue42558.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @graingert
    Contributor

    Filed here #86724
