Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve FastChildWatcher with WNOWAIT? #70124

Closed
WGH mannequin opened this issue Dec 23, 2015 · 15 comments
Closed

Improve FastChildWatcher with WNOWAIT? #70124

WGH mannequin opened this issue Dec 23, 2015 · 15 comments
Labels
pending The issue will be closed if no feedback is provided topic-asyncio

Comments

@WGH
Copy link
Mannequin

WGH mannequin commented Dec 23, 2015

BPO 25936
Nosy @gvanrossum, @vstinner, @1st1

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2015-12-23.23:58:04.667>
labels = ['expert-asyncio']
title = 'Improve FastChildWatcher with WNOWAIT?'
updated_at = <Date 2015-12-26.03:19:19.178>
user = 'https://bugs.python.org/WGH'

bugs.python.org fields:

activity = <Date 2015-12-26.03:19:19.178>
actor = 'gvanrossum'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['asyncio']
creation = <Date 2015-12-23.23:58:04.667>
creator = 'WGH'
dependencies = []
files = []
hgrepos = []
issue_num = 25936
keywords = []
message_count = 3.0
messages = ['256946', '257015', '257016']
nosy_count = 4.0
nosy_names = ['gvanrossum', 'vstinner', 'yselivanov', 'WGH']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue25936'
versions = []

@WGH
Copy link
Mannequin Author

WGH mannequin commented Dec 23, 2015

The problem with FastChildWatcher lies in the fact that it can accidentally reap processes that it doesn't watch.

However, os module includes waitid function (since Python 3.3), and it has WNOWAIT flags, which means "return status, let process remain waitable (=don't reap)".

What do you think, can this feature fix the problem with FastChildWatcher?

@WGH WGH mannequin added the topic-asyncio label Dec 23, 2015
@gvanrossum
Copy link
Member

Hm... Looks like this wouldn't work unless we also implemented the waitid() (note: 'id', not 'pid') syscall, as waitpid() doesn't support this flag (e.g. https://bugzilla.redhat.com/show_bug.cgi?id=840782).

@gvanrossum
Copy link
Member

Oh, hm, we do seem to have os.waitid() but not on OS X... Well that makes producing a reasonable patch much more complicated. Maybe you want to add a 3rd ChildWatcher just for Linux (or just for platforms that have os.waitid)? (If so, please submit a PR to https://github.com/python/asyncio, which we use as "upstream".)

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@ezio-melotti ezio-melotti moved this to Todo in asyncio Jul 17, 2022
@kumaraditya303
Copy link
Contributor

asyncio has multiple child watchers, it depends on your application to choose the best. PidfdChildWatcher is the best as it avoids threads and has best performance. Marking as pending as no PR or patch was provided.

@kumaraditya303 kumaraditya303 added the pending The issue will be closed if no feedback is provided label Jul 30, 2022
@gvanrossum
Copy link
Member

@graingert Or maybe we should close this as there have been no comments since 2015 and you seem to want to get rid of child watchers?

@graingert
Copy link
Contributor

PidfdChildWatcher doesn't avoid threads. It uses the main thread event loop.

@graingert
Copy link
Contributor

@graingert Or maybe we should close this as there have been no comments since 2015 and you seem to want to get rid of child watchers?

We probably want to to use WNOWAIT on all the child watchers if we can support it. @njsmith had a comment somewhere about it

@graingert
Copy link
Contributor

here's the comment: #82811 (comment)

you'd have to fix everywhere that uses os.waitpid() at the same time.

@gvanrossum
Copy link
Member

I wonder if we can get @gpshead interested in this issue…

@graingert
Copy link
Contributor

graingert commented Jul 31, 2022

currently os.close() has the same sharp edges as os.waitpid() but the reason nobody ever has a problem with it is because python has a thread-safe io.FileIO wrapper. And pretty much every fd API goes via this wrapper object, and so consumers simply fall into the pit of success and never need to call os.close() themselves. To anyone that needs to use os.close() it just seems obvious that os.close(f.fileno()) will do something undesirable to f.

to fix os.waitpid(), Python needs to introduce a new api that is the lightest possible thread-safe wrapper around a pid, and then provide new versions of all the APIs that directly expose pid ints so that everyone uses that interface, eg:

class Pid:
    def __init__(self, pid):
        self._pid = pid

    def pidno(self):
        return self._pid

    def closed(self):
        return self._pid != -1

    def reap(self):
        with self._lock:  # _pyio.FileIO is probably missing this lock
             pid = self._pid
             self._pid = -1
        os.waitpid(pid, ...)

@gpshead
Copy link
Member

gpshead commented Jul 31, 2022

asyncio.FastChildwatcher
is documented as "This implementation reaps every terminated processes by calling os.waitpid(-1) directly, possibly breaking other code spawning processes and waiting for their termination."

so is there really anything to fix for this specific issue? FastChildWatcher isn't supposed to be safe.

@gpshead
Copy link
Member

gpshead commented Jul 31, 2022

to fix os.waitpid() ...

Don't describe that as "fixing" os.waitpid(). What you describe is inventing a new API and idiom on top of the os level semantics that the module intentionally exposes. All sorts of real code uses os.waitpid() and int pid values. The os module is intentionally low level. If you wanted a new type, I suggest looking into the feasibility of class Pid(int): as existing APIs could return that theoretically without breaking existing code. No new APIs to return it required. Things that consume pid integers still work, and could be updated to inspect the type and call methods on it when it is a Pid instance rather than a plain int.

But this doesn't really have anything to do with this issue, it belongs in its own feature request with its own problem statement. We do still have very minor race conditions in our stdlib libraries that use waitpid. As Nathaniel noted in the linked comment: It only solves the problem so long as code is using the specific APIs that do the non-waiting waitid, lock, re-check & actually-wait dance behind the scenes. It'd be a minor improvement if we actually did that.

Why doesn't this come often up as a real problem? In general OSes attempt to avoid this scenario by not recycling PIDs rapidly so that running programs do not wind up seeing other processes that have reused another processes pid when they make their pid based calls. With a modern 32bit PID space this is doable even if you have a 1000 CPU thread system. It works well enough until someone starts an antagonistic forkbomb or threadbomb to rapidly consume and throw away PIDs. (yet another form of denial of service, intentional or not) So it's primarily a way to be safer with process management in the face of things that aren't supposed to happen. But by the time that has happened a lot is likely to go wrong on the system anyways so this usually isn't the top priority.

@kumaraditya303
Copy link
Contributor

kumaraditya303 commented Aug 1, 2022

FWIW before proposing solutions we should consider if this is even an real issue which cannot be solved with other solutions like using an alternate existing watcher implementation. The current state of child watcher system is complicated and almost all child watcher except pidfd has some form of race conditions or performance issue hence I recommended to try it. In the future I would like asyncio to always try to use pidfd on newer kernel versions and on old kernels fallback to use the most stable one instead of user trying to guess which one is good for their application which most of the time they get wrong.

@graingert
Copy link
Contributor

FWIW before proposing solutions we should consider if this is even an real issue which cannot be solved with other solutions like using an alternate existing watcher implementation. The current state of child watcher system is complicated and almost all child watcher except pidfd has some form of race conditions or performance issue hence I recommended to try it. In the future I would like asyncio to always try to use pidfd on newer kernel versions and on old kernels fallback to use the most stable one instead of user trying to guess which one is good for their application which most of the time they get wrong.

Yep there's really no need to configure a child watcher for advanced users, they can copy the subprocess transport code and use their own OS facilities that they need

@kumaraditya303
Copy link
Contributor

I'm closing this as there isn't anything actionable here, and it is documented that it is unsafe.

@kumaraditya303 kumaraditya303 closed this as not planned Won't fix, can't repro, duplicate, stale Sep 17, 2022
Repository owner moved this from Todo to Done in asyncio Sep 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pending The issue will be closed if no feedback is provided topic-asyncio
Projects
Status: Done
Development

No branches or pull requests

4 participants