Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PtyProcess.spawn (and thus pexpect) slowdown in close() loop #50

Open
kolyshkin opened this issue Jan 18, 2019 · 2 comments
Open

PtyProcess.spawn (and thus pexpect) slowdown in close() loop #50

kolyshkin opened this issue Jan 18, 2019 · 2 comments

Comments

@kolyshkin
Copy link

kolyshkin commented Jan 18, 2019

The following code in ptyprocess

# Do not allow child to inherit open file descriptors from parent,
# with the exception of the exec_err_pipe_write of the pipe
# and pass_fds.
# Impose ceiling on max_fd: AIX bugfix for users with unlimited
# nofiles where resource.RLIMIT_NOFILE is 2^63-1 and os.closerange()
# occasionally raises out of range error
max_fd = min(1048576, resource.getrlimit(resource.RLIMIT_NOFILE)[0])
spass_fds = sorted(set(pass_fds) | {exec_err_pipe_write})
for pair in zip([2] + spass_fds, spass_fds + [max_fd]):
os.closerange(pair[0]+1, pair[1])

is looping through all possible file descriptors in order to close those (note that closerange() implemented as a loop at least on Linux). In case the limit of open fds (aka ulimit -n, aka RLIMIT_NOFILE, aka SC_OPEN_MAX) is set too high (for example, with recent docker it is 1024*1024), this loop takes considerable time (as it results in about a million close() syscalls).

The solution (at least for Linux and Darwin) is to obtain the list of actually opened fds, and only close those. This is implemented in subprocess module in Python3, and there is a backport of it to Python2 called subprocess32.

This issue was originally reported to docker: docker/for-linux#502

Other good reason for using subprocess (being multithread-safe) is described in #43

@takluyver
Copy link
Member

Hmm, it's annoying that closerange isn't implemented more efficiently, but I guess there's not much I can do about that.

The difficulty with using subprocess is that at the heart of ptyprocess, we call forkpty() rather than fork(). That's a different call all the way down to libc. There is a fallback implementation using fork(), with comments indicating it's for Solaris, but I don't know how well-tested it is, and I'm not confident in my abilities to write properly bulletproof low-level system code like this.

ptyprocess is also (as a component pulled out of pexpect), old code which has, over the years, attempted to support many platforms. I'm reluctant to make major changes to it, because I don't know all the platforms and use cases where it's running in the wild. So maybe you're better off writing a new module under a different name to do the same thing without all the legacy concerns. I'm happy to link to it from the docs.

@kolyshkin
Copy link
Author

kolyshkin commented Mar 1, 2019

@takluyver well, perhaps you can do something like this instead: python/cpython@5626fff

The good news is:

  1. it's already written, just need to be adopted to your codebase;
  2. it's a small change, relatively easy to review;
  3. this is Linux-only and won't break any other platform;
  4. it falls back to the existing implementation in case it's not working;
  5. it will speed up your software considerably under Docker on Linux.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants