Performance tuning a simple shell pipeline (are we missing SIGPIPE?) #373
My best workaround is to do:

But according to

However, when you use `sh.head(sh.head('/usr/share/dict/words', _piped=True, n=10000))`, I'm not sure if this is a bug with sh.

Also, as you can probably see from
Thanks for the quick response. Are you using

`head: error writing 'standard output': Broken pipe`
I'm on 1.12.10. Running it a few times now, it fails for me sometimes. If I change it to
Ok, this issue should be fixed in 1.12.11. Basically the problem was related to SIGPIPE, as you originally suggested. Python sets SIGPIPE to SIG_IGN on startup, so spawned processes were ignoring SIGPIPE. They were still dying (with exit code 1) when a write reported EPIPE. However, a race existed: sometimes the piping source process finished before the piping destination process, so there was never a "hang up" on the fd (and therefore never an EPIPE); the data just stayed in the pipe buffer until the destination process could read it. I imagine that's why the tests never caught it. Your sample code caught it because the cat was longer-lived than the head, and the head didn't consume all the data. The fix was to make sure spawned processes saw SIGPIPE, and then to suppress the exception generated by the broken pipe. Anyway, good find. Confirm that it works for you and let me know.
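The two behaviors described above (dying from the signal vs. dying from an EPIPE write error) can be reproduced with the standard library alone. This is a minimal sketch, not sh's actual fix: `subprocess.Popen`'s `restore_signals` flag controls whether Python's SIG_IGN disposition for SIGPIPE is reset to SIG_DFL in the child before exec, and the pipeline here (`yes | head -n 3` wired by hand) assumes a POSIX system with GNU-style `yes` and `head` on the PATH.

```python
import subprocess

def run_pipeline(restore):
    # `yes | head -n 3`, wired up by hand. `yes` writes forever, so it
    # is guaranteed to hit the broken pipe once `head` exits early.
    src = subprocess.Popen(['yes'], stdout=subprocess.PIPE,
                           stderr=subprocess.DEVNULL,
                           restore_signals=restore)
    head = subprocess.Popen(['head', '-n', '3'],
                            stdin=src.stdout, stdout=subprocess.PIPE)
    src.stdout.close()  # drop the parent's read end; head is the only reader
    out = head.communicate()[0]
    src.wait()
    return out, src.returncode

# restore_signals=True (the default) resets SIGPIPE to SIG_DFL before
# exec, so `yes` is killed by the signal when `head` exits: rc == -13.
out, rc = run_pipeline(True)

# restore_signals=False leaves SIGPIPE ignored (inherited from the
# Python parent), so the write fails with EPIPE instead and `yes`
# exits with status 1 after a "Broken pipe" error (silenced above).
out, rc = run_pipeline(False)
```

The second case is the pre-1.12.11 situation the comment describes: the child never sees the signal, only the write error.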
Works great, thanks for fixing this so quickly. I'm glad my hunch about SIGPIPE was useful :-) |
Note that:

```python
import sh
head = sh.head
cat = sh.cat
%timeit -n 100 -r 5 head(cat('/usr/share/dict/words', _piped=True))
```

```
100 loops, best of 5: 30.8 ms per loop
```

And, here, I use the aforementioned kludge of invoking the entire command in a single `sh -c` invocation:

```python
import sh
shell = sh.sh
%timeit -n 100 -r 5 shell('-c', 'cat /usr/share/dict/words | head')
```

```
100 loops, best of 5: 17.2 ms per loop
```

Finally, here it is running in bash:

```
$ time sh -c 'cat /usr/share/dict/words | head'
A
a
aa
aal
aalii
aam
Aani
aardvark
aardwolf
Aaron

real	0m0.011s
user	0m0.004s
sys	0m0.007s
```
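As a baseline that takes sh out of the picture entirely, the same pipeline can be timed with nothing but the standard library. A minimal sketch, assuming a POSIX system with `cat` and `head` on the PATH and the word-list path used in the examples above:

```python
import subprocess
import timeit

def pipeline(producer=('cat', '/usr/share/dict/words'), n=10):
    # Hand-built equivalent of `cat /usr/share/dict/words | head`.
    src = subprocess.Popen(list(producer), stdout=subprocess.PIPE)
    head = subprocess.Popen(['head', '-n', str(n)],
                            stdin=src.stdout, stdout=subprocess.PIPE)
    src.stdout.close()  # head is now the only reader of the pipe
    out = head.communicate()[0]
    src.wait()          # src may have died from SIGPIPE; fine here
    return out

# Average seconds per run; mostly fork/exec overhead for two processes.
print(timeit.timeit(pipeline, number=100) / 100)
```

Whatever this prints on a given machine is the floor that any Python wrapper, sh included, has to pay before adding its own overhead.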
Does that performance gap shrink the longer the processes run? Try piping to
The difference doesn't shrink on my system as

More importantly, thanks for fixing this, it's much better!
* pypi readme doc bugfix [PR#377](amoffat/sh#377)
* bugfix for relative paths to `sh.Command` not expanding to absolute paths [#372](amoffat/sh#372)
* updated for python 3.6
* bugfix for SIGPIPE not being handled correctly on pipelined processes [#373](amoffat/sh#373)
I'm trying to convert a bash script to use `sh`, version `sh==1.12.10`. Here's an example command that I'm trying:

```python
sh.head(sh.cat('/usr/share/dict/words', _piped='direct'))
```

The above takes about 2.81 seconds on my system (timed with `%timeit` in an IPython shell).

When running in bash, using the command `time (cat /usr/share/dict/words | head)`, it is of course much quicker: 4 milliseconds.

Am I using `sh` correctly? The docs don't seem to cover troubleshooting pipeline performance issues.