-
Notifications
You must be signed in to change notification settings - Fork 813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Replace subprocess.Popen with safer get_subprocess_output #1892
Conversation
I have tested these on personal_chef, and fixed any issues that came up. |
7d404eb
to
f14168b
Compare
@@ -55,8 +57,13 @@ def _get_queue_count(self, directory, queues, tags): | |||
# can dd-agent user run sudo? | |||
test_sudo = os.system('setsid sudo -l < /dev/null') | |||
if test_sudo == 0: | |||
count = os.popen('sudo find %s -type f | wc -l' % queue_path) | |||
count = count.readlines()[0].strip() | |||
find = get_subprocess_output(['sudo', 'find', queue_path, '-type', 'f'], self.log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the second command is
wc -l
so you can change this whole block with:
count = len(get_subprocess_output(['sudo', 'find', queue_path, '-type', 'f'], self.log).splitlines())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aha, clever :) I missed it! shakes fist at the sky
systemStats['cpuCores'] = int(wc.communicate()[0]) | ||
grep = get_subprocess_output(['grep', 'model name', '/proc/cpuinfo'], log) | ||
# Must use tempfiles to redirect stdout to second command | ||
tempgrep = tempfile.TemporaryFile() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing here
a743675
to
e46fa87
Compare
@remh incorporated all your changes :) Sorry I sorta jumped the 🔫 on squashing the commits. All changes have been tested! 👍 |
@@ -385,8 +385,7 @@ def _get_server_pid(self, db): | |||
if pid is None: | |||
try: | |||
if sys.platform.startswith("linux"): | |||
ps = subprocess.Popen(['ps', '-C', 'mysqld', '-o', 'pid'], | |||
stdout=subprocess.PIPE, close_fds=True).communicate()[0] | |||
ps = get_subprocess_output(['ps', '-C', 'mysqld', '-o', 'pid'], self.log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use splitlines()
@JohnLZeller can you fix the conflicts please ? |
def _get_version_info(self, varnishstat_path): | ||
# Get the varnish version from varnishstat | ||
# FIXME: Use get_subprocess_output() instead of subprocess.Popen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do it while you're at it ?
Fixed a couple of the FIXME's. Needs a little more love, and some testing. |
Thanks @JohnLZeller can you fix the conflicts please ? |
Okay, fixed the other places where there were existing FIXME comments for subprocess, as well as merged master to catch this branch up, and resolved conflicts. Had to make some changes to the way the I need to do some testing and refactoring on broken tests in the 'morrow. Getting closer. |
Could use some feedback. @olivielpeau would you mind reviewing half? And @yannmh could you review the other half? |
df_out = utils.subprocess_output.get_subprocess_output( | ||
self.DF_COMMAND + ['-k'], self.log | ||
) | ||
df_out, err, rtcode = get_subprocess_output(self.DF_COMMAND + ['-k'], self.log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a general comment when using functions that return multiple values: you can discard the values that you don't use with _
. It's just a convention but makes it clear that you're not using the value afterwards. For instance here you can use:
df_out, _, _ = get_subprocess_output(self.DF_COMMAND + ['-k'], self.log)
Great job, looks way cleaner now! Looks good to me apart from my comments 👍 |
Thanks @olivielpeau :) Made the changes you mentioned, thanks for the good feedback! |
@JohnLZeller can you squash your commits please ? |
@remh will do |
a663b13
to
3042a1d
Compare
@remh okay, I have squashed and rebased to matched up with the current state of master |
[core] Replace subprocess.Popen with safer get_subprocess_output
I'm very interested in trying out some of these methods in a new check, where might I find a recent build that includes this code? |
Reason I ask is that http://apt.datadoghq.com/dists/nightly/ has datadog-agent_5.5.0.git.6.ff669c3-1_amd64.deb - which is back on Sept 17. |
http://apt.datad0g.com/dists/nightly/
|
Thanks @remh. I must have missed that when looking. |
Issue came up when investigating hostdown issues.
https://datadog.zendesk.com/agent/tickets/30510
It seems as though, periodically, a subprocess call hangs and ultimately causes the watchdog to reset the agent. During this period of time, there is then a host down event, followed by a host recovered event posted to the customers events page. We noticed that it seems as though the hanging happens in various checks (not the same for every host down event). Offending checks have included the customers own custom
iostat
check, as well as our ownpostfix
check, and evenresources/processes.py
.The theory is that this is caused by a hanging call to
subprocess.Popen
. There is a known issue where the use of asubprocess.PIPE
for stdout (or even stderr) could cause the process to hang once thesubprocess.PIPE
fills up. The solution to this is to useget_subprocess_output()
in our ownutils/subprocess_output.py
, which uses a file for stdout and stderr, removing such a low memory capacity issue.To avoid further chasing this problem around the agent, this PR attempts to replace the use of
subprocess.Popen
anywhere it shows up in the agent withget_subprocess_output()
. There are a few instances where further work will be needed to refactor the code in a clean way, and for those instances, I have added aFIXME
note.