[core][subprocess output] Fix unintended empty output errors #3024

byronwolfman · 2016-11-12T22:44:20Z

What does this PR do?

This PR accomplishes two related goals: primarily, it increases the robustness by which subprocess_output.py determines if the output is empty. Secondly, it updates postfix.py so that it does not raise an error when it examines empty postfix queues.

This recent PR refactored and cleaned up subprocess_output.py. At the same time, the conditional used to validate the emptiness (or non-emptiness) of the output was subtly changed. Specifically, emptiness was previously checked via

if output is None:

and after the refactor was checked via

if not output:

This was an important change, since the output from subprocess.Popen is always non-None, and therefore the condition is None would never evaluate to True. Conversely, if not output correctly evaluates to True for empty output, but it will also (and undesirably) evaluate to True for non-empty output that is literally 0.

Since we want to know about zero-length output that is never None, we should measure the length of the output and evaluate that, i.e.

if not len(output):

This will evaluate to True for empty output, but False for a literal output of 0. This PR makes this change.

This PR also makes a related change: the postfix.py check now passes raise_on_empty_output=False to get_subprocess_output since the postfix check functions by counting the number of regular files found in a given directory. Empty postfix queues produce no output for this check, which causes it to raise SubprocessOutputEmptyError without the False parameter. By passing the False parameter, we indicate that empty output is permitted, and the error is not raised.

Motivation

After updating from agent 5.9.1 => 5.10.0 in my testing environment, I noticed that I had stopped receiving postfix metrics. A look inside the collector log revealed this:

2016-11-10 01:27:45 UTC | ERROR | dd.collector | checks.postfix(__init__.py:788) | Check 'postfix' instance #0 failed
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/__init__.py", line 771, in run
    self.check(copy.deepcopy(instance))
  File "/opt/datadog-agent/agent/checks.d/postfix.py", line 33, in check
    self._get_queue_count(directory, queues, tags)
  File "/opt/datadog-agent/agent/checks.d/postfix.py", line 64, in _get_queue_count
    output, _, _ = get_subprocess_output(['sudo', 'find', queue_path, '-type', 'f'], self.log)
  File "/opt/datadog-agent/agent/utils/subprocess_output.py", line 40, in get_subprocess_output
    raise SubprocessOutputEmptyError("get_subprocess_output expected output but had none.")
SubprocessOutputEmptyError: get_subprocess_output expected output but had none.

This host handles almost no mail, so the postfix check will almost always produce empty output. This empty output was not a problem in 5.9.1 though, since empty output was never evaluated as such.

I first attempted to fix this just by adding the False parameter to get_subprocess_output in postfix.py; this worked, but felt like an incomplete solution, hence the update to subprocess_output.py which should allow it to accurately evaluate empty and non-empty output.

degemer · 2016-11-14T18:46:51Z

Hey @byronwolfman, thanks for the PR and the very detailed explanation!

The postfix change is definitely needed, and we'd like to merge it for 5.10.1.

As for the change in get_subprocess_output, even though I understand the change, I'd prefer to keep the old approach. output comes from a read of the temporary file, which could be None (very unlikely since we seek it before, but...).
Basically I'd prefer the function to raise a SubprocessOutputEmptyError if the output is None or even 0 (which theoretically cannot happen since .read() returns a string) instead of a TypeError (raised by len(None) or len(0)). What do you think?

byronwolfman · 2016-11-14T19:18:54Z

Hey! Thanks for reading through the small novella, @degemer.

You're right -- a string value of "0" as returned by read() will be treated correctly by if not output: whereas I was looking at a (impossible) case of an int being returned. Although I view the use of len as a truer measurement of whether or not a string is empty, this would be at the expense of readability. As long as both approaches are equally valid, let's go with the clearer one by reverting to

if not output:

What don't think we can do is revert all the way back to

if output is None:

...as this is what hid the postfix check bug in the first place. Fortunately if not output: evaluates correctly whether the output equals "" or None which seems desirable.

What's the next step? Is it best if I commit a revert to subprocess_output.py or is there a preference for squashed commits?

degemer · 2016-11-14T19:26:20Z

@byronwolfman Thanks for the quick response!
Totally agree, we should (and will) keep the if not output.

We usually prefer squashed commits if you don't mind. :)

A recent fix to subprocess_output.py in dd-agent 5.10.0 exposed an issue in the postfix check. This fix added robustness to the way that subproces_output.py determines whether or not the output is empty, and since the postfix check produces empty output when evaluating empty queues, this caused an error to be raised. By passing raise_on_empty_output=False we can assert that empty output is acceptable for the postfix check, and receive metrics (rather than raising an error).

byronwolfman · 2016-11-14T19:56:25Z

Thanks @degemer. Squashed, and all yours once the automated tests clear. :)

degemer · 2016-11-14T20:13:02Z

Thanks again, merged, will go out with 5.10.1 (planned this week).

olivielpeau assigned degemer Nov 14, 2016

olivielpeau added this to the 5.10.1 milestone Nov 14, 2016

degemer added bugfix checks labels Nov 14, 2016

degemer merged commit 642e374 into DataDog:master Nov 14, 2016

byronwolfman deleted the robust-empty-check branch November 14, 2016 20:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[core][subprocess output] Fix unintended empty output errors #3024

[core][subprocess output] Fix unintended empty output errors #3024

byronwolfman commented Nov 12, 2016 •

edited

Loading

degemer commented Nov 14, 2016

byronwolfman commented Nov 14, 2016

degemer commented Nov 14, 2016

byronwolfman commented Nov 14, 2016

degemer commented Nov 14, 2016

[core][subprocess output] Fix unintended empty output errors #3024

[core][subprocess output] Fix unintended empty output errors #3024

Conversation

byronwolfman commented Nov 12, 2016 • edited Loading

What does this PR do?

Motivation

degemer commented Nov 14, 2016

byronwolfman commented Nov 14, 2016

degemer commented Nov 14, 2016

byronwolfman commented Nov 14, 2016

degemer commented Nov 14, 2016

byronwolfman commented Nov 12, 2016 •

edited

Loading