Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core][subprocess output] Fix unintended empty output errors #3024

Merged
merged 1 commit into from
Nov 14, 2016
Merged

[core][subprocess output] Fix unintended empty output errors #3024

merged 1 commit into from
Nov 14, 2016

Conversation

byronwolfman
Copy link
Contributor

@byronwolfman byronwolfman commented Nov 12, 2016

What does this PR do?

This PR accomplishes two related goals: primarily, it increases the robustness by which subprocess_output.py determines if the output is empty. Secondly, it updates postfix.py so that it does not raise an error when it examines empty postfix queues.

This recent PR refactored and cleaned up subprocess_output.py. At the same time, the conditional used to validate the emptiness (or non-emptiness) of the output was subtly changed. Specifically, emptiness was previously checked via

if output is None:

and after the refactor was checked via

if not output:

This was an important change, since the output from subprocess.Popen is always non-None, and therefore the condition is None would never evaluate to True. Conversely, if not output correctly evaluates to True for empty output, but it will also (and undesirably) evaluate to True for non-empty output that is literally 0.

Since we want to know about zero-length output that is never None, we should measure the length of the output and evaluate that, i.e.

if not len(output):

This will evaluate to True for empty output, but False for a literal output of 0. This PR makes this change.

This PR also makes a related change: the postfix.py check now passes raise_on_empty_output=False to get_subprocess_output since the postfix check functions by counting the number of regular files found in a given directory. Empty postfix queues produce no output for this check, which causes it to raise SubprocessOutputEmptyError without the False parameter. By passing the False parameter, we indicate that empty output is permitted, and the error is not raised.

Motivation

After updating from agent 5.9.1 => 5.10.0 in my testing environment, I noticed that I had stopped receiving postfix metrics. A look inside the collector log revealed this:

2016-11-10 01:27:45 UTC | ERROR | dd.collector | checks.postfix(__init__.py:788) | Check 'postfix' instance #0 failed
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/__init__.py", line 771, in run
    self.check(copy.deepcopy(instance))
  File "/opt/datadog-agent/agent/checks.d/postfix.py", line 33, in check
    self._get_queue_count(directory, queues, tags)
  File "/opt/datadog-agent/agent/checks.d/postfix.py", line 64, in _get_queue_count
    output, _, _ = get_subprocess_output(['sudo', 'find', queue_path, '-type', 'f'], self.log)
  File "/opt/datadog-agent/agent/utils/subprocess_output.py", line 40, in get_subprocess_output
    raise SubprocessOutputEmptyError("get_subprocess_output expected output but had none.")
SubprocessOutputEmptyError: get_subprocess_output expected output but had none.

This host handles almost no mail, so the postfix check will almost always produce empty output. This empty output was not a problem in 5.9.1 though, since empty output was never evaluated as such.

I first attempted to fix this just by adding the False parameter to get_subprocess_output in postfix.py; this worked, but felt like an incomplete solution, hence the update to subprocess_output.py which should allow it to accurately evaluate empty and non-empty output.

@degemer
Copy link
Member

degemer commented Nov 14, 2016

Hey @byronwolfman, thanks for the PR and the very detailed explanation!

The postfix change is definitely needed, and we'd like to merge it for 5.10.1.

As for the change in get_subprocess_output, even though I understand the change, I'd prefer to keep the old approach. output comes from a read of the temporary file, which could be None (very unlikely since we seek it before, but...).
Basically I'd prefer the function to raise a SubprocessOutputEmptyError if the output is None or even 0 (which theoretically cannot happen since .read() returns a string) instead of a TypeError (raised by len(None) or len(0)). What do you think?

@byronwolfman
Copy link
Contributor Author

Hey! Thanks for reading through the small novella, @degemer.

You're right -- a string value of "0" as returned by read() will be treated correctly by if not output: whereas I was looking at a (impossible) case of an int being returned. Although I view the use of len as a truer measurement of whether or not a string is empty, this would be at the expense of readability. As long as both approaches are equally valid, let's go with the clearer one by reverting to

if not output:

What don't think we can do is revert all the way back to

if output is None:

...as this is what hid the postfix check bug in the first place. Fortunately if not output: evaluates correctly whether the output equals "" or None which seems desirable.

What's the next step? Is it best if I commit a revert to subprocess_output.py or is there a preference for squashed commits?

@degemer
Copy link
Member

degemer commented Nov 14, 2016

@byronwolfman Thanks for the quick response!
Totally agree, we should (and will) keep the if not output.

We usually prefer squashed commits if you don't mind. :)

A recent fix to subprocess_output.py in dd-agent 5.10.0 exposed an
issue in the postfix check. This fix added robustness to the way that
subproces_output.py determines whether or not the output is empty, and
since the postfix check produces empty output when evaluating empty
queues, this caused an error to be raised.

By passing raise_on_empty_output=False we can assert that empty output
is acceptable for the postfix check, and receive metrics (rather than
raising an error).
@byronwolfman
Copy link
Contributor Author

Thanks @degemer. Squashed, and all yours once the automated tests clear. :)

@degemer degemer merged commit 642e374 into DataDog:master Nov 14, 2016
@degemer
Copy link
Member

degemer commented Nov 14, 2016

Thanks again, merged, will go out with 5.10.1 (planned this week).

@byronwolfman byronwolfman deleted the robust-empty-check branch November 14, 2016 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants