Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

puppet.run always returns 0 #20850

Closed
br0ch0n opened this issue Feb 20, 2015 · 18 comments
Closed

puppet.run always returns 0 #20850

br0ch0n opened this issue Feb 20, 2015 · 18 comments
Labels
Bug broken, incorrect, or confusing behavior fixed-pls-verify fix is linked, bug author to confirm fix severity-low 4th level, cosemtic problems, work around exists
Milestone

Comments

@br0ch0n
Copy link
Contributor

br0ch0n commented Feb 20, 2015

puppet.run, which seems to by default call "puppet agent -t" using cmd.run_all isn't returning the proper exit code but seems to always return 0:

local run:

root@node2:~# puppet agent -t
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for node2
Info: Applying configuration version '1424393797'
Notice: /Stage[main]/Role::Fake/Apt::Source[docker]/File[docker.list]/content: 
--- /etc/apt/sources.list.d/docker.list 2015-02-19 16:56:27.273444832 -0800
+++ /tmp/puppet-file20150219-8098-rly1fi    2015-02-19 16:56:38.053444433 -0800
@@ -1,2 +1,3 @@
 # This file is managed by Puppet. DO NOT EDIT.
 # docker
+deb https://get.docker.com/ubuntu/ docker main

Info: Computing checksum on file /etc/apt/sources.list.d/docker.list
Info: FileBucket got a duplicate file {md5}9dada9398ca45b7a87098a786ab869bb
Info: /Stage[main]/Role::Fake/Apt::Source[docker]/File[docker.list]: Filebucketed /etc/apt/sources.list.d/docker.list to puppet with sum 9dada9398ca45b7a87098a786ab869bb
Notice: /Stage[main]/Role::Fake/Apt::Source[docker]/File[docker.list]/content: content changed '{md5}9dada9398ca45b7a87098a786ab869bb' to '{md5}84a0badbd70bfa2c6e118022c7612a89'
Info: /Stage[main]/Role::Fake/Apt::Source[docker]/File[docker.list]: Scheduling refresh of Exec[apt_update]
Notice: /Stage[main]/Apt::Update/Exec[apt_update]: Triggered 'refresh' from 1 events
Notice: Finished catalog run in 15.35 seconds
root@node2:~# echo $?
2
root@node2:~# 

salt run:

root@salt:~# salt '*' puppet.run

node2:
    ----------
    pid:
        7729
    retcode:
        0
    stderr:

    stdout:
        Info: Retrieving pluginfacts
        Info: Retrieving plugin
        Info: Loading facts
        Info: Caching catalog for node2
        Info: Applying configuration version '1424393615'
        Notice: /Stage[main]/Role::Fake/Apt::Source[docker]/File[docker.list]/content: 
        --- /etc/apt/sources.list.d/docker.list 2015-02-19 16:50:43.617457550 -0800
        +++ /tmp/puppet-file20150219-7729-zgapli    2015-02-19 16:53:36.129451166 -0800
        @@ -1,2 +1,3 @@
         # This file is managed by Puppet. DO NOT EDIT.
         # docker
        +deb https://get.docker.com/ubuntu/ docker main
        Info: Computing checksum on file /etc/apt/sources.list.d/docker.list
        Info: FileBucket got a duplicate file {md5}9dada9398ca45b7a87098a786ab869bb
        Info: /Stage[main]/Role::Fake/Apt::Source[docker]/File[docker.list]: Filebucketed /etc/apt/sources.list.d/docker.list to puppet with sum 9dada9398ca45b7a87098a786ab869bb
        Notice: /Stage[main]/Role::Fake/Apt::Source[docker]/File[docker.list]/content: content changed '{md5}9dada9398ca45b7a87098a786ab869bb' to '{md5}84a0badbd70bfa2c6e118022c7612a89'
        Info: /Stage[main]/Role::Fake/Apt::Source[docker]/File[docker.list]: Scheduling refresh of Exec[apt_update]
        Notice: /Stage[main]/Apt::Update/Exec[apt_update]: Triggered 'refresh' from 1 events
        Notice: Finished catalog run in 10.13 seconds
root@salt:~#   
@jfindlay jfindlay added the info-needed waiting for more info label Feb 20, 2015
@jfindlay jfindlay added this to the Blocked milestone Feb 20, 2015
@jfindlay jfindlay self-assigned this Feb 20, 2015
@jfindlay
Copy link
Contributor

@br0ch0n, what is the output of salt --versions-report? Also, what happens if you try salt '*' cmd.run_all 'puppet agent -t' && echo $?? Thanks.

@br0ch0n
Copy link
Contributor Author

br0ch0n commented Feb 21, 2015

root@salt:~# salt --versions-report
           Salt: 2014.7.0
         Python: 2.7.6 (default, Mar 22 2014, 22:59:56)
         Jinja2: 2.7.2
       M2Crypto: 0.21.1
 msgpack-python: 0.3.0
   msgpack-pure: Not Installed
       pycrypto: 2.6.1
        libnacl: Not Installed
         PyYAML: 3.10
          ioflo: Not Installed
          PyZMQ: 14.0.1
           RAET: Not Installed
            ZMQ: 4.0.4
           Mako: 0.9.1

and a node:

root@node1:~# salt-minion --versions-report
           Salt: 2014.7.1
         Python: 2.7.6 (default, Mar 22 2014, 22:59:56)
         Jinja2: 2.7.2
       M2Crypto: 0.21.1
 msgpack-python: 0.3.0
   msgpack-pure: Not Installed
       pycrypto: 2.6.1
        libnacl: Not Installed
         PyYAML: 3.10
          ioflo: Not Installed
          PyZMQ: 14.0.1
           RAET: Not Installed
            ZMQ: 4.0.4
           Mako: 0.9.1

Running salt '*' cmd.run_all 'puppet agent -t' returns what I'd expect is correct behavior, i.e. a non-zero retcode on a puppet change:

node1:
    ----------
    pid:
        12574
    retcode:
        2
    stderr:

    stdout:
        ?[0;32mInfo: Retrieving pluginfacts?[0m
        ?[0;32mInfo: Retrieving plugin?[0m
        ?[0;32mInfo: Loading facts?[0m
        ?[0;32mInfo: Caching catalog for node1?[0m
        ?[0;32mInfo: Applying configuration version '1424489323'?[0m
        ?[mNotice: /Stage[main]/Role::Fake/Apt::Source[docker]/File[docker.list]/content: 
        --- /etc/apt/sources.list.d/docker.list 2015-02-20 19:30:09.865231644 -0800
        +++ /tmp/puppet-file20150220-12575-uukx8    2015-02-20 19:30:21.181231401 -0800
        @@ -1,2 +1,3 @@
         # This file is managed by Puppet. DO NOT EDIT.
         # docker
        +deb https://get.docker.com/ubuntu/ docker main
        ?[0m
        ?[0;32mInfo: Computing checksum on file /etc/apt/sources.list.d/docker.list?[0m
        ?[0;32mInfo: /Stage[main]/Role::Fake/Apt::Source[docker]/File[docker.list]: Filebucketed /etc/apt/sources.list.d/docker.list to puppet with sum 9dada9398ca45b7a87098a786ab869bb?[0m
        ?[mNotice: /Stage[main]/Role::Fake/Apt::Source[docker]/File[docker.list]/content: content changed '{md5}9dada9398ca45b7a87098a786ab869bb' to '{md5}84a0badbd70bfa2c6e118022c7612a89'?[0m
        ?[0;32mInfo: /Stage[main]/Role::Fake/Apt::Source[docker]/File[docker.list]: Scheduling refresh of Exec[apt_update]?[0m
        ?[mNotice: /Stage[main]/Apt::Update/Exec[apt_update]: Triggered 'refresh' from 1 events?[0m
        ?[mNotice: Finished catalog run in 242.71 seconds?[0m

As for passing the $?, you put it outside of the quotes, so via that, the return from salt itself is 0, which I imagine is normal. If you meant for me to pass it to the nodes (inside the quotes), it retcode is 0, a zero is printed after the output, though interestingly when the retcode is nonzero, the corresponding $? is blank:

root@salt:~# salt '*' cmd.run_all 'puppet agent -t && echo $?'
node3:
    ----------
    pid:
        31869
    retcode:
        0
    stderr:

    stdout:
        ?[0;32mInfo: Retrieving pluginfacts?[0m
        ?[0;32mInfo: Retrieving plugin?[0m
        ?[0;32mInfo: Loading facts?[0m
        ?[0;32mInfo: Caching catalog for node3?[0m
        ?[0;32mInfo: Applying configuration version '1424490800'?[0m
        ?[mNotice: Finished catalog run in 1.08 seconds?[0m
        0
node2:
    ----------
    pid:
        32620
    retcode:
        0
    stderr:

    stdout:
        ?[0;32mInfo: Retrieving pluginfacts?[0m
        ?[0;32mInfo: Retrieving plugin?[0m
        ?[0;32mInfo: Loading facts?[0m
        ?[0;32mInfo: Caching catalog for node2?[0m
        ?[0;32mInfo: Applying configuration version '1424490906'?[0m
        ?[mNotice: Finished catalog run in 1.43 seconds?[0m
        0
node1:
    ----------
    pid:
        31702
    retcode:
        2
    stderr:

    stdout:
        ?[0;32mInfo: Retrieving pluginfacts?[0m
        ?[0;32mInfo: Retrieving plugin?[0m
        ?[0;32mInfo: Loading facts?[0m
        ?[0;32mInfo: Caching catalog for node1?[0m
        ?[0;32mInfo: Applying configuration version '1424490906'?[0m
        ?[mNotice: /Stage[main]/Role::Fake/Package[figlet]/ensure: ensure changed 'purged' to 'present'?[0m
        ?[mNotice: Finished catalog run in 78.62 seconds?[0m

Don't know if that's significant or not. What does seem important tho, is that cmd.run_all 'puppet agent -t 'seems good yet puppet.run (which basically just wraps the former) is not. So somehow the proper retcode is getting obscured away even though it's out there.

@jfindlay
Copy link
Contributor

Thanks for the extra info, @br0ch0n. Yeah, it is especially confusing since I read over the associated code and didn't see anything obvious.

There is a goal for beryllium to normalize and standardize module return codes, although it seems that something else might be going on here.

@jfindlay jfindlay modified the milestones: Approved, Blocked Feb 21, 2015
@jfindlay jfindlay added Bug broken, incorrect, or confusing behavior severity-low 4th level, cosemtic problems, work around exists and removed info-needed waiting for more info labels Feb 21, 2015
@dhs-rec
Copy link
Contributor

dhs-rec commented Mar 11, 2015

A return code of 2 from the puppet agent actually isn't an error. It just tells us that it succesfully applied one or more changes. What happens if the agent returns 4 or 6?

@dhs-rec
Copy link
Contributor

dhs-rec commented Mar 30, 2015

I'd suggest adding "detailed-exitcodes" to the arguments for the puppet agent here:

        if self.subcmd == 'agent':
            # no arguments are required
            args.extend([
                'onetime', 'verbose', 'ignorecache', 'no-daemonize',
                'no-usecacheonfailure', 'no-splay', 'show_diff'
            ])

or (even better) simply change the whole list to just contain 'test', which implies all the other entries + --detailed-exitcodes, according to puppet agent --help:

* --test:
  Enable the most common options used for testing. These are 'onetime',
  'verbose', 'ignorecache', 'no-daemonize', 'no-usecacheonfailure',
  'detailed-exitcodes', 'no-splay', and 'show_diff'.

With this in place, the agent returns 0 or 2 for success, everything else is failure.

@jfindlay
Copy link
Contributor

@dhs-rec, do you think you could submit a pull request against the 2014.7 branch to fix this?

@br0ch0n
Copy link
Contributor Author

br0ch0n commented Mar 30, 2015

Ah, this makes a bit more sense now. I should've looked closer at the code. The comments say that we're running 'puppet agent --test' but in actuality, the actual arguments implied by --test are being explicitly called instead (but no detailed-exitcodes, which I could've sworn was default anyway, but I guess not).
But yes, I'd say replace all of those array elements with just 'test'.

@jfindlay jfindlay removed their assignment Mar 30, 2015
@dhs-rec
Copy link
Contributor

dhs-rec commented Apr 1, 2015

Created #22234 against develop and #22235 against 2014.7.

jfindlay added a commit that referenced this issue Apr 1, 2015
Possible fix for 'puppet.run always returns 0 #20850'
@rallytime rallytime added the fixed-pls-verify fix is linked, bug author to confirm fix label Apr 1, 2015
@br0ch0n
Copy link
Contributor Author

br0ch0n commented Apr 1, 2015

If I'm reading this right, won't this change still obscure the exit code to either 0 or 1? It would be desirable to return the actual puppet agent code since one might want to see if anything changed during the run. It should be left up to the caller to interpret the exit codes IMO.

@dhs-rec
Copy link
Contributor

dhs-rec commented Apr 2, 2015

Yes. Thought about this, too. However, I wasn't sure whether an exit code of 2 would make the caller think there was an error, so I combined the "good" cases into one exit code (0).

Feel free to change that.

@br0ch0n
Copy link
Contributor Author

br0ch0n commented Apr 7, 2015

Have created #22446 for the above

jfindlay added a commit that referenced this issue Apr 9, 2015
Issue #20850 puppet run should return actual code
@dhs-rec
Copy link
Contributor

dhs-rec commented Apr 10, 2015

@br0ch0n, you're right, what I did was wrong. Wasn't aware that it actually returns a dict which contains the return code in a field "retcode". However, the code as it is now still treats 2 as failure, which is wrong. 2 means "Puppet agent has successfully applied some changes". 0 means "Agent had nothing to do". All other return codes indicate failure.

Have created #22526 which does it better.

@rallytime
Copy link
Contributor

@br0ch0n With the recent work you and @dhs-rec have been doing on this issue, is this bug report considered resolved and ready to be closed?

@br0ch0n
Copy link
Contributor Author

br0ch0n commented Apr 10, 2015

@dhs-rec Thanks, but I don't understand what you mean by the code treating the return value as a failure? Salt isn't doing any interpretation of the return value one way or another. It just sends pid/retcode/stderr/stdout back to the caller, who can interpret it as needed.
Your latest commit has really just reverted my revert, with the end result being that puppet.run still doesn't return the proper code, but just an obscured 0 or 1 again. The actual return code (for example 2) is useful to the caller, since you often wish to know if something changed or not. The whole point of detailed exit codes is that they provide potentially useful information.

@rallytime I don't know that I'd consider this resolved since we don't seem to have an agreement on desired behavior.

@dhs-rec
Copy link
Contributor

dhs-rec commented Apr 13, 2015

@br0ch0n: If I run "salt myminion state.highstate", and one of the states is

puppet:
  module.run:
    - name: puppet.run

with a puppet agent returning 2, I get something like this (relevant output only):

                  retcode:
                      2
 [...]
Summary
------------
Succeeded: 1 (changed=1)
Failed:    1
------------
Total states run:     2

So yes, the returncode IS actually interpreted by salt. And it gets it wrong. Returncode 2 doesn't indicate failure.

Reg my pull request: No, it does not just revert your previous commit. The code is different. It takes into account that cmd.run returns a dict, which my previous commit (the one you reverted) did not.

I agree that the detailed exit codes provide useful information, but they also lead to false negatives. Maybe a proper way to fix this would be to allow cmd.run to accept a list of "good" exit codes, like Puppet's exec resources do. But as long as it doesn't do that, the next best thing IMHO is simply not to fail on those good cases.

@jfindlay
Copy link
Contributor

Return codes are problematic as a nonzero value can vary widely among all the utilities that salt interacts with. There is an open issue, #18510, to address this in a standard way.

@br0ch0n
Copy link
Contributor Author

br0ch0n commented Apr 20, 2015

Sorry for not commenting on this for awhile. Per @dhs-rec 's use case, I agree that it's best to stick to the lowest common denominator and not fail on nonzero. In my use case, i.e. using just the puppet module (salt '*' puppet.run) directly rather than calling the module from a state, no interpretation takes place, but I see that that doesn't cover all cases. So, I guess we should just close this but with the caveat that a better solution should be found once there is progress on #18510 . In the meantime, I'll just have to use cmd.run if I want to get detailed exit info from puppet runs.

@rallytime
Copy link
Contributor

Thanks for the update @br0ch0n.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior fixed-pls-verify fix is linked, bug author to confirm fix severity-low 4th level, cosemtic problems, work around exists
Projects
None yet
Development

No branches or pull requests

4 participants