Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'which' probes output should be hidden in Rex::Interface::Exec::Base::can_run etc... #1521

Closed
uralm1 opened this issue Dec 20, 2021 · 19 comments · Fixed by #1522
Closed

'which' probes output should be hidden in Rex::Interface::Exec::Base::can_run etc... #1521

uralm1 opened this issue Dec 20, 2021 · 19 comments · Fixed by #1522
Labels
bug Confirmed bugs

Comments

@uralm1
Copy link
Contributor

uralm1 commented Dec 20, 2021

Describe the bug

How to reproduce it

On pretty old slackware 13.37 system (without dmidecode for example):
#rex -T
which: no dmidecode in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/pkg/bin:/usr/pkg/sbin)
Tasks
testrun Test run

or (without lsb_release)

#rex -T
which: no lsb_release in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/pkg/bin:/usr/pkg/sbin)
which: no dmidecode in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/pkg/bin:/usr/pkg/sbin)
which: no lsb_release in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/pkg/bin:/usr/pkg/sbin)
which: no lsb_release in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/pkg/bin:/usr/pkg/sbin)
which: no lsb_release in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/pkg/bin:/usr/pkg/sbin)
Tasks
testrun Test run

Expected behavior

which file probes should hide its output.
Where: Rex::Interface::Exec::Base::can_run, and (probably) in Rex::Interface::Exec::Local or in i_run() etc...

Circumstances

  • Rex version: 1.13.4
  • Perl version: 5.12.3 oops...but it doesn't matter
  • OS running rex: Slackware64 13.37 running in lxc container on Debian 10 :)
  • OS managed by rex: local
  • How rex was installed: cpan

Debug log

not needed

@uralm1 uralm1 added the triage needed A potential bug that needs to be reproduced and understood label Dec 20, 2021
@uralm1
Copy link
Contributor Author

uralm1 commented Dec 30, 2021

Old which version prints 'no cmd in ...' message to stderr, so proposed fix is:
Rex::Interface::Exec::Base.pm

--- Base.pm.orig	2021-12-30 10:06:17.479354544 +0500
+++ Base.pm	2021-12-30 10:06:55.644286280 +0500
@@ -68,7 +68,7 @@
     }
 
     my @output = Rex::Helper::Run::i_run "$check_with_command $command",
-      fail_ok => 1;
+      fail_ok => 1, stderr_to_stdout => 1;
 
     next if ( $? != 0 );
     next if ( grep { /^no $command in/ } @output ); # for solaris

@uralm1
Copy link
Contributor Author

uralm1 commented Dec 30, 2021

linked with PR #1497, same problem

@uralm1
Copy link
Contributor Author

uralm1 commented Jan 17, 2022

also which command is deprecated now: https://www.mail-archive.com/[email protected]/msg1814961.html

@ferki
Copy link
Member

ferki commented Jan 22, 2022

Thanks for your report, @uralm1! 👍

I agree that it's most possibly related to a specific build or version of the which command, that is not returning the expected exit code and/or message.

I couldn't reproduce it so far, though. Could you share the exact version of the which command that produces this behavior? If you have a link to its source code or repository, that would also be interesting. Alternatively I might be able to run a Slackware 13.37 VM or container if you have a link for a good image.

so proposed fix is:

Thanks for the code suggestion, we might end up with this approach once I can reproduce the problem (and ideally also write tests for it).

linked with PR #1497, same problem

They do seem to be the same type of problem. Not sure if exactly the same, though, but it's good to have them cross-linked for future reference.

also which command is deprecated now: https://www.mail-archive.com/[email protected]/msg1814961.html

Yeah, I've read some of the original discussions at the time about deprecating it in Debian. I'm not sure if replacing which portably for all the currently supported shells (ash, bash, csh, ksh, sh, tcsh, zsh) are easier than fixing this corner case now, though. We'll probably need to address this separately in its own issue sooner or later, though.

@uralm1
Copy link
Contributor Author

uralm1 commented Jan 22, 2022

I found this spam output issue takes place only when CMDB is used. Moreover, Rexfile should use modules=packages.

I created bash script that build docker image and demonstrates this issue:
https://gist.github.com/uralm1/dd66c8816f89e0db2484f3183f08aa3b

Spam output from PR #1497 (from dmidecode) can be reproduced on any modern linux and it also takes place when CMDB activates hardware inventory.

@uralm1
Copy link
Contributor Author

uralm1 commented Jan 22, 2022

About which deprecation: I forget that recommended command -v is bash builtin. Reimplementing which in perl is a variant for future.

@uralm1 uralm1 closed this as completed Jan 22, 2022
@uralm1 uralm1 reopened this Jan 22, 2022
@ferki
Copy link
Member

ferki commented Jan 22, 2022

I created bash script that build docker image and demonstrates this issue: https://gist.github.com/uralm1/dd66c8816f89e0db2484f3183f08aa3b

Thanks! 👍 It seems like these kind of instructions were missing in the original report, so I've wasted time trying to guess the details unsuccessfully :( If you can compile a more minimal version (like a Dockerfile, or just the Rexfile), please keep up posted!

Now I see the "Describe the bug" info ended up under "How to reproduce it", and the reproduction steps themselves are posted now. I've converted the issue templates into forms, so these kind of crucial information is harder to be mixed up or left out in the future.

OS running rex: Slackware64 13.37 running in lxc container on Debian 10 :)
OS managed by rex: local

Please note that managing Slackware is not supported currently. There were two PR attempts to add support: #1387 and #1485, but they did not make their way into extension modules yet by their authors (none of them seemed complete though, and were sadly abandoned).

Also, Slackware 13.37 went end of life on 2018-07-05. I don't think it's worth putting a lot of effort into trying to support an OS version that is unsupported even by their own upstream. In fact, we actively try to avoid it. I believe it's much more responsible to encourage migrating to a supported OS version.

Now that we invested a few hours in it, I'll try to take one more look later at the reproduction through the docker example to learn more. Let's see if this issue here is something we can solve independently of Slackware support or not, and if we are facing a more wide-spread general issue or not.

I found this spam output issue takes place only when CMDB is used.

Oh, that sounds like crucial new information. Querying the CMDB triggers detailed discovery of the remote environment (which is when dmidecode, lsb_release, and similar would be run to determine system properties, which are in turn exposed as built-in CMDB data). It uses can_run and thus which on non-Windows managed endpoints to detect the presence and path of those tools, so that might explain why it only shows up with CMDB.

All my local attempts to reproduce the bug were also using CMDB, though, so it should be somehow specific to the environment you described, not to CMDB directly.

Moreover, Rexfile should use modules=packages.

I don't understand that part. Could you elaborate more, please?

Spam output from PR #1497 (from dmidecode) can be reproduced on any modern linux and it also takes place when CMDB activates hardware inventory.

As I commented there, I can't reproduce that behavior on Gentoo by always using the latest dmidecode. Let's not mix issues just yet, though. If we prove they are duplicates, we will close one of them later.

That PR was also coming out of the blue without describing all the circumstances. If you know more about how to reproduce that too, please consider opening a full bug report. I would love to learn more about these corner cases in order to find an optimal implementation to solve the root cause itself.

Debug log

not needed

There might be additional info in there, though. I might ask for a debug output if I still can't reproduce it based on your examples later.

About which deprecation: I forget that recommended command -v is bash builtin.

Yes, that would work on managed endpoints with bash, but Rex should support many other shells too (ash, csh, ksh, sh, tcsh, zsh). Rex also uses where instead of which on Windows.

Reimplementing which in perl is a variant for future.

That's certainly an option, though I'd avoid it if possible.

It either needs a Perl-core-only one-liner approach to be runnable via i_run, or requires uploading a perl script as a string into a file on the managed endpoint, and running that with i_run. Even then, perl might be only partially available on some managed systems (e.g. Red Hat family OSes doesn't install all core modules...), or might not be available at all.

Therefore I generally would prefer to rely on platform-specific direct solutions whenever possible.

Let's focus on tracking down the root cause of this issue before deciding on how to solve it :)

@ferki
Copy link
Member

ferki commented Jan 22, 2022

Hmm, as noted on the gist, that container image segfaults for me both with podman and docker, so I can't use that example.

I'm trying to download a Slackware 13.37 VM image or ISO, and check with that.

@ferki ferki added bug Confirmed bugs and removed triage needed A potential bug that needs to be reproduced and understood labels Jan 22, 2022
@ferki
Copy link
Member

ferki commented Jan 22, 2022

Soo, in a fresh Slackware 13.37 install, I can reproduce it while using the following Rexfile (triggering CMDB query, without actual CMDB files though, but for hardware discovery that should be irrelevant):

use Rex -feature => ["1.4"];
use Rex::CMDB;
set cmdb => { type => "YAML", path => ["cmdb/config.yml"] };
user get cmdb("user");
task tt => sub { say "tt" };

package Mod2;
use Rex -feature => ["1.4"];
task tt2 => sub { say "tt2" };

Then rex -T prints:

# rex -T
which: no lsb_release in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/pkg/bin:/usr/pkg/sbin)
which: no lsb_release in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/pkg/bin:/usr/pkg/sbin)
which: no lsb_release in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/pkg/bin:/usr/pkg/sbin)
which: no lsb_release in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/pkg/bin:/usr/pkg/sbin)
Tasks
 tt

 Mod2:tt2

Having package Mod2; in the same Rexfile seems irrelevant, I can reproduce without that too.

Now that I understand the issue better, I can also reproduce it locally.

It seems Rex::Interface::Exec::IOReader->io_read() is being unable to drain stderr properly for a local command execution via open3. I have to think more about how to test the behavior better 🤔 I might also need to open a separate bug about that.

Adding stdout_to_stderr => 1 could work, and I believe that would be required even if io_read would work with stderr correctly. Another workaround could be setting Rex::Config->set_no_tty(0) (which is effectively also redirecting stderr to stdout in this case, as well as avoiding the io_read code path).

edit: we could also consider setting set_no_tty(0) for i_run calls 🤔 Not sure about all the implications yet though, we need clear test cases first.

@uralm1
Copy link
Contributor Author

uralm1 commented Jan 22, 2022

Moreover, Rexfile should use modules=packages.
I don't understand that part. Could you elaborate more, please?

All are tested with rex -dT.
This Rexfile does NOT print which errors (it calls which in can_run() with errors, but this errors are hidden):

use Rex -feature=>['1.4'];
use Rex::CMDB;
set cmdb => {type=>"YAML", path=>["cmdb/config.yml"]};
user get cmdb("user");
task tt => sub {say "tt"};

But this prints:

use Rex -feature=>['1.4'];
use Rex::CMDB;
set cmdb => {type=>"YAML", path=>["cmdb/config.yml"]};
user get cmdb("user");
task tt => sub {say "tt"};
package Mod2;
use Rex -feature=>["1.4"];
task tt2 => sub {say "tt2"};

This prints too (+Mod2.pm in external package file):

use Rex -feature=>['1.4'];
use Rex::CMDB;
require Mod2;
set cmdb => {type=>"YAML", path=>["cmdb/config.yml"]};
user get cmdb("user");
task tt => sub {say "tt"};

Don't understand why yet.

@uralm1
Copy link
Contributor Author

uralm1 commented Jan 22, 2022

Hmm, yes, with probably broken stderr reader we can face with different variants of this bug...

@ferki
Copy link
Member

ferki commented Jan 22, 2022

I tried to track it down to differences in perl versions and/or perl build with threads, etc., but it seems the behavior depends on where get cmdb (and thus i_run and io_read) is called from:

  • outside a task (in the Rexfile directly): stderr is not read properly (and printed to the screen) ❌
  • inside a task: stderr is read properly ✔️

I'm not sure how much sense it makes to call CMDB from outside a task (since the CMDB is intended to hold data for the managed machine, and anything outside a task is not in management context).

But it would still be nice track down this weird behavior, I feel like the stderr should be properly drained outside tasks too.

@ferki
Copy link
Member

ferki commented Jan 22, 2022

I believe the problem with get cmdb being outside a task is that it gets evaluated during compilation time, when rex e.g. checks if all the code from Rexfile and included module code are valid. Many of the internal structures might just not be there properly at that time yet (we are not running yet).

@ferki
Copy link
Member

ferki commented Jan 22, 2022

So, behind the scenes Rex::CLI does a few hoops to load the Rexfile properly, do some sanity check, and make the Rexfile available in the same way as if it was a module.

Part of that is to temporarily hijack STDERR into an in-memory file in order to hide any error output coming from perl itself (which was considered ugly by various folks over time, and it's friendlier to report any such output in a more controlled manner). If we comment the lines of the hijacking out, the stderr output from the early which call is not printed on the screen (and io_read drains it correctly). That explains the behavior we are seeing here.

I'm not yet sure yet how it would be best to fix:

  • Currently I'm leaning towards banning get cmdb (and probably other) calls outside tasks by printing a warning and documenting the intended usage more closely, because now it gets called quite unexpectedly early.
  • We might not need all the bits of this loading machinery anymore (some of them were there to support perl-5.8 at the time).
  • There might be another way to go about this and both collect errors in a controlled way and let stderr still work in other parts.

@uralm1
Copy link
Contributor Author

uralm1 commented Jan 23, 2022

That's it!
My resume: cmdb() -> get_operating_system() -> can_run() -> which stderr output doesn't work properly when called from early eval { Rexfile } having stderr hijacked.

Currently I'm leaning towards banning get cmdb (and probably other) calls outside tasks by printing a warning and documenting the intended usage more closely, because now it gets called quite unexpectedly early.

Not a good solution. get cmdb mostly work. There were no other bugreports that something else is broken outside tasks. This stops an opportunity of using CMDB as a simple config file for keeping passwords, keys and usernames - needed when main Rexfile is placed to public git.

Let's keep this issue open for now.

@ferki
Copy link
Member

ferki commented Jan 23, 2022

Currently I'm leaning towards banning get cmdb (and probably other) calls outside tasks by printing a warning and documenting the intended usage more closely, because now it gets called quite unexpectedly early.

Not a good solution. get cmdb mostly work. There were no other bugreports that something else is broken outside tasks.

This stops an opportunity of using CMDB as a simple config file for keeping passwords, keys and usernames - needed when main Rexfile is placed to public git.

Yeah, I guess "banning" was a too strong word here. I still don't think this fits the intended usage of the CMDB, though.

The CMDB is intended to hold data about the configuration that has to be enforced on the managed targets (=Configuration Management DataBase), but not intended to configure Rex itself. The CMDB contains auto-discovered bits (amount of RAM, networking data, etc.), so it normally can only be considered complete after a connection is already made to the managed target (usually from a task, and I'd say definitely not during initialization).

It's good news that it might be close to work during init, and it may even be considered working "good enough" for somc use cases outside tasks (like in your case getting the user). I wouldn't consider it completely functional for its intended purpose, though. Therefore we should either make it fully support these use cases as well, or at least put up warnings to at least raise awareness and prevent other lengthy debugging next time. Something along the lines of "hey, using CMDB outside tasks is unsupported, so all bets are off, but I think I can keep running".

To configure Rex, or more precisely, authentication info (like passwords, keys, usernames), there are multiple possibilities (in order of my preference):

  • ~/.ssh/config (my view: auth details belong primarily to SSH instead of Rex; historically, rex does partial parsing of the ssh config in order to support passing some options to the Net::SSH2 backend module (based on libssh2); there's ongoing work to make the ssh config completely pass-through for Net::OpenSSH backend, which supports all options out of the box, since it just uses the ssh binary under the hood)
  • ~/.rex/config.yml (hmm, interestingly not much documentation about it yet, here's an example though)
  • auth commands (good for getting auth data from external sources programmatically, including reading it from arbitrary custom sources like files or API queries, etc.)
  • the INI inventory format via Rex::Group::Lookup::INI

Let's keep this issue open for now.

Oh, definitely! We need to address what we uncovered one way or another, I'm just not sure what exactly to change yet :)

@ferki
Copy link
Member

ferki commented Jan 23, 2022

My current idea is to use capture_stderr or tee_stderr from Capture::Tiny instead of the current local "hijacking" in Rex::CLI. I'm pretty sure that module will come in handy elsewhere in the codebase as well 🚀

My first local evaluation indicate it would work as expected (let's both load_rexfile and io_read do its thing correctly while avoiding all the hoops and loops around manual STDERR handling). It's also good sign that all the tests seem to pass with the Capture::Tiny approach, though I'd really prefer adding some new tests for these cases for various sample Rexfiles:

  • with/without compile errors
  • with/without printing to STDERR outside tasks
  • with/without calling (something like) get cmdb outside tasks (which does not directly write to STDERR, but its calls try to catch STDERR traffic during require time)

These can probably powered by Capture::Tiny or Test::Output.

@uralm1
Copy link
Contributor Author

uralm1 commented Jan 24, 2022

Yes I'm going to create some prototypes, but not so fast... offtopic: I used Capture::Tiny in the past tried to create web interface to run Rex tasks. Ended installing Rundeck, created a rex plugin for Rundeck and share rex tasks via rundeck.
Thank you @ferki for in-depth paticipation!

@uralm1
Copy link
Contributor Author

uralm1 commented Jan 24, 2022

@ferki look at compile2.pl !
https://gist.github.com/uralm1/7a605925ca27b98848b43c232251aede

Can we use $SIG{__WARN__} technic? What about compatibility?

compile1.pl is a prototype of the original method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed bugs
Projects
None yet
2 participants