Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[networking] plugin can load 7 kernel modules #1435

Closed
pmoravec opened this issue Sep 17, 2018 · 25 comments
Closed

[networking] plugin can load 7 kernel modules #1435

pmoravec opened this issue Sep 17, 2018 · 25 comments

Comments

@pmoravec
Copy link
Contributor

See reproducer:

rmmod tcp_diag udp_diag inet_diag macsec unix_diag af_packet_diag netlink_diag
sosreport -o networking --batch --build
lsmod > lsmod.after
diff lsmod.before lsmod.after

returns:

1a2,8
> macsec                 35971  0 
> tcp_diag               12591  0 
> udp_diag               12801  0 
> inet_diag              18949  2 tcp_diag,udp_diag
> unix_diag              12601  0 
> af_packet_diag         12611  0 
> netlink_diag           12669  0 

macsec is loaded due to collecting ​ip -s macsec show command output, the remaining 6 ones due to ss -peaonmi. While ip -s macsec show can be easily put under a condition "if macsec module is loaded", ss is bit more tricky, since running naked ss can load up to 4 modules ([tcp|udp|inet|unix]_diag).

@pmoravec
Copy link
Contributor Author

Possible approaches to fix:

  • execute ss only if the 6 modules are loaded (optionally call ss only when the 4 but not all 6 modules are loaded only)
  • execute ss either way and unload extra loaded modules afterwards

I tried writing patch for 2nd option (pep8-noncompliant) but it still shows the very first extra-loaded module as loaded:

--- a/sos/plugins/networking.py	2018-09-14 08:35:38.478475452 +0200
+++ b/sos/plugins/networking.py	2018-09-17 22:38:17.762792066 +0200
@@ -172,7 +172,6 @@ class Networking(Plugin):
         self.add_cmd_output([
             "netstat -s",
             "netstat %s -agn" % self.ns_wide,
-            "ss -peaonmi",
             "ip route show table all",
             "ip -6 route show table all",
             "ip -4 rule",
@@ -186,16 +185,26 @@ class Networking(Plugin):
             "ip neigh show nud noarp",
             "biosdevname -d",
             "tc -s qdisc show",
-            "ip -s macsec show",
         ])
 
+        # "ss -peaonmi" can load several kernel modules - if we do so, unload
+        # them afterwards
+        loaded_kernel_modules = set(e.split()[0] for e in open("/proc/modules").read().splitlines())
+        self.get_cmd_output_now("ss -peaonmi")
+        loaded_kernel_modules_2 = set(e.split()[0] for e in open("/proc/modules").read().splitlines())
+	diff = loaded_kernel_modules_2 - loaded_kernel_modules
+        if diff is not None:
+            self.call_ext_prog("rmmod %s" %(' '.join(diff)))
+
         # When iptables is called it will load the modules
         # iptables and iptables_filter if they are not loaded.
-        # The same goes for ipv6.
-        if self.check_ext_prog("grep -q iptable_filter /proc/modules"):
+        # The same goes for ipv6 and "ip -s macsec show".
+        if "iptable_filter" in loaded_kernel_modules:
             self.add_cmd_output("iptables -vnxL")
-        if self.check_ext_prog("grep -q ip6table_filter /proc/modules"):
+        if "ip6table_filter" in loaded_kernel_modules:
             self.add_cmd_output("ip6tables -vnxL")
+        if "macsec" in loaded_kernel_modules:
+            self.add_cmd_output("ip -s macsec show")
 
         # Get ethtool output for every device that does not exist in a
         # namespace.

Running that:

# rmmod tcp_diag udp_diag inet_diag macsec unix_diag af_packet_diag netlink_diag; lsmod > lsmod.before; ./sos-master/sosreport --batch --build -o networking > /dev/null 2>&1; lsmod > lsmod.after; diff lsmod.before lsmod.after
1a2
> inet_diag              18949  0 
#

@bmr-cymru
Copy link
Member

Absolutely no way on earth.

You cannot ever do this correctly and safely, since the kernel module infrastructure is shared system wide and has no locking mechanism for us to exclude other users while we do our work. You cannot ever know that between your 1st list and 2nd list that the admin did not change the set of loaded modules. That means you may randomly try to unload modules that the admin just deliberately loaded.

This has to be handled by detecting when something would be changed, and not doing that thing.

@pmoravec
Copy link
Contributor Author

Right, havent realized that. So we have to:

  • detect what kernel modules are loaded (Plugin class to have API for kernel_mods?)
  • only if the 6 *_diagmodules are loaded, collect the ss command
  • only if macsecloaded, collect ip -s macsec show
  • replace the "grep -q ip[|6]table_filter test by equivalent test like above

@bmr-cymru
Copy link
Member

bmr-cymru commented Sep 18, 2018

Yeah, I think that's the way. We already have Plugin.is_module_loaded(), and we have the Plugin.kernel_mods member that's initialised for the plugin triggers checks (historically I kept is_module_loaded() doing a scan of /proc/modules, so that it is always up-to-date, but maybe that can be optimised).

I was talking on IRC about how this might be a good use of the "predicates" idea. We can of course do it with branches, but I envisaged something like this:

ss_pred = SoSPredicate(kmods=["tcp_diag", "udp_diag", "ip_diag", "unix_diag", "netlink_diag"])
self.add_cmd_output("ss -peaonmi", pred=ss_pred)

Then the command is collected only if ss_pred.eval() returns True.

And yes - those grep -q should have already been changed to is_module_loaded() - I missed it.

@bmr-cymru
Copy link
Member

bmr-cymru commented Sep 18, 2018

I thought we could also make it more fine-grained, but it looks like just running ss with no arguments is enough to load most of the modules (although it would be more complicated to put it in a single command call. I can't think of a neat way to do that yet).

If we could split it on options we could do something like:

ss_opts = SoSOptionArgs(kmods={"tcp_diag":"-t", "udp_diag": "-u"})
self.get_cmd_output("ss -peaonmi", opt_args=ss_opts)

But that's not going to work with the way ss actually behaves (good thing maybe, it looks complicated :) ).

@pmoravec
Copy link
Contributor Author

I like the idea of SoSPredicate but what predicate types could be there? As looking to plugins calling add_copy_spec or add_cmd_output under some condition, I see mainly if self.get_option("some-option"). Replacing this simple check by traversal of a singleton list of options to compare sounds to me bit overkill - or the SoSPredicate feature wont be much utilized (now).

I am in favour of:

  • collecting loaded kernel modules once at the start (and storing them in Policy class - similarly like we store package list within PackageManager) - just for the sake of optimization of repeated traversing the /proc/modules directory for each and every is_module_loaded call
  • updating is_module_loaded method accordingly
  • calling either ss or with options depending what of the 6 modules are loaded
  • (and using the is_module_loaded for iptables and macsec stuff)

@bmr-cymru
Copy link
Member

collecting loaded kernel modules once at the start (and storing them in Policy class - similarly like we store package list within PackageManager)

We already collect and store that in Plugin (using the lsmod() hook from the policy). What's the benefit of doing it in the policy class directly?

updating is_module_loaded method accordingly

Has the drawback that we will miss modules that were loaded after the list was initialised. That's the reason the method does not currently use it and always looks at the live /proc/modules.

I like the idea of SoSPredicate but what predicate types could be there?

Whatever we find need for: it's a general boolean object that evaluates True or False according to how it is initialised. Plugin options, kernel modules, packages, executable command exit status etc. are all possible predicates.

You are right that if this was just about module options then it would be overkill, but there are other benefits to using a predicate that are not so obvious, but are potentially very powerful.

Right now there is no way to answer the "what would sos do?" question other than to run it on a specific system and find out. With predicates we can force these globally, for one, some or all predicate options, and see exactly what sos would do in those circumstances. I see this as useful for testing, bug reproduction, and auditing, as well as finally implementing a long-requested --dry-run option.

@ptalbert
Copy link

ptalbert commented Apr 4, 2019

Current upstream kernels will only load a protocol diag module if the underlying protocol module is loaded. As basic protocols like TCP, UDP, INET, netlink, unix, etc, are typically compiled in, then their diag modules are almost certainly always going to be loaded when 'ss' is invoked. But for example, you won't find sctp_diag loaded any more unless 'sctp' was already loaded.

I appreciate the concern for leaving the system in the same state after running sosreport but with the proposed changes we'll almost never get this valuable 'ss' output unless some one happens to have ran ss on the system some time in the past.

Also, the ipvs plugin runs ipvsadm commands which will trigger the loading of the ip_vs module if it is not already there. That could be fixed in the same way as the ip macsec case.

@bmr-cymru
Copy link
Member

I appreciate the concern for leaving the system in the same state after running sosreport but with the proposed changes we'll almost never get this valuable 'ss' output unless some one happens to have ran ss on the system some time in the past.

This isn't our thing. It's a requirement from users (customers), and generally involves claims about auditing etc. We have to be able to leave the system in the same state that we found it - even if that means losing this data.

If it is so vital, then support associates should be explaining the need - I'd be fine adding an option to override this on request, but a default run for something unrelated should not load these modules.

@ptalbert
Copy link

ptalbert commented Apr 4, 2019

I appreciate the concern for leaving the system in the same state after running sosreport but with the proposed changes we'll almost never get this valuable 'ss' output unless some one happens to have ran ss on the system some time in the past.

This isn't our thing. It's a requirement from users (customers), and generally involves claims about auditing etc. We have to be able to leave the system in the same state that we found it - even if that means losing this data.

Sorry, I wasn't arguing to ignore it; just lamenting the loss.

If the modules are not seen to be loaded when the plugin runs, it theoretically would be safe to unload them after the fact as the kernel will auto-load them whenever an associated netlink request comes through from userspace. There is nothing stateful about the diag modules so even in the rare case some user (a regular user can run ss and load them) used them while the sosreport network plugin was running it should not matter if the plugin unloads them a moment later. But, as much as I'd like to have the 'ss' output in all cases, this solution seems ugly to me and I'm not going to advocate for it.

If it is so vital, then support associates should be explaining the need - I'd be fine adding an option to override this on request, but a default run for something unrelated should not load these modules.

An option like this would definitely be appreciated.

@bmr-cymru
Copy link
Member

Sorry, I wasn't arguing to ignore it; just lamenting the loss.

Yeah, it's an unfortunate tension that we're always forced to find a balance for. Sometimes these things go away naturally, as more modules become built-in, but I think we'll always have corner cases to handle.

it theoretically would be safe to unload them after the fact as the kernel will auto-load them

I don't much like these solutions - it's inherently racey, and a bit "dodgy", even if the only side effect is ping-ponging modules in and out of the kernel (or trying to).

An option like this would definitely be appreciated.

OK. That's definitely something we can do. In the meantime, I think it would be worth running this past our support colleagues again for an opinion. If there is a consensus that it's OK to have a way to prevent these commands running if not desired, but to still have them on by default, then I'd be OK with that and it's a bit closer to the ideal you'd like.

@superjamie
Copy link
Contributor

If it is so vital, then support associates should be explaining the need - I'd be fine adding an option to override this on request, but a default run for something unrelated should not load these modules.

Do the inverse. Run the audit-safe sos on request.

If there is a consensus that it's OK to have a way to prevent these commands running if not desired, but to still have them on by default

Yes, this.

The majority of users are not asking for this feature. The majority of users do not audit their systems to the point they worry about which (safe, signed) kernel modules are loaded before and after system-provided troubleshooting commands are run.

However, here's a likely usage pattern:

  • Support engineer asks user to run a sosreport without the proposed "override" option.
  • The support issue progresses and requires ss output.
  • Support engineer now asks for a second sosreport with the "override" option.

The majority of users will definitely be frustrated with being asked to provide "the same thing" twice.

Write to the majority usecase. Detailed audit users can implement an organisation policy to "always run sosreport --audit-safe" or whatever you make the option. Leave the useful data collection in for most users, and reduce the frustrating back-and-forth and repetition of work out for them too.

@superjamie
Copy link
Contributor

Detailed audit users can implement an organisation policy to "always run sosreport --audit-safe" or whatever you make the option.

We had one suggestion that this could also be implented as a sos.conf option, so organisations can apply this policy with configuration management where required. That seems a really good idea.

@bmr-cymru
Copy link
Member

Do the inverse. Run the audit-safe sos on request.

Is literally what I suggested in the following quoted paragraph :-D

To be completely clear, what I am saying is: I am absolutely fine making this optional/non-default. No further discussion needed. I am also open to the idea of making it default/option-to-disable iff there is a consensus among our users (and especially support associates at the big commercial distros), and iff it passes all necessary reviews (including legal).

Write to the majority usecase

Once again, we are good, but we are not psychic. We find these things out by asking, or when people come to us and tell us. We also generally work to the princopal that support data collection should not modify the system. There is a long, long catalog of bugs (some with very angry users behind them), complaining about this type of behaviour. We have historically optimised for their wishes because they have been the vocal group, and it fits with our existing aims.

If that's wrong, then we can work to change it, but we don't do so on a whim.

This also isn't as simple and clear cut as just changing the default and walking away - we have for a very long time communicated (in documentation and our disclaimer text) that we will not modify system configuration. Making this a default and accepting that we will is a break from that. Perhaps it is as simple as inserting "persistent" into that phrase, or some similar tweak; we would need to go back to the legal team to understand the significance.

sosreport --audit-safe

As an aside, I don't like this option name or what it implies. I think for the time being (until/unless we make some broader change in our disclaimer and what we communicate) that any option like this should be specific and targeted to one instance or class of data collection.

@bmr-cymru
Copy link
Member

bmr-cymru commented Apr 5, 2019

We had one suggestion that this could also be implented as a sos.conf option, so organisations can apply this policy with configuration management where required. That seems a really good idea.

We always try to map every sosreport command line option to an sos.conf configuration item. This is now much stronger since Pavel's recent commit:

commit aeade1c831eb700c7fab3ae5aa3e1d6ef9f0c483
Author: Pavel Moravec <[email protected]>
Date:   Mon Jan 7 13:20:04 2019 +0100

    [sosreport] support all command line options in /etc/sos.conf
    
    - SoSOptions class changes:
      - from_file method to load load options from file
      - encapsulation of particular options into __init__ method
      - updated merge logic to replace non-default values only
    
    - sosreport.py changes:
      - _parse_args (renamed to _get_parser) newly returns just the
        ArgumentParser object as parsing itself is done elsewhere
      - SoSOptions are built in ordering:
        - defaults from parser (with emtpy cmdline) loaded
        - cmdline options replace defaults
        - config file known even now
        - config.file options replace defaults until already set by
          cmdline (i.e. until SoSOptions marked them as nondefault)
        - presets can be known even now
        - options from given preset replace defaults, until the options
          are already updated from cmdline or config.file
      - some extra logging/formatting added
      - some extra comments added
    
    Fixes: #855
    Resolves: #1530
    
    Signed-off-by: Pavel Moravec <[email protected]>
    Signed-off-by: Bryn M. Reeves <[email protected]>

Also note that from 3.6 onwards you have access to the preset facility - this means that developers, distribution maintainers, and users can define named combinations of sos command line options to be recalled for later use. This is more flexible than the configuration file, since multiple presets can co-exist (and be combined with one another, or with manually entered command line options), and also allows for distributors to use their product specific knowledge to come up with combinations that are useful for particular products or scenarios (for e.g. the cantboot preset for collecting boot debugging information).

@bmr-cymru
Copy link
Member

with configuration management

Presets may also be subject to configuration management by dropping JSON formatted files into /var/lib/sos/presets, ex:

{
    "rhatomic": {
        "desc": "Red Hat Atomic Host",
        "note": "This preset may significantly increase run time.",
        "args": {
            "all_logs": true,
            "verify": true
        }
    }
}

@superjamie
Copy link
Contributor

I am absolutely fine making this optional/non-default. No further discussion needed. I am also open to the idea of making it default/option-to-disable iff there is a consensus among our users (and especially support associates at the big commercial distros), and iff it passes all necessary reviews (including legal).

Cool, all is well :)

As an aside, I don't like this option name or what it implies.

Nor do I, it was just an example placeholder.

@pmoravec
Copy link
Contributor Author

Please correct me if my understanding isn't right:

  • current plan will be to:
    • as a default behaviour, check for loaded kernel modules and skip collecting ip -s macsec and/or ss -peaonmi in case the relevant kernel modules are not loaded
    • there will be an option like --allow-kmod-load that will force collecting the commands in either case - but the default will be --allow-kmod-load=false
    • I can implement the change within this issue
  • explore "opposite default" option (i.e. collect all commands every time even with the side effect of potential kernel module load, and have an option --no-kmod-load to skip collecting commands that can load the modules)

Just a question to @superjamie , @ptalbert , @battlemidget or any other kernel/networking domain expert: We work here with an assumption that if macsec kmod isn't loaded, then execution of ip -s macsec command will load it (and the same for the ss command and the 4 [tcp\|udp\|inet\|unix]_diag modules). Does this assumption hold on all distros / platforms? Isn't there a use case where, say, ss command does not require unix_diag kernel module? (I just want to prevent false detection when to run / dont run those commands)

pmoravec added a commit to pmoravec/sos that referenced this issue Jun 11, 2019
Running some commands can load kernel modules what disqualifies
them from being called by default.

Add an option that overrides this default behaviour.

Related to: sosreport#1435

Signed-off-by: Pavel Moravec <[email protected]>
pmoravec added a commit to pmoravec/sos that referenced this issue Jun 11, 2019
…oaded

- "ip -s macsec show" requires "macsec" kmod loaded
- "ss -peaonmi" requires 6 *_diag kernel modules

Execute the commands only when the modules are loaded, or when explicitly
requested via --allow-kmod-load option.

Resolves: sosreport#1435

Signed-off-by: Pavel Moravec <[email protected]>
@pmoravec
Copy link
Contributor Author

I decided to add explicit warning to the two commands if sosreport does not collect them, like:

 Setting up plugins ...
[plugin:networking] skipped command ip -s macsec show as it requires kernel module 'macsecs' that is unloaded; use --allow-kmod-load to collect it
[plugin:networking] skipped command ss -peaonmi as it requires some *_diag kernel module that is unloaded; use --allow-kmod-load to collect it
 Running plugins. Please wait ...

to warn user we intentionally skip some commands (and to potentially let them re-run sosreport with the option enabled). Predicates can log that as well but on lower verbosity level only (let shout out if you think that is sufficient).

@superjamie
Copy link
Contributor

I would prefer if --allow-kmod-load=true was the default. Not collecting these commands makes a default sosreport useless for us in many situations. Very strong dislike.

I haven't looked into situations where the diag modules load, but I also assume it's when the related protocol is queried. For example, if ss -S is used to query SCTP, then I expect sctp_diag would be loaded to service that request.

@bmr-cymru
Copy link
Member

I would prefer if --allow-kmod-load=true was the default. Not collecting these commands makes a default sosreport useless for us in many situations. Very strong dislike.

As I keep explaining, this runs completely counter to what we have told our users for many years, as well as being a contradiction of our current disclaimer text. If you want to take on board changing that, I'd have no objection - but a side comment in a general PR is not the place for it. Start with an issue, then explore the legal issues, draft a new disclaimer, get it signed off etc. - the technical matters are not the big issue here. In the mean time I suggest your group communicate your needs to your support technicians and users more clearly up-front so that you can avoid receiving useless reports.

To re-iterate, I have no problem reconsidering this long standing behaviour if that is truly desired, but this is not something we're going to change on a whim and without the proper ground work.

@pmoravec
Copy link
Contributor Author

The warning:

[plugin:networking] skipped command ip -s macsec show as it requires kernel module 'macsecs' that is unloaded; use --allow-kmod-load to collect it
[plugin:networking] skipped command ss -peaonmi as it requires some *_diag kernel module that is unloaded; use --allow-kmod-load to collect it

makes Travis tests to fail. Since _log_warn (the best way to print something to terminal) prints to stderr due to

sos/sos/sosreport.py

Lines 455 to 466 in d0dcb39

if not self.opts.quiet:
console = logging.StreamHandler(sys.stderr)
console.setFormatter(logging.Formatter('%(message)s'))
if self.opts.verbosity and self.opts.verbosity > 1:
console.setLevel(logging.DEBUG)
flog.setLevel(logging.DEBUG)
elif self.opts.verbosity and self.opts.verbosity > 0:
console.setLevel(logging.INFO)
flog.setLevel(logging.DEBUG)
else:
console.setLevel(logging.WARNING)
self.soslog.addHandler(console)
and at

sos/.travis.yml

Lines 29 to 30 in 6581488

- "sudo ~/virtualenv/python$TRAVIS_PYTHON_VERSION/bin/python ./sosreport --batch --config-file=sos.conf 2> errors | tee batch_output"
- "[[ ! -s errors ]]"
we check whether nothing is printed to stderr.

@bmr-cymru and @TurboTurtle , isn't it worth to log warnings to stdout while errors and more severe logs to stderr?

@TurboTurtle
Copy link
Member

I'd agree - warn level messages should go to stdout, not stderr.

pmoravec added a commit to pmoravec/sos that referenced this issue Jun 12, 2019
…oaded

- "ip -s macsec show" requires "macsec" kmod loaded
- "ss -peaonmi" requires 6 *_diag kernel modules

Execute the commands only when the modules are loaded, or when explicitly
requested via --allow-kmod-load option.

Resolves: sosreport#1435

Signed-off-by: Pavel Moravec <[email protected]>
@TurboTurtle
Copy link
Member

Even though I personally don't like the idea of changing the longstanding behavior of not changing the system at all, ever - I think we should perhaps change --allow-kmod-load to something akin to --allow-system-changes so it's super obvious what's going on, and we can gate all the bits that could potentially alter a system behind it which would probably check (most of) the legal boxes for us.

@pmoravec
Copy link
Contributor Author

+1 for the option (re)name, will respin the PR now.

About the users' preferences of sosreport's default behaviour: this must have been driven or requested by the community of users (to outweight the current undertaking "sosreport does not alter systems it runs on"). I raised this topic internally within Red Hat support, and majority responses agree with following this paradigm. This is not the voice of whole sosreport community, for sure. But it is the only feedback we have.

pmoravec added a commit to pmoravec/sos that referenced this issue Jun 29, 2019
Running some commands can change the system e.g. by loading a kernel
modules. That disqualifies the commands from being called by default.

Add an option that overrides this default behaviour.

Related to: sosreport#1435

Signed-off-by: Pavel Moravec <[email protected]>
pmoravec added a commit to pmoravec/sos that referenced this issue Jun 29, 2019
…oaded

- "ip -s macsec show" requires "macsec" kmod loaded
- "ss -peaonmi" requires 6 *_diag kernel modules

Execute the commands only when the modules are loaded, or when explicitly
requested via --allow-system-changes option.

Resolves: sosreport#1435

Signed-off-by: Pavel Moravec <[email protected]>
bmr-cymru pushed a commit that referenced this issue Aug 16, 2019
Running some commands can change the system e.g. by loading a kernel
modules. That disqualifies the commands from being called by default.

Add an option that overrides this default behaviour.

Related to: #1435

Signed-off-by: Pavel Moravec <[email protected]>
pmoravec added a commit to pmoravec/sos that referenced this issue Nov 5, 2019
In sosreport#1435, --allow-system-changes option was added that is documented
in sosreport --help but not in manpages.

Resolves: sosreport#1850

Signed-off-by: Pavel Moravec <[email protected]>
BryanQuigley pushed a commit that referenced this issue Nov 12, 2019
In #1435, --allow-system-changes option was added that is documented
in sosreport --help but not in manpages.

Resolves: #1850

Signed-off-by: Pavel Moravec <[email protected]>
Signed-off-by: Bryan Quigley <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants