wireless does not automatically reconnect on resume on R4-rc1 #3151

Closed
jpouellet opened this issue Oct 6, 2017 · 4 comments
Labels
C: core P: major Priority: major. Between "default" and "critical" in severity. r4.0-dom0-stable T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.

@jpouellet
Contributor

Observed behavior: NetworkManager is not running in sys-net on resume, and the blacklisted drivers (/rw/config/suspend-module-blacklist) need to be reloaded manually.

A reliable work-around is manually starting NetworkManager and reloading the blacklisted drivers, but that got annoying pretty quickly, so down the rabbit hole we go...
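
For reference, the manual workaround amounts to something like the following, run inside sys-net. This is only a sketch: it assumes iwlmvm/iwlwifi are the modules listed in the blacklist file, as in the journal output below.

sudo systemctl start NetworkManager.service
sudo modprobe iwlmvm   # iwlwifi gets pulled back in as a dependency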

Sorry, the following source references are from an installed machine rather than from the repos; I don't have the source code downloaded on this machine and didn't have internet access while debugging this.

in sys-net journal:

Oct 04 14:17:25 sys-net qubes.SuspendPre-dom0[2456]: Error org.freedesktop.DBus.Error.AccessDenied: Rejected send message, 4 matched rules; type="method_call", sender=":1.31" (uid=1000 pid=2460 comm="dbus-send 
Oct 04 14:17:25 sys-net qubes.SuspendPre-dom0[2456]: Redirecting to /bin/systemctl stop  NetworkManager.service
Oct 04 14:17:25 sys-net qubes.SuspendPre-dom0[2456]: RTNETLINK answers: Operation not permitted
Oct 04 14:17:25 sys-net qubes.SuspendPre-dom0[2456]: modprobe: ERROR: could not remove 'iwlmvm': Operation not permitted
Oct 04 14:17:25 sys-net qubes.SuspendPre-dom0[2456]: modprobe: FATAL: Module iwlwifi is in use.

NetworkManager stopped in sys-net by /usr/lib/qubes/prepare-suspend:

#!/bin/bash
[...]
if [ x"$action" = x"suspend" ]; then
    if qsvc network-manager ; then
        dbus-send --system --print-reply          \
            --dest=org.freedesktop.NetworkManager \
            /org/freedesktop/NetworkManager       \
            org.freedesktop.NetworkManager.Sleep boolean:true ||  \
            service NetworkManager stop
    fi
[...]
else
[...]
    if qsvc network-manager ; then
        dbus-send --system --print-reply          \
            --dest=org.freedesktop.NetworkManager \
            /org/freedesktop/NetworkManager       \
            org.freedesktop.NetworkManager.Sleep boolean:false ||  \
            { under_systemd && systemctl start NetworkManager.service; } || service qubes-core-netvm start
    fi
fi

Instrumenting that script shows that it does get called with $1=suspend, but it is not called on resume.
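
The instrumentation was nothing fancy; something along these lines near the top of the script is enough to capture each invocation (a sketch, not the exact line used):

logger -t prepare-suspend "action=$1 invoked as $(id)"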

Also, it gets invoked with uid=1000, which explains the

RTNETLINK answers: Operation not permitted
modprobe: ERROR: could not remove 'iwlmvm': Operation not permitted

in the logs above.

Running that script manually with the argument "resume" starts NetworkManager, but does not bring up my interface (wlp0s0). Reloading the modules afterwards makes it work again.

So there appear to be two problems:

  1. that script not getting called on resume
  2. that script not being called with permissions to do what it needs to do

I wish I had an R3.2 box on hand to trace how this used to work.

Side note: the service-based fallbacks appear outdated, and qubes-core-netvm doesn't appear to exist anymore. Can they be removed and the script simplified to just systemctl?
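
Roughly, the resume branch could then shrink to something like this (a sketch of the suggestion, not a tested patch):

    if qsvc network-manager ; then
        dbus-send --system --print-reply          \
            --dest=org.freedesktop.NetworkManager \
            /org/freedesktop/NetworkManager       \
            org.freedesktop.NetworkManager.Sleep boolean:false || \
            systemctl start NetworkManager.service
    fi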

[user@dom0 ~]$ systemctl status qubes-suspend
● qubes-suspend.service - Qubes suspend hooks
   Loaded: loaded (/usr/lib/systemd/system/qubes-suspend.service; enabled; vendor preset: enabled)
   Active: failed (Result: timeout) since Wed 2017-10-04 17:02:28 EDT; 27min ago
  Process: 7851 ExecStop=/usr/lib64/pm-utils/sleep.d/52qubes-pause-vms resume suspend (code=killed, signal=TERM)
  Process: 7823 ExecStart=/usr/lib64/pm-utils/sleep.d/52qubes-pause-vms suspend suspend (code=exited, status=0/SUCCESS)
 Main PID: 7823 (code=exited, status=0/SUCCESS)

Oct 04 14:17:24 dom0 systemd[1]: Starting Qubes suspend hooks...
Oct 04 14:17:25 dom0 52qubes-pause-vms[7823]: 0
Oct 04 14:17:25 dom0 systemd[1]: Started Qubes suspend hooks.
Oct 04 17:00:57 dom0 systemd[1]: qubes-suspend.service: Unit not needed anymore. Stopping.
Oct 04 17:00:57 dom0 systemd[1]: Stopping Qubes suspend hooks...
Oct 04 17:02:28 dom0 systemd[1]: qubes-suspend.service: Stopping timed out. Terminating.
Oct 04 17:02:28 dom0 systemd[1]: Stopped Qubes suspend hooks.
Oct 04 17:02:28 dom0 systemd[1]: qubes-suspend.service: Unit entered failed state.
Oct 04 17:02:28 dom0 systemd[1]: qubes-suspend.service: Failed with result 'timeout'.

systemd suspend hook in dom0 (sleep.target.wants -> qubes-suspend.service, Before=sleep.target)

[Unit]
Description=Qubes suspend hooks
Before=sleep.target
StopWhenUnneeded=yes

[Service]
Type=oneshot
RemainAfterExit=yes
StandardOutput=syslog
ExecStart=/usr/lib64/pm-utils/sleep.d/52qubes-pause-vms suspend suspend
ExecStop=/usr/lib64/pm-utils/sleep.d/52qubes-pause-vms resume suspend

[Install]
WantedBy=sleep.target

Side note: the above invokes 52qubes-pause-vms with two arguments, when it only takes one (a trimmed unit is sketched after the script below):

dom0's /usr/lib64/pm-utils/sleep.d/52qubes-pause-vms:

#!/bin/sh

case "$1" in
    suspend|hibernate)
        qubesd-query -e --fail -c /var/run/qubesd.internal.sock \
            dom0 internal.SuspendPre dom0
        ;;
    resume|thaw)
        qubesd-query -e --fail -c /var/run/qubesd.internal.sock \
            dom0 internal.SuspendPost dom0
        ;;
esac
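
If the extra argument really is vestigial, the unit above could presumably be trimmed to pass just one (an untested sketch):

ExecStart=/usr/lib64/pm-utils/sleep.d/52qubes-pause-vms suspend
ExecStop=/usr/lib64/pm-utils/sleep.d/52qubes-pause-vms resume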

Side note: there are no /etc/qubes-rpc/{,policy/}internal.* files; I wonder whether this has hidden side effects.

in dom0's /usr/lib/python3.5/site-packages/qubes/api/internal.py:

    @qubes.api.method('internal.SuspendPre', no_payload=True)
    @asyncio.coroutine
    def suspend_pre(self):
        '''
        Method called before host system goes to sleep.

        :return:
        '''

        # first notify all VMs
        processes = []
        for vm in self.app.domains:
            if isinstance(vm, qubes.vm.adminvm.AdminVM):
                continue
            if vm.is_running():
                proc = yield from vm.run_service(
                    'qubes.SuspendPreAll', user='root',
                    stdin=subprocess.DEVNULL,
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL)
                processes.append(proc)

        # FIXME: some timeout?
        if processes:
            yield from asyncio.wait([p.wait() for p in processes])

        coros = []
        # then suspend/pause VMs
        for vm in self.app.domains:
            if isinstance(vm, qubes.vm.adminvm.AdminVM):
                continue
            if vm.is_running():
                coros.append(vm.suspend())
        if coros:
            yield from asyncio.wait(coros)

    @qubes.api.method('internal.SuspendPost', no_payload=True)
    @asyncio.coroutine
    def suspend_post(self):
[...]
            if vm.get_power_state() in ["Paused", "Suspended"]:
                coros.append(vm.resume())
[...]
                proc = yield from vm.run_service(
                    'qubes.SuspendPostAll', user='root',
[...]

In sys-net, there exist all of:

  • qubes.SuspendPre
  • qubes.SuspendPreAll
  • qubes.SuspendPost
  • qubes.SuspendPostAll

sys-net's /etc/qubes-rpc/qubes.SuspendPreAll (the one that's called on suspend):

#!/bin/sh
[...license...]
for script in /etc/qubes/suspend-pre.d/*.sh; do
    if [ -x "$script" ]; then
        "$script"
    fi
done

But there are no hooks there!

[user@sys-net qubes-rpc]$ ls -a /etc/qubes/suspend-pre.d/README
/etc/qubes/suspend-pre.d/README
[user@sys-net qubes-rpc]$ cat /etc/qubes/suspend-pre.d/README 
All executable files with `.sh` suffix in this directory will be executed as
root just before host suspend.

So it does nothing.

qubes.SuspendPre seems to do what's intended:

sys-net's /etc/qubes-rpc/qubes.SuspendPre (apparently not called?):

#!/bin/sh
/usr/lib/qubes/prepare-suspend suspend

SuspendPost[All] is a similar story; the only differences are the existence of /etc/qubes/suspend-post.d/qvm-sync-clock.sh and that /usr/lib/qubes/prepare-suspend is called with "resume".
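
In other words, sys-net's /etc/qubes-rpc/qubes.SuspendPost presumably boils down to the following (inferred from the above, not quoted verbatim):

#!/bin/sh
/usr/lib/qubes/prepare-suspend resume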

But if qubes.SuspendPre (not SuspendPreAll) isn't called, how is NetworkManager getting unloaded in the first place? Maybe somewhere deeper in the vm.suspend() and vm.resume() calls in qubes/api/internal.py above? Let's see...

Instrumenting the rpc service scripts and suspending/resuming, I observe the following sequence:

Thu Oct 5 12:30:52 EDT 2017 uid=0(root) gid=0(root) groups=0(root) /etc/qubes-rpc/qubes.SuspendPreAll
Thu Oct 5 12:30:52 EDT 2017 uid=1000(user) gid=1000(user) groups=1000(user),98(qubes) /etc/qubes-rpc/qubes.SuspendPre
Thu Oct 5 12:31:12 EDT 2017 uid=1000(user) gid=1000(user) groups=1000(user),98(qubes) /etc/qubes-rpc/qubes.SuspendPost
Thu Oct 5 12:31:12 EDT 2017 uid=0(root) gid=0(root) groups=0(root) /etc/qubes-rpc/qubes.SuspendPostAll

So qubes.Suspend{Pre,Post}All are called as root, and qubes.Suspend{Pre,Post} (no All) are called as the user, but they are indeed all getting called.

[user@dom0 ~]$ sudo find /usr -type f -ipath '*qubes*' -exec grep -F SuspendPre {} +
/usr/lib64/pm-utils/sleep.d/52qubes-pause-vms:            dom0 internal.SuspendPre dom0
/usr/lib/python3.5/site-packages/qubes/api/internal.py:    @qubes.api.method('internal.SuspendPre', no_payload=True)
/usr/lib/python3.5/site-packages/qubes/api/internal.py:                    'qubes.SuspendPreAll', user='root',
Binary file /usr/lib/python3.5/site-packages/qubes/api/__pycache__/internal.cpython-35.opt-1.pyc matches
Binary file /usr/lib/python3.5/site-packages/qubes/api/__pycache__/internal.cpython-35.pyc matches
/usr/lib/python3.5/site-packages/qubes/vm/qubesvm.py:            yield from self.run_service_for_stdio('qubes.SuspendPre')
Binary file /usr/lib/python3.5/site-packages/qubes/vm/__pycache__/qubesvm.cpython-35.pyc matches
Binary file /usr/lib/python3.5/site-packages/qubes/vm/__pycache__/qubesvm.cpython-35.opt-1.pyc matches

From /usr/lib/python3.5/site-packages/qubes/vm/qubesvm.py:

    @asyncio.coroutine
    def suspend(self):
        '''Suspend (pause) domain.

        :raises qubes.exc.QubesVMNotRunnignError: \
            when domain is already shut down.
        '''

        if not self.is_running() and not self.is_paused():
            raise qubes.exc.QubesVMNotRunningError(self)

        if list(self.devices['pci'].attached()):
            yield from self.run_service_for_stdio('qubes.SuspendPre')
            self.libvirt_domain.pMSuspendForDuration(
                libvirt.VIR_NODE_SUSPEND_TARGET_MEM, 0, 0)
        else:
            self.libvirt_domain.suspend()

        return self
[...]
    @asyncio.coroutine
    def resume(self):
        '''Resume suspended domain.

        :raises qubes.exc.QubesVMNotSuspendedError: when machine is not paused
        :raises qubes.exc.QubesVMError: when machine is suspended
        '''

        # pylint: disable=not-an-iterable
        if self.get_power_state() == "Suspended":
            self.libvirt_domain.pMWakeup()
            yield from self.run_service_for_stdio('qubes.SuspendPost')
        else:
            yield from self.unpause()

        return self

Sure enough...

[user@dom0 ~]$ python3 -ic ''
>>> import qubes
>>> import asyncio
>>> sn = qubes.Qubes().domains['sys-net']
  WARNING: Running as a non-root user. Functionality may be unavailable.
  /run/lvm/lvmetad.socket: access failed: Permission denied
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
  /dev/mapper/control: open failed: Permission denied
  Failure to communicate with kernel device-mapper driver.
  Incompatible libdevmapper 1.02.136 (2016-11-05) and kernel driver (unknown version).
>>> rl = asyncio.get_event_loop()
>>> sn.get_power_state()
'Running'

vm.suspend():

>>> rl.run_until_complete(sn.suspend())
<AppVM object at 0x7da9e4de1ba8 autostart=True backup_timestamp=None debug=False default_user=user gateway=10.137.0.3 include_in_backups=True installed_by_rpm=False ip=10.137.0.3 kernel=4.9.45-21 label=red mac=00:16:3E:5E:6C:00 maxmem=4000 memory=400 name=sys-net provides_network=True qid=3 qrexec_timeout=60 stubdom_mem=None stubdom_xid=-1 template_for_dispvms=False updateable=False uuid=80f768d8-1bda-4a4a-8c66-f5d2b5787f92 vcpus=2 virt_mode=pv visible_ip=10.137.0.3 xid=1 default_dispvm=fedora-25-dvm kernelopts=nopat iommu=soft swiotlb=8192 netvm=None template=fedora-25>
>>> sn.get_power_state()
'Suspended'

causes:

Fri Oct 6 13:14:13 EDT 2017 uid=1000(user) gid=1000(user) groups=1000(user),98(qubes) /etc/qubes-rpc/qubes.SuspendPre

and:

>>> rl.run_until_complete(sn.resume())
<AppVM object at 0x7da9e4de1ba8 autostart=True backup_timestamp=None debug=False default_user=user gateway=10.137.0.3 include_in_backups=True installed_by_rpm=False ip=10.137.0.3 kernel=4.9.45-21 label=red mac=00:16:3E:5E:6C:00 maxmem=4000 memory=400 name=sys-net provides_network=True qid=3 qrexec_timeout=60 stubdom_mem=None stubdom_xid=-1 template_for_dispvms=False updateable=False uuid=80f768d8-1bda-4a4a-8c66-f5d2b5787f92 vcpus=2 virt_mode=pv visible_ip=10.137.0.3 xid=1 default_dispvm=fedora-25-dvm kernelopts=nopat iommu=soft swiotlb=8192 netvm=None template=fedora-25>
>>> sn.get_power_state()
'Running'

causes:

Fri Oct 6 13:14:28 EDT 2017 uid=1000(user) gid=1000(user) groups=1000(user),98(qubes) /etc/qubes-rpc/qubes.SuspendPost

Invoking qubes.Suspend{Pre,Post} as root makes wireless work again on resume:

--- qubesvm.py.orig	2017-10-06 10:43:56.429960425 -0700
+++ qubesvm.py	2017-10-06 10:25:42.947664673 -0700
@@ -967,7 +967,8 @@
             raise qubes.exc.QubesVMNotRunningError(self)
 
         if list(self.devices['pci'].attached()):
-            yield from self.run_service_for_stdio('qubes.SuspendPre')
+            yield from self.run_service_for_stdio(
+                'qubes.SuspendPre', user='root')
             self.libvirt_domain.pMSuspendForDuration(
                 libvirt.VIR_NODE_SUSPEND_TARGET_MEM, 0, 0)
         else:
@@ -997,7 +998,8 @@
         # pylint: disable=not-an-iterable
         if self.get_power_state() == "Suspended":
             self.libvirt_domain.pMWakeup()
-            yield from self.run_service_for_stdio('qubes.SuspendPost')
+            yield from self.run_service_for_stdio(
+                'qubes.SuspendPost', user='root')
         else:
             yield from self.unpause()
 

The contents of /usr/lib/qubes/prepare-suspend (invoked by qubes.Suspend{Pre,Post}) do appear to be intended to run as root (it expects modprobe to work, etc.). I would confirm whether they were invoked as root on a 3.2 machine, but I don't have one with me and don't have the source on hand right now either (and GitHub's search feature apparently only indexes master, which is now R4).
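
Note for anyone testing the diff above on an installed system: after patching /usr/lib/python3.5/site-packages/qubes/vm/qubesvm.py in dom0, restarting the qubes daemon should be enough to pick up the change (assuming the stock service name, qubesd):

sudo systemctl restart qubesd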

Final side-note: having both qubes.SuspendPre and qubes.SuspendPreAll in the first place seems kind of weird to me. Why should a VM care whether it is the only one being suspended or whether other VMs are being suspended as well? Would a better solution be to simplify the hooks altogether and just invoke /usr/lib/qubes/prepare-suspend {suspend,resume} via /etc/qubes/suspend-{pre,post}.d/...?
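
For illustration, such a hook could be as small as this (hypothetical file name, following the convention described in the README quoted above):

#!/bin/sh
# hypothetical /etc/qubes/suspend-pre.d/30-prepare-suspend.sh
exec /usr/lib/qubes/prepare-suspend suspend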

@marmarek (or whoever): Feel free to commit the preferred fix without me, it might be some time before I can get around to submitting a PR.

@jpouellet
Contributor Author

Also, sorry for the journal-rant-style report above; it started as notes to myself to remember what I was doing while working on this offline, in the few spare bits of free time I had over a couple of days.

@marmarek
Member

marmarek commented Oct 7, 2017

Side note: the above invokes 52qubes-pause-vms with two arguments, when it only takes one:

The original pm-utils standard required two parameters, so plugging this script into the systemd equivalent followed it. It is probably unneeded now, as pm-utils isn't used in the fc25-based dom0 anymore.

Also, it gets invoked with uid=1000, which explains the

This is the actual problem.

Oct 04 17:02:28 dom0 systemd[1]: qubes-suspend.service: Stopping timed out. Terminating.
Oct 04 17:02:28 dom0 systemd[1]: Stopped Qubes suspend hooks.
Oct 04 17:02:28 dom0 systemd[1]: qubes-suspend.service: Unit entered failed state.
Oct 04 17:02:28 dom0 systemd[1]: qubes-suspend.service: Failed with result 'timeout'.

This is also a problem, probably #3142

Side note: the service-based fallbacks appear outdated, and qubes-core-netvm doesn't appear to exist anymore. Can they be removed and the script simplified to just systemctl?

It is there, but not installed on Fedora. Generally we do keep scripts for non-systemd VMs, to ease porting *-agent to such systems too.

Final side-note: having both qubes.SuspendPre and qubes.SuspendPreAll in the first place seems kind of weird to me. Why should a VM care whether it is the only one being suspended or whether other VMs are being suspended as well? Would a better solution be to simplify the hooks altogether and just invoke /usr/lib/qubes/prepare-suspend {suspend,resume} via /etc/qubes/suspend-{pre,post}.d/...?

Theoretically yes, but that would break compatibility: we try to maintain compatibility of the dom0-VM interface, even between 3.x and 4.x, so it's possible to use VMs from 3.x on 4.x (including custom templates etc.).

Side note: there are no /etc/qubes-rpc/{,policy/}internal.* files; I wonder whether this has hidden side effects.

This is intentional. Those calls to qubesd are not meant to be called through qrexec (from outside of dom0). Only by dom0, through a separate socket.

@marmarek marmarek added T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. C: core P: major Priority: major. Between "default" and "critical" in severity. labels Oct 7, 2017
@marmarek marmarek added this to the Release 4.0 milestone Oct 7, 2017
@marmarek marmarek self-assigned this Oct 7, 2017
@qubesos-bot

Automated announcement from builder-github

The package qubes-core-dom0-4.0.8-1.fc25 has been pushed to the r4.0 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package qubes-core-dom0-4.0.11-1.fc25 has been pushed to the r4.0 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update
