Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wpa_supplicant service fails at boot, but starts fine after boot (with systemctl start wpa_supplicant) #101963

Closed
codyps opened this issue Oct 28, 2020 · 19 comments
Labels
0.kind: bug Something is broken 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS

Comments

@codyps
Copy link
Contributor

codyps commented Oct 28, 2020

Describe the bug
On the Raspberry Pi 4B, I've enabled wireless with the following configuration.nix fragment:

  networking.wireless.enable = true;
  networking.wireless.networks = {
    "my_network" = {
      psk = "my passphrase";
    };
  };

When the system is rebooted, wpa_supplicant tries to start, but instead we just get a print out of the wpa_supplicant usage. Starting it manually after the system boots with systemctl start wpa_supplicant causes it to start without issue

`journalctl -u wpa_supplicant` from after a reboot and a `systemctl start wpa_supplicant`
-- Reboot --
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]: Successfully initialized wpa_supplicant
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]: wpa_supplicant v2.9
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]: Copyright (c) 2003-2019, Jouni Malinen <[email protected]> and contributors
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]: This software may be distributed under the terms of the BSD license.
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]: See README for more details.
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]: This product includes software developed by the OpenSSL Project
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]: for use in the OpenSSL Toolkit (http://www.openssl.org/)
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]: usage:
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   wpa_supplicant [-BddhKLqqstuvW] [-P<pid file>] [-g<global ctrl>] \
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:         [-G<group>] \
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:         -i<ifname> -c<config file> [-C<ctrl>] [-D<driver>] [-p<driver_param>] \
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:         [-b<br_ifname>] [-e<entropy file>] [-f<debug file>] \
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:         [-o<override driver>] [-O<override ctrl>] \
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:         [-N -i<ifname> -c<conf> [-C<ctrl>] [-D<driver>] \
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:         [-m<P2P Device config file>] \
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:         [-p<driver_param>] [-b<br_ifname>] [-I<config file>] ...]
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]: drivers:
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   nl80211 = Linux nl80211/cfg80211
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   wext = Linux wireless extensions (generic)
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   wired = Wired Ethernet driver
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]: options:
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -b = optional bridge interface name
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -B = run daemon in the background
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -c = Configuration file
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -C = ctrl_interface parameter (only used if -c is not)
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -d = increase debugging verbosity (-dd even more)
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -D = driver name (can be multiple drivers: nl80211,wext)
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -e = entropy file
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -f = log output to debug file instead of stdout
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -g = global ctrl_interface
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -G = global ctrl_interface group
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -h = show this help text
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -i = interface name
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -I = additional configuration file
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -K = include keys (passwords, etc.) in debug output
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -L = show license (BSD)
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -m = Configuration file for the P2P Device interface
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -N = start describing new interface
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -o = override driver parameter for new interfaces
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -O = override ctrl_interface parameter for new interfaces
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -p = driver parameters
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -P = PID file
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -q = decrease debugging verbosity (-qq even less)
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -s = log output to syslog instead of stdout
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -t = include timestamp in debug messages
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -u = enable DBus control interface
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -v = show version
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   -W = wait for a control interface monitor before starting
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]: example:
Oct 25 23:53:01 skeeter wpa_supplicant-start[633]:   wpa_supplicant -Dnl80211 -iwlan0 -c/etc/wpa_supplicant.conf
Oct 25 23:53:01 skeeter systemd[1]: Started WPA Supplicant.
Oct 25 23:53:01 skeeter systemd[1]: wpa_supplicant.service: Main process exited, code=exited, status=255/EXCEPTION
Oct 25 23:53:01 skeeter systemd[1]: wpa_supplicant.service: Failed with result 'exit-code'.
Oct 25 23:55:25 skeeter systemd[1]: Started WPA Supplicant.
Oct 25 23:55:25 skeeter wpa_supplicant[789]: Successfully initialized wpa_supplicant
Oct 25 23:55:27 skeeter wpa_supplicant[789]: wlan0: Trying to associate with SSID '123456'
Oct 25 23:55:30 skeeter wpa_supplicant[789]: wlan0: Associated with ec:08:6b:bb:06:82
Oct 25 23:55:30 skeeter wpa_supplicant[789]: wlan0: CTRL-EVENT-CONNECTED - Connection to ec:08:6b:bb:06:82 completed [id=0 id_str=]
Oct 25 23:55:30 skeeter wpa_supplicant[789]: wlan0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0

I believe what is happening here is likely the wifi interface (wlan0) not showing up until after the wpa_supplicant-start script (which tries to enumerate the wifi interfaces) runs, resulting in a bad set of args being passed to wpa_supplicant.

To Reproduce
Steps to reproduce the behavior:

  1. Install nixos on a rpi4
  2. Add the wireless configuration noted above to /etc/nixos/configuration.nix
  3. nixos-rebuild switch
  4. reboot
  5. note that wpa_supplicant has not started.

Expected behavior
wpa_supplicant service starts after reboot without intervention

Additional context
This is a new rpi4 install using the sd-card image from hydra

Notify maintainers

Metadata

# nix-shell -p nix-info --run "nix-info -m"
these paths will be fetched (0.05 MiB download, 0.28 MiB unpacked):
  /nix/store/99dkh8kwwzbs1pc8qy4p02v1l3ai5a4m-bash-interactive-4.4-p23-dev
copying path '/nix/store/99dkh8kwwzbs1pc8qy4p02v1l3ai5a4m-bash-interactive-4.4-p23-dev' from 'https://cache.nixos.org'...
 - system: `"aarch64-linux"`
 - host os: `Linux 4.19.118, NixOS, 21.03pre248334.02daf3f0768 (Okapi)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.7`
 - channels(root): `"nixos-21.03pre248236.bd017760509"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module:
@codyps codyps added the 0.kind: bug Something is broken label Oct 28, 2020
@codyps
Copy link
Contributor Author

codyps commented Oct 28, 2020

Note that while #82462 seems similar, it doesn't note the failure pattern with wpa_supplicant and is instead focused on the iwd failure.

@veprbl veprbl added the 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS label Oct 30, 2020
@rnhmjoj
Copy link
Contributor

rnhmjoj commented Nov 23, 2020

What does the journal say? What is the exit status?

@codyps
Copy link
Contributor Author

codyps commented Nov 23, 2020

The content of the journal is included in the issue under the details block with the summary text "journalctl -u wpa_supplicant from after a reboot and a systemctl start wpa_supplicant" (click on that text to see the journal).

Is there more extensive journal output that would be helpful here? I'm happy to provide more journal output if you have a commandline for journalctl in mind.

@rnhmjoj
Copy link
Contributor

rnhmjoj commented Nov 23, 2020

Oh, sorry I missed the little arrow. The exit status is 255 (-1) and it's printing the usage: this confirms my theory that wpa_supplicant is being passed invalid options at boot. The fault is most likely in the following shell script:

@codyps
Copy link
Contributor Author

codyps commented Nov 23, 2020

Yes, I suspect that it's due to a delay in the wlan interface appearing. As I said in the issue itself:

I believe what is happening here is likely the wifi interface (wlan0) not showing up until after the wpa_supplicant-start script (which tries to enumerate the wifi interfaces) runs, resulting in a bad set of args being passed to wpa_supplicant.

@rnhmjoj
Copy link
Contributor

rnhmjoj commented Nov 23, 2020

I can confirm the suspicion: I ran the script with set -x at boot and indeed no card was found, which resulted in wpa_supplicant being started without -i and the usage error.
Technically, this shouldn't be a problem because the module also adds a udev rule to restart wpa_supplicant when an interface appears, but it doesn't look like it's working.
Until I figure this out, a workaround is to specify the interface with wireless.interfaces: this should configure a dependency on the unit file.

rnhmjoj added a commit to rnhmjoj/nixpkgs that referenced this issue Nov 23, 2020
This resolves issue NixOS#101963.

When the service is started and no interface is ready yet, wpa_supplicant
is being exec'd with no `-i` flags, thus failing. Once the interfaces
are ready, the udev rule would fire but wouldn't restart the unit because
it wasn't currently running (see systemctl(1) try-restart).

The solution is to exit (with a clear error message) but always restart
wpa_supplicant when the interfaces are modified.
@rnhmjoj
Copy link
Contributor

rnhmjoj commented Nov 25, 2020

@Jamesmon Can you test my patch in #104722?

@8573

This comment has been minimized.

@rnhmjoj

This comment has been minimized.

rnhmjoj added a commit to rnhmjoj/nixpkgs that referenced this issue Dec 15, 2020
This resolves issue NixOS#101963.

When the service is started and no interface is ready yet, wpa_supplicant
is being exec'd with no `-i` flags, thus failing. Once the interfaces
are ready, the udev rule would fire but wouldn't restart the unit because
it wasn't currently running (see systemctl(1) try-restart).

The solution is to exit (with a clear error message) but always restart
wpa_supplicant when the interfaces are modified.

(cherry picked from commit 8f17761)
@rnhmjoj
Copy link
Contributor

rnhmjoj commented Dec 15, 2020

Should be fixed in the next channel update.

@rnhmjoj
Copy link
Contributor

rnhmjoj commented Dec 22, 2020

Unfortunately, I had to revert the fix: it's causing a long delay in the boot of a system with systemd-udev-settle.
I can only suggest to specify the interfaces manually with networking.wireless.interfaces, for now.

sballert added a commit to sballert/nixos-config that referenced this issue Mar 17, 2021
@domenkozar
Copy link
Member

domenkozar commented May 10, 2021

Maybe we should make interfaces a required attribute until this is fixed properly?

Following NixOS manual this is quite a surprising failure.

@rnhmjoj
Copy link
Contributor

rnhmjoj commented May 10, 2021

It looks like that it's the only way. I wonder how distros deal with this problem: I think on arch you have to specify an interface, not sure about others.

@zackelan
Copy link

zackelan commented Jun 1, 2021

Tripped over this almost immediately when upgrading my testing laptop to 21.05. I think a mention in the release notes would be very helpful.

@rnhmjoj
Copy link
Contributor

rnhmjoj commented Jun 1, 2021

Since I couldn't get rid of all the systemd-udev-settle for 21.05, #104722 is still not applicable. I guess I'll do what domen suggested.

rnhmjoj added a commit to rnhmjoj/nixpkgs that referenced this issue Jun 1, 2021
This is the only way to solve issue NixOS#101963, for now.
mkf added a commit to mkf/etc-nixos that referenced this issue Jun 3, 2021
github-actions bot pushed a commit that referenced this issue Jun 6, 2021
This is the only way to solve issue #101963, for now.

(cherry picked from commit 030a521)
@rnhmjoj
Copy link
Contributor

rnhmjoj commented Jun 6, 2021

The problem should be "fixed" for now. When systemd-udev-settle is fully removed (only four modules left: #73095 (comment)) I'll reintroduce the automatic selection of a network card.

@rnhmjoj
Copy link
Contributor

rnhmjoj commented Jun 21, 2021

I made a new patch to fix this problem, this one doesn't depend on systemd-udev-settle being removed.
It would help If you could try it: #127595

rnhmjoj added a commit to rnhmjoj/nixpkgs that referenced this issue Aug 11, 2021
I may have finally found a clean solution to the issues[1][2][3] with
the automatic discovery of wireless network interfaces.

[1]: NixOS#101963
[2]: NixOS#23196
[3]: NixOS#125917 (comment)

Currently the start script fails right away if no interface is available
by the time it's running, possibly leaving the system without network.
This happens when running a little early in the boot. A solution is to
instead wait for at least one interface to appear before scanning the
/sys/class/net/ directory. This is done here by listening for the right
udev events (from the net/wlan subsystem) using the `udevadm monitor`
command and grep to match its output.

This methods guarantees the availability of at least one interface to
wpa_supplicant, but won't add additional interfaces once it has started.
However, if the current interface is lost, say unplugged, the service is
automatically stopped and will be restarted as soon as a one (not
necessarily the same) is detected. It would be possible make this fully
dynamic by running another service that continously listen for udev
events and manages the main wpa_supplicant daemon, but this is probably
overkill.

I tested the following cases:

  - one interface, starting at boot, w/o predictable naming scheme
  - two interfaces, starting at boot (intel wireless and a usb adapter),
    w/o predictable naming scheme
  - one interface after the system booted, w/o predictable naming scheme
  - two interfaces after the system booted, w/o predictable naming scheme
  - unplugging and plugging back the current interface
@rnhmjoj
Copy link
Contributor

rnhmjoj commented Aug 16, 2021

This should be completely resolved by #127595 (now in nixos-unstable).

@rnhmjoj rnhmjoj closed this as completed Aug 16, 2021
@johnrichardrinehart
Copy link
Contributor

@rnhmjoj You saved me: johnrichardrinehart/nix@ff35906 . I use that flake output for ephemeral OS instances on a flash drive. Before pinning to nixos-unstable I was having the problem described here. Now, everything works.

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: bug Something is broken 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS
Projects
None yet
Development

No branches or pull requests

7 participants