Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/networkd: wait for udev to settle before starting networkd #39340

Merged
merged 1 commit into from
Apr 29, 2018

Conversation

xeji
Copy link
Contributor

@xeji xeji commented Apr 22, 2018

Motivation for this change

Fix #39069: Avoid a race condition between udevd renaming and networkd configuring interfaces when using networkd with predictable interface names.

Please backport to 18.03 because a resulting non-deterministic test failure has frequently delayed the nixos-18.03-small channel.

This is an interim fix to get rid of the race condition and resulting test failures. It does not break existing configs that use the old ethX interface names for stage 1 (initrd) networking, so it should be safe to backport to 18.03.

(#39329 proposes a better long-term solution: rename interfaces in stage 1 already, so interface names are consistently the same in initrd and fully booted system. But that change will break initrd networking in some existing configs, so we should make it carefully.)

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

... to avoid race condition between udevd renaming and
networkd configuring interfaces (39069)
@GrahamcOfBorg GrahamcOfBorg added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels Apr 22, 2018
@GrahamcOfBorg GrahamcOfBorg added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Apr 22, 2018
@Mic92
Copy link
Member

Mic92 commented Apr 23, 2018

Upstream systemd-networkd now renames network interfaces itself, which seems like a good idea, no?
That would prevent race conditions, but require us to have different udev rules, when networkd is enabled vs. disabled.

@Mic92
Copy link
Member

Mic92 commented Apr 23, 2018

We would need udev rules from here:

$ find $buildInputs | grep 80-net-setup-link.rules
/nix/store/64xpvhqlvvl9vvvvbih0b767lfpbg74y-systemd-238/lib/udev/rules.d/80-net-setup-link.rules

and also this pull request, which was reverted: #29768

Back then I made the mistake merging/reverting both features independently.
However the combination of both should make renaming interfaces to work properly

@xeji
Copy link
Contributor Author

xeji commented Apr 23, 2018

Upstream systemd-networkd now renames network interfaces itself, which seems like a good idea, no?

it's still udev, not networkd. The .link files control the net_setup_link builtin of udev, see systemd.link(5).

That would prevent race conditions,

Unfortunately no. The race condition occurs because nixos brings up interfaces in stage 1 without renaming them, we only add the renaming rule in stage 2. networkd just tries to configure all interfaces that it finds, hence a race condition with udev trying to rename them in stage 2.

The best solution is to do renaming when the interfaces are first created in stage 1, see #39329.

but require us to have different udev rules, when networkd is enabled vs. disabled.

Actually, no if we do the renaming in stage 1.

Back then I made the mistake merging/reverting both features independently. However the combination of both should make renaming interfaces to work properly

Thanks for the references, I had not seen your previous PRs. Yes, switching back to upstream rules and .link files is a good idea, we should integrate this into #39329.

There's a small drawback: the builtin mechanism is controlled by the net.ifnames kernel boot parameter, so changing predictableInterfaceNames will require a reboot. I don't mind at all but there was a discussion about this a long time ago, f756369 and 1def5ba.

@@ -712,6 +712,9 @@ in
systemd.services.systemd-networkd = {
wantedBy = [ "multi-user.target" ];
restartTriggers = map (f: f.source) (unitFiles);
# prevent race condition with interface renaming (#39069)
requires = [ "systemd-udev-settle.service" ];
after = [ "systemd-udev-settle.service" ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I have:

systemd.services.systemd-udev-settle.serviceConfig.ExecStart = ["" "${pkgs.coreutils}/bin/true"];

in my configuration because systemd-udev-settle timeouts on my machine sometimes.
That's why I would be glad if we could get rid of that service sometimes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is not ideal but the best we can do in stage 2 to prevent the race condition. We can revert this as soon as we do the renaming in stage 1.

Copy link
Contributor Author

@xeji xeji Apr 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using systemd-udev-settle, we could add
serviceConfig.ExecStartPre="${systemd.package}/bin/udevadm settle -t 10";
directly to systemd-networkd to use a shorter timeout than the default 120s. I guess 10s will be enough to get interface renaming done... ( on a real system, but on hydra you never know).

@oxij
Copy link
Member

oxij commented Apr 24, 2018

LGTM

@xeji xeji merged commit 1937b81 into NixOS:master Apr 29, 2018
@xeji xeji deleted the interim-fix-39069 branch April 29, 2018 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nixos/networkd: fails non-deterministically with predictable interface names
4 participants