deployment.autoLuks deprecation #62211

Closed
flokli opened this issue May 29, 2019 · 4 comments

Comments

@flokli
Contributor

flokli commented May 29, 2019

Issue description

This issue is a placeholder for all those users raising their voices in response to the deprecation of NixOps deployment.autoLuks introduced in #61321 (and backported to 19.03).

Please let us know if you are using the feature!

NixOps deployment.autoLuks is a feature to automatically handle block devices and LUKS encryption without storing secrets on the target devices.

Even in its current state it seems to be halfway broken (e.g. removing a LUKS device panics systemd), and people expressed doubts about whether it's being used at all.

Looking at the NixOps repository and searching for public infrastructure repositories didn't yield a large (or any) userbase of the feature. Thus we are asking for feedback if you are using it.

The changes previously done to our systemd fork included changes to the startup unit ordering. The local filesystems were no longer part of the very basic system init, allowing sshd and similar processes to start without finishing all mount units.

Due to those relaxed boot requirements a bunch of errors with state and runtime directories appeared. There were some fixes but they are still incomplete (e.g. nixos-rebuild switch regenerates all the state directories but reboots do not have the same guarantee).
Backing out of these changes and restoring a sane boot order for the price of requiring a few more lines of configuration in NixOps setups seems like a reasonable tradeoff.

Why did this become necessary?

In the past our systemd fork carried a patch (NixOS/systemd@ce79214) that removed the local-fs.target from the sysinit.target. This allowed services such as sshd to start while not all of the local filesystems were mounted, thus making it possible to send over keys using sshd.service. While probably a plausible workaround at the time, this caused a bit of weird behavior down the road.

Systemd didn't support _netdev and subsequently struggled with all kinds of network block devices until roughly 2014.

Since systemd started supporting StateDirectory, RuntimeDirectory, etc. (https://www.freedesktop.org/software/systemd/man/systemd.exec.html) and systemd-tmpfiles (https://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html), and their usage increased (even inside systemd itself), the number of unexpected side effects has increased.

While probably not noticeable for most people, there is a race condition between the folders in /run/ and /var/lib/ being generated and the remaining system coming up. In many cases we might just be lucky that all the directories exist. In general it led to many PreStart scripts that created those directories if they were missing. Those in turn had to be privileged, since most daemons are not run with root privileges. The option we used to turn those scripts into privileged scripts is now deprecated. We have an ongoing effort to replace them where possible (#56265, #62050, …).
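
As a rough illustration of that kind of replacement (a sketch, not taken from the linked PRs): a unit can declare its directories and let systemd create them with the right ownership, instead of relying on a privileged PreStart script. The service name `example-daemon`, the user `example` and the `ExecStart` command below are hypothetical placeholders.

```nix
{ pkgs, ... }:
{
  # Hypothetical service, for illustration only.
  systemd.services.example-daemon = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      # Placeholder command; substitute the real daemon binary.
      ExecStart = "${pkgs.hello}/bin/hello";
      # Assumes a user named "example" is defined elsewhere in the configuration.
      User = "example";
      # systemd creates /var/lib/example-daemon and /run/example-daemon,
      # owned by the service user, before the service is started.
      StateDirectory = "example-daemon";
      RuntimeDirectory = "example-daemon";
    };
  };
}
```

With this, no script has to run as root just to create the directories, which is the reason the privileged PreStart scripts existed in the first place.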

Besides those, we are trying to reduce the amount of custom patches that are being applied to systemd. In the long run it should become easier to maintain our systemd package. Eventually we would like to upstream some of our changes in a portable way. Things that aren't strictly required for systemd to work on NixOS should therefore go away.

What can I do to make it work again?

Make sure you add _netdev to the mount options of all the filesystems you are mounting via the autoLuks module. Adding that option moves them from local-fs.target to remote-fs.target, which allows your system to start sshd even without the LUKS volumes. Afterwards you can use nixops send-keys again.

Do not forget to read the error message you got and set the option that was mentioned there.
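
A minimal sketch of what this could look like in a NixOps machine definition. The volume name `secretdisk`, the device path `/dev/xvdf` and the mount point `/data` are made-up examples, and the option named in the error message is deliberately not shown here, so take it from the message itself.

```nix
{
  # Hypothetical autoLuks volume; the key is sent later with `nixops send-keys`.
  deployment.autoLuks.secretdisk = {
    device = "/dev/xvdf";
  };

  fileSystems."/data" = {
    device = "/dev/mapper/secretdisk";
    fsType = "ext4";
    # Orders this mount under remote-fs.target instead of local-fs.target,
    # so sshd can come up before the volume is unlocked.
    options = [ "_netdev" ];
  };
}
```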

@andir andir mentioned this issue May 29, 2019
10 tasks
@AmineChikhaoui
Member

AmineChikhaoui commented Jun 3, 2019

We're using this feature at Infor for hundreds of deployments. Not all of them are on 19.03 yet, but this will become a problem soon.

@JohnAZoidberg
Member

Adding _netdev to the autoLuks devices seems like a good option. We just need to document it together with the autoLuks option.

This is what mount(8) says about it:

_netdev
The filesystem resides on a device that requires network access (used to prevent the system from attempting to mount these filesystems until the network has been enabled on the system).

@rbvermaa
Member

We use it as well.

@flokli
Contributor Author

flokli commented May 14, 2020

Judging from the very sparse feedback even after the "auto-close" of NixOS/nixops#1156 (comment) at the end of March, and from the fact that this whole thing was mostly relevant for the upgrade to NixOS 19.03, we can probably safely close this.

NixOS 18.09 went out of support in April 2019, so pre-19.03 machines have been running an unsupported version for over a year. We can assume people found workarounds or just don't ever update. Either way, it's not something immediately actionable, so I'll close this for now.

@flokli flokli closed this as completed May 14, 2020
ajs124 pushed a commit to helsinki-systems/nixpkgs that referenced this issue Dec 6, 2021
It was originally moved because of nixops autoLuks feature which
has been unsupported for a while.

See:
* NixOS#62211
* NixOS/nixops#1156 (comment)

systemd-tmpfiles-setup-dev.service needs to run very early (even before
udev runs) because udev rules assume static device nodes already exist
even before udev is started. If these static device nodes do not exist,
systemd might have trouble mounting filesystems that require static
device nodes (like loopfs and btrfs).
github-actions bot pushed a commit that referenced this issue Dec 8, 2021
(cherry picked from commit d4e4d27)