Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP]: initrd: Use systemd #120015

Closed

Conversation

ElvishJerricco
Copy link
Contributor

Motivation for this change
  • This parallelizes early boot.
  • Systemd generators like the cryptsetup and fstab ones can be used to handle mounting and prompting for passwords.
  • Plymouth can start much earlier, in early boot, making the boot graphics much cleaner.
  • Systemd services from other packages can be used.
  • systemd-ask-password can be used.
Things done
  • Set initrd's /init to the systemd binary.
  • Use systemctl switch-root instead of switch_root.
  • Minimize the size inflation. Currently this only inflates a base initrd by 1M. I'd like to get this down further though.
  • Switch to fstab and crypttab generators.
  • Use systemd's emergency.target on failure to boot.
  • Make sure none of the NixOS tests have broken.
  • Start plymouth in initrd.
  • Use systemd for setting up the initrd network stack.
  • Offer a trivial NixOS module interface for adding packages' systemd services to initrd.
  • Backward compat systemd units for things like postDeviceCommands

This is different approach than #74842. In the previous attempt, systemd was essentially only used to spawn the init script we had before. The goal was to gradually move code out of the init script and into systemd units. The new approach is to start with just the minimal systemd units and go from there. This ensures that things are working the way they're supposed to, and that the init script isn't acting as a crutch for a badly implemented migration to systemd.

The downside is that this version supports only a fraction of the existing initrd configuration options of NixOS. Many of these things will be easy to migrate, but I suspect getting everything implemented will be tedious. So I think it would be best to add special service units that run all the various pre/postFooCommands style options at the appropriate times. This should be able to bring back most or all of NixOS's initrd options. But these units should be disabled while working on migrating a feature to systemd units, in order to prevent the aforementioned crutch.

Another change in this approach has to do with extra-utils and the makeInitrd function. Systemd has a number of paths hard coded, such as the paths to the default generators, and to tools like the mount binary. It's usually possible to work around this as we do with extra-utils, but it can be challenging in some cases. It's far easier to put everything at their original paths and to rely on find-libs and manual effort to find dependencies rather than Nix closures, considering how few dependencies there tend to be that aren't found by find-libs and how obvious they usually are.

The initrd produced currently is quite a bit larger, at around 20M, but there's a lot of excess in it at the moment, so I'm sure it can be cut down quite a bit. This branch is no where near ready for any kind of serious use, but testing and feedback is appreciated.

PS: Having the emergency shell in initrd is extraordinarily useful, and being able to mess with things like devices or systemd units from initrd and retry the boot with systemctl default without rebooting or building a new initrd is a big time saver.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels Apr 21, 2021
@ElvishJerricco ElvishJerricco marked this pull request as draft April 21, 2021 01:45
@Ericson2314
Copy link
Member

Good luck! :)

@zimbatm
Copy link
Member

zimbatm commented Jun 7, 2021

/cc @flokli

@ElvishJerricco
Copy link
Contributor Author

I'm thinking I can rework this PR to be something like boot.initrd.useSystemd, similar to networking.useNetworkd, i.e. it could be merged and used as an opt-in feature rather than switching nixos over entirely.

@Ericson2314
Copy link
Member

Sounds good, and then someday we can change the default :).

@github-actions github-actions bot added the 6.topic: kernel The Linux kernel label Jul 9, 2021
@ofborg ofborg bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Jul 9, 2021
@ElvishJerricco ElvishJerricco force-pushed the rewrite-initrd-systemd branch 2 times, most recently from afb5257 to baa14e3 Compare July 9, 2021 17:57
@ElvishJerricco ElvishJerricco force-pushed the rewrite-initrd-systemd branch from baa14e3 to d3f0dcd Compare July 9, 2021 18:01
@arianvp
Copy link
Member

arianvp commented Jul 10, 2021 via email

@arianvp
Copy link
Member

arianvp commented Jul 10, 2021

Sorry I commented without reading the rationale (was replying from my email client). So ignore my previous comment.

If i understand correctly from the PR description the rust program does more than just find the nix closure. I'll have to take a closer look to fully understand what's going on here. A commit message with the rationale would definitely help me understand better.

@ElvishJerricco
Copy link
Contributor Author

Sorry I commented without reading the rationale (was replying from my email client). So ignore my previous comment.

If i understand correctly from the PR description the rust program does more than just find the nix closure. I'll have to take a closer look to fully understand what's going on here. A commit message with the rationale would definitely help me understand better.

It does less than finding the closure. It only resolves and preserves symlinks and finds shared libraries. It doesn't copy whole paths. This way I don't have to patch all kinds of things like systemd files or the systemd binary itself in ways that patchelf doesn't help with.

@ElvishJerricco
Copy link
Contributor Author

To clarify, the point of the rust program is to do away with the "extra utils" idea in its entirety. Programs placed in their original paths in the initrd. So we don't want to copy whole closures, or even whole store paths. Just the files indicated by the nixos configuration, and minimal set of files those strictly depend on.

# Program used to compress the cpio archive; use "cat" for no compression.
# This can also be a function which takes a package set and returns the path to the compressor,
# such as `pkgs: "${pkgs.lzop}/bin/lzop"`.
, compressor ? "gzip"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity - what is the reason of choosing gzip? No need for extra dependencies?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just copying this:

@ElvishJerricco ElvishJerricco force-pushed the rewrite-initrd-systemd branch from d3f0dcd to 78d09d5 Compare July 11, 2021 19:20
@flokli
Copy link
Contributor

flokli commented Jul 21, 2021

First, I'm very happy to see someone picking this up. systemd-in-initrd will solve many long-standing issues.

Some questions:

Can you elaborate on what make-initrd-ng is doing? Wondering if we need to write that in rust, or if a bash script might be more readable.

As for the apporach of not using the closure-info, but synthesizing and patchelf'ing something FHS-y: It seems to be very likely people might want to add their own services/port over some of the existing functionality into the initrd. I wonder if that approach will make this way more complicated, and how chasing dependencies (and the interface for it will look like).

I couldn't find the issue I discovered this in, but at least for sytemd-makefs, systemd-growfs etc., mkfs.* and mount should just be picked up from $PATH - so if we extend $PATH in that unit file to whatever binaries we have support enabled for, this should work without patching binaries.

@ElvishJerricco
Copy link
Contributor Author

Wondering if we need to write that in rust, or if a bash script might be more readable.

I really tried to keep it down to bash, trust me :P But when I finally got a nearly functional bash implementation, 1) it took 3min to run (compared to this rust program's 2sec), and 2) I would hardly have called it readable.

Ultimately what the program does is copy objects into the target, respecting and replicating symlink structures, as well as recursing on ELF files or on directories. Symlinks are definitely the main reason it's complicated, which is kind of critical considering how systemd relies quite heavily on them. NixOS's plymouth implementation also relies on some symlinks that are ultimately what caused me to create this program.

but synthesizing and patchelf'ing something FHS-y:

And to clarify, the program does not patch any files at all; it copies them to their original locations and just puts some symlinks at places like /lib, /etc, and /bin so that some things are more trivial to make work. There's a number of binaries introduced in this approach that are very annoying to effectively patch; e.g. systemd hard codes the path to the mount command.

@Ericson2314
Copy link
Member

I also wonder if this rust script could be reused for a souped-up buildFHSEnv. That could be neat, and share the "burden" of this more wildly (though I am not concerned, bash is bad.)

@zhaofengli zhaofengli mentioned this pull request Aug 24, 2021
11 tasks
@@ -0,0 +1,207 @@
use std::collections::{HashSet, VecDeque};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A high-level description of what this tool is doing would be good to add here. It doesn't do a lot and you can grasp it by reading through it but the signal/noise ratio isn't that great. Helps future readers/debuggers to have a "one paragraph" idea of what goes in and out of here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

@@ -0,0 +1,10 @@
{ rustPlatform }:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can drop the cargoSha256 and use the cargoLock attribute instead:

buildRustPackage {
# ...
   cargoLock = {
     lockFile = ./Cargo.lock;
   };
# ...
}

This will also allow pulling in dependencies and updating them without having to care about the vendored package hashsum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL, will do.


src = ./make-initrd-ng;

cargoSha256 = "0vx7whn47372mg19i3vbf7ag5g0ylifx8rin2hcy81fyhmms16h6";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to add the usual meta attributes here as well.

Copy link
Member

@andir andir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I've had this on my backlog for a while now.

My primary concern with this big-bang migration is that we will break use-cases and only hear about it 6 months in (during the next release).

I think for a proper migration to systemd in stage1 we should

a) extract all the refactoring and land them already, basically being no-ops,
b) introduce a module system option to opt-in to systemd in stage1 to allow for early adopters. We can switch the default based on state version one release after initial support (or whatever the timeframe is).
d) Start with the minimal boot case at first & add a NixOS test for systemd in stage1 which we must extend with every new feature.
e) Start adding incremental support for features as we get to it / they are requested.

I am inline with your custom targets & units idea for backwards compatibility.

What do you say?

I am planning to work on this during the spring in December.

@@ -34,6 +34,7 @@ in

{
inherit (eval) pkgs config options;
inherit vmConfig;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a leftover from debugging/still working? I don't think it is technically required, is it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not technically required, so yea I can remove it from this PR. I do find it tremendously useful to have in nixos/default.nix though, FWIW.

@@ -0,0 +1,46 @@
{ writeShellScriptBin, patchelf }:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we split out the refactoring of moving these two scripts into different files? I'd love to pick them into master already (as long as they are a no-op). This reduces the amount of changes this PR introduces and we can rule out some issues during the transition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find-libs gets removed in a later commit. The rust program handles that work now.

@@ -714,6 +731,16 @@ in
noCheck = true; # fsck fails on a r/o filesystem
};
});
boot.initrd.extraUnits."rw-store.service" = lib.mkIf cfg.writableStore (pkgs.writeText "rw-store.service" ''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use our regular systemd unit generation for this? It is a trivial unit yet doesn't allow any composition (without adding a unit named rw-store.service.d/overrides.conf or such?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this, but I wasn't 100% sure if that would be too much. I'll look into it...

@@ -454,7 +454,7 @@ in
{
symlink = "/etc/modules-load.d/nixos.conf";
object = pkgs.writeText "nixos.conf"
(lib.concatStringsSep "\n" (config.boot.initrd.availableKernelModules ++ config.boot.initrd.kernelModules));
(lib.concatStringsSep "\n" config.boot.initrd.kernelModules);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some rational to "only load the minimum in modules-load.d"? Perhaps we did it over excessive? Would be good to add to the commit message in case this backfires.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, including availableKernelModules was a mistake in my code in the first place. NixOS has never auto-loaded those; they're there for the kernel to request to be loaded if needed.


# For some reason, running 'udevadm info --cleanup-db' in initrd causes
# LUKS devices to get SYSTEMD_READY=0 in stage 2 (but not other devices)
# which prevents LUKS encrypted file systems from being mounted in stage 2.
rm $out/system/initrd-udevadm-cleanup-db.service

ln -s ../plymouth-start.service $out/system/initrd.target.wants/plymouth-start.service

rm $out/system/proc-sys-fs-binfmt_misc.automount
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a good advice of developers when to add a line here? Previously units were opt-in and now we have some list of units that we don't want anymore.

For example I could see why we want the random seed in the initrd when unlocking the system using some network interaction that requires cryptography.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These services were simply the ones that created errors during boot. I didn't spend much time trying to figure out why they caused errors.

};

fstab = pkgs.writeText "fstab" (lib.concatMapStringsSep "\n"
({ fsType, mountPoint, device, options, ... }:
"${device} /sysroot${mountPoint} ${fsType} ${lib.concatStringsSep "," options}") fileSystems);

groups = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the source of this list? When do we add groups here? Is that the list of groups that systemd somewhere specifies?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

udev warned about non-existent groups in the journal, so I threw these in, in hopes they'd fix whatever I was debugging at the time. I don't think they're necessary at all, and can probably be removed.

@ElvishJerricco
Copy link
Contributor Author

@andir I'm working on a re-tooling of this to address some of the things you mentioned, in particular separating this out into an alternative initrd, as opposed to a total replacement. I'll also check out if reusing the existing systemd module is a good idea (though I worry that it won't play nicely with the fact that the new initrd doesn't implicitly include the entire closure).

@ElvishJerricco
Copy link
Contributor Author

ElvishJerricco commented Aug 30, 2021

Would it be too aggressive to factor out all the parts of nixos/modules/system/boot/systemd.nix that are relevant to both initrd and stage 2 into a separate file? Because certainly not all of it is relevant, and dotting the whole systemd.nix file with mkIfs depending on if we're evaluating initrd or stage 2 options seems confusing and error prone.

@bobvanderlinden
Copy link
Member

Really cool PR! Thanks for the work so far. I tried to get systemd in initrd in the past, but got stuck a number of times. This PR was a good starting point for me 👍

I worked a bit on this PR today. Brought it up-to-date with nixos-unstable. I also worked on another improvement in makeInitrdNG so that it can avoid doing copying of files (it currently copies files to a temporary directory to make cpio happy). It speeds up the initrd creation a tad bit more. The changes are here: https://github.com/bobvanderlinden/nixpkgs/tree/rewrite-initrd-systemd

Also, I was made aware on Matrix that there is a room dedicated to Stage 1 Systemd. I thought it might be useful to mention it here too: https://matrix.to/#/#stage1systemd:helsinki-systems.de

@samueldr
Copy link
Member

About making cpio archives, Linux provides a utility to do so without copying files around, and I have something made here in a WIP project to "borrow" the utility from Linux:

And a module that produces the file format used by that utility:

I have no idea if it's any better than what you're using in your rework, @bobvanderlinden, but that's another option to look at I guess.

@bobvanderlinden
Copy link
Member

bobvanderlinden commented Mar 13, 2022

@samueldr That does also look like quite an improvement. The usage of https://github.com/torvalds/linux/blob/master/usr/gen_init_cpio.c is nice, it has a better interface than cpio and would avoid the use of proot in my changes. I might look into integrating gen_init_cpio.c into my changes.

Determining which files need to go on the initrd is better in @ElvishJerricco's branch and the additional changes in mine. Instead of specifying each package (or file) that needs to go on the initrd, it can find the dependencies of files and only include those. So, instead of including the full systemd package, it'll only include the files that ${systemd}/lib/systemd/systemd refers to. It does this recursively. @ElvishJerricco did this using patch_elf. It results in a smaller initrd.

I also added that it would look for /nix/store/XXXXX... paths in other (non-ELF) files. That means that whenever a config file (like systemd service files) refer to an executable, those executables (and all of their .so dependencies) are included in the initrd automatically. Any implicit dependencies do need to be added manually. I think this would help with other NixOS modules that need to add something to initrd: only define a systemd service file and it should have most of the dependencies available.


Today I worked on refactoring the NixOS systemd module so that it will be easier to reuse unit configuration for initrd's systemd. See #164016

I think next up is looking for a way to make this PR compatible with extraUtilsCommands, postDeviceCommands and postMountCommands.


@dasJ gave some feedback on Matrix (btw, stage1-systemd room there is very appropriate for this PR: https://matrix.to/#/#stage1systemd:helsinki-systems.de). I revise my above comment: the next step is making sure systemd can be used as an option instead of a replacement of stage-1.nix.

@ElvishJerricco
Copy link
Contributor Author

Closing in favor of #164943

[Unit]
IgnoreOnIsolate=yes
[Service]
Environment=PATH=${lib.concatMapStringsSep ":" (p: "${lib.getBin p}/bin") config.system.fsPackages}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this adding to $PATH end up somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flokli I believe it's covered by the DefaultEnvironment in system.conf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: merge conflict This PR has merge conflicts with the target branch 6.topic: kernel The Linux kernel 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/`
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants