Can not nixos-install a system with system.etc.overlay.enable = true #319533

Closed
arianvp opened this issue Jun 13, 2024 · 7 comments · Fixed by #364239
Labels
0.kind: bug (Something is broken)
6.topic: nixos (Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS)

Comments

arianvp (Member) commented Jun 13, 2024

Describe the bug

After #319524 bricked my machine, I'm now trying to reinstall with nixos-install. However, nixos-install doesn't work either.

Steps To Reproduce

Set system.etc.overlay.enable = true in configuration.nix and run nixos-install.

image

Expected behavior

nixos-install succeeds.

arianvp added the 0.kind: bug label on Jun 13, 2024
arianvp (Member, Author) commented Jun 13, 2024

Could it be that the overlay kernel module is not loaded?

arianvp (Member, Author) commented Jun 13, 2024

It's also weird that systemd-boot-builder.py isn't working either. Why does /tmp break when this option is enabled? I'm really confused.

Qyriad added the 6.topic: nixos and 6.topic: installer labels on Jun 13, 2024
eclairevoyant removed the 6.topic: installer label on Jun 13, 2024
andre4ik3 (Member) commented

This is still present in the latest NixOS unstable. It can be reproduced by booting the minimal installer and writing a simple flake configuration with the following (in addition to the usual settings such as filesystems and bootloader):

{
  system.etc.overlay = {
    enable = true;
    mutable = false;
  };

  boot.initrd.systemd.enable = true;
  services.userborn.enable = true;

  users.users.foobar = {
    isNormalUser = true;
    extraGroups = [ "wheel" ];
    initialPassword = "foobar";
  };
}

Running nixos-install --no-channel-copy --flake path:/path/to/flake#foobar yields this:

Broken installer logs
installing the boot loader...
remounting /etc...
mktemp: failed to create directory via template ‘/mnt/tmp.IT2ZwvUdyW/nixos-etc-metadata.XXXXXXXXXX’: No such file or directory
mount: /nix/store/58wzhs5xx51zyghmawm1l5s0kbn3mlwp-etc-metadata.erofs: can't find in /etc/fstab.
mktemp: failed to create directory via template ‘/mnt/tmp.IT2ZwvUdyW/nixos-etc.XXXXXXXXXX’: No such file or directory
mount: bad usage
Try 'mount --help' for more information.
mount: overlay: can't find in /etc/fstab.
findmnt: can't read /proc/mounts: No such file or directory
Moving mount
Mounting beneath top mount
Success | move-mount.c: 443: main: Invalid number of arguments 1
umount: failed to parse /proc/self/mountinfo: No such file or directory
umount: : umount failed: No such file or directory.
rmdir: failed to remove '': No such file or directory
findmnt: can't read /proc/mounts: No such file or directory
Activation script snippet 'etc' failed (1)
Error: Failed to parse os-release

Caused by:
0: Failed to read /etc/os-release
1: No such file or directory (os error 2)

From my understanding, some part of nixos-install creates files in /etc, namely /etc/NIXOS and /etc/mtab. Then, when it tries to switch to the configuration, it sees that /etc already exists and has files in it, and so assumes that it's a running system with an existing /etc. It therefore tries to remount /etc, which fails disastrously since this is not a booted system.

I have found that it can be partially worked around by doing this:

mkdir /mnt/tmp
mount -o bind /tmp /mnt/tmp
mount -o bind /proc /mnt/proc
mount -o bind /sys /mnt/sys
mount -o bind /dev /mnt/dev
cp /etc/os-release /mnt/etc/

Running nixos-install again after that still fails to mount /etc, with the following logs:

Almost-working installer logs
building the flake in path:/tmp/etc/nixos?lastModified=1731562727&narHash=sha256-IVvqimmdY/4mDWtUSVaV11oafe7xUmZQZhe8mphCptM%3D...
installing the boot loader...
remounting /etc...
mktemp: failed to create directory via template ‘/mnt/tmp.HXdCRHTu4x/nixos-etc-metadata.XXXXXXXXXX’: No such file or directory
mount: /nix/store/58wzhs5xx51zyghmawm1l5s0kbn3mlwp-etc-metadata.erofs: can't find in /etc/fstab.
mktemp: failed to create directory via template ‘/mnt/tmp.HXdCRHTu4x/nixos-etc.XXXXXXXXXX’: No such file or directory
mount: bad usage
Try 'mount --help' for more information.
mount: overlay: can't find in /etc/fstab.
Moving mount
Mounting beneath top mount
Success | move-mount.c: 443: main: Invalid number of arguments 1
umount: /etc: not mounted
umount: : no mount point specified.
rmdir: failed to remove '': No such file or directory
Activation script snippet 'etc' failed (1)
Initializing machine ID from random generator.
Created "/efi/EFI".
Created "/efi/EFI/systemd".
Created "/efi/EFI/BOOT".
Created "/efi/loader".
Created "/efi/loader/entries".
Created "/efi/EFI/Linux".
Copied "/nix/store/319jgkg1cbmgz076yq2pxpnxpd526cjg-systemd-256.7/lib/systemd/boot/efi/systemd-bootaa64.efi" to "/efi/EFI/systemd/systemd-bootaa64.efi".
Copied "/nix/store/319jgkg1cbmgz076yq2pxpnxpd526cjg-systemd-256.7/lib/systemd/boot/efi/systemd-bootaa64.efi" to "/efi/EFI/BOOT/BOOTAA64.EFI".
Random seed file /efi/loader/random-seed successfully written (32 bytes).
Successfully initialized system token in EFI variable with 32 bytes.
Created EFI boot entry "Linux Boot Manager".
warning: the group 'nixbld' specified in 'build-users-group' does not exist
remounting /etc...
mktemp: failed to create directory via template ‘/mnt/tmp.HXdCRHTu4x/nixos-etc-metadata.XXXXXXXXXX’: No such file or directory
mount: /nix/store/58wzhs5xx51zyghmawm1l5s0kbn3mlwp-etc-metadata.erofs: can't find in /etc/fstab.
mktemp: failed to create directory via template ‘/mnt/tmp.HXdCRHTu4x/nixos-etc.XXXXXXXXXX’: No such file or directory
mount: bad usage
Try 'mount --help' for more information.
mount: overlay: can't find in /etc/fstab.
Moving mount
Mounting beneath top mount
Success | move-mount.c: 443: main: Invalid number of arguments 1
umount: /etc: not mounted
umount: : no mount point specified.
rmdir: failed to remove '': No such file or directory
Activation script snippet 'etc' failed (1)
remounting /etc...
mktemp: failed to create directory via template ‘/mnt/tmp.HXdCRHTu4x/nixos-etc-metadata.XXXXXXXXXX’: No such file or directory
mount: /nix/store/58wzhs5xx51zyghmawm1l5s0kbn3mlwp-etc-metadata.erofs: can't find in /etc/fstab.
mktemp: failed to create directory via template ‘/mnt/tmp.HXdCRHTu4x/nixos-etc.XXXXXXXXXX’: No such file or directory
mount: bad usage
Try 'mount --help' for more information.
mount: overlay: can't find in /etc/fstab.
Moving mount
Mounting beneath top mount
Success | move-mount.c: 443: main: Invalid number of arguments 1
umount: /etc: not mounted
umount: : no mount point specified.
rmdir: failed to remove '': No such file or directory
Activation script snippet 'etc' failed (1)
setting root password...
passwd: Cannot determine your user name.
Setting a root password failed with the above printed error.
You can set the root password manually by executing `nixos-enter --root '/mnt'` and then running `passwd` in the shell of the new system.

It looks like it failed, but if you reboot, it boots fine! (Make sure to rm -rf the fake /mnt/etc that nixos-install creates.)
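
The bind-mount workaround from this comment can be collected into a small script. This is only a sketch: the helper names run and prepare_target and the APPLY variable are made up for illustration and are not part of nixos-install. By default it only prints the commands; setting APPLY=1 (as root, with the target mounted at /mnt) would execute them.

```shell
#!/bin/sh
# Sketch of the manual workaround above; hypothetical helper, not part of nixos-install.
set -eu

# Print the command, or execute it when APPLY=1 is set in the environment.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# prepare_target ROOT: bind-mount the host's /tmp, /proc, /sys and /dev into
# ROOT and copy /etc/os-release there, mirroring the manual steps.
prepare_target() {
  root="$1"
  run mkdir -p "$root/tmp"
  for dir in tmp proc sys dev; do
    run mount -o bind "/$dir" "$root/$dir"
  done
  run cp /etc/os-release "$root/etc/"
}

prepare_target "${1:-/mnt}"
```

Run without APPLY set, this prints six commands (one mkdir, four bind mounts, one cp) so they can be reviewed before anything is mounted. It uses mkdir -p instead of the plain mkdir from the comment so that a second run does not fail on an existing directory.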

From a bit of digging, it looks like the offending script is here, and it's compounded by some other bug that requires /etc/os-release to be present in the mountpoint? I think there are two separate bugs here.

phaer (Member) commented Dec 11, 2024

I ran into this as well, a few observations:

  • Locally worked around the mktemp error with an export TMPDIR=/tmp (didn't check the root cause yet).
  • Encountered another error because the mountpoints for the overlay were missing; worked around with mkdir -p /.rw-etc/upper /.rw-etc/work (didn't check the root cause yet).
  • Finally ran into move-mount failing, see the log below.
  • Noticed that nixos-install uses nixos-enter under the hood, and that nixos-enter uses a private mount namespace plus chrooting as a kind of home-brew containerization.
  • There's an /etc/mtab file created before the etc activation phase. I believe that's for grub only and shouldn't cause issues here, but it's worth noting, as it could interfere with some of the mounting logic here, iiuc.
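
A possible explanation for the mktemp error, which the thread does not confirm: the activation script calls mktemp with -t, which resolves the template against TMPDIR, and the template in the earlier failing logs sits under /mnt/tmp.IT2ZwvUdyW, a path that presumably does not exist inside the chroot. The sketch below reproduces that failure mode in isolation. try_mktemp is a made-up helper name, and it uses the short mktemp options rather than the long ones seen in the trace.

```shell
#!/bin/sh
# Demonstrates that mktemp -t depends on TMPDIR pointing at an existing directory.

# try_mktemp DIR: create a temp directory with a -t template, with TMPDIR set
# to DIR. Prints the new path on success, fails quietly otherwise.
try_mktemp() {
  TMPDIR="$1" mktemp -d -t nixos-etc.XXXXXXXXXX 2>/dev/null
}

scratch="$(mktemp -d)"   # a directory that does exist

# TMPDIR names a directory that is missing, as assumed for the chroot case.
if try_mktemp "$scratch/gone" >/dev/null; then
  echo "unexpected: mktemp succeeded"
else
  echo "mktemp fails when TMPDIR does not exist"
fi

# TMPDIR names an existing directory, comparable to export TMPDIR=/tmp.
if created="$(try_mktemp "$scratch")"; then
  echo "mktemp works when TMPDIR exists: $created"
fi

rm -rf "$scratch"
```

If this is indeed the cause, it would also explain why export TMPDIR=/tmp is enough to get past the first error.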

Not sure yet, but I suspect that the move-mount logic might not work as expected in the mount namespace? Happy to take hints and tricks on how to check that effectively.

I think this is an important issue, as it must affect at least everyone who uses etc.overlay.enable and nixos-install in tandem. It's not clear to me whether our custom code here works properly, and whether it's worth maintaining just to avoid a dependency on composefs.

etc activation log (failing move-mount)
remounting /etc...
++ mktemp --directory -t nixos-etc-metadata.XXXXXXXXXX
+ tmpMetadataMount=/tmp/nixos-etc-metadata.Dg2aHgRWMX
+ mount --type erofs /nix/store/si8r2zvyfbsf0z81n3crszq8mjycd1sy-etc-metadata.erofs /tmp/nixos-etc-metadata.Dg2aHgRWMX
++ mktemp --directory -t nixos-etc.XXXXXXXXXX
+ tmpEtcMount=/tmp/nixos-etc.bh0AAE2Jh1
+ mount --bind --make-private /tmp/nixos-etc.bh0AAE2Jh1 /tmp/nixos-etc.bh0AAE2Jh1
+ mount --type overlay overlay --options lowerdir=/tmp/nixos-etc-metadata.Dg2aHgRWMX::/nix/store/jv2lvcn9083pdm2bnjb63a7chc405gm9-etc-lowerdir,relatime,redirect_dir=on,metacopy=on,upperdir=/.rw-etc/upper,workdir=/.rw-etc/work /tmp/nixos-etc.bh0AAE2Jh1
+ findmnt /etc --submounts --list --noheading --kernel --output TARGET
+ read -r mountPoint
+ /nix/store/ij0iq1jfp4zzhp32n9fh49cgns4p204x-move-mount-beneath-unstable-2023-11-26/bin/move-mount --move --beneath /tmp/nixos-etc.bh0AAE2Jh1 /etc
Moving mount
Mounting beneath top mount
Invalid argument | move-mount.c: 553: main: move_mount
Attaching mount /tmp/nixos-etc.bh0AAE2Jh1 -> /etc
Moving single attached mount
++ _status=1
++ _localstatus=1
+ umount --lazy --recursive /etc
umount: /etc: not mounted
++ _status=1
++ _localstatus=1
+ umount --lazy /tmp/nixos-etc.bh0AAE2Jh1
+ rmdir /tmp/nixos-etc.bh0AAE2Jh1
rmdir: failed to remove '/tmp/nixos-etc.bh0AAE2Jh1': Device or resource busy
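
One reading of the trace above, offered as a guess rather than a diagnosis: the findmnt /etc loop produces no mount points, and umount later reports that /etc is not mounted, so /etc appears to be a plain directory inside the nixos-enter environment. As far as I understand, a beneath move needs an existing mount at the target to attach under, which would fit the Invalid argument error. A quick way to check this from inside nixos-enter is to look at /proc/self/mountinfo, whose fifth field is the mount point. The helper name is_mount_point is made up for this sketch.

```shell
#!/bin/sh
# is_mount_point PATH [MOUNTINFO]: succeed if PATH appears as a mount point
# (field 5) in MOUNTINFO, defaulting to this process's own mount table.
# Paths containing spaces are escaped in mountinfo and are not handled here.
is_mount_point() {
  awk -v target="$1" '$5 == target { found = 1 } END { exit !found }' \
    "${2:-/proc/self/mountinfo}" 2>/dev/null
}

# Which mount namespace is this shell in? Compare the value inside and
# outside nixos-enter to confirm that they differ.
printf 'mount namespace: %s\n' "$(readlink /proc/self/ns/mnt 2>/dev/null || echo unknown)"

if is_mount_point /etc; then
  echo "/etc is a mount point in this namespace"
else
  echo "/etc is a plain directory here (or mountinfo is unavailable)"
fi
```

The optional second argument lets the check run against a saved copy of mountinfo, which is handy when comparing the installer environment with a booted system.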

phaer (Member) commented Dec 11, 2024

@andre4ik3 I think your example works because you are using EFI boot with systemd-boot, which doesn't require /etc for installation. So it succeeds in installing your NixOS closure and bootloader, then fails to complete activation and do things like setting a root password.
If you don't need that activation, you still end up with a bootable system and should then be able to rebuild that system without running into this issue again.

It's still blocking installation for users who, e.g., boot with grub or require the activation during installation to succeed for some reason (such as setting a root password).

andre4ik3 (Member) commented

@phaer I was able to hack around this in the installer; however, I think the changes are quite invasive (adding environment variables and command line flags to nixos-enter and nixos-install), so I didn't make a PR. And if GRUB needs files in /etc, it wouldn't work anyway, as my "solution" is to avoid touching /etc at all (neither creating files nor attempting to mount it) during installation if a special flag is passed (--no-etc-create). But for my use cases it works quite nicely:

master...andre4ik3:nixpkgs:andre4ik3-fix-nixos-install-overlay

I think the "proper" solution would be to split installation of the bootloader from system activation? Then the activation scripts wouldn't need to run until the first boot, but the bootloader would still be installed.

r-vdp (Contributor) commented Dec 11, 2024

I proposed a potential fix in #364239
