grub efi fails on latest nixos-unstable #61718

Closed
ghost opened this issue May 19, 2019 · 48 comments
Labels
0.kind: bug Something is broken 0.kind: regression Something that worked before working no longer 1.severity: blocker This is preventing another PR or issue from being completed 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS

@ghost

ghost commented May 19, 2019

Issue description

After upgrading, the bootloader breaks with:

error: symbol `grub_file_filters' not found
Entering rescue mode

Steps to reproduce

Technical details

  • system: "x86_64-linux"
  • host os: Linux 4.19.44, NixOS, 19.09.git.2439b30 (Loris)
  • multi-user?: yes
  • sandbox: yes
  • version: nix-env (Nix) 2.2.2
@TomSmeets
Contributor

TomSmeets commented May 20, 2019

Hi gnidorah,
I recently had the same issue.

A nixos-rebuild switch failed with some error about a full filesystem.
In my case the cause was a full efivars partition. This left me a broken boot like you have.

See #27821 for more info.

This is how I restored my system:

  • Put the NixOS installer on a usb stick
  • Boot into the NixOS installer
  • rm /sys/firmware/efi/efivars/dump-* (don't delete anything else in that directory)
  • Reboot again into the NixOS installer
  • Mount your partitions to the correct locations in /mnt/
  • nixos-enter
  • echo 'nameserver 8.8.8.8' >> /etc/resolv.conf to fix network access.
  • nixos-rebuild boot --install-bootloader (install-bootloader might be optional, no idea)
  • This command should complete successfully.
  • reboot

This is all I had to do. However, for now I also disabled boot.loader.efi.canTouchEfiVariables which was initially set by nixos-generate-config. I have no idea what this setting does when enabled. It might be related.
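
Before deleting anything, it can help to see which entries the rm in the steps above would match. A minimal shell sketch, assuming efivars is mounted at the path used in those steps; list_efi_dumps is a hypothetical helper name, and the optional directory argument exists only so the function can be tried on a scratch directory:

```shell
# List the dump-* variables that the rm step above would delete.
# Hypothetical helper; defaults to the efivars path used in this thread.
list_efi_dumps() {
  local dir="${1:-/sys/firmware/efi/efivars}"
  # -maxdepth 1: efivars is flat, and nothing else should be matched.
  find "$dir" -maxdepth 1 -name 'dump-*' -print
}
```

Running list_efi_dumps with no argument from the installer shell prints the candidates; an empty result suggests a full variable store is probably not the cause.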

I hope this helps,
Tom

@ghost
Author

ghost commented May 20, 2019

@TomSmeets Unfortunately, that's not the case here. My config is the following:

  fileSystems."/boot/efi" =
    { device = "/dev/disk/by-uuid/58D0-2B0F";
      fsType = "vfat";
      options = [ "defaults,noauto" ];
    };
  boot.loader.efi.efiSysMountPoint = "/boot/efi";
  boot.loader.grub = {
    efiSupport = true;
    device = "nodev";
  };

So the FAT partition only stores:

/boot/efi
└── EFI
    ├── BOOT
    │   └── BOOTX64.EFI
    └── grub
        └── grubx64.efi

3 directories, 2 files

The system partition (the /boot folder) stores everything else, so I never run into #23926.
It worked great for a year or so until I hit the issue above. The latest nixos-unstable that currently works for me is bc94dcf. Perhaps it's time for a bisect 😞

@ghost
Author

ghost commented May 20, 2019

Also a small hint: if you use a recent NixOS installation USB stick, you can boot from it, choose rEFInd in the menu, then choose the bootx64 entry. It will load directly into your NixOS installation, so there is no real need for nixos-enter tricks.

@ghost
Author

ghost commented May 23, 2019

Bisect done. This is the commit that broke my layout:
df4d0fa
grub: 2.02 -> 2.04-rc1
cc @volth @NeQuissimus

@aakropotkin
Contributor

I have this error as well. I had to boot from a stick and rescue my system about a week ago.
I just tried today to update channels and the exact same bug popped up. Luckily I haven't shut down my machine yet. Did anybody find a fix?

@aakropotkin
Contributor

Follow up:
I accidentally updated yet again and had to go through the annoying rescue process another time.
I finally just cut grub out altogether and now it boots fine.
I'm sad though, I had a nicely customized grub loader that I am going to dearly miss :(

If you run into this issue in the future follow these steps:

  1. Boot with a live USB
  2. Mount as if you were installing nixos:
# Change mounts to your actual `nixos` and `boot` partitions.
mount /dev/sda1 /mnt
mount /dev/sda2 /mnt/boot
vim /mnt/etc/nixos/configuration.nix
  3. Remove grub and fall back to the minimal loader.
    My new boot.loader looks like:
boot.loader = {
    systemd-boot.enable = true;
    efi.canTouchEfiVariables = true;
    efi.efiSysMountPoint = "/boot";
};
  4. nixos-install --root /mnt
  5. Go get a coffee.

@ghost
Author

ghost commented Jun 14, 2019

@BadDecisionsAlex
There is no need for steps 2 and 4 if you're booting from an unstable live USB.

Remove grub and fall back to the minimal loader

Sorry, but no. I want to keep the EFI partition as small as possible, and I also want to run garbage collection as seldom as possible. For now I have just reverted commit df4d0fa locally.

TBH I don't understand why we are pulling release candidates for such critical components as boot loaders.

@aakropotkin
Contributor

@gnidorah I completely agree that our repo should roll back.

If you have other notes about my rescue process please let me know. I was just hoping to leave breadcrumbs for anybody else who bumps into the issue; but I am by no means an expert here.

@ghost
Author

ghost commented Jul 12, 2019

@volth Once there, I will test it using the following configuration: https://nixos.wiki/wiki/Bootloader#Keeping_kernels.2Finitrd_on_the_main_partition

@ghost
Author

ghost commented Jul 15, 2019

Tried grub 2.04 locally and it didn't work either.

I'm leaving this issue open, but since there is now a solution for the out-of-space problem with systemd-boot (#23926 (comment)), I've switched to that.

@worldofpeace worldofpeace added this to the 19.09 milestone Aug 28, 2019
@worldofpeace worldofpeace added 0.kind: bug Something is broken 0.kind: regression Something that worked before working no longer 1.severity: blocker This is preventing another PR or issue from being completed 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS labels Aug 28, 2019
@obadz
Contributor

obadz commented Sep 9, 2019

I use grub on EFI and can't repro this bug.

My boot.loader.efi.canTouchEfiVariables is set to false.

@amahoneyLIT

I just did a fresh install on unstable (on ZFS root).
@obadz setting boot.loader.efi.canTouchEfiVariables to false initially resulted in no boot devices.

But essentially what was on the wiki worked:

  boot.supportedFilesystems = [ "zfs" ];
  boot.tmpOnTmpfs = true;
  boot.loader.systemd-boot.enable = false;
  boot.loader.efi.canTouchEfiVariables = true;
  boot.loader.efi.efiSysMountPoint = "/boot/efi";
  boot.loader.grub.efiSupport = true;
  boot.loader.grub.device = "nodev";

partition scheme

DISK=/dev/disk/by-id/<my disk>
sgdisk --zap-all $DISK
sgdisk -n2:1M:+512M -t2:EF00 $DISK
sgdisk -n1:0:0 -t1:BF01 $DISK

mkfs.vfat $DISK-part2

zpool create ...

And boot with grub efi seems to work fine

@jacereda
Contributor

I switched yesterday from 19.03 to the 19.09-release channel on my tablet and the boot now complains with the same message. This is one of those x64 tablets that has an i686 EFI and thus requires boot.loader.grub.forcei686.

In the meantime, I'm booting via 32-bit USB grub with

configfile (hd1,3)/grub/grub.cfg

I tried the rm /sys/firmware/efi/efivars/dump-*, same result.

What does grub 2.04 offer? Couldn't it just be rolled back to 2.02 in the release-19.09 branch?

@jacereda
Contributor

Same problem on debian:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=931896

@jacereda
Contributor

jacereda commented Sep 22, 2019

I fixed my boot by just moving all files out of /boot and running nixos-rebuild --install-bootloader switch.

@asymmetric
Contributor

asymmetric commented Sep 25, 2019

I just had the same issue after switching to 19.09 and switching back to 19.03.

Timeline:

  • nix-channel --remove nixos
  • nix-channel --add https://nixos.org/channels/nixos-19.09 nixos
  • nixos-rebuild switch
  • reboot now
  • booted fine, but there was an unrelated problem, so I decided to roll back
  • nixos-rebuild switch --rollback
  • reboot now
  • got the "error: symbol 'grub_file_filters' not found" error

I don't use EFI.

Solved it by reinstalling the bootloader with a usb stick.

@lheckemann
Member

In order to avoid breaking boots on the 19.09 release, I'm in favour of rolling back to 2.02 on 19.09 before the release (but leaving 2.04 on master).

I'm unsure how to address this problem in general. A naive solution based on my incomplete understanding of the issue seems to me to have versioning for grub's module directories, e.g. have /boot/grub-2.02/x86_64-efi and /boot/grub-2.04/x86_64-efi distinct so that both versions of GRUB can work. Overall though, this seems like another case of bootloaders being hard to upgrade, a problem that @samueldr has been thinking about a fair bit iirc. Maybe you can say something about this?

lheckemann added a commit that referenced this issue Oct 4, 2019
This reverts commit 8ba94a8.

See #61718 for rationale.
lheckemann added a commit that referenced this issue Oct 4, 2019
This reverts commit df4d0fa.

See #61718 for rationale.
@lheckemann
Member

Reverted to 2.02: 4eb9725, 862f05c

@zeratax
Contributor

zeratax commented Apr 8, 2020

just tried again to upgrade to unstable and again get the grub_file_filters not found error

$ nixos-version
19.09.2370.e10c65cdb35 (Loris)
$ sudo nix-channel --add https://nixos.org/channels/nixos-unstable nixos
$ sudo nixos-rebuild switch --upgrade
$ reboot

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/go-no-go-meeting-nixos-20-03-markhor/6495/19

@davidak
Member

davidak commented Apr 9, 2020

That was probably already discussed somewhere, but:

Is there any problem with rolling back to 2.02 again? That version works, right?

elementary OS actually still uses 2.02 (5.1.2 release from 2020-02-07), as does Fedora 31 from 2019-10-29, although the last two Debian releases use 2.04. But having a broken bootloader is worse than using an older version than Debian.

We could also implement this logic

IF efiSupport = true; THEN package 2.02 ELSE 2.04

@lheckemann
Member

True, or it could be based on stateVersion.

infinisil pushed a commit to infinisil/nixpkgs that referenced this issue Apr 21, 2020
This reverts commit 8ba94a8.

See NixOS#61718 for rationale.

(cherry picked from commit 4eb9725)
infinisil pushed a commit to infinisil/nixpkgs that referenced this issue Apr 21, 2020
This reverts commit df4d0fa.

See NixOS#61718 for rationale.

(cherry picked from commit 862f05c)
@emptyflask
Contributor

I just upgraded 19.09 -> 20.03 this morning and ran into this issue. I attempted to reinstall the bootloader using the little script on https://nixos.wiki/wiki/Bootloader, but no luck.

Here's my relevant config:
https://gist.github.com/emptyflask/7aa04f800321c2483574f8985e26bea0

@FRidh FRidh modified the milestones: 20.03, 20.09 Apr 21, 2020
@worldofpeace worldofpeace added the 1.severity: blocker This is preventing another PR or issue from being completed label Apr 21, 2020
@emptyflask
Contributor

Still trying to restore my system...

I noticed that /boot/EFI/NixOS-boot/grubx64.efi hasn't been touched -- it still has a timestamp of Mar 15 2019. Shouldn't that be replaced?

@primeos
Member

primeos commented Apr 21, 2020

@emptyflask in case it helps:

Workaround for booting with GRUB 2.04:

  • Boot the NixOS installation image (or something else that has/is rEFInd) via e.g. a USB stick
  • Select Boot Fallback boot loader from EFI
  • Now you should see the usual GRUB boot menu (with all NixOS generations, etc.)
  • After a successful boot: Reverting to GRUB 2.02 should permanently fix the problem(?)

At least that worked for me (but: This is from my memory and online screenshots, therefore I might have missed some steps). I think these steps were posted in a related issue, but I didn't find the link again :o

Regarding the GRUB regression (2.02 -> 2.04)

The issue is most likely hardware related. E.g. in my case Boot Fallback boot loader from EFI should boot the same EFI binary as without rEFInd (but I forgot to verify that last time). So this might actually be some weird issue during the transition to GRUB during the boot process (and therefore only affecting some devices).

@emptyflask
Contributor

That actually does help, I was able to boot into my normal system using rEFInd. Thanks!

@domenkozar
Member

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=931896#10 offers some explanation

@asymmetric
Contributor

asymmetric commented Apr 22, 2020

So to summarize the issue above, there are 3 components:

  • the firmware (BIOS or UEFI)
  • the grub core image
  • grub modules

The issue boils down to your firmware and your OS disagreeing about where the grub core image is.

The firmware points to the grub core image. The core image loads modules. The interface between core and modules is not stable.

If your OS is installing core image and modules to location A, but your firmware is loading the core image from location B, then at some point the (old, non-updated) core image at location B is not able to load the (new, updated) modules at location A.

The problem has always been there (mismatched core and modules), but you hadn't noticed until the interface between the two broke.
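
One way to look for the "location A versus location B" situation described above is to list every loader image under the boot tree together with its timestamp, so that a stale duplicate stands out. This is only a sketch: list_loader_images is a hypothetical helper, and the file name patterns (grub*.efi, boot*.efi, core.efi) are an assumption based on the file names that appear in this thread, not an exhaustive list.

```shell
# Print a long listing (with modification time) of every GRUB or fallback
# loader image found under a boot tree. Hypothetical helper; the patterns
# are an assumption taken from file names seen in this thread.
list_loader_images() {
  local root="${1:-/boot}"
  find "$root" -type f \
    \( -iname 'grub*.efi' -o -iname 'boot*.efi' -o -iname 'core.efi' \) \
    -exec ls -ld {} +
}
```

If two images of the same kind show very different dates, the older one may be the copy the firmware is still loading.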

@emptyflask
Contributor

emptyflask commented Apr 22, 2020

It does appear that there might be some duplication in /boot, maybe from attempting to use boot.loader.grub.efiInstallAsRemovable? I'm particularly wondering about /boot/EFI/NixOS-boot/grubx64.efi, nothing seems to touch it, and it's got a timestamp from a year ago.

(I've omitted a bunch of modules and Microsoft-related things)

.
├── background.png
├── converted-font.pf2
├── EFI
│   ├── Boot
│   │   └── bootx64.efi
│   ├── grub
│   │   └── grubx64.efi
│   ├── Microsoft
│   │   └── Boot
│   ├── nixos
│   │   ├── 09iivcwr1b2ijxxa4z7bcnpfjwq9cap7-initrd-linux-4.19.101-initrd.efi
│   │   ├── 502dhxra32pxv99zm63m2si006bdaijc-linux-4.19.109-bzImage.efi
│   │   ├── aqdfl1zp8d767nng5w3wh3wv54npjb8y-initrd-linux-5.4.33-initrd.efi
│   │   ├── dpp68ayvn1xmx9d8wld1fjp8ax6lz2k5-initrd-linux-4.19.109-initrd.efi
│   │   ├── h9w801h7y09a315xi7x4pskpn8y4i7xf-initrd-linux-4.19.113-initrd.efi
│   │   ├── ia4zbwrkcigbiil3vhhfwjji0gn7m9yr-linux-4.19.96-bzImage.efi
│   │   ├── jg9gfc84svh76nkj2am2jxq977bqd2k8-linux-4.19.113-bzImage.efi
│   │   ├── k7f7l104af1ny3sliwpxybzf6dy5060l-linux-4.19.101-bzImage.efi
│   │   ├── l3389n3q8cas7z5ybbwga251hwx9m6gv-initrd-linux-5.4.33-initrd.efi
│   │   ├── li3v6p9mqsspm0zgglzba2szab2zdmiv-linux-5.4.33-bzImage.efi
│   │   ├── w8yq408lfmszipyjxl9swbajjmsmkyza-initrd-linux-4.19.113-initrd.efi
│   │   └── xmknk1cjh8kgpl6j1n95a7b2fhmk7saz-initrd-linux-4.19.96-initrd.efi
│   └── NixOS-boot
│       └── grubx64.efi
├── grub
│   ├── fonts
│   ├── grub.cfg
│   ├── grubenv
│   ├── locale
│   ├── state
│   └── x86_64-efi
│       ├── core.efi
│       └── grub.efi
├── kernels
│   ├── aqdfl1zp8d767nng5w3wh3wv54npjb8y-initrd-linux-5.4.33-initrd
│   ├── jg9gfc84svh76nkj2am2jxq977bqd2k8-linux-4.19.113-bzImage
│   ├── l3389n3q8cas7z5ybbwga251hwx9m6gv-initrd-linux-5.4.33-initrd
│   ├── li3v6p9mqsspm0zgglzba2szab2zdmiv-linux-5.4.33-bzImage
│   └── w8yq408lfmszipyjxl9swbajjmsmkyza-initrd-linux-4.19.113-initrd
├── loader
│   ├── entries
│   │   └── nixos-generation-154.conf
│   └── loader.conf
└── System Volume Information

@lheckemann
Member

If you're not dual-booting, it should be safe to remove all of /boot (keep a backup in order to be able to reproduce the error again), then rerun nixos-rebuild boot. AFAIU, that should either (a) make everything work correctly, since the grub image will only be in one place or (b) break your boot (have your rEFInd USB stick at the ready!). The latter should only happen if either (i) both canTouchEfiVariables and efiInstallAsRemovable are set to false or (ii) your firmware is broken (disregards the boot order specified by the OS) or (iii) your firmware is not configured to use the fallback path (bootx64.efi). In any case, I'd consider removing the bad state as the right solution here.
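
The "keep a backup" part of the suggestion above could look roughly like the following. It is a sketch only: backup_boot is a hypothetical helper, and the default destination under /root is an assumption; any location outside /boot with enough space would do.

```shell
# Copy a boot tree to a timestamped directory and print the backup path.
# Hypothetical helper; the default destination parent (/root) is an assumption.
backup_boot() {
  local src="${1:-/boot}" dest_parent="${2:-/root}"
  local dest="$dest_parent/boot-backup-$(date +%Y%m%d-%H%M%S)"
  # cp -a copies recursively and preserves timestamps, which is useful when
  # comparing old and new loader images later.
  cp -a "$src" "$dest" && printf '%s\n' "$dest"
}
```

Only once the backup has been verified would the contents of /boot be removed and nixos-rebuild boot rerun, as described in the comment above.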

@joriatyBen

joriatyBen commented Apr 25, 2020

@lheckemann I removed all of /boot and then ran nixos-rebuild boot. I am no longer able to boot without USB. If I boot with USB, I get stuck at the grub prompt (GNU GRUB version 2.04 Minimal BASH-like line editing is supported. ...).
At this point I tried:
grub> set root=(hd1,gpt1)
grub> linux /efi/nixos/hash-linux-<number>-bzImage.efi .... root=LABEL=NIXOS_ISO (i previously set this label to my usb)
grub> initrd /efi/nixos/initrd-linux-<number> ....
grub> boot
I end up at this problem #6265:
like

timed out waiting for device /dev/root, trying to mount anyway.
mounting /dev/root on /iso...
mount: mounting /dev/root on /mnt-root/iso failed: No such file or directory

An error occurred in stage 1 of the boot process, which must mount the root filesystem on `/mnt-root' and then start stage 2. Press one of the following keys:
  r) to reboot immediately
  *) to ignore the error and continue

If I continue, I end up in a kernel panic.
Any ideas?

Update1: Found an old USB stick with nixos-19.03 and GRUB 2.02. The installer seems to work; rEFInd isn't working. This means I have to do a completely fresh install of NixOS and then upgrade it to 20.03, which is not the optimal option, but OK! I'll keep you updated.

Update2: This actually could be an option too: https://gist.github.com/chris-martin/4ead9b0acbd2e3ce084576ee06961000

@lheckemann
Member

lheckemann commented Apr 25, 2020

@saggzz sorry about that! Do you have either or both of canTouchEfiVariables or efiInstallAsRemovable set?

As for reinstalling: you don't have to do a totally fresh install. You can boot the installation ISO, mount the filesystems, and run nixos-install. It will rebuild the system profile and set up the bootloader etc. without wiping anything. If you pass -I nixpkgs=channel:nixos-20.03 it should directly install NixOS 20.03, even from the 19.03 installer image. The one thing you may need to do after installation is make sure that the channel is set correctly (sudo nix-channel --add https://nixos.org/channels/nixos-20.03 nixos && sudo nix-channel --update) so that you don't accidentally downgrade later.

EDIT: you can also get a shell in the initramfs to try and mount your root filesystem manually by passing boot.shell_on_fail on the kernel command line. Then you can try and enter the system by pressing f at the prompt and then using the following commands:

mount /dev/sda2 /mnt-root # substitute device name as appropriate
exec switch_root /mnt-root /nix/var/nix/profiles/system/init

@asymmetric
Contributor

@saggzz when I had problems with GRUB, I followed the steps here to re-install.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-grub-doesnt-see-windows/6811/4

@joriatyBen

@lheckemann thank you for the hints. Neither canTouchEfiVariables nor efiInstallAsRemovable had been set. I tried to mount the filesystems and run nixos-install, but no filesystems were found. This brought me to the conclusion that something was totally messed up there. I did a fresh install of NixOS and used the -I nixpkgs=channel:nixos-20.03 option, and it worked!
@asymmetric thank you, I switched to systemd-boot for now!

@zeratax
Contributor

zeratax commented Apr 27, 2020

so I finally now have it booting normally and all I did afaik is change efiInstallAsRemovable to true
and canTouchEfiVariables to false instead of the inverse.
Seems like I can perfectly dual boot now and I'm on 20.03

Oh I also changed the efi mount point around and I think that may have cleaned up the efi directory, though it's now back to where I started.

@AleXoundOS
Contributor

I do confirm that grub2 2.04 boots NixOS in the following scenario:

  • nixos-version: 19.09.1320.4ad6f1404a8
  • {
      boot.loader.systemd-boot.enable = false;
      boot.loader.efi.efiSysMountPoint = "/boot/efi";
      boot.loader.efi.canTouchEfiVariables = false;
      boot.loader.grub.efiInstallAsRemovable = true;
      boot.loader.grub.device = "nodev";
    }
  • zstd btrfs compression affected files: init, initrd
  • packageOverrides: grub2 from unstable

Notably, I was unable to boot with grub2 2.02 until adding the override. zstd compression support appeared in grub2 2.04.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/good-filesystem-for-the-nix-store/3566/10

@maxdevjs

Today I got this upgrading to 20.03. I tried a couple times following @lheckemann steps:

  • boot the installation ISO
  • mount the filesystems
  • run nixos-install

with no success until I removed the content of /boot (recreating it anew).

Does this kind of error also happen on the unstable channels, or is it solved there?

@lheckemann
Member

Glad to hear removing all of /boot helped. This isn't fixed anywhere because it's a bit difficult to detect — on EFI systems, where it's most likely to occur, we could check the BootCurrent EFI variable and the corresponding boot entry, then throw a warning if it doesn't match the current bootloader installation path. Someone™ would need to implement that though :)
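
The first half of the check proposed above (find the loader path of the entry named by BootCurrent) might be sketched as follows. This is not an existing NixOS feature; current_boot_file is a hypothetical helper, and it assumes efibootmgr -v style output in which each entry line contains a File(...) component, which may differ between efibootmgr versions.

```shell
# Read `efibootmgr -v`-style text on stdin and print the File(...) path of
# the entry that BootCurrent refers to. Hypothetical helper; the output
# format is an assumption and may vary between efibootmgr versions.
current_boot_file() {
  awk '
    # Remember the current entry name, e.g. "Boot0001".
    /^BootCurrent:/ { cur = "Boot" $2 }
    # On the matching entry line, extract what is inside File(...).
    cur != "" && index($0, cur) == 1 {
      if (match($0, /File\([^)]*\)/)) {
        print substr($0, RSTART + 5, RLENGTH - 6)
      }
    }
  '
}
```

It would be fed with efibootmgr -v | current_boot_file. Comparing the printed path against the path the bootloader was actually installed to, and emitting the warning, is left out here.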

@maxdevjs

maxdevjs commented Jun 30, 2020

Thank you again @lheckemann

In fact, after recreating it, I had to fix it manually (it created EFI/EFI/etc). Not sure if this is strictly related to the issue or if I just messed it up during my attempts :)

@ghost ghost closed this as completed Nov 16, 2020
This issue was closed.