Ubuntu 20.04 Root on ZFS: Mistake in how bpool is configured causes boot to get garbled, ends up with unbootable machine #54
This makes it easier to follow (and specifically, harder to miss the second step). Reported-by: Niall Douglas <ned14> Issue #54
This is intentional. Note that
That's not good, and I can see how that set off a cascade of problems resulting in an unbootable server, but without knowing the root cause of this, it's hard to know what to do next.
I could, but without more detail on what happened, it's really hard to say how many times a person should reboot. And that's really not a great step to write into instructions... "Hey, reboot a few times because this might intermittently break." That's scary and not particularly actionable. If there are intermittent problems that are reproducible enough that such an instruction would help, I'd like to just fix them.
When I read the email from this issue on my phone, my thought was, "Didn't I already do that?" I'm looking at the HOWTO now, and I think I see what you're asking for. I've flattened two steps and the associated notes:
Great news on that front. "The AES-GCM patches [to userspace] were applied to zfs 0.8.3-1ubuntu12.1". That is in focal-updates, which currently has 0.8.3-1ubuntu12.4. The userspace side is primarily making There was a PAM module merged upstream, but that's not in 20.04 and I haven't tested it: openzfs/zfs@221e670 The Ubuntu folks are (or were, last we talked) working on encryption integration. You mentioned a remote server, so the dropbear SSH support might be interesting to you: #46 (comment)
Ah, you're right. That can't be the cause then.
Ok, my next idea for the cause is that there is some sort of race in the ZFS dataset mounting. /etc/zfs/zfs-list.cache/bpool and /etc/zfs/zfs-list.cache/rpool say what order to mount in, right? Could this from my rpool cache be the cause?
Y'see, if the mounting process mounts bpool and rpool concurrently, then there is a race on whether /boot/userdata.key is available by the time rpool/USERDATA gets mounted. If it can't find the key, the whole mounting process would surely abort right there. If enough datasets have mounted to let the system boot, it comes up, and the next time we touch /root or /home/ned, those could be automounted then. But the remainder of the mounting session, i.e. the entries in /etc/fstab, never gets mounted. Thus /boot/efi never gets mounted, and grub install goes to the wrong place. Does this sound plausible to you?
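If the generated mount units really lack an ordering edge on /boot, that should be visible directly. A quick check (a sketch — the unit name `home-ned.mount` is just systemd's escaping of /home/ned and is assumed here):

```shell
# systemd escapes "/" to "-" in mount unit names, so /home/ned -> home-ned.mount
systemctl cat home-ned.mount                      # show what the generator emitted
systemctl show -p After,Requires home-ned.mount   # ordering/dependency summary
# If boot.mount does not appear in After=, nothing orders this mount behind
# /boot, and the key file may not exist yet when the mount is attempted.
```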
Cool, thanks.
Mine is zfs-0.8.3-1ubuntu12.4 with kernel 5.4.0-47-generic, using aes-128-gcm. I see an 8x performance loss using encryption. This is on a 1.7 GHz Intel Atom with AES-NI. More suspiciously, I get identically slow results for every crypto algorithm. It could just be the Atom CPU, of course; they have unusual bottlenecks.
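A rough way to compare encrypted vs. plain throughput on the same pool is to write identical data through two scratch datasets. This is only a sketch: the pool name `rpool`, the mountpoints, and the key path are assumptions, compression is disabled so the zeroes are really written, and the numbers are relative, not absolute.

```shell
# Generate a throwaway 32-byte raw key for the encrypted dataset.
dd if=/dev/urandom of=/tmp/bench.key bs=32 count=1
zfs create -o compression=off -o mountpoint=/mnt/plain rpool/bench-plain
zfs create -o compression=off -o mountpoint=/mnt/enc \
    -o encryption=aes-128-gcm -o keyformat=raw \
    -o keylocation=file:///tmp/bench.key rpool/bench-enc
for d in /mnt/plain /mnt/enc; do
    echo "== $d =="
    # conv=fdatasync forces the data to disk before dd reports a rate
    dd if=/dev/zero of="$d/testfile" bs=1M count=1024 conv=fdatasync 2>&1 | tail -n1
done
zfs destroy rpool/bench-plain
zfs destroy rpool/bench-enc
rm /tmp/bench.key
```

If both datasets are equally slow, the bottleneck is probably not the cipher.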
You can replicate the same thing right now using https://talldanestale.dk/2020/04/06/zfs-and-homedir-encryption/. It obviously doesn't support SSH key authentication, because the dataset is only mounted when the password is checked.
Ultimately, for a remote server I only care about the server dying suddenly and the cheapo hosting provider failing to securely wipe the drive before it goes onto eBay. You don't need secure crypto for this, just obfuscation to defeat the automated scanner programs eBay buyers use to hunt for personal info, credit cards, etc. For that, a global static crypto key is just fine, though I'd rather not have put it into the root directory with the suffix

I'll close this issue now, as I think your HOWTO is no longer the cause. Thanks for the useful response.
They are involved, but there's relevant indirection. Those are cache files which are read by a systemd mount generator. There have been quite a few changes to zfs-mount-generator recently, and it's hard for me to keep track in my head which ones have landed in 20.04. There are definitely things that haven't landed that could be relevant. You might want to grab OpenZFS from git. You might try something like this too, which avoids the need to
That should give you a working zfs-mount-generator in the tree, then use that and the system one to generate mounts, then show you the difference between them. If the changes look sane, try running with the newer mount generator. Putting the mount generator in
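The compare-the-generators idea above could look something like this (a sketch: the installed generator path is the usual Ubuntu location, and the path to the freshly built copy inside a git checkout is an assumption):

```shell
mkdir -p /tmp/units-old /tmp/units-new
# Installed generator, as shipped in zfsutils-linux:
/lib/systemd/system-generators/zfs-mount-generator /tmp/units-old
# Freshly built generator from the OpenZFS git tree (path is illustrative):
./zfs/etc/systemd/system-generators/zfs-mount-generator /tmp/units-new
# Any ordering/dependency differences show up here:
diff -ru /tmp/units-old /tmp/units-new
```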
That was useful. I traced through all the files, building up a dependency graph. There is no dependency chain between the dataset needing the key from /boot and the /boot dataset. So I guess it's reasonable that both trees could be executed in any order, or concurrently. I fixed this by moving the key into the root directory. I did check the systemd journal for dependency loops long before filing this issue. It complains about cryptsetup, but that's totally unrelated to this (cryptsetup cannot determine the root mount if it's ZFS on 20.04, so it adds an unnecessary entry. They've fixed it upstream. The cycle gets broken during boot in a non-harmful way). There were no other entries about dependency loops. Thanks for your help on this. Here's hoping these servers stay up longer this time!
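The fix described above — moving the key off /boot — can be sketched as follows. The key file name and the `rpool/USERDATA` dataset come from this thread; `keylocation` is an ordinary property on an encryption root, so it can be repointed without re-encrypting anything.

```shell
# Move the key onto the root dataset so mounting it has no dependency on /boot.
cp /boot/userdata.key /root/userdata.key
chmod 400 /root/userdata.key
zfs set keylocation=file:///root/userdata.key rpool/USERDATA
zfs get keylocation rpool/USERDATA   # confirm the new location
```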
@rlaager
Firstly, thank you SO MUCH for the Ubuntu 20.04 Root on ZFS HOWTO. Using qemu, last month I installed an Ubuntu 20.04 root on ZFS on two budget Intel Atom dedicated servers from their rescue boot; they work surprisingly well, considering.
Two days ago, however, one of the servers suddenly vanished. It took some effort to figure out why, but I narrowed it down to a problem in the current HOWTO. I used the HOWTO from August, so after the current Errata.
Right now, you say:
Note that two separate datasets both have a mountpoint of /boot.
Now, I'm not sure exactly how it happened, but I believe that on some boot or other of that server, the boot mounting service got confused and didn't mount /boot. The system booted just fine, though. However, when unattended-upgrades ran at some point, it called update-grub, which installed into the root ZFS pool, which has feature flags grub can't parse, and BOOM, bye bye server.
So, firstly, can I suggest not setting the same mountpoint on two datasets?
Secondly, can I suggest that you recommend in the guide that people reboot a few times and make SURE that /boot, /boot/efi, and /boot/efi/grub are all coming up every time?
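That post-reboot sanity check could be scripted as a minimal sketch like this (the paths follow the HOWTO's layout; adjust if yours differs):

```shell
#!/bin/sh
# check_mounts prints one line per path and returns nonzero if any is missing.
check_mounts() {
    rc=0
    for mp in "$@"; do
        if mountpoint -q "$mp"; then
            echo "OK: $mp"
        else
            echo "MISSING: $mp"
            rc=1
        fi
    done
    return $rc
}

# Run after every reboot, before trusting unattended-upgrades with grub:
check_mounts /boot /boot/efi || echo "boot mounts incomplete; investigate before upgrading"
```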
Finally, thirdly, I got badly caught out on my first install by missing the step which makes space for the MBR grub. May I suggest that you fuse the EFI and MBR partitioning instructions into one set that works for both schemes, so it's a single config right up to the point where you choose to install UEFI or MBR grub, and that's the only difference?
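A fused layout along those lines might look like this (a sketch; partition sizes are illustrative, and the type codes are sgdisk's: EF02 = BIOS boot, EF00 = EFI system partition, BF01/BF00 = Solaris/ZFS):

```shell
sgdisk -a1 -n1:24K:+1000K -t1:EF02 "$DISK"   # room for MBR grub
sgdisk     -n2:1M:+512M   -t2:EF00 "$DISK"   # ESP, used only when booting UEFI
sgdisk     -n3:0:+2G      -t3:BF01 "$DISK"   # bpool
sgdisk     -n4:0:0        -t4:BF00 "$DISK"   # rpool
# Only the final step differs: grub-pc (MBR) targets the disk itself,
# grub-efi-amd64 (UEFI) targets the ESP. The unused partition is harmless.
```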
Thanks once again for the instructions, and for taking all that time to write and maintain them. Indeed, if ZFS native encryption in Ubuntu 20.04 weren't so slow, I'd recommend a lot more hard-coded-key encryption, plus PAM-unlocked home drive encryption, so that if your remote server ever dies, you don't leak all your secrets. However, ZFS native encryption is very, very slow in Ubuntu 20.04. It gets much faster if you choose the GCM variant in future ZFS versions.