
Pools do not autoimport on Fedora 19 with encrypted disks #1474

Closed
Firstyear opened this issue May 26, 2013 · 18 comments
Labels: Type: Documentation

Comments

@Firstyear

I have a zpool with the following configuration:

pool: franky_home
state: ONLINE
scan: none requested
config:

NAME                  STATE     READ WRITE CKSUM
franky_home           ONLINE       0     0     0
  luks-sdb1           ONLINE       0     0     0
cache
  fedora-l2arc_cache  ONLINE       0     0     0

errors: No known data errors

Both luks-sdb1 and fedora-l2arc_cache are on LUKS-formatted (cryptsetup luksFormat) devices. The latter is (for now) also on LVM.

When attempting to boot the system, the zpool is never imported. However, when I run zpool import, the pool is not listed, and running zpool import franky_home says "the pool is already imported".

If I export the pool before rebooting, then after the system boots the pool is again not imported, but this time it is listed by zpool import.

I have checked that the zfs and spl modules are included in my dracut initramfs and that they are built properly. This may be an issue specific to the combination with encrypted devices.

From dmesg you can see:
[ 2.861504] sd 0:0:0:0: [sda] Attached SCSI disk
[ 3.205946] sd 1:0:0:0: [sdb] Attached SCSI disk
[ 15.873195] ZFS: Loaded module v0.6.1-1, ZFS pool version 5000, ZFS filesystem version 5

This leads me to suspect that ZFS is loaded after the password prompt and after the disks appear, yet the pool is still not auto-imported.

Any ideas?

@behlendorf
Contributor

@Firstyear It sounds like your pool is being correctly imported, just not mounted. The zpool import command with no arguments shows the available pools which can be imported; if a pool is already imported it will not appear. You can list which pools are currently imported using zpool list.

-bash-4.1$ sudo ./cmd/zpool/zpool list  
no pools available
-bash-4.1$ sudo ./cmd/zpool/zpool import
   pool: hybrid
     id: 3809657651772709830
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        hybrid      ONLINE
          mirror-0  ONLINE
            ssd-1   ONLINE
            ssd-2   ONLINE
            ssd-3   ONLINE
            ssd-4   ONLINE
-bash-4.1$ sudo ./cmd/zpool/zpool import hybrid
-bash-4.1$ sudo ./cmd/zpool/zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
hybrid  55.5G  48.0G  7.49G    86%  1.00x  ONLINE  -

@Firstyear
Author

This sounds correct. When I run zpool list I can see the pool, and zpool import shows it to already be imported. Yet the zfs filesystems do not mount.

@ryao
Contributor

ryao commented Jun 8, 2013

Does zfs mount -a make them mount?
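
For reference, a quick way to test that, using standard zfs commands and the pool name from the report above:

zfs mount -a
zfs list -r -o name,mounted,mountpoint franky_home

The second command prints each dataset's mounted property, so it is easy to see whether zfs mount -a actually attached anything.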

@bill-mcgonigle

I just built packages yesterday from git and this now appears to work for me on Fedora 19 (3.10). Previously I was running rc.local compatibility and doing an export/import to get them to mount. @Firstyear?

@bill-mcgonigle

I should add that I also have the same mounting problem on FC16/17 and EL6 (no LUKS on the EL6 machine). I haven't yet tested git on those (EL6 is all I care about from that set). F18 is a disaster from my perspective, but that's still the most current in the zfsonlinux.org repo, so we might care. I was assuming that this was fixed by the zvol correctness patch rather than 3.10, but I'm not sure.

@behlendorf
Contributor

@bill-mcgonigle @Firstyear Is this still an issue or does it work as expected with the latest source from git?

@Rudd-O
Contributor

Rudd-O commented Sep 2, 2013

My tree has a fix for this. In sum, what happens is that udev rules cause the zfs module to be loaded early, before mount-zfs.sh, and at that time the LUKS devices have not been set up yet, so ZFS pretends that the pool is not importable. My tree fixes it by loading the module early with zfs_autoimport_disable set, which means that by the time mount-zfs.sh runs, all devices are available for a proper pool import. I literally fixed this problem an hour ago.
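
As a minimal sketch of that approach (assuming the zfs_autoimport_disable module parameter is available in the ZoL build in use; the modprobe.d path is just the conventional location):

# /etc/modprobe.d/zfs.conf
options zfs zfs_autoimport_disable=1

# then regenerate the initramfs so the setting also applies inside dracut
dracut -f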

Another fix that must go in: F19 builds host-only initrds, which causes the crypt module not to get loaded because dracut does not know about the physical devices backing the root pool. My tree fixes it by munging the variables the crypt module uses to determine whether a crypto_LUKS device is backing the root pool.

Go check it out. It's working beautifully in F19 and F17. No need to force pool imports, none of that shit anymore.

@Rudd-O
Contributor

Rudd-O commented Sep 2, 2013

My tree works on F17, F18 and F19. I dropped support for F16 a while ago.

@behlendorf
Contributor

Can you point us to the patches? Would changing the default zfs_autoimport_disable setting get us most of the way there?

@Rudd-O
Contributor

Rudd-O commented Sep 4, 2013

Leaving zfs_autoimport_disable at 0 (autoimport enabled) is highly toxic. But changing it alone would not get you most of the way there. I did a shit ton of work (three days straight) to get the OP's use case working in my home.

My fork of ZFS with the Dracut stuff automatically disables autoimport in all cases. Then it does whatever the old Dracut code would do, EXCEPT if the initrd was generated with the hostonly option (default in F19), in which case it cooperates with systemd to initialize the pools only after the component devices have appeared.
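
As a hedged illustration of what "waiting for the component devices" can look like with systemd, here is a hypothetical unit built from the pool in this report (a real generator would emit one Requires/After pair per member device; systemd-escape -p /dev/mapper/luks-sdb1 yields the device unit name used below):

[Unit]
Description=Import franky_home once its LUKS member device exists
Requires=dev-mapper-luks\x2dsdb1.device
After=dev-mapper-luks\x2dsdb1.device

[Service]
Type=oneshot
ExecStart=/sbin/zpool import -N franky_home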

I mostly did this work because (a) systemd inside the initrd was waiting for my swap device (which lives in ZFS) and failing to boot, because the pool never got imported while it was waiting for the swap device, and (b) a pool with many devices was being imported with only a few devices because autoimport took place before the devices were ready.

I have given up on non-forced imports so my code essentially does forced imports all the time now.

You can run a diff between your tree and mine fairly easily. There are quite a few differences these days. The code is incredibly complex and messy, but works. Not my finest hour, but fuck shell scripting.

@Rudd-O
Contributor

Rudd-O commented Sep 18, 2013

Also, importing pools on USB drives with an initrd that was NOT generated as hostonly (with systemd inside) is also bound to fail, because USB drives take some time to appear and the initrd does not wait until all of them have been discovered and enumerated before proceeding to the mount phase. I believe in this case it is necessary to run within the initrd the same code I run today (which generates systemd units with a generator) to generate initqueue wait-for-device events for each device in the pool. I just cannot be bothered with that at the moment because I have no time and it "works for me" for now.
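
A rough sketch of that idea, assuming dracut's wait_for_dev helper from dracut-lib.sh is available (the device path is illustrative):

# inside a dracut hook script run before the mount phase
. /lib/dracut-lib.sh
for dev in /dev/mapper/luks-sdb1; do
    wait_for_dev "$dev"   # queue an initqueue job that waits for the device to appear
done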

@FransUrbo
Contributor

@Firstyear Is this still a problem, or could we close this issue?

@Rudd-O
Contributor

Rudd-O commented Jun 14, 2014

@Firstyear can you try my fork of the ZFS code, specifically the zfs-dracut package? I got my systems importing fine, whether or not the initrd is a hostonly initrd.

@twosouls2gether

This is STILL a problem, last I checked. Judging from the rpm's filename, I seem to be running git version 287_g2024041. I had to compile from git source since my Fedora 20 x64 kernel was too new. I believe I have had to change the systemd scripts since Fedora 18.

I changed zfs-import-scan.service to...
[Unit]
Description=Import ZFS pools by device scanning
DefaultDependencies=no
Requires=systemd-udev-settle.service
After=systemd-udev-settle.service
After=cryptsetup.target
ConditionPathExists=!/etc/zfs/zpool.cache

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -d /dev/disk/by-id -aN

and zfs-import-cache.service to
[Unit]
Description=Import ZFS pools by cache file
DefaultDependencies=no
Requires=systemd-udev-settle.service
After=systemd-udev-settle.service
After=cryptsetup.target
ConditionPathExists=/etc/zfs/zpool.cache

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN

and zfs-mount.service to

[Unit]
Description=Mount ZFS filesystems
DefaultDependencies=no
Wants=zfs-import-cache.service
Wants=zfs-import-scan.service
Requires=systemd-udev-settle.service
After=systemd-udev-settle.service
After=zfs-import-cache.service
After=zfs-import-scan.service
After=cryptsetup.target
Before=local-fs.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zfs mount -a

NOTE: I'm not a systemd expert, but I believe I only added the "After=cryptsetup.target" lines.
dkms wasn't working for me several versions of Fedora ago, so I used to compile my zfs modules by hand, and the pool would never mount after rebooting because the build overwrote those custom systemd files. That no longer happens now that I have dkms working; however zfs needs to wait until the luks mapped devices are ready.
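
After editing unit files like these, a quick way to confirm the new ordering took effect (standard systemd commands; zfs-mount.service is the unit shown above):

systemctl daemon-reload
systemd-analyze critical-chain zfs-mount.service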

@Rudd-O
Contributor

Rudd-O commented Jul 1, 2014

however zfs needs to wait until the luks mapped devices are ready.

Absolutely. This should be merged.

@behlendorf
Contributor

@twosouls2gether Could you open a pull request with the proposed change so we can get it reviewed by some people familiar with systemd?

@benjamin-d-zz

@twosouls2gether Thank you, this is exactly what I needed to fix automounting with v0.6.3-1 (master checked out on 13 June 2014) on Fedora 20.

Ought to be merged...

@behlendorf behlendorf modified the milestones: 0.6.4, 0.6.6 Jul 21, 2014
alteriks added a commit to alteriks/zfs that referenced this issue Jul 26, 2014
ZFS mount service should be started after cryptsetup opens all LUKS devices.
Fixes mounting issues with systemd openzfs#1474
@alteriks
Contributor

I had the same issue: only one out of three LUKS devices was created quickly enough for the pool to be mounted. I analyzed the problem with systemd-analyze plot.
Broken: (systemd-analyze plot: zfs-mount_broken)

Fixed: (systemd-analyze plot: zfs-mount_fix)
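
For anyone wanting to reproduce the comparison, the plots above come from a standard systemd-analyze invocation (the output filename is arbitrary):

systemd-analyze plot > zfs-mount-order.svg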

alteriks added a commit to alteriks/zfs that referenced this issue Aug 25, 2014
ZFS mount service should be started after cryptsetup opens all LUKS devices.
Fixes mounting issues with systemd openzfs#1474
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Sep 3, 2014
The zfs-import-cache.service and zfs-import-scan.service should
be started after cryptsetup to ensure all LUKS devices have
been opened.

Signed-off-by: alteriks <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#1474
ryao pushed a commit to ryao/zfs that referenced this issue Nov 29, 2014
The zfs-import-cache.service and zfs-import-scan.service should
be started after cryptsetup to ensure all LUKS devices have
been opened.

Signed-off-by: alteriks <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#1474