ZFS pool not mounted on boot on Ubuntu 14.04.1 (trusty) #2556

Closed
Zsub opened this issue Jul 30, 2014 · 5 comments

Zsub commented Jul 30, 2014

After buying a new PC, my ZFS pool (which I exported and re-imported) no longer mounts on boot under Ubuntu 14.04.1. I have an SSD that I use for / as well as for the ZIL and L2ARC, and three WD Greens in RAIDZ1.

I installed from the stable PPA via apt-get install ubuntu-zfs. I have also double-checked all the steps in the mountall HOWTO, but to no avail. It certainly does not 'just work' :-)

I have attached all the output described in the final step of the mountall HOWTO in this gist.

The only issue seems to be that the filesystems simply don't get mounted; ZFS itself knows about the pool and everything.

Thanks for taking a look!

@behlendorf behlendorf added the Bug label Jul 31, 2014
@behlendorf behlendorf added this to the 0.7.0 milestone Jul 31, 2014

Zsub commented Aug 3, 2014

I have done some more investigation. If I start a rescue shell during boot, which runs before the mountall upstart job (I inferred this by deleting its log file, which had not yet been created at that point, so I may still be wrong), both zfs list and zpool status show my pool as online and available.

I worked around my problem for now by setting ZFS_MOUNT=yes in /etc/default/zfs.
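For reference, the workaround amounts to something like the following in /etc/default/zfs (a sketch; the exact set of variables in that file depends on how the package was built):

# /etc/default/zfs -- ask the init script to handle ZFS filesystems at boot
ZFS_MOUNT='yes'       # mount ZFS filesystems during startup
ZFS_UNMOUNT='yes'     # and unmount them cleanly on shutdown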


zenny commented Aug 11, 2014

Even adding ZFS_MOUNT=yes to /etc/default/zfs didn't work for me.

I just have to import the pool manually every time, or add a line to /etc/rc.local.
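The /etc/rc.local line mentioned above would be something like this (a sketch, assuming a pool named 'tank'; substitute the real pool name):

# import the pool if it is not already imported, then mount its datasets
zpool list tank >/dev/null 2>&1 || zpool import tank
zfs mount -a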


l1k commented Oct 6, 2014

Likely fixed by #2766 if Dracut is used.

behlendorf pushed a commit to behlendorf/zfs that referenced this issue Oct 7, 2014
Make use of Dracut's ability to restore the initramfs on shutdown and
pivot to it, allowing for a clean unmount and export of the ZFS root.
No need to force-import on every reboot anymore.

Signed-off-by: Lukas Wunner <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#2195
Issue openzfs#2476
Issue openzfs#2498
Issue openzfs#2556
Issue openzfs#2563
Issue openzfs#2575
Issue openzfs#2600
Issue openzfs#2755
Issue openzfs#2766
ryao pushed a commit to ryao/zfs that referenced this issue Nov 29, 2014

@borutmrak

I have this same issue on multiple (but not all) machines running Ubuntu 14.04.

One (my personal machine) has an SSD rpool (single drive) and a raidz1 data pool (2x 1TB drives); all of the vdevs are on LUKS. rpool mounts, and the data pool has datasets whose mountpoints are peppered all over the place (/home/username/Downloads etc.). The rpool datasets all mount, but none from the other pool do.

Another machine has an ext4 root on an SSD and a similar (2x 1TB) raidz1 pool that has to be mounted manually after boot.

I was thinking it might have something to do with the mount order. For instance, on the first machine "zfs list" lists the data pool first and the rpool after it. If the datasets are mounted in the order listed, the mountpoints do not exist yet. I think I solved this on one other machine by renaming the datasets to change the sort order, and it seems to work now.

The second machine, however, has all the mountpoints in place (once the ext4 root is mounted) but still requires a manual "zfs mount -a" after boot.

This is zfs list from the first machine (I removed a few lines that I don't think are important here):

➜ ~ sudo zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
data                       526G   389G    39K  /data
data/home                  443G   389G  10.4G  none
data/home/b                432G   389G  6.42G  legacy
data/home/b/Downloads      140G   389G   140G  /home/b/Downloads
[snip]
data/home/b/cold-storage  18.9G   389G  18.9G  /home/b/cold-storage
rpool                     58.4G  39.5G    33K  /rpool
rpool/ROOT                17.2G  39.5G    32K  /rpool/ROOT
rpool/ROOT/ubuntu-trusty  17.2G  39.5G  10.5G  /
rpool/home                20.4G  39.5G    30K  /rpool/home
rpool/home/b              20.4G  39.5G  15.2G  /home/b
rpool/vmimages            20.6G  39.5G    30K  /rpool/vmimages
rpool/vmimages/vm-win7    20.6G  40.0G  20.2G  -

I can reproduce this on every boot; I'll be happy to test anything you throw my way.

I'm using initramfs-tools though, not dracut (the latter is not really useful as packaged in Ubuntu).

thanks,
B.
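For anyone wanting to check the mount-order theory above on an affected machine, a quick diagnostic sketch (not part of the original report):

zfs list -H -o name,mountpoint -t filesystem   # datasets in default (name) order
zfs mount                                      # what actually got mounted at boot
zfs mount -a                                   # mount whatever was skipped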


ryao commented May 18, 2015

Pull request #3427 should fix this.

ryao added a commit to ryao/zfs that referenced this issue May 18, 2015
libzfs_init()'s JIT load of the module before using it is racy because
Linux kernel module initialization is asynchronous. This causes a
sporadic failure whenever libzfs_init() is required to load the kernel
modules. This happens during the boot process on EPEL systems, Fedora
and likely others such as Ubuntu.

The general mode of failure is that libzfs_init() is expected to load
the module, module initialization does not complete before /dev/zfs is
opened and pool import fails. This could explain the infamous mountall
failure on Ubuntu where pools will import, but things fail to mount.
The general explanation is that the userland process expected to mount
things fails because the module loses the race with libzfs_init(), the
module loads the pools by reading the zpool.cache and nothing mounts
because the userland process expected to perform the mount has already
failed.

A related issue can also manifest itself in initramfs archives that
mount / on ZFS, which affected Gentoo until 2013 when a busy-wait was
implemented to ensure that the module loaded:

https://gitweb.gentoo.org/proj/genkernel.git/commit/defaults/initrd.scripts?id=c812c35100771bb527f6b03853fa6d8ef66a48fe
https://gitweb.gentoo.org/proj/genkernel.git/commit/defaults/initrd.scripts?id=a21728ae287e988a1848435ab27f7ab503def784
https://gitweb.gentoo.org/proj/genkernel.git/commit/defaults/initrd.scripts?id=32585f117ffbf6d6a0aa317e6876ae7711a7f307

The busy-wait approach was chosen because it imposed minimal latency and
was implementable in shell code.  Unfortunately, it was not known at the
time that libzfs_init() had the same problem, so this went unfixed. It
caused sporadic failures in the flocker tutorial, which caught our
attention at ClusterHQ:

https://clusterhq.atlassian.net/browse/FLOC-1834

Subsequent analysis following reproduction in a development environment
concluded that the failures were caused by module initialization losing
the race with libzfs_init(). While all Linux kernel modules needed ASAP
during the boot process suffer from this race, the zfs module's
dependence on additional modules makes it particularly vulnerable to this
issue. The solution that has been chosen mirrors the solution chosen for
genkernel with the addition of sched_yield() for greater efficiency.

This fails to close the race in the scenario where system execution in a
virtual machine is paused in the exact window necessary to introduce a
delay between a failure and subsequent try greater than the timeout.
Closing the race in that situation would require hooking into udev
and/or the kernel hotplug events. That has been left as a future
improvement because it would require significant development time and it
is quite likely that the busy-wait approach implemented here would be
required as a fallback on exotic systems where neither is
available. The chosen approach should be sufficient for achieving
>99.999% reliability.

Closes openzfs#2556

Signed-off-by: Richard Yao <[email protected]>
Reviewed-by: Turbo Fredriksson <[email protected]>
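As a rough illustration of the busy-wait approach described in this commit message, a shell sketch in the spirit of the genkernel fix (not the actual genkernel or libzfs_init() code):

modprobe zfs
# retry until udev has created /dev/zfs or roughly ten seconds have passed
i=0
while [ ! -c /dev/zfs ] && [ "$i" -lt 100 ]; do
    sleep 0.1
    i=$((i + 1))
done
[ -c /dev/zfs ] && zpool import -a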
behlendorf added a commit to behlendorf/zfs that referenced this issue May 19, 2015
While module loading itself is synchronous the creation of the /dev/zfs
device is not.  This is because /dev/zfs is typically created by a udev
rule after the module is registered and presented to user space through
sysfs.  This small window between module loading and device creation
can result in spurious failures of libzfs_init().

This patch closes that race by extending libzfs_init() so it can detect
that the modules are loaded and only if required wait for the /dev/zfs
device to be created.  This allows scripts to reliably use the following
shell construct without the need for additional error handling.

$ /sbin/modprobe zfs && /sbin/zpool import -a

To minimize the potential time waiting in libzfs_init() a strategy
similar to adaptive mutexes is employed.  The function will busy-wait
for up to 10ms based on the expectation that the modules were just
loaded and therefore the /dev/zfs will be created imminently.  If it
takes longer than this it will fall back to polling for up to 10 seconds.

This behavior can be customized to some degree by setting the following
new environment variables.  This functionality is provided for backwards
compatibility with existing scripts which depend on the module auto-load
behavior.  By default module auto-loading is now disabled.

* ZFS_MODULE_LOADING="YES|yes|ON|on" - Attempt to load modules.
* ZFS_MODULE_TIMEOUT="<seconds>"     - Seconds to wait for /dev/zfs

The following additional small changes were also made:

* In libzfs_run_process() the 'rc' variable was renamed to 'error' for
  consistency with the rest of the code base.

* All fprintf() error messages were moved out of the libzfs_init()
  library function where they never belonged in the first place.  A
  libzfs_error_init() function was added to provide useful error
  messages for the most common causes of failure.

* The zfs-import-* systemd service files have been updated to call
  '/sbin/modprobe zfs' so they no longer rely on the legacy auto-loading
  behavior.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#2556
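A sketch of how the variables described above might be used by a boot script (the 30-second timeout is only an example value):

# legacy script that still expects libzfs_init() to auto-load the module
export ZFS_MODULE_LOADING=yes     # keep the old auto-load behavior
export ZFS_MODULE_TIMEOUT=30      # wait up to 30 seconds for /dev/zfs
/sbin/zpool import -a

# or, with the new default (no auto-load), load the module explicitly
/sbin/modprobe zfs && /sbin/zpool import -a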
ryao added a commit to ClusterHQ/zfs that referenced this issue May 20, 2015
behlendorf added a commit that referenced this issue May 22, 2015
While module loading itself is synchronous the creation of the /dev/zfs
device is not.  This is because /dev/zfs is typically created by a udev
rule after the module is registered and presented to user space through
sysfs.  This small window between module loading and device creation
can result in spurious failures of libzfs_init().

This patch closes that race by extending libzfs_init() so it can detect
that the modules are loaded and only if required wait for the /dev/zfs
device to be created.  This allows scripts to reliably use the following
shell construct without the need for additional error handling.

$ /sbin/modprobe zfs && /sbin/zpool import -a

To minimize the potential time waiting in libzfs_init() a strategy
similar to adaptive mutexes is employed.  The function will busy-wait
for up to 10ms based on the expectation that the modules were just
loaded and therefore the /dev/zfs will be created imminently.  If it
takes longer than this it will fall back to polling for up to 10 seconds.

This behavior can be customized to some degree by setting the following
new environment variables.  This functionality is provided for backwards
compatibility with existing scripts which depend on the module auto-load
behavior.  By default module auto-loading is now disabled.

* ZFS_MODULE_LOADING="YES|yes|ON|on" - Attempt to load modules.
* ZFS_MODULE_TIMEOUT="<seconds>"     - Seconds to wait for /dev/zfs

The zfs-import-* systemd service files have been updated to call
'/sbin/modprobe zfs' so they no longer rely on the legacy auto-loading
behavior.

NOTE: Unlike the version of this patch which was merged to master, the
default behavior here is to auto-load the modules.  The default behavior
should not be changed for a point release.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chris Dunlap <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #2556
dasjoe pushed a commit to dasjoe/zfs that referenced this issue May 24, 2015
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue May 25, 2015
MorpheusTeam pushed a commit to Xyratex/lustre-stable that referenced this issue Aug 10, 2015
Updates ZFS and SPL to the latest maintenance version.  Includes the
following:

Bug Fixes:
* Fix panic due to corrupt nvlist when running utilities
(openzfs/zfs#3335)
* Fix hard lockup due to infinite loop in zfs_zget()
(openzfs/zfs#3349)
* Fix panic on unmount due to iput taskq (openzfs/zfs#3281)
* Improve metadata shrinker performance on pre-3.1 kernels
(openzfs/zfs#3501)
* Linux 4.1 compat: use read_iter() / write_iter()
* Linux 3.12 compat: NUMA-aware per-superblock shrinker
* Fix spurious hung task watchdog stack traces (openzfs/zfs#3402)
* Fix module loading in zfs import systemd service
(openzfs/zfs#3440)
* Fix intermittent libzfs_init() failure to open /dev/zfs
(openzfs/zfs#2556)

Signed-off-by: Nathaniel Clark <[email protected]>
Change-Id: I053087317ff9e5bedc1671bb46062e96bfe6f074
Reviewed-on: http://review.whamcloud.com/15481
Reviewed-by: Alex Zhuravlev <[email protected]>
Tested-by: Jenkins
Reviewed-by: Isaac Huang <[email protected]>
Tested-by: Maloo <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>