libzfs_init() should busy-wait on module initialization
`libzfs_init()`'s just-in-time load of the module before using it is racy because Linux kernel module initialization is asynchronous. This causes sporadic failures whenever `libzfs_init()` is required to load the kernel modules. This happens during the boot process on EPEL systems, Fedora and likely others such as Ubuntu.

The general mode of failure is that `libzfs_init()` is expected to load the module, module initialization does not complete before /dev/zfs is opened, and pool import fails. This could explain the infamous mountall failure on Ubuntu, where pools import but nothing mounts. The general explanation is that the userland process expected to perform the mounts fails because the module loses the race with `libzfs_init()`; the module then imports the pools by reading zpool.cache, and nothing mounts because the userland process expected to perform the mounts has already failed.

A related issue can also manifest itself in initramfs archives that mount / on ZFS. It affected Gentoo until 2013, when a busy-wait was implemented to ensure that the module had loaded:

https://gitweb.gentoo.org/proj/genkernel.git/commit/defaults/initrd.scripts?id=c812c35100771bb527f6b03853fa6d8ef66a48fe
https://gitweb.gentoo.org/proj/genkernel.git/commit/defaults/initrd.scripts?id=a21728ae287e988a1848435ab27f7ab503def784
https://gitweb.gentoo.org/proj/genkernel.git/commit/defaults/initrd.scripts?id=32585f117ffbf6d6a0aa317e6876ae7711a7f307

The busy-wait approach was chosen because it imposed minimal latency and was implementable in shell code. Unfortunately, it was not known at the time that `libzfs_init()` had the same problem, so this went unfixed. It caused sporadic failures in the flocker tutorial, which caught our attention at ClusterHQ:

https://clusterhq.atlassian.net/browse/FLOC-1834

Subsequent analysis, following reproduction in a development environment, concluded that the failures were caused by module initialization losing the race with `libzfs_init()`.
While all Linux kernel modules needed ASAP during the boot process suffer from this race, the zfs module's dependence on additional modules makes it particularly vulnerable to this issue. The solution that has been chosen mirrors the solution chosen for genkernel, with the addition of `sched_yield()` for greater efficiency.

This fails to close the race in the scenario where execution of a virtual machine is paused in exactly the window needed to introduce a delay, between a failed attempt and the subsequent try, that is greater than the timeout. Closing the race in that situation would require hooking into udev and/or the kernel hotplug events. That has been left as a future improvement because it would require significant development time, and it is quite likely that the busy-wait approach implemented here would still be required as a fallback on exotic systems where neither is available. The chosen approach should be sufficient for achieving >99.999% reliability.

Closes openzfs#2556

Signed-off-by: Richard Yao <[email protected]>
Reviewed-by: Turbo Fredriksson <[email protected]>