libzfs_init() should busy-wait on module initialization #3427
Conversation
Force-pushed from 7fe9dab to d6fbda4
I am fairly confident this explains a sporadic failure on the buildbot: http://buildbot.zfsonlinux.org/builders/centos-7.0-x86_64-builder/builds/2202/steps/shell_14/logs/stdio
@ryao @FransUrbo @dun good timing on this. This little bit of code had just come to my attention for similar reasons, specifically the updated init scripts. My inclination was actually to resolve this problem an entirely different way, but I wanted you guys' input. I definitely agree with the analysis: there is a race between module load and opening the /dev/zfs device.

With the notable exception of the mount command, which tries to load a module based on the filesystem type, this is the customary way of handling things under Linux. This code was added long ago only as a convenience, and it makes some dubious assumptions. Here are my thoughts for why not to do this.

So what do you think? What are the arguments for keeping this functionality? Did I miss any for removing it?
I've said it before: I've never liked the autoloading of the module. NO OTHER system/software (that I know of) does this, in Linux at least. If I have a filesystem on a device which I have not loaded a module for, I don't expect it to mount. Having this is just a convenience for new users, but it teaches them bad behavior.

I personally would like to vote for the auto loading of modules to be removed. I prefer to do this outside of ZoL (in the init scripts, [my] initramfs scripts and what not). It's … "cleaner".

HOWEVER, if we decide to keep the auto loading of modules, then this [PR] is the very least we can do.
I have never liked it either. However, removing it as soon as we do something better could cause problems for people updating to the next release, like the hostid change did in 0.6.4. That would increase the support load in IRC and on the mailing list. It could also open us to the understandable criticism that we break things unnecessarily. I would rather fix this and call it deprecated in the next release. We could remove it in the future, after we are certain that users have had sufficient time to migrate to the new method of loading the modules.

One way of doing a transition would be to add an environment variable to control it. If we implement an environment variable, we should also have a discussion of what the right way to load the modules is, not just in the scripts, but in initramfs archives as well.
@behlendorf I suspect that this patch is something that would be reasonable to backport to 0.6.4 while other options are not. It is your call though.
OK, then since nobody likes it, let's get rid of it. I thought you guys might have a reason for keeping it, but it sure doesn't sound that way. I agree that we should phase it out slowly to mitigate any unexpected breakage which might occur, and an environment variable is a totally reasonable way to do this. Although, I'd suggest using just

@ryao let me provide some additional comments about this patch, and if you could refresh it to behave like we've described, we could take it for a 0.6.4.2.
I should have a pull request ready within the next couple of weeks so the zed will no longer be dependent upon this autoload behavior.
I've not been following this issue too closely, but I wanted to chime in with a reminder, in case it matters in this context, of another contender in the "load the module race": our standard packaging includes a udev module
@dweeezil Good point, that definitely improves the odds the modules will already be loaded. It also looks like we're going to want to update the systemd units to do a modprobe of the module as well.
@behlendorf If you update the systemd units to load the module, how will they ensure that /dev/zfs exists before anything tries to use it? I had this exact problem in genkernel, which is why I implemented the busy-wait in shell code.
@ryao I suppose we'll need some sort of polling loop like in the init scripts. Although, there might be a better way, like creating another unit file which does the modprobe; then the import units might require the /dev/zfs file? I'm no expert, but you're right, it's a wrinkle we'll need to contend with.
@behlendorf A colleague had the same idea. Unfortunately, the functionality for requiring a file keeps a service from starting at all when the file is not found, rather than waiting for it to appear.
@ryao we'll need to do a little research then. This certainly sounds like it should be a solved problem.
@ryao upon further reflection I think this needs to be broken into two parts.
This allows the init scripts and systemd units to safely do the following, which simplifies things.
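A rough sketch of what such a fragment could look like, assuming the intent is simply modprobe followed by import; the commands, options, and cache-file path here are illustrative, not taken from the PR:

```sh
#!/bin/sh
# Illustrative only -- not the snippet from the original comment.
# Load the kernel module, then import pools. The idea discussed in the
# following comments is that the userland side blocks until /dev/zfs
# appears, so the script needs no wait loop of its own.
modprobe zfs || exit 1
zpool import -c /etc/zfs/zpool.cache -a
```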
@behlendorf How do you propose that we block?
@ryao either busy-wait or polling/sleep until the device shows up. I'd argue for polling every 1 ms or so, but if you'd prefer to busy-wait that's OK. In practice this is such a tiny optimization I doubt it matters either way.
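Purely for illustration (none of this is code from the PR), the polling variant could look like the following in shell; the 1 ms interval comes from the comment above, while the retry budget is an arbitrary assumption:

```sh
#!/bin/sh
# Hypothetical polling loop: sleep briefly between checks until the
# /dev/zfs character device appears, and give up after a bounded number
# of attempts. A busy-wait would be the same loop without the sleep.
retries=5000                      # roughly 5 seconds at 1 ms per attempt
while [ ! -c /dev/zfs ] && [ "$retries" -gt 0 ]; do
    sleep 0.001                   # GNU sleep accepts fractional seconds
    retries=$((retries - 1))
done
if [ ! -c /dev/zfs ]; then
    echo "timed out waiting for /dev/zfs" >&2
    exit 1
fi
```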
@ryao @FransUrbo I'm proposing #3431 which I believe addresses all the concerns raised.
`libzfs_init()`'s just-in-time load of the module before using it is racy because Linux kernel module initialization is asynchronous. This causes a sporadic failure whenever `libzfs_init()` is required to load the kernel modules. This happens during the boot process on EPEL systems, Fedora and likely others such as Ubuntu.

The general mode of failure is that `libzfs_init()` is expected to load the module, module initialization does not complete before /dev/zfs is opened and pool import fails. This could explain the infamous mountall failure on Ubuntu where pools will import, but things fail to mount. The general explanation is that the userland process expected to mount things fails because the module loses the race with `libzfs_init()`, the module loads the pools by reading the zpool.cache and nothing mounts because the userland process expected to perform the mount has already failed.

A related issue can also manifest itself in initramfs archives that mount / on ZFS, which affected Gentoo until 2013 when a busy-wait was implemented to ensure that the module loaded:

https://gitweb.gentoo.org/proj/genkernel.git/commit/defaults/initrd.scripts?id=c812c35100771bb527f6b03853fa6d8ef66a48fe
https://gitweb.gentoo.org/proj/genkernel.git/commit/defaults/initrd.scripts?id=a21728ae287e988a1848435ab27f7ab503def784
https://gitweb.gentoo.org/proj/genkernel.git/commit/defaults/initrd.scripts?id=32585f117ffbf6d6a0aa317e6876ae7711a7f307

The busy-wait approach was chosen because it imposed minimal latency and was implementable in shell code. Unfortunately, it was not known at the time that `libzfs_init()` had the same problem, so this went unfixed. It caused sporadic failures in the Flocker tutorial, which caught our attention at ClusterHQ:

https://clusterhq.atlassian.net/browse/FLOC-1834

Subsequent analysis following reproduction in a development environment concluded that the failures were caused by module initialization losing the race with `libzfs_init()`. While all Linux kernel modules needed ASAP during the boot process suffer from this race, the zfs module's dependence on additional modules makes it particularly vulnerable to this issue.

The solution that has been chosen mirrors the solution chosen for genkernel, with the addition of `sched_yield()` for greater efficiency. This fails to close the race in the scenario where system execution in a virtual machine is paused in the exact window necessary to introduce a delay between a failure and subsequent try greater than the timeout. Closing the race in that situation would require hooking into udev and/or the kernel hotplug events. That has been left as a future improvement because it would require significant development time, and it is quite likely that the busy-wait approach implemented here would be required as a fallback on exotic systems where neither is available. The chosen approach should be sufficient for achieving >99.999% reliability.

Closes #2556

Signed-off-by: Richard Yao <[email protected]>
Reviewed-by: Turbo Fredriksson <[email protected]>
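The fix itself is C code inside `libzfs_init()`, calling `sched_yield()` between attempts; the genkernel commits linked above did the equivalent in shell. Purely as an illustration of that load-then-busy-wait pattern, with the overall timeout budget being an arbitrary assumption rather than the value used by either project:

```sh
#!/bin/sh
# Illustrative shell analogue of the approach described above: load the
# module, then retry until /dev/zfs exists or an overall deadline passes.
# There is deliberately no sleep in the loop; the C implementation yields
# the CPU between attempts with sched_yield() instead.
modprobe zfs || exit 1
deadline=$(( $(date +%s) + 10 ))        # 10-second budget, illustrative
while [ ! -c /dev/zfs ]; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
        echo "timed out waiting for /dev/zfs" >&2
        exit 1
    fi
done
```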