zfs-import: Use cache file to reimport pools at boot #3800
Force-pushed from 2a9c6b8 to f135bb1.
The correct way to fix this with the current infrastructure is to set
in the defaults config file (/etc/default/zfs). That setup will enforce the use of the cache file, and ONLY use the cache file (i.e., try once with that ONLY).
Your so-called "fix", @MrStaticVoid, fucks up everything for the large majority! This isn't Solaris. You need to start understanding this! This is LINUX! Linux works very differently from Solaris and any of the other *BSD variants out there. Yes, I'm usually the first to object when ZoL isn't working like ZFS, but mostly we need to make local changes so that it works the way the large majority expect things to work. And that is, when we talk about the cache file, NOT to use one! It is needed and even required for some cases, but generally, and for the huge majority, it is more of a pain than it's worth. Before you start chopping up almost everything that has worked on every single Debian GNU/Linux system (and derivatives - as well as many other systems) for years, you need to step back and actually think - YOU are not the majority! Very few in the Linux community run systems where a cache file is needed or required. Granted, those that do usually run a lot of setups, but as a whole, they're still the minority. And as I (and others) have said for years now, it's more trouble than it's worth. Go have a look at the mailing list archives and old issues - a lot of people are having problems that originate from the use of the cache file.
@FransUrbo, you have not once provided a concrete example of how this "fucks things up for a large majority" besides espousing your own ignorance. Meanwhile, I can point to:
NINE concrete examples from @ryao and myself pointing out how you're wrong. Indeed, the last one even shows how your changes do introduce a security vulnerability into the works. The fact is, Linux is not sufficiently different from Solaris to justify the changes you've proposed. They both use

You keep saying that very few setups require a cache file as if that's persuasive, but until you can respond to my nine examples showing why 100% of users need a cache file, I'm not convinced. The fact that you broke the cache file with your init script changes suggests to me that you do not understand the way they work very well.
Sounds like a documentation problem. Let's educate our users a little bit better. Let's stop including the cache file in the initrd (which is where the majority of the complaints come from, and it's not necessary), but to change the way ZFS has always worked as you are proposing to do is arrogant and dangerous.
This is not correct, because:
This pull request gets rid of 253 lines of unnecessary complication that buys us nothing except breaking compatibility with the rest of the ZFS world and introducing security vulnerabilities in the process.
Actually I have. You just don't seem to understand written English.
And for every single one of these, I've given you two explanations on why you're not getting it and why your reasoning is wrong. AND given you solutions to solve the problem.
I have. And others have. On several occasions. Not my fault you're not getting it. Maybe your Linux skills aren't really up to snuff.
Just a shot in the dark, and I was never really considering it. Just offering ONE possible solution. I since came up with a correct one. Which you're suspiciously ignoring...
Maybe you should start to actually think?
Since I already have, several, several times, and you're still not getting it NOR are you actually commenting on them, well... There are only so many times one can say it and refute your comments and ravings without losing interest in the matter.
I've done my best to be respectful and stick to the facts, and in response all I've seen from you is ad hominem attacks and hurtful rhetoric. All I can hope, at this point, is that a more reasonable person than you will respond to the facts and do the right thing. Please don't comment on this pull request any more, @FransUrbo.
@MrStaticVoid Oh, bu-hu! You're not going to start crying, are you? You're ACTUALLY telling me to stay the hell away from an open forum on an open source project?? Dang, you really ARE deluded!
I think the record is clear at this point. Good bye, @FransUrbo.
```diff
@@ -1,11 +1,10 @@
 #!@SHELL@
 #
-# zfs-import     This script will import/export zfs pools.
+# zfs-import     This script will reimport ZFS pools.
```
This should say verbatim import.
@FransUrbo We should restore the original functionality so that the zpool import/export commands implicitly take care of things without requiring configuration files be edited. This would be consistent with the systemd unit files and other platforms. It did not occur to me that the new scripts would change this, so I focused primarily on the other areas during review due to lack of time. If I had realized that the new code changed this, I would have complained during review. That said, I am okay with having the option to switch to configuration files, but it should not be the default. That makes configuration more complicated than it needs to be.
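To make the distinction concrete, here is a rough sketch in zpool terms of the two behaviors being contrasted (the option usage is illustrative; the actual init scripts are more involved):

```sh
# Verbatim-style import: trust only the pools recorded in the cache file
# at the time they were last imported on this host.
zpool import -aN -c /etc/zfs/zpool.cache

# Scanning import: probe a device directory and import every pool found,
# whether or not this host ever imported it before.
zpool import -aN -d /dev/disk/by-id
```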
Force-pushed from f135bb1 to c777a1e.
I restored the code that searches and imports all visible pools as a configurable, non-default option. The default is to use the verbatim import, as @ryao said. I put a lot of thought into how to frame the configuration option, what to call it, how to describe and document it. I don't know if I got it right. My opinion is that the option presents enough of a risk that it should be hidden from the user completely, but I included it anyway.
@MrStaticVoid This looks good to me, but I need to review things more thoroughly to ensure that I did not miss anything. I intend to backport this change to Gentoo as soon as I have finished working out some unrelated systemd issues that were reported to me.
@MrStaticVoid thanks for working on this and taking the time to open a pull request. I've been heads down working to address some of the stability issues which cropped up with the latest release, so I haven't had a chance to give this the attention it deserves. I definitely want to explore what the right behavior should be for Linux.
@MrStaticVoid I have been busy at work, but my plan is to merge this into Gentoo before the end of this week. Would you rebase this on HEAD?
This change modifies the import service to use the default cache file to perform a verbatim import of pools at boot. This fixes code that searched all devices and imported all visible pools.

Using the cache file is in keeping with the way ZFS has always worked, with how Solaris, Illumos, FreeBSD, and systemd perform imports, and with how it is written in the man page (zpool(1M,8)):

    All pools in this cache are automatically imported when the system
    boots.

Importantly, the cache contains important information for importing multipath devices, and helps control which pools get imported in more dynamic environments like SANs, which may have thousands of visible and constantly changing pools, which the ZFS_POOL_EXCEPTIONS variable is not equipped to handle. Verbatim imports prevent rogue pools from being automatically imported and mounted where they shouldn't be.

The change also stops the service from exporting pools at shutdown. Exporting pools is only meant to be performed explicitly by the administrator of the system.

The old behavior of searching and importing all visible pools is preserved and can be switched on by heeding the warning and toggling the ZPOOL_IMPORT_ALL_VISIBLE variable in /etc/default/zfs.

Closes openzfs#3777
Closes openzfs#3526
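A minimal sketch of what the toggle described in the commit message could look like in the init script; the variable name and defaults file come from the commit message above, while the exact commands are an assumption:

```sh
# /etc/default/zfs (excerpt) -- off by default
#ZPOOL_IMPORT_ALL_VISIBLE='no'

# zfs-import (simplified)
if [ "$ZPOOL_IMPORT_ALL_VISIBLE" = "yes" ]; then
    # Old behavior: search devices and import every visible pool.
    zpool import -aN -d /dev/disk/by-id
else
    # Default: verbatim import of only the pools recorded in the cache file.
    zpool import -aN -c /etc/zfs/zpool.cache
fi
```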
Force-pushed from c777a1e to ec690ae.
@ryao, I rebased to HEAD. There was a conflict with the addition of the

Maybe a bug? Hard to tell, because I fundamentally disagree with the need for the

Anyway, as a Funtoo user, thanks for all the work you do on ZFS for Gentoo, @ryao. Now if only we could get this regression fixed in the next ebuild release... cough.
@ryao If you're hellbent on spending time to maintain your own stuff for Gentoo, there's nothing anyone can say against that - you are the master of your own time. However, I wrote and submitted the SYSV init and initrd scripts (and behlendorf accepted it) to ZoL as a means to save everyone some time and energy. This includes the

The config variables

Your initrd scripts are still way behind in functionality compared to those that ZoL is distributing, and that is your right. But it's not fair to your users, whether they're asking for that or not today.

And regarding the
@FransUrbo The issues we had dealing with the cachefile were bugs that are not inherent to the cachefile. Removing the cachefile constitutes a regression in ease of management in handling persistent imports. It also makes ZFS administration on KVM hosts where guests use ZFS into a nightmare. I used to think that the cachefile was a problem, but I changed my mind upon actually trying to remove it. The way forward is to fix the bugs that cause problems rather than try to reimplement the cachefile in a configuration file. The latter is fairly error-prone. We have the mountpoint property so that we are not reliant on fstab, for the same reason.
@behlendorf This looks good to me and you can add my signed-off. I have already had to deal with a few people who had problems from this, and I suspect that the support load will increase as others upgrade to 0.6.5.y. I think this should go into 0.6.5.3 so that we can avoid that. The update should prevent the cachefile from being clobbered on shutdown on the systems where this did not make a difference, so it should be fine to fix this in the release branch. I think that leaving the regression in the branch would be much worse.

@MrStaticVoid This is going into Gentoo later today. The only lingering issue is a cosmetic one when the initramfs has already loaded the pools, but this fixes the key regression.
@dajhorn I had thought that Ubuntu used upstart jobs, but after helping a user upgrade an older Ubuntu system, it seems that Ubuntu uses the sysvrc scripts, so this affects Ubuntu too. Would you give this your review as well?
@fearedbliss genkernel-next is Sabayon's somewhat poorly named genkernel fork. The code is in the genkernel zfs branch: https://gitweb.gentoo.org/proj/genkernel.git/log/?h=zfs It has not yet been merged because I have not implemented full support for

@FransUrbo The double import allows us to avoid coherence issues between the master cachefile on the pool and a copy in the initramfs by allowing us to omit a copy in the initramfs. I consider this to be the most problem-free way of booting with the current

That said, I would like to develop a new commandline syntax that allows GRUB2 to pass information about the root pool to the kernel module, which would then handle the import in the rootfs mount. Until that is ready, this is the best we can do to avoid the issues that we had previously.

The code that implements genkernel's ZFS support in that branch is here, although the syntax highlighting is behaving strangely: https://gitweb.gentoo.org/proj/genkernel.git/tree/defaults/linuxrc?h=zfs#n111

That is in order of execution. The double import logic is disabled when the initramfs has a copy of the zpool.cache file. It would be better to use the root dataset to store the cachefile like OpenIndiana does rather than the $ROOTFS, but a switch to that needs more thought in terms of how it interacts with existing systems. I believe it is possible though.
@ryao I'm reposting a reply that I sent to @MrStaticVoid from this comment: https://github.com/fearedbliss/bliss-initramfs/commit/ff3939771917197fedfcbbb2aadebe21407d1218

Hey @MrStaticVoid, I can't take credit for the idea since it was ryao's, haha. I just implemented it. You are correct in that it is useful for multipath. Personally I don't use multipath, but having this implementation benefits the user in a lot of other ways.

Primarily, the benefit of this is to have one location for the zpool.cache file. Having multiple copies of the cache file (one in the initramfs, and one on the actual system) is not good and will increase the chances of the cache files being out of sync, causing boot errors or other unforeseeable results. Another benefit is that any activity you do in the actual system is automatically available in the initramfs without having to remake the initramfs.

For example, I have a v28 boot pool so that I can use GRUB 2 zfs features, and I keep my / pool in the latest v5000 0.6.5.2 state. So I can upgrade my root pool to the latest zfs on day 1 and not have to worry about my system not booting. So for this I have: a 'boot' pool and a 'tank' pool. In the live system, I can export or import the pool either by-id or w/e, and when I reboot, this exact information will be available to the initramfs environment.

Another important feature is the 'refresh' feature that I implemented. Basically this ignores the cachefile in the live system, scans the system for the root pool during early boot, imports it, copies the new cachefile to the live system, and boots. This is good if you want to regenerate the cachefile for the root pool (since you can't do this from the live system while it's in use).

The same exact code can be used in the situation where the root pool doesn't have a cache file. Let's say you installed Gentoo on ZFS (or another Linux distro) and you forgot to copy the zpool.cache file; this might be a problem in certain situations. However, bliss-initramfs will detect that your rootfs doesn't have a cachefile and will copy the one that the initramfs creates into it. Which again means that the next boot can reuse that cachefile.

So overall, there are many benefits to having the zpool.cache file available; it's just a matter of having the correct implementation and usage of it. I hope this explanation helps.
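The "copy the cachefile back when the rootfs lacks one" step described above could, in principle, be as small as the following sketch (the /sysroot mount point and file locations are assumptions, not bliss-initramfs' actual code):

```sh
# Run from the initramfs after the root dataset has been mounted at /sysroot.
if [ ! -f /sysroot/etc/zfs/zpool.cache ]; then
    # The early-boot import just produced a fresh cachefile inside the
    # initramfs; install it so that the next boot can reuse it.
    cp /etc/zfs/zpool.cache /sysroot/etc/zfs/zpool.cache
fi
```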
This is still not an answer to my question I've been asking several times: How is this "better" than just importing it straight away?? If that forced ro import succeeds, a
Sorry, but what have you been smoking NOW?? How is that any different than what I'm doing?
@FransUrbo The cachefile allows the system to import all the pools which are defined in it, and ignore (at boot time) those which aren't but are somehow attached (e.g. alternative SAN paths, pools defined inside zvols for VMs to use, random attached storage devices, etc.). Sorry for stating the obvious, but you seem to have missed it.
@FransUrbo A readonly import, a verbatim import and a normal import are three different things, despite them all being done through the

There are likely other issues too, but it has been 3 years since I investigated this and I do not recall every issue that I encountered back then. However, the issues were substantial enough that I concluded that export on shutdown and non-verbatim import at boot were bad ideas. Given that the zpool.cache was included in the initramfs and the module would automatically do a verbatim import, I did not have to worry about superfluous

@Bronek The new approach taken in genkernel's zfs branch and bliss-initramfs does not handle multipath correctly when there are SPA namespace collisions. We need to address this by having the bootloader pass information on the kernel commandline. However, it does seem to handle systems as they are deployed today, because no one has complained yet to my knowledge.
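A minimal sketch of the double-import idea being discussed here (the pool and dataset names are hypothetical, and the real genkernel/bliss-initramfs code differs):

```sh
#!/bin/sh
# Hypothetical double-import sketch for an initramfs.

ROOT_POOL="tank"                        # hypothetical pool name
ROOT_DATASET="$ROOT_POOL/ROOT/default"  # hypothetical root dataset
CACHE="/etc/zfs/zpool.cache"

# 1. First pass: force a read-only, unmounted import so nothing on the pool
#    is modified and no cachefile has to be shipped in the initramfs.
zpool import -N -f -o readonly=on "$ROOT_POOL" || exit 1

# 2. Mount the root dataset read-only and copy out the master cachefile that
#    lives on the pool itself.
mount -t zfs -o ro "$ROOT_DATASET" /sysroot
cp "/sysroot$CACHE" "$CACHE" 2>/dev/null
umount /sysroot

# 3. Drop the temporary import and redo it for real, verbatim, from the
#    cachefile just copied out of the pool.
zpool export "$ROOT_POOL"
zpool import -N -c "$CACHE" -a
```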
@behlendorf In a similar vein, the new behavior in the scripts means that we see
I think the export issue warrants a separate issue. And I have no problem disabling that, or making it configurable (although I don't see the point in that). I don't actually export the pool myself (because I'm booting from it). But I think we should continue discussing the export problem in an issue of its own.
There are three things here: the boot scripts, dracut at boot, and shutdown. We now have an issue for each.

That said, the behavior that went into the repository damages ZFS' ability to use the uberblock history to recover from severe corruption events and pollutes the pool. It causes problems for system administrators who use ZFS on guests and the host in KVM and Xen environments, beyond just making system administration a pain. The fact that the behavior is unnecessarily different from the other OpenZFS platforms when we are trying to reduce differences is minor in comparison to the other issues that it caused.

@MrStaticVoid's patch fixes the main regression, and my opinion is that it should be merged for 0.6.5.3. While there are lingering cosmetic issues (printing that no pools were imported when the initramfs imported the pools), that is a regression introduced by the original scripts. Fixing a cosmetic issue is not quite as important as fixing the main regression. The only things left to discuss should not be blockers.

Further discussion is unlikely to make the rest of us think that ZFS should act like LVM and MD RAID. The way those two do things has always been a pain for system administrators and it is not a behavior that we should adopt.

I truly regret not catching this under earlier review. Had I thought that the zpool-import script could cause such problems, I would have read it line by line rather than spot-checking due to limited time.
I furiously argue against putting this into the snapshot branches! Not only because it introduces a significant and dangerous (because of untested code/logic) change in behavior, but also because this isn't what we want for the general Linux public. And if
@FransUrbo The behavior shipped in 0.6.5 is broken and it is being deployed to users as they upgrade. If we backport the patch now, a substantial number of systems that have yet to be upgraded will be spared from the pain of dealing with this regression.

As for the systems that were already upgraded, the patch has been deployed to Gentoo users. So far, there have been no problems switching back from the broken behavior to the correct one. This is because the pools are entered into the cachefile at boot and the export-on-shutdown behavior disappears when the scripts are upgraded, leaving the cachefile in a state where the imported pools are restored by the correct behavior at the next boot. Any pools that failed to import or were incorrectly imported by the incorrect behavior are things that the system administrator will need to handle in either case, and that will be a pain without the patch.

As for zpool export dirtying the uberblock, that is not something that merits fixing. The idea of adopting the LVM/MD behavior goes against the design of ZFS by giving system administrators more work. It would not be adopted by any other platform and it would make merging code harder than it already is. It makes no sense to make more work for us by adopting a bad behavior when the original design from Solaris works better.
@FransUrbo when you say "we", please explain whether this includes users or just some developers. I am a user, and I much prefer the idea of a configuration file used to decide which pools will be imported and mounted at boot over importing anything that happens to be available at the moment (and then automatically mounting it, because of pool configuration). If such behaviour were really natural in Linux, we wouldn't have
@Bronek A plaintext configuration file does not handle namespace collisions unless GUIDs are used (which is a pain), and it cannot handle GUID collisions (which is a security issue on which I do not wish to elaborate in public). It also requires multiple commands to maintain it and to keep the system in line with updates. The zpool.cache file avoids that and should be the easiest thing to use, provided that the mistakes made porting ZFS to Linux that caused problems are addressed. Similarly, the mountpoint property replaces fstab to provide the same ease when dealing with ZFS filesystems from pools.
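Both properties mentioned here are set once and then maintained by ZFS itself; for example (the pool and dataset names are examples):

```sh
# Record the pool in the default cache file so that boot performs a
# verbatim re-import of it.
zpool set cachefile=/etc/zfs/zpool.cache tank

# Let ZFS mount the filesystem itself instead of relying on /etc/fstab.
zfs set mountpoint=/srv/data tank/data
```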
@ryao yes I know, by configuration file I was referring to
@FransUrbo "Sorry, but what have you been smoking NOW?? How is that any different than what I'm doing? Lol.. I implemented bliss-initramfs back in 2012 with cachefile support from the near beginning of the project. Later on as @ryao and I discussed this topic over the years, I ended up removing cachefile support (and also hostid support) from bliss-initramfs. After discussing this topic again a few years after that change, I implemented again. This stuff isn't anything that I haven't already done, it's literally bringing back what I had done years ago and improving it with new ideas. You should stop assuming you implemented all of these things first. |
@ryao In bliss-initramfs, what it expects from the bootloader is the following:

    linux [kernel] root=tank/gentoo/root

Since I don't use the "bootfs" property and solely rely on the user passing their root dataset directly, would it handle the multipath/SPA stuff correctly?
@fearedbliss It would have a problem if multiple pools have the same name.
I never did. It's a well-known fact that I only improved (slightly) on what's in ZoL now. Where that code came from, we don't exactly know. But feel free to avoid the actual question...
@ryao Ah I see. Alright, we can discuss.
@behlendorf Would you merge this to HEAD? This patch makes the sysvrc/openrc scripts behave like the systemd units. All distributions I know to use those scripts, sans Debian, have had it backported.

Having the regressed behavior in the main repository is problematic for users of Gentoo's 9999 ebuilds that build from git, because they obtain a broken behavior where pools that should not be imported are imported. The main place where this is a problem involves VM hosts, where the export-on-shutdown behavior combined with this can cause pool corruption from the guest and the host systems importing the pool at the same time. It is also a security hazard on VM hosts, where maliciously crafted pools could target zero-day bugs in the zpool import routines to gain root or higher privileges on the host. I find myself between a rock and a hard place in supporting the Gentoo users building from HEAD via the Gentoo live ebuilds.

Merging this will not pose problems for Debian users who do not build themselves, while those that do need to apply this patch manually to get a behavior consistent with other ZFS systems. Since @FransUrbo is attached to this behavior, I imagine he would carry a patch in Debian reinstating his preferred behavior after it is merged. Consequently, merging this should pose no problem for the users of the packaging he maintains.
@behlendorf #3889 appears to have occurred where two nodes had access to the same pools over SAS and they both tried importing the pools at the same time, causing pool corruption. This happened on CentOS, but on distributions where the "import all pools" behavior is used in conjunction with either export at shutdown, pool corruption will occur, even if the default spl_hostid is changed from 0 to prevent it.
@MrStaticVoid Your prediction of a security vulnerability was correct. I have figured out how to exploit this. I have sent the details in a PGP encrypted email to @behlendorf. I think it will be appropriate to write a second patch killing the option for the auto-import behavior. Then @behlendorf can decide what he wants to do.
Thanks for looking out, ryao. I'm leaving for a trip so I don't have time to submit another pull request, but I gather the older version of this pull request: https://github.com/MrStaticVoid/zfs/commit/f135bb10161ed8b543aed6cd1c1a3bd052d47597.diff would be a good starting point, after
@MrStaticVoid thank you for investigating this and posting a fix. This definitely is a serious accidental regression I should have caught when initially reviewing the init scripts. Many of the points you and @ryao raised above I agree with. But most importantly, we absolutely need to preserve using the cache file for now as an added layer of protection to ensure systems with shared storage don't end up with a pool imported concurrently by multiple hosts. I'll be merging your patch to master and @nedbass can cherry-pick it for the release branch and 0.6.5.3.

That said, longer term I still would like to seriously explore replacing (or supplementing) the existing

If someone is particularly interested in this area I'd love to see a plan laid out for how to improve the current situation. For example, I agree with @FransUrbo that on Linux systems some of the arguments for keeping the cache file, such as speeding up the boot, just don't make any sense. Unlike the other platforms, all block devices are brought online asynchronously and udev probes them with blkid. It seems to me the optimal thing to do would be to leverage udev to dynamically construct the geometry for all available pools. Then the policy for which pools should be imported, and under what conditions, should reside in a clearly human-readable configuration file with sane defaults. This would ensure pools are imported as early as possible, but no sooner, and that the policy by which a pool is imported is clearly articulated and visible to users.

I wish I personally had time to work on this and investigate all the issues, but the fact of the matter is I don't. And I don't think we need to be constrained by what the other platforms have done. The way a Linux system bootstraps itself is very different from illumos and the BSDs, and we should take advantage of that. This would be a valuable project to work on if anyone is interested.
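Purely as an illustration of the udev-driven idea sketched above (nothing like this exists in ZoL; the rule, helper name, and policy file are all hypothetical):

```sh
#!/bin/sh
# Hypothetical helper that a udev rule could run whenever blkid identifies a
# newly appeared block device as a ZFS member, e.g.:
#   ACTION=="add", SUBSYSTEM=="block", ENV{ID_FS_TYPE}=="zfs_member", \
#     RUN+="/usr/local/sbin/zfs-autoimport"
# It imports only pools named in a human-readable policy file, and only once
# `zpool import` reports them as importable (all member devices present).

POLICY="/etc/zfs/import-policy"   # hypothetical: one pool name per line

zpool import 2>/dev/null | awk '$1 == "pool:" {print $2}' | while read -r pool; do
    if grep -qx "$pool" "$POLICY" 2>/dev/null; then
        zpool import -N "$pool"
    fi
done
```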
Merged as: 33df62d zfs-import: Perform verbatim import using cache file
We could likely address this in a way that it will not matter by preferring

I consider the need for various initramfs software to generate the symlinks to be a blocker, so I have not implemented this yet, but my development in my spare time is heading in that direction.
This change modifies the import service to use the default cache file
to reimport pools at boot. This fixes code that exhaustively searched
the entire system and imported all visible pools. Using the cache
file is in keeping with the way ZFS has always worked, and is how it is
written in the man page (zpool(1M,8)):

    All pools in this cache are automatically imported when the system
    boots.

Importantly, the cache contains important information for importing
multipath devices, and helps control which pools get imported in more
dynamic environments like SANs, which may have thousands of visible
and constantly changing pools, which the ZFS_POOL_EXCEPTIONS variable
is not equipped to handle.
The change also stops the service from exporting pools at shutdown.
Exporting pools is only meant to be performed by the administrator
of the system.
Closes #3777
Closes #3526