
zfs-import: Use cache file to reimport pools at boot #3800

Closed

Conversation

jameslikeslinux
Contributor

This change modifies the import service to use the default cache file
to reimport pools at boot. This fixes code that exhaustively searched
the entire system and imported all visible pools. Using the cache
file is in keeping with the way ZFS has always worked, and is how it is
written in the man page (zpool(1M,8)):

All pools  in  this  cache  are  automatically imported when the
system boots.

Importantly, the cache file contains information needed for importing
multipath devices, and it helps control which pools get imported in more
dynamic environments like SANs, which may have thousands of visible
and constantly changing pools, a situation the ZFS_POOL_EXCEPTIONS
variable is not equipped to handle.
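
For illustration only (the exact flags below are an assumption about the init script, not quoted from this patch), the difference amounts to roughly:

    # old behavior: scan for devices and import every pool that happens to be visible
    zpool import -d /dev/disk/by-id -aN
    # new behavior: verbatim import of only the pools recorded in the cache file
    zpool import -c /etc/zfs/zpool.cache -aN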

The change also stops the service from exporting pools at shutdown.
Exporting pools is only meant to be performed by the administrator
of the system.

Closes #3777
Closes #3526

@FransUrbo
Contributor

The correct way to fix this with the current infrastructure is to set

ZPOOL_IMPORT_OPTS="-c /etc/zfs/zpool.cache"
ZPOOL_CACHE=""

in the defaults config file (/etc/defaults/zfs or corresponding location on Redhat/Fedora etc).

That setup will enforce the use of the cache file, and ONLY use the cache file (i.e., try once with that ONLY).
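
Assuming the script splices those variables into its import command (an assumption about its internals rather than a quote from it), the settings above would reduce the boot-time import to a single cache-file-only attempt, roughly:

    zpool import -c /etc/zfs/zpool.cache -aN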

@FransUrbo
Contributor

Your so-called "fix", @MrStaticVoid, fucks up everything for the large majority!

This isn't Solaris. You need to start understanding this! This is LINUX! Linux works very differently from Solaris and any of the other *BSD variants out there. Yes, I'm usually the first to object when ZoL isn't working like ZFS, but mostly we need to make local changes so that it works the way the large majority expect things to work. And that is, when we talk about the cache file, NOT to use one! It is needed and even required for some cases, but generally and for the huge majority, it is more of a pain than it's worth.

Before you start chopping up almost everything that has worked for every single Debian GNU/Linux system (and derivatives, as well as many other systems) for years, you need to step back and actually think: YOU are not the majority!

Very few in the Linux community run systems where a cache file is needed or required. Granted, those that do usually run a lot of setups, but as a whole, they're still the minority. And as I (and others) have said for years now, it's more trouble than it's worth. Go have a look at the mailing list archives and old issues: a lot of people are having problems that originate from the use of the cache file.

@jameslikeslinux
Contributor Author

@FransUrbo, you have not once provided a concrete example of how this "fucks things up for a large majority" besides espousing your own ignorance. Meanwhile, I can point to:

  1. zfs-import script does not honor the cachefile #3777 (comment)
  2. zfs-import script does not honor the cachefile #3777 (comment)
  3. zfs-import script does not honor the cachefile #3777 (comment)
  4. zfs-import script does not honor the cachefile #3777 (comment)
  5. Change default cachefile property to 'none'. #3526 (comment)
  6. Change default cachefile property to 'none'. #3526 (comment)
  7. Change default cachefile property to 'none'. #3526 (comment)
  8. Change default cachefile property to 'none'. #3526 (comment)
  9. Change default cachefile property to 'none'. #3526 (comment)

NINE concrete examples from @ryao and myself pointing out how you're wrong. Indeed, the last one even shows how your changes do introduce a security vulnerability into the works.

The fact is, Linux is not sufficiently different from Solaris to justify the changes you've proposed. They both use /dev. They're both Unix. Can you give me a concrete example of how Linux is so different from Solaris to justify breaking compatibility, and indeed, introducing a security vulnerability in the process? I can't think of one.

You keep saying that very few setups require a cache file as if that's persuasive, but until you can respond to my nine examples showing why 100% of users need a cache file, I'm not convinced. The fact that you broke the cache file with your init script changes suggests to me you do not understand the way they work very well.

Go have a look at the mailinglist archives and old issues - a lot of people are having problem that originates from the use of the cache file.

Sounds like a documentation problem. Let's educate our users a little bit better. Let's stop including the cache file in the initrd (which is where the majority of the complaints come from, and it's not necessary), but to change the way ZFS has always worked as you are proposing to do is arrogant and dangerous.

@jameslikeslinux
Contributor Author

The correct way to fix this with the current infrastructure is to set

ZPOOL_IMPORT_OPTS="-c /etc/zfs/zpool.cache"
ZPOOL_CACHE=""

This is not correct, because:

  1. The current infrastructure exports pools at shutdown, which completely negates the point of the cache.
  2. Unless made the default, the current infrastructure is still vulnerable to Change default cachefile property to 'none'. #3526 (comment) and the number of issues raised by @ryao in zfs-import script does not honor the cachefile #3777.

This pull request gets rid of 253 lines of unnecessary complication that buys us nothing except breaking compatibility with the rest of the ZFS world and introducing security vulnerabilities in the process.

@FransUrbo
Contributor

@FransUrbo, you have not once provided a concrete example of how this "fucks things up for a large majority"

Actually I have. You just don't seem to understand written English.

NINE concrete examples from @ryao and myself pointing out how you're wrong.

And for every single one of these, I've given you two explanations on why you're not getting it and why your reasoning is wrong. AND given you solutions to solve the problem.

Can you give me a concrete example of how Linux is so different from Solaris to justify breaking compatibility

I have. And others have. On several occasions. Not my fault you're not getting it. Maybe your Linux skills aren't really up to snuff.

and indeed, introducing a security vulnerability in the process?

Just a shot in the dark, and I was never really considering it. Just offering ONE possible solution. I have since come up with a correct one. Which you're suspiciously ignoring...

I can't think of one.

Maybe you should start to actually think?

until you can respond to my nine examples showing why 100% of users need a cache file, I'm not convinced.

Since I already have, several, several times, and you're still not getting it NOR are you actually commenting on them, well... There are only so many times one can refute your comments and ravings without losing interest in the matter.

@jameslikeslinux
Contributor Author

You just don't seem to understand written English.

Maybe your Linux skills aren't really up to snuff.

Maybe you should start to actually think?

I've done my best to be respectful and stick to the facts, and in response all I've seen from you is ad hominem attacks and hurtful rhetoric. All I can hope, at this point, is that a more reasonable person than you will respond to the facts and do the right thing. Please don't comment on this pull request any more, @FransUrbo.

@FransUrbo
Contributor

@MrStaticVoid Oh, bu-hu! You're not going to start crying, are you? You're ACTUALLY telling me to stay the hell away from an open forum on an open source project?? Dang, you really ARE deluded!

@jameslikeslinux
Contributor Author

I think the record is clear at this point. Good bye, @FransUrbo.

@@ -1,11 +1,10 @@
 #!@SHELL@
 #
-# zfs-import This script will import/export zfs pools.
+# zfs-import This script will reimport ZFS pools.

This should say verbatim import.

@ryao
Contributor

ryao commented Sep 19, 2015

@FransUrbo We should restore the original functionality so that the zpool import/export commands implicitly take care of things without requiring configuration files be edited. This would be consistent with the systemd unit files and other platforms. It did not occur to me that the new scripts would change this, so I focused primarily on the other areas during review due to lack of time. If I had realized that the new code changed this, I would have complained during review.

That said, I am okay with having the option to switch to configuration files, but it should not be the default. That makes configuration more complicated than it needs to be.

@jameslikeslinux
Contributor Author

I restored the code that searches and imports all visible pools as a configurable, non-default option. The default is to use the verbatim import as @ryao said.

I put a lot of thought into how to frame the configuration option, what to call it, how to describe and document it. I don't know if I got it right. My opinion is that the option presents enough of a risk that it should be hidden from the user completely, but I included it anyway.
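
For reference, a sketch of what the toggle could look like in /etc/default/zfs (the variable name matches the commit message below; the default shown and the warning wording are illustrative, not quoted from the patch):

    # WARNING: searching for and importing every visible pool can attach
    # foreign or stale pools (SANs, shared storage, VM hosts). Leave this
    # disabled unless you understand the consequences.
    #ZPOOL_IMPORT_ALL_VISIBLE='no'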

@ryao
Contributor

ryao commented Sep 20, 2015

@MrStaticVoid This looks good to me, but I need to review things more thoroughly to ensure that I did not miss anything. I intend to backport this change to Gentoo as soon as I have finished working out some unrelated systemd issues that were reported to me.

@behlendorf
Contributor

@MrStaticVoid thanks for working on this and taking the time to open a pull request. I've been heads down working to address some of the stability issues which cropped up with the latest release so I haven't had a chance to give this the attention it deserves. I definitely want to explore what the right behavior should be for Linux.

@ryao
Contributor

ryao commented Oct 1, 2015

@MrStaticVoid I have been busy at work, but my plan is to merge this into Gentoo before the end of this week. Would you rebase this on HEAD?

This change modifies the import service to use the default cache file
to perform a verbatim import of pools at boot.  This fixes code that
searched all devices and imported all visible pools.

Using the cache file is in keeping with the way ZFS has always worked,
how Solaris, Illumos, FreeBSD, and systemd perform imports, and is how
it is written in the man page (zpool(1M,8)):

    All pools  in  this  cache  are  automatically imported when the
    system boots.

Importantly, the cache file contains information needed for importing
multipath devices, and it helps control which pools get imported in more
dynamic environments like SANs, which may have thousands of visible
and constantly changing pools, a situation the ZFS_POOL_EXCEPTIONS
variable is not equipped to handle.  Verbatim imports prevent rogue
pools from being automatically imported and mounted where they
shouldn't be.

The change also stops the service from exporting pools at shutdown.
Exporting pools is only meant to be performed explicitly by the
administrator of the system.

The old behavior of searching and importing all visible pools is
preserved and can be switched on by heeding the warning and toggling
the ZPOOL_IMPORT_ALL_VISIBLE variable in /etc/default/zfs.

Closes openzfs#3777
Closes openzfs#3526
@jameslikeslinux
Contributor Author

@ryao, I rebased to HEAD. There was a conflict with the addition of the ZFS_POOL_IMPORT variable from 5773296. I had to reword the comment for the variable because, looking into it, it's only used by the Debian initramfs:

> git grep ZFS_POOL_IMPORT
contrib/initramfs/scripts/zfs:  if [ -n "${ZFS_POOL_IMPORT}" ]; then
contrib/initramfs/scripts/zfs:          echo "$ZFS_POOL_IMPORT"
etc/init.d/zfs.in:#ZFS_POOL_IMPORT="pool1;pool2"

Maybe a bug? Hard to tell, because I fundamentally disagree with the need for the ZFS_POOL_IMPORT variable in the first place and I can't make heads or tails of the sloppy initramfs script.

Anyway, as a Funtoo user, thanks for all the work you do for ZFS on Gentoo, @ryao. Now if only we could get this regression fixed in the next ebuild release...cough.

@FransUrbo
Contributor

@ryao If you're hellbent on spending time to maintain your own stuff for Gentoo, there's nothing anyone can say against that; you are the master of your own time.

However, I wrote and submitted the SYSV init and initrd scripts to ZoL (and behlendorf accepted them) as a means to save everyone some time and energy. This includes the dracut code I also have in the pipeline. You have said, if I understand you correctly, that you're against a config file for the init scripts for ZoL, but Gentoo still has them for other scripts.

The config variables ZPOOL_IMPORT_OPTS and ZPOOL_CACHE will solve the problems you're seeing. If you don't want that, you'll have to spend time maintaining your own scripts…

Your initrd scripts are still way behind in functionality compared to those that ZoL is distributing, and that is your right. But it's not fair to your users, whether they're asking for that or not today.

And regarding ZFS_POOL_IMPORT, it's only used in the initrd scripts at the moment, because the WHOLE fix is waiting for review. Which should be "soon" I hope. See #3559. I think this solves everyone's problems correctly.

@ryao
Contributor

ryao commented Oct 1, 2015

@FransUrbo The issues we had dealing with the cachefile were bugs that are not inherent to the cachefile. Removing the cachefile constitutes a regression in ease of management in handling persistent imports. It also makes ZFS administration on KVM hosts where guests use ZFS into a nightmare.

I used to think that the cachefile was a problem, but I changed my mind upon actually trying to remove it. The way forward is to fix the bugs that cause problems rather than try to reimplement the cachefile in a configuration file. The latter is fairly error prone. We have the mountpoint property so that we are not reliant on fstab for the same reason.

@ryao
Contributor

ryao commented Oct 1, 2015

@behlendorf This looks good to me and you can add my signed-off. I have already had to deal with a few people who had problems from this and I suspect that the support load will increase as others upgrade to 0.6.5.y. I think this should go into 0.6.5.3 so that we can avoid that. The update should prevent the cachefile from being clobbered on shutdown on the systems where this did not make a difference, so it should be fine to fix this in the release branch. I think that leaving the regression in the branch would be much worse.

@MrStaticVoid This is going into Gentoo later today. The only lingering issue is a cosmetic one when the initramfs has already loaded the pools, but this fixes the key regression.

@ryao
Contributor

ryao commented Oct 1, 2015

@dajhorn I had thought that Ubuntu used upstart jobs, but after helping a user upgrade an older Ubuntu system, it seems that Ubuntu uses the sysvrc scripts, so this affects Ubuntu too. Would you give this your review too?

ryao added a commit to gentoo/gentoo that referenced this pull request Oct 1, 2015
openzfs/zfs#3800

Package-Manager: portage-2.2.20.1
Signed-off-by: Richard Yao <[email protected]>
@ryao
Contributor

ryao commented Oct 2, 2015

@fearedbliss genkernel-next is Sabayon's somewhat poorly named genkernel fork. The code is in the genkernel zfs branch:

https://gitweb.gentoo.org/proj/genkernel.git/log/?h=zfs

It has not yet been merged because I have not implemented full support for /dev/disk/by-* symlinks. We currently only have partial support for /dev/disk/by-id/*. My plan is to add it soon.

@FransUrbo The double import allows us to avoid coherence issues between the master cachefile on the pool and a copy in the initramfs by allowing us to omit a copy in the initramfs. I consider this to be the most problem-free way of booting with the current root=ZFS=$ROOTFS kernel commandline syntax.

That said, I would like to develop a new commandline syntax that allows GRUB2 to pass information about the root pool to the kernel module that would then handle import in the rootfs mount. Until that is ready, this is the best we can do to avoid the issues that we had previously.

The code that implements genkernel's ZFS support in that branch is here, although the syntax highlighting is behaving strangely:

https://gitweb.gentoo.org/proj/genkernel.git/tree/defaults/linuxrc?h=zfs#n111
https://gitweb.gentoo.org/proj/genkernel.git/tree/defaults/linuxrc?h=zfs#n307
https://gitweb.gentoo.org/proj/genkernel.git/tree/defaults/initrd.scripts?h=zfs#n1027
https://gitweb.gentoo.org/proj/genkernel.git/tree/defaults/linuxrc?h=zfs#n525

That is in order of execution. The double import logic is disabled when the initramfs has a copy of the zpool.cache file. It would be better to use the root dataset to store the cachefile like Open Indiana does rather than the $ROOTFS, but a switch to that needs more thought in terms of how it interacts with existing systems. I believe it is possible though.

@fearedbliss

@ryao I'm reposting a reply that I sent to @MrStaticVoid from this comment https://github.com/fearedbliss/bliss-initramfs/commit/ff3939771917197fedfcbbb2aadebe21407d1218


Hey @MrStaticVoid ,

I can't take credit for the idea since it was ryao's haha. I just implemented it. You are correct in that it is useful for multipath. Personally I don't use multipath but having this implementation benefits the user in a lot of other ways. Primarily the benefit of this is to have one location for the zpool.cache file. Having multiple copies of the cache file (one in initramfs, and one on actual system) is not good and will increase the chances of the cache files being out of sync and causing boot errors or other unforeseeable results.

Another benefit is that any activity you do in the actual system is automatically available in the initramfs without having to remake the initramfs. For example, I have a v28 bootpool so that I can use GRUB 2 zfs features and I keep my / pool in the latest v5000 0.6.5.2 state. So I can upgrade my root pool to the latest zfs on day 1 and not have to worry about my system not booting. So for this I have: 'boot' pool, and 'tank' pool. In the live system, I can export or import the pool either by-id or w/e, and when I reboot, this exact information will be available to the initramfs environment.

Another important feature is the 'refresh' feature that I implemented. Basically this ignores the cachefile in the live system, scans the system for the root pool during early boot, and then imports it, copies the new cachefile to the live system, and boots. This is good if you want to regenerate the root pool's cachefile (since you can't do this from the live system while the pool is in use).

The same exact code above can be used in the situation where the root pool doesn't have a cache file. Let's say you installed Gentoo on ZFS (or another Linux distro) and forgot to copy the zpool.cache file; this might be a problem in certain situations. However, bliss-initramfs will detect that your rootfs doesn't have a cachefile and will copy the one that the initramfs creates into it. Which, again, means that the initramfs can re-use that cachefile during the next boot.

So overall, there are many benefits to having the zpool.cache file available, it's just a matter of having the correct implementation and usage of it.

I hope this explanation helps.

@FransUrbo
Contributor

The double import allows us to avoid coherence issues between the master cachefile on the pool and a copy in the initramfs by allowing us to omit a copy in the initramfs.

This is still not an answer to the question I've been asking several times: how is this "better" than just importing it straight away?? If that forced ro import succeeds, a rw one should work too, so why bother!?

Another important feature is the 'refresh' feature that I implemented.

Sorry, but what have you been smoking NOW?? How is that any different than what I'm doing?
Other than generating a cache file of course, which you weren't interested in to start with, since you imported the pool without one!

@Bronek

Bronek commented Oct 2, 2015

@FransUrbo The cachefile allows the system to import all the pools that are defined in it, and ignore (at boot time) those that aren't but are somehow attached (e.g. alternative SAN paths, pools defined inside zvols for VMs to use, random attached storage devices, etc.). Sorry for stating the obvious, but you seem to have missed it.

@ryao
Contributor

ryao commented Oct 2, 2015

@FransUrbo A readonly import, a verbatim import and a normal import are three different things, despite them all being done through the zpool import command. Using readonly import followed by verbatim import means that the vdev labels are not marked dirty and zpool history is not polluted by a command that the user never issued. Furthermore, export/import at shutdown/boot dirties the labels. That causes us to lose the ability to do transaction rollback because we cannot distinguish between the export/import and a change in the vdev geometry, which is an integrity issue. In the event that a pool is damaged, the vdev labels being dirtied on shutdown can make a damaged pool irrecoverable with the standard commandline tools.

There are likely other issues too, but it has been 3 years since I investigated this and I do not recall every issue that I encountered back then. However, the issues were substantial enough that I concluded that export on shutdown and non-verbatim import at boot were bad ideas. Given that the zpool.cache was included in the initramfs and the module would automatically do a verbatim import, I did not have to worry about superfluous zpool import commands in the zpool history outside of unusual cases, but that has since regressed with 0.6.5. Since I happen to agree with the reasoning that disabled the behavior, my solution is to do a readonly import to extract the cachefile followed by a verbatim import that loads the state from the cachefile into the kernel. This simultaneously addresses the issue of superfluous zpool import commands appearing in zpool history and the issue of out-of-sync cachefiles causing pools to fault on import.
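
A rough shell sketch of that readonly-then-verbatim sequence (pool name, dataset layout, and paths are placeholders; the real logic lives in the genkernel and bliss-initramfs scripts discussed here):

    # 1. read-only import under an altroot: labels stay clean, no history entry,
    #    and the system cachefile is left untouched
    zpool import -o readonly=on -o cachefile=none -R /sysroot "$ROOT_POOL"
    # 2. grab the master cachefile kept on the root dataset
    cp /sysroot/etc/zfs/zpool.cache /tmp/zpool.cache
    zpool export "$ROOT_POOL"
    # 3. verbatim import of the exact state recorded in that cachefile
    zpool import -c /tmp/zpool.cache -aN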

@Bronek The new approach taken in genkernel's zfs branch and bliss-initramfs does not handle multipath correctly when there are SPA namespace collisions. We need to address this by having the bootloader pass information on the kernel commandline. However, it does seem to handle systems as they are deployed today, because no one has complained yet to my knowledge.

@ryao
Contributor

ryao commented Oct 2, 2015

Since the initramfs and boot scripts are two different (although somewhat related things), I just filed the following issues for export on shutdown and import on startup:

#3875
#3876

@ryao
Contributor

ryao commented Oct 2, 2015

@behlendorf In a similar vein, the new behavior in the scripts means that we see zpool history pollution and vdev labels being dirtied. As I said in #3875 and #3876, this breaks zpool import -T and zpool import -F because the kernel cannot distinguish between a vdev configuration change and the export. It also pollutes the zpool history with commands that the user never actually ran on the pool. Boot scripts should never pollute the history or artificially limit our ability to deal with catastrophic pool corruption events. Consequently, I consider this to be a serious regression.

@FransUrbo
Contributor

I think the export issue warrants a separate issue. And I have no problem disabling that, or making it configurable (although I don't see the point in that). I don't actually export the pool myself (because I'm booting from it).

But I think we should continue discussing the export problem in an issue of its own.

@ryao
Contributor

ryao commented Oct 2, 2015

There are three things here: the boot scripts, dracut at boot, and shutdown. We now have an issue for each.

That said, the behavior that went into the repository damages ZFS' ability to use the uberblock history to recover from severe corruption events and pollutes the pool. It causes problems for system administrators who use ZFS on guests and the host in KVM and Xen environments beyond just making system administration a pain. The fact that the behavior is unnecessarily different from the other OpenZFS platforms when we are trying to reduce differences is minor in comparison to the other issues that it caused.

@MrStaticVoid's patch fixes the main regression and my opinion is that should be merged for 0.6.5.3. While there are lingering cosmetic issues (printing that no pools were imported when the initramfs imported the pools), this is a regression introduced by the original scripts. Fixing a cosmetic issue is not quite as important as fixing the main regression. The only things left to discuss should not be blockers.

Further discussion is unlikely to make the rest of us think that ZFS should act like LVM and MD RAID. The way those two do things has always been a pain for system administrators and it is not a behavior that we should adopt. I truly regret not catching this under earlier review. Had I thought that the zpool-import script could cause such problems, I would have read it line by line rather than spot checking due to limited time.

@FransUrbo
Contributor

I furiously argue against putting this into the snapshot branches! Not only because it introduces a significant and dangerous (because of untested code/logic) change in behavior, but also because this isn't what we want for the general Linux public.

And if zpool export dirties the uberblocks or whatnot, then that needs to be fixed, because that really sounds like a very serious issue. A separate issue!

@ryao
Contributor

ryao commented Oct 2, 2015

@FransUrbo The behavior shipped in 0.6.5 is broken and it is being deployed to users as they upgrade. If we backport the patch now, a substantial number of systems that have yet to be upgraded will be spared from the pain of dealing with this regression. As for the systems that were already upgraded, the patch has been deployed to Gentoo users.

So far, there have been no problems switching back from the broken behavior to the correct one. This is because the pools are entered into the cachefile at boot and the export on shutdown behavior disappears when the scripts are upgraded, leaving the cachefile in a state where the imported pools are restored by the correct behavior at next boot. Any pools that failed to import or were incorrectly imported by the incorrect behavior are things that the system administrator will need to handle in either case and will be a pain without the patch.

As for zpool export dirtying the uberblock, that is not something that merits fixing. The idea of adopting the LVM/MD behavior goes against the design of ZFS by giving system administrators more work. It would not be adopted by any other platform and it would make merging code harder than it already is. It makes no sense to make more work for us by adopting a bad behavior when the original design from Solaris works better.

@Bronek

Bronek commented Oct 2, 2015

@FransUrbo When you say "we", please explain whether that includes users, or just some developers. I am a user, and I much prefer the idea of a configuration file used to decide which pools will be imported and mounted at boot over importing anything that happens to be available at the moment (and then automatically mounting it, because of pool configuration). If such behaviour were really natural on Linux, we wouldn't have /etc/fstab. Since we want pools used by ZoL to be interchangeable with other platforms, we cannot rely on mountpoint=legacy being universally set either. Ergo, it makes sense to follow what these other platforms do.

@ryao
Contributor

ryao commented Oct 2, 2015

@Bronek A plaintext configuration file does not handle namespace collisions unless GUIDs are used (which is a pain), and it cannot handle GUID collisions (which is a security issue on which I do not wish to elaborate in public). It also requires multiple commands to maintain it and to keep the system in line with updates. The zpool.cache file avoids that and should be the easiest thing to use, provided that the mistakes made porting ZFS to Linux that caused problems are addressed. Similarly, the mountpoint property replaces fstab to provide the same ease when dealing with ZFS filesystems from pools.
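
For context, without a cachefile the usual way to disambiguate two visible pools that share a name is to fall back to the numeric GUID, along the lines of the following (the GUID here is made up for illustration):

    # import a specific pool by its GUID when two visible pools share a name
    zpool import -d /dev/disk/by-id 9876543210987654321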

@Bronek

Bronek commented Oct 2, 2015

@ryao yes I know, by configuration file I was referring to zpool.cache. It's a shame that it's not plain text, but at least zdb can be used to view it.

@fearedbliss

@FransUrbo "Sorry, but what have you been smoking NOW?? How is that any different than what I'm doing?
Other than generating a cache file of course, which you weren't interested in to start with, since you imported the pool without one!"

Lol.. I implemented bliss-initramfs back in 2012 with cachefile support from nearly the beginning of the project. Later on, as @ryao and I discussed this topic over the years, I ended up removing cachefile support (and also hostid support) from bliss-initramfs. After discussing this topic again a few years after that change, I implemented it again. This stuff isn't anything that I haven't already done; it's literally bringing back what I had done years ago and improving it with new ideas.

You should stop assuming you implemented all of these things first.

@fearedbliss

@ryao In bliss-initramfs, what it expects is the following from the bootloader:

linux [kernel] root=tank/gentoo/root

Since I don't use the "bootfs" property and solely rely on the user passing their root dataset directly, would it handle the multipath/SPA stuff correctly?

@ryao
Contributor

ryao commented Oct 2, 2015

@fearedbliss It would have a problem if multiple pools have the same name.

@FransUrbo
Contributor

You should stop assuming you implemented all of these things first.

I never did. It's a well known fact that I only improved (slightly) on what's in ZoL now. Where that code came from, we don't exactly know.

But feel free to avoid the actual question...

@fearedbliss

@ryao Ah I see. Alright we can discuss.

@ryao
Contributor

ryao commented Oct 8, 2015

@behlendorf Would you merge this to HEAD? This patch makes the sysvrc/openrc scripts behave like the systemd units. All distributions I know of that use those scripts, except Debian, have had it backported. Having the regressed behavior in the main repository is problematic for users of Gentoo's 9999 ebuilds that build from git, because they obtain a broken behavior where pools that should not be imported are imported.

The main place where this is a problem involves VM hosts where the export on shutdown behavior combined with this can cause pool corruption from the guest and the host systems importing the pool at the same time. It is also a security hazard on VM hosts where maliciously crafted pools could target zero day bugs in the zpool import routines to gain root or higher privileges on the host.

I find myself between a rock and a hard place in supporting the Gentoo users building from HEAD via the Gentoo live ebuilds. Merging this will not pose problems for Debian users who do not build themselves, while those that do will need to apply this patch manually to get behavior consistent with other ZFS systems. Since @FransUrbo is attached to this behavior, I imagine he would carry a patch in Debian reinstating his preferred behavior after it is merged. Consequently, merging this should pose no problem for the users of the packaging he maintains.

@ryao
Contributor

ryao commented Oct 8, 2015

@behlendorf #3889 appears to have occurred where two nodes had access to the same pools over SAS and they both tried importing the pools at the same time, causing pool corruption. This happened on CentOS, but on distributions where the "import all pools" behavior is used in conjunction with export at shutdown, pool corruption will occur, even if the default spl_hostid is changed from 0 to prevent it.

@ryao
Contributor

ryao commented Oct 9, 2015

@MrStaticVoid Your prediction of a security vulnerability was correct. I have figured out how to exploit this. I have sent the details in a PGP encrypted email to @behlendorf. I think it will be appropriate to write a second patch killing the option for the auto-import behavior. Then @behlendorf can decide what he wants to do.

@jameslikeslinux
Contributor Author

Thanks for looking out ryao. I'm leaving for a trip so I don't have time to submit another pull request, but I gather the older version of this pull request:

https://github.com/MrStaticVoid/zfs/commit/f135bb10161ed8b543aed6cd1c1a3bd052d47597.diff

would be a good starting point, after s/reimport/verbatim import/g. I think it would also be worth looking at the various initrd code bases to see if they are affected by what you found.

@behlendorf
Contributor

@MrStaticVoid thank you for investigating this and posting a fix. This definitely is a serious accidental regression I should have caught when initially reviewing the init scripts. Many of the points you and @ryao raised above I agree with. But most importantly we absolutely need to preserve using the cache file for now as an added layer of protection to ensure systems with shared storage don't end up with a pool imported concurrently by multiple hosts. I'll be merging your patch to the master and @nedbass can cherry-pick it for the release branch and 0.6.5.3.

That said, longer term I still would like to seriously explore replacing (or supplementing) the existing zpool.cache file with a clear text version. One of my major issues with the cache file is that it doesn't in any way separate pool configuration information from pool state information. And by that I mean the geometry of a pool should be split from what actions should be taken for that pool at boot. This is almost certainly just an unfortunate side effect of how the cache file was originally implemented. But since the cache file is opaque and its purpose largely misunderstood this only leads to more confusion.

If someone is particularly interested in this area I'd love to see a plan laid out for how to improve the current situation. For example, I agree with @FransUrbo that on Linux systems some of the arguments for keeping the cache file, such as speeding up the boot, just don't make any sense. Unlike the other platforms all block devices are brought online asynchronously and udev probes them with blkid. It seems to me the optimal thing to do would be to leverage udev to dynamically construct the geometry for all available pools. Then the policy for which pools should be imported, and under what conditions, should reside in a clearly human readable configuration file with sane defaults. This would ensure pools are imported as early as possible, but no sooner, and the policy by which a pool is imported is clearly articulated and visible to users.
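
Purely to illustrate that idea (no such file exists in ZoL today; the path and format below are hypothetical), the policy half could be as simple as:

    # hypothetical /etc/zfs/pool-policy: which pools to import at boot
    # pool          import-at-boot
    tank            yes
    scratch         no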

I wish I personally had time to work on this and investigate all the issues, but the fact of the matter is I don't. And I don't think we need to be constrained by what the other platforms have done. The way a Linux system bootstraps itself is very different from illumos and the BSDs, and we should take advantage of that. This would be a valuable project to work on if anyone is interested.

@behlendorf
Contributor

Merged as:

33df62d zfs-import: Perform verbatim import using cache file

@behlendorf behlendorf closed this Oct 13, 2015
@ryao
Contributor

ryao commented Oct 14, 2015

Unlike the other platforms all block devices are brought online asynchronously and udev probes them with blkid. It seems to me the optimal thing to do would be to leverage udev to dynamically construct the geometry for all available pools.

We could likely address this in a way that it will not matter by preferring /dev/disk/by-id/* symlinks. I have been told that while FreeBSD lists things like /dev/ada0 in the cachefile, it does not actually use that and instead goes by GUID. I have yet to confirm that, but I think something along these lines would achieve this with minimal platform-specific changes.
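
That preference can already be expressed at import time, and zdb (mentioned above) shows what actually got recorded; a small sketch, with the pool name as a placeholder:

    # import using stable by-id device names so those paths end up in the cachefile
    zpool import -d /dev/disk/by-id -aN
    # display the cached configuration, including the per-vdev path entries
    zdb -C tank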

I consider the need for various initramfs software to generate the symlinks to be a blocker, so I have not implemented this yet, but my development in my spare time is heading in that direction.
