-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
0.6.5 Regression: Exporting the pool on shutdown in dracut dirties vdev labels and pollutes zpool history. #3875
Comments
@ryao I don't believe this is quite so clear cut. Cleanly exporting the pool resolves a whole raft of other issues which will be reintroduced if we revert this. See commit 07a3312. And while it's true that cleanly exporting the pool complicates using |
@behlendorf All of those issues would be correctedly addressed by the the behavior suggested in #3876 without requiring any divergence from Illumos. Alternatively, zpool export -F could be used, but that would not fix the history issue. We should not be exporting the pool at all though. The export command is only meant to be used to say that the system administrator does not want the system to own the pool anymore. |
@behlendorf I am rewriting the dracut ZFS module to fix this issue, #3876 and the unreported issue of pools faulting on import when a vdev is added. The WIP code is in my git repository's dracut branch: https://github.com/ryao/zfs/tree/dracut I will open a pull request after it has been properly tested. |
Pull request #3879 is opened with a tentative fix. |
This isn't a regression. It's been doing this "for years" and as far as I can tell, no one have ever complained about this - "if it ain't broken, don't fix it". |
This is a regression that breaks multiple things. I tried this technique in 2012 and abandoned it because it caused problems such as this. ZFS scripts should never execute an export operation. |
@behlendorf As I said in #3800, #3889 appears to have occurred where two nodes had access to the same pools over SAS and they both tried importing the pools at the same time, causing pool corruption. This happened on CentOS, but on distributions where the "import all pools" behavior is used in conjunction with either export at shutdown, pool corruption will occur, even if the default spl_hostid is changed from 0 to prevent it. This export on shutdown behavior needs to be removed. It breaks the mechanism meant to keep ZFS safe in multipath environments. Do we need to wait for data loss to occur in production before we address this? |
The export isn't the problem obviously! it's the import. |
@FransUrbo Being at risk for things going south by default is something we should never do. Anyway, I have highlighted enough problems that I am confident this behavior is broken and is one of several things that must go. If you want to keep the broken behavior in Debian, I cannot stop you from doing that, but there are very good reasons why things were done they way that they were done in Solaris and being on Linux does not invalidate them. |
Proposed fix in #5238 |
Closing. Commit 7e8a2d0 prevent dracut from dirtying label on export. |
The new dracut code that does pool export on shutdown in dracut dirties the vdev labels, which breaks
zpool import -T
andzpool import -F
because the kernel cannot distinguish between a vdev configuration change and the export. It also pollutes thezpool history
with commands that the user never actually ran on the pool.I did not pay too much attention to what was happening in the dracut module because people on Gentoo use genkernel and people at work do not worry about / on ZFS, but I am starting to pay attention because the incorrect behavior could make catastrophic failures that could be recovered with some minor assistance from the ZoL project irrecoverable without major assistance.
I tried export on shutdown in 2012 in Gentoo and abandoned it after discovering the problems it caused:
https://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/sys-fs/zfs/files/zfs-shutdown?view=log&pathrev=MAIN
The export on shutdown behavior in the dracut zfs module should be removed.
The text was updated successfully, but these errors were encountered: