Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.6.5 Regression: Exporting the pool on shutdown in dracut dirties vdev labels and pollutes zpool history. #3875

Closed
ryao opened this issue Oct 2, 2015 · 11 comments
Labels
Component: Dracut dracut integration
Milestone

Comments

@ryao
Copy link
Contributor

ryao commented Oct 2, 2015

The new dracut code that does pool export on shutdown in dracut dirties the vdev labels, which breaks zpool import -T and zpool import -F because the kernel cannot distinguish between a vdev configuration change and the export. It also pollutes the zpool history with commands that the user never actually ran on the pool.

I did not pay too much attention to what was happening in the dracut module because people on Gentoo use genkernel and people at work do not worry about / on ZFS, but I am starting to pay attention because the incorrect behavior could make catastrophic failures that could be recovered with some minor assistance from the ZoL project irrecoverable without major assistance.

I tried export on shutdown in 2012 in Gentoo and abandoned it after discovering the problems it caused:

https://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/sys-fs/zfs/files/zfs-shutdown?view=log&pathrev=MAIN

The export on shutdown behavior in the dracut zfs module should be removed.

@ryao ryao changed the title Regression: Exporting the pool on shutdown in dracut dirties vdev labels and pollutes zpool history. 0.6.5 Regression: Exporting the pool on shutdown in dracut dirties vdev labels and pollutes zpool history. Oct 2, 2015
@behlendorf
Copy link
Contributor

@ryao I don't believe this is quite so clear cut. Cleanly exporting the pool resolves a whole raft of other issues which will be reintroduced if we revert this. See commit 07a3312.

And while it's true that cleanly exporting the pool complicates using zpool import -T and zpool import -F I'd argue that's an issue with those utilities. We should fix them.

@behlendorf behlendorf added this to the 0.7.0 milestone Oct 2, 2015
@ryao
Copy link
Contributor Author

ryao commented Oct 3, 2015

@behlendorf All of those issues would be correctedly addressed by the the behavior suggested in #3876 without requiring any divergence from Illumos.

Alternatively, zpool export -F could be used, but that would not fix the history issue. We should not be exporting the pool at all though. The export command is only meant to be used to say that the system administrator does not want the system to own the pool anymore.

@ryao
Copy link
Contributor Author

ryao commented Oct 3, 2015

@behlendorf I am rewriting the dracut ZFS module to fix this issue, #3876 and the unreported issue of pools faulting on import when a vdev is added. The WIP code is in my git repository's dracut branch:

https://github.com/ryao/zfs/tree/dracut

I will open a pull request after it has been properly tested.

@ryao
Copy link
Contributor Author

ryao commented Oct 3, 2015

Pull request #3879 is opened with a tentative fix.

@FransUrbo
Copy link
Contributor

This isn't a regression. It's been doing this "for years" and as far as I can tell, no one have ever complained about this - "if it ain't broken, don't fix it".

@ryao
Copy link
Contributor Author

ryao commented Oct 8, 2015

This is a regression that breaks multiple things. I tried this technique in 2012 and abandoned it because it caused problems such as this. ZFS scripts should never execute an export operation.

@ryao
Copy link
Contributor Author

ryao commented Oct 8, 2015

@behlendorf As I said in #3800, #3889 appears to have occurred where two nodes had access to the same pools over SAS and they both tried importing the pools at the same time, causing pool corruption. This happened on CentOS, but on distributions where the "import all pools" behavior is used in conjunction with either export at shutdown, pool corruption will occur, even if the default spl_hostid is changed from 0 to prevent it.

This export on shutdown behavior needs to be removed. It breaks the mechanism meant to keep ZFS safe in multipath environments. Do we need to wait for data loss to occur in production before we address this?

@FransUrbo
Copy link
Contributor

The export isn't the problem obviously! it's the import.

@ryao
Copy link
Contributor Author

ryao commented Oct 9, 2015

@FransUrbo Being at risk for things going south by default is something we should never do. Anyway, I have highlighted enough problems that I am confident this behavior is broken and is one of several things that must go. If you want to keep the broken behavior in Debian, I cannot stop you from doing that, but there are very good reasons why things were done they way that they were done in Solaris and being on Linux does not invalidate them.

@behlendorf behlendorf added the Component: Dracut dracut integration label Apr 29, 2016
@behlendorf
Copy link
Contributor

Proposed fix in #5238

@behlendorf
Copy link
Contributor

Closing. Commit 7e8a2d0 prevent dracut from dirtying label on export.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Dracut dracut integration
Projects
None yet
Development

No branches or pull requests

3 participants