Mistake adding log device as single-drive vdev seems unrecoverable #6907
Well, since Solaris ZFS ensures that the …
If you're saying that is the documented behaviour of ZFS on Linux, then this is actually a bug, since the exact commands I showed above demonstrate a situation in which that check is not working.
@tesujimath So, presuming for the purpose of discussion that at some point you did run zpool add without -f and it added the device, could you please share what ZoL version was on the machine at the time? (The contents of "zpool history | grep zpool" would probably also be useful, but also might have been absorbed in the morass of other commands often run on a pool in the intervening period.)
@kpande please believe me. I'm not making this up. Here's my recent history:
My goal was to rename the log device from its raw device name to its vdev-id alias. @rincebrain The information you request is in the original issue comment. Or did I misunderstand you?
@tesujimath One of the main things that is unusual here is that, for non-{log,cache} vdevs, ZFS orders them by when they were added to the pool, so the status output you're seeing suggests you added 5 of the raidz vdevs after the fateful zpool add command. That's why I was asking for confirmation that you are running the same ZoL version now that you were when you added the device. The fact that it's showing up this way... is fascinating.
@rincebrain Yep, upgraded earlier today from ZFS 0.6.5.9 to 0.7.3, and 0.7.3 is definitely what was running when I erroneously added H35 to my pool. I rebooted twice after the upgrade before doing this, so there's no chance of an old zfs kernel module hanging around. I'm using zfs-kmod, BTW. Fascination is not my dominant emotion just now ...
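(For anyone following along who wants to verify the same thing on their own system, here is a hedged sketch of how one might confirm which zfs module is actually loaded; these are generic Linux/ZoL commands, not ones taken from this report.)

```sh
cat /sys/module/zfs/version      # version string of the currently loaded zfs module
modinfo zfs | grep -iw version   # version of the on-disk module that modprobe would load
dmesg | grep -i "zfs:"           # the banner logged when the module was loaded
```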
@kpande Is it relevant that my zpool version hasn't been upgraded since 0.6.5.9, so the new zpool features in 0.7.3 have not been activated in the zpool?
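(As a side note, a quick hedged sketch of how to inspect a pool's feature-flag state; "tank" is a placeholder pool name, not the pool from this report.)

```sh
zpool get all tank | grep feature@   # shows each feature as disabled, enabled, or active
zpool upgrade                        # with no arguments, lists pools whose features are not all enabled
```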
@tesujimath Occupational hazard, my day job is primarily "huh, why did it do that", even when it's a five-alarm fire. So, a few remarks, before I go on:
The fact that you made the pool on 0.6.5.9 but did the remove and add on the pool after upgrading to 0.7.3 is useful for reproduction, but we won't know if it's relevant to why this happened until after we get it reproducing somewhere else.
@rincebrain Thanks for those ideas. I am mulling it over. Actually, the zpool was created several years ago, on ZFS 0.6.0-rc14 I think, and then zpool upgraded over the years. Current feature flags are these:
@kpande I managed to reproduce the problem on a test server, with a brand-new zpool. Here is the sequence that exhibits the nasty behaviour: essentially, we create a zpool with a log device, extend the pool, then remove the log device and add it as a standalone vdev (without -f).
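A minimal sketch of that sequence, using file-backed vdevs and placeholder paths rather than the original hardware (the exact commands and device names from the report were not captured in this extract, so this is a reconstruction of the described steps, not a copy of them):

```sh
# Scratch files standing in for disks (assumption: file vdevs trigger the same check as real disks).
truncate -s 1G /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4 /var/tmp/d5 /var/tmp/d6 /var/tmp/slog

# 1. Create a pool with one raidz1 vdev and a separate log device.
zpool create testpool raidz1 /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 log /var/tmp/slog

# 2. Extend the pool with a second raidz1 vdev (per the later comments, this step is needed for the bug to appear).
zpool add testpool raidz1 /var/tmp/d4 /var/tmp/d5 /var/tmp/d6

# 3. Remove the log device, then re-add the same file WITHOUT the "log" keyword and without -f.
zpool remove testpool /var/tmp/slog
zpool add testpool /var/tmp/slog   # expected: refusal due to mismatched replication; observed on 0.7.3: it succeeds

zpool status testpool              # the file now shows up as a single-disk top-level data vdev
```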
And just to be clear, here's my version info:
For reference, you can run the test locally from the ZoL source tree.
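One way to run an individual test out of a built source tree is via the test-suite wrapper script (a hedged sketch; the exact flags and test paths can vary between releases):

```sh
# From the top of the ZoL checkout, after ./configure && make:
./scripts/zfs-tests.sh -t tests/functional/cli_root/zpool_add/zpool_add_010_pos
```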
Here's an excerpt from the log.
I'm reopening this issue until this is understood.
I can confirm that doing the following reproduces this on 0.7.3, though I was confused to discover very strange failures when trying to run even the existing zfs-tests/.../zpool_add/ tests on my vanilla CentOS 7 VM.
The additional vdev after the log appears necessary, as doing this without it complains as expected on 0.7.3.
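For contrast, a sketch of the control case under the same file-vdev assumptions as the earlier sketch: if the second raidz vdev is never added, the add is refused as expected, which fits the eventual fix's explanation that a leftover hole in the pool configuration is what defeats the check.

```sh
truncate -s 1G /var/tmp/e1 /var/tmp/e2 /var/tmp/e3 /var/tmp/elog
zpool create testpool2 raidz1 /var/tmp/e1 /var/tmp/e2 /var/tmp/e3 log /var/tmp/elog

zpool remove testpool2 /var/tmp/elog
zpool add testpool2 /var/tmp/elog   # refused without -f: mismatched replication level (raidz pool, plain disk vdev)
```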
Who knew, mixing git master's zfs-tests with 0.7.3 doesn't work well. Shocking, I know. I haven't opened a PR because I haven't done the linting and cleanup yet, but you can find a test in https://github.com/rincebrain/zfs/tree/6907_test
When the pool configuration contains a hole due to a previous device removal ignore this top level vdev. Failure to do so will result in the current configuration being assessed to have a non-uniform replication level and the expected warning will be disabled. The zpool_add_010_pos test case was extended to cover this scenario.
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#6907
Proposed fix in #6911 with a shamelessly stolen version of @rincebrain's test case.
When the pool configuration contains a hole due to a previous device removal ignore this top level vdev. Failure to do so will result in the current configuration being assessed to have a non-uniform replication level and the expected warning will be disabled. The zpool_add_010_pos test case was extended to cover this scenario.
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #6907
Closes #6911
I mistyped the zpool command to add a log device, so it got added as a single-disk vdev alongside all my raidz1 vdevs. Each of those is around 5 TB, and the wannabe log device is an 8 GB ZeusRAM drive (i.e. tiny).
I now can't remove the device, and my zpool is in a really vulnerable state. And it has no log device.
This is a busy production fileserver, and since there is 56TB of data in this zpool, copying the data to another fileserver so I can destroy and re-create the zpool is very unattractive. I see #3371 would be a solution. Alternatively, perhaps I could install another OpenZFS implementation (such as Delphix), and use that to recover my zpool, then revert back to ZFS on Linux. Or use some development branch of ZFS on Linux for the recovery, if such a thing exists.
Any suggestions here on what would be possible will be gratefully received.
System information
Describe the problem you're observing
I mistakenly added a single drive to a zpool as a vdev rather than a log device. Now I can't remove it.
Describe how to reproduce the problem
[Edit: See later comment for reproducible sequence]
I mistyped the command to add a log device to my pool. Instead of this:
I typed this:
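The actual command lines were not preserved in this extract; as a hedged illustration (the pool name "tank" is a placeholder, and H35 is the vdev-id alias mentioned above), the two forms would look roughly like this:

```sh
# Intended: add the device as a separate intent log (SLOG).
zpool add tank log H35

# Mistyped: the "log" keyword was omitted, so the device became a new
# single-disk top-level data vdev alongside the raidz1 vdevs.
zpool add tank H35
```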
Trying to remove it gives me this error:
Here's what my zpool looks like now: