Pool moved from FreeNAS to Ubuntu, problems with replacing disk #6011
Comments
While I'm certainly looking for answers on how to fix it, it seemed like a bug to me that ZFS isn't able to match the partitions up correctly with a simple zpool replace command. Is it not supposed to be able to do that when the pool was created on a different platform?
@Thumper333 due to differences in how block devices are handled across the OpenZFS platforms, partitions/slices will be created differently. However, this shouldn't cause any functional problems. All that's required is that the primary partition used by ZFS on each drive be large enough to contain the full contents of the mirrored device; the drives do not need to be partitioned identically. In your case that's the …

Could you clarify exactly what the observable problem is? Is it that the scrub took much less time than you expected? If so, might it be because the pool doesn't contain that much data?
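For anyone wanting to check this on their own system, a minimal sketch of how to compare the two drives' layouts and confirm the ZFS partition is large enough (device names follow the thread; sgdisk comes from the gdisk package):

    lsblk -o NAME,SIZE,TYPE /dev/sdc /dev/sdd
    sgdisk -p /dev/sdc
    sgdisk -p /dev/sdd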
There is almost 4 terabytes of data on sdc and the scrub takes about two minutes. I also get checksum errors on the mirror every time. I believe it is trying to copy all the information from the large partition over to the swap partition on the other drive. I noticed that when I do zpool status, the two drives it shows in the mirror are sdc2 and sdd. Should it not be showing sdd1?
Historically on Solaris … As for the scrub, I agree 2 minutes doesn't sound reasonable. Are there any errors being logged, either to the console …
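A hedged sketch of commands that would surface any logged errors on a typical ZFS on Linux install (the device names follow the thread):

    dmesg | grep -iE 'error|ata|sd[cd]'
    zpool status -v
    zpool events -v | tail -n 40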
I've never seen the …

Also, I used … here is …

Even after I had cleared the error and scrubbed again, it would do the exact same thing. That's what led me to run badblocks on sdd, which came up fine. At this point badblocks has completed, and the one disk is obviously sitting there wiped. I would kind of like to get the swap partitions to match at 2G each, but if that's not possible, fine. I'll put the disk back into service and then I can get some more data for you, but how do you want me to do this? Just …?

Again, there is about 4TB on sdc, so it should take a LONG time to resilver. I've never seen it spend more than a few minutes resilvering or scrubbing... ever. I can pay close attention to whatever you'd like. Just let me know the next steps.

Edit for more info: After more research I decided to try …
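A hedged sketch of the test-then-reattach sequence described above (the pool name tank and the <old-device-or-guid> placeholder are assumptions, not from the thread):

    # destructive read/write surface test of the wiped replacement disk
    badblocks -wsv /dev/sdd
    # hand the disk back to the pool and let it resilver, then watch progress
    zpool replace tank <old-device-or-guid> /dev/sdd
    zpool status -v tank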
I don't know how it's going to get the info to create the new swap partition. Will it look at what the "old" sdd had, or will it try to match sdc? Either way, is there a way to force it to match sdc?

Edit for more info: Okay, still something very wrong. I figured out how to match the partitions from the good drive using fdisk. For those wondering, run …

So before doing that, I had completely tested the replacement drive. I ran …
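For reference, a hedged sketch of one non-interactive way to clone the good drive's GPT onto the replacement (using sgdisk rather than the fdisk session described above; /dev/sdc is the good drive, /dev/sdd the replacement, and the pool name tank is an assumption):

    # replicate /dev/sdc's partition table onto /dev/sdd (note: -R takes the destination)
    sgdisk -R=/dev/sdd /dev/sdc
    # give the copied table new, unique GUIDs so the two disks don't collide
    sgdisk -G /dev/sdd
    # then target the ZFS partition (p2 in the FreeNAS layout) explicitly
    zpool replace tank <old-device-or-guid> /dev/sdd2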
So at this point I tried to just destroy the pool and start from scratch, but it won't let me do that, saying that the pool is busy. Not sure what else to do.
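A "pool is busy" error usually means something still has files open on the pool's datasets. A minimal sketch of how to track that down, assuming a mountpoint of /tank:

    # list processes holding files open on the pool's filesystems
    fuser -vm /tank
    lsof /tank
    # stop those processes (or the services, e.g. Samba/NFS, using the mount), then
    zpool destroy tank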
I did everything I could to save this, and asked around for about two months for help, and nobody could figure out how to save the pool, or what it was actually doing. I still believe there was something goofy with how ZFS was handling it, as no matter what I did I ended up with a zpool that would only partially resilver, and then constantly have checksum errors when trying to scrub.

This weekend I finally gave up and forced a destroy of the pool (I was able to finally figure out how to stop programs accessing it). After going through and double and triple checking that all my data was backed up on a separate drive, I destroyed the pool and created a fresh one. I then rsync'd the data back to the new pool and watched it take all the data. Everything seems to be working fine, but I'm still convinced that there was a bug to be worked out. If anyone wants to contact me for more info, I'm happy to provide any logs or anything to help with that.

If anyone else finds this thread because they came up with the same problem, here is my advice: 1.) Back up your data. I never lost anything, but I felt really nervous knowing that my 2nd drive in the mirror didn't have anything on it. In my case, the data wasn't totally critical and I CAN get it all back, but it would be a major, major pain to do so.

Good luck. Just remember, all your data is still there until you do a zpool destroy. So as nervous as I was messing with all the commands, I never lost it until I did the destroy. It will be gone at that point though (unless there's a way to recover from this which I'm not aware of).
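A hedged sketch of the back-up / recreate / restore sequence described above (pool name, mountpoints, and the by-id device names are placeholders, not from the thread):

    # copy everything off the pool first
    rsync -aHAX --progress /tank/ /mnt/backup/tank/
    # destroy and recreate the mirror, using stable by-id device names
    zpool destroy tank
    zpool create -o ashift=12 tank mirror \
        /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2
    # copy the data back and verify it
    rsync -aHAX --progress /mnt/backup/tank/ /tank/
    zpool scrub tank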
@Thumper333 we now have other users that seem to be having your same issue. I'd like to close this and keep the conversation in one place so we can "pool" (no pun intended) our info about this problem and hopefully solve it.
Just wanted to note that I am having the same issue. I created a pool on FreeBSD and ended up deciding to move to Ubuntu. I got the pool back up and running and can access files just fine, but if I attempt to scrub it runs for 1 minute then stops. I have 5TB of data so it should definitely run for longer than 1 minute. I see no issues in dmesg nor are any reported to me using any utilities.
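One hedged way to double-check how much data a scrub actually examined (works for any pool name):

    zpool status -v             # the scan line reports how much was scanned/repaired and how long it took
    zpool history | grep scrub  # when scrubs were started
    zfs list -o name,used       # how much data a full scrub should have to read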
The same issue has been recently reported on the ML: http://list.zfsonlinux.org/pipermail/zfs-discuss/2017-May/028389.html
@errantmind which version of zfs are you running on ubuntu? Is this a raidz or mirror?

I'm still watching this thread and I don't mind helping, although I did end up buying an extra disk for backup, backing everything up, destroying the pools, wiping the disks, and migrating everything back to new pools created in ubuntu. If there is anything I can do, or questions I can answer, please ask.
I tried reproducing this on a VM but was unsuccessful. I'm wondering if I'm missing a step here...
I was able to add the disk and scrub okay... I'd be glad to test again as well.

@bunder2015 there's information in the other issue suggesting this is not reproducible on 0.7.0-rc3 (#6038 (comment)). Like I said before, it would be nice to close this issue and keep all the information we have in one place to avoid exactly this.
@loli10K Hey, been busy for a while but I thought I would post some more information, let me know if I can be of more help. sudo apt list --installed | grep zfs: …
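For anyone checking their own install, a hedged sketch of equivalent version checks (assuming the zfs kernel module is loaded):

    dpkg -l | grep -E 'zfs|spl'
    cat /sys/module/zfs/version
    modinfo zfs | grep -iw version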
This should be fixed in 94d353a.
System information
Describe the problem you're observing
Zpool originally created in FreeNAS, then migrated to Ubuntu. Almost immediately replaced one drive in a mirror. Scrubs of the roughly 4TB of data would only run for 2 minutes and then complete. I found that when replacing the drive, ZFS created the partitions on the new drive differently than on the drive it was supposed to mirror.
Describe how to reproduce the problem
create mirrored pool in FreeNAS
remove disks from FreeNAS box (admittedly without using export)
import pool to Ubuntu box
remove drive
install new drive
run zpool replace on the new drive (see the command sketch below)
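A minimal sketch of the Ubuntu-side commands for the steps above (the pool name tank, the device names, and the <old-device-or-guid> placeholder are assumptions; the forced import reflects the pool never having been exported from FreeNAS):

    # on the Ubuntu box, after moving the disks over
    zpool import -f tank
    zpool status -v tank
    # pull the old drive, install the new one, then
    zpool replace tank <old-device-or-guid> /dev/sdd
    zpool status -v tank    # the partitions on the new drive are created by this replace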
Here is how I found the issue (formatted and trimmed for ease of reading): …
Do I...
1.) Back up all data, wipe the pool, and recreate it all? I don't have enough storage to do so without striping 3 disks together with no redundancy.
2.) Fix the existing pool by manually creating partitions on the replacement disk, which has now been wiped by testing with badblocks? (I don't even know if this is possible. The disk is 100% good.)