Pool moved from FreeNAS to Ubuntu, problems with replacing disk #6011

Closed
Thumper333 opened this issue Apr 13, 2017 · 14 comments

@Thumper333

System information

Type                  Version/Name
Distribution Name     Yakkety
Distribution Version  16.10
Linux Kernel
Architecture
ZFS Version           0.6.5.8-0ubuntu4.1
SPL Version           0.6.5.8-2

Describe the problem you're observing

The zpool was originally created in FreeNAS and then migrated to Ubuntu. Almost immediately I replaced one drive in a mirror. Scrubs on roughly 4TB of data would only run for 2 minutes and then report complete. I found that when replacing the drive, ZFS created the partitions on the new drive differently from the drive it was supposed to mirror.

Describe how to reproduce the problem

create mirrored pool in FreeNAS
remove disks from FreeNAS box (admittedly without using export)
import pool to Ubuntu box
remove drive
install new drive
run zpool replace on new drive (rough commands below)
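
Roughly, the commands behind those steps would be something like this (a sketch; the FreeBSD device names and the <old-device> placeholder are stand-ins):

# on FreeNAS/FreeBSD
zpool create Tank mirror /dev/ada0 /dev/ada1

# on Ubuntu, after moving the disks (the pool was never exported)
sudo zpool import -f Tank

# after physically swapping in the new drive
sudo zpool replace Tank <old-device> /dev/sdd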

Here is how I found the issue: (formatted and trimmed for ease of reading)

sudo fdisk -l

Disk /dev/sdc: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 7127FE7D-E061-11E6-BD1F-3497F600DDAF

Device        Start          End      Sectors  Size  Type
/dev/sdc1      4096      4198399      4194304    2G  FreeBSD swap
/dev/sdc2   4198400  11721043967  11716845568  5.5T  FreeBSD ZFS

------------------------------------------------------------

Disk /dev/sdd: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: E799A1D5-F9B7-C843-AB62-AADC9B0A2180

Device             Start          End      Sectors  Size  Type
/dev/sdd1           2048  11721027583  11721025536  5.5T  Solaris /usr & Apple ZFS
/dev/sdd9    11721027584  11721043967        16384    8M  Solaris reserved 1


Do I...

1.) Back up all data, wipe the pool, and recreate it all? I don't have enough storage to do that without striping 3 disks together with no redundancy.
2.) Fix the existing pool by manually creating partitions on the replacement disk, which has since been wiped by testing with badblocks? (I don't even know if this is possible. The disk is 100% good.)

@Thumper333
Author

While I'm certainly looking for answers on how to fix it, it seemed like a bug to me that ZFS isn't able to match the partitions up correctly with a simple zpool replace command. Is it not supposed to be able to do that when the pool was created on a different platform?

@behlendorf
Contributor

@Thumper333 due to differences in how block devices are handled across the OpenZFS platforms, partitions/slices will be created differently. However, this shouldn't cause any functional problems. All that's required is that the primary partition used by ZFS on each drive be large enough to contain the full contents of the mirrored device. They do not need to be partitioned identically.

In your case that's the /dev/sdc2 and /dev/sdd1 partitions. The /dev/sdd1 partition is slightly larger than /dev/sdc2 since it didn't create a FreeBSD swap partition, but that should be fine.

Could you clarify exactly what the observable problem is? Is it that the scrub took much less time than you expected? If so, might it be because the pool doesn't contain that much data?
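
To check both of those things, something like the following should work (device names are from the fdisk output above; the pool name Tank is assumed):

# compare the sizes of the two data partitions, in bytes
lsblk -b -o NAME,SIZE /dev/sdc2 /dev/sdd1

# see how much data the pool actually holds
zpool list Tank
zfs list -r Tank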

@Thumper333
Author

There is almost 4 terabytes of data on sdc and the scrub takes about two minutes. I also get checksum errors on the mirror every time. I believe it is trying to copy all the information from the large partition over to the swap partition on the other drive. I noticed that when I do zpool status, the two devices it shows in the mirror are sdc2 and sdd. Should it not be showing sdd1?

@behlendorf
Contributor

Historically on Solaris, zpool status has hidden the partition information from users when given a whole device to use. That concept was kept for the Linux version, but due to differences in block device naming conventions, mixing devices created on different platforms can result in different output. You can definitively tell which partitions ZFS is using with the -P option (zpool status -vP); it will output the full path for each vdev.

As for the scrub, I agree 2 minutes doesn't sound reasonable. Are there any errors being logged, either to the console (dmesg) or in the zpool status output? You can also check the zpool events log.
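
For example (the pool name is assumed):

zpool status -vP Tank    # full vdev paths
dmesg | tail -n 50       # recent kernel messages
zpool events -v          # ZFS internal event log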

@Thumper333
Author

Thumper333 commented Apr 14, 2017

I've never seen the dmesg output before; that's good to know about. Here are the last few lines. I'm assuming the numbers on the left are a timestamp, maybe in seconds since reboot?

[   14.966585] e1000e: enp8s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   14.966927] IPv6: ADDRCONF(NETDEV_CHANGE): enp8s0: link becomes ready
[ 8210.111513] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[22481.285923] perf: interrupt took too long (3137 > 3128), lowering kernel.perf_event_max_sample_rate to 63750
[28377.095312] perf: interrupt took too long (3923 > 3921), lowering kernel.perf_event_max_sample_rate to 50750
[37016.333014] ata4.00: request sense failed stat 50 emask 0
[37138.497120] ata4.00: request sense failed stat 50 emask 0
[37672.478613] perf: interrupt took too long (4914 > 4903), lowering kernel.perf_event_max_sample_rate to 40500
[64708.547853] perf: interrupt took too long (6166 > 6142), lowering kernel.perf_event_max_sample_rate to 32250
[108636.747078] systemd[1]: apt-daily.timer: Adding 5h 57min 11.119592s random time.
[108639.297887] systemd[1]: apt-daily.timer: Adding 4h 33min 171.631ms random time.
[108639.670594] systemd[1]: apt-daily.timer: Adding 10h 2min 56.577325s random time.
[108943.483811]  sdd: AHDI sdd1 sdd2 sdd3 sdd4

Also, I used zpool status -vP to see the direct paths, which confirms that they are indeed mapped to the correct partitions, so at least we know that's not the issue.

Here is zpool status -vP:

 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 60K in 0h2m with 0 errors on Tue Apr 11 15:36:29 2017
config:

        NAME           STATE     READ WRITE CKSUM
        Tank           DEGRADED     0     0     0
          mirror-0     DEGRADED     0     0     0
            /dev/sdc2  ONLINE       0     0     0
            /dev/sdd1  OFFLINE      0     0   993

errors: No known data errors

Even after I had cleared the error and scrubbed again, it would do the exact same thing. That's what led me to run badblocks on sdd, which came up fine.

At this point badblocks has completed, and the one disk is obviously sitting there wiped. I would kind of like to get the swap partitions to match at 2G each, but if that's not possible, fine. I'll put the disk back into service and then I can get some more data for you, but how do you want me to do this? Just zpool online sdd, or zpool replace sdd sdd?

Again, there is about 4TB on sdc, so it should take a LONG time to resilver. I've never seen it spend more than a few minutes resilvering or scrubbing... ever. I can pay close attention to whatever you'd like. Just let me know next steps.

Edit for more info

After more research I decided to try zpool replace Tank sdd, and I get this error:

invalid vdev specification
use '-f' to override the following errors:
/dev/sdd does not contain an EFI label but it may contain partition
information in the MBR. 

I don't know how it's going to get the info to create the new swap partition. Will it look at what the "old" sdd had, or will it try to match sdc? Either way, is there a way to force it to match sdc?
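
For reference, a common way to clear the leftover labels and hand the whole disk back to ZFS (assuming /dev/sdd really is the blank replacement disk; this is destructive to anything still on it) is roughly:

sudo wipefs -a /dev/sdd          # remove old filesystem/RAID signatures
sudo sgdisk --zap-all /dev/sdd   # wipe the GPT and protective MBR
sudo zpool replace Tank sdd      # let ZFS partition the disk itself

Note that ZFS on Linux will then create its own layout (a large data partition plus a small reserved one), not a copy of sdc's FreeBSD-style layout; per the comment above, that difference should be harmless.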

Edit for more info

Okay, still something very wrong. I figured out how to copy the partitions from the good drive using fdisk. For those wondering, run fdisk /dev/sdX to open fdisk's interactive utility. I opened it on the good drive and selected the option to write the partition layout to a file, then closed without writing any changes. Then I opened fdisk on the now blank disk, selected the option to load the layout from that file, wrote the changes to the disk, and done.
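
(An equivalent, scriptable way to copy the table is sgdisk's replicate option; this assumes sdc is the good drive and sdd is the blank one, and the second command gives the copy new unique GUIDs so the two disks don't clash:)

sudo sgdisk -R /dev/sdd /dev/sdc   # replicate sdc's partition table onto sdd
sudo sgdisk -G /dev/sdd            # randomize the disk and partition GUIDs on sdd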

So before doing that I had completely tested the replacement drive. I ran zpool detach sdd to remove the drive from the pool, leaving only the good drive with all my data. I then ran fdisk to get the partitions to match, and then I ran zpool attach Tank sdc2 sdd2. The drive immediately started resilvering and everything finally looked like I'd be home free. Then the same thing happened again: resilvering ran for about 2 minutes, and then I got checksum errors and the pool stopped resilvering.

  pool: Tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Apr 15 00:53:53 2017
    2.65G scanned out of 3.14T at 26.3M/s, 34h40m to go
    2.65G resilvered, 0.08% done
config:

        NAME        STATE     READ WRITE CKSUM
        Tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdc2    ONLINE       0     0     0
            sdd2    ONLINE       0     0     0  (resilvering)

errors: No known data errors
root@NAS:~# zpool status Tank
  pool: Tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 2.65G in 0h1m with 0 errors on Sat Apr 15 00:55:44 2017
config:

        NAME        STATE     READ WRITE CKSUM
        Tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdc2    ONLINE       0     0     0
            sdd2    ONLINE       0     0     8

errors: No known data errors

So at this point I tried to just destroy the pool and start from scratch, but it won't let me do that, saying that the pool is busy. Not sure what else to do.
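
For anyone else stuck at the "pool is busy" stage, a rough way to track down what is holding it open (the mountpoint here is an assumption, check zfs get first):

zfs get -r mountpoint Tank      # find where the datasets are mounted
sudo fuser -vm /mnt/Tank        # list processes using that mountpoint
# stop those processes (or the services that own them), then:
sudo zpool export Tank          # or: sudo zpool destroy Tank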

@Thumper333
Author

I did everything I could to save this, and asked around for about two months for help, and nobody could figure out how to save the pool or what it was actually doing. I still believe there was something goofy with how ZFS was handling it, as no matter what I did I ended up with a zpool that would only partially resilver and then constantly throw checksum errors when trying to scrub.

This weekend I finally gave up and forced a destroy of the pool (I finally figured out how to stop the programs accessing it). After double and triple checking that all my data was backed up on a separate drive, I destroyed the pool and created a fresh one. I then rsync'd the data back to the new pool and watched it take all the data. Everything seems to be working fine, but I'm still convinced that there was a bug to be worked out. If anyone wants to contact me for more info, I'm happy to provide any logs or anything else to help with that.

If anyone else finds this thread because they ran into the same problem, here is my advice...

1.) Back up your data - I never lost anything, but I felt really nervous knowing that the second drive in the mirror didn't have anything on it. In my case the data wasn't totally critical and I CAN get it all back, but it would be a major pain to do so.
2.) Test your drive - in my research, I found the best overall testing plan was to offline the questionable drive and then run badblocks on it. This is a program that writes patterns to every sector of the disk and then reads them back to verify everything works. It takes a long time on large drives; mine took about 72 hours to run on 6TB. Sigh. At least my drive came up fine. After that, I ran a long SMART test, another lengthy test, but at least this one finishes more quickly; mine took about 8 hours if I recall. You could probably get away with a short test if you are impatient, but I wanted to be sure. Might as well run short tests on all your drives while you're at it. Heck, run memtest too if you want to be really thorough.
3.) If your drive comes up fine like mine, you have a few options, but in the end none of these worked for me. I tried removing the bad drive from the mirror, matching up the partitions manually with fdisk, then adding the disk back to let it resilver. I ended up with the same 2-minute resilver and then checksum errors again.
4.) In the end, I destroyed the pool, partitioned the disks how I wanted them (with swap space in case I ever move them back to FreeNAS), and created a new mirrored pool (rough commands below). You'll then have no data in it and have to move it from backup to your new pool. One thing to note is that you'll need to specify your mountpoint to mimic your old pool's mount if you want to avoid modifying anything else that was looking in the /mnt folder where FreeNAS put it. It will default to mounting directly under / rather than in /mnt, which really threw me for a loop for a bit.
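
A minimal sketch of what I mean in step 4 (the pool name, mountpoint, disk names, and backup path are just examples; using /dev/disk/by-id paths is generally safer than sdX names):

# recreate the mirror, mounted where the FreeNAS pool used to live
sudo zpool create -o ashift=12 -m /mnt/Tank Tank mirror /dev/sdc /dev/sdd
# if you want FreeNAS-style swap partitions, partition the disks first
# and point zpool create at the big data partitions instead

# copy the data back from the backup drive
sudo rsync -avh /path/to/backup/ /mnt/Tank/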

Good luck. Just remember, all your data is still there until you do a zpool destroy. So as nervous as I was messing with all the commands, I never lost anything until I did the destroy. At that point it is gone (unless there's a way to recover from a destroy that I'm not aware of).

@loli10K
Contributor

loli10K commented Apr 21, 2017

@Thumper333 we now have other users who seem to be having the same issue. I'd like to close this and keep the conversation in one place so we can "pool" (no pun intended) our info about this problem and hopefully solve it.

@errantmind

Just wanted to note that I am having the same issue. I created a pool on FreeBSD and ended up deciding to move to Ubuntu. I got the pool back up and running and can access files just fine, but if I attempt a scrub, it runs for 1 minute and then stops. I have 5TB of data, so it should definitely run for longer than 1 minute. I see no issues in dmesg, nor are any reported by any utilities.

@loli10K
Contributor

loli10K commented Jun 5, 2017

The same issue has been recently reported on the ML: http://list.zfsonlinux.org/pipermail/zfs-discuss/2017-May/028389.html

I'm hoping for some guidance troubleshooting an issue. I'm running Fedora
25 with a 14 disk RAIDZ2 pool (not an optimal arrangement, I know). The
pool was created under FreeNAS 10...

When I run a scrub, it does about 0.02% and then stops without an error.

@errantmind which version of ZFS are you running on Ubuntu? Is this a raidz or a mirror?

@Thumper333
Author

Thumper333 commented Jun 5, 2017

I'm still watching this thread and I don't mind helping, although I did end up buying an extra disk for backup, backing everything up, destroying the pools, wiping the disks, and migrating everything back to new pools created in Ubuntu. If there is anything I can do, or questions I can answer, please ask.

@bunder2015
Contributor

bunder2015 commented Jun 6, 2017

I tried reproducing this on a VM but was unsuccessful. I'm wondering if I'm missing a step here...

zpool attach to single disk pool on fbsd 11.0-release to make the mirror, wait for resilver
reboot into linux sysresccd 0.7.0-rc3
zpool detach disk, wipe the partition table with gdisk
copy partition table from old disk to new disk with gdisk, zpool attach, wait for resilver
zpool scrub

I was able to add the disk and scrub okay... I'd be glad to test again as well.

@loli10K
Contributor

loli10K commented Jun 6, 2017

@bunder2015 there's information in the other issue suggesting this is not reproducible on 0.7.0-rc3 (#6038 (comment)).

Like I said before, it would be nice to close this issue and keep all the information we have in one place, to avoid exactly this.

@errantmind

@loli10K Hey, I've been busy for a while, but I thought I would post some more information; let me know if I can be of more help:
FreeNAS 9 -> Ubuntu 16.04.2 LTS
raid-z2, 8x2TB disk array

sudo apt list --installed | grep zfs :
libzfs2linux/xenial-updates,now 0.6.5.6-0ubuntu17 amd64 [installed,automatic]
zfs-doc/xenial-updates,xenial-updates,now 0.6.5.6-0ubuntu17 all [installed,automatic]
zfs-zed/xenial-updates,now 0.6.5.6-0ubuntu17 amd64 [installed,automatic]
zfsutils-linux/xenial-updates,now 0.6.5.6-0ubuntu17 amd64 [installed]

@loli10K
Contributor

loli10K commented Jul 11, 2017

This should be fixed in 94d353a.
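
For anyone landing here later, a quick way to check which ZFS version a system is actually running (Debian/Ubuntu package names assumed):

modinfo zfs | grep -iw version   # kernel module version
dpkg -l | grep -E 'zfs|spl'      # installed package versions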

@loli10K loli10K closed this as completed Jul 11, 2017