cannot upgrade zfs from 0.7.10 to 0.7.11 while resilvering pool #7909
Comments
This is from journalctl for that boot:
After resilvering and booting into kernel 4.18.7-arch1-1-ARCH with zfs 0.7.11: zpool status -v
errors: No known data errors
zdb:
As it looks, this is likely just another case of a stale /etc/zfs/zpool.cache getting in the way (again). Have you tried to move it out of the way prior to a manual import?
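A minimal sketch of that workaround, using standard commands and assuming the pool name tank used elsewhere in this thread:

mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.old
zpool import -d /dev/disk/by-id tank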
I've moved /etc/zfs/zpool.cache to /etc/zfs/zpool.cache.old and the output of zpool import is:
Edit: Tried both with lts and latest kernel (4.14.69-1-lts, 4.18.7-arch1-1-ARCH)
Tried bringing the device offline and online and it seems like it is ok now:
zpool offline tank wwn-0x50014ee2baaf6436
zpool status
pool: tank
zpool online tank wwn-0x50014ee2baaf6436
zpool status
pool: tank
errors: No known data errors
And after reboot with the lts kernel and zfs 0.7.11:
pool: tank
On the lts kernel and 0.7.10 the pool mounts fine (zpool status):
errors: No known data errors
Make sure that the zpool.cache in your 0.7.11 initramfs is current (i.e. from after the pool has been cleanly imported by a 0.7.11 zfs version). Or do as I did on my systems and hack zpool.cache out of the initramfs: either by giving the ... Both will force zpool import to do a real import instead of loading potential garbage from a stale file. You might need to add ...
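For reference, one way to refresh the cache file after a clean import under 0.7.11 (a sketch assuming an Arch-style mkinitcpio setup; adjust for your initramfs tooling):

zpool import -d /dev/disk/by-id tank
zpool set cachefile=/etc/zfs/zpool.cache tank
mkinitcpio -P

The first two commands rewrite /etc/zfs/zpool.cache from a clean 0.7.11 import; rebuilding the initramfs then picks up the fresh copy instead of a stale one.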
It is not the root pool, luckily... just the pool for /home. I've made sure to delete /etc/zfs/zpool.cache before creating the initramfs with 0.7.11 (and I've removed the copying of zpool.cache as well, just to be sure). After reboot, the pool was not imported and I couldn't import it by hand due to corrupt metadata:
pool: tank
On zfs 0.7.10 the pool works just fine (although it had 2 files with permanent errors after scrubbing).
The same is happening with the zfs-linux-lts (version 0.7.11) package from https://github.com/archzfs/archzfs/wiki:
[root@rakef ~]# zpool import -f -d /dev/disk/by-id/ tank
Couldn't this be affected by the fact that I'm using the whole device and created the pool under 0.7.10? And in module/zfs/vdev_disk.c there was this code in 0.7.10:
if (wholedisk) {
@shartse (original commit) or @tonyhutter (having reverted it) might have a clue why 0.7.11 wouldn't import your 0.7.10 pool - I don't really have one. The 0.7.10 bug should only matter when creating the partitions; as far as I understood it, it resulted in partitions being created too big (beyond the end of the device). Could you post the partition table of the affected disk?
Disk /dev/sdb: 7814037168 sectors, 3.6 TiB
Number  Start (sector)    End (sector)  Size       Code  Name
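As an aside, one way to sanity-check the geometry (a sketch, not from the original thread) is to compare the last partition's end sector against the device size in 512-byte sectors:

blockdev --getsz /dev/sdb
sgdisk -p /dev/sdb

The end sector of the last partition should stay below the device size reported by blockdev, leaving room for the 33 sectors of the backup GPT at the end of the disk.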
Looks good to me, apart from the protective partition no longer being at the absolute end of the drive. Could you try to ...
[root@rakef ~]# zpool import -d /pool/
Does the pool still function on 0.7.10, or is it rejected that way for both 0.7.10 and 0.7.11 now? I would guess that the reverted logic changes in 2a16d4c might have written an incorrect set of size values that work enough for the flawed calculations in it to do the right thing, but then when something without that flawed logic tries to open it, it decides rightfully that those values are wearing flaming underwear on their heads. On 0.7.11, zdb -e -vv tank might eventually say something useful about why it's insane, or comparing the values from that on 0.7.10 and 0.7.11 might. (There are probably more precise ways to do this, but it's not my forte, so I'm iterating on the logic.)
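For anyone following along, a sketch of the kind of comparison being suggested here, using standard zdb invocations (the device path is illustrative, not taken from the thread):

zdb -e -vv tank
zdb -l /dev/disk/by-id/wwn-0x50014ee2baaf6436-part1

The first reads the exported pool's configuration directly from the devices; the second dumps the on-disk vdev labels (asize, ashift, guids, ...). Running both under 0.7.10 and again under 0.7.11 and diffing the output should show which values the two versions disagree on.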
On 0.7.10 it seems to be fine:
The only difference in the zdb -e -vv output is that on 0.7.10 it contains hostid and hostname and complains zdb: can't open 'tank': File exists, whereas on 0.7.11 it complains zdb: can't open 'tank': Input/output error.
I ran strace zpool import on both 0.7.10 and 0.7.11. On 0.7.10:
On 0.7.11 it opens /dev/zfs and:
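A sketch of how such a comparison can be captured for diffing (the log file names are arbitrary):

strace -f -o import-0.7.10.log zpool import -d /dev/disk/by-id tank
strace -f -o import-0.7.11.log zpool import -d /dev/disk/by-id tank
diff import-0.7.10.log import-0.7.11.log

The two traces have to be taken under the respective zfs versions, so a reboot (or module swap) is needed between them.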
@artee666 I'm sorry you're running into problems importing your pool. Just so I'm clear, was this the sequence of events?:
@tonyhutter AIUI the sequence is
Between openzfs/zfs#7909, openzfs/zfs#7899, openzfs/zfs#7906 and http://list.zfsonlinux.org/pipermail/zfs-discuss/2018-September/032318.html, it seems like 0.7.10 should be clearly marked as "bad".
I tried reproducing your error on some 8TB disks w/4k sectors (ashift=12), but no luck. Perhaps my disk geometry doesn't hit the issue. Here are the steps I did:
My only suggestion is to detach the disk, wipe the label, and then try re-attaching it while the pool is running 0.7.11.
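A sketch of that suggestion with standard zpool commands; the wwn device name is taken from earlier in this thread and <remaining-disk> is a placeholder, so double-check both before running anything destructive:

zpool detach tank wwn-0x50014ee2baaf6436
zpool labelclear -f /dev/disk/by-id/wwn-0x50014ee2baaf6436
zpool attach tank <remaining-disk> /dev/disk/by-id/wwn-0x50014ee2baaf6436

detach drops the incomplete mirror half, labelclear wipes its old ZFS label, and attach re-adds it so the resilver restarts entirely under 0.7.11.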
I think I created the one-disk pool with 0.7.10, wrote data to it, attached another disk to make a mirrored pair, and rebooted into 0.7.11 while it was still resilvering. I've since reinstalled my machine and now I'm running Ubuntu with 0.7.5. So far so good :-)
Closing. 0.7.10 was marked a dud, and 0.7.11 was released with the fix. Only disks added/replaced when running 0.7.10 with certain geometries were affected.
System information
Distribution Name: Archlinux
Distribution Version: latest
Linux Kernel: 4.14.69-1-lts
Architecture: x64
ZFS Version: 0.7.10-1
SPL Version: 0.7.10-1
Describe the problem you're observing
I have a pool with one disk (DISK1) and attached another disk (DISK2) to it. It started to resilver. I compiled zfs 0.7.11 and spl 0.7.11 and rebooted. The pool couldn't be imported because one device was missing. I tried to import it manually by issuing zpool import -f -d /dev/disk/by-id, but the output said that DISK1 (with all the original data) had corrupted data. Going back to zfs 0.7.10 fixed this for me and now the pool continues to resilver.
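For context, that sequence corresponds roughly to the following commands (a hedged reconstruction of the report above; DISK1 and DISK2 stand in for the real /dev/disk/by-id names):

zpool create tank DISK1
zpool attach tank DISK1 DISK2
zpool import -f -d /dev/disk/by-id

The pool is created on the whole first device under 0.7.10, the second device is attached to form a mirror, and the machine is rebooted into 0.7.11 while the resilver is still running; the import attempt afterwards is the step that reports DISK1 as having corrupted data.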