
Permanent errors have been detected in the following files with clean scrub and no other errors #13859

Open
rkeiii opened this issue Sep 9, 2022 · 3 comments
Labels: Component: Encryption ("native encryption" feature) · Component: Send/Recv ("zfs send/recv" feature) · Type: Defect (incorrect behavior, e.g. crash, hang)

Comments

rkeiii commented Sep 9, 2022

System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  22.04
Kernel Version        5.15.0-47-generic
Architecture          Intel x64
OpenZFS Version       2.1.5-1~22.04.york0 (zfs version also lists zfs-kmod-2.1.4-0ubuntu0.1!?!?)

Describe the problem you're observing

After a power loss event I am unable to mount most of my ZFS filesystems. I have performed at least three scrubs now. The output of the zpool status command claims that a device experienced an error, but I can find no information about which device or when. I'm wondering if this is a bug, given the seemingly inconsistent information from zpool status: after each scrub, zpool status -v again reports a clean scrub with 0 errors. However, when I try zfs mount -a I get the following:

rkeiii@ate:~$ sudo zfs mount -a
cannot mount 'bits/enc/ghd': Input/output error
cannot mount 'bits/enc/vmware': Input/output error
cannot mount 'bits/enc/downloads': Input/output error
cannot mount 'bits/enc/home': Input/output error
cannot mount 'bits/enc/backups': Input/output error
cannot mount 'bits/enc/personal': Input/output error

zpool status -v output

rkeiii@ate:~$ sudo zpool status -v
  pool: bits
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 02:58:52 with 0 errors on Fri Sep  9 02:34:04 2022
config:

	NAME                                    STATE     READ WRITE CKSUM
	bits                                    ONLINE       0     0     0
	  raidz2-0                              ONLINE       0     0     0
	    ata-WDC_WD100EMAZ-00WJTA0_JEG7NHAN  ONLINE       0     0     0
	    ata-WDC_WD100EMAZ-00WJTA0_2YJ0RXPD  ONLINE       0     0     0
	    ata-WDC_WD100EMAZ-00WJTA0_2YHZ6URD  ONLINE       0     0     0
	    ata-WDC_WD100EMAZ-00WJTA0_2YJ0K1MD  ONLINE       0     0     0
	    ata-WDC_WD100EMAZ-00WJTA0_2YHZH64D  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        bits/enc/backups:<0x0>
        bits/enc/vmware:<0x0>
        bits/enc/personal:<0x0>
        bits/enc/downloads:<0x0>
        bits/enc/home:<0x0>
        bits/enc/ghd:<0x0>
rkeiii@ate:~$

I was able to track down my original pool setup commands and the commands I used to migrate the data from unencrypted to encrypted ZFS filesystems:

pool create command used originally (4-5 years ago)

sudo zpool create -f bits raidz2 sda sdb sdc sdd sde

zfs create command for the encrypted fs

sudo zfs create -o compression=lz4 -o encryption=on -o keyformat=passphrase bits/enc

zfs send/recv command used to transfer the data from unencrypted ZFS FS to encrypted ZFS FS

sudo zfs send -Rw bits/downloads@zfs-auto-snap_frequent-2019-08-25-0215 | mbuffer -s 128k -m 4G | sudo zfs recv bits/enc/downloads

Describe how to reproduce the problem

I am unsure what led to this. Possibilities include:

  • I originally migrated non-encrypted ZFS datasets from within the same pool to encrypted ZFS datasets
  • The power loss event (but scrub and status are willing to report no issues?)

Include any warning/errors/backtraces from the system logs

@rkeiii rkeiii added the Type: Defect Incorrect behavior (e.g. crash, hang) label Sep 9, 2022
rincebrain (Contributor) commented Sep 10, 2022

Native encryption strikes again.

(The versions in zfs version differ because you're running the kernel module that shipped with your Ubuntu install and the userland from, I'm going to not-really-guess, jonathonf's PPA - you need the zfs-dkms package from that PPA to run the newer kernel module too...)

I'd bet at least a nickel that the problem is the same as #13521 and #13709, so the terrible workaround I suggested there will probably work here too.

@rincebrain rincebrain added Component: Send/Recv "zfs send/recv" feature Component: Encryption "native encryption" feature labels Sep 10, 2022
rkeiii (Author) commented Sep 10, 2022

@rincebrain Thank you for the workaround! That worked like a charm. I'm including the exact commands I used below for others' reference, in case they run into this:

root@ate:~# zfs snapshot bits/enc/downloads/tv@recover1
root@ate:~# zfs snapshot bits/enc/downloads/tv@recover2
root@ate:~# zfs send --raw -i bits/enc/downloads/tv@recover1 bits/enc/downloads/tv@recover2 > /bits/recover_downloads_tv
root@ate:~# zfs rollback -r bits/enc/downloads/tv@recover1
root@ate:~# zfs receive -F -v bits/enc/downloads/tv < /bits/recover_downloads_tv
receiving incremental stream of bits/enc/downloads/tv@recover2 into bits/enc/downloads/tv@recover2
received 1.31K stream in 1 seconds (1.31K/sec)
root@ate:~# sudo zfs mount -a

Also here's a gist with a convenient little script I cobbled together to do this (because I had 15+ afflicted filesystems): https://gist.github.com/rkeiii/0fe05fdcee6f520c208280acbf2b49ea

The script is intended to be invoked as "./recover $zfs_fs_name"
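For reference, the rollback-and-resend workaround can be sketched as a small script like the one below. This is a hypothetical reconstruction, not the linked gist: the run() helper and the DRY_RUN flag are additions so the commands can be previewed without touching a real pool.

```shell
#!/bin/sh
# Hypothetical sketch of the rollback-and-resend workaround above.
# Not the linked gist: run() and DRY_RUN are additions so the commands
# can be previewed safely. Invoke as: ./recover bits/enc/some/dataset
set -eu

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"      # preview mode: print the command instead of executing it
    else
        "$@"
    fi
}

recover() {
    fs="$1"                                          # e.g. bits/enc/downloads/tv
    stream="/tmp/recover_$(echo "$fs" | tr '/' '_')" # stream file per dataset
    run zfs snapshot "${fs}@recover1"
    run zfs snapshot "${fs}@recover2"
    # --raw sends the blocks as stored on disk, without decrypting them
    run sh -c "zfs send --raw -i ${fs}@recover1 ${fs}@recover2 > ${stream}"
    run zfs rollback -r "${fs}@recover1"
    run sh -c "zfs receive -F -v ${fs} < ${stream}"
}

# Preview the commands for one dataset (nothing is executed):
DRY_RUN=1 recover bits/enc/downloads/tv
```

With DRY_RUN unset, the same function would run the commands for real; the preview mode exists because a mistyped dataset name here means a rollback on the wrong filesystem.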

@florian-obradovic commented

@rkeiii & @rincebrain: you made my day/night. Awesome! Thank you very much!
I also ran into this (and #13709) and can confirm that I was able to mount my datasets again!

I'm unsure how to tell with 100% certainty which datasets are affected.

Probably just try to mount them all. Or is it only the ones reported by zpool status:
errors: Permanent errors have been detected in the following files:

        tank/encrptd/Flo_Data:<0x0>
        tank/encrptd/micro_boot_backup:<0x0>

Best regards, Flo.
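One mechanical way to approach that question: datasets whose whole object set is damaged show up as `dataset:<0x0>` entries in the "Permanent errors" section of zpool status -v, so they can be pulled out with a small filter. A hedged sketch (the affected() helper is hypothetical, and trying zfs mount -a remains the definitive check, per the thread; the sample input mirrors the output quoted above):

```shell
# Hedged sketch: list datasets reported with whole-dataset (<0x0>) errors
# by filtering the "Permanent errors" section of zpool status output.
# Real-world usage would be: zpool status -v | affected
affected() {
    # Split each line on ':'; keep lines ending in ':<0x0>' and print the
    # dataset name with leading whitespace stripped.
    awk -F: '/:<0x0>/ { gsub(/^[[:space:]]+/, "", $1); print $1 }'
}

# Demonstrate on a sample copied from the status output quoted above:
affected <<'EOF'
errors: Permanent errors have been detected in the following files:

        tank/encrptd/Flo_Data:<0x0>
        tank/encrptd/micro_boot_backup:<0x0>
EOF
```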
