PANIC: unencrypted block in encrypted object set #16065

Open
sethtroisi opened this issue Apr 5, 2024 · 3 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@sethtroisi (Contributor)

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 24.04
Kernel Version 6.8.0-11-generic #11-Ubuntu
Architecture x86_64
OpenZFS Version 2.2.0 when the problem started; now running zfs-2.2.2-0ubuntu4 zfs-kmod-2.2.2-0ubuntu4

ZFS config is a mirrored pool with two nvme drives. I configured ZFS years ago with two pools: rpool and bpool. rpool is the mirrored pool containing my linux root and an encrypted dataset (compression lz4, encryption aes-256-gcm) for my user directory.

Describe the problem you're observing

TL;DR: Kernel panic "unencrypted block in encrypted object set" from zfs send / zpool scrub

This is likely a duplicate of, or similar to, #13937 and #15275, and possibly related to #15465. The note in 15275#issuecomment-1800270695 says this should be fixed in 2.2.2; maybe it wasn't fixed in 2.2.0, which was the version I was using when it manifested, and I still need to clean up the broken state?

I'm hoping that someone has a path forward to fixing the pool (or letting me zfs send its data). Toward that goal, I'm not sure how to find which file caused it, how to delete the bad file if I find it (I'm scared to import in anything other than read-only mode), or how to zfs send while excluding a set of files.

Warning signs:
One of the drives is at end of life [2]. The issue first manifested while I was compiling a graphics.h: any command (e.g. cat, vim) that tried to read the file would lock up. I would reboot, run zpool scrub, see an error for that file, and delete it. This happened again the next day with a different file. See the zpool status output in [4].

Issue:

I took a snapshot USERDATA@20240402 and tried to zfs send it to a remote machine (zfs send ... | pv | ssh four zfs receive). It sends ~20GB, then a kernel panic (PANIC: unencrypted block in encrypted object set 2760) happens and the zfs send thread locks up (see [3]; kill -9 won't kill it). The remote zfs receive does exit gracefully.

Now I'm on a live cd with the pool imported readonly and the end of life drive removed from the system.

sudo zpool import -R /mnt/tmp_zfs/ -f -o readonly=on rpool
sudo zfs load-key -a
sudo zfs mount rpool/USERDATA/five_crypt

I succeeded in rsyncing the data off the drive, but I would prefer to send the snapshot.

I have tried zfs send in raw mode (-w) and without it, I tried an incremental send from a previous snapshot, and a few other variations. I can't figure out how to set zfs_recover (see [1] below), which might help.

Describe how to reproduce the problem

A number of things will cause the same panic [3]:

sudo zfs send -w -V -Leb rpool/USERDATA/five_crypt@20240402 | pv | ssh four zfs receive -v ghosts/USERDATA

sudo zpool scrub rpool (I'm not sure this is even valid on a read-only pool.)

I'm a SWE and comfortable building and helping debug if there's productive debugging I can do. This is made slightly harder because I'm on a live CD and can't reboot, but I can install Ubuntu to another drive and get a zfs devel version running if it's likely to help.

Attempts to avoid the issue with zfs_recover

[1] I tried to set zfs_recover after seeing advice in #15544, but I don't think the setting is taking effect. I tried echo 1 >> sudo /sys/module/zfs/parameters/zfs_recover and adding "set vfs.zfs.recovery=1" to the grub boot line, but "cat /sys/module/zfs/parameters/zfs_recover" still prints 0.

$ grep . /sys/module/{spl,zfs}/parameters/*{panic,recover}*
/sys/module/spl/parameters/spl_panic_halt:0
/sys/module/zfs/parameters/zfs_recover:0
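A likely reason the attempts above don't stick: with `sudo echo 1 > file` the redirection is performed by the unprivileged shell, and `vfs.zfs.recovery` is the FreeBSD sysctl name rather than the Linux module parameter. A sketch of the usual Linux approaches (assuming the zfs module is loaded, using the parameter path shown in the grep output above):

```shell
# The write itself needs root, so route it through tee
# (plain `sudo echo 1 > ...` redirects in the unprivileged shell):
echo 1 | sudo tee /sys/module/zfs/parameters/zfs_recover

# Persistent alternative: a module option in modprobe syntax
# (not the FreeBSD-style "vfs.zfs.recovery" sysctl):
echo "options zfs zfs_recover=1" | sudo tee /etc/modprobe.d/zfs-recover.conf

# Verify:
cat /sys/module/zfs/parameters/zfs_recover
```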

Include any warning/errors/backtraces from the system logs

[2]
I found this at some point in my syslog, but that device has now been taken offline.
2024-04-03T18:19:30.194255-07:00 five smartd[2225]: Device: /dev/nvme1, Critical Warning (0x04): Reliability

[3] syslog

2024-04-05T20:17:33.403062+00:00 ubuntu kernel: PANIC: unencrypted block in encrypted object set 2760
2024-04-05T20:17:33.403077+00:00 ubuntu kernel: Showing stack for process 17834
2024-04-05T20:17:33.403078+00:00 ubuntu kernel: CPU: 7 PID: 17834 Comm: send_traverse Tainted: P           O       6.8.0-11-generic #11-Ubuntu
2024-04-05T20:17:33.403079+00:00 ubuntu kernel: Hardware name: System manufacturer System Product Name/TUF GAMING X570-PLUS, BIOS 3001 12/04/2020
2024-04-05T20:17:33.403080+00:00 ubuntu kernel: Call Trace:
2024-04-05T20:17:33.403081+00:00 ubuntu kernel:  <TASK>
2024-04-05T20:17:33.403081+00:00 ubuntu kernel:  dump_stack_lvl+0x48/0x70
2024-04-05T20:17:33.403082+00:00 ubuntu kernel:  dump_stack+0x10/0x20
2024-04-05T20:17:33.403082+00:00 ubuntu kernel:  spl_dumpstack+0x29/0x40 [spl]
2024-04-05T20:17:33.403083+00:00 ubuntu kernel:  vcmn_err+0xcd/0x110 [spl]
2024-04-05T20:17:33.403083+00:00 ubuntu kernel:  zfs_panic_recover+0x75/0xa0 [zfs]
2024-04-05T20:17:33.403084+00:00 ubuntu kernel:  send_cb+0x360/0x5f0 [zfs]
2024-04-05T20:17:33.403084+00:00 ubuntu kernel:  ? zbookmark_subtree_completed+0x60/0x90 [zfs]
2024-04-05T20:17:33.403085+00:00 ubuntu kernel:  traverse_visitbp+0x179/0xbe0 [zfs]
2024-04-05T20:17:33.403085+00:00 ubuntu kernel:  ? srso_return_thunk+0x5/0x5f
2024-04-05T20:17:33.403085+00:00 ubuntu kernel:  ? resume_skip_check+0x26/0x80 [zfs]
2024-04-05T20:17:33.403086+00:00 ubuntu kernel:  ? traverse_prefetch_metadata+0x7d/0x170 [zfs]
2024-04-05T20:17:33.403086+00:00 ubuntu kernel:  traverse_visitbp+0x2bc/0xbe0 [zfs]
2024-04-05T20:17:33.403087+00:00 ubuntu kernel:  traverse_dnode+0xe6/0x230 [zfs]
2024-04-05T20:17:33.403087+00:00 ubuntu kernel:  traverse_visitbp+0x6b7/0xbe0 [zfs]
2024-04-05T20:17:33.403088+00:00 ubuntu kernel:  traverse_visitbp+0x2bc/0xbe0 [zfs]
2024-04-05T20:17:33.403092+00:00 ubuntu kernel: message repeated 4 times: [  traverse_visitbp+0x2bc/0xbe0 [zfs]]
2024-04-05T20:17:33.403092+00:00 ubuntu kernel:  traverse_dnode+0xe6/0x230 [zfs]
2024-04-05T20:17:33.403099+00:00 ubuntu kernel:  traverse_visitbp+0x996/0xbe0 [zfs]
2024-04-05T20:17:33.403100+00:00 ubuntu kernel:  ? srso_return_thunk+0x5/0x5f
2024-04-05T20:17:33.403100+00:00 ubuntu kernel:  traverse_impl+0x1d0/0x460 [zfs]
2024-04-05T20:17:33.403101+00:00 ubuntu kernel:  ? __pfx_send_cb+0x10/0x10 [zfs]
2024-04-05T20:17:33.403101+00:00 ubuntu kernel:  ? __pfx_send_cb+0x10/0x10 [zfs]
2024-04-05T20:17:33.403102+00:00 ubuntu kernel:  traverse_dataset_resume+0x65/0x90 [zfs]
2024-04-05T20:17:33.403103+00:00 ubuntu kernel:  ? __pfx_send_cb+0x10/0x10 [zfs]
2024-04-05T20:17:33.403104+00:00 ubuntu kernel:  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
2024-04-05T20:17:33.403104+00:00 ubuntu kernel:  send_traverse_thread+0x49/0x90 [zfs]
2024-04-05T20:17:33.403104+00:00 ubuntu kernel:  ? __pfx_send_traverse_thread+0x10/0x10 [zfs]
2024-04-05T20:17:33.403105+00:00 ubuntu kernel:  thread_generic_wrapper+0x5f/0x70 [spl]
2024-04-05T20:17:33.403105+00:00 ubuntu kernel:  kthread+0xf2/0x120
2024-04-05T20:17:33.403106+00:00 ubuntu kernel:  ? __pfx_kthread+0x10/0x10
2024-04-05T20:17:33.403106+00:00 ubuntu kernel:  ret_from_fork+0x47/0x70
2024-04-05T20:17:33.403107+00:00 ubuntu kernel:  ? __pfx_kthread+0x10/0x10
2024-04-05T20:17:33.403107+00:00 ubuntu kernel:  ret_from_fork_asm+0x1b/0x30
2024-04-05T20:17:33.403108+00:00 ubuntu kernel:  </TASK>
2024-04-05T20:17:33.403883+00:00 ubuntu kernel: audit: type=1400 audit(1712348253.397:121): apparmor="DENIED" operation="open" class="file" profile="rsyslogd" name="/run/systemd/sessions/" pid=2221 comm=72733A6D61696E20513A526567 requested_mask="r" denied_mask="r" fsuid=102 ouid=0

[4] Zpool status

$ sudo zpool status -v
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:09:18 with 0 errors on Wed Apr  3 04:11:51 2024
config:

	NAME                                      STATE     READ WRITE CKSUM
	rpool                                     DEGRADED     0     0     0
	  mirror-0                                DEGRADED     0     0     0
	    703bc945-8754-2742-977d-66ddad17cc1f  OFFLINE      0     0     0
	    nvme0n1p4                             ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        rpool/USERDATA/five_crypt:<0x2b0784>
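One thing worth noting about the `<0x2b0784>` in the error list: the hex value is the object number within the dataset, and for plain files the ZPL object number matches the file's inode number, so something like the following may map it back to a path (a hedged sketch; the mountpoint path is a placeholder and zdb output details vary by version):

```shell
# 0x2b0784 == 2819972 in decimal; zdb takes decimal object numbers.
# Dump the object; for a plain file the dump includes a "path" line.
sudo zdb -dddd rpool/USERDATA/five_crypt 2819972

# Alternatively, search the mounted dataset by inode number
# (replace /path/to/five_crypt with the dataset's actual mountpoint):
sudo find /path/to/five_crypt -xdev -inum 2819972
```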
sethtroisi added the Type: Defect (Incorrect behavior, e.g. crash, hang) label Apr 5, 2024
@ago commented Apr 9, 2024

Hi. It looks like I have exactly the same issue, so I decided not to create a duplicate.
It is easily and reliably reproducible, also in a virtual machine. I've just freshly installed Kubuntu 23.10 using virt-manager without any particular customizations. Then I built and installed openzfs 2.2.3 from git according to this manual: https://openzfs.github.io/openzfs-docs/Developer%20Resources/Custom%20Packages.html
Then I added a second virtio block device specifically for the test pool. Then here is how I reproduced the issue:

root@kubuntuvm:~# zpool create -o ashift=12 -o autotrim=on -O acltype=posixacl -O xattr=sa -O dnodesize=auto -O compression=lz4 -O normalization=formD -O relatime=on -O mountpoint=/testpool testpool /dev/vdb
root@kubuntuvm:~# zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt testpool/encrypted
Enter new passphrase:
Re-enter new passphrase:
root@kubuntuvm:~# zpool status -v testpool
  pool: testpool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    ONLINE       0     0     0
          vdb       ONLINE       0     0     0

errors: No known data errors
root@kubuntuvm:~# cd /testpool/
root@kubuntuvm:/testpool# dd if=/dev/urandom bs=4M count=1 of=block4M
1+0 records in
1+0 records out
4194304 bytes (4,2 MB, 4,0 MiB) copied, 0,0119457 s, 351 MB/s
root@kubuntuvm:/testpool# strace cp block4M encrypted/
execve("/usr/bin/cp", ["cp", "block4M", "encrypted/"], 0x7ffc93a00200 /* 28 vars */) = 0
brk(NULL)                               = 0x637f22f1b000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffc68376950) = -1 EINVAL (Invalid argument)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x793cac4c4000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=78351, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 78351, PROT_READ, MAP_PRIVATE, 3, 0) = 0x793cac4b0000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=174472, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 181928, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x793cac483000
mmap(0x793cac489000, 114688, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x793cac489000
mmap(0x793cac4a5000, 28672, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x793cac4a5000
mmap(0x793cac4ac000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x29000) = 0x793cac4ac000
mmap(0x793cac4ae000, 5800, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x793cac4ae000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libacl.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=34888, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 32800, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x793cac47a000
mmap(0x793cac47c000, 16384, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x793cac47c000
mmap(0x793cac480000, 4096, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x793cac480000
mmap(0x793cac481000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7000) = 0x793cac481000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libattr.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=26696, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 28696, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x793cac472000
mmap(0x793cac474000, 12288, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x793cac474000
mmap(0x793cac477000, 4096, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0x793cac477000
mmap(0x793cac478000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0x793cac478000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\203\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=2105184, ...}, AT_EMPTY_PATH) = 0
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
mmap(NULL, 2150256, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x793cac200000
mmap(0x793cac226000, 1568768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x793cac226000
mmap(0x793cac3a5000, 348160, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a5000) = 0x793cac3a5000
mmap(0x793cac3fa000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1f9000) = 0x793cac3fa000
mmap(0x793cac400000, 53104, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x793cac400000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=629440, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 631568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x793cac165000
mmap(0x793cac167000, 442368, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x793cac167000
mmap(0x793cac1d3000, 176128, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6e000) = 0x793cac1d3000
mmap(0x793cac1fe000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x98000) = 0x793cac1fe000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x793cac470000
arch_prctl(ARCH_SET_FS, 0x793cac471540) = 0
set_tid_address(0x793cac471810)         = 33586
set_robust_list(0x793cac471820, 24)     = 0
rseq(0x793cac471e60, 0x20, 0, 0x53053053) = 0
mprotect(0x793cac3fa000, 16384, PROT_READ) = 0
mprotect(0x793cac1fe000, 4096, PROT_READ) = 0
mprotect(0x793cac478000, 4096, PROT_READ) = 0
mprotect(0x793cac481000, 4096, PROT_READ) = 0
mprotect(0x793cac4ac000, 4096, PROT_READ) = 0
mprotect(0x637f21a2f000, 4096, PROT_READ) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x793cac46e000
mprotect(0x793cac4fb000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
munmap(0x793cac4b0000, 78351)           = 0
statfs("/sys/fs/selinux", 0x7ffc68376920) = -1 ENOENT (No such file or directory)
statfs("/selinux", 0x7ffc68376920)      = -1 ENOENT (No such file or directory)
getrandom("\x99\x98\x7e\x45\xca\x85\xea\xbc", 8, GRND_NONBLOCK) = 8
brk(NULL)                               = 0x637f22f1b000
brk(0x637f22f3c000)                     = 0x637f22f3c000
openat(AT_FDCWD, "/proc/filesystems", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
read(3, "nodev\tsysfs\nnodev\ttmpfs\nnodev\tbd"..., 1024) = 403
read(3, "", 1024)                       = 0
close(3)                                = 0
access("/etc/selinux/config", F_OK)     = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=6078144, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 6078144, PROT_READ, MAP_PRIVATE, 3, 0) = 0x793caba00000
close(3)                                = 0
geteuid()                               = 0
openat(AT_FDCWD, "encrypted/", O_RDONLY|O_PATH|O_DIRECTORY) = 3
newfstatat(AT_FDCWD, "block4M", {st_mode=S_IFREG|0644, st_size=4194304, ...}, 0) = 0
newfstatat(3, "block4M", 0x7ffc68376490, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "block4M", O_RDONLY)   = 4
newfstatat(4, "", {st_mode=S_IFREG|0644, st_size=4194304, ...}, AT_EMPTY_PATH) = 0
openat(3, "block4M", O_WRONLY|O_CREAT|O_EXCL, 0644) = 5
ioctl(5, BTRFS_IOC_CLONE or FICLONE, 4) = -1 EXDEV (Invalid cross-device link)
newfstatat(5, "", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_EMPTY_PATH) = 0
fadvise64(4, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
copy_file_range(4, NULL, 5, NULL, 9223372035781033984, 0) = 4194304
copy_file_range(4, NULL, 5, NULL, 9223372035781033984, 0) = 0
close(5)                                = 0
close(4)                                = 0
lseek(0, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
close(0)                                = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++
root@kubuntuvm:/testpool# cat encrypted/block4M > /dev/null
<hangs forever>

The kernel log shows this:

[  560.017341] PANIC: unencrypted block in encrypted object set 74
[  560.017350] Showing stack for process 33587
[  560.017354] CPU: 1 PID: 33587 Comm: cat Tainted: P           O       6.5.0-27-generic #28-Ubuntu
[  560.017359] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  560.017361] Call Trace:
[  560.017364]  <TASK>
[  560.017369]  dump_stack_lvl+0x48/0x70
[  560.017379]  dump_stack+0x10/0x20
[  560.017388]  spl_dumpstack+0x29/0x40 [spl]
[  560.017415]  vcmn_err+0xcd/0x110 [spl]
[  560.017443]  zfs_panic_recover+0x75/0xa0 [zfs]
[  560.017784]  dbuf_read_impl.constprop.0+0x54c/0x5d0 [zfs]
[  560.017977]  ? srso_alias_return_thunk+0x5/0x7f
[  560.017985]  ? aggsum_add+0x1aa/0x1d0 [zfs]
[  560.018164]  dbuf_read+0x23c/0x5b0 [zfs]
[  560.018343]  ? srso_alias_return_thunk+0x5/0x7f
[  560.018347]  dmu_buf_hold_array_by_dnode+0x113/0x570 [zfs]
[  560.018533]  dmu_read_uio_dnode+0x5a/0x110 [zfs]
[  560.018648]  ? srso_alias_return_thunk+0x5/0x7f
[  560.018651]  ? avl_add+0x4a/0x80 [zfs]
[  560.018738]  dmu_read_uio_dbuf+0x48/0x70 [zfs]
[  560.018852]  zfs_read+0x125/0x300 [zfs]
[  560.018972]  zpl_iter_read+0xc2/0x140 [zfs]
[  560.019082]  vfs_read+0x20d/0x360
[  560.019090]  ksys_read+0x73/0x100
[  560.019093]  __x64_sys_read+0x19/0x30
[  560.019095]  do_syscall_64+0x5c/0x90
[  560.019099]  ? irqentry_exit_to_user_mode+0x17/0x20
[  560.019102]  ? srso_alias_return_thunk+0x5/0x7f
[  560.019105]  ? irqentry_exit+0x43/0x50
[  560.019107]  ? srso_alias_return_thunk+0x5/0x7f
[  560.019109]  ? exc_page_fault+0x94/0x1b0
[  560.019112]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  560.019117] RIP: 0033:0x783038d1a7a1
[  560.019133] Code: 00 48 8b 15 79 36 0e 00 f7 d8 64 89 02 b8 ff ff ff ff eb c3 e8 a0 e7 01 00 f3 0f 1e fa 80 3d a5 bd 0e 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 48 83 ec 28 48 89 54
[  560.019135] RSP: 002b:00007ffc5ba85e28 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  560.019138] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 0000783038d1a7a1
[  560.019139] RDX: 0000000000020000 RSI: 0000783038e5e000 RDI: 0000000000000003
[  560.019141] RBP: 0000000000020000 R08: 00000000ffffffff R09: 0000000000000000
[  560.019142] R10: 0000000000000022 R11: 0000000000000246 R12: 0000783038e5e000
[  560.019143] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000000000
[  560.019147]  </TASK>

I noticed that this happens only when strace shows copy_file_range() being used. So it looks like the fix in #15464 was not enough.
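If copy_file_range() really is the trigger, a possible stopgap when copying into an encrypted dataset is a tool that goes through plain read()/write(), such as dd. This is only a workaround sketch under that assumption, not a fix; the demo below copies between temp files, so substitute a destination inside the encrypted dataset for real use:

```shell
# Copy via dd, which issues plain read()/write() calls rather than
# copy_file_range(), then verify the result byte-for-byte.
src=$(mktemp) && dst=$(mktemp)
head -c 4194304 /dev/urandom > "$src"   # 4 MiB, matching the reproducer
dd if="$src" of="$dst" bs=1M status=none
cmp "$src" "$dst" && echo "copies match"
```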

@sethtroisi (Contributor, Author)

If you scrub the pool does that fix the issue? Can you rm the file without causing the pool to hang?

If this can be fixed just by deleting files, it would be nice to add the filename (if that's possible) to the error message so that the file can be deleted manually.

@ago commented Apr 9, 2024

Scrubbing the pool indeed fixes the issue to some extent: reading the file returns an I/O error, but it no longer panics and hangs. Cool! Deleting the file plus scrubbing seems to fix the pool completely. Great; at least there is no need to recreate my affected pool.

root@kubuntuvm:~# zpool status -v
  pool: testpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    ONLINE       0     0     0
          vdb       ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /testpool/encrypted/block4M
root@kubuntuvm:~# zpool scrub testpool
root@kubuntuvm:~# zpool status -v
  pool: testpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:00 with 0 errors on Tue Apr  9 22:32:30 2024
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    ONLINE       0     0     0
          vdb       ONLINE       0     0     0

errors: No known data errors
root@kubuntuvm:~# ls /testpool/encrypted/block4M  -l
-rw-r--r-- 1 root root 4194304 Apr  9 16:03 /testpool/encrypted/block4M
root@kubuntuvm:~# cat /testpool/encrypted/block4M > /dev/null
cat: /testpool/encrypted/block4M: Input/output error
root@kubuntuvm:~# zpool status -v
  pool: testpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:00:00 with 0 errors on Tue Apr  9 22:32:30 2024
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    ONLINE       0     0     0
          vdb       ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /testpool/encrypted/block4M
root@kubuntuvm:~# zpool scrub testpool
root@kubuntuvm:~# zpool status -v
  pool: testpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:00 with 0 errors on Tue Apr  9 22:34:02 2024
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    ONLINE       0     0     0
          vdb       ONLINE       0     0     0

errors: No known data errors
root@kubuntuvm:~# cat /testpool/encrypted/block4M > /dev/null
cat: /testpool/encrypted/block4M: Input/output error
root@kubuntuvm:~# zpool status -v
  pool: testpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:00:00 with 0 errors on Tue Apr  9 22:34:02 2024
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    ONLINE       0     0     0
          vdb       ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /testpool/encrypted/block4M
root@kubuntuvm:~# rm /testpool/encrypted/block4M
root@kubuntuvm:~# zpool scrub testpool
root@kubuntuvm:~# zpool status -v
  pool: testpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:00 with 0 errors on Tue Apr  9 22:34:23 2024
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    ONLINE       0     0     0
          vdb       ONLINE       0     0     0

errors: No known data errors
