VERIFY3(c < (1ULL << 17) >> 9) failed (65535 < 256) SPLError: 18818:0:(zio.c:247:zio_buf_alloc()) SPL PANIC #390
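For reference, the numbers in the assertion above can be checked by hand. The sketch below only reproduces the arithmetic printed in the panic message. The constant names are made up for illustration, and the index formula `c = (size - 1) >> 9` is an assumption about how `zio_buf_alloc()` derives `c`; it is not confirmed anywhere in this thread.

```python
# Constants read straight off the assertion text in the panic message.
# The names are hypothetical; only the numbers come from the message.
MAX_BLOCK_SIZE = 1 << 17   # the "(1ULL << 17)" term, i.e. 128 KiB
MIN_BLOCK_SHIFT = 9        # the ">> 9" term, i.e. 512-byte units


def bucket_limit():
    """Upper bound that the index c must stay below."""
    return MAX_BLOCK_SIZE >> MIN_BLOCK_SHIFT


def bucket_index(size):
    """ASSUMPTION: c is derived from the requested size like this."""
    return (size - 1) >> MIN_BLOCK_SHIFT


def implied_size_range(c):
    """Smallest and largest size that map to index c under the assumption."""
    low = (c << MIN_BLOCK_SHIFT) + 1
    high = (c + 1) << MIN_BLOCK_SHIFT
    return low, high


if __name__ == "__main__":
    print("limit:", bucket_limit())
    print("sizes that give c = 65535:", implied_size_range(65535))
```

If the assumed formula holds, an index of 65535 corresponds to a request of roughly 32 MiB, far above the 128 KiB limit, which would fit a size field read from corrupted metadata.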
Comments
@akorn This is likely a ZFS issue rather than an SPL issue. You're likely facing an issue which has recently started causing a lot of problems. The related issues are openzfs/zfs#2700, openzfs/zfs#2663, openzfs/zfs#2701 and openzfs/zfs#2717. EDIT: Removed reference to issue 2214. Unfortunately, I've not been able to reproduce this yet nor have I gotten my hands on a pool with the problem for further examination. If you could find the inode number of "documents", could you please run
@dweeezil, zdb segfaults:
Would it make sense to retry with your latest zdb (do you expect it to work where the zdb from openzfs/zfs@2d50158 segfaulted)? Or maybe the information it did print is enough?
@akorn That version of my zdb branch would segfault, too. I just added a couple of changes to it ending with dweeezil/zfs@6efc9d3. That one will still segfault, but it should dump the spill blkptr before it does so (run it with seven -d options). You've clearly got a corrupted spill blkptr, and my hope is that examining a dump of it will make the nature of the corruption clearer. Am I correct in assuming this system is also using selinux and that it has a policy in place which would label files on ZFS? Inode 8 would normally be one of the very first objects created in a new file system. Had the file system just been created prior to running the rsync? Can you reliably reproduce this problem (in a reasonable amount of time)?
@dweeezil, I'll try with your zdb tomorrow if I have time. As for selinux, no. I only use xattrs for posix ACLs. This fs is the backup of another, live fs (xfs); I use rsync to update the copy, and then I take a zfs snapshot of it. The 'documents' directory probably has inode number 8 because rsync first created the top-level directories and only then started populating them. Interestingly, all three existing snapshots of this fs exhibit the corruption, but it can't have been present when the snapshots were taken because the backup process locks up on a corrupt fs, leading to no new snapshot being created. The snapshots are from 25 August and 1 September, which means the corruption occurred after that, and that it took a while for it to happen (counting from initial fs creation). I have no idea how reproducible the problem is. Since it's a backup fs (and the 2nd backup of the live system), I can destroy and recreate it at will, i.e. we can experiment.
@dweeezil, here is the output from your zdb:
And fwiw here is the backtrace:
FWIW, these are the ACLs that should be set (obtained using "getfacl -n --skip-base *" on the source fs that was rsynced to zfs):
"documents" only contains two subdirectories (no files).
@akorn Thank you. This is good information. It is certainly the same problem as in openzfs/zfs#2700 and other related issues I mentioned except that in your case, the problem has manifested itself I'm still trying to reproduce the problem locally but now, between the zdb output I got from @sopmot in openzfs/zfs#2700 and your debugging above, I've got a better idea of what's happening. If I can't come up with a reproducer and/or find the problem through code inspection, and if you're able to reliably reproduce the problem, could you capture strace output of rsync (and all its subordinate threads/processes; use "strace -ff -o ") and grep it for all operations involving the "documents" directory (beginning with its creation)? I'll post a followup here and to openzfs/zfs#2700 with more details after I work on this some more.
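A rough sketch of the filtering step, in case it helps: with `-ff -o PREFIX`, strace writes one file per traced process, named `PREFIX.PID`. The helper below collects every line mentioning a given name from those files. The function name and the example prefix are hypothetical, and a plain substring match will also pick up unrelated paths that happen to contain the same word.

```python
import glob


def grep_strace(prefix, needle="documents"):
    """Return (filename, line) pairs from strace per-process output files.

    `prefix` is whatever was passed to `strace -ff -o`; strace appends
    ".PID" to it for each traced process.
    """
    hits = []
    # glob.escape keeps special characters in the prefix from being
    # interpreted as wildcards.
    for path in sorted(glob.glob(glob.escape(prefix) + ".*")):
        # errors="replace" so odd bytes in file names do not abort the scan.
        with open(path, errors="replace") as fh:
            for line in fh:
                if needle in line:
                    hits.append((path, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # "rsync.trace" is a placeholder prefix, not one taken from this thread.
    for filename, line in grep_strace("rsync.trace"):
        print(filename + ": " + line)
```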
I corrected the description of the corruption in the previous issue comment.
I can no longer import my pool (openzfs/zfs#2932); is there any way to recover, perhaps even resorting to a hex editor? I don't have backups because these are the backups. :) I don't mind losing a filesystem or two, but losing the entire pool would be inconvenient. Trying to import with
Closing, the root cause of the issue was resolved by openzfs/zfs@4254acb |
Hi,
I have a filesystem with xattr=sa and acltype=posixacl.
lgetxattr() on inodes that have ACLs causes the process performing the lookup to hang, like this:
The filesystem was created with 0.6.3-1~trusty from Darik's PPA; the above was obtained with the latest git master as of yesterday (f9bde4f, openzfs/zfs@2d50158).
Strace output up to the moment where the process hangs:
Additionally, listxattr() on an entry with no xattrs returns EFAULT (at least after the above panic).
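For anyone who wants to check a path for the same symptoms, here is a minimal probe, assuming Linux and Python 3. It is a sketch, not the tool used to produce the output above. With `follow_symlinks=False`, Python's `os.listxattr()` and `os.getxattr()` operate on the link itself, like `llistxattr()` and `lgetxattr()`. The function name and result layout are made up for illustration. On an affected filesystem the call may hang exactly as described, so only run it on a system you can afford to reboot.

```python
import os


def probe_xattrs(path):
    """List the xattrs of `path` and read each one, recording any errno."""
    result = {"path": path, "names": None, "values": {}, "errors": []}

    # The xattr functions only exist on Linux builds of Python.
    if not hasattr(os, "listxattr"):
        result["errors"].append(("listxattr", None))
        return result

    try:
        names = os.listxattr(path, follow_symlinks=False)
        result["names"] = names
    except OSError as exc:
        # An EFAULT from listing would show up here as errno.EFAULT.
        result["errors"].append(("listxattr", exc.errno))
        return result

    for name in names:
        try:
            result["values"][name] = os.getxattr(
                path, name, follow_symlinks=False)
        except OSError as exc:
            result["errors"].append((name, exc.errno))
    return result


if __name__ == "__main__":
    import sys
    for target in sys.argv[1:]:
        print(probe_xattrs(target))
```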