-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPLError: 11346:0:(zio.c:792:zio_write()) when trying to create a filesystem w/ DEBUG #1933
Comments
Adding a printk() in dmu.c:dmu_write_policy():
|
The ASSERT here looks correct, a type of 192 isn't possible. The type must be in the range 0-55 (DMU_OT_NUMTYPES), or 128-137. The lower range is for legacy compatibility with pre-feature flag pools, and the upper range is to allow new generic types. If you could also print the dataset and object number it would be interesting to see what zdb outputs about the object. Is the type 192 really stored on disk or do it somehow get damaged in memory. |
@behlendorf Indeed, saw that to. I wanted to printout the dataset, but I have no idea to get to it from any of the zio functions. The level and number of structs is impossible to get a grips on... Any pointers? |
@FransUrbo You should be able to do something like this (untested). The bookmark hung off the zio should provide the information you need. The only tricky bit is you're going to get the objset id not its name. You'll need to use diff --git a/module/zfs/zio.c b/module/zfs/zio.c
index 97f2549..75a7a7a 100644
--- a/module/zfs/zio.c
+++ b/module/zfs/zio.c
@@ -730,6 +730,14 @@ zio_write(zio_t *pio, spa_t *spa, uint64_t txg, blkptr_t *b
zio->io_physdone = physdone;
zio->io_prop = *zp;
+#ifdef _KERNEL
+ if (unlikely(!DMU_OT_IS_VALID(zp->zp_type))) {
+ printk("Invalid zp_type=%d dataset=%llu object=%llu\n",
+ zp->zp_type, zio->io_bookmark.zb_objset,
+ zio->io_bookmark.zb_object);
+ }
+#endif /* KERNEL */
+
return (zio);
}
|
@behlendorf unfortunately I still can't run any zdb command option without getting an I/O error :(. |
We do create 1 new DMU_OT type, just like upstream, but it is placed just before NUM_TYPES
Which is quite shy of 192/196. I would want to get rid of that DMU_OT type as well, but the new system is well confusing, and I dare say, not documented :) '''The level and number of structs is impossible to get a grips on... Any pointers?''' They are all pointers mate, hah! mirite?! |
@lundman haha, very catchy :). I'm currently building with the patch from @behlendorf and I'll see what that leads to. But considering I can't run any zdb command, I'm unsure what good it will do... |
@behlendorf is it safe to disable the ASSERT()? The code you showed is after the assert is triggered, so it's not executed... But why do I only get it when compiling with debugging and not without it? |
@FransUrbo All the ASSERTs are compiled out if debugging is not enabled. |
It seems that zp_type is only invalid the first time.
Trying to mount everything that is mountable shows:
So all of the ZVOL accesses give the same dataset and object! |
@FransUrbo That may make some sense but I'd be skeptical. Dataset 0 is always the MOS for a given pool, it contains the metadata which describes the contents of a pool (the list of datasets, references to spacemaps, etc). It wouldn't shock me if the crytpo bits added a new object type linked off the MOS and we're trying to replay a log entry during import for the unknown type. Have you tried importing the pool read-only? That should effectively prevent any replay and disable the
|
Yes, importing the pool readonly avoid most of, if not all of the problems. Unfortunately, that won't help me much at the moment. Spending quite a lot of money at this time to build a storage backup server with enough disks isn't something I really can/want to do. I'm planning a 'one killer of a storage server' to be build in the spring/summer (enough CPU, MEM and disk to run dedup+compression+better redundancy+crypto on all of my data instead of just just some selected bits). But I have started to accept the fact that maybe I don't have a choice :(. @lundman Any thoughts, especially on 'ZoL trying to replay a log entry which is part of the crypto bit'? |
@lundman do you have any idea where that |
192? 196? I think I mentioned earlier we do indeed add a new type, but shouldn't that high. At the moment the log replay will simply skip the unknown value and everything is fine. We should probably drill down to find out where it comes from though. It would also be interesting to import a Solaris pool v30 with crypto to see if they as well have 196. The first crypto code I run has uptime of 400 days, so I'm not sure how "scary" it actually is :) It would be nice to change it to the newer style |
I'm closing this, because it's now quite obvious that it's the zfs-crypto patch that fucked everything up. |
Can't say for certain if this is a ZoL issue or ZFS-Crypto one (can't test without crypto since I have the encryption feature enable on my pool). I've created a ticket at zfsrogue as well, but this MIGHT be related to my dead lock problem (#1862).
Trying to create a filesystem with DEBUG enabled on both spl and zfs gives:
After the SPLEerror, zfs hangs. Creating a filesystem without DEBUG enabled works (or did work a few days ago when I was running without it - have cherry-picked some commits since).
Trying to figure out what test fails, I ended up with this:
So somehow/somewhere zp_type is set to an unvalid number. I'm currently trying to figure out where/how zp_type got a value of 196...
See http://bayour.com/misc/SPLError.txt for a previous debug test. It seems that it's cycling through something and have correct/acceptable values except for the last cycle just before the SPLError...
The text was updated successfully, but these errors were encountered: