Incorrect ashift value may cause faulted pools #425
Comments
Actually, even without dd/reboot the pool faults when it's exported/re-imported or when I try to scrub the pool. Hmm.
That's not good. Why larger ashift values are damaging isn't immediately clear to me, but they are clearly causing damage to the label. A larger ashift reduces the total number of uberblocks which can be stored in the fixed-size labels, and perhaps we have an overrun. It would probably be wise to restrict the maximum ashift to 12, which has been well tested.
While we initially allowed you to set your ashift as large as 17 (SPA_MAXBLOCKSIZE), that is actually unsafe. What wasn't considered at the time is that each uberblock written to the vdev label ring buffer will be of this size. The buffer is statically sized to 128k and we need to be able to fit several uberblocks in it; with a large ashift that becomes a problem. Therefore I'm reducing the maximum configurable ashift value to 12. This is large enough for the 4k sector drives and small enough that we can still keep the most recent 32 uberblocks in the vdev label ring buffer.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#425
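A quick back-of-envelope sketch of that trade-off, assuming (as the commit message above describes) a fixed 128 KiB uberblock ring per label, one slot of `1 << ashift` bytes per uberblock, and an assumed 1 KiB floor on the slot size:

```c
#include <stdio.h>

/*
 * Sketch only: constants are taken from the commit message above
 * (128 KiB ring, slots sized to 1 << ashift), not from the ZFS source.
 */
#define	UBERBLOCK_RING_SIZE	(128 * 1024)
#define	MIN_UBERBLOCK_SHIFT	10	/* assumed 1 KiB minimum slot size */

int
main(void)
{
	for (int ashift = 9; ashift <= 17; ashift++) {
		int shift = ashift > MIN_UBERBLOCK_SHIFT ?
		    ashift : MIN_UBERBLOCK_SHIFT;
		printf("ashift=%2d -> %3d uberblock slots per label\n",
		    ashift, UBERBLOCK_RING_SIZE >> shift);
	}
	return (0);
}
```

With these assumptions, ashift=12 leaves 32 slots (the "most recent 32 uberblocks" above), ashift=14 leaves only 8, and ashift=17 leaves a single slot, which is why the larger values are so much more fragile.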
…penzfs#425) This module provides a mechanism to have the Agent process randomly exit at certain "interesting" points, to test recovery on restart. To use it, set the "die_mtbf_secs" tunable to the desired mean time between failures, in seconds. A time point between 0 and 2x the configured time will be selected as the amount of time to run before dying. At that point, a random call site of `maybe_die_with()` will be selected to exit the process. Note that each *call site* (source file, line, column) is equally likely to die, not each *call* (invocation of maybe_die_with()). For example, if maybe_die_with() is called 1000x/sec from one call site and 1x/sec from another, we will be equally likely to terminate via each of the two call sites. Therefore you don't need to worry about adding a high-frequency caller and having it "always" die on that caller.
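A minimal sketch of that per-call-site selection idea, in C. This is not the actual module from the commit (the names `die_init`, `die_register_site`, and `die_check` are made up for illustration, and only `die_mtbf_secs`/`maybe_die_with` come from the description above); it only shows how a call site, keyed by file and line, can be chosen uniformly regardless of how often each site fires:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <pthread.h>

#define	MAX_SITES	1024

static struct { const char *file; int line; } sites[MAX_SITES];
static int nsites;
static int target = -1;		/* call site chosen to die at */
static time_t deadline;		/* absolute time after which we may die */
static pthread_mutex_t die_lock = PTHREAD_MUTEX_INITIALIZER;

/* die_mtbf_secs: desired mean time between failures; 0 disables dying. */
void
die_init(unsigned die_mtbf_secs)
{
	if (die_mtbf_secs == 0)
		return;
	srandom((unsigned)time(NULL));
	/* Uniform in [0, 2 * mtbf], so the mean equals die_mtbf_secs. */
	deadline = time(NULL) + (random() % (2 * die_mtbf_secs + 1));
}

/* Register a call site the first time it is reached; return its index. */
int
die_register_site(const char *file, int line)
{
	pthread_mutex_lock(&die_lock);
	int idx = nsites;
	if (nsites < MAX_SITES) {
		sites[nsites].file = file;
		sites[nsites].line = line;
		nsites++;
	}
	pthread_mutex_unlock(&die_lock);
	return (idx);
}

void
die_check(int idx)
{
	if (deadline == 0 || time(NULL) < deadline)
		return;
	pthread_mutex_lock(&die_lock);
	if (target == -1 && nsites > 0) {
		/*
		 * Pick one *call site* uniformly from those registered so
		 * far; call frequency does not influence the choice.
		 */
		target = (int)(random() % nsites);
	}
	int hit = (idx == target);
	pthread_mutex_unlock(&die_lock);
	if (hit) {
		fprintf(stderr, "maybe_die: exiting at %s:%d\n",
		    sites[idx].file, sites[idx].line);
		_exit(1);
	}
}

/* Each macro expansion is one distinct call site, keyed by file and line. */
#define	maybe_die_with()						\
	do {								\
		static int _site = -1;					\
		if (_site == -1)					\
			_site = die_register_site(__FILE__, __LINE__);	\
		die_check(_site);					\
	} while (0)
```

One caveat of this sketch: the target is drawn only from call sites that have already been reached when the deadline passes, which approximates (but does not exactly match) "every call site equally likely".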
While investigating another issue, I've come across this:
I'm able to reproduce this same behavior with ashift values as low as 14. The disk in this test is an iSCSI LUN (with 64kB sectors) that's passed through to the VM by VMware as a disk with 512-byte sectors.
There are two things I'm currently wondering about:
a) Can this possibly happen with lower ashift values (e.g. ashift=12 on 512-bytes-per-sector disks)? And is this because ZFS assumes that single-sector writes are atomic (I'm really just guessing here)?
b) Depending on how (a) turns out, is this suggestion (from zpool(8)) such a great idea: