Incorrect ashift value may cause faulted pools #425
Comments
Actually, even without dd/reboot the pool faults when it's exported/re-imported or when I try to scrub the pool. Hmm.
That's not good. Why larger ashift values are damaging isn't immediately clear to me, but they are clearly causing damage to the label. A larger ashift reduces the total number of uberblocks which can be stored in the fixed-size labels, and perhaps we have an overrun. It would probably be wise to restrict the maximum ashift to 12, which has been well tested.
While we initially allowed you to set your ashift as large as 17 (SPA_MAXBLOCKSIZE), that is actually unsafe. What wasn't considered at the time is that each uberblock written to the vdev label ring buffer will be of this size. The buffer is statically sized to 128k and we need to be able to fit several uberblocks in it; with a large ashift that becomes a problem. Therefore I'm reducing the maximum configurable ashift value to 12. This is large enough for the 4k sector drives and small enough that we can still keep the most recent 32 uberblocks in the vdev label ring buffer.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#425
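A quick back-of-envelope sketch of that trade-off, assuming (as the commit message above describes) a fixed 128 KiB uberblock ring per label, one slot of `1 << ashift` bytes per uberblock, and an assumed 1 KiB floor on the slot size:

```c
#include <stdio.h>

/*
 * Sketch only: constants are taken from the commit message above
 * (128 KiB ring, slots sized to 1 << ashift), not from the ZFS source.
 */
#define	UBERBLOCK_RING_SIZE	(128 * 1024)
#define	MIN_UBERBLOCK_SHIFT	10	/* assumed 1 KiB minimum slot size */

int
main(void)
{
	for (int ashift = 9; ashift <= 17; ashift++) {
		int shift = ashift > MIN_UBERBLOCK_SHIFT ?
		    ashift : MIN_UBERBLOCK_SHIFT;
		printf("ashift=%2d -> %3d uberblock slots per label\n",
		    ashift, UBERBLOCK_RING_SIZE >> shift);
	}
	return (0);
}
```

With these assumptions, ashift=12 leaves 32 slots (the "most recent 32 uberblocks" above), ashift=14 leaves only 8, and ashift=17 leaves a single slot, which is why the larger values are so much more fragile.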
…penzfs#425) This module provides a mechanism to have the Agent process randomly exit at certain "interesting" points, to test recovery on restart. To use it, set the "die_mtbf_secs" tunable to the desired mean time between failures, in seconds. A time point between 0 and 2x the configured time will be selected as the amount of time to run before dying. At that point, a random call site of `maybe_die_with()` will be selected to exit the process. Note that each *call site* (source file, line, column) is equally likely to die, not each *call* (invocation of maybe_die_with()). For example, if maybe_die_with() is called 1000x/sec from one call site and 1x/sec from another, we will be equally likely to terminate via each of the two call sites. Therefore you don't need to worry about adding a high-frequency caller and having it "always" die on that caller.
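A minimal sketch of that per-call-site selection idea, in C. This is not the actual module from the commit (the names `die_init`, `die_register_site`, and `die_check` are made up for illustration, and only `die_mtbf_secs`/`maybe_die_with` come from the description above); it only shows how a call site, keyed by file and line, can be chosen uniformly regardless of how often each site fires:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <pthread.h>

#define	MAX_SITES	1024

static struct { const char *file; int line; } sites[MAX_SITES];
static int nsites;
static int target = -1;		/* call site chosen to die at */
static time_t deadline;		/* absolute time after which we may die */
static pthread_mutex_t die_lock = PTHREAD_MUTEX_INITIALIZER;

/* die_mtbf_secs: desired mean time between failures; 0 disables dying. */
void
die_init(unsigned die_mtbf_secs)
{
	if (die_mtbf_secs == 0)
		return;
	srandom((unsigned)time(NULL));
	/* Uniform in [0, 2 * mtbf], so the mean equals die_mtbf_secs. */
	deadline = time(NULL) + (random() % (2 * die_mtbf_secs + 1));
}

/* Register a call site the first time it is reached; return its index. */
int
die_register_site(const char *file, int line)
{
	pthread_mutex_lock(&die_lock);
	int idx = nsites;
	if (nsites < MAX_SITES) {
		sites[nsites].file = file;
		sites[nsites].line = line;
		nsites++;
	}
	pthread_mutex_unlock(&die_lock);
	return (idx);
}

void
die_check(int idx)
{
	if (deadline == 0 || time(NULL) < deadline)
		return;
	pthread_mutex_lock(&die_lock);
	if (target == -1 && nsites > 0) {
		/*
		 * Pick one *call site* uniformly from those registered so
		 * far; call frequency does not influence the choice.
		 */
		target = (int)(random() % nsites);
	}
	int hit = (idx == target);
	pthread_mutex_unlock(&die_lock);
	if (hit) {
		fprintf(stderr, "maybe_die: exiting at %s:%d\n",
		    sites[idx].file, sites[idx].line);
		_exit(1);
	}
}

/* Each macro expansion is one distinct call site, keyed by file and line. */
#define	maybe_die_with()						\
	do {								\
		static int _site = -1;					\
		if (_site == -1)					\
			_site = die_register_site(__FILE__, __LINE__);	\
		die_check(_site);					\
	} while (0)
```

One caveat of this sketch: the target is drawn only from call sites that have already been reached when the deadline passes, which approximates (but does not exactly match) "every call site equally likely".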
While investigating another issue, I've come across this:
I'm able to reproduce this same behavior with ashift values as low as 14. The disk in this test is an iSCSI LUN (with 64kB sectors) that's passed through to the VM by VMware as a disk with 512-byte sectors.
There are two things I'm currently wondering about:
a) Can this possibly happen with lower ashift values (e.g. ashift=12 on 512-bytes-per-sector disks)? And is this because ZFS assumes that single-sector writes are atomic (I'm really just guessing here)?
b) Depending on how (a) turns out, is this suggestion (from zpool(8)) such a great idea: