From 0c637f3100f0b6a76594f1ec42bfdc24b6455765 Mon Sep 17 00:00:00 2001
From: loli10K
Date: Sat, 23 Feb 2019 00:36:34 +0100
Subject: [PATCH] zpool reports 16E expandsize on disks with oddball number of
 sectors

The issue is caused by a small discrepancy in how userland creates the
partition layout and the kernel estimates available space:

 * zpool command: subtract 9M from the usable device size, then align
   to a 1M boundary. 9M is the sum of 1M "start" partition alignment +
   8M EFI "reserved" partition.

 * kernel module: subtract 10M from the device size. 10M is the sum of
   1M "start" partition alignment + 1M "end" partition alignment + 8M
   EFI "reserved" partition.

For devices where the number of sectors is not a multiple of the
alignment size the zpool command will create a partition layout which
reserves less than 1M after the 8M EFI "reserved" partition:

Disk /dev/sda: 1024 MiB, 1073739776 bytes, 2097148 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 49811D40-16F4-4E41-84A9-387703950D7F

Device       Start     End Sectors  Size Type
/dev/sda1     2048 2078719 2076672 1014M Solaris /usr & Apple ZFS
/dev/sda9  2078720 2095103   16384    8M Solaris reserved 1

When the kernel module opens the device in vdev_open() its max_asize
ends up being slightly smaller than asize: the unsigned subtraction of
the two then underflows, and a huge number (16E) is reported by
metaslab_class_expandable_space().

This change prevents bdev_max_capacity() from returning a size smaller
than bdev_capacity().

Reviewed-by: Brian Behlendorf
Reviewed-by: George Wilson
Reviewed-by: Sara Hartse
Signed-off-by: loli10K
Closes #1468
Closes #8391
---
 module/zfs/vdev.c      | 11 +++++++++++
 module/zfs/vdev_disk.c | 15 +++++++++++----
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/module/zfs/vdev.c b/module/zfs/vdev.c
index b17682d81c1d..a803833ba63b 100644
--- a/module/zfs/vdev.c
+++ b/module/zfs/vdev.c
@@ -1645,6 +1645,17 @@ vdev_open(vdev_t *vd)
 
 	error = vd->vdev_ops->vdev_op_open(vd, &osize, &max_osize, &ashift);
 
+	/*
+	 * Physical volume size should never be larger than its max size, unless
+	 * the disk has shrunk while we were reading it or the device is buggy
+	 * or damaged: either way it's not safe for use, bail out of the open.
+	 */
+	if (osize > max_osize) {
+		vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,
+		    VDEV_AUX_OPEN_FAILED);
+		return (SET_ERROR(ENXIO));
+	}
+
 	/*
 	 * Reset the vdev_reopening flag so that we actually close
 	 * the vdev on error.
diff --git a/module/zfs/vdev_disk.c b/module/zfs/vdev_disk.c
index db765c57bb35..4ac08c86148e 100644
--- a/module/zfs/vdev_disk.c
+++ b/module/zfs/vdev_disk.c
@@ -108,6 +108,10 @@ bdev_capacity(struct block_device *bdev)
  * case, and updating the partition table if appropriate. Once the partition
  * size has been increased the additional capacity will be visible using
  * bdev_capacity().
+ *
+ * The returned maximum expansion capacity is always expected to be larger
+ * than, or at the very least equal to, the usable capacity to prevent
+ * overestimating the pool expandsize.
  */
 static uint64_t
 bdev_max_capacity(struct block_device *bdev, uint64_t wholedisk)
@@ -122,14 +126,17 @@ bdev_max_capacity(struct block_device *bdev, uint64_t wholedisk)
 		 * alignment restrictions. Over reporting this value isn't
 		 * harmful and would only result in slightly less capacity
 		 * than expected post expansion.
+		 * The estimated available space may be slightly smaller than
+		 * bdev_capacity() for devices where the number of sectors is
+		 * not a multiple of the alignment size and the partition layout
+		 * is keeping less than PARTITION_END_ALIGNMENT bytes after the
+		 * "reserved" EFI partition: in such cases return the device
+		 * usable capacity.
 		 */
 		available = i_size_read(bdev->bd_contains->bd_inode) -
 		    ((EFI_MIN_RESV_SIZE + NEW_START_BLOCK +
 		    PARTITION_END_ALIGNMENT) << SECTOR_BITS);
-		if (available > 0)
-			psize = available;
-		else
-			psize = bdev_capacity(bdev);
+		psize = MAX(available, bdev_capacity(bdev));
 	} else {
 		psize = bdev_capacity(bdev);
 	}
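
To make the discrepancy concrete, the following standalone program (not
part of the patch) reproduces the two size estimates for the 1024 MiB
disk shown above. The constants are simplified stand-ins for the
userland EFI layout code and the kernel-side bdev_max_capacity()
arithmetic; the real expandsize path in metaslab_class_expandable_space()
also applies metaslab alignment, but the unsigned wrap-around is the same:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define	MIB		(1024ULL * 1024ULL)
#define	START_ALIGN	MIB		/* 1M "start" partition alignment */
#define	END_ALIGN	MIB		/* 1M "end" partition alignment */
#define	EFI_RESV	(8 * MIB)	/* 8M EFI "reserved" partition */

int
main(void)
{
	/* 2097148 sectors: not a multiple of the 2048-sector alignment */
	uint64_t devsize = 2097148ULL * 512;

	/* zpool command: subtract 9M, then align down to a 1M boundary */
	uint64_t asize = (devsize - START_ALIGN - EFI_RESV) / MIB * MIB;

	/* kernel module: subtract 10M */
	uint64_t max_asize = devsize - START_ALIGN - END_ALIGN - EFI_RESV;

	/* expandable space as derived before this fix: wraps to ~16E */
	uint64_t expandsize = max_asize - asize;

	printf("asize      = %" PRIu64 "\n", asize);	/* 1063256064 */
	printf("max_asize  = %" PRIu64 "\n", max_asize);	/* 1063254016 */
	printf("expandsize = %" PRIu64 "\n", expandsize);	/* ~2^64 */
	return (0);
}

Here max_asize comes out 2048 bytes (4 sectors) smaller than asize, so
the subtraction wraps to 18446744073709549568, which zpool renders as 16E.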
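
The fix itself is the MAX() clamp in bdev_max_capacity(). A minimal
userspace model of the intended semantics (hypothetical names; it assumes
the raw estimate is carried in a signed type, since the pre-patch
"if (available > 0)" check implies the 10M reservation can exceed the
size of a very small device):

#include <stdint.h>

#define	MAX(a, b)	((a) > (b) ? (a) : (b))

/*
 * Never report an expansion estimate smaller than the capacity that is
 * already usable: "estimated" may undershoot "usable" on oddball sector
 * counts, or even go negative on tiny devices. Comparing in the signed
 * domain makes the clamp pick "usable" in both cases.
 */
static uint64_t
clamped_max_capacity(int64_t estimated, uint64_t usable)
{
	return ((uint64_t)MAX(estimated, (int64_t)usable));
}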