[stable/cobia] Sync with Upstream zfs-2.2-release #170

ixhamza · 2023-10-02T20:05:18Z

Motivation and Context

Sync with Upstream zfs-2.2-release

How Has This Been Tested?

CI Tests
Scale Build: https://ci.tn.ixsystems.net/jenkins/job/TrueNAS%20SCALE%20-%20Unstable/job/Build%20-%20TrueNAS%20SCALE_Custom/613/
Core Build: https://builds.ixsystems.com/view/Custom/job/TrueNAS%20Custom%20Build/196/

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

Additionally, the .child element of ctl_table has been removed in 6.5. This change adds a new test for the pre-6.5 register_sysctl_table() function, and uses the old code in that case. If it isn't found, then the parentage entries in the tables are removed, and the register_sysctl call is provided the paths of "kernel/spl", "kernel/spl/kmem", and "kernel/spl/kstat" directly, to populate each subdirectory over three calls, as is the new API. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes openzfs#15098

When disk_check_media_change() exists, then define zfs_check_media_change() to simply call disk_check_media_change() on the bd_disk member of its argument. Since disk_check_media_change() is newer than when revalidate_disk was present in bops, we should be able to safely do this via a macro, instead of recreating a new implementation of the inline function that forces revalidation. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes openzfs#15101

Multiple changes to the blkdev API were introduced in Linux 6.5. This includes passing (void* holder) to blkdev_put, adding a new blk_holder_ops* arg to blkdev_get_by_path, adding a new blk_mode_t type that replaces uses of fmode_t, and removing an argument from the release handler on block_device_operations that we weren't using. The open function definition has also changed to take gendisk* and blk_mode_t, so update it accordingly, too. Implement local wrappers for blkdev_get_by_path() and vdev_blkdev_put() so that the in-line calls are cleaner, and place the conditionally-compiled implementation details inside of both of these local wrappers. Both calls are exclusively used within vdev_disk.c, at this time. Add blk_mode_is_open_write() to test FMODE_WRITE / BLK_OPEN_WRITE The wrapper function is now used for testing using the appropriate method for the kernel, whether the open mode is writable or not. Emphasize fmode_t arg in zvol_release is not used Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes openzfs#15099

The iov_iter->iov member is now iov_iter->__iov and must be accessed via the accessor function iter_iov(). Create a wrapper that is conditionally compiled to use the access method appropriate for the target kernel version. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes openzfs#15100

An iov_iter_type() function to access the "type" member of the struct iov_iter was added at one point. Move the conditional logic to decide which method to use for accessing it into a macro and simplify the zpl_uio_init code. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes openzfs#15100

This reverts commit b35374f as there are error messages when loading the SPL module. Errors seemed to be tied to duplicate a duplicate entry. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Brian Atkinson <[email protected]> Closes openzfs#15134

Additionally, the .child element of ctl_table has been removed in 6.5. This change adds a new test for the pre-6.5 register_sysctl_table() function, and uses the old code in that case. If it isn't found, then the parentage entries in the tables are removed, and the register_sysctl call is provided the paths of "kernel/spl", "kernel/spl/kmem", and "kernel/spl/kstat" directly, to populate each subdirectory over three calls, as is the new API. Reviewed-by: Brian Atkinson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes openzfs#15138

…e_read The generic_file_splice_read function was removed in Linux 6.5 in favor of filemap_splice_read. Add an autoconf test for filemap_splice_read and use it if it is found as the handler for .splice_read in the file_operations struct. Additionally, ITER_PIPE was removed in 6.5. This change removes the ITER_* macros that OpenZFS doesn't use from being tested in config/kernel-vfs-iov_iter.m4. The removal of ITER_PIPE was causing the test to fail, which also affected the code responsible for setting the .splice_read handler, above. That behavior caused run-time panics on Linux 6.5. Reviewed-by: Brian Atkinson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes openzfs#15155

Using the filemap_splice_read function for the splice_read handler was leading to occasional data corruption under certain circumstances. Favor using copy_splice_read instead, which does not demonstrate the same erroneous behavior under the tested failure cases. Reviewed-by: Brian Atkinson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes openzfs#15164

If we fail to create a proc entry in spl_proc_init() we may end up calling unregister_sysctl_table() twice: one in the failure path of spl_proc_init() and another time during spl_proc_fini(). Avoid the double call to unregister_sysctl_table() and while at it refactor the code a bit to reduce code duplication. This was accidentally introduced when the spl code was updated for Linux 6.5 compatibility. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Andrea Righi <[email protected]> Closes openzfs#15234 Closes openzfs#15235

When register_sysctl_table() is unavailable we fail to properly unregister sysctl entries under "kernel/spl". This leads to errors like the following when spl is unloaded/reloaded, making impossible to properly reload the spl module: [ 746.995704] sysctl duplicate entry: /kernel/spl/kmem/slab_kvmem_total Fix by cleaning up all the sub-entries inside "kernel/spl" when the spl module is unloaded. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Andrea Righi <[email protected]> Closes openzfs#15239

Update the META file to reflect compatibility with the 6.5 kernel. Signed-off-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>

spa_upgrade_errlog() does not update the MOS directory when the head_errlog feature is enabled. In this case if spa_errlog_sync() is not called, the MOS dir references the old errlog_last and errlog_sync objects. Thus when doing a scrub a panic will occur: Call Trace: dump_stack+0x6d/0x8b panic+0x101/0x2e3 spl_panic+0xcf/0x102 [spl] delete_errlog+0x124/0x130 [zfs] spa_errlog_sync+0x256/0x260 [zfs] spa_sync_iterate_to_convergence+0xe5/0x250 [zfs] spa_sync+0x2f7/0x670 [zfs] txg_sync_thread+0x22d/0x2d0 [zfs] thread_generic_wrapper+0x83/0xa0 [spl] kthread+0x104/0x140 ret_from_fork+0x1f/0x40 Fix this by updating the related MOS directory objects in spa_upgrade_errlog(). Reviewed-by: Mark Maybee <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes openzfs#15279 Closes openzfs#15277

Commit 2d78434 had previously updated this hardcoded limit to allow for CI testing. As there is no deterministic pass/fail value, the need has arisen for one more small increase. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Edmund Nadolski <[email protected]> Closes openzfs#15252

Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Laura Hild <[email protected]> Closes openzfs#15247

Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b341b20d648bb7e9a3307c33163e7399f0913e66 Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes openzfs#15282 Closes openzfs#15284

Allow zed to autoreplace vdevs marked as REMOVED. Also update statechange-led zedlet to toggle fault LEDs for REMOVED vdevs. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes openzfs#15281

Added in ab26409 ("Linux 3.1 compat, super_block->s_shrink"), with the only consumer which needed the count getting retired in 066e825 ("Linux compat: Minimum kernel version 3.10"). The counter gets in the way of not maintaining the list to begin with. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Closes openzfs#15274

zfs-share.service executes `zfs share` on every boot to share any filesystem/s, that are shared over SMB and/or NFS using the sharesmb and sharenfs properties. Since we do not rely on these properties to share over SMB and NFS and the service fails on boot on TrueNAS if sharesmb and/or sharenfs properties are set, and we rely on middleware to control the SMB and NFS shares, zfs-share.service should be disabled for TrueNAS SCALE. Signed-off-by: Umer Saleem <[email protected]>

There are some inconsistencies in the handling of mountpoint property. This commit updates the behavior and makes it consistent. If mountpoint property is set when dataset is unmounted, this would update the mountpoint property. The mountpoint could be valid or invalid in this case. Setting the mountpoint property would result in success in this case. Dataset would still be unmounted here. On the other hand, if dataset is mounted and mountpoint property is updated to something invalid where mount cannot be successful, for example, setting the mountpoint inside a readonly directory. This would unmount the dataset, set the mountpoint property to requested value and tries to mount the dataset. The mount operation returns error and this error is treated as overall failure of setting the property while the property is actually set. To make the behavior consistent in case dataset is mounted or unmounted, we should try to mount the dataset whenever mountpoint property is updated. This would result in mounting the datasets if canmount property is set to on, regardless if the dataset was previously unmounted. The failure in mount operation while setting the mountpoint property should not be treated as failure, since the property is actually set now to user requested value. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes openzfs#15240

For sharesmb and sharenfs properties, the status of setting the property is tied with whether we succeed to share the dataset or not. In case sharing the dataset is not successful, this is treated as overall failure of setting the property. In this case, if we check the property after the failure, it is set to on. This commit updates this behavior and the status of setting the share properties is not returned as failure, when we fail to share the dataset. For sharenfs property, if access list is provided, the syntax errors in access list/host adresses are not validated until after setting the property during postfix phase while trying to share the dataset. This is not correct, since the property has already been set when we reach there. Syntax errors in access list/host addresses are validated while validating the property list, before setting the property and failure is returned to user in this case when there are errors in access list. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes openzfs#15240

zil_lwb_set_zio_dependency() can not set write ZIO dependency on previous LWB's write ZIO if one is already in done handler and set state to LWB_STATE_WRITE_DONE. So theoretically done handler of next LWB's write ZIO may run before done handler of previous LWB write ZIO completes. In such case we can not defer flushes, since the flush issue process is not locked. This may fix some reported assertions of lwb_vdev_tree not being empty inside zil_free_lwb(). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15278

'pam_change_unmounted' and 'pam_recursive' both exist and are referenced by the test run config, but weren't being installed and so are excluded. This gets them installed so they will run as expected. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Kay Pedersen <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#15291

In openzfs#13375 we modified the allocation size of the buffer that we use to apply l2arc transforms to be the size of the arc hdr we're using, rather than the allocation size that will be in place on the disk, because sometimes the hdr size is larger. Unfortunately, sometimes the allocation size is larger, which means that we overflow the buffer in that case. This change modifies the allocation to be the max of the two values Reviewed-by: Mark Maybee <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#15177 Closes openzfs#15248

There is an occasional ztest failure that looks like ztest: attach (/var/tmp/zloop-run/ztest.13a 570425344, draid1-1-0 532152320, 1) returned 22, expected 95. This is because the value that we return is EINVAL, but expected_error is set incorrectly. Change the expected_error value to match both the comment and the actual error value. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#15295

'program help subcommand' is a reasonably common pattern for multifunction command-line programs. This commit adds support for that style to the zpool and zfs commands. When run as 'zpool help [<topic>]' or 'zfs help [<topic>]', executes the 'man' program on the PATH with the most likely manpage name for the requested topic: "zpool-<topic>" or "zfs-<topic>" for subcommands, or "zpool<topic>" or "zfs<topic>" for the "concepts" and "props" topics. If no topic is supplied, uses the top "zpool" or "zfs" pages. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Kay Pedersen <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#15288

We have occasional crashes in the rsend tests. Debugging revealed that this is because the send_worker thread is getting EINTR from splice(). This happens when a non-fatal signal is received during the syscall. We should retry the syscall, rather than exiting failure. Tweak the loop to only break if the splice is finished or we receive a non-EINTR error. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ahelenia Ziemiańska <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#15273

When failmode=continue is set and the pool suspends, both 'zpool status' and the 'zfs/pool/state' kstat ignore it and report the normal vdev tree state. There's no clear indicator that the pool is suspended. This is unlike suspend in failmode=wait, or suspend due to MMP check failure, which both report "SUSPENDED" explicitly. This commit changes it so SUSPENDED is reported for failmode=continue the same as for other modes. Rationale: The historical behaviour of failmode=continue is roughly, "press on as though all is well". To this end, the fact that the pool had suspended was not shown, to maintain the façade that all is well. Its unclear why hiding this information was considered appropriate. One possibility is that it was expected that a true pool fault would always be reported as DEGRADED or FAULTED, and that the pool could not suspend without these happening. That is not necessarily true, as vdev health and suspend state are only loosely connected, such that a pool in (apparent) good health can be suspended for good reasons, and of course a degraded pool does not lead to suspension. Even if that expectation were true, there's still a difference in urgency - a degraded pool may not need to be attended to for hours, while a suspended pool is most often unusable until an operator intervenes. An operator that has set failmode=continue has presumably done so because their workload is one that can continue to operate in a useful way when the pool suspends. In this case the operator still needs a clear indicator that there is a problem that needs attending to. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#15297

We've observed this test failing intermittently. When it does, the "same block" check shows that both files have the same content, that is, the file was cloned. The only way this could have happened is if the open txg moved between the dd and clonefile calls. That's possible because although we set zfs_txg_timeout to be large, that only affects the wait time in the sync thread at the start of a new txg; it doesn't change anything if its currently waiting or working. So here we just force the txgs to move immediately before, which should get both operations onto the same txg as intented. Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris Rob Norris <[email protected]> Closes openzfs#15303

While working on similar patches for zfs and zvol in openzfs#15153 I've forgot about ztest. Update it also so that we test the same code paths as use in production. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15301

The problem that was occurring is basically that a device was removed by ztest and replaced with another device. It was then reguided. The import then failed because there were two possible imports with the same name; one with the new guid, and one with the old. This can happen because the label writes from the device removal/replacement can be subject to ztest's error injection. The other ways to fix this would be to change the error injection to not trigger on removals (which may not be technically feasible), or to change the import code to not report configurations that are so short on devices (which would potentially have unpleasant end-user effects when trying to recover from data losses/device configuration issues). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#15298

Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#15309

"zfs_share_concurrent_shares" may fail on FreeBSD and some Linux distributions (fedora). Move it to the common list. "zfs_allow_010_pos" has been observed to fail on FreeBSD 13. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Kay Pedersen <[email protected]> Reviewed-by: Umer Saleem <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#15306

Reviewed-by: John Wren Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#15316

Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#15315

Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#15307 Closes openzfs#15308

When unlinking multiple files from a pool at 100% capacity, it was possible for ENOSPC to be returned after the first few unlinks. This issue was fixed previously by PR openzfs#13172 but then this was again introduced by PR openzfs#13839. This is resolved using the existing mechanism of returning ERESTART when over quota as long as we know enough space will shortly be available after processing the pending deferred frees. Also, updated the existing testcase which reliably reproduced the issue without this patch. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Dipak Ghosh <[email protected]> Signed-off-by: Akash B <[email protected]> Closes openzfs#15312

Signed-off-by: Ameer Hamza <[email protected]>

Sync with Upstream zfs-2.2-release

ckane and others added 30 commits September 19, 2023 08:50

Linux 6.5 compat: META (openzfs#15265)

c011ef8

Update the META file to reflect compatibility with the 6.5 kernel. Signed-off-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>

Remove implication that child disks aren't vdevs in zpoolconcepts(7)

228b064

Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Laura Hild <[email protected]> Closes openzfs#15247

pcd1193182 and others added 9 commits September 28, 2023 14:28

Set timeout before creating pool in test

8526b12

Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#15309

ZTS: Fix introduced test bug in block_cloning_copyfilerange

99dc1fc

Reviewed-by: John Wren Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#15316

Reduce trim min size even lower for tests to reduce flakiness

d38f466

Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#15315

Don't allocate from new metaslabs

9e36c57

Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#15307 Closes openzfs#15308

Merge upstream zfs-2.2-release into truenas/zfs-2.2-release

fe88aff

Signed-off-by: Ameer Hamza <[email protected]>

Merge pull request #169 from truenas/zfs-2.2-release-sync-cobia-release

8e57472

Sync with Upstream zfs-2.2-release

ixhamza requested a review from amotin October 2, 2023 20:05

amotin approved these changes Oct 2, 2023

View reviewed changes

amotin merged commit 1b56fd9 into stable/cobia Oct 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[stable/cobia] Sync with Upstream zfs-2.2-release #170

[stable/cobia] Sync with Upstream zfs-2.2-release #170

ixhamza commented Oct 2, 2023

[stable/cobia] Sync with Upstream zfs-2.2-release #170

[stable/cobia] Sync with Upstream zfs-2.2-release #170

Conversation

ixhamza commented Oct 2, 2023

Motivation and Context

How Has This Been Tested?

Types of changes

Checklist: