From cf21b0dcdf1081e40aabaad3b9c7c657e0572498 Mon Sep 17 00:00:00 2001
From: Don Brady
Date: Wed, 27 Apr 2022 21:59:35 -0600
Subject: [PATCH] Fix upstream merge conflict into 6.0/stage (#388)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Improve the inline descriptions of the ARC module parameters

These are displayed as the descriptions of the sysctls on FreeBSD

Reviewed-by: George Melikov
Reviewed-by: Brian Behlendorf
Signed-off-by: Allan Jude
Closes #13334

* linux: module: weld all but spl.ko into zfs.ko

Originally it was thought it would be useful to split up the kmods
by functionality. This would allow external consumers to only load
what was needed. However, in practice we've never had a case where
this functionality would be needed, and conversely managing multiple
kmods can be awkward. Therefore, this change merges all but the
spl.ko kmod into a single zfs.ko kmod.

Reviewed-by: Tony Hutter
Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13274

* scripts: zfs.sh: remove cat

Reviewed-by: Tony Hutter
Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13274

* scripts: zfs.sh: make usage make sense

We don't pass the arguments as arguments

Reviewed-by: Tony Hutter
Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13274

* scripts: zfs.sh: unload zfs with dependencies

Reviewed-by: Tony Hutter
Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13274

* linux: module: uninstall legacy modules on (un)installation

This can be reverted once we're sure nobody's using them anymore
(post-3.0 release?)

Reviewed-by: Tony Hutter
Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13274

* zpool_history_unpack: return correct errno on nvlist_unpack failure

Reviewed-by: Brian Behlendorf
Reviewed-by: Damian Szuberski
Signed-off-by: WHR
Closes #13321

* Document zfs inherit -S's interaction with noninheritable properties

Reviewed-by: Brian Behlendorf
Reviewed-by: Damian Szuberski
Signed-off-by: Ahelenia Ziemiańska
Closes #11894
Closes #13335

* rpm -> deb doesn't fail when optional packages are missing

Reviewed-by: Brian Behlendorf
Signed-off-by: szubersk
Closes #13331
Closes #13336

* man: ... -> … again

zfs-program.8 is left, but that's literal Lua syntax

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13255

* FreeBSD: Fix translation from ABD to physical pages

In the hypothetical case of a non-linear ABD with a single segment
whose size is a multiple of the page size but which is not aligned
to it, vdev_geom_fill_unmap_cb() could fill one page fewer into the
bio_ma array. I am not sure it is exploitable, but better to be safe
than sorry.

Reviewed-by: Brian Behlendorf
Reviewed-by: Ryan Moeller
Reported-by: Mark Johnston
Signed-off-by: Alexander Motin
Closes #13345

* Corrected oversight in ZERO_RANGE behavior

It turns out that ZERO_RANGE and PUNCH_HOLE do have differing
semantics in some ways: in particular, PUNCH_HOLE requires KEEP_SIZE
and ZERO_RANGE does not.

Also added a zero-range test to catch this, corrected a flaw that
made the punch-hole test succeed vacuously, and fixed a typo in
file_write.

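The flag relationship described in the ZERO_RANGE entry can be written
down as a small predicate. This is only a sketch of the rule as
documented in fallocate(2), not code from this patch: falloc_mode_ok()
is a hypothetical helper, and it models nothing beyond these three
flags.

```c
#include <linux/falloc.h>	/* FALLOC_FL_* (Linux only) */
#include <stdbool.h>

/*
 * Hypothetical helper, not part of OpenZFS: returns true if the
 * combination of PUNCH_HOLE, ZERO_RANGE and KEEP_SIZE in "mode" is one
 * that fallocate(2) documents as acceptable.
 */
bool
falloc_mode_ok(int mode)
{
	bool punch = (mode & FALLOC_FL_PUNCH_HOLE) != 0;
	bool zero = (mode & FALLOC_FL_ZERO_RANGE) != 0;
	bool keep = (mode & FALLOC_FL_KEEP_SIZE) != 0;

	/* The two operations cannot be requested in the same call. */
	if (punch && zero)
		return (false);
	/* Punching a hole must be combined with KEEP_SIZE. */
	if (punch && !keep)
		return (false);
	/* Zeroing a range is valid with or without KEEP_SIZE. */
	return (true);
}
```

A test that only ever passes KEEP_SIZE would not exercise the second
ZERO_RANGE case, which is the kind of gap a dedicated zero-range test
closes.
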
Reviewed-by: Brian Behlendorf
Signed-off-by: Rich Ercolani
Closes #13329
Closes #13338

* contrib: dracut: parse-zfs: drop initqueue-finished for i/f

The switch was released in dracut 009 in March 2011; we can safely
get rid of the compatibility hook

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13291

* contrib: dracut: parse-zfs: stop pretending we support FILESYSTEM=

It was added in the original ae26d0465a ("Add dracut support") commit
in 2011, and was then broken a bit later with the advent of
dracut-zfs-generator, or maybe earlier as part of other churn

Either way, it's broken, and has been in 2.0+ as well, and no-one
complained. Stop pretending we support it at all

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13291

* contrib: dracut: centralise root= parsing, actually support root=s

So far, everything parsed root= manually, which meant that while
zfs-parse.sh was updated, and supposedly supported + -> ' '
conversion, it meant nothing

Instead, centralise parsing, and allow:
  root=
  root=zfs
  root=zfs:
  root=zfs:AUTO

  root=ZFS=data/set
  root=zfs:data/set
  root=zfs:ZFS=data/set (as a side-effect; allowed but undocumented)

  rootfstype=zfs AND root=data/set <=> root=data/set
  rootfstype=zfs AND root= <=> root=zfs:AUTO

So rootfstype=zfs /also/ behaves as expected, and + decoding works

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13291

* contrib: dracut: flatten zfs-load-key, simplify zfs-env-bootfs

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13291

* contrib: dracut: zfs-lib: simplify ask_for_password

The only user is mount-zfs.sh (non-systemd systems), so reduce it to
what it needs

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13291

* contrib: dracut: zfs-lib: remove find_bootfs

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13291

* contrib: dracut: inline single-use import_pool, move single-use ask_for_password

Also don't set ROOTFS_MOUNTED; the final mention was removed in
dracut 011 from July 2011

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13291

* contrib: dracut: don't require essentials to be under the same encroot

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13291

* contrib: dracut: zfs-{rollback,snapshot}-bootfs: order after key loading

This fixes at least one race I got with an encrypted root

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13291

* contrib: dracut: zfs-needshutdown: don't list

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13291

* Add dracut.zfs.7

Thorough documentation with a dracut.bootup(7)-style flowchart,
dracut.cmdline(7)-style cmdline listing, and per-file docs like the
old README

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13291

* contrib: dracut: remove getargbool polyfill

It was originally released in dracut 008 in February 2011; we can
probably drop it now

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13291

* Strengthen Linux kernel capabilities detection

- Add `CONFIG_BLOCK` Linux config requirement to
  `ZFS_AC_KERNEL_CONFIG_DEFINED`. OpenZFS won't compile without
  block device support because of the large number of functional
  dependencies on it.
- Remove dependency on `groups_alloc()` in
  `ZFS_AC_KERNEL_SRC_GROUP_INFO_GID` to circumvent the missing stub
  in Linux 4.X kernel headers.

Reviewed-by: Brian Behlendorf
Signed-off-by: szubersk
Closes #13351

* scripts: zfs.sh: explicitly ignore unloaded modules when unloading

Reviewed-by: Brian Atkinson
Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13356

* scripts: zfs.sh: explicitly unload all modules via rmmod

modprobe -r only works for depmodded modules, but this also means we
have to re-iterate legacy modules, and in the right order

Reviewed-by: Brian Atkinson
Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13356

* linux: module: zfs: sysfs: constify types and attrs

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13357

* Linux 5.18 compat: kobj_type.default_attrs replaced with default_groups

Upstream-commit: cdb4f26a63c391317e335e6e683a614358e70aeb ("kobject: kobj_type: remove default_attrs")
Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13357

* zvol_wait: Ignore locked zvols

"When an encrypted zvol is locked the zfs-volume-wait service does
not start. The /sbin/zvol_wait should not wait for links when the
volume has property keystatus=unavailable."
-- https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1888405

Reviewed-by: Tony Hutter
Reviewed-by: Damian Szuberski
Thanks: James Dingwall
Signed-off-by: Richard Laager
Closes #10662

* man: zfs-send.8: fix -X synopses and description

Also clean up the horrendously verbose -X handling in zfs_main()

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13352

* tests: cli_user: zfs_001_neg: print the problematic lines

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13352

* Improve zpool status output, list all affected datasets

Currently, determining which datasets are affected by corruption is
a manual process.

The primary difficulty in reporting the list of affected snapshots is
that, since the error was initially found, the snapshot in which the
error originally occurred may have been deleted.
To solve this issue, we add the ID of the head dataset of the
original snapshot in which the error was detected to the stored error
report. Then any time a filesystem is deleted, the errors associated
with it are deleted as well. Any time a clone promote occurs, we
modify reports associated with the original head to refer to the new
head. The stored error reports are identified by this head ID; the
birth time of the block in which the error occurred, as well as some
information about the error itself, are also stored.

Once this information is stored, we can find the set of datasets
affected by an error by walking back the list of snapshots in the
given head until we find one with the appropriate birth txg, and then
traverse through the snapshots of the clone family, terminating a
branch if the block was replaced in a given snapshot. Then we report
this information back to libzfs, and to the zpool status command,
where it is displayed as follows:

  pool: test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:00:00 with 800 errors on Fri Dec 3 08:27:57 2021
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          sdb       ONLINE       0     0 1.58K

errors: Permanent errors have been detected in the following files:

        test@1:/test.0.0
        /test/test.0.0
        /test/1clone/test.0.0

A new feature flag is introduced to mark the presence of this change,
as well as promotion and backwards compatibility logic. This is an
updated version of #9175. Rebase required fixing the tests, updating
the ABI of libzfs, updating the man pages, fixing bugs, fixing the
error returns, and updating the old on-disk error logs to the new
format when activating the feature.

Reviewed-by: Matthew Ahrens
Reviewed-by: Brian Behlendorf
Reviewed-by: Mark Maybee
Reviewed-by: Tony Hutter
Co-authored-by: TulsiJain
Signed-off-by: George Amanakis
Closes #9175
Closes #12812

* Improve log spacemap load time

The previous flushing algorithm limited only the total number of log
blocks, to the minimum of 256K and 4x the number of metaslabs in the
pool. As a result, a system with 1500 disks with 1000 metaslabs each,
touching several new metaslabs each TXG, could grow the spacemap log
to a huge size without much benefit. We've observed one such system
importing a pool for about 45 minutes.

This patch improves the situation in five ways:
 - By limiting the maximum period between flushes of each metaslab to
   1000 TXGs, which effectively limits the maximum number of per-TXG
   spacemap logs to load to the same number.
 - By making flushing smoother by accounting for the number of
   metaslabs that were touched after the last flush and actually need
   another flush, not just an ms_unflushed_txg bump.
 - By applying zfs_unflushed_log_block_pct to the number of metaslabs
   that were touched after the last flush, not all metaslabs in the
   pool.
 - By aggressively prefetching per-TXG spacemap logs up to 16 TXGs in
   advance, making the log spacemap load process for a wide HDD pool
   CPU-bound and accelerating it many times over.
 - By reducing zfs_unflushed_log_block_max from 256K to 128K, reducing
   the inherently single-threaded log processing time from ~10 to ~5
   minutes.

As a further optimization we could skip bumping ms_unflushed_txg for
metaslabs not touched since the last flush, but that would be an
incompatible change, requiring a new pool feature.

Reviewed-by: Matthew Ahrens
Reviewed-by: Brian Behlendorf
Signed-off-by: Alexander Motin
Sponsored-By: iXsystems, Inc.
Closes #12789

* autoconf: Pretend `CONFIG_MODULES` is always on

- Unconditionally inject the `CONFIG_MODULES` make variable and
  `#define CONFIG_MODULES` into Kbuild in the `ZFS_LINUX_COMPILE`
  autoconf function to emulate loadable kernel module support.
  This allows OpenZFS to perform Linux checks despite
  `CONFIG_MODULES=n` in the actual Linux config.
- Add a `ZFS_AC_KERNEL_CONFIG_MODULES` check which encompasses the
  logic from `ZFS_AC_KERNEL_TEST_MODULE` with additional diagnostic
  messages to the user
- Remove `ZFS_AC_KERNEL_TEST_MODULE` as it merely duplicates every
  check in `ZFS_AC_KERNEL_CONFIG_DEFINED`
- Move `ZFS_AC_MODULE_SYMVERS` after `ZFS_AC_KERNEL_CONFIG_DEFINED`
  so the user has a chance to see the proper diagnostic from the
  steps before.

A workaround for Linux's
```
commit 3e3005df73b535cb849cf4ec8075d6aa3c460f68
Author: Masahiro Yamada
Date:   Wed Mar 31 22:38:03 2021 +0900

    kbuild: unify modules(_install) for in-tree and external modules

    If you attempt to build or install modules ('make modules(_install)'
    with CONFIG_MODULES disabled, you will get a clear error message, but
    nothing for external module builds.

    Factor out the modules and modules_install rules into the common part,
    so you will get the same error message when you try to build external
    modules with CONFIG_MODULES=n.

    Signed-off-by: Masahiro Yamada
```

Reviewed-by: Brian Behlendorf
Signed-off-by: szubersk
Closes #10832
Closes #13361

* PPC get_user workaround

On Linux 5.12 PPC, the get_user() and __copy_from_user_inatomic()
inline helpers very indirectly include a reference to the GPL'd array
mmu_feature_keys[], and the build fails.

Work around this by using copy_from_user() and returning EFAULT for
any calls to __copy_from_user_inatomic(). This is a temporary
workaround until the issue from Linux commit
7613f5a66becfd0e43a0f34de8518695888f5458 "powerpc/64s/kuap: Use
mmu_has_feature()" is fully addressed.

Reviewed-by: Brian Behlendorf
Authored-by: Colin Ian King
Signed-off-by: szubersk
Closes #11958
Closes #12590
Closes #13367

* zfs: holds: general cleanup

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13373

* zfs: holds: dequadratify

Before:
  15  0m0.177s
  30  0m0.653s
  45  0m1.289s
  60  0m2.129s
  75  0m3.264s
  90  0m4.397s
  100 0m5.996s
  117 0m8.552s

After:
  30  0m0.053s
  117 0m0.125s

Reviewed-by: Brian Behlendorf
Signed-off-by: Ahelenia Ziemiańska
Closes #13372
Closes #13373

* Linux 5.18 compat: replace __set_page_dirty_nobuffers

Replace __set_page_dirty_nobuffers with filemap_dirty_folio.

Upstream-commit: 6b1f86f8e9c7f9de7ca1cb987b2cf25e99b1ae3a ("Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache")
Reviewed-by: Brian Behlendorf
Reviewed-by: Tony Hutter
Authored-by: Satadru Pramanik
Signed-off-by: Satadru Pramanik
Closes #13325
Closes #13380

* Fix O_APPEND for Linux 3.15 and older kernels

When using a Linux kernel which predates the iov_iter interface, the
O_APPEND flag should be applied in zpl_aio_write() via the call to
generic_write_checks(). The updated pos variable was incorrectly
ignored, resulting in the current offset being used.

This issue should only realistically impact the RHEL/CentOS 7.x
kernels, which are based on Linux 3.10.

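The user-visible behaviour at stake in the O_APPEND entry can be
checked from userspace. The following is a minimal sketch and not part
of this patch: append_lands_at_eof() is a hypothetical helper, and the
scratch file path is whatever the caller chooses. It writes through an
O_APPEND descriptor after seeking back to offset 0 and checks that the
second write still lands at end of file.

```c
#define _XOPEN_SOURCE 700	/* for pread() under strict -std= modes */
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of OpenZFS: returns 1 if a write on an
 * O_APPEND descriptor lands at end of file even after the descriptor
 * offset was moved back to 0, 0 if it does not, and -1 on I/O error.
 * The scratch file at "path" is created and removed again.
 */
int
append_lands_at_eof(const char *path)
{
	const char first[] = "first";
	const char second[] = "second";
	char buf[sizeof (first) + sizeof (second)] = { 0 };
	ssize_t n;
	int rv = -1;

	int fd = open(path, O_CREAT | O_TRUNC | O_RDWR | O_APPEND, 0600);
	if (fd < 0)
		return (-1);

	if (write(fd, first, strlen(first)) != (ssize_t)strlen(first))
		goto out;
	/* Move the offset away from EOF; O_APPEND must override this. */
	if (lseek(fd, 0, SEEK_SET) != 0)
		goto out;
	if (write(fd, second, strlen(second)) != (ssize_t)strlen(second))
		goto out;

	/* pread() takes an explicit offset, so read back from 0. */
	n = pread(fd, buf, sizeof (buf) - 1, 0);
	if (n < 0)
		goto out;
	/* A correct append yields both strings back to back. */
	rv = (strcmp(buf, "firstsecond") == 0);
out:
	(void) close(fd);
	(void) unlink(path);
	return (rv);
}
```

If the current offset were used instead of the end of file, the second
write would overwrite the first and the file would not read back as
both strings in sequence.
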
Reviewed-by: Tony Hutter Signed-off-by: Brian Behlendorf Closes #13370 Closes #13377 Co-authored-by: Allan Jude Co-authored-by: наб Co-authored-by: Low-power Co-authored-by: Damian Szuberski Co-authored-by: Alexander Motin Co-authored-by: Rich Ercolani <214141+rincebrain@users.noreply.github.com> Co-authored-by: Richard Laager Co-authored-by: George Amanakis Co-authored-by: Satadru Pramanik Co-authored-by: Brian Behlendorf --- Makefile.am | 2 +- cmd/zfs/zfs_main.c | 122 +-- cmd/zvol_wait/zvol_wait | 10 +- config/deb.am | 4 +- config/kernel-config-defined.m4 | 88 +- config/kernel-copy-from-user-inatomic.m4 | 26 + config/kernel-group-info.m4 | 4 +- config/kernel-sysfs.m4 | 37 + config/kernel-vfs-filemap_dirty_folio.m4 | 30 + config/kernel.m4 | 37 +- configure.ac | 11 - contrib/dracut/90zfs/module-setup.sh.in | 5 - contrib/dracut/90zfs/mount-zfs.sh.in | 130 ++- contrib/dracut/90zfs/parse-zfs.sh.in | 67 +- .../dracut/90zfs/zfs-env-bootfs.service.in | 2 +- contrib/dracut/90zfs/zfs-generator.sh.in | 30 +- contrib/dracut/90zfs/zfs-lib.sh.in | 181 ++-- contrib/dracut/90zfs/zfs-load-key.sh.in | 105 +- contrib/dracut/90zfs/zfs-needshutdown.sh.in | 2 +- .../90zfs/zfs-rollback-bootfs.service.in | 6 +- .../90zfs/zfs-snapshot-bootfs.service.in | 6 +- contrib/dracut/README.md | 16 +- etc/init.d/zfs-zed.in | 3 +- include/os/freebsd/spl/sys/ccompile.h | 3 - include/os/freebsd/spl/sys/mod_os.h | 5 - include/os/linux/kernel/linux/mod_compat.h | 7 - include/sys/dmu.h | 2 + include/sys/dsl_dataset.h | 3 + include/sys/metaslab.h | 3 + include/sys/metaslab_impl.h | 1 + include/sys/mod.h | 5 - include/sys/spa.h | 8 +- include/sys/spa_log_spacemap.h | 9 +- include/sys/zio.h | 14 + include/zfeature_common.h | 1 + lib/libnvpair/libnvpair.abi | 2 +- lib/libuutil/libuutil.abi | 193 ++-- lib/libzfs/libzfs.abi | 11 +- lib/libzfsbootenv/libzfsbootenv.abi | 37 +- lib/libzutil/zutil_pool.c | 5 +- man/Makefile.am | 1 + man/man4/zfs.4 | 20 +- man/man7/dracut.zfs.7 | 278 ++++++ man/man7/zfsprops.7 | 
15 +- man/man7/zpool-features.7 | 11 + man/man8/zed.8.in | 2 +- man/man8/zfs-allow.8 | 24 +- man/man8/zfs-program.8 | 9 +- man/man8/zfs-send.8 | 33 +- man/man8/zfs-set.8 | 5 +- module/Kbuild.in | 423 +++++++- module/Makefile.in | 34 +- module/avl/Makefile.in | 10 - module/avl/avl.c | 22 - module/icp/Makefile.in | 90 -- module/icp/illumos-crypto.c | 7 +- module/lua/Makefile.in | 39 - module/lua/lapi.c | 23 - module/nvpair/Makefile.in | 13 - module/nvpair/nvpair.c | 21 - module/os/freebsd/zfs/vdev_geom.c | 6 +- module/os/linux/spl/Makefile.in | 17 - module/os/linux/spl/spl-generic.c | 8 +- module/os/linux/zfs/Makefile.in | 40 - module/os/linux/zfs/zfs_ioctl_os.c | 65 +- module/os/linux/zfs/zfs_sysfs.c | 49 +- module/os/linux/zfs/zfs_uio.c | 8 +- module/os/linux/zfs/zfs_vnops_os.c | 4 + module/os/linux/zfs/zpl_file.c | 21 +- module/spl/Makefile.in | 13 - module/unicode/Makefile.in | 11 - module/unicode/u8_textprep.c | 21 - module/zcommon/Makefile.in | 28 - module/zcommon/zfeature_common.c | 7 + module/zcommon/zfs_prop.c | 14 +- module/zfs/Makefile.in | 158 --- module/zfs/arc.c | 20 +- module/zfs/dmu.c | 2 +- module/zfs/dsl_dataset.c | 40 + module/zfs/dsl_destroy.c | 3 + module/zfs/metaslab.c | 132 +-- module/zfs/spa.c | 2 +- module/zfs/spa_errlog.c | 910 ++++++++++++++++-- module/zfs/spa_log_spacemap.c | 231 +++-- module/zfs/vdev.c | 7 - module/zfs/vdev_removal.c | 2 - module/zfs/zfeature.c | 7 + module/zfs/zfs_ioctl.c | 2 +- module/zstd/Makefile.in | 69 -- module/zstd/README.md | 13 +- module/zstd/include/zstd_compat_wrapper.h | 2 +- module/zstd/zfs_zstd.c | 12 +- rpm/generic/zfs-kmod.spec.in | 2 +- rpm/redhat/zfs-kmod.spec.in | 2 +- scripts/Makefile.am | 11 +- scripts/dkms.mkconf | 40 +- scripts/zfs.sh | 118 +-- scripts/zfs2zol-patch.sed | 2 +- tests/runfiles/common.run | 1 + tests/runfiles/linux.run | 2 +- tests/zfs-tests/cmd/file/file_write.c | 2 +- tests/zfs-tests/include/libtest.shlib | 20 +- .../zfs_rename/zfs_rename_014_neg.ksh | 6 +- 
.../cli_root/zpool_get/zpool_get.cfg | 1 + .../cli_root/zpool_status/Makefile.am | 2 + .../zpool_status/zpool_status_003_pos.ksh | 70 ++ .../zpool_status/zpool_status_004_pos.ksh | 81 ++ .../functional/cli_user/misc/zfs_001_neg.ksh | 2 +- .../tests/functional/fallocate/Makefile.am | 3 +- .../fallocate/fallocate_punch-hole.ksh | 35 +- .../fallocate/fallocate_zero-range.ksh | 119 +++ .../tests/functional/fallocate/setup.ksh | 5 +- .../tests/functional/simd/simd_supported.ksh | 4 +- 113 files changed, 3053 insertions(+), 1712 deletions(-) create mode 100644 config/kernel-copy-from-user-inatomic.m4 create mode 100644 config/kernel-sysfs.m4 create mode 100644 config/kernel-vfs-filemap_dirty_folio.m4 create mode 100644 man/man7/dracut.zfs.7 delete mode 100644 module/avl/Makefile.in delete mode 100644 module/icp/Makefile.in delete mode 100644 module/lua/Makefile.in delete mode 100644 module/nvpair/Makefile.in delete mode 100644 module/os/linux/spl/Makefile.in delete mode 100644 module/os/linux/zfs/Makefile.in delete mode 100644 module/spl/Makefile.in delete mode 100644 module/unicode/Makefile.in delete mode 100644 module/zcommon/Makefile.in delete mode 100644 module/zfs/Makefile.in delete mode 100644 module/zstd/Makefile.in create mode 100755 tests/zfs-tests/tests/functional/cli_root/zpool_status/zpool_status_003_pos.ksh create mode 100755 tests/zfs-tests/tests/functional/cli_root/zpool_status/zpool_status_004_pos.ksh create mode 100755 tests/zfs-tests/tests/functional/fallocate/fallocate_zero-range.ksh diff --git a/Makefile.am b/Makefile.am index 7121c25fd8fe..d6ea2b17fffc 100644 --- a/Makefile.am +++ b/Makefile.am @@ -203,7 +203,7 @@ vcscheck: PHONY += zstdcheck zstdcheck: - @$(MAKE) -C module/zstd checksymbols + @$(MAKE) -C module check-zstd-symbols PHONY += lint lint: cppcheck paxcheck diff --git a/cmd/zfs/zfs_main.c b/cmd/zfs/zfs_main.c index 1b6b74e032a0..a103f6b09a22 100644 --- a/cmd/zfs/zfs_main.c +++ b/cmd/zfs/zfs_main.c @@ -315,9 +315,9 @@ get_usage(zfs_help_t 
idx) case HELP_ROLLBACK: return (gettext("\trollback [-rRf] \n")); case HELP_SEND: - return (gettext("\tsend [-DnPpRvLecwhb] " - "[-X dataset[,dataset]...] " - "[-[i|I] snapshot] \n" + return (gettext("\tsend [-DLPbcehnpsvw] " + "[-i|-I snapshot]\n" + "\t [-R [-X dataset[,dataset]...]] \n" "\tsend [-DnvPLecw] [-i snapshot|bookmark] " "\n" "\tsend [-DnPpvLec] [-i bookmark|snapshot] " @@ -4316,73 +4316,27 @@ zfs_do_snapshot(int argc, char **argv) return (-1); } +/* + * Array of prefixes to exclude – + * a linear search, even if executed for each dataset, + * is plenty good enough. + */ typedef struct zfs_send_exclude_arg { size_t count; - char **list; + const char **list; } zfs_send_exclude_arg_t; -/* - * This function creates the zfs_send_exclude_arg_t - * object described above; it can be called multiple - * times, and the input can be comma-separated. - * This is NOT the most efficient data layout; however, - * I couldn't think of a non-pathological case where - * it should have more than a couple dozen instances - * of excludes. If that turns out to be used in - * practice, we might want to instead use a tree. - */ -static void -add_dataset_excludes(char *exclude, zfs_send_exclude_arg_t *context) -{ - char *tok; - while ((tok = strsep(&exclude, ",")) != NULL) { - if (!zfs_name_valid(tok, ZFS_TYPE_DATASET) || - strchr(tok, '/') == NULL) { - (void) fprintf(stderr, gettext("-X %s: " - "not a valid non-root dataset name.\n"), tok); - usage(B_FALSE); - } - context->list = safe_realloc(context->list, - (sizeof (char *)) * (context->count + 1)); - context->list[context->count++] = tok; - } -} - -static void -free_dataset_excludes(zfs_send_exclude_arg_t *exclude_list) -{ - free(exclude_list->list); -} - -/* - * This is the call back used by zfs_send to - * determine if a dataset should be skipped. 
- * As stated above, this is not the most efficient - * data structure to use, but as long as the - * number of excluded datasets is relatively - * small (a couple of dozen or so), it won't - * have a big impact on performance on modern - * processors. Since it's excluding hierarchies, - * we'd probably want to move to a more complex - * tree structure in that case. - */ static boolean_t zfs_do_send_exclude(zfs_handle_t *zhp, void *context) { - zfs_send_exclude_arg_t *exclude = context; + zfs_send_exclude_arg_t *excludes = context; const char *name = zfs_get_name(zhp); - for (size_t indx = 0; indx < exclude->count; indx++) { - char *exclude_name = exclude->list[indx]; - size_t len = strlen(exclude_name); - /* If it's shorter, it can't possibly match */ - if (strlen(name) < len) - continue; - if (strncmp(name, exclude_name, len) == 0 && - (name[len] == '/' || name[len] == '\0' || - name[len] == '@')) { + for (size_t i = 0; i < excludes->count; ++i) { + size_t len = strlen(excludes->list[i]); + if (strncmp(name, excludes->list[i], len) == 0 && + memchr("/@", name[len], sizeof ("/@"))) return (B_FALSE); - } } return (B_TRUE); @@ -4403,11 +4357,11 @@ zfs_do_send(int argc, char **argv) int c, err; nvlist_t *dbgnv = NULL; char *redactbook = NULL; - zfs_send_exclude_arg_t exclude_context = { 0 }; + zfs_send_exclude_arg_t excludes = { 0 }; struct option long_options[] = { {"replicate", no_argument, NULL, 'R'}, - {"skip-missing", no_argument, NULL, 's'}, + {"skip-missing", no_argument, NULL, 's'}, {"redact", required_argument, NULL, 'd'}, {"props", no_argument, NULL, 'p'}, {"parsable", no_argument, NULL, 'P'}, @@ -4431,7 +4385,18 @@ zfs_do_send(int argc, char **argv) long_options, NULL)) != -1) { switch (c) { case 'X': - add_dataset_excludes(optarg, &exclude_context); + for (char *ds; (ds = strsep(&optarg, ",")) != NULL; ) { + if (!zfs_name_valid(ds, ZFS_TYPE_DATASET) || + strchr(ds, '/') == NULL) { + (void) fprintf(stderr, gettext("-X %s: " + "not a valid non-root dataset 
name" + ".\n"), ds); + usage(B_FALSE); + } + excludes.list = safe_realloc(excludes.list, + sizeof (char *) * (excludes.count + 1)); + excludes.list[excludes.count++] = ds; + } break; case 'i': if (fromname) @@ -4542,7 +4507,7 @@ zfs_do_send(int argc, char **argv) if (flags.parsable && flags.verbosity == 0) flags.verbosity = 1; - if (exclude_context.count > 0 && !flags.replicate) { + if (excludes.count > 0 && !flags.replicate) { (void) fprintf(stderr, gettext("Cannot specify " "dataset exclusion (-X) on a non-recursive " "send.\n")); @@ -4731,10 +4696,8 @@ zfs_do_send(int argc, char **argv) flags.doall = B_TRUE; err = zfs_send(zhp, fromname, toname, &flags, STDOUT_FILENO, - exclude_context.count > 0 ? zfs_do_send_exclude : NULL, - &exclude_context, flags.verbosity >= 3 ? &dbgnv : NULL); - - free_dataset_excludes(&exclude_context); + excludes.count > 0 ? zfs_do_send_exclude : NULL, + &excludes, flags.verbosity >= 3 ? &dbgnv : NULL); if (flags.verbosity >= 3 && dbgnv != NULL) { /* @@ -4746,8 +4709,9 @@ zfs_do_send(int argc, char **argv) dump_nvlist(dbgnv, 0); nvlist_free(dbgnv); } - zfs_close(zhp); + zfs_close(zhp); + free(excludes.list); return (err != 0); } @@ -6532,13 +6496,10 @@ holds_callback(zfs_handle_t *zhp, void *data) static int zfs_do_holds(int argc, char **argv) { - int errors = 0; int c; - int i; + boolean_t errors = B_FALSE; boolean_t scripted = B_FALSE; boolean_t recursive = B_FALSE; - const char *opts = "rH"; - nvlist_t *nvl; int types = ZFS_TYPE_SNAPSHOT; holds_cbdata_t cb = { 0 }; @@ -6548,7 +6509,7 @@ zfs_do_holds(int argc, char **argv) int flags = 0; /* check options */ - while ((c = getopt(argc, argv, opts)) != -1) { + while ((c = getopt(argc, argv, "rH")) != -1) { switch (c) { case 'r': recursive = B_TRUE; @@ -6575,10 +6536,9 @@ zfs_do_holds(int argc, char **argv) if (argc < 1) usage(B_FALSE); - if (nvlist_alloc(&nvl, NV_UNIQUE_NAME, 0) != 0) - nomem(); + nvlist_t *nvl = fnvlist_alloc(); - for (i = 0; i < argc; ++i) { + for (int i = 0; i < argc; 
++i) { char *snapshot = argv[i]; const char *delim; const char *snapname; @@ -6587,7 +6547,7 @@ zfs_do_holds(int argc, char **argv) if (delim == NULL) { (void) fprintf(stderr, gettext("'%s' is not a snapshot\n"), snapshot); - ++errors; + errors = B_TRUE; continue; } snapname = delim + 1; @@ -6601,10 +6561,10 @@ zfs_do_holds(int argc, char **argv) /* * 1. collect holds data, set format options */ - ret = zfs_for_each(argc, argv, flags, types, NULL, NULL, limit, + ret = zfs_for_each(1, argv + i, flags, types, NULL, NULL, limit, holds_callback, &cb); if (ret != 0) - ++errors; + errors = B_TRUE; } /* @@ -6617,7 +6577,7 @@ zfs_do_holds(int argc, char **argv) nvlist_free(nvl); - return (0 != errors); + return (errors); } #define CHECK_SPINNER 30 diff --git a/cmd/zvol_wait/zvol_wait b/cmd/zvol_wait/zvol_wait index 2aa929b0ca2b..f1fa42e27dc9 100755 --- a/cmd/zvol_wait/zvol_wait +++ b/cmd/zvol_wait/zvol_wait @@ -28,15 +28,17 @@ filter_out_deleted_zvols() { list_zvols() { read -r default_volmode < /sys/module/zfs/parameters/zvol_volmode zfs list -t volume -H -o \ - name,volmode,receive_resume_token,redact_snaps | - while IFS=" " read -r name volmode token redacted; do # IFS=\t here! + name,volmode,receive_resume_token,redact_snaps,keystatus | + while IFS=" " read -r name volmode token redacted keystatus; do # IFS=\t here! - # /dev links are not created for zvols with volmode = "none" - # or for redacted zvols. + # /dev links are not created for zvols with volmode = "none", + # redacted zvols, or encrypted zvols for which the key has not + # been loaded. 
[ "$volmode" = "none" ] && continue [ "$volmode" = "default" ] && [ "$default_volmode" = "3" ] && continue [ "$redacted" = "-" ] || continue + [ "$keystatus" = "unavailable" ] && continue # We also ignore partially received zvols if it is # not an incremental receive, as those won't even have a block diff --git a/config/deb.am b/config/deb.am index 65aa70687ba5..0033dd7591ff 100644 --- a/config/deb.am +++ b/config/deb.am @@ -61,8 +61,8 @@ deb-utils: deb-local rpm-utils-initramfs pkg7=$${name}-test-$${version}.$${arch}.rpm; \ pkg8=$${name}-dracut-$${version}.noarch.rpm; \ pkg9=$${name}-initramfs-$${version}.$${arch}.rpm; \ - pkg10=`ls python*-pyzfs-$${version}* | tail -1`; \ - pkg11=pam_zfs_key-$${version}.$${arch}.rpm; \ + pkg10=`ls python3-pyzfs-$${version}.noarch.rpm 2>/dev/null`; \ + pkg11=`ls pam_zfs_key-$${version}.$${arch}.rpm 2>/dev/null`; \ ## Arguments need to be passed to dh_shlibdeps. Alien provides no mechanism ## to do this, so we install a shim onto the path which calls the real ## dh_shlibdeps with the required arguments. 
diff --git a/config/kernel-config-defined.m4 b/config/kernel-config-defined.m4 index c7d18b49b14e..54837d728341 100644 --- a/config/kernel-config-defined.m4 +++ b/config/kernel-config-defined.m4 @@ -19,19 +19,48 @@ AC_DEFUN([ZFS_AC_KERNEL_CONFIG_DEFINED], [ ]) ]) + ZFS_AC_KERNEL_SRC_CONFIG_MODULES + ZFS_AC_KERNEL_SRC_CONFIG_BLOCK ZFS_AC_KERNEL_SRC_CONFIG_DEBUG_LOCK_ALLOC ZFS_AC_KERNEL_SRC_CONFIG_TRIM_UNUSED_KSYMS - ZFS_AC_KERNEL_SRC_CONFIG_ZLIB_INFLATE ZFS_AC_KERNEL_SRC_CONFIG_ZLIB_DEFLATE + ZFS_AC_KERNEL_SRC_CONFIG_ZLIB_INFLATE AC_MSG_CHECKING([for kernel config option compatibility]) ZFS_LINUX_TEST_COMPILE_ALL([config]) AC_MSG_RESULT([done]) + ZFS_AC_KERNEL_CONFIG_MODULES + ZFS_AC_KERNEL_CONFIG_BLOCK ZFS_AC_KERNEL_CONFIG_DEBUG_LOCK_ALLOC ZFS_AC_KERNEL_CONFIG_TRIM_UNUSED_KSYMS - ZFS_AC_KERNEL_CONFIG_ZLIB_INFLATE ZFS_AC_KERNEL_CONFIG_ZLIB_DEFLATE + ZFS_AC_KERNEL_CONFIG_ZLIB_INFLATE +]) + +dnl # +dnl # Check CONFIG_BLOCK +dnl # +dnl # Verify the kernel has CONFIG_BLOCK support enabled. +dnl # +AC_DEFUN([ZFS_AC_KERNEL_SRC_CONFIG_BLOCK], [ + ZFS_LINUX_TEST_SRC([config_block], [ + #if !defined(CONFIG_BLOCK) + #error CONFIG_BLOCK not defined + #endif + ],[]) +]) + +AC_DEFUN([ZFS_AC_KERNEL_CONFIG_BLOCK], [ + AC_MSG_CHECKING([whether CONFIG_BLOCK is defined]) + ZFS_LINUX_TEST_RESULT([config_block], [ + AC_MSG_RESULT([yes]) + ],[ + AC_MSG_RESULT([no]) + AC_MSG_ERROR([ + *** This kernel does not include the required block device support. + *** Rebuild the kernel with CONFIG_BLOCK=y set.]) + ]) ]) dnl # @@ -72,6 +101,61 @@ AC_DEFUN([ZFS_AC_KERNEL_CONFIG_DEBUG_LOCK_ALLOC], [ ]) ]) +dnl # +dnl # Check CONFIG_MODULES +dnl # +dnl # Verify the kernel has CONFIG_MODULES support enabled. 
+dnl # +AC_DEFUN([ZFS_AC_KERNEL_SRC_CONFIG_MODULES], [ + ZFS_LINUX_TEST_SRC([config_modules], [ + #if !defined(CONFIG_MODULES) + #error CONFIG_MODULES not defined + #endif + ],[]) +]) + +AC_DEFUN([ZFS_AC_KERNEL_CONFIG_MODULES], [ + AC_MSG_CHECKING([whether CONFIG_MODULES is defined]) + AS_IF([test "x$enable_linux_builtin" != xyes], [ + ZFS_LINUX_TEST_RESULT([config_modules], [ + AC_MSG_RESULT([yes]) + ],[ + AC_MSG_RESULT([no]) + AC_MSG_ERROR([ + *** This kernel does not include the required loadable module + *** support! + *** + *** To build OpenZFS as a loadable Linux kernel module + *** enable loadable module support by setting + *** `CONFIG_MODULES=y` in the kernel configuration and run + *** `make modules_prepare` in the Linux source tree. + *** + *** If you don't intend to enable loadable kernel module + *** support, please compile OpenZFS as a Linux kernel built-in. + *** + *** Prepare the Linux source tree by running `make prepare`, + *** use the OpenZFS `--enable-linux-builtin` configure option, + *** copy the OpenZFS sources into the Linux source tree using + *** `./copy-builtin `, + *** set `CONFIG_ZFS=y` in the kernel configuration and compile + *** kernel as usual. + ]) + ]) + ], [ + ZFS_LINUX_TRY_COMPILE([], [], [ + AC_MSG_RESULT([not needed]) + ],[ + AC_MSG_RESULT([error]) + AC_MSG_ERROR([ + *** This kernel is unable to compile object files. + *** + *** Please make sure you prepared the Linux source tree + *** by running `make prepare` there. + ]) + ]) + ]) +]) + dnl # dnl # Check CONFIG_TRIM_UNUSED_KSYMS dnl # diff --git a/config/kernel-copy-from-user-inatomic.m4 b/config/kernel-copy-from-user-inatomic.m4 new file mode 100644 index 000000000000..5fddaca59c20 --- /dev/null +++ b/config/kernel-copy-from-user-inatomic.m4 @@ -0,0 +1,26 @@ +dnl # +dnl # On certain architectures `__copy_from_user_inatomic` +dnl # is a GPL exported variable and cannot be used by OpenZFS. +dnl # + +dnl # +dnl # Checking if `__copy_from_user_inatomic` is available. 
+dnl # +AC_DEFUN([ZFS_AC_KERNEL_SRC___COPY_FROM_USER_INATOMIC], [ + ZFS_LINUX_TEST_SRC([__copy_from_user_inatomic], [ + #include + ], [ + int result __attribute__ ((unused)) = __copy_from_user_inatomic(NULL, NULL, 0); + ], [], [ZFS_META_LICENSE]) +]) + +AC_DEFUN([ZFS_AC_KERNEL___COPY_FROM_USER_INATOMIC], [ + AC_MSG_CHECKING([whether __copy_from_user_inatomic is available]) + ZFS_LINUX_TEST_RESULT([__copy_from_user_inatomic_license], [ + AC_MSG_RESULT(yes) + AC_DEFINE(HAVE___COPY_FROM_USER_INATOMIC, 1, + [__copy_from_user_inatomic is available]) + ], [ + AC_MSG_RESULT(no) + ]) +]) diff --git a/config/kernel-group-info.m4 b/config/kernel-group-info.m4 index 0fee1d36d50d..6941d62da017 100644 --- a/config/kernel-group-info.m4 +++ b/config/kernel-group-info.m4 @@ -6,8 +6,8 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_GROUP_INFO_GID], [ ZFS_LINUX_TEST_SRC([group_info_gid], [ #include ],[ - struct group_info *gi = groups_alloc(1); - gi->gid[0] = KGIDT_INIT(0); + struct group_info gi __attribute__ ((unused)) = {}; + gi.gid[0] = KGIDT_INIT(0); ]) ]) diff --git a/config/kernel-sysfs.m4 b/config/kernel-sysfs.m4 new file mode 100644 index 000000000000..668def5fe6bf --- /dev/null +++ b/config/kernel-sysfs.m4 @@ -0,0 +1,37 @@ +dnl # +dnl # Linux 5.2/5.18 API +dnl # +dnl # In cdb4f26a63c391317e335e6e683a614358e70aeb ("kobject: kobj_type: remove default_attrs") +dnl # struct kobj_type.default_attrs +dnl # was finally removed in favour of +dnl # struct kobj_type.default_groups +dnl # +dnl # This was added in aa30f47cf666111f6bbfd15f290a27e8a7b9d854 ("kobject: Add support for default attribute groups to kobj_type"), +dnl # if both are present (5.2-5.17), we prefer default_groups; they're otherwise equivalent +dnl # +AC_DEFUN([ZFS_AC_KERNEL_SRC_SYSFS_DEFAULT_GROUPS], [ + ZFS_LINUX_TEST_SRC([sysfs_default_groups], [ + #include + ],[ + struct kobj_type __attribute__ ((unused)) kt = { + .default_groups = (const struct attribute_group **)NULL }; + ]) +]) + 
+AC_DEFUN([ZFS_AC_KERNEL_SYSFS_DEFAULT_GROUPS], [ + AC_MSG_CHECKING([for struct kobj_type.default_groups]) + ZFS_LINUX_TEST_RESULT([sysfs_default_groups],[ + AC_MSG_RESULT(yes) + AC_DEFINE([HAVE_SYSFS_DEFAULT_GROUPS], 1, [struct kobj_type has default_groups]) + ],[ + AC_MSG_RESULT(no) + ]) +]) + +AC_DEFUN([ZFS_AC_KERNEL_SRC_SYSFS], [ + ZFS_AC_KERNEL_SRC_SYSFS_DEFAULT_GROUPS +]) + +AC_DEFUN([ZFS_AC_KERNEL_SYSFS], [ + ZFS_AC_KERNEL_SYSFS_DEFAULT_GROUPS +]) diff --git a/config/kernel-vfs-filemap_dirty_folio.m4 b/config/kernel-vfs-filemap_dirty_folio.m4 new file mode 100644 index 000000000000..872879002928 --- /dev/null +++ b/config/kernel-vfs-filemap_dirty_folio.m4 @@ -0,0 +1,30 @@ +dnl # +dnl # Linux 5.18 uses filemap_dirty_folio in lieu of +dnl # ___set_page_dirty_nobuffers +dnl # +AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_FILEMAP_DIRTY_FOLIO], [ + ZFS_LINUX_TEST_SRC([vfs_has_filemap_dirty_folio], [ + #include + #include + + static const struct address_space_operations + aops __attribute__ ((unused)) = { + .dirty_folio = filemap_dirty_folio, + }; + ],[]) +]) + +AC_DEFUN([ZFS_AC_KERNEL_VFS_FILEMAP_DIRTY_FOLIO], [ + dnl # + dnl # Linux 5.18 uses filemap_dirty_folio in lieu of + dnl # ___set_page_dirty_nobuffers + dnl # + AC_MSG_CHECKING([filemap_dirty_folio exists]) + ZFS_LINUX_TEST_RESULT([vfs_has_filemap_dirty_folio], [ + AC_MSG_RESULT([yes]) + AC_DEFINE(HAVE_VFS_FILEMAP_DIRTY_FOLIO, 1, + [filemap_dirty_folio exists]) + ],[ + AC_MSG_RESULT([no]) + ]) +]) diff --git a/config/kernel.m4 b/config/kernel.m4 index 771dc21787d4..a70db91a8364 100644 --- a/config/kernel.m4 +++ b/config/kernel.m4 @@ -8,8 +8,8 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [ ZFS_AC_QAT dnl # Sanity checks for module building and CONFIG_* defines - ZFS_AC_KERNEL_TEST_MODULE ZFS_AC_KERNEL_CONFIG_DEFINED + ZFS_AC_MODULE_SYMVERS dnl # Sequential ZFS_LINUX_TRY_COMPILE tests ZFS_AC_KERNEL_FPU_HEADER @@ -101,6 +101,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [ ZFS_AC_KERNEL_SRC_SET_NLINK ZFS_AC_KERNEL_SRC_SGET 
ZFS_AC_KERNEL_SRC_LSEEK_EXECUTE + ZFS_AC_KERNEL_SRC_VFS_FILEMAP_DIRTY_FOLIO ZFS_AC_KERNEL_SRC_VFS_GETATTR ZFS_AC_KERNEL_SRC_VFS_FSYNC_2ARGS ZFS_AC_KERNEL_SRC_VFS_ITERATE @@ -133,6 +134,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [ ZFS_AC_KERNEL_SRC_BIO_MAX_SEGS ZFS_AC_KERNEL_SRC_SIGNAL_STOP ZFS_AC_KERNEL_SRC_SIGINFO + ZFS_AC_KERNEL_SRC_SYSFS ZFS_AC_KERNEL_SRC_SET_SPECIAL_STATE ZFS_AC_KERNEL_SRC_VFS_READPAGES ZFS_AC_KERNEL_SRC_VFS_SET_PAGE_DIRTY_NOBUFFERS @@ -141,6 +143,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [ ZFS_AC_KERNEL_SRC_ADD_DISK ZFS_AC_KERNEL_SRC_KTHREAD ZFS_AC_KERNEL_SRC_ZERO_PAGE + ZFS_AC_KERNEL_SRC___COPY_FROM_USER_INATOMIC AC_MSG_CHECKING([for available kernel interfaces]) ZFS_LINUX_TEST_COMPILE_ALL([kabi]) @@ -215,6 +218,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [ ZFS_AC_KERNEL_SET_NLINK ZFS_AC_KERNEL_SGET ZFS_AC_KERNEL_LSEEK_EXECUTE + ZFS_AC_KERNEL_VFS_FILEMAP_DIRTY_FOLIO ZFS_AC_KERNEL_VFS_GETATTR ZFS_AC_KERNEL_VFS_FSYNC_2ARGS ZFS_AC_KERNEL_VFS_ITERATE @@ -247,6 +251,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [ ZFS_AC_KERNEL_BIO_MAX_SEGS ZFS_AC_KERNEL_SIGNAL_STOP ZFS_AC_KERNEL_SIGINFO + ZFS_AC_KERNEL_SYSFS ZFS_AC_KERNEL_SET_SPECIAL_STATE ZFS_AC_KERNEL_VFS_READPAGES ZFS_AC_KERNEL_VFS_SET_PAGE_DIRTY_NOBUFFERS @@ -255,6 +260,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [ ZFS_AC_KERNEL_ADD_DISK ZFS_AC_KERNEL_KTHREAD ZFS_AC_KERNEL_ZERO_PAGE + ZFS_AC_KERNEL___COPY_FROM_USER_INATOMIC ]) dnl # @@ -443,8 +449,6 @@ AC_DEFUN([ZFS_AC_KERNEL], [ AC_SUBST(LINUX) AC_SUBST(LINUX_OBJ) AC_SUBST(LINUX_VERSION) - - ZFS_AC_MODULE_SYMVERS ]) dnl # @@ -539,27 +543,6 @@ AC_DEFUN([ZFS_AC_QAT], [ ]) ]) -dnl # -dnl # Basic toolchain sanity check. -dnl # -AC_DEFUN([ZFS_AC_KERNEL_TEST_MODULE], [ - AC_MSG_CHECKING([whether modules can be built]) - ZFS_LINUX_TRY_COMPILE([], [], [ - AC_MSG_RESULT([yes]) - ],[ - AC_MSG_RESULT([no]) - if test "x$enable_linux_builtin" != xyes; then - AC_MSG_ERROR([ - *** Unable to build an empty module. 
- ]) - else - AC_MSG_ERROR([ - *** Unable to build an empty module. - *** Please run 'make scripts' inside the kernel source tree.]) - fi - ]) -]) - dnl # dnl # ZFS_LINUX_CONFTEST_H dnl # @@ -662,8 +645,10 @@ AC_DEFUN([ZFS_LINUX_COMPILE], [ build kernel modules with LLVM/CLANG toolchain]) AC_TRY_COMMAND([ KBUILD_MODPOST_NOFINAL="$5" KBUILD_MODPOST_WARN="$6" - make modules -k -j$TEST_JOBS ${KERNEL_CC:+CC=$KERNEL_CC} ${KERNEL_LD:+LD=$KERNEL_LD} ${KERNEL_LLVM:+LLVM=$KERNEL_LLVM} -C $LINUX_OBJ $ARCH_UM - M=$PWD/$1 >$1/build.log 2>&1]) + make modules -k -j$TEST_JOBS ${KERNEL_CC:+CC=$KERNEL_CC} + ${KERNEL_LD:+LD=$KERNEL_LD} ${KERNEL_LLVM:+LLVM=$KERNEL_LLVM} + CONFIG_MODULES=y CFLAGS_MODULE=-DCONFIG_MODULES + -C $LINUX_OBJ $ARCH_UM M=$PWD/$1 >$1/build.log 2>&1]) AS_IF([AC_TRY_COMMAND([$2])], [$3], [$4]) ]) diff --git a/configure.ac b/configure.ac index 6ca980d3ebb4..30a62966f808 100644 --- a/configure.ac +++ b/configure.ac @@ -174,17 +174,6 @@ AC_CONFIG_FILES([ man/Makefile module/Kbuild module/Makefile - module/avl/Makefile - module/icp/Makefile - module/lua/Makefile - module/nvpair/Makefile - module/os/linux/spl/Makefile - module/os/linux/zfs/Makefile - module/spl/Makefile - module/unicode/Makefile - module/zcommon/Makefile - module/zfs/Makefile - module/zstd/Makefile rpm/Makefile rpm/generic/Makefile rpm/generic/zfs-dkms.spec diff --git a/contrib/dracut/90zfs/module-setup.sh.in b/contrib/dracut/90zfs/module-setup.sh.in index c79323a3e613..f2145861d497 100755 --- a/contrib/dracut/90zfs/module-setup.sh.in +++ b/contrib/dracut/90zfs/module-setup.sh.in @@ -9,13 +9,10 @@ check() { for tool in "zgenhostid" "zpool" "zfs" "mount.zfs"; do command -v "${tool}" >/dev/null || return 1 done - - return 0 } depends() { echo udev-rules - return 0 } installkernel() { @@ -39,7 +36,6 @@ install() { { dfatal "Failed to install essential binaries"; exit 1; } # Adapted from https://github.com/zbm-dev/zfsbootmenu - if ! 
ldd "$(command -v zpool)" | grep -qF 'libgcc_s.so'; then # On systems with gcc-config (Gentoo, Funtoo, etc.), use it to find libgcc_s if command -v gcc-config >/dev/null; then @@ -79,7 +75,6 @@ install() { fi if dracut_module_included "systemd"; then - inst_simple "${systemdsystemunitdir}/zfs-import.target" systemctl -q --root "${initdir}" add-wants initrd.target zfs-import.target diff --git a/contrib/dracut/90zfs/mount-zfs.sh.in b/contrib/dracut/90zfs/mount-zfs.sh.in index 7e11c9afdaee..fa9f1bb767b8 100755 --- a/contrib/dracut/90zfs/mount-zfs.sh.in +++ b/contrib/dracut/90zfs/mount-zfs.sh.in @@ -3,46 +3,71 @@ . /lib/dracut-zfs-lib.sh -ZFS_DATASET="" -ZFS_POOL="" - -case "${root}" in - zfs:*) ;; - *) return ;; -esac +decode_root_args || return 0 GENERATOR_FILE=/run/systemd/generator/sysroot.mount GENERATOR_EXTENSION=/run/systemd/generator/sysroot.mount.d/zfs-enhancement.conf -if [ -e "$GENERATOR_FILE" ] && [ -e "$GENERATOR_EXTENSION" ] ; then - # If the ZFS sysroot.mount flag exists, the initial RAM disk configured - # it to mount ZFS on root. In that case, we bail early. This flag - # file gets created by the zfs-generator program upon successful run. - info "ZFS: There is a sysroot.mount and zfs-generator has extended it." - info "ZFS: Delegating root mount to sysroot.mount." - # Let us tell the initrd to run on shutdown. - # We have a shutdown hook to run - # because we imported the pool. +if [ -e "$GENERATOR_FILE" ] && [ -e "$GENERATOR_EXTENSION" ]; then + # We're under systemd and dracut-zfs-generator ran to completion. + info "ZFS: Delegating root mount to sysroot.mount et al." + # We now prevent Dracut from running this thing again. - for zfsmounthook in "$hookdir"/mount/*zfs* ; do - if [ -f "$zfsmounthook" ] ; then - rm -f "$zfsmounthook" - fi - done + rm -f "$hookdir"/mount/*zfs* return fi + info "ZFS: No sysroot.mount exists or zfs-generator did not extend it." info "ZFS: Mounting root with the traditional mount-zfs.sh instead."
+# ask_for_password tries prompt cmd +# +# Wraps around plymouth ask-for-password and adds fallback to tty password ask +# if plymouth is not present. +ask_for_password() { + tries="$1" + prompt="$2" + cmd="$3" + + { + flock -s 9 + + # Prompt for password with plymouth, if installed and running. + if plymouth --ping 2>/dev/null; then + plymouth ask-for-password \ + --prompt "$prompt" --number-of-tries="$tries" | \ + eval "$cmd" + ret=$? + else + i=1 + while [ "$i" -le "$tries" ]; do + printf "%s [%i/%i]:" "$prompt" "$i" "$tries" >&2 + eval "$cmd" && ret=0 && break + ret=$? + i=$((i+1)) + printf '\n' >&2 + done + unset i + fi + } 9>/.console_lock + + [ "$ret" -ne 0 ] && echo "Wrong password" >&2 + return "$ret" +} + + # Delay until all required block devices are present. modprobe zfs 2>/dev/null udevadm settle +ZFS_DATASET= +ZFS_POOL= + if [ "${root}" = "zfs:AUTO" ] ; then - if ! ZFS_DATASET="$(find_bootfs)" ; then + if ! ZFS_DATASET="$(zpool get -Ho value bootfs | grep -m1 -vFx -)"; then # shellcheck disable=SC2086 zpool import -N -a ${ZPOOL_IMPORT_OPTS} - if ! ZFS_DATASET="$(find_bootfs)" ; then + if ! ZFS_DATASET="$(zpool get -Ho value bootfs | grep -m1 -vFx -)"; then warn "ZFS: No bootfs attribute found in importable pools." zpool export -aF @@ -53,34 +78,43 @@ if [ "${root}" = "zfs:AUTO" ] ; then info "ZFS: Using ${ZFS_DATASET} as root." fi -ZFS_DATASET="${ZFS_DATASET:-${root#zfs:}}" +ZFS_DATASET="${ZFS_DATASET:-${root}}" ZFS_POOL="${ZFS_DATASET%%/*}" -if import_pool "${ZFS_POOL}" ; then - # Load keys if we can or if we need to - if [ "$(zpool list -H -o feature@encryption "${ZFS_POOL}")" = 'active' ]; then - # if the root dataset has encryption enabled - ENCRYPTIONROOT="$(zfs get -H -o value encryptionroot "${ZFS_DATASET}")" - if ! 
[ "${ENCRYPTIONROOT}" = "-" ]; then - KEYSTATUS="$(zfs get -H -o value keystatus "${ENCRYPTIONROOT}")" - # if the key needs to be loaded - if [ "$KEYSTATUS" = "unavailable" ]; then - # decrypt them - ask_for_password \ - --tries 5 \ - --prompt "Encrypted ZFS password for ${ENCRYPTIONROOT}: " \ - --cmd "zfs load-key '${ENCRYPTIONROOT}'" - fi + +if ! zpool get -Ho name "${ZFS_POOL}" > /dev/null 2>&1; then + info "ZFS: Importing pool ${ZFS_POOL}..." + # shellcheck disable=SC2086 + if ! zpool import -N ${ZPOOL_IMPORT_OPTS} "${ZFS_POOL}"; then + warn "ZFS: Unable to import pool ${ZFS_POOL}" + rootok=0 + return 1 + fi +fi + +# Load keys if we can or if we need to +# TODO: for_relevant_root_children like in zfs-load-key.sh.in +if [ "$(zpool get -Ho value feature@encryption "${ZFS_POOL}")" = 'active' ]; then + # if the root dataset has encryption enabled + ENCRYPTIONROOT="$(zfs get -Ho value encryptionroot "${ZFS_DATASET}")" + if ! [ "${ENCRYPTIONROOT}" = "-" ]; then + KEYSTATUS="$(zfs get -Ho value keystatus "${ENCRYPTIONROOT}")" + # if the key needs to be loaded + if [ "$KEYSTATUS" = "unavailable" ]; then + # decrypt them + ask_for_password \ + 5 \ + "Encrypted ZFS password for ${ENCRYPTIONROOT}: " \ + "zfs load-key '${ENCRYPTIONROOT}'" fi fi - # Let us tell the initrd to run on shutdown. - # We have a shutdown hook to run - # because we imported the pool. - info "ZFS: Mounting dataset ${ZFS_DATASET}..." - if mount_dataset "${ZFS_DATASET}" ; then - ROOTFS_MOUNTED=yes - return 0 - fi fi -rootok=0 +# Let us tell the initrd to run on shutdown. +# We have a shutdown hook to run +# because we imported the pool. +info "ZFS: Mounting dataset ${ZFS_DATASET}..." +if ! 
mount_dataset "${ZFS_DATASET}"; then + rootok=0 + return 1 +fi diff --git a/contrib/dracut/90zfs/parse-zfs.sh.in b/contrib/dracut/90zfs/parse-zfs.sh.in index 724c5e2c6dff..f7d1f1c5da9f 100755 --- a/contrib/dracut/90zfs/parse-zfs.sh.in +++ b/contrib/dracut/90zfs/parse-zfs.sh.in @@ -1,7 +1,8 @@ #!/bin/sh # shellcheck disable=SC2034,SC2154 -. /lib/dracut-lib.sh +# shellcheck source=zfs-lib.sh.in +. /lib/dracut-zfs-lib.sh # Let the command line override our host id. spl_hostid=$(getarg spl_hostid=) @@ -15,52 +16,20 @@ else warn "ZFS: Pools may not import correctly." fi -wait_for_zfs=0 -case "${root}" in - ""|zfs|zfs:) - # We'll take root unset, root=zfs, or root=zfs: - # No root set, so we want to read the bootfs attribute. We - # can't do that until udev settles so we'll set dummy values - # and hope for the best later on. - root="zfs:AUTO" - rootok=1 - wait_for_zfs=1 - - info "ZFS: Enabling autodetection of bootfs after udev settles." - ;; - - ZFS=*|zfs:*|FILESYSTEM=*) - # root is explicit ZFS root. Parse it now. We can handle - # a root=... param in any of the following formats: - # root=ZFS=rpool/ROOT - # root=zfs:rpool/ROOT - # root=zfs:FILESYSTEM=rpool/ROOT - # root=FILESYSTEM=rpool/ROOT - # root=ZFS=pool+with+space/ROOT+WITH+SPACE (translates to root=ZFS=pool with space/ROOT WITH SPACE) - - # Strip down to just the pool/fs - root="${root#zfs:}" - root="${root#FILESYSTEM=}" - root="zfs:${root#ZFS=}" - # switch + with spaces because kernel cmdline does not allow us to quote parameters - root=$(echo "$root" | tr '+' ' ') - rootok=1 - wait_for_zfs=1 - - info "ZFS: Set ${root} as bootfs." - ;; - - *) - info "ZFS: no ZFS-on-root" -esac - -# Make sure Dracut is happy that we have a root and will wait for ZFS -# modules to settle before mounting. 
-if [ "${wait_for_zfs}" -eq 1 ]; then - ln -s /dev/null /dev/root 2>/dev/null - initqueuedir="${hookdir}/initqueue/finished" - test -d "${initqueuedir}" || { - initqueuedir="${hookdir}/initqueue-finished" - } - echo '[ -e /dev/zfs ]' > "${initqueuedir}/zfs.sh" +if decode_root_args; then + if [ "$root" = "zfs:AUTO" ]; then + info "ZFS: Boot dataset autodetected from bootfs=." + else + info "ZFS: Boot dataset is ${root}." + fi + + rootok=1 + # Make sure Dracut is happy that we have a root and will wait for ZFS + # modules to settle before mounting. + if [ -n "${wait_for_zfs}" ]; then + ln -s null /dev/root + echo '[ -e /dev/zfs ]' > "${hookdir}/initqueue/finished/zfs.sh" + fi +else + info "ZFS: no ZFS-on-root." fi diff --git a/contrib/dracut/90zfs/zfs-env-bootfs.service.in b/contrib/dracut/90zfs/zfs-env-bootfs.service.in index e143cb5ec1ed..34c88037cac2 100644 --- a/contrib/dracut/90zfs/zfs-env-bootfs.service.in +++ b/contrib/dracut/90zfs/zfs-env-bootfs.service.in @@ -8,7 +8,7 @@ Before=zfs-import.target [Service] Type=oneshot -ExecStart=/bin/sh -c "exec systemctl set-environment BOOTFS=$(@sbindir@/zpool list -H -o bootfs | grep -m1 -v '^-$')" +ExecStart=/bin/sh -c "exec systemctl set-environment BOOTFS=$(@sbindir@/zpool list -H -o bootfs | grep -m1 -vFx -)" [Install] WantedBy=zfs-import.target diff --git a/contrib/dracut/90zfs/zfs-generator.sh.in b/contrib/dracut/90zfs/zfs-generator.sh.in index e50b9530c4f0..56f7ca9785ba 100755 --- a/contrib/dracut/90zfs/zfs-generator.sh.in +++ b/contrib/dracut/90zfs/zfs-generator.sh.in @@ -1,5 +1,5 @@ #!/bin/sh -# shellcheck disable=SC2016,SC1004 +# shellcheck disable=SC2016,SC1004,SC2154 grep -wq debug /proc/cmdline && debug=1 [ -n "$debug" ] && echo "zfs-generator: starting" >> /dev/kmsg @@ -10,37 +10,17 @@ GENERATOR_DIR="$1" exit 1 } -[ -f /lib/dracut-lib.sh ] && dracutlib=/lib/dracut-lib.sh -[ -f /usr/lib/dracut/modules.d/99base/dracut-lib.sh ] && dracutlib=/usr/lib/dracut/modules.d/99base/dracut-lib.sh -command -v getarg 
>/dev/null 2>&1 || { - [ -n "$debug" ] && echo "zfs-generator: loading Dracut library from $dracutlib" >> /dev/kmsg - . "$dracutlib" -} - +# shellcheck source=zfs-lib.sh.in . /lib/dracut-zfs-lib.sh +decode_root_args || exit 0 -[ -z "$root" ] && root=$(getarg root=) -[ -z "$rootfstype" ] && rootfstype=$(getarg rootfstype=) -[ -z "$rootflags" ] && rootflags=$(getarg rootflags=) - -# If root is not ZFS= or zfs: or rootfstype is not zfs -# then we are not supposed to handle it. -[ "${root##zfs:}" = "${root}" ] && - [ "${root##ZFS=}" = "${root}" ] && - [ "$rootfstype" != "zfs" ] && - exit 0 - +[ -z "${rootflags}" ] && rootflags=$(getarg rootflags=) case ",${rootflags}," in *,zfsutil,*) ;; ,,) rootflags=zfsutil ;; *) rootflags="zfsutil,${rootflags}" ;; esac -if [ "${root}" != "zfs:AUTO" ]; then - root="${root##zfs:}" - root="${root##ZFS=}" -fi - [ -n "$debug" ] && echo "zfs-generator: writing extension for sysroot.mount to $GENERATOR_DIR/sysroot.mount.d/zfs-enhancement.conf" >> /dev/kmsg @@ -89,7 +69,7 @@ else _zfs_generator_cb() { dset="${1}" mpnt="${2}" - unit="sysroot$(echo "$mpnt" | tr '/' '-').mount" + unit="$(systemd-escape --suffix=mount -p "/sysroot${mpnt}")" { echo "[Unit]" diff --git a/contrib/dracut/90zfs/zfs-lib.sh.in b/contrib/dracut/90zfs/zfs-lib.sh.in index afd872d69d58..e44673c2d75b 100755 --- a/contrib/dracut/90zfs/zfs-lib.sh.in +++ b/contrib/dracut/90zfs/zfs-lib.sh.in @@ -1,74 +1,16 @@ #!/bin/sh +# shellcheck disable=SC2034 -command -v getarg >/dev/null || . /lib/dracut-lib.sh -command -v getargbool >/dev/null || { - # Compatibility with older Dracut versions. - # With apologies to the Dracut developers. - getargbool() { - _default="$1"; shift - ! _b=$(getarg "$@") && [ -z "$_b" ] && _b="$_default" - if [ -n "$_b" ]; then - [ "$_b" = "0" ] && return 1 - [ "$_b" = "no" ] && return 1 - [ "$_b" = "off" ] && return 1 - fi - return 0 - } -} +command -v getarg >/dev/null || . /lib/dracut-lib.sh || . 
/usr/lib/dracut/modules.d/99base/dracut-lib.sh -OLDIFS="${IFS}" -NEWLINE=" -" TAB=" " -ZPOOL_IMPORT_OPTS="" -if getargbool 0 zfs_force -y zfs.force -y zfsforce ; then +ZPOOL_IMPORT_OPTS= +if getargbool 0 zfs_force -y zfs.force -y zfsforce; then warn "ZFS: Will force-import pools if necessary." - ZPOOL_IMPORT_OPTS="${ZPOOL_IMPORT_OPTS} -f" + ZPOOL_IMPORT_OPTS=-f fi -# find_bootfs -# returns the first dataset with the bootfs attribute. -find_bootfs() { - IFS="${NEWLINE}" - for dataset in $(zpool list -H -o bootfs); do - case "${dataset}" in - "" | "-") - continue - ;; - "no pools available") - IFS="${OLDIFS}" - return 1 - ;; - *) - IFS="${OLDIFS}" - echo "${dataset}" - return 0 - ;; - esac - done - - IFS="${OLDIFS}" - return 1 -} - -# import_pool POOL -# imports the given zfs pool if it isn't imported already. -import_pool() { - pool="${1}" - - if ! zpool list -H "${pool}" > /dev/null 2>&1; then - info "ZFS: Importing pool ${pool}..." - # shellcheck disable=SC2086 - if ! zpool import -N ${ZPOOL_IMPORT_OPTS} "${pool}" ; then - warn "ZFS: Unable to import pool ${pool}" - return 1 - fi - fi - - return 0 -} - _mount_dataset_cb() { # shellcheck disable=SC2154 mount -o zfsutil -t zfs "${1}" "${NEWROOT}${2}" @@ -122,72 +64,57 @@ for_relevant_root_children() { ) } -# ask_for_password +# Parse root=, rootfstype=, return them decoded and normalised to zfs:AUTO for auto, plain dset for explicit +# +# True if ZFS-on-root, false if we shouldn't +# +# Supported values: +# root= +# root=zfs +# root=zfs: +# root=zfs:AUTO # -# Wraps around plymouth ask-for-password and adds fallback to tty password ask -# if plymouth is not present. +# root=ZFS=data/set +# root=zfs:data/set +# root=zfs:ZFS=data/set (as a side-effect; allowed but undocumented) # -# --cmd command -# Command to execute. Required. -# --prompt prompt -# Password prompt. Note that function already adds ':' at the end. -# Recommended. -# --tries n -# How many times repeat command on its failure. Default is 3. 
-# --ply-[cmd|prompt|tries] -# Command/prompt/tries specific for plymouth password ask only. -# --tty-[cmd|prompt|tries] -# Command/prompt/tries specific for tty password ask only. -# --tty-echo-off -# Turn off input echo before tty command is executed and turn on after. -# It's useful when password is read from stdin. -ask_for_password() { - ply_tries=3 - tty_tries=3 - while [ "$#" -gt 0 ]; do - case "$1" in - --cmd) ply_cmd="$2"; tty_cmd="$2"; shift;; - --ply-cmd) ply_cmd="$2"; shift;; - --tty-cmd) tty_cmd="$2"; shift;; - --prompt) ply_prompt="$2"; tty_prompt="$2"; shift;; - --ply-prompt) ply_prompt="$2"; shift;; - --tty-prompt) tty_prompt="$2"; shift;; - --tries) ply_tries="$2"; tty_tries="$2"; shift;; - --ply-tries) ply_tries="$2"; shift;; - --tty-tries) tty_tries="$2"; shift;; - --tty-echo-off) tty_echo_off=yes;; - *) echo "ask_for_password(): wrong opt '$1'" >&2;; +# rootfstype=zfs AND root=data/set <=> root=data/set +# rootfstype=zfs AND root= <=> root=zfs:AUTO +# +# '+'es in explicit dataset decoded to ' 's. +decode_root_args() { + if [ -n "$rootfstype" ]; then + [ "$rootfstype" = zfs ] + return + fi + + root=$(getarg root=) + rootfstype=$(getarg rootfstype=) + + # shellcheck disable=SC2249 + case "$root" in + ""|zfs|zfs:|zfs:AUTO) + root=zfs:AUTO + rootfstype=zfs + return 0 + ;; + + ZFS=*|zfs:*) + root="${root#zfs:}" + root="${root#ZFS=}" + root=$(echo "$root" | tr '+' ' ') + rootfstype=zfs + return 0 + ;; + esac + + if [ "$rootfstype" = "zfs" ]; then + case "$root" in + "") root=zfs:AUTO ;; + *) root=$(echo "$root" | tr '+' ' ') ;; esac - shift - done - - { flock -s 9; - # Prompt for password with plymouth, if installed and running. - if plymouth --ping 2>/dev/null; then - plymouth ask-for-password \ - --prompt "$ply_prompt" --number-of-tries="$ply_tries" | \ - eval "$ply_cmd" - ret=$? 
- else - if [ "$tty_echo_off" = yes ]; then - stty_orig="$(stty -g)" - stty -echo - fi - - i=1 - while [ "$i" -le "$tty_tries" ]; do - [ -n "$tty_prompt" ] && \ - printf "%s [%i/%i]:" "$tty_prompt" "$i" "$tty_tries" >&2 - eval "$tty_cmd" && ret=0 && break - ret=$? - i=$((i+1)) - [ -n "$tty_prompt" ] && printf '\n' >&2 - done - unset i - [ "$tty_echo_off" = yes ] && stty "$stty_orig" - fi - } 9>/.console_lock + return 0 + fi - [ "$ret" -ne 0 ] && echo "Wrong password" >&2 - return "$ret" + return 1 } diff --git a/contrib/dracut/90zfs/zfs-load-key.sh.in b/contrib/dracut/90zfs/zfs-load-key.sh.in index c974b3d9ec4c..d916f43b4e95 100755 --- a/contrib/dracut/90zfs/zfs-load-key.sh.in +++ b/contrib/dracut/90zfs/zfs-load-key.sh.in @@ -4,70 +4,61 @@ # only run this on systemd systems, we handle the decrypt in mount-zfs.sh in the mount hook otherwise [ -e /bin/systemctl ] || [ -e /usr/bin/systemctl ] || return 0 -# This script only gets executed on systemd systems, see mount-zfs.sh for non-systemd systems +# shellcheck source=zfs-lib.sh.in +. /lib/dracut-zfs-lib.sh -# import the libs now that we know the pool imported -[ -f /lib/dracut-lib.sh ] && dracutlib=/lib/dracut-lib.sh -[ -f /usr/lib/dracut/modules.d/99base/dracut-lib.sh ] && dracutlib=/usr/lib/dracut/modules.d/99base/dracut-lib.sh -# shellcheck source=./lib-zfs.sh.in -. "$dracutlib" - -# load the kernel command line vars -[ -z "$root" ] && root="$(getarg root=)" -# If root is not ZFS= or zfs: or rootfstype is not zfs then we are not supposed to handle it. -[ "${root##zfs:}" = "${root}" ] && [ "${root##ZFS=}" = "${root}" ] && [ "$rootfstype" != "zfs" ] && exit 0 +decode_root_args || return 0 # There is a race between the zpool import and the pre-mount hooks, so we wait for a pool to be imported -while [ "$(zpool list -H)" = "" ]; do - systemctl is-failed --quiet zfs-import-cache.service zfs-import-scan.service && exit 1 +while ! 
systemctl is-active --quiet zfs-import.target; do + systemctl is-failed --quiet zfs-import-cache.service zfs-import-scan.service && return 1 sleep 0.1s done -# run this after import as zfs-import-cache/scan service is confirmed good -# we do not overwrite the ${root} variable, but create a new one, BOOTFS, to hold the dataset -if [ "${root}" = "zfs:AUTO" ] ; then - BOOTFS="$(zpool list -H -o bootfs | awk '$1 != "-" {print; exit}')" -else - BOOTFS="${root##zfs:}" - BOOTFS="${BOOTFS##ZFS=}" +BOOTFS="$root" +if [ "$BOOTFS" = "zfs:AUTO" ]; then + BOOTFS="$(zpool get -Ho value bootfs | grep -m1 -vFx -)" fi -# if pool encryption is active and the zfs command understands '-o encryption' -if [ "$(zpool list -H -o feature@encryption "${BOOTFS%%/*}")" = 'active' ]; then - # if the root dataset has encryption enabled - ENCRYPTIONROOT="$(zfs get -H -o value encryptionroot "${BOOTFS}")" - if ! [ "${ENCRYPTIONROOT}" = "-" ]; then - KEYSTATUS="$(zfs get -H -o value keystatus "${ENCRYPTIONROOT}")" - # continue only if the key needs to be loaded - [ "$KEYSTATUS" = "unavailable" ] || exit 0 +[ "$(zpool get -Ho value feature@encryption "${BOOTFS%%/*}")" = 'active' ] || return 0 + +_load_key_cb() { + dataset="$1" + + ENCRYPTIONROOT="$(zfs get -Ho value encryptionroot "${dataset}")" + [ "${ENCRYPTIONROOT}" = "-" ] && return 0 - KEYLOCATION="$(zfs get -H -o value keylocation "${ENCRYPTIONROOT}")" - case "${KEYLOCATION%%://*}" in - prompt) - for _ in 1 2 3; do - systemd-ask-password --no-tty "Encrypted ZFS password for ${BOOTFS}" | zfs load-key "${ENCRYPTIONROOT}" && break + [ "$(zfs get -Ho value keystatus "${ENCRYPTIONROOT}")" = "unavailable" ] || return 0 + + KEYLOCATION="$(zfs get -Ho value keylocation "${ENCRYPTIONROOT}")" + case "${KEYLOCATION%%://*}" in + prompt) + for _ in 1 2 3; do + systemd-ask-password --no-tty "Encrypted ZFS password for ${dataset}" | zfs load-key "${ENCRYPTIONROOT}" && break + done + ;; + http*) + systemctl start network-online.target + zfs load-key 
"${ENCRYPTIONROOT}" + ;; + file) + KEYFILE="${KEYLOCATION#file://}" + [ -r "${KEYFILE}" ] || udevadm settle + [ -r "${KEYFILE}" ] || { + info "ZFS: Waiting for key ${KEYFILE} for ${ENCRYPTIONROOT}..." + for _ in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20; do + sleep 0.5s + [ -r "${KEYFILE}" ] && break done - ;; - http*) - systemctl start network-online.target - zfs load-key "${ENCRYPTIONROOT}" - ;; - file) - KEYFILE="${KEYLOCATION#file://}" - [ -r "${KEYFILE}" ] || udevadm settle - [ -r "${KEYFILE}" ] || { - info "Waiting for key ${KEYFILE} for ${ENCRYPTIONROOT}..." - for _ in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20; do - sleep 0.5s - [ -r "${KEYFILE}" ] && break - done - } - [ -r "${KEYFILE}" ] || warn "Key ${KEYFILE} for ${ENCRYPTIONROOT} hasn't appeared. Trying anyway." - zfs load-key "${ENCRYPTIONROOT}" - ;; - *) - zfs load-key "${ENCRYPTIONROOT}" - ;; - esac - fi -fi + } + [ -r "${KEYFILE}" ] || warn "ZFS: Key ${KEYFILE} for ${ENCRYPTIONROOT} hasn't appeared. Trying anyway." + zfs load-key "${ENCRYPTIONROOT}" + ;; + *) + zfs load-key "${ENCRYPTIONROOT}" + ;; + esac +} + +_load_key_cb "$BOOTFS" +for_relevant_root_children "$BOOTFS" _load_key_cb diff --git a/contrib/dracut/90zfs/zfs-needshutdown.sh.in b/contrib/dracut/90zfs/zfs-needshutdown.sh.in index dd6de30c2704..7fb825bc95a2 100755 --- a/contrib/dracut/90zfs/zfs-needshutdown.sh.in +++ b/contrib/dracut/90zfs/zfs-needshutdown.sh.in @@ -2,7 +2,7 @@ command -v getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh -if zpool list 2>&1 | grep -q 'no pools available' ; then +if [ -z "$(zpool get -Ho value name)" ]; then info "ZFS: No active pools, no need to export anything." else info "ZFS: There is an active pool, will export it." 
diff --git a/contrib/dracut/90zfs/zfs-rollback-bootfs.service.in b/contrib/dracut/90zfs/zfs-rollback-bootfs.service.in index 477b64f2b750..b4f5707516ce 100644 --- a/contrib/dracut/90zfs/zfs-rollback-bootfs.service.in +++ b/contrib/dracut/90zfs/zfs-rollback-bootfs.service.in @@ -1,14 +1,12 @@ [Unit] Description=Rollback bootfs just before it is mounted Requisite=zfs-import.target -After=zfs-import.target zfs-snapshot-bootfs.service +After=zfs-import.target dracut-pre-mount.service zfs-snapshot-bootfs.service Before=dracut-mount.service DefaultDependencies=no ConditionKernelCommandLine=bootfs.rollback [Service] -# ${BOOTFS} should have been set by zfs-env-bootfs.service Type=oneshot -ExecStartPre=/bin/test -n ${BOOTFS} -ExecStart=/bin/sh -c '. /lib/dracut-lib.sh; SNAPNAME="$(getarg bootfs.rollback)"; exec @sbindir@/zfs rollback -Rf "${BOOTFS}@${SNAPNAME:-%v}"' +ExecStart=/bin/sh -c '. /lib/dracut-zfs-lib.sh; decode_root_args || exit; [ "$root" = "zfs:AUTO" ] && root="$BOOTFS" SNAPNAME="$(getarg bootfs.rollback)"; exec @sbindir@/zfs rollback -Rf "$root@${SNAPNAME:-%v}"' RemainAfterExit=yes diff --git a/contrib/dracut/90zfs/zfs-snapshot-bootfs.service.in b/contrib/dracut/90zfs/zfs-snapshot-bootfs.service.in index 8eae04adfb99..afdba2c9d194 100644 --- a/contrib/dracut/90zfs/zfs-snapshot-bootfs.service.in +++ b/contrib/dracut/90zfs/zfs-snapshot-bootfs.service.in @@ -1,14 +1,12 @@ [Unit] Description=Snapshot bootfs just before it is mounted Requisite=zfs-import.target -After=zfs-import.target +After=zfs-import.target dracut-pre-mount.service Before=dracut-mount.service DefaultDependencies=no ConditionKernelCommandLine=bootfs.snapshot [Service] -# ${BOOTFS} should have been set by zfs-env-bootfs.service Type=oneshot -ExecStartPre=/bin/test -n ${BOOTFS} -ExecStart=-/bin/sh -c '. /lib/dracut-lib.sh; SNAPNAME="$(getarg bootfs.snapshot)"; exec @sbindir@/zfs snapshot "${BOOTFS}@${SNAPNAME:-%v}"' +ExecStart=/bin/sh -c '. 
/lib/dracut-zfs-lib.sh; decode_root_args || exit; [ "$root" = "zfs:AUTO" ] && root="$BOOTFS" SNAPNAME="$(getarg bootfs.snapshot)"; exec @sbindir@/zfs snapshot "$root@${SNAPNAME:-%v}"' RemainAfterExit=yes diff --git a/contrib/dracut/README.md b/contrib/dracut/README.md index fc3d504ef705..b7cd8c8125eb 100644 --- a/contrib/dracut/README.md +++ b/contrib/dracut/README.md @@ -15,19 +15,21 @@ Encrypted datasets have keys loaded automatically or prompted for. If the root dataset contains children with `mountpoint=`s of `/etc`, `/bin`, `/lib*`, or `/usr`, they're mounted too. +For complete documentation, see `dracut.zfs(7)`. + ## cmdline -1. `root=` | Root dataset is… | Pools imported | - -------------------|----------------------------------------------------------|----------------| - *(empty)* | the first `bootfs=` after `zpool import -aN` | all | - `zfs:AUTO` | *(as above, but overriding other autoselection methods)* | all | - `ZFS=pool/dataset` | `pool/dataset` | `pool` | - `zfs:pool/dataset` | *(as above)* | `pool` | +1. `root=` | Root dataset is… | + ---------------------------|----------------------------------------------------------| + *(empty)* | the first `bootfs=` after `zpool import -aN` | + `zfs:AUTO`, `zfs:`, `zfs` | *(as above, but overriding other autoselection methods)* | + `ZFS=pool/dataset` | `pool/dataset` | + `zfs:pool/dataset` | *(as above)* | All `+`es are replaced with spaces (i.e. to boot from `root pool/data set`, pass `root=zfs:root+pool/data+set`). The dataset can be at any depth, including being the pool's root dataset (i.e. `root=zfs:pool`). - `rootfstype=zfs` is mostly equivalent to `root=zfs:AUTO`. + `rootfstype=zfs` is equivalent to `root=zfs:AUTO`, `rootfstype=zfs root=pool/dataset` is equivalent to `root=zfs:pool/dataset`. 2. `spl_hostid`: passed to `zgenhostid -f`, useful to override the `/etc/hostid` file baked into the initrd. 
diff --git a/etc/init.d/zfs-zed.in b/etc/init.d/zfs-zed.in index 47f742259b27..e9cf8867403c 100755 --- a/etc/init.d/zfs-zed.in +++ b/etc/init.d/zfs-zed.in @@ -69,8 +69,7 @@ do_stop() then # No pools imported, it is/should be safe/possible to # unload modules. - zfs_action "Unloading modules" rmmod zfs zunicode \ - zavl zcommon znvpair zlua spl + zfs_action "Unloading modules" rmmod zfs spl return "$?" fi } diff --git a/include/os/freebsd/spl/sys/ccompile.h b/include/os/freebsd/spl/sys/ccompile.h index 23e637983475..a46a3a18be14 100644 --- a/include/os/freebsd/spl/sys/ccompile.h +++ b/include/os/freebsd/spl/sys/ccompile.h @@ -42,9 +42,6 @@ extern "C" { #endif #define EXPORT_SYMBOL(x) -#define MODULE_AUTHOR(s) -#define MODULE_DESCRIPTION(s) -#define MODULE_LICENSE(s) #define module_param(a, b, c) #define module_param_call(a, b, c, d, e) #define module_param_named(a, b, c, d) diff --git a/include/os/freebsd/spl/sys/mod_os.h b/include/os/freebsd/spl/sys/mod_os.h index 293bd7d2b983..3a9ebbfc3bc4 100644 --- a/include/os/freebsd/spl/sys/mod_os.h +++ b/include/os/freebsd/spl/sys/mod_os.h @@ -31,11 +31,6 @@ #include -#define ZFS_MODULE_DESCRIPTION(s) -#define ZFS_MODULE_AUTHOR(s) -#define ZFS_MODULE_LICENSE(s) -#define ZFS_MODULE_VERSION(s) - #define EXPORT_SYMBOL(x) #define module_param(a, b, c) #define MODULE_PARM_DESC(a, b) diff --git a/include/os/linux/kernel/linux/mod_compat.h b/include/os/linux/kernel/linux/mod_compat.h index 68a16f228e37..16002b99873e 100644 --- a/include/os/linux/kernel/linux/mod_compat.h +++ b/include/os/linux/kernel/linux/mod_compat.h @@ -161,11 +161,4 @@ enum scope_prefix_types { #define ZFS_MODULE_PARAM_ARGS const char *buf, zfs_kernel_param_t *kp -#define ZFS_MODULE_DESCRIPTION(s) MODULE_DESCRIPTION(s) -#define ZFS_MODULE_AUTHOR(s) MODULE_AUTHOR(s) -#define ZFS_MODULE_LICENSE(s) MODULE_LICENSE(s) -#define ZFS_MODULE_VERSION(s) MODULE_VERSION(s) - -#define module_init_early(fn) module_init(fn) - #endif /* _MOD_COMPAT_H */ diff --git 
a/include/sys/dmu.h b/include/sys/dmu.h index 13fc0b6fada7..ee6b68f852b3 100644 --- a/include/sys/dmu.h +++ b/include/sys/dmu.h @@ -1080,6 +1080,8 @@ int dmu_diff(const char *tosnap_name, const char *fromsnap_name, #define ZFS_CRC64_POLY 0xC96C5795D7870F42ULL /* ECMA-182, reflected form */ extern uint64_t zfs_crc64_table[256]; +extern int dmu_prefetch_max; + #ifdef __cplusplus } #endif diff --git a/include/sys/dsl_dataset.h b/include/sys/dsl_dataset.h index 02147171ae14..a8ca7444a3f8 100644 --- a/include/sys/dsl_dataset.h +++ b/include/sys/dsl_dataset.h @@ -487,6 +487,9 @@ boolean_t dsl_dataset_get_uint64_array_feature(dsl_dataset_t *ds, void dsl_dataset_activate_redaction(dsl_dataset_t *ds, uint64_t *redact_snaps, uint64_t num_redact_snaps, dmu_tx_t *tx); +int dsl_dataset_oldest_snapshot(spa_t *spa, uint64_t head_ds, uint64_t min_txg, + uint64_t *oldest_dsobj); + #ifdef ZFS_DEBUG #define dprintf_ds(ds, fmt, ...) do { \ if (zfs_flags & ZFS_DEBUG_DPRINTF) { \ diff --git a/include/sys/metaslab.h b/include/sys/metaslab.h index 759d55874e43..df70e13ffe62 100644 --- a/include/sys/metaslab.h +++ b/include/sys/metaslab.h @@ -51,11 +51,14 @@ int metaslab_init(metaslab_group_t *, uint64_t, uint64_t, uint64_t, metaslab_t **); void metaslab_fini(metaslab_t *); +void metaslab_set_unflushed_dirty(metaslab_t *, boolean_t); void metaslab_set_unflushed_txg(metaslab_t *, uint64_t, dmu_tx_t *); void metaslab_set_estimated_condensed_size(metaslab_t *, uint64_t, dmu_tx_t *); +boolean_t metaslab_unflushed_dirty(metaslab_t *); uint64_t metaslab_unflushed_txg(metaslab_t *); uint64_t metaslab_estimated_condensed_size(metaslab_t *); int metaslab_sort_by_flushed(const void *, const void *); +void metaslab_unflushed_bump(metaslab_t *, dmu_tx_t *, boolean_t); uint64_t metaslab_unflushed_changes_memused(metaslab_t *); int metaslab_load(metaslab_t *); diff --git a/include/sys/metaslab_impl.h b/include/sys/metaslab_impl.h index 3dbee4c17fef..820c61a252e2 100644 --- a/include/sys/metaslab_impl.h 
+++ b/include/sys/metaslab_impl.h @@ -553,6 +553,7 @@ struct metaslab { * log space maps. */ uint64_t ms_unflushed_txg; + boolean_t ms_unflushed_dirty; /* updated every time we are done syncing the metaslab's space map */ uint64_t ms_synced_length; diff --git a/include/sys/mod.h b/include/sys/mod.h index a5a73ed0ee00..aba211423773 100644 --- a/include/sys/mod.h +++ b/include/sys/mod.h @@ -30,11 +30,6 @@ * Exported symbols */ #define EXPORT_SYMBOL(x) - -#define ZFS_MODULE_DESCRIPTION(s) -#define ZFS_MODULE_AUTHOR(s) -#define ZFS_MODULE_LICENSE(s) -#define ZFS_MODULE_VERSION(s) #endif #endif /* SYS_MOD_H */ diff --git a/include/sys/spa.h b/include/sys/spa.h index 1a61a6ee2504..74b6ecd40cbd 100644 --- a/include/sys/spa.h +++ b/include/sys/spa.h @@ -1150,11 +1150,17 @@ extern void zfs_post_remove(spa_t *spa, vdev_t *vd); extern void zfs_post_state_change(spa_t *spa, vdev_t *vd, uint64_t laststate); extern void zfs_post_autoreplace(spa_t *spa, vdev_t *vd); extern uint64_t spa_get_errlog_size(spa_t *spa); -extern int spa_get_errlog(spa_t *spa, void *uaddr, size_t *count); +extern int spa_get_errlog(spa_t *spa, void *uaddr, uint64_t *count); extern void spa_errlog_rotate(spa_t *spa); extern void spa_errlog_drain(spa_t *spa); extern void spa_errlog_sync(spa_t *spa, uint64_t txg); extern void spa_get_errlists(spa_t *spa, avl_tree_t *last, avl_tree_t *scrub); +extern void spa_delete_dataset_errlog(spa_t *spa, uint64_t ds, dmu_tx_t *tx); +extern void spa_swap_errlog(spa_t *spa, uint64_t new_head_ds, + uint64_t old_head_ds, dmu_tx_t *tx); +extern void sync_error_list(spa_t *spa, avl_tree_t *t, uint64_t *obj, + dmu_tx_t *tx); +extern void spa_upgrade_errlog(spa_t *spa, dmu_tx_t *tx); /* vdev cache */ extern void vdev_cache_stat_init(void); diff --git a/include/sys/spa_log_spacemap.h b/include/sys/spa_log_spacemap.h index b2ed77fac3e4..72229df6cd16 100644 --- a/include/sys/spa_log_spacemap.h +++ b/include/sys/spa_log_spacemap.h @@ -30,7 +30,10 @@ typedef struct log_summary_entry 
{ uint64_t lse_start; /* start TXG */ + uint64_t lse_end; /* last TXG */ + uint64_t lse_txgcount; /* # of TXGs */ uint64_t lse_mscount; /* # of metaslabs needed to be flushed */ + uint64_t lse_msdcount; /* # of dirty metaslabs needed to be flushed */ uint64_t lse_blkcount; /* blocks held by this entry */ list_node_t lse_node; } log_summary_entry_t; @@ -50,6 +53,7 @@ typedef struct spa_log_sm { uint64_t sls_nblocks; /* number of blocks in this log */ uint64_t sls_mscount; /* # of metaslabs flushed in the log's txg */ avl_node_t sls_node; /* node in spa_sm_logs_by_txg */ + space_map_t *sls_sm; /* space map pointer, if open */ } spa_log_sm_t; int spa_ld_log_spacemaps(spa_t *); @@ -68,8 +72,9 @@ uint64_t spa_log_sm_memused(spa_t *); void spa_log_sm_decrement_mscount(spa_t *, uint64_t); void spa_log_sm_increment_current_mscount(spa_t *); -void spa_log_summary_add_flushed_metaslab(spa_t *); -void spa_log_summary_decrement_mscount(spa_t *, uint64_t); +void spa_log_summary_add_flushed_metaslab(spa_t *, boolean_t); +void spa_log_summary_dirty_flushed_metaslab(spa_t *, uint64_t); +void spa_log_summary_decrement_mscount(spa_t *, uint64_t, boolean_t); void spa_log_summary_decrement_blkcount(spa_t *, uint64_t); boolean_t spa_flush_all_logs_requested(spa_t *); diff --git a/include/sys/zio.h b/include/sys/zio.h index 530098ba3dd0..6d59311f3323 100644 --- a/include/sys/zio.h +++ b/include/sys/zio.h @@ -283,6 +283,13 @@ extern const char *const zio_type_name[ZIO_TYPES]; * Note: this structure is passed between userland and the kernel, and is * stored on disk (by virtue of being incorporated into other on-disk * structures, e.g. dsl_scan_phys_t). + * + * If the head_errlog feature is enabled a different on-disk format for error + * logs is used. This introduces the use of an error bookmark, a four-tuple + * that uniquely identifies any error block + * in the pool. 
The birth transaction group is used to track whether the block + * has been overwritten by newer data or added to a snapshot since its marking + * as an error. */ struct zbookmark_phys { uint64_t zb_objset; @@ -291,6 +298,13 @@ struct zbookmark_phys { uint64_t zb_blkid; }; +typedef struct zbookmark_err_phys { + uint64_t zb_object; + int64_t zb_level; + uint64_t zb_blkid; + uint64_t zb_birth; +} zbookmark_err_phys_t; + #define SET_BOOKMARK(zb, objset, object, level, blkid) \ { \ (zb)->zb_objset = objset; \ diff --git a/include/zfeature_common.h b/include/zfeature_common.h index 70ca5271ae86..3bd6c7db8bed 100644 --- a/include/zfeature_common.h +++ b/include/zfeature_common.h @@ -76,6 +76,7 @@ typedef enum spa_feature { SPA_FEATURE_ZSTD_COMPRESS, SPA_FEATURE_DRAID, SPA_FEATURE_ZILSAXATTR, + SPA_FEATURE_HEAD_ERRLOG, SPA_FEATURES } spa_feature_t; diff --git a/lib/libnvpair/libnvpair.abi b/lib/libnvpair/libnvpair.abi index f9874da81f82..de15237da583 100644 --- a/lib/libnvpair/libnvpair.abi +++ b/lib/libnvpair/libnvpair.abi @@ -2794,7 +2794,7 @@ - + diff --git a/lib/libuutil/libuutil.abi b/lib/libuutil/libuutil.abi index bf13d62e2f04..86220b44b229 100644 --- a/lib/libuutil/libuutil.abi +++ b/lib/libuutil/libuutil.abi @@ -5,8 +5,6 @@ - - @@ -366,7 +364,7 @@ - + @@ -430,6 +428,11 @@ + + + + + @@ -450,7 +453,7 @@ - + @@ -475,11 +478,6 @@ - - - - - @@ -552,6 +550,11 @@ + + + + + @@ -572,7 +575,7 @@ - + @@ -597,11 +600,6 @@ - - - - - @@ -642,6 +640,12 @@ + + + + + + @@ -666,12 +670,6 @@ - - - - - - @@ -858,6 +856,9 @@ + + + @@ -892,18 +893,66 @@ - - + - + - + - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ -972,16 +1021,16 @@ - + - + - + - + @@ -993,65 +1042,6 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - @@ -1062,11 +1052,16 @@ + + + + + @@ -1122,6 +1117,7 @@ + @@ -1154,7 +1150,6 @@ - @@ -1307,15 +1302,6 @@ - - - - - - - - - @@ -1342,6 +1328,15 
@@ + + + + + + + + + diff --git a/lib/libzfs/libzfs.abi b/lib/libzfs/libzfs.abi index 480c6579b21c..0f08393aadc2 100644 --- a/lib/libzfs/libzfs.abi +++ b/lib/libzfs/libzfs.abi @@ -599,7 +599,7 @@ - + @@ -1858,8 +1858,8 @@ - - + + @@ -1899,7 +1899,8 @@ - + + @@ -1957,7 +1958,7 @@ - + diff --git a/lib/libzfsbootenv/libzfsbootenv.abi b/lib/libzfsbootenv/libzfsbootenv.abi index f1401a14f7a0..86ec25cf8470 100644 --- a/lib/libzfsbootenv/libzfsbootenv.abi +++ b/lib/libzfsbootenv/libzfsbootenv.abi @@ -5,8 +5,6 @@ - - @@ -20,6 +18,7 @@ + @@ -84,21 +83,16 @@ + + + + + + - - - - - - - - - - - @@ -167,16 +161,16 @@ - + - + - + - + @@ -188,19 +182,20 @@ - - - + + + + + - diff --git a/lib/libzutil/zutil_pool.c b/lib/libzutil/zutil_pool.c index 734650f3cffc..21dc1f9d9458 100644 --- a/lib/libzutil/zutil_pool.c +++ b/lib/libzutil/zutil_pool.c @@ -120,8 +120,9 @@ zpool_history_unpack(char *buf, uint64_t bytes_read, uint64_t *leftover, break; /* unpack record */ - if (nvlist_unpack(buf + sizeof (reclen), reclen, &nv, 0) != 0) - return (ENOMEM); + int err = nvlist_unpack(buf + sizeof (reclen), reclen, &nv, 0); + if (err != 0) + return (err); bytes_read -= sizeof (reclen) + reclen; buf += sizeof (reclen) + reclen; diff --git a/man/Makefile.am b/man/Makefile.am index 8ab1b757242c..64650c2b988a 100644 --- a/man/Makefile.am +++ b/man/Makefile.am @@ -15,6 +15,7 @@ dist_man_MANS = \ man4/spl.4 \ man4/zfs.4 \ \ + man7/dracut.zfs.7 \ man7/zpool-features.7 \ man7/zfsconcepts.7 \ man7/zfsprops.7 \ diff --git a/man/man4/zfs.4 b/man/man4/zfs.4 index 01e9c5de445d..a18917eb1e42 100644 --- a/man/man4/zfs.4 +++ b/man/man4/zfs.4 @@ -454,6 +454,13 @@ If we have less than this amount of free space, most ZPL operations (e.g. write, create) will return .Sy ENOSPC . . +.It Sy spa_upgrade_errlog_limit Ns = Ns Sy 0 Pq uint +Limits the number of on-disk error log entries that will be converted to the +new format when enabling the +.Sy head_errlog +feature. +The default is to convert all log entries. +. 
.It Sy vdev_removal_max_span Ns = Ns Sy 32768 Ns B Po 32kB Pc Pq int During top-level vdev removal, chunks of data are copied from the vdev which may include free space in order to trade bandwidth for IOPS. @@ -975,13 +982,13 @@ log spacemap in memory, in bytes. Part of overall system memory that ZFS allows to be used for unflushed metadata changes by the log spacemap, in millionths. . -.It Sy zfs_unflushed_log_block_max Ns = Ns Sy 262144 Po 256k Pc Pq ulong +.It Sy zfs_unflushed_log_block_max Ns = Ns Sy 131072 Po 128k Pc Pq ulong Describes the maximum number of log spacemap blocks allowed for each pool. The default value means that the space in all the log spacemaps can add up to no more than -.Sy 262144 +.Sy 131072 blocks (which means -.Em 32GB +.Em 16GB of logical space before compression and ditto blocks, assuming that blocksize is .Em 128kB ) . @@ -1011,7 +1018,12 @@ Thus we always allow at least this many log blocks. .It Sy zfs_unflushed_log_block_pct Ns = Ns Sy 400 Ns % Pq ulong Tunable used to determine the number of blocks that can be used for the spacemap log, expressed as a percentage of the total number of -metaslabs in the pool. +unflushed metaslabs in the pool. +. +.It Sy zfs_unflushed_log_txg_max Ns = Ns Sy 1000 Pq ulong +Tunable limiting maximum time in TXGs any metaslab may remain unflushed. +It effectively limits maximum number of unflushed per-TXG spacemap logs +that need to be read after unclean pool export. . .It Sy zfs_unlink_suspend_progress Ns = Ns Sy 0 Ns | Ns 1 Pq uint When enabled, files will not be asynchronously removed from the list of pending diff --git a/man/man7/dracut.zfs.7 b/man/man7/dracut.zfs.7 new file mode 100644 index 000000000000..0f446fe2fe3f --- /dev/null +++ b/man/man7/dracut.zfs.7 @@ -0,0 +1,278 @@ +.\" SPDX-License-Identifier: 0BSD +.\" +.Dd April 4, 2022 +.Dt DRACUT.ZFS 7 +.Os +. +.Sh NAME +.Nm dracut.zfs +.Nd overview of ZFS dracut hooks +. 
+.Sh SYNOPSIS +.Bd -literal -compact + parse-zfs.sh \(-> dracut-cmdline.service + | \(da + | … + | \(da + \e\(em\(em\(em\(em\(em\(em\(em\(em\(-> dracut-initqueue.service + | zfs-import-opts.sh + zfs-load-module.service \(da | | + | | sysinit.target \(da | + \(da | | zfs-import-scan.service \(da +zfs-import-scan.service \(da \(da | zfs-import-cache.service + | zfs-import-cache.service basic.target | | + \e__________________| | \(da \(da + \(da | zfs-load-key.sh + zfs-env-bootfs.service | | + \(da \(da \(da + zfs-import.target \(-> dracut-pre-mount.service + | \(ua | + | dracut-zfs-generator | + | ____________________/| + |/ \(da + | sysroot.mount \(<-\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em dracut-zfs-generator + | | \(da | + | \(da sysroot-{usr,etc,lib,&c.}.mount | + | initrd-root-fs.target \(<-\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em or \(da + | | zfs-nonroot-necessities.service + | \(da | + \(da dracut-mount.service | + zfs-snapshot-bootfs.service | | + | \(da | + \(da … | + zfs-rollback-bootfs.service | | + | \(da | + | sysroot-usr.mount \(<-\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em\(em/ + | | + | \(da + | initrd-fs.target + \e______________________ | + \e| + \(da + export-zfs.sh initrd.target + | | + \(da \(da + dracut-shutdown.service … + | + \(da + zfs-needshutdown.sh \(-> initrd-cleanup.service +.Ed +.Pp +Compare +.Xr dracut.bootup 7 +for the full flowchart. +. +.Sh DESCRIPTION +Under dracut, booting with +.No ZFS-on- Ns Pa / +is facilitated by a number of hooks in the +.Nm 90zfs +module. +.Pp +Booting into a ZFS dataset requires +.Sy mountpoint Ns = Ns Pa / +to be set on the dataset containing the root filesystem (henceforth "the boot dataset") and at the very least either the +.Sy bootfs +property to be set to that dataset, or the +.Sy root= +kernel cmdline (or dracut drop-in) argument to specify it. 
+.Pp +All children of the boot dataset with +.Sy canmount Ns = Ns Sy on +with +.Sy mountpoint Ns s +matching +.Pa /etc , /bin , /lib , /lib?? , /libx32 , No and Pa /usr +globs are deemed essential and will be mounted as well. +.Pp +.Xr zfs-mount-generator 8 +is recommended for proper functioning of the system afterward (correct mount properties, remounting, &c.). +. +.Sh CMDLINE +.Ss Standard +.Bl -tag -compact -width ".Sy root=zfs:AUTO , root=zfs: , root=zfs , Op Sy root=" +.It Sy root=zfs:\& Ns Ar dataset , Sy root=ZFS= Ns Ar dataset +Use +.Ar dataset +as the boot dataset. +All pluses +.Pq Sq + +are replaced with spaces +.Pq Sq \ . +. +.It Sy root=zfs:AUTO , root=zfs:\& , root=zfs , Op Sy root= +After import, search for the first pool with the +.Sy bootfs +property set, use its value as-if specified as the +.Ar dataset +above. +. +.It Sy rootfstype=zfs root= Ns Ar dataset +Equivalent to +.Sy root=zfs:\& Ns Ar dataset . +. +.It Sy rootfstype=zfs Op Sy root= +Equivalent to +.Sy root=zfs:AUTO . +. +.It Sy rootflags= Ns Ar flags +Mount the boot dataset with +.Fl o Ar flags ; +cf.\& +.Sx Temporary Mount Point Properties +in +.Xr zfsprops 7 . +These properties will not last, since all filesystems will be re-mounted from the real root. +. +.It Sy debug +If specified, +.Nm dracut-zfs-generator +logs to the journal. +.El +.Pp +Be careful about setting neither +.Sy rootfstype=zfs +nor +.Sy root=zfs:\& Ns Ar dataset +\(em other automatic boot selection methods, like +.Nm systemd-gpt-auto-generator +and +.Nm systemd-fstab-generator +might take precedence. +. +.Ss ZFS-specific +.Bl -tag -compact -width ".Sy bootfs.snapshot Ns Op Sy = Ns Ar snapshot-name" +.It Sy bootfs.snapshot Ns Op Sy = Ns Ar snapshot-name +Execute +.Nm zfs Cm snapshot Ar boot-dataset Ns Sy @ Ns Ar snapshot-name +before pivoting to the real root. +.Ar snapshot-name +defaults to the current kernel release. +. 
+.It Sy bootfs.rollback Ns Op Sy = Ns Ar snapshot-name +Execute +.Nm zfs Cm rollback Fl Rf Ar boot-dataset Ns Sy @ Ns Ar snapshot-name +before pivoting to the real root. +.Ar snapshot-name +defaults to the current kernel release. +. +.It Sy spl_hostid= Ns Ar host-id +Use +.Xr zgenhostid 8 +to set the host ID to +.Ar host-id ; +otherwise, +.Pa /etc/hostid +inherited from the real root is used. +. +.It Sy zfs_force , zfs.force , zfsforce +Appends +.Fl f +to all +.Nm zpool Cm import +invocations; primarily useful in conjunction with +.Sy spl_hostid= , +or if no host ID was inherited. +.El +. +.Sh FILES +.Bl -tag -width 0 +.It Pa parse-zfs.sh Pq Sy cmdline +Processes +.Sy spl_hostid= . +If +.Sy root= +matches a known pattern, above, provides +.Pa /dev/root +and delays the initqueue until +.Xr zfs 4 +is loaded. +. +.It Pa zfs-import-opts.sh Pq Nm systemd No environment generator +Turns +.Sy zfs_force , zfs.force , No or Sy zfsforce +into +.Ev ZPOOL_IMPORT_OPTS Ns = Ns Fl f +for +.Pa zfs-import-scan.service +or +.Pa zfs-import-cache.service . +. +.It Pa zfs-load-key.sh Pq Sy pre-mount +Loads encryption keys for the boot dataset and its essential descendants. +.Bl -tag -compact -offset 4n -width ".Sy keylocation Ns = Ns Sy https:// Ns Ar URL , Sy keylocation Ns = Ns Sy http:// Ns Ar URL" +.It Sy keylocation Ns = Ns Sy prompt +Is prompted for via +.Nm systemd-ask-password +thrice. +. +.It Sy keylocation Ns = Ns Sy https:// Ns Ar URL , Sy keylocation Ns = Ns Sy http:// Ns Ar URL +.Pa network-online.target +is started before loading. +. +.It Sy keylocation Ns = Ns Sy file:// Ns Ar path +If +.Ar path +doesn't exist, +.Nm udevadm No is Cm settle Ns d . +If it still doesn't, it's waited for for up to +.Sy 10 Ns s . +.El +. +.It Pa zfs-env-bootfs.service Pq Nm systemd No service +After pool import, sets +.Ev BOOTFS Ns = +in the systemd environment to the first non-null +.Sy bootfs +value in iteration order. +. 
+.It Pa dracut-zfs-generator Pq Nm systemd No generator +Generates +.Pa sysroot.mount Pq using Sy rootflags= , No if any . +If an explicit boot dataset was specified, also generates essential mountpoints +.Pq Pa sysroot-etc.mount , sysroot-bin.mount , No &c.\& , +otherwise generates +.Pa zfs-nonroot-necessities.service +which mounts them explicitly after +.Pa /sysroot +using +.Ev BOOTFS Ns = . +. +.It Pa zfs-snapshot-bootfs.service , zfs-rollback-bootfs.service Pq Nm systemd No services +Consume +.Sy bootfs.snapshot +and +.Sy bootfs.rollback +as described in +.Sx CMDLINE . +Use +.Ev BOOTFS Ns = +if no explicit boot dataset was specified. +. +.It Pa zfs-needshutdown.sh Pq Sy cleanup +If any pools were imported, signals that shutdown hooks are required. +. +.It Pa export-zfs.sh Pq Sy shutdown +Forcibly exports all pools. +. +.It Pa /etc/hostid , /etc/zfs/zpool.cache , /etc/zfs/vdev_id.conf Pq regular files +Included verbatim, hostonly. +. +.It Pa mount-zfs.sh Pq Sy mount +Does nothing on +.Nm systemd +systems +.Pq if Pa dracut-zfs-generator No succeeded . +Otherwise, loads encryption key for the boot dataset from the console or via plymouth. +It may not work at all! +.El +. +.Sh SEE ALSO +.Xr dracut.bootup 7 , +.Xr zfsprops 7 , +.Xr zpoolprops 7 , +.Xr dracut-shutdown.service 8 , +.Xr systemd-fstab-generator 8 , +.Xr systemd-gpt-auto-generator 8 , +.Xr zfs-mount-generator 8 , +.Xr zgenhostid 8 diff --git a/man/man7/zfsprops.7 b/man/man7/zfsprops.7 index 178f9f45c511..8fdeddb6262c 100644 --- a/man/man7/zfsprops.7 +++ b/man/man7/zfsprops.7 @@ -388,7 +388,7 @@ privilege with can access everyone's usage. .Pp The -.Sy userused Ns @ Ns Ar ... +.Sy userused Ns @ Ns Ar … properties are not displayed by .Nm zfs Cm get Sy all . 
The user's name must be appended after the @@ -872,14 +872,17 @@ This is done using .Sy zstd-fast- Ns Ar N , where .Ar N -is an integer in [1-9,10,20,30,...,100,500,1000] which maps to a negative +is an integer in +.Bq Sy 1 Ns - Ns Sy 10 , 20 , 30 , No … , Sy 100 , 500 , 1000 +which maps to a negative .Sy zstd level. -The lower the level the faster the compression - -.Ar 1000 No provides the fastest compression and lowest compression ratio. +The lower the level the faster the compression \(em +.Sy 1000 +provides the fastest compression and lowest compression ratio. .Sy zstd-fast is equivalent to -.Sy zstd-fast-1 . +.Sy zstd-fast- Ns Ar 1 . .Pp The .Sy zle @@ -1319,7 +1322,7 @@ can get and set everyone's quota. This property is not available on volumes, on file systems before version 4, or on pools before version 15. The -.Sy userquota@ Ns Ar ... +.Sy userquota@ Ns Ar … properties are not displayed by .Nm zfs Cm get Sy all . The user's name must be appended after the diff --git a/man/man7/zpool-features.7 b/man/man7/zpool-features.7 index 9a202ca8a596..705bd5433fc2 100644 --- a/man/man7/zpool-features.7 +++ b/man/man7/zpool-features.7 @@ -507,6 +507,17 @@ once either of the limit properties has been set on a dataset and will never return to being .Sy enabled . . +.feature com.delphix head_errlog no +This feature enables the upgraded version of errlog, which required an on-disk +error log format change. +Now the error log of each head dataset is stored separately in the zap object +and keyed by the head id. +With this feature enabled, every dataset affected by an error block is listed +in the output of +.Nm zpool Cm status . +.Pp +\*[instant-never] +. 
.feature com.delphix hole_birth no enabled_txg This feature has/had bugs, the result of which is that, if you do a .Nm zfs Cm send Fl i diff --git a/man/man8/zed.8.in b/man/man8/zed.8.in index d3297605206e..6c51f10695cc 100644 --- a/man/man8/zed.8.in +++ b/man/man8/zed.8.in @@ -75,7 +75,7 @@ Custom .Ev $PATH for zedlets to use. Normally zedlets run in a locked-down environment, with hardcoded paths to the ZFS commands -.Pq Ev $ZFS , $ZPOOL , $ZED , ... , +.Pq Ev $ZFS , $ZPOOL , $ZED , … , and a hard-coded .Ev $PATH . This is done for security reasons. diff --git a/man/man8/zfs-allow.8 b/man/man8/zfs-allow.8 index f949a0a5eee5..52b7c43f44ba 100644 --- a/man/man8/zfs-allow.8 +++ b/man/man8/zfs-allow.8 @@ -215,19 +215,19 @@ send subcommand share subcommand Allows sharing file systems over NFS or SMB protocols snapshot subcommand Must also have the \fBmount\fR ability -groupquota other Allows accessing any \fBgroupquota@\fI...\fR property -groupobjquota other Allows accessing any \fBgroupobjquota@\fI...\fR property -groupused other Allows reading any \fBgroupused@\fI...\fR property -groupobjused other Allows reading any \fBgroupobjused@\fI...\fR property +groupquota other Allows accessing any \fBgroupquota@\fI…\fR property +groupobjquota other Allows accessing any \fBgroupobjquota@\fI…\fR property +groupused other Allows reading any \fBgroupused@\fI…\fR property +groupobjused other Allows reading any \fBgroupobjused@\fI…\fR property userprop other Allows changing any user property -userquota other Allows accessing any \fBuserquota@\fI...\fR property -userobjquota other Allows accessing any \fBuserobjquota@\fI...\fR property -userused other Allows reading any \fBuserused@\fI...\fR property -userobjused other Allows reading any \fBuserobjused@\fI...\fR property -projectobjquota other Allows accessing any \fBprojectobjquota@\fI...\fR property -projectquota other Allows accessing any \fBprojectquota@\fI...\fR property -projectobjused other Allows reading any 
\fBprojectobjused@\fI...\fR property -projectused other Allows reading any \fBprojectused@\fI...\fR property +userquota other Allows accessing any \fBuserquota@\fI…\fR property +userobjquota other Allows accessing any \fBuserobjquota@\fI…\fR property +userused other Allows reading any \fBuserused@\fI…\fR property +userobjused other Allows reading any \fBuserobjused@\fI…\fR property +projectobjquota other Allows accessing any \fBprojectobjquota@\fI…\fR property +projectquota other Allows accessing any \fBprojectquota@\fI…\fR property +projectobjused other Allows reading any \fBprojectobjused@\fI…\fR property +projectused other Allows reading any \fBprojectused@\fI…\fR property aclinherit property aclmode property diff --git a/man/man8/zfs-program.8 b/man/man8/zfs-program.8 index 4a9718cdcfcb..100cba03ee51 100644 --- a/man/man8/zfs-program.8 +++ b/man/man8/zfs-program.8 @@ -96,16 +96,17 @@ argv = args["argv"] -- argv == {1="arg1", 2="arg2", ...} .Ed .Pp -If invoked from the libZFS interface, an arbitrary argument list can be +If invoked from the libzfs interface, an arbitrary argument list can be passed to the channel program, which is accessible via the same -"..." syntax in Lua: +.Qq Li ... +syntax in Lua: .Bd -literal -compact -offset indent args = ... -- args == {"foo"="bar", "baz"={...}, ...} .Ed .Pp Note that because Lua arrays are 1-indexed, arrays passed to Lua from the -libZFS interface will have their indices incremented by 1. +libzfs interface will have their indices incremented by 1. That is, the element in .Va arr[0] @@ -166,7 +167,7 @@ See the section below for function-specific details on error return codes. . .Ss Lua to C Value Conversion -When invoking a channel program via the libZFS interface, it is necessary to +When invoking a channel program via the libzfs interface, it is necessary to translate arguments and return values from Lua values to their C equivalents, and vice-versa. 
.Pp diff --git a/man/man8/zfs-send.8 b/man/man8/zfs-send.8 index 67e94ca85bc7..70367f19d7b7 100644 --- a/man/man8/zfs-send.8 +++ b/man/man8/zfs-send.8 @@ -39,8 +39,8 @@ .Sh SYNOPSIS .Nm zfs .Cm send -.Op Fl DLPRbcehnpsvw -.Op Fl X Ar dataset Ns Oo , Ns Ar dataset Oc Ns ... +.Op Fl DLPbcehnpsvw +.Op Fl R Op Fl X Ar dataset Ns Oo , Ns Ar dataset Oc Ns … .Op Oo Fl I Ns | Ns Fl i Oc Ar snapshot .Ar snapshot .Nm zfs @@ -73,8 +73,8 @@ .It Xo .Nm zfs .Cm send -.Op Fl DLPRbcehnpvw -.Op Fl X Ar dataset Ns Oo , Ns Ar dataset Oc Ns ... +.Op Fl DLPbcehnpsvw +.Op Fl R Op Fl X Ar dataset Ns Oo , Ns Ar dataset Oc Ns … .Op Oo Fl I Ns | Ns Fl i Oc Ar snapshot .Ar snapshot .Xc @@ -142,23 +142,16 @@ If the flag is used to send encrypted datasets, then .Fl w must also be specified. -.It Fl X , -exclude Ar dataset Ns Oo , Ns Ar dataset Oc Ns ... -When the -.Fl R -flag is given, -.Fl X -can be used to specify a list of datasets to be excluded from the -data stream. -The +.It Fl X , -exclude Ar dataset Ns Oo , Ns Ar dataset Oc Ns … +With +.Fl R , .Fl X -option can be used multiple times, or the list of datasets can be -specified as a comma-separated list, or both. -.Ar dataset -must not be the pool's root dataset, and all descendant datasets of -.Ar dataset -will be excluded from the send stream. -Requires -.Fl R . +specifies a set of datasets (and, hence, their descendants), +to be excluded from the send stream. +The root dataset may not be excluded. +.Fl X Ar a Fl X Ar b +is equivalent to +.Fl X Ar a , Ns Ar b . .It Fl e , -embed Generate a more compact stream by using .Sy WRITE_EMBEDDED diff --git a/man/man8/zfs-set.8 b/man/man8/zfs-set.8 index 6092e49dfbcf..9d0c437df217 100644 --- a/man/man8/zfs-set.8 +++ b/man/man8/zfs-set.8 @@ -170,8 +170,9 @@ inherited. .It Fl r Recursively inherit the given property for all children. 
.It Fl S -Revert the property to the received value if one exists; otherwise operate as -if the +Revert the property to the received value, if one exists; +otherwise, for non-inheritable properties, to the default; +otherwise, operate as if the .Fl S option was not specified. .El diff --git a/module/Kbuild.in b/module/Kbuild.in index 1507965c5750..75e7a0688e98 100644 --- a/module/Kbuild.in +++ b/module/Kbuild.in @@ -1,20 +1,6 @@ # When integrated in to a monolithic kernel the spl module must appear # first. This ensures its module initialization function is run before # any of the other module initialization functions which depend on it. -ZFS_MODULES += spl/ -ZFS_MODULES += avl/ -ZFS_MODULES += icp/ -ZFS_MODULES += lua/ -ZFS_MODULES += nvpair/ -ZFS_MODULES += unicode/ -ZFS_MODULES += zcommon/ -ZFS_MODULES += zfs/ -ZFS_MODULES += zstd/ - -# The rest is only relevant when run by kbuild -ifneq ($(KERNELRELEASE),) - -obj-$(CONFIG_ZFS) := $(ZFS_MODULES) ZFS_MODULE_CFLAGS += -std=gnu99 -Wno-declaration-after-statement ZFS_MODULE_CFLAGS += -Wmissing-prototypes @@ -22,10 +8,16 @@ ZFS_MODULE_CFLAGS += @KERNEL_DEBUG_CFLAGS@ @NO_FORMAT_ZERO_LENGTH@ ifneq ($(KBUILD_EXTMOD),) zfs_include = @abs_top_srcdir@/include +icp_include = @abs_srcdir@/icp/include +zstd_include = @abs_srcdir@/zstd/include ZFS_MODULE_CFLAGS += -include @abs_top_builddir@/zfs_config.h ZFS_MODULE_CFLAGS += -I@abs_top_builddir@/include +src = @abs_srcdir@ +obj = @abs_builddir@ else zfs_include = $(srctree)/include/zfs +icp_include = $(srctree)/$(src)/icp/include +zstd_include = $(srctree)/$(src)/zstd/include ZFS_MODULE_CFLAGS += -include $(zfs_include)/zfs_config.h endif @@ -41,7 +33,406 @@ ifneq ($(KBUILD_EXTMOD),) @CONFIG_QAT_TRUE@KBUILD_EXTRA_SYMBOLS += @QAT_SYMBOLS@ endif -subdir-asflags-y := $(ZFS_MODULE_CFLAGS) $(ZFS_MODULE_CPPFLAGS) -subdir-ccflags-y := $(ZFS_MODULE_CFLAGS) $(ZFS_MODULE_CPPFLAGS) +asflags-y := $(ZFS_MODULE_CFLAGS) $(ZFS_MODULE_CPPFLAGS) +ccflags-y := $(ZFS_MODULE_CFLAGS) 
$(ZFS_MODULE_CPPFLAGS) + +# Suppress unused-value warnings in sparc64 architecture headers +ccflags-$(CONFIG_SPARC64) += -Wno-unused-value + + +obj-$(CONFIG_ZFS) := spl.o zfs.o + +SPL_OBJS := \ + spl-atomic.o \ + spl-condvar.o \ + spl-cred.o \ + spl-err.o \ + spl-generic.o \ + spl-kmem-cache.o \ + spl-kmem.o \ + spl-kstat.o \ + spl-proc.o \ + spl-procfs-list.o \ + spl-taskq.o \ + spl-thread.o \ + spl-trace.o \ + spl-tsd.o \ + spl-vmem.o \ + spl-xdr.o \ + spl-zlib.o + +spl-objs += $(addprefix os/linux/spl/,$(SPL_OBJS)) + +zfs-objs += avl/avl.o + +ICP_OBJS := \ + algs/aes/aes_impl.o \ + algs/aes/aes_impl_generic.o \ + algs/aes/aes_modes.o \ + algs/edonr/edonr.o \ + algs/modes/cbc.o \ + algs/modes/ccm.o \ + algs/modes/ctr.o \ + algs/modes/ecb.o \ + algs/modes/gcm.o \ + algs/modes/gcm_generic.o \ + algs/modes/modes.o \ + algs/sha2/sha2.o \ + algs/skein/skein.o \ + algs/skein/skein_block.o \ + algs/skein/skein_iv.o \ + api/kcf_cipher.o \ + api/kcf_ctxops.o \ + api/kcf_mac.o \ + core/kcf_callprov.o \ + core/kcf_mech_tabs.o \ + core/kcf_prov_lib.o \ + core/kcf_prov_tabs.o \ + core/kcf_sched.o \ + illumos-crypto.o \ + io/aes.o \ + io/sha2_mod.o \ + io/skein_mod.o \ + spi/kcf_spi.o + +ICP_OBJS_X86_64 := \ + asm-x86_64/aes/aes_aesni.o \ + asm-x86_64/aes/aes_amd64.o \ + asm-x86_64/aes/aeskey.o \ + asm-x86_64/modes/aesni-gcm-x86_64.o \ + asm-x86_64/modes/gcm_pclmulqdq.o \ + asm-x86_64/modes/ghash-x86_64.o \ + asm-x86_64/sha2/sha256_impl.o \ + asm-x86_64/sha2/sha512_impl.o + +ICP_OBJS_X86 := \ + algs/aes/aes_impl_aesni.o \ + algs/aes/aes_impl_x86-64.o \ + algs/modes/gcm_pclmulqdq.o + +zfs-objs += $(addprefix icp/,$(ICP_OBJS)) +zfs-$(CONFIG_X86) += $(addprefix icp/,$(ICP_OBJS_X86)) +zfs-$(CONFIG_X86_64) += $(addprefix icp/,$(ICP_OBJS_X86_64)) + +$(addprefix $(obj)/icp/,$(ICP_OBJS) $(ICP_OBJS_X86) $(ICP_OBJS_X86_64)) : asflags-y += -I$(icp_include) +$(addprefix $(obj)/icp/,$(ICP_OBJS) $(ICP_OBJS_X86) $(ICP_OBJS_X86_64)) : ccflags-y += -I$(icp_include) + +# Suppress objtool "can't 
find jump dest instruction at" warnings. They +# are caused by the constants which are defined in the text section of the +# assembly file using .byte instructions (e.g. bswap_mask). The objtool +# utility tries to interpret them as opcodes and obviously fails doing so. +OBJECT_FILES_NON_STANDARD_aesni-gcm-x86_64.o := y +OBJECT_FILES_NON_STANDARD_ghash-x86_64.o := y +# Suppress objtool "unsupported stack pointer realignment" warnings. We are +# not using a DRAP register while aligning the stack to a 64 byte boundary. +# See #6950 for the reasoning. +OBJECT_FILES_NON_STANDARD_sha256_impl.o := y +OBJECT_FILES_NON_STANDARD_sha512_impl.o := y + + +LUA_OBJS := \ + lapi.o \ + lauxlib.o \ + lbaselib.o \ + lcode.o \ + lcompat.o \ + lcorolib.o \ + lctype.o \ + ldebug.o \ + ldo.o \ + lfunc.o \ + lgc.o \ + llex.o \ + lmem.o \ + lobject.o \ + lopcodes.o \ + lparser.o \ + lstate.o \ + lstring.o \ + lstrlib.o \ + ltable.o \ + ltablib.o \ + ltm.o \ + lvm.o \ + lzio.o \ + setjmp/setjmp.o + +zfs-objs += $(addprefix lua/,$(LUA_OBJS)) + + +NVPAIR_OBJS := \ + fnvpair.o \ + nvpair.o \ + nvpair_alloc_fixed.o \ + nvpair_alloc_spl.o + +zfs-objs += $(addprefix nvpair/,$(NVPAIR_OBJS)) + + +UNICODE_OBJS := \ + u8_textprep.o \ + uconv.o + +zfs-objs += $(addprefix unicode/,$(UNICODE_OBJS)) + + +ZCOMMON_OBJS := \ + cityhash.o \ + zfeature_common.o \ + zfs_comutil.o \ + zfs_deleg.o \ + zfs_fletcher.o \ + zfs_fletcher_superscalar.o \ + zfs_fletcher_superscalar4.o \ + zfs_namecheck.o \ + zfs_prop.o \ + zpool_prop.o \ + zprop_common.o + +ZCOMMON_OBJS_X86 := \ + zfs_fletcher_avx512.o \ + zfs_fletcher_intel.o \ + zfs_fletcher_sse.o + +ZCOMMON_OBJS_ARM64 := \ + zfs_fletcher_aarch64_neon.o + +zfs-objs += $(addprefix zcommon/,$(ZCOMMON_OBJS)) +zfs-$(CONFIG_X86) += $(addprefix zcommon/,$(ZCOMMON_OBJS_X86)) +zfs-$(CONFIG_ARM64) += $(addprefix zcommon/,$(ZCOMMON_OBJS_ARM64)) + + +# Zstd uses -O3 by default, so we should follow +ZFS_ZSTD_FLAGS := -O3 + +# -fno-tree-vectorize gets set for gcc in 
zstd/common/compiler.h +# Set it for other compilers, too. +ZFS_ZSTD_FLAGS += -fno-tree-vectorize + +# SSE register return with SSE disabled if -march=znverX is passed +ZFS_ZSTD_FLAGS += -U__BMI__ + +# Quiet warnings about frame size due to unused code in unmodified zstd lib +ZFS_ZSTD_FLAGS += -Wframe-larger-than=20480 + +ZSTD_OBJS := \ + zfs_zstd.o \ + zstd_sparc.o + +ZSTD_UPSTREAM_OBJS := \ + lib/common/entropy_common.o \ + lib/common/error_private.o \ + lib/common/fse_decompress.o \ + lib/common/pool.o \ + lib/common/zstd_common.o \ + lib/compress/fse_compress.o \ + lib/compress/hist.o \ + lib/compress/huf_compress.o \ + lib/compress/zstd_compress.o \ + lib/compress/zstd_compress_literals.o \ + lib/compress/zstd_compress_sequences.o \ + lib/compress/zstd_compress_superblock.o \ + lib/compress/zstd_double_fast.o \ + lib/compress/zstd_fast.o \ + lib/compress/zstd_lazy.o \ + lib/compress/zstd_ldm.o \ + lib/compress/zstd_opt.o \ + lib/decompress/huf_decompress.o \ + lib/decompress/zstd_ddict.o \ + lib/decompress/zstd_decompress.o \ + lib/decompress/zstd_decompress_block.o + +zfs-objs += $(addprefix zstd/,$(ZSTD_OBJS) $(ZSTD_UPSTREAM_OBJS)) + +# Disable aarch64 neon SIMD instructions for kernel mode +$(addprefix $(obj)/zstd/,$(ZSTD_OBJS) $(ZSTD_UPSTREAM_OBJS)) : ccflags-y += -I$(zstd_include) $(ZFS_ZSTD_FLAGS) +$(addprefix $(obj)/zstd/,$(ZSTD_OBJS) $(ZSTD_UPSTREAM_OBJS)) : asflags-y += -I$(zstd_include) +$(addprefix $(obj)/zstd/,$(ZSTD_UPSTREAM_OBJS)) : ccflags-y += -include $(zstd_include)/aarch64_compat.h -include $(zstd_include)/zstd_compat_wrapper.h -Wp,-w +$(obj)/zstd/zfs_zstd.o : ccflags-y += -include $(zstd_include)/zstd_compat_wrapper.h + + +ZFS_OBJS := \ + abd.o \ + aggsum.o \ + arc.o \ + blkptr.o \ + bplist.o \ + bpobj.o \ + bptree.o \ + bqueue.o \ + btree.o \ + dataset_kstats.o \ + dbuf.o \ + dbuf_stats.o \ + ddt.o \ + ddt_zap.o \ + dmu.o \ + dmu_diff.o \ + dmu_object.o \ + dmu_objset.o \ + dmu_recv.o \ + dmu_redact.o \ + dmu_send.o \ + dmu_traverse.o \ + 
dmu_tx.o \ + dmu_zfetch.o \ + dnode.o \ + dnode_sync.o \ + dsl_bookmark.o \ + dsl_crypt.o \ + dsl_dataset.o \ + dsl_deadlist.o \ + dsl_deleg.o \ + dsl_destroy.o \ + dsl_dir.o \ + dsl_pool.o \ + dsl_prop.o \ + dsl_scan.o \ + dsl_synctask.o \ + dsl_userhold.o \ + edonr_zfs.o \ + fm.o \ + gzip.o \ + hkdf.o \ + lz4.o \ + lz4_zfs.o \ + lzjb.o \ + metaslab.o \ + mmp.o \ + multilist.o \ + objlist.o \ + pathname.o \ + range_tree.o \ + refcount.o \ + rrwlock.o \ + sa.o \ + sha256.o \ + skein_zfs.o \ + spa.o \ + spa_boot.o \ + spa_checkpoint.o \ + spa_config.o \ + spa_errlog.o \ + spa_history.o \ + spa_log_spacemap.o \ + spa_misc.o \ + spa_stats.o \ + space_map.o \ + space_reftree.o \ + txg.o \ + uberblock.o \ + unique.o \ + vdev.o \ + vdev_cache.o \ + vdev_draid.o \ + vdev_draid_rand.o \ + vdev_indirect.o \ + vdev_indirect_births.o \ + vdev_indirect_mapping.o \ + vdev_initialize.o \ + vdev_label.o \ + vdev_mirror.o \ + vdev_missing.o \ + vdev_queue.o \ + vdev_raidz.o \ + vdev_raidz_math.o \ + vdev_raidz_math_scalar.o \ + vdev_rebuild.o \ + vdev_removal.o \ + vdev_root.o \ + vdev_trim.o \ + zap.o \ + zap_leaf.o \ + zap_micro.o \ + zcp.o \ + zcp_get.o \ + zcp_global.o \ + zcp_iter.o \ + zcp_set.o \ + zcp_synctask.o \ + zfeature.o \ + zfs_byteswap.o \ + zfs_fm.o \ + zfs_fuid.o \ + zfs_ioctl.o \ + zfs_log.o \ + zfs_onexit.o \ + zfs_quota.o \ + zfs_ratelimit.o \ + zfs_replay.o \ + zfs_rlock.o \ + zfs_sa.o \ + zfs_vnops.o \ + zil.o \ + zio.o \ + zio_checksum.o \ + zio_compress.o \ + zio_inject.o \ + zle.o \ + zrlock.o \ + zthr.o \ + zvol.o + +ZFS_OBJS_OS := \ + abd_os.o \ + arc_os.o \ + mmp_os.o \ + policy.o \ + qat.o \ + qat_compress.o \ + qat_crypt.o \ + spa_misc_os.o \ + trace.o \ + vdev_disk.o \ + vdev_file.o \ + vdev_object_store.o \ + sock.o \ + zfs_acl.o \ + zfs_ctldir.o \ + zfs_debug.o \ + zfs_dir.o \ + zfs_file_os.o \ + zfs_ioctl_os.o \ + zfs_racct.o \ + zfs_sysfs.o \ + zfs_uio.o \ + zfs_vfsops.o \ + zfs_vnops_os.o \ + zfs_znode.o \ + zio_crypt.o \ + zpl_ctldir.o \ + 
zpl_export.o \ + zpl_file.o \ + zpl_inode.o \ + zpl_super.o \ + zpl_xattr.o \ + zvol_os.o + +ZFS_OBJS_X86 := \ + vdev_raidz_math_avx2.o \ + vdev_raidz_math_avx512bw.o \ + vdev_raidz_math_avx512f.o \ + vdev_raidz_math_sse2.o \ + vdev_raidz_math_ssse3.o + +ZFS_OBJS_ARM64 := \ + vdev_raidz_math_aarch64_neon.o \ + vdev_raidz_math_aarch64_neonx2.o + +ZFS_OBJS_PPC_PPC64 := \ + vdev_raidz_math_powerpc_altivec.o + +zfs-objs += $(addprefix zfs/,$(ZFS_OBJS)) $(addprefix os/linux/zfs/,$(ZFS_OBJS_OS)) +zfs-$(CONFIG_X86) += $(addprefix zfs/,$(ZFS_OBJS_X86)) +zfs-$(CONFIG_ARM64) += $(addprefix zfs/,$(ZFS_OBJS_ARM64)) +zfs-$(CONFIG_PPC) += $(addprefix zfs/,$(ZFS_OBJS_PPC_PPC64)) +zfs-$(CONFIG_PPC64) += $(addprefix zfs/,$(ZFS_OBJS_PPC_PPC64)) + +# Suppress incorrect warnings from versions of objtool which are not +# aware of x86 EVEX prefix instructions used for AVX512. +OBJECT_FILES_NON_STANDARD_vdev_raidz_math_avx512bw.o := y +OBJECT_FILES_NON_STANDARD_vdev_raidz_math_avx512f.o := y +ifeq ($(CONFIG_ALTIVEC),y) +$(obj)/zfs/vdev_raidz_math_powerpc_altivec.o : c_flags += -maltivec endif diff --git a/module/Makefile.in b/module/Makefile.in index e23fa9f92892..a75602f1641d 100644 --- a/module/Makefile.in +++ b/module/Makefile.in @@ -5,8 +5,6 @@ LINUX_MOD_DIR ?= /lib/modules/@LINUX_VERSION@ LINUX_DEBUG_MOD_DIR ?= /usr/lib/debug/$(LINUX_MOD_DIR) INSTALL_MOD_PATH ?= $(DESTDIR) -SUBDIR_TARGETS = icp lua zstd - all: modules distclean maintainer-clean: clean install: modules_install @@ -53,7 +51,8 @@ endif FMAKE = env -u MAKEFLAGS make $(FMAKEFLAGS) modules-Linux: - list='$(SUBDIR_TARGETS)'; for td in $$list; do $(MAKE) -C $$td; done + mkdir -p $(sort $(dir $(spl-objs) $(spl-))) + mkdir -p $(sort $(dir $(zfs-objs) $(zfs-))) $(MAKE) -C @LINUX_OBJ@ $(if @KERNEL_CC@,CC=@KERNEL_CC@) \ $(if @KERNEL_LD@,LD=@KERNEL_LD@) $(if @KERNEL_LLVM@,LLVM=@KERNEL_LLVM@) \ M="$$PWD" @KERNEL_MAKE@ CONFIG_ZFS=m modules @@ -79,16 +78,20 @@ clean-FreeBSD: clean: clean-@ac_system@ -modules_install-Linux: +.PHONY: 
modules_uninstall-Linux-legacy +modules_uninstall-Linux-legacy: + $(RM) -r $(addprefix $(KMODDIR)/$(INSTALL_MOD_DIR)/,spl/ avl/ icp/ lua/ nvpair/ unicode/ zcommon/ zfs/ zstd/) + +KMODDIR := $(INSTALL_MOD_PATH)/lib/modules/@LINUX_VERSION@ +modules_install-Linux: modules_uninstall-Linux-legacy @# Install the kernel modules $(MAKE) -C @LINUX_OBJ@ M="$$PWD" modules_install \ INSTALL_MOD_PATH=$(INSTALL_MOD_PATH) \ INSTALL_MOD_DIR=$(INSTALL_MOD_DIR) \ KERNELRELEASE=@LINUX_VERSION@ @# Remove extraneous build products when packaging - kmoddir=$(INSTALL_MOD_PATH)/lib/modules/@LINUX_VERSION@; \ if [ -n "$(DESTDIR)" ]; then \ - find $$kmoddir -name 'modules.*' -delete; \ + find $(KMODDIR) -name 'modules.*' -delete; \ fi @# Delphix-specific: split debug info kmoddir=$(INSTALL_MOD_PATH)$(LINUX_MOD_DIR); \ @@ -118,8 +121,9 @@ modules_install-FreeBSD: modules_install: modules_install-@ac_system@ -modules_uninstall-Linux: +modules_uninstall-Linux: modules_uninstall-Linux-legacy @# Uninstall the kernel modules + @# XXX is the Delphix version needed anymore? kmoddir=$(INSTALL_MOD_PATH)/lib/modules/@LINUX_VERSION@; \ debugdir=$(INSTALL_MOD_PATH)$(LINUX_DEBUG_MOD_DIR); \ for objdir in $(ZFS_MODULES); do \ @@ -127,6 +131,8 @@ modules_uninstall-Linux: $(RM) -fR $$debugdir/$(INSTALL_MOD_DIR)/$$objdir; \ done + $(RM) $(addprefix $(KMODDIR)/$(INSTALL_MOD_DIR)/,zfs.ko spl.ko) + modules_uninstall-FreeBSD: @false @@ -147,7 +153,7 @@ cppcheck-Linux: -I @top_srcdir@/include/os/linux/spl \ -I @top_srcdir@/include/os/linux/zfs \ -I @top_srcdir@/include \ - avl icp lua nvpair spl unicode zcommon zfs zstd os/linux + avl icp lua nvpair unicode zcommon zfs zstd os/linux cppcheck-FreeBSD: @true @@ -155,9 +161,11 @@ cppcheck-FreeBSD: cppcheck: cppcheck-@ac_system@ distdir: - (cd @srcdir@ && find $(ZFS_MODULES) os -name '*.[chS]') | \ - while read path; do \ - mkdir -p $$distdir/$${path%/*}; \ - cp @srcdir@/$$path $$distdir/$$path; \ - done; \ + cd @srcdir@ && find . 
-name '*.[chS]' -exec sh -c 'for f; do mkdir -p $$distdir/$${f%/*}; cp @srcdir@/$$f $$distdir/$$f; done' _ {} + cp @srcdir@/Makefile.bsd $$distdir/Makefile.bsd + +gen-zstd-symbols: + for obj in $(addprefix zstd/,$(ZSTD_UPSTREAM_OBJS)); do echo; echo "/* $${obj#zstd/}: */"; @OBJDUMP@ -t $$obj | awk '$$2 == "g" && !/ zfs_/ {print "#define\t" $$6 " zfs_" $$6}' | sort; done >> zstd/include/zstd_compat_wrapper.h + +check-zstd-symbols: + @OBJDUMP@ -t $(addprefix zstd/,$(ZSTD_UPSTREAM_OBJS)) | awk '/file format/ {print} $$2 == "g" && !/ zfs_/ {++ret; print} END {exit ret}' diff --git a/module/avl/Makefile.in b/module/avl/Makefile.in deleted file mode 100644 index 991d5f95b8c0..000000000000 --- a/module/avl/Makefile.in +++ /dev/null @@ -1,10 +0,0 @@ -ifneq ($(KBUILD_EXTMOD),) -src = @abs_srcdir@ -obj = @abs_builddir@ -endif - -MODULE := zavl - -obj-$(CONFIG_ZFS) := $(MODULE).o - -$(MODULE)-objs += avl.o diff --git a/module/avl/avl.c b/module/avl/avl.c index 3891a2d62880..69cb8bf6815b 100644 --- a/module/avl/avl.c +++ b/module/avl/avl.c @@ -1044,28 +1044,6 @@ avl_destroy_nodes(avl_tree_t *tree, void **cookie) return (AVL_NODE2DATA(node, off)); } -#if defined(_KERNEL) - -static int __init -avl_init(void) -{ - return (0); -} - -static void __exit -avl_fini(void) -{ -} - -module_init(avl_init); -module_exit(avl_fini); -#endif - -ZFS_MODULE_DESCRIPTION("Generic AVL tree implementation"); -ZFS_MODULE_AUTHOR(ZFS_META_AUTHOR); -ZFS_MODULE_LICENSE(ZFS_META_LICENSE); -ZFS_MODULE_VERSION(ZFS_META_VERSION "-" ZFS_META_RELEASE); - EXPORT_SYMBOL(avl_create); EXPORT_SYMBOL(avl_find); EXPORT_SYMBOL(avl_insert); diff --git a/module/icp/Makefile.in b/module/icp/Makefile.in deleted file mode 100644 index 72c9ab12adb7..000000000000 --- a/module/icp/Makefile.in +++ /dev/null @@ -1,90 +0,0 @@ -ifneq ($(KBUILD_EXTMOD),) -src = @abs_srcdir@ -obj = @abs_builddir@ -icp_include = $(src)/include -else -icp_include = $(srctree)/$(src)/include -endif - -MODULE := icp - -obj-$(CONFIG_ZFS) := $(MODULE).o 
- -asflags-y := -I$(icp_include) -ccflags-y := -I$(icp_include) - -$(MODULE)-objs += illumos-crypto.o -$(MODULE)-objs += api/kcf_cipher.o -$(MODULE)-objs += api/kcf_mac.o -$(MODULE)-objs += api/kcf_ctxops.o -$(MODULE)-objs += core/kcf_callprov.o -$(MODULE)-objs += core/kcf_prov_tabs.o -$(MODULE)-objs += core/kcf_sched.o -$(MODULE)-objs += core/kcf_mech_tabs.o -$(MODULE)-objs += core/kcf_prov_lib.o -$(MODULE)-objs += spi/kcf_spi.o -$(MODULE)-objs += io/aes.o -$(MODULE)-objs += io/sha2_mod.o -$(MODULE)-objs += io/skein_mod.o -$(MODULE)-objs += algs/modes/cbc.o -$(MODULE)-objs += algs/modes/ccm.o -$(MODULE)-objs += algs/modes/ctr.o -$(MODULE)-objs += algs/modes/ecb.o -$(MODULE)-objs += algs/modes/gcm_generic.o -$(MODULE)-objs += algs/modes/gcm.o -$(MODULE)-objs += algs/modes/modes.o -$(MODULE)-objs += algs/aes/aes_impl_generic.o -$(MODULE)-objs += algs/aes/aes_impl.o -$(MODULE)-objs += algs/aes/aes_modes.o -$(MODULE)-objs += algs/edonr/edonr.o -$(MODULE)-objs += algs/sha2/sha2.o -$(MODULE)-objs += algs/skein/skein.o -$(MODULE)-objs += algs/skein/skein_block.o -$(MODULE)-objs += algs/skein/skein_iv.o - -$(MODULE)-$(CONFIG_X86_64) += asm-x86_64/aes/aeskey.o -$(MODULE)-$(CONFIG_X86_64) += asm-x86_64/aes/aes_amd64.o -$(MODULE)-$(CONFIG_X86_64) += asm-x86_64/aes/aes_aesni.o -$(MODULE)-$(CONFIG_X86_64) += asm-x86_64/modes/gcm_pclmulqdq.o -$(MODULE)-$(CONFIG_X86_64) += asm-x86_64/modes/aesni-gcm-x86_64.o -$(MODULE)-$(CONFIG_X86_64) += asm-x86_64/modes/ghash-x86_64.o -$(MODULE)-$(CONFIG_X86_64) += asm-x86_64/sha2/sha256_impl.o -$(MODULE)-$(CONFIG_X86_64) += asm-x86_64/sha2/sha512_impl.o - -$(MODULE)-$(CONFIG_X86) += algs/modes/gcm_pclmulqdq.o -$(MODULE)-$(CONFIG_X86) += algs/aes/aes_impl_aesni.o -$(MODULE)-$(CONFIG_X86) += algs/aes/aes_impl_x86-64.o - -# Suppress objtool "can't find jump dest instruction at" warnings. They -# are caused by the constants which are defined in the text section of the -# assembly file using .byte instructions (e.g. bswap_mask). 
The objtool -# utility tries to interpret them as opcodes and obviously fails doing so. -OBJECT_FILES_NON_STANDARD_aesni-gcm-x86_64.o := y -OBJECT_FILES_NON_STANDARD_ghash-x86_64.o := y -# Suppress objtool "unsupported stack pointer realignment" warnings. We are -# not using a DRAP register while aligning the stack to a 64 byte boundary. -# See #6950 for the reasoning. -OBJECT_FILES_NON_STANDARD_sha256_impl.o := y -OBJECT_FILES_NON_STANDARD_sha512_impl.o := y - -ICP_DIRS = \ - api \ - core \ - spi \ - io \ - os \ - algs \ - algs/aes \ - algs/edonr \ - algs/modes \ - algs/sha2 \ - algs/skein \ - asm-x86_64 \ - asm-x86_64/aes \ - asm-x86_64/modes \ - asm-x86_64/sha2 \ - asm-i386 \ - asm-generic - -all: - mkdir -p $(ICP_DIRS) diff --git a/module/icp/illumos-crypto.c b/module/icp/illumos-crypto.c index f68f6bc765a2..d17b90e7200a 100644 --- a/module/icp/illumos-crypto.c +++ b/module/icp/illumos-crypto.c @@ -104,7 +104,7 @@ * ZFS Makefiles. */ -void __exit +void icp_fini(void) { skein_mod_fini(); @@ -139,10 +139,7 @@ icp_init(void) return (0); } -#if defined(_KERNEL) +#if defined(_KERNEL) && defined(__FreeBSD__) module_exit(icp_fini); module_init(icp_init); -MODULE_AUTHOR(ZFS_META_AUTHOR); -MODULE_LICENSE(ZFS_META_LICENSE); -MODULE_VERSION(ZFS_META_VERSION "-" ZFS_META_RELEASE); #endif diff --git a/module/lua/Makefile.in b/module/lua/Makefile.in deleted file mode 100644 index 0a74c17e64e8..000000000000 --- a/module/lua/Makefile.in +++ /dev/null @@ -1,39 +0,0 @@ -ifneq ($(KBUILD_EXTMOD),) -src = @abs_srcdir@ -obj = @abs_builddir@ -endif - -MODULE := zlua - -obj-$(CONFIG_ZFS) := $(MODULE).o - -ccflags-y := -DLUA_USE_LONGLONG - -$(MODULE)-objs += lapi.o -$(MODULE)-objs += lauxlib.o -$(MODULE)-objs += lbaselib.o -$(MODULE)-objs += lcode.o -$(MODULE)-objs += lcompat.o -$(MODULE)-objs += lcorolib.o -$(MODULE)-objs += lctype.o -$(MODULE)-objs += ldebug.o -$(MODULE)-objs += ldo.o -$(MODULE)-objs += lfunc.o -$(MODULE)-objs += lgc.o -$(MODULE)-objs += llex.o -$(MODULE)-objs += 
lmem.o -$(MODULE)-objs += lobject.o -$(MODULE)-objs += lopcodes.o -$(MODULE)-objs += lparser.o -$(MODULE)-objs += lstate.o -$(MODULE)-objs += lstring.o -$(MODULE)-objs += lstrlib.o -$(MODULE)-objs += ltable.o -$(MODULE)-objs += ltablib.o -$(MODULE)-objs += ltm.o -$(MODULE)-objs += lvm.o -$(MODULE)-objs += lzio.o -$(MODULE)-objs += setjmp/setjmp.o - -all: - mkdir -p setjmp diff --git a/module/lua/lapi.c b/module/lua/lapi.c index 72b0037aa9a9..726e5c2ad4bb 100644 --- a/module/lua/lapi.c +++ b/module/lua/lapi.c @@ -1278,29 +1278,6 @@ LUA_API void lua_upvaluejoin (lua_State *L, int fidx1, int n1, luaC_objbarrier(L, f1, *up2); } -#if defined(_KERNEL) - -static int __init -lua_init(void) -{ - return (0); -} - -static void __exit -lua_fini(void) -{ -} - -module_init(lua_init); -module_exit(lua_fini); - -#endif - -ZFS_MODULE_DESCRIPTION("Lua Interpreter for ZFS"); -ZFS_MODULE_AUTHOR("Lua.org"); -ZFS_MODULE_LICENSE("Dual MIT/GPL"); -ZFS_MODULE_VERSION(ZFS_META_VERSION "-" ZFS_META_RELEASE); - EXPORT_SYMBOL(lua_absindex); EXPORT_SYMBOL(lua_atpanic); EXPORT_SYMBOL(lua_checkstack); diff --git a/module/nvpair/Makefile.in b/module/nvpair/Makefile.in deleted file mode 100644 index d8145236674b..000000000000 --- a/module/nvpair/Makefile.in +++ /dev/null @@ -1,13 +0,0 @@ -ifneq ($(KBUILD_EXTMOD),) -src = @abs_srcdir@ -obj = @abs_builddir@ -endif - -MODULE := znvpair - -obj-$(CONFIG_ZFS) := $(MODULE).o - -$(MODULE)-objs += nvpair.o -$(MODULE)-objs += fnvpair.o -$(MODULE)-objs += nvpair_alloc_spl.o -$(MODULE)-objs += nvpair_alloc_fixed.o diff --git a/module/nvpair/nvpair.c b/module/nvpair/nvpair.c index a5222dac7849..a442990dade0 100644 --- a/module/nvpair/nvpair.c +++ b/module/nvpair/nvpair.c @@ -3678,27 +3678,6 @@ nvs_xdr(nvstream_t *nvs, nvlist_t *nvl, char *buf, size_t *buflen) return (err); } -#if defined(_KERNEL) -static int __init -nvpair_init(void) -{ - return (0); -} - -static void __exit -nvpair_fini(void) -{ -} - -module_init(nvpair_init); -module_exit(nvpair_fini); 
-#endif - -ZFS_MODULE_DESCRIPTION("Generic name/value pair implementation"); -ZFS_MODULE_AUTHOR(ZFS_META_AUTHOR); -ZFS_MODULE_LICENSE(ZFS_META_LICENSE); -ZFS_MODULE_VERSION(ZFS_META_VERSION "-" ZFS_META_RELEASE); - EXPORT_SYMBOL(nv_alloc_init); EXPORT_SYMBOL(nv_alloc_reset); EXPORT_SYMBOL(nv_alloc_fini); diff --git a/module/os/freebsd/zfs/vdev_geom.c b/module/os/freebsd/zfs/vdev_geom.c index 914e0e6ded66..1ac41f616a0d 100644 --- a/module/os/freebsd/zfs/vdev_geom.c +++ b/module/os/freebsd/zfs/vdev_geom.c @@ -1131,8 +1131,12 @@ vdev_geom_fill_unmap_cb(void *buf, size_t len, void *priv) vm_offset_t addr = (vm_offset_t)buf; vm_offset_t end = addr + len; - if (bp->bio_ma_n == 0) + if (bp->bio_ma_n == 0) { bp->bio_ma_offset = addr & PAGE_MASK; + addr &= ~PAGE_MASK; + } else { + ASSERT0(P2PHASE(addr, PAGE_SIZE)); + } do { bp->bio_ma[bp->bio_ma_n++] = PHYS_TO_VM_PAGE(pmap_kextract(addr)); diff --git a/module/os/linux/spl/Makefile.in b/module/os/linux/spl/Makefile.in deleted file mode 100644 index b2325f91b4a7..000000000000 --- a/module/os/linux/spl/Makefile.in +++ /dev/null @@ -1,17 +0,0 @@ -$(MODULE)-objs += ../os/linux/spl/spl-atomic.o -$(MODULE)-objs += ../os/linux/spl/spl-condvar.o -$(MODULE)-objs += ../os/linux/spl/spl-cred.o -$(MODULE)-objs += ../os/linux/spl/spl-err.o -$(MODULE)-objs += ../os/linux/spl/spl-generic.o -$(MODULE)-objs += ../os/linux/spl/spl-kmem.o -$(MODULE)-objs += ../os/linux/spl/spl-kmem-cache.o -$(MODULE)-objs += ../os/linux/spl/spl-kstat.o -$(MODULE)-objs += ../os/linux/spl/spl-proc.o -$(MODULE)-objs += ../os/linux/spl/spl-procfs-list.o -$(MODULE)-objs += ../os/linux/spl/spl-taskq.o -$(MODULE)-objs += ../os/linux/spl/spl-thread.o -$(MODULE)-objs += ../os/linux/spl/spl-trace.o -$(MODULE)-objs += ../os/linux/spl/spl-tsd.o -$(MODULE)-objs += ../os/linux/spl/spl-vmem.o -$(MODULE)-objs += ../os/linux/spl/spl-xdr.o -$(MODULE)-objs += ../os/linux/spl/spl-zlib.o diff --git a/module/os/linux/spl/spl-generic.c b/module/os/linux/spl/spl-generic.c index 
cc9a973fef62..143f34598588 100644 --- a/module/os/linux/spl/spl-generic.c +++ b/module/os/linux/spl/spl-generic.c @@ -828,7 +828,7 @@ spl_fini(void) module_init(spl_init); module_exit(spl_fini); -ZFS_MODULE_DESCRIPTION("Solaris Porting Layer"); -ZFS_MODULE_AUTHOR(ZFS_META_AUTHOR); -ZFS_MODULE_LICENSE("GPL"); -ZFS_MODULE_VERSION(ZFS_META_VERSION "-" ZFS_META_RELEASE); +MODULE_DESCRIPTION("Solaris Porting Layer"); +MODULE_AUTHOR(ZFS_META_AUTHOR); +MODULE_LICENSE("GPL"); +MODULE_VERSION(ZFS_META_VERSION "-" ZFS_META_RELEASE); diff --git a/module/os/linux/zfs/Makefile.in b/module/os/linux/zfs/Makefile.in deleted file mode 100644 index 0cf7d0422bf5..000000000000 --- a/module/os/linux/zfs/Makefile.in +++ /dev/null @@ -1,40 +0,0 @@ -# -# Linux specific sources included from module/zfs/Makefile.in -# - -# Suppress unused-value warnings in sparc64 architecture headers -ccflags-$(CONFIG_SPARC64) += -Wno-unused-value - -$(MODULE)-objs += ../os/linux/zfs/abd_os.o -$(MODULE)-objs += ../os/linux/zfs/arc_os.o -$(MODULE)-objs += ../os/linux/zfs/mmp_os.o -$(MODULE)-objs += ../os/linux/zfs/policy.o -$(MODULE)-objs += ../os/linux/zfs/trace.o -$(MODULE)-objs += ../os/linux/zfs/qat.o -$(MODULE)-objs += ../os/linux/zfs/qat_compress.o -$(MODULE)-objs += ../os/linux/zfs/qat_crypt.o -$(MODULE)-objs += ../os/linux/zfs/spa_misc_os.o -$(MODULE)-objs += ../os/linux/zfs/vdev_disk.o -$(MODULE)-objs += ../os/linux/zfs/vdev_file.o -$(MODULE)-objs += ../os/linux/zfs/vdev_object_store.o -$(MODULE)-objs += ../os/linux/zfs/sock.o -$(MODULE)-objs += ../os/linux/zfs/zfs_acl.o -$(MODULE)-objs += ../os/linux/zfs/zfs_ctldir.o -$(MODULE)-objs += ../os/linux/zfs/zfs_debug.o -$(MODULE)-objs += ../os/linux/zfs/zfs_dir.o -$(MODULE)-objs += ../os/linux/zfs/zfs_file_os.o -$(MODULE)-objs += ../os/linux/zfs/zfs_ioctl_os.o -$(MODULE)-objs += ../os/linux/zfs/zfs_racct.o -$(MODULE)-objs += ../os/linux/zfs/zfs_sysfs.o -$(MODULE)-objs += ../os/linux/zfs/zfs_uio.o -$(MODULE)-objs += ../os/linux/zfs/zfs_vfsops.o 
-$(MODULE)-objs += ../os/linux/zfs/zfs_vnops_os.o -$(MODULE)-objs += ../os/linux/zfs/zfs_znode.o -$(MODULE)-objs += ../os/linux/zfs/zio_crypt.o -$(MODULE)-objs += ../os/linux/zfs/zpl_ctldir.o -$(MODULE)-objs += ../os/linux/zfs/zpl_export.o -$(MODULE)-objs += ../os/linux/zfs/zpl_file.o -$(MODULE)-objs += ../os/linux/zfs/zpl_inode.o -$(MODULE)-objs += ../os/linux/zfs/zpl_super.o -$(MODULE)-objs += ../os/linux/zfs/zpl_xattr.o -$(MODULE)-objs += ../os/linux/zfs/zvol_os.o diff --git a/module/os/linux/zfs/zfs_ioctl_os.c b/module/os/linux/zfs/zfs_ioctl_os.c index fee3fe540b90..c65702e1a053 100644 --- a/module/os/linux/zfs/zfs_ioctl_os.c +++ b/module/os/linux/zfs/zfs_ioctl_os.c @@ -58,6 +58,8 @@ #include #include #include +#include +#include #include @@ -233,8 +235,8 @@ zfsdev_detach(void) #define ZFS_DEBUG_STR "" #endif -static int __init -openzfs_init(void) +static int +openzfs_init_os(void) { int error; @@ -259,8 +261,8 @@ openzfs_init(void) return (0); } -static void __exit -openzfs_fini(void) +static void +openzfs_fini_os(void) { zfs_sysfs_fini(); zfs_kmod_fini(); @@ -269,12 +271,59 @@ openzfs_fini(void) ZFS_META_VERSION, ZFS_META_RELEASE, ZFS_DEBUG_STR); } + +extern int __init zcommon_init(void); +extern void zcommon_fini(void); + +static int __init +openzfs_init(void) +{ + int err; + if ((err = zcommon_init()) != 0) + goto zcommon_failed; + if ((err = icp_init()) != 0) + goto icp_failed; + if ((err = zstd_init()) != 0) + goto zstd_failed; + if ((err = openzfs_init_os()) != 0) + goto openzfs_os_failed; + return (0); + +openzfs_os_failed: + zstd_fini(); +zstd_failed: + icp_fini(); +icp_failed: + zcommon_fini(); +zcommon_failed: + return (err); +} + +static void __exit +openzfs_fini(void) +{ + openzfs_fini_os(); + zstd_fini(); + icp_fini(); + zcommon_fini(); +} + #if defined(_KERNEL) module_init(openzfs_init); module_exit(openzfs_fini); #endif -ZFS_MODULE_DESCRIPTION("ZFS"); -ZFS_MODULE_AUTHOR(ZFS_META_AUTHOR); -ZFS_MODULE_LICENSE(ZFS_META_LICENSE); 
-ZFS_MODULE_VERSION(ZFS_META_VERSION "-" ZFS_META_RELEASE); +MODULE_ALIAS("zavl"); +MODULE_ALIAS("icp"); +MODULE_ALIAS("zlua"); +MODULE_ALIAS("znvpair"); +MODULE_ALIAS("zunicode"); +MODULE_ALIAS("zcommon"); +MODULE_ALIAS("zzstd"); +MODULE_DESCRIPTION("ZFS"); +MODULE_AUTHOR(ZFS_META_AUTHOR); +MODULE_LICENSE("Lua: MIT"); +MODULE_LICENSE("zstd: Dual BSD/GPL"); +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_LICENSE(ZFS_META_LICENSE); +MODULE_VERSION(ZFS_META_VERSION "-" ZFS_META_RELEASE); diff --git a/module/os/linux/zfs/zfs_sysfs.c b/module/os/linux/zfs/zfs_sysfs.c index 608a8a678f30..b1d6670e504e 100644 --- a/module/os/linux/zfs/zfs_sysfs.c +++ b/module/os/linux/zfs/zfs_sysfs.c @@ -65,16 +65,15 @@ /* * A zfs_mod_kobj_t represents a zfs kobject under '/sys/module/zfs' */ -struct zfs_mod_kobj; typedef struct zfs_mod_kobj zfs_mod_kobj_t; - struct zfs_mod_kobj { struct kobject zko_kobj; struct kobj_type zko_kobj_type; struct sysfs_ops zko_sysfs_ops; size_t zko_attr_count; struct attribute *zko_attr_list; /* allocated */ - struct attribute **zko_default_attrs; /* allocated */ + struct attribute_group zko_default_group; /* .attrs allocated */ + const struct attribute_group *zko_default_groups[2]; size_t zko_child_count; zfs_mod_kobj_t *zko_children; /* allocated */ }; @@ -127,10 +126,10 @@ zfs_kobj_release(struct kobject *kobj) zkobj->zko_attr_list = NULL; } - if (zkobj->zko_default_attrs != NULL) { - kmem_free(zkobj->zko_default_attrs, + if (zkobj->zko_default_group.attrs != NULL) { + kmem_free(zkobj->zko_default_group.attrs, DEFAULT_ATTR_SIZE(zkobj->zko_attr_count)); - zkobj->zko_default_attrs = NULL; + zkobj->zko_default_group.attrs = NULL; } if (zkobj->zko_child_count != 0) { @@ -154,11 +153,12 @@ zfs_kobj_add_attr(zfs_mod_kobj_t *zkobj, int attr_num, const char *attr_name) { VERIFY3U(attr_num, <, zkobj->zko_attr_count); ASSERT(zkobj->zko_attr_list); - ASSERT(zkobj->zko_default_attrs); + ASSERT(zkobj->zko_default_group.attrs); zkobj->zko_attr_list[attr_num].name = attr_name; 
zkobj->zko_attr_list[attr_num].mode = 0444; - zkobj->zko_default_attrs[attr_num] = &zkobj->zko_attr_list[attr_num]; + zkobj->zko_default_group.attrs[attr_num] = + &zkobj->zko_attr_list[attr_num]; sysfs_attr_init(&zkobj->zko_attr_list[attr_num]); } @@ -176,9 +176,9 @@ zfs_kobj_init(zfs_mod_kobj_t *zkobj, int attr_cnt, int child_cnt, return (ENOMEM); } /* this will always have at least one slot for NULL termination */ - zkobj->zko_default_attrs = kmem_zalloc(DEFAULT_ATTR_SIZE(attr_cnt), - KM_SLEEP); - if (zkobj->zko_default_attrs == NULL) { + zkobj->zko_default_group.attrs = + kmem_zalloc(DEFAULT_ATTR_SIZE(attr_cnt), KM_SLEEP); + if (zkobj->zko_default_group.attrs == NULL) { if (zkobj->zko_attr_list != NULL) { kmem_free(zkobj->zko_attr_list, ATTR_TABLE_SIZE(attr_cnt)); @@ -186,14 +186,19 @@ zfs_kobj_init(zfs_mod_kobj_t *zkobj, int attr_cnt, int child_cnt, return (ENOMEM); } zkobj->zko_attr_count = attr_cnt; - zkobj->zko_kobj_type.default_attrs = zkobj->zko_default_attrs; + zkobj->zko_default_groups[0] = &zkobj->zko_default_group; +#ifdef HAVE_SYSFS_DEFAULT_GROUPS + zkobj->zko_kobj_type.default_groups = zkobj->zko_default_groups; +#else + zkobj->zko_kobj_type.default_attrs = zkobj->zko_default_group.attrs; +#endif if (child_cnt > 0) { zkobj->zko_children = kmem_zalloc(CHILD_TABLE_SIZE(child_cnt), KM_SLEEP); if (zkobj->zko_children == NULL) { - if (zkobj->zko_default_attrs != NULL) { - kmem_free(zkobj->zko_default_attrs, + if (zkobj->zko_default_group.attrs != NULL) { + kmem_free(zkobj->zko_default_group.attrs, DEFAULT_ATTR_SIZE(attr_cnt)); } if (zkobj->zko_attr_list != NULL) { @@ -215,9 +220,9 @@ zfs_kobj_init(zfs_mod_kobj_t *zkobj, int attr_cnt, int child_cnt, static int zfs_kobj_add(zfs_mod_kobj_t *zkobj, struct kobject *parent, const char *name) { - /* zko_default_attrs must be NULL terminated */ - ASSERT(zkobj->zko_default_attrs != NULL); - ASSERT(zkobj->zko_default_attrs[zkobj->zko_attr_count] == NULL); + /* zko_default_group.attrs must be NULL terminated */ + 
ASSERT(zkobj->zko_default_group.attrs != NULL); + ASSERT(zkobj->zko_default_group.attrs[zkobj->zko_attr_count] == NULL); kobject_init(&zkobj->zko_kobj, &zkobj->zko_kobj_type); return (kobject_add(&zkobj->zko_kobj, parent, name)); @@ -226,7 +231,7 @@ zfs_kobj_add(zfs_mod_kobj_t *zkobj, struct kobject *parent, const char *name) /* * Each zfs property has these common attributes */ -static const char *zprop_attrs[] = { +static const char *const zprop_attrs[] = { "type", "readonly", "setonce", @@ -239,7 +244,7 @@ static const char *zprop_attrs[] = { #define ZFS_PROP_ATTR_COUNT ARRAY_SIZE(zprop_attrs) #define ZPOOL_PROP_ATTR_COUNT (ZFS_PROP_ATTR_COUNT - 1) -static const char *zprop_types[] = { +static const char *const zprop_types[] = { "number", "string", "index", @@ -250,7 +255,7 @@ typedef struct zfs_type_map { const char *ztm_name; } zfs_type_map_t; -static zfs_type_map_t type_map[] = { +static const zfs_type_map_t type_map[] = { {ZFS_TYPE_FILESYSTEM, "filesystem"}, {ZFS_TYPE_SNAPSHOT, "snapshot"}, {ZFS_TYPE_VOLUME, "volume"}, @@ -371,7 +376,7 @@ pool_property_show(struct kobject *kobj, struct attribute *attr, char *buf) * A user process can easily check if the running zfs kernel module * supports the new feature. 
*/ -static const char *zfs_kernel_features[] = { +static const char *const zfs_kernel_features[] = { /* --> Add new kernel features here */ "com.delphix:vdev_initialize", "org.zfsonlinux:vdev_trim", @@ -440,7 +445,7 @@ zfs_kernel_features_init(zfs_mod_kobj_t *zfs_kobj, struct kobject *parent) /* * Each pool feature has these common attributes */ -static const char *pool_feature_attrs[] = { +static const char *const pool_feature_attrs[] = { "description", "guid", "uname", diff --git a/module/os/linux/zfs/zfs_uio.c b/module/os/linux/zfs/zfs_uio.c index ce47b3e6087a..0d4b4c583118 100644 --- a/module/os/linux/zfs/zfs_uio.c +++ b/module/os/linux/zfs/zfs_uio.c @@ -75,6 +75,7 @@ zfs_uiomove_iov(void *p, size_t n, zfs_uio_rw_t rw, zfs_uio_t *uio) } else { unsigned long b_left = 0; if (uio->uio_fault_disable) { +#if defined(HAVE___COPY_FROM_USER_INATOMIC) if (!zfs_access_ok(VERIFY_READ, (iov->iov_base + skip), cnt)) { return (EFAULT); @@ -84,6 +85,9 @@ zfs_uiomove_iov(void *p, size_t n, zfs_uio_rw_t rw, zfs_uio_t *uio) __copy_from_user_inatomic(p, (iov->iov_base + skip), cnt); pagefault_enable(); +#else + return (EFAULT); +#endif } else { b_left = copy_from_user(p, @@ -248,7 +252,7 @@ zfs_uio_prefaultpages(ssize_t n, zfs_uio_t *uio) /* touch each page in this segment. */ p = iov->iov_base + skip; while (cnt) { - if (get_user(tmp, (uint8_t *)p)) + if (copy_from_user(&tmp, p, 1)) return (EFAULT); ulong_t incr = MIN(cnt, PAGESIZE); p += incr; @@ -256,7 +260,7 @@ zfs_uio_prefaultpages(ssize_t n, zfs_uio_t *uio) } /* touch the last byte in case it straddles a page. 
*/ p--; - if (get_user(tmp, (uint8_t *)p)) + if (copy_from_user(&tmp, p, 1)) return (EFAULT); } } diff --git a/module/os/linux/zfs/zfs_vnops_os.c b/module/os/linux/zfs/zfs_vnops_os.c index b65728f0d4c4..2ba90d889369 100644 --- a/module/os/linux/zfs/zfs_vnops_os.c +++ b/module/os/linux/zfs/zfs_vnops_os.c @@ -3556,7 +3556,11 @@ zfs_putpage(struct inode *ip, struct page *pp, struct writeback_control *wbc) dmu_tx_wait(tx); dmu_tx_abort(tx); +#ifdef HAVE_VFS_FILEMAP_DIRTY_FOLIO + filemap_dirty_folio(page_mapping(pp), page_folio(pp)); +#else __set_page_dirty_nobuffers(pp); +#endif ClearPageError(pp); end_page_writeback(pp); zfs_rangelock_exit(lr); diff --git a/module/os/linux/zfs/zpl_file.c b/module/os/linux/zfs/zpl_file.c index e826818fea6c..34605957d13f 100644 --- a/module/os/linux/zfs/zpl_file.c +++ b/module/os/linux/zfs/zpl_file.c @@ -33,9 +33,13 @@ #include #include #include -#ifdef HAVE_VFS_SET_PAGE_DIRTY_NOBUFFERS +#if defined(HAVE_VFS_SET_PAGE_DIRTY_NOBUFFERS) || \ + defined(HAVE_VFS_FILEMAP_DIRTY_FOLIO) #include #endif +#ifdef HAVE_VFS_FILEMAP_DIRTY_FOLIO +#include +#endif /* * When using fallocate(2) to preallocate space, inflate the requested @@ -413,6 +417,8 @@ zpl_aio_write(struct kiocb *kiocb, const struct iovec *iov, if (ret) return (ret); + kiocb->ki_pos = pos; + zfs_uio_t uio; zfs_uio_iovec_init(&uio, iov, nr_segs, kiocb->ki_pos, UIO_USERSPACE, count, 0); @@ -781,11 +787,13 @@ zpl_fallocate_common(struct inode *ip, int mode, loff_t offset, loff_t len) if (mode & (test_mode)) { flock64_t bf; - if (offset > olen) - goto out_unmark; + if (mode & FALLOC_FL_KEEP_SIZE) { + if (offset > olen) + goto out_unmark; - if (offset + len > olen) - len = olen - offset; + if (offset + len > olen) + len = olen - offset; + } bf.l_type = F_WRLCK; bf.l_whence = SEEK_SET; bf.l_start = offset; @@ -1223,6 +1231,9 @@ const struct address_space_operations zpl_address_space_operations = { #ifdef HAVE_VFS_SET_PAGE_DIRTY_NOBUFFERS .set_page_dirty = __set_page_dirty_nobuffers, #endif 
+#ifdef HAVE_VFS_FILEMAP_DIRTY_FOLIO + .dirty_folio = filemap_dirty_folio, +#endif }; const struct file_operations zpl_file_operations = { diff --git a/module/spl/Makefile.in b/module/spl/Makefile.in deleted file mode 100644 index cedbfe92b58a..000000000000 --- a/module/spl/Makefile.in +++ /dev/null @@ -1,13 +0,0 @@ -ifneq ($(KBUILD_EXTMOD),) -src = @abs_srcdir@ -obj = @abs_builddir@ -mfdir = $(obj) -else -mfdir = $(srctree)/$(src) -endif - -MODULE := spl - -obj-$(CONFIG_ZFS) := $(MODULE).o - -include $(mfdir)/../os/linux/spl/Makefile diff --git a/module/unicode/Makefile.in b/module/unicode/Makefile.in deleted file mode 100644 index 59c07c4555b7..000000000000 --- a/module/unicode/Makefile.in +++ /dev/null @@ -1,11 +0,0 @@ -ifneq ($(KBUILD_EXTMOD),) -src = @abs_srcdir@ -obj = @abs_builddir@ -endif - -MODULE := zunicode - -obj-$(CONFIG_ZFS) := $(MODULE).o - -$(MODULE)-objs += u8_textprep.o -$(MODULE)-objs += uconv.o diff --git a/module/unicode/u8_textprep.c b/module/unicode/u8_textprep.c index b6b07b2453af..37d648b2172d 100644 --- a/module/unicode/u8_textprep.c +++ b/module/unicode/u8_textprep.c @@ -2129,27 +2129,6 @@ u8_textprep_str(char *inarray, size_t *inlen, char *outarray, size_t *outlen, return (ret_val); } -#if defined(_KERNEL) -static int __init -unicode_init(void) -{ - return (0); -} - -static void __exit -unicode_fini(void) -{ -} - -module_init(unicode_init); -module_exit(unicode_fini); -#endif - -ZFS_MODULE_DESCRIPTION("Unicode implementation"); -ZFS_MODULE_AUTHOR(ZFS_META_AUTHOR); -ZFS_MODULE_LICENSE(ZFS_META_LICENSE); -ZFS_MODULE_VERSION(ZFS_META_VERSION "-" ZFS_META_RELEASE); - EXPORT_SYMBOL(u8_validate); EXPORT_SYMBOL(u8_strcmp); EXPORT_SYMBOL(u8_textprep_str); diff --git a/module/zcommon/Makefile.in b/module/zcommon/Makefile.in deleted file mode 100644 index ebc538440445..000000000000 --- a/module/zcommon/Makefile.in +++ /dev/null @@ -1,28 +0,0 @@ -ifneq ($(KBUILD_EXTMOD),) -src = @abs_srcdir@ -obj = @abs_builddir@ -endif - -MODULE := zcommon - 
-obj-$(CONFIG_ZFS) := $(MODULE).o - -# Suppress unused-value warnings in sparc64 architecture headers -ccflags-$(CONFIG_SPARC64) += -Wno-unused-value - -$(MODULE)-objs += cityhash.o -$(MODULE)-objs += zfeature_common.o -$(MODULE)-objs += zfs_comutil.o -$(MODULE)-objs += zfs_deleg.o -$(MODULE)-objs += zfs_fletcher.o -$(MODULE)-objs += zfs_fletcher_superscalar.o -$(MODULE)-objs += zfs_fletcher_superscalar4.o -$(MODULE)-objs += zfs_namecheck.o -$(MODULE)-objs += zfs_prop.o -$(MODULE)-objs += zpool_prop.o -$(MODULE)-objs += zprop_common.o - -$(MODULE)-$(CONFIG_X86) += zfs_fletcher_intel.o -$(MODULE)-$(CONFIG_X86) += zfs_fletcher_sse.o -$(MODULE)-$(CONFIG_X86) += zfs_fletcher_avx512.o -$(MODULE)-$(CONFIG_ARM64) += zfs_fletcher_aarch64_neon.o diff --git a/module/zcommon/zfeature_common.c b/module/zcommon/zfeature_common.c index 3c2b0da2010e..e2bf42fda1c7 100644 --- a/module/zcommon/zfeature_common.c +++ b/module/zcommon/zfeature_common.c @@ -700,6 +700,7 @@ zpool_feature_init(void) ZFEATURE_FLAG_MOS, ZFEATURE_TYPE_BOOLEAN, NULL, sfeatures); { + static const spa_feature_t zilsaxattr_deps[] = { SPA_FEATURE_EXTENSIBLE_DATASET, SPA_FEATURE_NONE @@ -711,6 +712,12 @@ zpool_feature_init(void) ZFEATURE_TYPE_BOOLEAN, zilsaxattr_deps, sfeatures); } + zfeature_register(SPA_FEATURE_HEAD_ERRLOG, + "com.delphix:head_errlog", "head_errlog", + "Support for per-dataset on-disk error logs.", + ZFEATURE_FLAG_ACTIVATE_ON_ENABLE, ZFEATURE_TYPE_BOOLEAN, NULL, + sfeatures); + zfs_mod_list_supported_free(sfeatures); } diff --git a/module/zcommon/zfs_prop.c b/module/zcommon/zfs_prop.c index 8b3e774d99ec..500d80a33b6b 100644 --- a/module/zcommon/zfs_prop.c +++ b/module/zcommon/zfs_prop.c @@ -1006,7 +1006,10 @@ uint8_t **zfs_kfpu_fpregs; EXPORT_SYMBOL(zfs_kfpu_fpregs); #endif /* defined(HAVE_KERNEL_FPU_INTERNAL) */ -static int __init +extern int __init zcommon_init(void); +extern void zcommon_fini(void); + +int __init zcommon_init(void) { int error = kfpu_init(); @@ -1018,22 +1021,19 @@ 
zcommon_init(void) return (0); } -static void __exit +void zcommon_fini(void) { fletcher_4_fini(); kfpu_fini(); } +#ifdef __FreeBSD__ module_init_early(zcommon_init); module_exit(zcommon_fini); - #endif -ZFS_MODULE_DESCRIPTION("Generic ZFS support"); -ZFS_MODULE_AUTHOR(ZFS_META_AUTHOR); -ZFS_MODULE_LICENSE(ZFS_META_LICENSE); -ZFS_MODULE_VERSION(ZFS_META_VERSION "-" ZFS_META_RELEASE); +#endif /* zfs dataset property functions */ EXPORT_SYMBOL(zfs_userquota_prop_prefixes); diff --git a/module/zfs/Makefile.in b/module/zfs/Makefile.in deleted file mode 100644 index 30dc91a7eb59..000000000000 --- a/module/zfs/Makefile.in +++ /dev/null @@ -1,158 +0,0 @@ -ifneq ($(KBUILD_EXTMOD),) -src = @abs_srcdir@ -obj = @abs_builddir@ -mfdir = $(obj) -else -mfdir = $(srctree)/$(src) -endif - -MODULE := zfs - -obj-$(CONFIG_ZFS) := $(MODULE).o - -# Suppress unused-value warnings in sparc64 architecture headers -ccflags-$(CONFIG_SPARC64) += -Wno-unused-value - -$(MODULE)-objs += abd.o -$(MODULE)-objs += aggsum.o -$(MODULE)-objs += arc.o -$(MODULE)-objs += blkptr.o -$(MODULE)-objs += bplist.o -$(MODULE)-objs += bpobj.o -$(MODULE)-objs += bptree.o -$(MODULE)-objs += btree.o -$(MODULE)-objs += bqueue.o -$(MODULE)-objs += dataset_kstats.o -$(MODULE)-objs += dbuf.o -$(MODULE)-objs += dbuf_stats.o -$(MODULE)-objs += ddt.o -$(MODULE)-objs += ddt_zap.o -$(MODULE)-objs += dmu.o -$(MODULE)-objs += dmu_diff.o -$(MODULE)-objs += dmu_object.o -$(MODULE)-objs += dmu_objset.o -$(MODULE)-objs += dmu_recv.o -$(MODULE)-objs += dmu_redact.o -$(MODULE)-objs += dmu_send.o -$(MODULE)-objs += dmu_traverse.o -$(MODULE)-objs += dmu_tx.o -$(MODULE)-objs += dmu_zfetch.o -$(MODULE)-objs += dnode.o -$(MODULE)-objs += dnode_sync.o -$(MODULE)-objs += dsl_bookmark.o -$(MODULE)-objs += dsl_crypt.o -$(MODULE)-objs += dsl_dataset.o -$(MODULE)-objs += dsl_deadlist.o -$(MODULE)-objs += dsl_deleg.o -$(MODULE)-objs += dsl_destroy.o -$(MODULE)-objs += dsl_dir.o -$(MODULE)-objs += dsl_pool.o -$(MODULE)-objs += dsl_prop.o 
-$(MODULE)-objs += dsl_scan.o -$(MODULE)-objs += dsl_synctask.o -$(MODULE)-objs += dsl_userhold.o -$(MODULE)-objs += edonr_zfs.o -$(MODULE)-objs += fm.o -$(MODULE)-objs += gzip.o -$(MODULE)-objs += hkdf.o -$(MODULE)-objs += lz4.o -$(MODULE)-objs += lz4_zfs.o -$(MODULE)-objs += lzjb.o -$(MODULE)-objs += metaslab.o -$(MODULE)-objs += mmp.o -$(MODULE)-objs += multilist.o -$(MODULE)-objs += objlist.o -$(MODULE)-objs += pathname.o -$(MODULE)-objs += range_tree.o -$(MODULE)-objs += refcount.o -$(MODULE)-objs += rrwlock.o -$(MODULE)-objs += sa.o -$(MODULE)-objs += sha256.o -$(MODULE)-objs += skein_zfs.o -$(MODULE)-objs += spa.o -$(MODULE)-objs += spa_boot.o -$(MODULE)-objs += spa_checkpoint.o -$(MODULE)-objs += spa_config.o -$(MODULE)-objs += spa_errlog.o -$(MODULE)-objs += spa_history.o -$(MODULE)-objs += spa_log_spacemap.o -$(MODULE)-objs += spa_misc.o -$(MODULE)-objs += spa_stats.o -$(MODULE)-objs += space_map.o -$(MODULE)-objs += space_reftree.o -$(MODULE)-objs += txg.o -$(MODULE)-objs += uberblock.o -$(MODULE)-objs += unique.o -$(MODULE)-objs += vdev.o -$(MODULE)-objs += vdev_cache.o -$(MODULE)-objs += vdev_draid.o -$(MODULE)-objs += vdev_draid_rand.o -$(MODULE)-objs += vdev_indirect.o -$(MODULE)-objs += vdev_indirect_births.o -$(MODULE)-objs += vdev_indirect_mapping.o -$(MODULE)-objs += vdev_initialize.o -$(MODULE)-objs += vdev_label.o -$(MODULE)-objs += vdev_mirror.o -$(MODULE)-objs += vdev_missing.o -$(MODULE)-objs += vdev_queue.o -$(MODULE)-objs += vdev_raidz.o -$(MODULE)-objs += vdev_raidz_math.o -$(MODULE)-objs += vdev_raidz_math_scalar.o -$(MODULE)-objs += vdev_rebuild.o -$(MODULE)-objs += vdev_removal.o -$(MODULE)-objs += vdev_root.o -$(MODULE)-objs += vdev_trim.o -$(MODULE)-objs += zap.o -$(MODULE)-objs += zap_leaf.o -$(MODULE)-objs += zap_micro.o -$(MODULE)-objs += zcp.o -$(MODULE)-objs += zcp_get.o -$(MODULE)-objs += zcp_global.o -$(MODULE)-objs += zcp_iter.o -$(MODULE)-objs += zcp_set.o -$(MODULE)-objs += zcp_synctask.o -$(MODULE)-objs += zfeature.o 
-$(MODULE)-objs += zfs_byteswap.o -$(MODULE)-objs += zfs_fm.o -$(MODULE)-objs += zfs_fuid.o -$(MODULE)-objs += zfs_ioctl.o -$(MODULE)-objs += zfs_log.o -$(MODULE)-objs += zfs_onexit.o -$(MODULE)-objs += zfs_quota.o -$(MODULE)-objs += zfs_ratelimit.o -$(MODULE)-objs += zfs_replay.o -$(MODULE)-objs += zfs_rlock.o -$(MODULE)-objs += zfs_sa.o -$(MODULE)-objs += zfs_vnops.o -$(MODULE)-objs += zil.o -$(MODULE)-objs += zio.o -$(MODULE)-objs += zio_checksum.o -$(MODULE)-objs += zio_compress.o -$(MODULE)-objs += zio_inject.o -$(MODULE)-objs += zle.o -$(MODULE)-objs += zrlock.o -$(MODULE)-objs += zthr.o -$(MODULE)-objs += zvol.o - -# Suppress incorrect warnings from versions of objtool which are not -# aware of x86 EVEX prefix instructions used for AVX512. -OBJECT_FILES_NON_STANDARD_vdev_raidz_math_avx512bw.o := y -OBJECT_FILES_NON_STANDARD_vdev_raidz_math_avx512f.o := y - -$(MODULE)-$(CONFIG_X86) += vdev_raidz_math_sse2.o -$(MODULE)-$(CONFIG_X86) += vdev_raidz_math_ssse3.o -$(MODULE)-$(CONFIG_X86) += vdev_raidz_math_avx2.o -$(MODULE)-$(CONFIG_X86) += vdev_raidz_math_avx512f.o -$(MODULE)-$(CONFIG_X86) += vdev_raidz_math_avx512bw.o - -$(MODULE)-$(CONFIG_ARM64) += vdev_raidz_math_aarch64_neon.o -$(MODULE)-$(CONFIG_ARM64) += vdev_raidz_math_aarch64_neonx2.o - -$(MODULE)-$(CONFIG_PPC) += vdev_raidz_math_powerpc_altivec.o -$(MODULE)-$(CONFIG_PPC64) += vdev_raidz_math_powerpc_altivec.o - -ifeq ($(CONFIG_ALTIVEC),y) -$(obj)/vdev_raidz_math_powerpc_altivec.o: c_flags += -maltivec -endif - -include $(mfdir)/../os/linux/zfs/Makefile diff --git a/module/zfs/arc.c b/module/zfs/arc.c index 89fef8a10a22..9fe2c67ffc08 100644 --- a/module/zfs/arc.c +++ b/module/zfs/arc.c @@ -11046,20 +11046,20 @@ EXPORT_SYMBOL(arc_add_prune_callback); EXPORT_SYMBOL(arc_remove_prune_callback); ZFS_MODULE_PARAM_CALL(zfs_arc, zfs_arc_, min, param_set_arc_min, - param_get_long, ZMOD_RW, "Min arc size"); + param_get_long, ZMOD_RW, "Minimum ARC size in bytes"); ZFS_MODULE_PARAM_CALL(zfs_arc, zfs_arc_, max, 
param_set_arc_max, - param_get_long, ZMOD_RW, "Max arc size"); + param_get_long, ZMOD_RW, "Maximum ARC size in bytes"); ZFS_MODULE_PARAM_CALL(zfs_arc, zfs_arc_, meta_limit, param_set_arc_long, - param_get_long, ZMOD_RW, "Metadata limit for arc size"); + param_get_long, ZMOD_RW, "Metadata limit for ARC size in bytes"); ZFS_MODULE_PARAM_CALL(zfs_arc, zfs_arc_, meta_limit_percent, param_set_arc_long, param_get_long, ZMOD_RW, - "Percent of arc size for arc meta limit"); + "Percent of ARC size for ARC meta limit"); ZFS_MODULE_PARAM_CALL(zfs_arc, zfs_arc_, meta_min, param_set_arc_long, - param_get_long, ZMOD_RW, "Min arc metadata"); + param_get_long, ZMOD_RW, "Minimum ARC metadata size in bytes"); ZFS_MODULE_PARAM(zfs_arc, zfs_arc_, meta_prune, INT, ZMOD_RW, "Meta objects to scan for prune"); @@ -11071,16 +11071,16 @@ ZFS_MODULE_PARAM(zfs_arc, zfs_arc_, meta_strategy, INT, ZMOD_RW, "Meta reclaim strategy"); ZFS_MODULE_PARAM_CALL(zfs_arc, zfs_arc_, grow_retry, param_set_arc_int, - param_get_int, ZMOD_RW, "Seconds before growing arc size"); + param_get_int, ZMOD_RW, "Seconds before growing ARC size"); ZFS_MODULE_PARAM(zfs_arc, zfs_arc_, p_dampener_disable, INT, ZMOD_RW, "Disable arc_p adapt dampener"); ZFS_MODULE_PARAM_CALL(zfs_arc, zfs_arc_, shrink_shift, param_set_arc_int, - param_get_int, ZMOD_RW, "log2(fraction of arc to reclaim)"); + param_get_int, ZMOD_RW, "log2(fraction of ARC to reclaim)"); ZFS_MODULE_PARAM(zfs_arc, zfs_arc_, pc_percent, UINT, ZMOD_RW, - "Percent of pagecache to reclaim arc to"); + "Percent of pagecache to reclaim ARC to"); ZFS_MODULE_PARAM_CALL(zfs_arc, zfs_arc_, p_min_shift, param_set_arc_int, param_get_int, ZMOD_RW, "arc_c shift to calc min/max arc_p"); @@ -11089,7 +11089,7 @@ ZFS_MODULE_PARAM(zfs_arc, zfs_arc_, average_blocksize, INT, ZMOD_RD, "Target average block size"); ZFS_MODULE_PARAM(zfs, zfs_, compressed_arc_enabled, INT, ZMOD_RW, - "Disable compressed arc buffers"); + "Disable compressed ARC buffers"); ZFS_MODULE_PARAM_CALL(zfs_arc, 
zfs_arc_, min_prefetch_ms, param_set_arc_int, param_get_int, ZMOD_RW, "Min life of prefetch block in ms"); @@ -11150,7 +11150,7 @@ ZFS_MODULE_PARAM_CALL(zfs_arc, zfs_arc_, sys_free, param_set_arc_long, param_get_long, ZMOD_RW, "System free memory target size in bytes"); ZFS_MODULE_PARAM_CALL(zfs_arc, zfs_arc_, dnode_limit, param_set_arc_long, - param_get_long, ZMOD_RW, "Minimum bytes of dnodes in arc"); + param_get_long, ZMOD_RW, "Minimum bytes of dnodes in ARC"); ZFS_MODULE_PARAM_CALL(zfs_arc, zfs_arc_, dnode_limit_percent, param_set_arc_long, param_get_long, ZMOD_RW, diff --git a/module/zfs/dmu.c b/module/zfs/dmu.c index 16a13098bc45..46e3e2dd43dd 100644 --- a/module/zfs/dmu.c +++ b/module/zfs/dmu.c @@ -86,7 +86,7 @@ static int zfs_dmu_offset_next_sync = 1; * helps to limit the amount of memory that can be used by prefetching. * Larger objects should be prefetched a bit at a time. */ -static int dmu_prefetch_max = 8 * SPA_MAXBLOCKSIZE; +int dmu_prefetch_max = 8 * SPA_MAXBLOCKSIZE; const dmu_object_type_info_t dmu_ot[DMU_OT_NUMTYPES] = { {DMU_BSWAP_UINT8, TRUE, FALSE, FALSE, "unallocated" }, diff --git a/module/zfs/dsl_dataset.c b/module/zfs/dsl_dataset.c index e836d681e920..2d98c2f04d12 100644 --- a/module/zfs/dsl_dataset.c +++ b/module/zfs/dsl_dataset.c @@ -3708,6 +3708,15 @@ dsl_dataset_promote_sync(void *arg, dmu_tx_t *tx) dsl_dir_rele(odd, FTAG); promote_rele(ddpa, FTAG); + + /* + * Transfer common error blocks from old head to new head. + */ + if (spa_feature_is_enabled(dp->dp_spa, SPA_FEATURE_HEAD_ERRLOG)) { + uint64_t old_head = origin_head->ds_object; + uint64_t new_head = hds->ds_object; + spa_swap_errlog(dp->dp_spa, new_head, old_head, tx); + } } /* @@ -4924,6 +4933,37 @@ dsl_dataset_activate_redaction(dsl_dataset_t *ds, uint64_t *redact_snaps, ds->ds_feature[SPA_FEATURE_REDACTED_DATASETS] = ftuaa; } +/* + * Find and return (in *oldest_dsobj) the oldest snapshot of the dsobj + * dataset whose birth time is >= min_txg. 
+ */ +int +dsl_dataset_oldest_snapshot(spa_t *spa, uint64_t head_ds, uint64_t min_txg, + uint64_t *oldest_dsobj) +{ + dsl_dataset_t *ds; + dsl_pool_t *dp = spa->spa_dsl_pool; + + int error = dsl_dataset_hold_obj(dp, head_ds, FTAG, &ds); + if (error != 0) + return (error); + + uint64_t prev_obj = dsl_dataset_phys(ds)->ds_prev_snap_obj; + uint64_t prev_obj_txg = dsl_dataset_phys(ds)->ds_prev_snap_txg; + + while (prev_obj != 0 && min_txg < prev_obj_txg) { + dsl_dataset_rele(ds, FTAG); + if ((error = dsl_dataset_hold_obj(dp, prev_obj, + FTAG, &ds)) != 0) + return (error); + prev_obj_txg = dsl_dataset_phys(ds)->ds_prev_snap_txg; + prev_obj = dsl_dataset_phys(ds)->ds_prev_snap_obj; + } + *oldest_dsobj = ds->ds_object; + dsl_dataset_rele(ds, FTAG); + return (0); +} + #if defined(_LP64) #define RECORDSIZE_PERM ZMOD_RW #else diff --git a/module/zfs/dsl_destroy.c b/module/zfs/dsl_destroy.c index b32929b3320c..7dddd8eed5e9 100644 --- a/module/zfs/dsl_destroy.c +++ b/module/zfs/dsl_destroy.c @@ -1153,6 +1153,9 @@ dsl_destroy_head_sync_impl(dsl_dataset_t *ds, dmu_tx_t *tx) dsl_destroy_snapshot_sync_impl(prev, B_FALSE, tx); dsl_dataset_rele(prev, FTAG); } + /* Delete errlog. 
*/ + if (spa_feature_is_enabled(dp->dp_spa, SPA_FEATURE_HEAD_ERRLOG)) + spa_delete_dataset_errlog(dp->dp_spa, ds->ds_object, tx); } void diff --git a/module/zfs/metaslab.c b/module/zfs/metaslab.c index 1b4bda000582..032252e4129b 100644 --- a/module/zfs/metaslab.c +++ b/module/zfs/metaslab.c @@ -2767,7 +2767,8 @@ metaslab_fini_flush_data(metaslab_t *msp) mutex_exit(&spa->spa_flushed_ms_lock); spa_log_sm_decrement_mscount(spa, metaslab_unflushed_txg(msp)); - spa_log_summary_decrement_mscount(spa, metaslab_unflushed_txg(msp)); + spa_log_summary_decrement_mscount(spa, metaslab_unflushed_txg(msp), + metaslab_unflushed_dirty(msp)); } uint64_t @@ -3752,50 +3753,45 @@ metaslab_condense(metaslab_t *msp, dmu_tx_t *tx) metaslab_flush_update(msp, tx); } -/* - * Called when the metaslab has been flushed (its own spacemap now reflects - * all the contents of the pool-wide spacemap log). Updates the metaslab's - * metadata and any pool-wide related log space map data (e.g. summary, - * obsolete logs, etc..) to reflect that. - */ static void -metaslab_flush_update(metaslab_t *msp, dmu_tx_t *tx) +metaslab_unflushed_add(metaslab_t *msp, dmu_tx_t *tx) { - metaslab_group_t *mg = msp->ms_group; - spa_t *spa = mg->mg_vd->vdev_spa; - - ASSERT(MUTEX_HELD(&msp->ms_lock)); - - ASSERT3U(spa_sync_pass(spa), ==, 1); + spa_t *spa = msp->ms_group->mg_vd->vdev_spa; + ASSERT(spa_syncing_log_sm(spa) != NULL); + ASSERT(msp->ms_sm != NULL); ASSERT(range_tree_is_empty(msp->ms_unflushed_allocs)); ASSERT(range_tree_is_empty(msp->ms_unflushed_frees)); - /* - * Just because a metaslab got flushed, that doesn't mean that - * it will pass through metaslab_sync_done(). Thus, make sure to - * update ms_synced_length here in case it doesn't. 
- */ - msp->ms_synced_length = space_map_length(msp->ms_sm); + mutex_enter(&spa->spa_flushed_ms_lock); + metaslab_set_unflushed_txg(msp, spa_syncing_txg(spa), tx); + metaslab_set_unflushed_dirty(msp, B_TRUE); + avl_add(&spa->spa_metaslabs_by_flushed, msp); + mutex_exit(&spa->spa_flushed_ms_lock); - /* - * We may end up here from metaslab_condense() without the - * feature being active. In that case this is a no-op. - */ - if (!spa_feature_is_active(spa, SPA_FEATURE_LOG_SPACEMAP)) - return; + spa_log_sm_increment_current_mscount(spa); + spa_log_summary_add_flushed_metaslab(spa, B_TRUE); +} +void +metaslab_unflushed_bump(metaslab_t *msp, dmu_tx_t *tx, boolean_t dirty) +{ + spa_t *spa = msp->ms_group->mg_vd->vdev_spa; ASSERT(spa_syncing_log_sm(spa) != NULL); ASSERT(msp->ms_sm != NULL); ASSERT(metaslab_unflushed_txg(msp) != 0); ASSERT3P(avl_find(&spa->spa_metaslabs_by_flushed, msp, NULL), ==, msp); + ASSERT(range_tree_is_empty(msp->ms_unflushed_allocs)); + ASSERT(range_tree_is_empty(msp->ms_unflushed_frees)); VERIFY3U(tx->tx_txg, <=, spa_final_dirty_txg(spa)); /* update metaslab's position in our flushing tree */ uint64_t ms_prev_flushed_txg = metaslab_unflushed_txg(msp); + boolean_t ms_prev_flushed_dirty = metaslab_unflushed_dirty(msp); mutex_enter(&spa->spa_flushed_ms_lock); avl_remove(&spa->spa_metaslabs_by_flushed, msp); metaslab_set_unflushed_txg(msp, spa_syncing_txg(spa), tx); + metaslab_set_unflushed_dirty(msp, dirty); avl_add(&spa->spa_metaslabs_by_flushed, msp); mutex_exit(&spa->spa_flushed_ms_lock); @@ -3803,17 +3799,47 @@ metaslab_flush_update(metaslab_t *msp, dmu_tx_t *tx) spa_log_sm_decrement_mscount(spa, ms_prev_flushed_txg); spa_log_sm_increment_current_mscount(spa); + /* update log space map summary */ + spa_log_summary_decrement_mscount(spa, ms_prev_flushed_txg, + ms_prev_flushed_dirty); + spa_log_summary_add_flushed_metaslab(spa, dirty); + /* cleanup obsolete logs if any */ - uint64_t log_blocks_before = spa_log_sm_nblocks(spa); 
spa_cleanup_old_sm_logs(spa, tx); - uint64_t log_blocks_after = spa_log_sm_nblocks(spa); - VERIFY3U(log_blocks_after, <=, log_blocks_before); +} - /* update log space map summary */ - uint64_t blocks_gone = log_blocks_before - log_blocks_after; - spa_log_summary_add_flushed_metaslab(spa); - spa_log_summary_decrement_mscount(spa, ms_prev_flushed_txg); - spa_log_summary_decrement_blkcount(spa, blocks_gone); +/* + * Called when the metaslab has been flushed (its own spacemap now reflects + * all the contents of the pool-wide spacemap log). Updates the metaslab's + * metadata and any pool-wide related log space map data (e.g. summary, + * obsolete logs) to reflect that. + */ +static void +metaslab_flush_update(metaslab_t *msp, dmu_tx_t *tx) +{ + metaslab_group_t *mg = msp->ms_group; + spa_t *spa = mg->mg_vd->vdev_spa; + + ASSERT(MUTEX_HELD(&msp->ms_lock)); + + ASSERT3U(spa_sync_pass(spa), ==, 1); + + /* + * Just because a metaslab got flushed, that doesn't mean that + * it will pass through metaslab_sync_done(). Thus, make sure to + * update ms_synced_length here in case it doesn't. + */ + msp->ms_synced_length = space_map_length(msp->ms_sm); + + /* + * We may end up here from metaslab_condense() without the + * feature being active. In that case this is a no-op.
+ */ + if (!spa_feature_is_active(spa, SPA_FEATURE_LOG_SPACEMAP) || + metaslab_unflushed_txg(msp) == 0) + return; + + metaslab_unflushed_bump(msp, tx, B_FALSE); } boolean_t @@ -4037,23 +4063,6 @@ metaslab_sync(metaslab_t *msp, uint64_t txg) ASSERT0(metaslab_allocated_space(msp)); } - if (metaslab_unflushed_txg(msp) == 0 && - spa_feature_is_active(spa, SPA_FEATURE_LOG_SPACEMAP)) { - ASSERT(spa_syncing_log_sm(spa) != NULL); - - metaslab_set_unflushed_txg(msp, spa_syncing_txg(spa), tx); - spa_log_sm_increment_current_mscount(spa); - spa_log_summary_add_flushed_metaslab(spa); - - ASSERT(msp->ms_sm != NULL); - mutex_enter(&spa->spa_flushed_ms_lock); - avl_add(&spa->spa_metaslabs_by_flushed, msp); - mutex_exit(&spa->spa_flushed_ms_lock); - - ASSERT(range_tree_is_empty(msp->ms_unflushed_allocs)); - ASSERT(range_tree_is_empty(msp->ms_unflushed_frees)); - } - if (!range_tree_is_empty(msp->ms_checkpointing) && vd->vdev_checkpoint_sm == NULL) { ASSERT(spa_has_checkpoint(spa)); @@ -4101,6 +4110,10 @@ metaslab_sync(metaslab_t *msp, uint64_t txg) space_map_t *log_sm = spa_syncing_log_sm(spa); if (log_sm != NULL) { ASSERT(spa_feature_is_enabled(spa, SPA_FEATURE_LOG_SPACEMAP)); + if (metaslab_unflushed_txg(msp) == 0) + metaslab_unflushed_add(msp, tx); + else if (!metaslab_unflushed_dirty(msp)) + metaslab_unflushed_bump(msp, tx, B_TRUE); space_map_write(log_sm, alloctree, SM_ALLOC, vd->vdev_id, tx); @@ -6209,6 +6222,12 @@ metaslab_enable(metaslab_t *msp, boolean_t sync, boolean_t unload) mutex_exit(&mg->mg_ms_disabled_lock); } +void +metaslab_set_unflushed_dirty(metaslab_t *ms, boolean_t dirty) +{ + ms->ms_unflushed_dirty = dirty; +} + static void metaslab_update_ondisk_flush_data(metaslab_t *ms, dmu_tx_t *tx) { @@ -6245,15 +6264,16 @@ metaslab_update_ondisk_flush_data(metaslab_t *ms, dmu_tx_t *tx) void metaslab_set_unflushed_txg(metaslab_t *ms, uint64_t txg, dmu_tx_t *tx) { - spa_t *spa = ms->ms_group->mg_vd->vdev_spa; - - if (!spa_feature_is_active(spa, SPA_FEATURE_LOG_SPACEMAP)) 
- return; - ms->ms_unflushed_txg = txg; metaslab_update_ondisk_flush_data(ms, tx); } +boolean_t +metaslab_unflushed_dirty(metaslab_t *ms) +{ + return (ms->ms_unflushed_dirty); +} + uint64_t metaslab_unflushed_txg(metaslab_t *ms) { diff --git a/module/zfs/spa.c b/module/zfs/spa.c index 1c8fd15b4d8d..12ff929e184e 100644 --- a/module/zfs/spa.c +++ b/module/zfs/spa.c @@ -4428,7 +4428,7 @@ spa_ld_load_vdev_metadata(spa_t *spa) error = spa_ld_log_spacemaps(spa); if (error != 0) { - spa_load_failed(spa, "spa_ld_log_sm_data failed [error=%d]", + spa_load_failed(spa, "spa_ld_log_spacemaps failed [error=%d]", error); return (spa_vdev_err(rvd, VDEV_AUX_CORRUPT_DATA, error)); } diff --git a/module/zfs/spa_errlog.c b/module/zfs/spa_errlog.c index c6b28ea7d1b8..9e5d1de63c0b 100644 --- a/module/zfs/spa_errlog.c +++ b/module/zfs/spa_errlog.c @@ -20,7 +20,8 @@ */ /* * Copyright (c) 2006, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2013, 2014 by Delphix. All rights reserved. + * Copyright (c) 2013, 2014, Delphix. All rights reserved. + * Copyright (c) 2021, George Amanakis. All rights reserved. */ /* @@ -43,6 +44,16 @@ * calculation when the data is requested, storing the result so future queries * will be faster. * + * If the head_errlog feature is enabled, a different on-disk format is used. + * The error log of each head dataset is stored separately in the zap object + * and keyed by the head id. This enables listing every dataset affected in + * userland. In order to be able to track whether an error block has been + * modified or added to snapshots since it was marked as an error, a new tuple + * is introduced: zbookmark_err_phys_t. It allows the storage of the birth + * transaction group of an error block on-disk. The birth transaction group is + * used by check_filesystem() to assess whether this block was freed, + * re-written or added to a snapshot since its marking as an error. 
+ * This log is then shipped into an nvlist where the key is the dataset name and * the value is the object name. Userland is then responsible for uniquifying * this list and displaying it to the user. @@ -53,7 +64,17 @@ #include #include #include +#include +#include +#include +/* + * spa_upgrade_errlog_limit : A zfs module parameter that controls the number + * of on-disk error log entries that will be converted to the new + * format when enabling head_errlog. Defaults to 0, which converts + * all log entries. + */ +static uint32_t spa_upgrade_errlog_limit = 0; /* * Convert a bookmark to a string. @@ -67,9 +88,35 @@ bookmark_to_name(zbookmark_phys_t *zb, char *buf, size_t len) } /* - * Convert a string to a bookmark + * Convert an err_phys to a string. + */ +static void +errphys_to_name(zbookmark_err_phys_t *zep, char *buf, size_t len) +{ + (void) snprintf(buf, len, "%llx:%llx:%llx:%llx", + (u_longlong_t)zep->zb_object, (u_longlong_t)zep->zb_level, + (u_longlong_t)zep->zb_blkid, (u_longlong_t)zep->zb_birth); +} + +/* + * Convert a string to an err_phys. + */ +static void +name_to_errphys(char *buf, zbookmark_err_phys_t *zep) +{ + zep->zb_object = zfs_strtonum(buf, &buf); + ASSERT(*buf == ':'); + zep->zb_level = (int)zfs_strtonum(buf + 1, &buf); + ASSERT(*buf == ':'); + zep->zb_blkid = zfs_strtonum(buf + 1, &buf); + ASSERT(*buf == ':'); + zep->zb_birth = zfs_strtonum(buf + 1, &buf); + ASSERT(*buf == '\0'); +} + +/* + * Convert a string to a bookmark.
*/ -#ifdef _KERNEL static void name_to_bookmark(char *buf, zbookmark_phys_t *zb) { @@ -82,8 +129,74 @@ name_to_bookmark(char *buf, zbookmark_phys_t *zb) zb->zb_blkid = zfs_strtonum(buf + 1, &buf); ASSERT(*buf == '\0'); } + +#ifdef _KERNEL +static void +zep_to_zb(uint64_t dataset, zbookmark_err_phys_t *zep, zbookmark_phys_t *zb) +{ + zb->zb_objset = dataset; + zb->zb_object = zep->zb_object; + zb->zb_level = zep->zb_level; + zb->zb_blkid = zep->zb_blkid; +} #endif +static void +name_to_object(char *buf, uint64_t *obj) +{ + *obj = zfs_strtonum(buf, &buf); + ASSERT(*buf == '\0'); +} + +static int +get_head_and_birth_txg(spa_t *spa, zbookmark_err_phys_t *zep, uint64_t ds_obj, + uint64_t *head_dataset_id) +{ + dsl_pool_t *dp = spa->spa_dsl_pool; + dsl_dataset_t *ds; + objset_t *os; + + dsl_pool_config_enter(dp, FTAG); + int error = dsl_dataset_hold_obj(dp, ds_obj, FTAG, &ds); + if (error != 0) { + dsl_pool_config_exit(dp, FTAG); + return (error); + } + ASSERT(head_dataset_id); + *head_dataset_id = dsl_dir_phys(ds->ds_dir)->dd_head_dataset_obj; + + error = dmu_objset_from_ds(ds, &os); + if (error != 0) { + dsl_dataset_rele(ds, FTAG); + dsl_pool_config_exit(dp, FTAG); + return (error); + } + + dnode_t *dn; + blkptr_t bp; + + error = dnode_hold(os, zep->zb_object, FTAG, &dn); + if (error != 0) { + dsl_dataset_rele(ds, FTAG); + dsl_pool_config_exit(dp, FTAG); + return (error); + } + + rw_enter(&dn->dn_struct_rwlock, RW_READER); + error = dbuf_dnode_findbp(dn, zep->zb_level, zep->zb_blkid, &bp, NULL, + NULL); + + if (error == 0 && BP_IS_HOLE(&bp)) + error = SET_ERROR(ENOENT); + + zep->zb_birth = bp.blk_birth; + rw_exit(&dn->dn_struct_rwlock); + dnode_rele(dn, FTAG); + dsl_dataset_rele(ds, FTAG); + dsl_pool_config_exit(dp, FTAG); + return (error); +} + /* * Log an uncorrectable error to the persistent error log. We add it to the * spa's list of pending errors. 
The changes are actually synced out to disk @@ -128,6 +241,276 @@ spa_log_error(spa_t *spa, const zbookmark_phys_t *zb) mutex_exit(&spa->spa_errlist_lock); } +#ifdef _KERNEL +static int +find_birth_txg(dsl_dataset_t *ds, zbookmark_err_phys_t *zep, + uint64_t *birth_txg) +{ + objset_t *os; + int error = dmu_objset_from_ds(ds, &os); + if (error != 0) + return (error); + + dnode_t *dn; + blkptr_t bp; + + error = dnode_hold(os, zep->zb_object, FTAG, &dn); + if (error != 0) + return (error); + + rw_enter(&dn->dn_struct_rwlock, RW_READER); + error = dbuf_dnode_findbp(dn, zep->zb_level, zep->zb_blkid, &bp, NULL, + NULL); + + if (error == 0 && BP_IS_HOLE(&bp)) + error = SET_ERROR(ENOENT); + + *birth_txg = bp.blk_birth; + rw_exit(&dn->dn_struct_rwlock); + dnode_rele(dn, FTAG); + return (error); +} + +/* + * This function serves a double role. If only_count is true, it returns + * (in *count) how many times an error block belonging to this filesystem is + * referenced by snapshots or clones. If only_count is false, each time the + * error block is referenced by a snapshot or clone, it fills the userspace + * array at uaddr with the bookmarks of the error blocks. The array is filled + * from the back and *count is modified to be the number of unused entries at + * the beginning of the array. + */ +static int +check_filesystem(spa_t *spa, uint64_t head_ds, zbookmark_err_phys_t *zep, + uint64_t *count, void *uaddr, boolean_t only_count) +{ + dsl_dataset_t *ds; + dsl_pool_t *dp = spa->spa_dsl_pool; + + int error = dsl_dataset_hold_obj(dp, head_ds, FTAG, &ds); + if (error != 0) + return (error); + + uint64_t latest_txg; + uint64_t txg_to_consider = spa->spa_syncing_txg; + boolean_t check_snapshot = B_TRUE; + error = find_birth_txg(ds, zep, &latest_txg); + if (error == 0) { + if (zep->zb_birth == latest_txg) { + /* Block neither free nor rewritten. 
*/ + if (!only_count) { + zbookmark_phys_t zb; + zep_to_zb(head_ds, zep, &zb); + if (copyout(&zb, (char *)uaddr + (*count - 1) + * sizeof (zbookmark_phys_t), + sizeof (zbookmark_phys_t)) != 0) { + dsl_dataset_rele(ds, FTAG); + return (SET_ERROR(EFAULT)); + } + (*count)--; + } else { + (*count)++; + } + check_snapshot = B_FALSE; + } else { + ASSERT3U(zep->zb_birth, <, latest_txg); + txg_to_consider = latest_txg; + } + } + + /* How many snapshots reference this block. */ + uint64_t snap_count; + error = zap_count(spa->spa_meta_objset, + dsl_dataset_phys(ds)->ds_snapnames_zapobj, &snap_count); + if (error != 0) { + dsl_dataset_rele(ds, FTAG); + return (error); + } + + if (snap_count == 0) { + /* File system has no snapshot. */ + dsl_dataset_rele(ds, FTAG); + return (0); + } + + uint64_t *snap_obj_array = kmem_alloc(snap_count * sizeof (uint64_t), + KM_SLEEP); + + int aff_snap_count = 0; + uint64_t snap_obj = dsl_dataset_phys(ds)->ds_prev_snap_obj; + uint64_t snap_obj_txg = dsl_dataset_phys(ds)->ds_prev_snap_txg; + + /* Check only snapshots created from this file system. 
*/ + while (snap_obj != 0 && zep->zb_birth < snap_obj_txg && + snap_obj_txg <= txg_to_consider) { + + dsl_dataset_rele(ds, FTAG); + error = dsl_dataset_hold_obj(dp, snap_obj, FTAG, &ds); + if (error != 0) + goto out; + + if (dsl_dir_phys(ds->ds_dir)->dd_head_dataset_obj != head_ds) + break; + + boolean_t affected = B_TRUE; + if (check_snapshot) { + uint64_t blk_txg; + error = find_birth_txg(ds, zep, &blk_txg); + affected = (error == 0 && zep->zb_birth == blk_txg); + } + + if (affected) { + snap_obj_array[aff_snap_count] = snap_obj; + aff_snap_count++; + + if (!only_count) { + zbookmark_phys_t zb; + zep_to_zb(snap_obj, zep, &zb); + if (copyout(&zb, (char *)uaddr + (*count - 1) * + sizeof (zbookmark_phys_t), + sizeof (zbookmark_phys_t)) != 0) { + dsl_dataset_rele(ds, FTAG); + error = SET_ERROR(EFAULT); + goto out; + } + (*count)--; + } else { + (*count)++; + } + + /* + * Only clones whose origins were affected could also + * have affected snapshots. + */ + zap_cursor_t zc; + zap_attribute_t za; + for (zap_cursor_init(&zc, spa->spa_meta_objset, + dsl_dataset_phys(ds)->ds_next_clones_obj); + zap_cursor_retrieve(&zc, &za) == 0; + zap_cursor_advance(&zc)) { + error = check_filesystem(spa, + za.za_first_integer, zep, + count, uaddr, only_count); + + if (error != 0) { + zap_cursor_fini(&zc); + goto out; + } + } + zap_cursor_fini(&zc); + } + snap_obj_txg = dsl_dataset_phys(ds)->ds_prev_snap_txg; + snap_obj = dsl_dataset_phys(ds)->ds_prev_snap_obj; + } + dsl_dataset_rele(ds, FTAG); + +out: + kmem_free(snap_obj_array, snap_count * sizeof (uint64_t)); + return (error); +} + +static int +find_top_affected_fs(spa_t *spa, uint64_t head_ds, zbookmark_err_phys_t *zep, + uint64_t *top_affected_fs) +{ + uint64_t oldest_dsobj; + int error = dsl_dataset_oldest_snapshot(spa, head_ds, zep->zb_birth, + &oldest_dsobj); + if (error != 0) + return (error); + + dsl_dataset_t *ds; + error = dsl_dataset_hold_obj(spa->spa_dsl_pool, oldest_dsobj, + FTAG, &ds); + if (error != 0) + return (error); + +
*top_affected_fs = + dsl_dir_phys(ds->ds_dir)->dd_head_dataset_obj; + dsl_dataset_rele(ds, FTAG); + return (0); +} + +static int +process_error_block(spa_t *spa, uint64_t head_ds, zbookmark_err_phys_t *zep, + uint64_t *count, void *uaddr, boolean_t only_count) +{ + dsl_pool_t *dp = spa->spa_dsl_pool; + dsl_pool_config_enter(dp, FTAG); + uint64_t top_affected_fs; + + int error = find_top_affected_fs(spa, head_ds, zep, &top_affected_fs); + if (error == 0) + error = check_filesystem(spa, top_affected_fs, zep, count, + uaddr, only_count); + + dsl_pool_config_exit(dp, FTAG); + return (error); +} + +static uint64_t +get_errlog_size(spa_t *spa, uint64_t spa_err_obj) +{ + if (spa_err_obj == 0) + return (0); + uint64_t total = 0; + + zap_cursor_t zc; + zap_attribute_t za; + for (zap_cursor_init(&zc, spa->spa_meta_objset, spa_err_obj); + zap_cursor_retrieve(&zc, &za) == 0; zap_cursor_advance(&zc)) { + + zap_cursor_t head_ds_cursor; + zap_attribute_t head_ds_attr; + zbookmark_err_phys_t head_ds_block; + + uint64_t head_ds; + name_to_object(za.za_name, &head_ds); + + for (zap_cursor_init(&head_ds_cursor, spa->spa_meta_objset, + za.za_first_integer); zap_cursor_retrieve(&head_ds_cursor, + &head_ds_attr) == 0; zap_cursor_advance(&head_ds_cursor)) { + + name_to_errphys(head_ds_attr.za_name, &head_ds_block); + (void) process_error_block(spa, head_ds, &head_ds_block, + &total, NULL, B_TRUE); + } + zap_cursor_fini(&head_ds_cursor); + } + zap_cursor_fini(&zc); + return (total); +} + +static uint64_t +get_errlist_size(spa_t *spa, avl_tree_t *tree) +{ + if (avl_numnodes(tree) == 0) + return (0); + uint64_t total = 0; + + spa_error_entry_t *se; + for (se = avl_first(tree); se != NULL; se = AVL_NEXT(tree, se)) { + zbookmark_err_phys_t zep; + zep.zb_object = se->se_bookmark.zb_object; + zep.zb_level = se->se_bookmark.zb_level; + zep.zb_blkid = se->se_bookmark.zb_blkid; + + /* + * If we cannot find out the head dataset and birth txg of + * the present error block, we opt not to error out. 
In the + * next pool sync this information will be retrieved by + * sync_error_list() and written to the on-disk error log. + */ + uint64_t head_ds_obj; + if (get_head_and_birth_txg(spa, &zep, + se->se_bookmark.zb_objset, &head_ds_obj) == 0) + (void) process_error_block(spa, head_ds_obj, &zep, + &total, NULL, B_TRUE); + } + return (total); +} +#endif + /* * Return the number of errors currently in the error log. This is actually the * sum of both the last log and the current log, since we don't know the union @@ -136,83 +519,284 @@ spa_log_error(spa_t *spa, const zbookmark_phys_t *zb) uint64_t spa_get_errlog_size(spa_t *spa) { - uint64_t total = 0, count; + uint64_t total = 0; + + if (!spa_feature_is_enabled(spa, SPA_FEATURE_HEAD_ERRLOG)) { + mutex_enter(&spa->spa_errlog_lock); + uint64_t count; + if (spa->spa_errlog_scrub != 0 && + zap_count(spa->spa_meta_objset, spa->spa_errlog_scrub, + &count) == 0) + total += count; + + if (spa->spa_errlog_last != 0 && !spa->spa_scrub_finished && + zap_count(spa->spa_meta_objset, spa->spa_errlog_last, + &count) == 0) + total += count; + mutex_exit(&spa->spa_errlog_lock); + + mutex_enter(&spa->spa_errlist_lock); + total += avl_numnodes(&spa->spa_errlist_last); + total += avl_numnodes(&spa->spa_errlist_scrub); + mutex_exit(&spa->spa_errlist_lock); + } else { +#ifdef _KERNEL + mutex_enter(&spa->spa_errlog_lock); + total += get_errlog_size(spa, spa->spa_errlog_last); + total += get_errlog_size(spa, spa->spa_errlog_scrub); + mutex_exit(&spa->spa_errlog_lock); + + mutex_enter(&spa->spa_errlist_lock); + total += get_errlist_size(spa, &spa->spa_errlist_last); + total += get_errlist_size(spa, &spa->spa_errlist_scrub); + mutex_exit(&spa->spa_errlist_lock); +#endif + } + return (total); +} - mutex_enter(&spa->spa_errlog_lock); - if (spa->spa_errlog_scrub != 0 && - zap_count(spa->spa_meta_objset, spa->spa_errlog_scrub, - &count) == 0) - total += count; - - if (spa->spa_errlog_last != 0 && !spa->spa_scrub_finished && - 
zap_count(spa->spa_meta_objset, spa->spa_errlog_last, - &count) == 0) - total += count; - mutex_exit(&spa->spa_errlog_lock); +/* + * This function sweeps through an on-disk error log and stores all bookmarks + * as error bookmarks in a new ZAP object. At the end we discard the old one, + * and spa_update_errlog() will set the spa's on-disk error log to new ZAP + * object. + */ +static void +sync_upgrade_errlog(spa_t *spa, uint64_t spa_err_obj, uint64_t *newobj, + dmu_tx_t *tx) +{ + zap_cursor_t zc; + zap_attribute_t za; + zbookmark_phys_t zb; + uint64_t count; - mutex_enter(&spa->spa_errlist_lock); - total += avl_numnodes(&spa->spa_errlist_last); - total += avl_numnodes(&spa->spa_errlist_scrub); - mutex_exit(&spa->spa_errlist_lock); + *newobj = zap_create(spa->spa_meta_objset, DMU_OT_ERROR_LOG, + DMU_OT_NONE, 0, tx); - return (total); + /* + * If we cannot perform the upgrade we should clear the old on-disk + * error logs. + */ + if (zap_count(spa->spa_meta_objset, spa_err_obj, &count) != 0) { + VERIFY0(dmu_object_free(spa->spa_meta_objset, spa_err_obj, tx)); + return; + } + + for (zap_cursor_init(&zc, spa->spa_meta_objset, spa_err_obj); + zap_cursor_retrieve(&zc, &za) == 0; + zap_cursor_advance(&zc)) { + if (spa_upgrade_errlog_limit != 0 && + zc.zc_cd == spa_upgrade_errlog_limit) + break; + + name_to_bookmark(za.za_name, &zb); + + zbookmark_err_phys_t zep; + zep.zb_object = zb.zb_object; + zep.zb_level = zb.zb_level; + zep.zb_blkid = zb.zb_blkid; + + /* + * We cannot use get_head_and_birth_txg() because it will + * acquire the pool config lock, which we already have. In case + * of an error we simply continue. 
+ */ + uint64_t head_dataset_obj; + dsl_pool_t *dp = spa->spa_dsl_pool; + dsl_dataset_t *ds; + objset_t *os; + + int error = dsl_dataset_hold_obj(dp, zb.zb_objset, FTAG, &ds); + if (error != 0) + continue; + + head_dataset_obj = + dsl_dir_phys(ds->ds_dir)->dd_head_dataset_obj; + + /* + * The objset and the dnode are required for getting the block + * pointer, which is used to determine if BP_IS_HOLE(). If + * getting the objset or the dnode fails, do not create a + * zap entry (presuming we know the dataset) as this may create + * spurious errors that we cannot ever resolve. If an error is + * truly persistent, it should re-appear after a scan. + */ + if (dmu_objset_from_ds(ds, &os) != 0) { + dsl_dataset_rele(ds, FTAG); + continue; + } + + dnode_t *dn; + blkptr_t bp; + + if (dnode_hold(os, zep.zb_object, FTAG, &dn) != 0) { + dsl_dataset_rele(ds, FTAG); + continue; + } + + rw_enter(&dn->dn_struct_rwlock, RW_READER); + error = dbuf_dnode_findbp(dn, zep.zb_level, zep.zb_blkid, &bp, + NULL, NULL); + + zep.zb_birth = bp.blk_birth; + rw_exit(&dn->dn_struct_rwlock); + dnode_rele(dn, FTAG); + dsl_dataset_rele(ds, FTAG); + + if (error != 0 || BP_IS_HOLE(&bp)) + continue; + + uint64_t err_obj; + error = zap_lookup_int_key(spa->spa_meta_objset, *newobj, + head_dataset_obj, &err_obj); + + if (error == ENOENT) { + err_obj = zap_create(spa->spa_meta_objset, + DMU_OT_ERROR_LOG, DMU_OT_NONE, 0, tx); + + (void) zap_update_int_key(spa->spa_meta_objset, + *newobj, head_dataset_obj, err_obj, tx); + } + + char buf[64]; + char *name = ""; + errphys_to_name(&zep, buf, sizeof (buf)); + + (void) zap_update(spa->spa_meta_objset, err_obj, + buf, 1, strlen(name) + 1, name, tx); + } + zap_cursor_fini(&zc); + + VERIFY0(dmu_object_free(spa->spa_meta_objset, spa_err_obj, tx)); +} + +void +spa_upgrade_errlog(spa_t *spa, dmu_tx_t *tx) +{ + uint64_t newobj = 0; + + mutex_enter(&spa->spa_errlog_lock); + if (spa->spa_errlog_last != 0) { + sync_upgrade_errlog(spa, spa->spa_errlog_last, &newobj, tx); + 
spa->spa_errlog_last = newobj; + } + + if (spa->spa_errlog_scrub != 0) { + sync_upgrade_errlog(spa, spa->spa_errlog_scrub, &newobj, tx); + spa->spa_errlog_scrub = newobj; + } + mutex_exit(&spa->spa_errlog_lock); } #ifdef _KERNEL +/* + * If an error block is shared by two datasets it will be counted twice. For + * detailed message see spa_get_errlog_size() above. + */ static int -process_error_log(spa_t *spa, uint64_t obj, void *addr, size_t *count) +process_error_log(spa_t *spa, uint64_t obj, void *uaddr, uint64_t *count) { zap_cursor_t zc; zap_attribute_t za; - zbookmark_phys_t zb; if (obj == 0) return (0); - for (zap_cursor_init(&zc, spa->spa_meta_objset, obj); - zap_cursor_retrieve(&zc, &za) == 0; - zap_cursor_advance(&zc)) { + if (!spa_feature_is_enabled(spa, SPA_FEATURE_HEAD_ERRLOG)) { + for (zap_cursor_init(&zc, spa->spa_meta_objset, obj); + zap_cursor_retrieve(&zc, &za) == 0; + zap_cursor_advance(&zc)) { + if (*count == 0) { + zap_cursor_fini(&zc); + return (SET_ERROR(ENOMEM)); + } + + zbookmark_phys_t zb; + name_to_bookmark(za.za_name, &zb); + + if (copyout(&zb, (char *)uaddr + + (*count - 1) * sizeof (zbookmark_phys_t), + sizeof (zbookmark_phys_t)) != 0) { + zap_cursor_fini(&zc); + return (SET_ERROR(EFAULT)); + } + *count -= 1; - if (*count == 0) { - zap_cursor_fini(&zc); - return (SET_ERROR(ENOMEM)); } + zap_cursor_fini(&zc); + return (0); + } - name_to_bookmark(za.za_name, &zb); + for (zap_cursor_init(&zc, spa->spa_meta_objset, obj); + zap_cursor_retrieve(&zc, &za) == 0; + zap_cursor_advance(&zc)) { - if (copyout(&zb, (char *)addr + - (*count - 1) * sizeof (zbookmark_phys_t), - sizeof (zbookmark_phys_t)) != 0) { - zap_cursor_fini(&zc); - return (SET_ERROR(EFAULT)); + zap_cursor_t head_ds_cursor; + zap_attribute_t head_ds_attr; + + uint64_t head_ds_err_obj = za.za_first_integer; + uint64_t head_ds; + name_to_object(za.za_name, &head_ds); + for (zap_cursor_init(&head_ds_cursor, spa->spa_meta_objset, + head_ds_err_obj); zap_cursor_retrieve(&head_ds_cursor, 
+ &head_ds_attr) == 0; zap_cursor_advance(&head_ds_cursor)) { + + zbookmark_err_phys_t head_ds_block; + name_to_errphys(head_ds_attr.za_name, &head_ds_block); + int error = process_error_block(spa, head_ds, + &head_ds_block, count, uaddr, B_FALSE); + + if (error != 0) { + zap_cursor_fini(&head_ds_cursor); + zap_cursor_fini(&zc); + return (error); + } } - - *count -= 1; + zap_cursor_fini(&head_ds_cursor); } - zap_cursor_fini(&zc); - return (0); } static int -process_error_list(avl_tree_t *list, void *addr, size_t *count) +process_error_list(spa_t *spa, avl_tree_t *list, void *uaddr, uint64_t *count) { spa_error_entry_t *se; - for (se = avl_first(list); se != NULL; se = AVL_NEXT(list, se)) { + if (!spa_feature_is_enabled(spa, SPA_FEATURE_HEAD_ERRLOG)) { + for (se = avl_first(list); se != NULL; + se = AVL_NEXT(list, se)) { - if (*count == 0) - return (SET_ERROR(ENOMEM)); + if (*count == 0) + return (SET_ERROR(ENOMEM)); - if (copyout(&se->se_bookmark, (char *)addr + - (*count - 1) * sizeof (zbookmark_phys_t), - sizeof (zbookmark_phys_t)) != 0) - return (SET_ERROR(EFAULT)); + if (copyout(&se->se_bookmark, (char *)uaddr + + (*count - 1) * sizeof (zbookmark_phys_t), + sizeof (zbookmark_phys_t)) != 0) + return (SET_ERROR(EFAULT)); - *count -= 1; + *count -= 1; + } + return (0); } + for (se = avl_first(list); se != NULL; se = AVL_NEXT(list, se)) { + zbookmark_err_phys_t zep; + zep.zb_object = se->se_bookmark.zb_object; + zep.zb_level = se->se_bookmark.zb_level; + zep.zb_blkid = se->se_bookmark.zb_blkid; + + uint64_t head_ds_obj; + int error = get_head_and_birth_txg(spa, &zep, + se->se_bookmark.zb_objset, &head_ds_obj); + if (error != 0) + return (error); + + error = process_error_block(spa, head_ds_obj, &zep, count, + uaddr, B_FALSE); + if (error != 0) + return (error); + } return (0); } #endif @@ -229,7 +813,7 @@ process_error_list(avl_tree_t *list, void *addr, size_t *count) * the error list lock when we are finished. 
*/ int -spa_get_errlog(spa_t *spa, void *uaddr, size_t *count) +spa_get_errlog(spa_t *spa, void *uaddr, uint64_t *count) { int ret = 0; @@ -244,10 +828,10 @@ spa_get_errlog(spa_t *spa, void *uaddr, size_t *count) mutex_enter(&spa->spa_errlist_lock); if (!ret) - ret = process_error_list(&spa->spa_errlist_scrub, uaddr, + ret = process_error_list(spa, &spa->spa_errlist_scrub, uaddr, count); if (!ret) - ret = process_error_list(&spa->spa_errlist_last, uaddr, + ret = process_error_list(spa, &spa->spa_errlist_last, uaddr, count); mutex_exit(&spa->spa_errlist_lock); @@ -299,35 +883,91 @@ spa_errlog_drain(spa_t *spa) /* * Process a list of errors into the current on-disk log. */ -static void +void sync_error_list(spa_t *spa, avl_tree_t *t, uint64_t *obj, dmu_tx_t *tx) { spa_error_entry_t *se; char buf[64]; void *cookie; - if (avl_numnodes(t) != 0) { - /* create log if necessary */ - if (*obj == 0) - *obj = zap_create(spa->spa_meta_objset, - DMU_OT_ERROR_LOG, DMU_OT_NONE, - 0, tx); + if (avl_numnodes(t) == 0) + return; + + /* create log if necessary */ + if (*obj == 0) + *obj = zap_create(spa->spa_meta_objset, DMU_OT_ERROR_LOG, + DMU_OT_NONE, 0, tx); - /* add errors to the current log */ + /* add errors to the current log */ + if (!spa_feature_is_enabled(spa, SPA_FEATURE_HEAD_ERRLOG)) { for (se = avl_first(t); se != NULL; se = AVL_NEXT(t, se)) { char *name = se->se_name ? se->se_name : ""; bookmark_to_name(&se->se_bookmark, buf, sizeof (buf)); + (void) zap_update(spa->spa_meta_objset, *obj, buf, 1, + strlen(name) + 1, name, tx); + } + } else { + for (se = avl_first(t); se != NULL; se = AVL_NEXT(t, se)) { + char *name = se->se_name ? se->se_name : ""; + + zbookmark_err_phys_t zep; + zep.zb_object = se->se_bookmark.zb_object; + zep.zb_level = se->se_bookmark.zb_level; + zep.zb_blkid = se->se_bookmark.zb_blkid; + + /* + * If we cannot find out the head dataset and birth txg + * of the present error block, we simply continue. 
+ * Reinserting that error block to the error lists, + * even if we are not syncing the final txg, results + * in duplicate posting of errors. + */ + uint64_t head_dataset_obj; + int error = get_head_and_birth_txg(spa, &zep, + se->se_bookmark.zb_objset, &head_dataset_obj); + if (error != 0) + continue; + + uint64_t err_obj; + error = zap_lookup_int_key(spa->spa_meta_objset, + *obj, head_dataset_obj, &err_obj); + + if (error == ENOENT) { + err_obj = zap_create(spa->spa_meta_objset, + DMU_OT_ERROR_LOG, DMU_OT_NONE, 0, tx); + + (void) zap_update_int_key(spa->spa_meta_objset, + *obj, head_dataset_obj, err_obj, tx); + } + errphys_to_name(&zep, buf, sizeof (buf)); + (void) zap_update(spa->spa_meta_objset, - *obj, buf, 1, strlen(name) + 1, name, tx); + err_obj, buf, 1, strlen(name) + 1, name, tx); } + } + /* purge the error list */ + cookie = NULL; + while ((se = avl_destroy_nodes(t, &cookie)) != NULL) + kmem_free(se, sizeof (spa_error_entry_t)); +} - /* purge the error list */ - cookie = NULL; - while ((se = avl_destroy_nodes(t, &cookie)) != NULL) - kmem_free(se, sizeof (spa_error_entry_t)); +static void +delete_errlog(spa_t *spa, uint64_t spa_err_obj, dmu_tx_t *tx) +{ + if (spa_feature_is_enabled(spa, SPA_FEATURE_HEAD_ERRLOG)) { + zap_cursor_t zc; + zap_attribute_t za; + for (zap_cursor_init(&zc, spa->spa_meta_objset, spa_err_obj); + zap_cursor_retrieve(&zc, &za) == 0; + zap_cursor_advance(&zc)) { + VERIFY0(dmu_object_free(spa->spa_meta_objset, + za.za_first_integer, tx)); + } + zap_cursor_fini(&zc); } + VERIFY0(dmu_object_free(spa->spa_meta_objset, spa_err_obj, tx)); } /* @@ -378,8 +1018,7 @@ spa_errlog_sync(spa_t *spa, uint64_t txg) */ if (scrub_finished) { if (spa->spa_errlog_last != 0) - VERIFY(dmu_object_free(spa->spa_meta_objset, - spa->spa_errlog_last, tx) == 0); + delete_errlog(spa, spa->spa_errlog_last, tx); spa->spa_errlog_last = spa->spa_errlog_scrub; spa->spa_errlog_scrub = 0; @@ -406,6 +1045,137 @@ spa_errlog_sync(spa_t *spa, uint64_t txg) 
mutex_exit(&spa->spa_errlog_lock); } +static void +delete_dataset_errlog(spa_t *spa, uint64_t spa_err_obj, uint64_t ds, + dmu_tx_t *tx) +{ + if (spa_err_obj == 0) + return; + + zap_cursor_t zc; + zap_attribute_t za; + for (zap_cursor_init(&zc, spa->spa_meta_objset, spa_err_obj); + zap_cursor_retrieve(&zc, &za) == 0; zap_cursor_advance(&zc)) { + uint64_t head_ds; + name_to_object(za.za_name, &head_ds); + if (head_ds == ds) { + (void) zap_remove(spa->spa_meta_objset, spa_err_obj, + za.za_name, tx); + VERIFY0(dmu_object_free(spa->spa_meta_objset, + za.za_first_integer, tx)); + break; + } + } + zap_cursor_fini(&zc); +} + +void +spa_delete_dataset_errlog(spa_t *spa, uint64_t ds, dmu_tx_t *tx) +{ + mutex_enter(&spa->spa_errlog_lock); + delete_dataset_errlog(spa, spa->spa_errlog_scrub, ds, tx); + delete_dataset_errlog(spa, spa->spa_errlog_last, ds, tx); + mutex_exit(&spa->spa_errlog_lock); +} + +static int +find_txg_ancestor_snapshot(spa_t *spa, uint64_t new_head, uint64_t old_head, + uint64_t *txg) +{ + dsl_dataset_t *ds; + dsl_pool_t *dp = spa->spa_dsl_pool; + + int error = dsl_dataset_hold_obj(dp, old_head, FTAG, &ds); + if (error != 0) + return (error); + + uint64_t prev_obj = dsl_dataset_phys(ds)->ds_prev_snap_obj; + uint64_t prev_obj_txg = dsl_dataset_phys(ds)->ds_prev_snap_txg; + + while (prev_obj != 0) { + dsl_dataset_rele(ds, FTAG); + if ((error = dsl_dataset_hold_obj(dp, prev_obj, + FTAG, &ds)) == 0 && + dsl_dir_phys(ds->ds_dir)->dd_head_dataset_obj == new_head) + break; + + if (error != 0) + return (error); + + prev_obj_txg = dsl_dataset_phys(ds)->ds_prev_snap_txg; + prev_obj = dsl_dataset_phys(ds)->ds_prev_snap_obj; + } + dsl_dataset_rele(ds, FTAG); + ASSERT(prev_obj != 0); + *txg = prev_obj_txg; + return (0); +} + +static void +swap_errlog(spa_t *spa, uint64_t spa_err_obj, uint64_t new_head, uint64_t + old_head, dmu_tx_t *tx) +{ + if (spa_err_obj == 0) + return; + + uint64_t old_head_errlog; + int error = zap_lookup_int_key(spa->spa_meta_objset, spa_err_obj, 
+ old_head, &old_head_errlog); + + /* If no error log, then there is nothing to do. */ + if (error != 0) + return; + + uint64_t txg; + error = find_txg_ancestor_snapshot(spa, new_head, old_head, &txg); + if (error != 0) + return; + + /* + * Create an error log if the file system being promoted does not + * already have one. + */ + uint64_t new_head_errlog; + error = zap_lookup_int_key(spa->spa_meta_objset, spa_err_obj, new_head, + &new_head_errlog); + + if (error != 0) { + new_head_errlog = zap_create(spa->spa_meta_objset, + DMU_OT_ERROR_LOG, DMU_OT_NONE, 0, tx); + + (void) zap_update_int_key(spa->spa_meta_objset, spa_err_obj, + new_head, new_head_errlog, tx); + } + + zap_cursor_t zc; + zap_attribute_t za; + zbookmark_err_phys_t err_block; + for (zap_cursor_init(&zc, spa->spa_meta_objset, old_head_errlog); + zap_cursor_retrieve(&zc, &za) == 0; zap_cursor_advance(&zc)) { + + char *name = ""; + name_to_errphys(za.za_name, &err_block); + if (err_block.zb_birth < txg) { + (void) zap_update(spa->spa_meta_objset, new_head_errlog, + za.za_name, 1, strlen(name) + 1, name, tx); + + (void) zap_remove(spa->spa_meta_objset, old_head_errlog, + za.za_name, tx); + } + } + zap_cursor_fini(&zc); +} + +void +spa_swap_errlog(spa_t *spa, uint64_t new_head_ds, uint64_t old_head_ds, + dmu_tx_t *tx) +{ + mutex_enter(&spa->spa_errlog_lock); + swap_errlog(spa, spa->spa_errlog_scrub, new_head_ds, old_head_ds, tx); + swap_errlog(spa, spa->spa_errlog_last, new_head_ds, old_head_ds, tx); + mutex_exit(&spa->spa_errlog_lock); +} + #if defined(_KERNEL) /* error handling */ EXPORT_SYMBOL(spa_log_error); @@ -415,4 +1185,14 @@ EXPORT_SYMBOL(spa_errlog_rotate); EXPORT_SYMBOL(spa_errlog_drain); EXPORT_SYMBOL(spa_errlog_sync); EXPORT_SYMBOL(spa_get_errlists); +EXPORT_SYMBOL(spa_delete_dataset_errlog); +EXPORT_SYMBOL(spa_swap_errlog); +EXPORT_SYMBOL(sync_error_list); +EXPORT_SYMBOL(spa_upgrade_errlog); #endif + +/* BEGIN CSTYLED */ +ZFS_MODULE_PARAM(zfs_spa, spa_, upgrade_errlog_limit, INT, ZMOD_RW, + 
"Limit the number of errors which will be upgraded to the new " + "on-disk error log when enabling head_errlog"); +/* END CSTYLED */ diff --git a/module/zfs/spa_log_spacemap.c b/module/zfs/spa_log_spacemap.c index 1505b5f76684..5e8c64b7f111 100644 --- a/module/zfs/spa_log_spacemap.c +++ b/module/zfs/spa_log_spacemap.c @@ -257,7 +257,12 @@ static unsigned long zfs_unflushed_log_block_min = 1000; * terms of performance. Thus we have a hard limit in the size of the log in * terms of blocks. */ -static unsigned long zfs_unflushed_log_block_max = (1ULL << 18); +static unsigned long zfs_unflushed_log_block_max = (1ULL << 17); + +/* + * Also we have a hard limit in the size of the log in terms of dirty TXGs. + */ +static unsigned long zfs_unflushed_log_txg_max = 1000; /* * Max # of rows allowed for the log_summary. The tradeoff here is accuracy and @@ -333,9 +338,13 @@ spa_log_sm_set_blocklimit(spa_t *spa) return; } - uint64_t calculated_limit = - (spa_total_metaslabs(spa) * zfs_unflushed_log_block_pct) / 100; - spa->spa_unflushed_stats.sus_blocklimit = MIN(MAX(calculated_limit, + uint64_t msdcount = 0; + for (log_summary_entry_t *e = list_head(&spa->spa_log_summary); + e; e = list_next(&spa->spa_log_summary, e)) + msdcount += e->lse_msdcount; + + uint64_t limit = msdcount * zfs_unflushed_log_block_pct / 100; + spa->spa_unflushed_stats.sus_blocklimit = MIN(MAX(limit, zfs_unflushed_log_block_min), zfs_unflushed_log_block_max); } @@ -380,8 +389,13 @@ spa_log_summary_verify_counts(spa_t *spa) } static boolean_t -summary_entry_is_full(spa_t *spa, log_summary_entry_t *e) +summary_entry_is_full(spa_t *spa, log_summary_entry_t *e, uint64_t txg) { + if (e->lse_end == txg) + return (0); + if (e->lse_txgcount >= DIV_ROUND_UP(zfs_unflushed_log_txg_max, + zfs_max_logsm_summary_length)) + return (1); uint64_t blocks_per_row = MAX(1, DIV_ROUND_UP(spa_log_sm_blocklimit(spa), zfs_max_logsm_summary_length)); @@ -401,7 +415,7 @@ summary_entry_is_full(spa_t *spa, log_summary_entry_t *e) * 
the metaslab. */ void -spa_log_summary_decrement_mscount(spa_t *spa, uint64_t txg) +spa_log_summary_decrement_mscount(spa_t *spa, uint64_t txg, boolean_t dirty) { /* * We don't track summary data for read-only pools and this function @@ -429,6 +443,8 @@ spa_log_summary_decrement_mscount(spa_t *spa, uint64_t txg) } target->lse_mscount--; + if (dirty) + target->lse_msdcount--; } /* @@ -490,8 +506,10 @@ spa_log_summary_decrement_mscount(spa_t *spa, uint64_t txg) void spa_log_summary_decrement_blkcount(spa_t *spa, uint64_t blocks_gone) { - for (log_summary_entry_t *e = list_head(&spa->spa_log_summary); - e != NULL; e = list_head(&spa->spa_log_summary)) { + log_summary_entry_t *e = list_head(&spa->spa_log_summary); + if (e->lse_txgcount > 0) + e->lse_txgcount--; + for (; e != NULL; e = list_head(&spa->spa_log_summary)) { if (e->lse_blkcount > blocks_gone) { /* * Assert that we stopped at an entry that is not @@ -560,31 +578,52 @@ spa_log_sm_increment_current_mscount(spa_t *spa) static void summary_add_data(spa_t *spa, uint64_t txg, uint64_t metaslabs_flushed, - uint64_t nblocks) + uint64_t metaslabs_dirty, uint64_t nblocks) { log_summary_entry_t *e = list_tail(&spa->spa_log_summary); - if (e == NULL || summary_entry_is_full(spa, e)) { + if (e == NULL || summary_entry_is_full(spa, e, txg)) { e = kmem_zalloc(sizeof (log_summary_entry_t), KM_SLEEP); - e->lse_start = txg; + e->lse_start = e->lse_end = txg; + e->lse_txgcount = 1; list_insert_tail(&spa->spa_log_summary, e); } ASSERT3U(e->lse_start, <=, txg); + if (e->lse_end < txg) { + e->lse_end = txg; + e->lse_txgcount++; + } e->lse_mscount += metaslabs_flushed; + e->lse_msdcount += metaslabs_dirty; e->lse_blkcount += nblocks; } static void spa_log_summary_add_incoming_blocks(spa_t *spa, uint64_t nblocks) { - summary_add_data(spa, spa_syncing_txg(spa), 0, nblocks); + summary_add_data(spa, spa_syncing_txg(spa), 0, 0, nblocks); } void -spa_log_summary_add_flushed_metaslab(spa_t *spa) 
+spa_log_summary_add_flushed_metaslab(spa_t *spa, boolean_t dirty) { - summary_add_data(spa, spa_syncing_txg(spa), 1, 0); + summary_add_data(spa, spa_syncing_txg(spa), 1, dirty ? 1 : 0, 0); +} + +void +spa_log_summary_dirty_flushed_metaslab(spa_t *spa, uint64_t txg) +{ + log_summary_entry_t *target = NULL; + for (log_summary_entry_t *e = list_head(&spa->spa_log_summary); + e != NULL; e = list_next(&spa->spa_log_summary, e)) { + if (e->lse_start > txg) + break; + target = e; + } + ASSERT3P(target, !=, NULL); + ASSERT3U(target->lse_mscount, !=, 0); + target->lse_msdcount++; } /* @@ -637,6 +676,11 @@ spa_estimate_metaslabs_to_flush(spa_t *spa) int64_t available_blocks = spa_log_sm_blocklimit(spa) - spa_log_sm_nblocks(spa) - incoming; + int64_t available_txgs = zfs_unflushed_log_txg_max; + for (log_summary_entry_t *e = list_head(&spa->spa_log_summary); + e; e = list_next(&spa->spa_log_summary, e)) + available_txgs -= e->lse_txgcount; + /* * This variable tells us the total number of flushes needed to * keep the log size within the limit when we reach txgs_in_future. @@ -644,9 +688,7 @@ spa_estimate_metaslabs_to_flush(spa_t *spa) uint64_t total_flushes = 0; /* Holds the current maximum of our estimates so far. */ - uint64_t max_flushes_pertxg = - MIN(avl_numnodes(&spa->spa_metaslabs_by_flushed), - zfs_min_metaslabs_to_flush); + uint64_t max_flushes_pertxg = zfs_min_metaslabs_to_flush; /* * For our estimations we only look as far in the future @@ -660,11 +702,14 @@ spa_estimate_metaslabs_to_flush(spa_t *spa) * then keep skipping TXGs accumulating more blocks * based on the incoming rate until we exceed it. 
*/ - if (available_blocks >= 0) { - uint64_t skip_txgs = (available_blocks / incoming) + 1; + if (available_blocks >= 0 && available_txgs >= 0) { + uint64_t skip_txgs = MIN(available_txgs + 1, + (available_blocks / incoming) + 1); available_blocks -= (skip_txgs * incoming); + available_txgs -= skip_txgs; txgs_in_future += skip_txgs; ASSERT3S(available_blocks, >=, -incoming); + ASSERT3S(available_txgs, >=, -1); } /* @@ -673,9 +718,10 @@ spa_estimate_metaslabs_to_flush(spa_t *spa) * based on the current entry in the summary, updating * our available_blocks. */ - ASSERT3S(available_blocks, <, 0); + ASSERT(available_blocks < 0 || available_txgs < 0); available_blocks += e->lse_blkcount; - total_flushes += e->lse_mscount; + available_txgs += e->lse_txgcount; + total_flushes += e->lse_msdcount; /* * Keep the running maximum of the total_flushes that @@ -687,8 +733,6 @@ spa_estimate_metaslabs_to_flush(spa_t *spa) */ max_flushes_pertxg = MAX(max_flushes_pertxg, DIV_ROUND_UP(total_flushes, txgs_in_future)); - ASSERT3U(avl_numnodes(&spa->spa_metaslabs_by_flushed), >=, - max_flushes_pertxg); } return (max_flushes_pertxg); } @@ -778,14 +822,11 @@ spa_flush_metaslabs(spa_t *spa, dmu_tx_t *tx) uint64_t want_to_flush; if (spa_flush_all_logs_requested(spa)) { ASSERT3S(spa_state(spa), ==, POOL_STATE_EXPORTED); - want_to_flush = avl_numnodes(&spa->spa_metaslabs_by_flushed); + want_to_flush = UINT64_MAX; } else { want_to_flush = spa_estimate_metaslabs_to_flush(spa); } - ASSERT3U(avl_numnodes(&spa->spa_metaslabs_by_flushed), >=, - want_to_flush); - /* Used purely for verification purposes */ uint64_t visited = 0; @@ -816,31 +857,22 @@ spa_flush_metaslabs(spa_t *spa, dmu_tx_t *tx) if (want_to_flush == 0 && !spa_log_exceeds_memlimit(spa)) break; - mutex_enter(&curr->ms_sync_lock); - mutex_enter(&curr->ms_lock); - boolean_t flushed = metaslab_flush(curr, tx); - mutex_exit(&curr->ms_lock); - mutex_exit(&curr->ms_sync_lock); - - /* - * If we failed to flush a metaslab (because it was 
loading), - * then we are done with the block heuristic as it's not - * possible to destroy any log space maps once you've skipped - * a metaslab. In that case we just set our counter to 0 but - * we continue looping in case there is still memory pressure - * due to unflushed changes. Note that, flushing a metaslab - * that is not the oldest flushed in the pool, will never - * destroy any log space maps [see spa_cleanup_old_sm_logs()]. - */ - if (!flushed) { - want_to_flush = 0; - } else if (want_to_flush > 0) { - want_to_flush--; - } + if (metaslab_unflushed_dirty(curr)) { + mutex_enter(&curr->ms_sync_lock); + mutex_enter(&curr->ms_lock); + metaslab_flush(curr, tx); + mutex_exit(&curr->ms_lock); + mutex_exit(&curr->ms_sync_lock); + if (want_to_flush > 0) + want_to_flush--; + } else + metaslab_unflushed_bump(curr, tx, B_FALSE); visited++; } ASSERT3U(avl_numnodes(&spa->spa_metaslabs_by_flushed), >=, visited); + + spa_log_sm_set_blocklimit(spa); } /* @@ -925,6 +957,7 @@ spa_cleanup_old_sm_logs(spa_t *spa, dmu_tx_t *tx) avl_remove(&spa->spa_sm_logs_by_txg, sls); space_map_free_obj(mos, sls->sls_sm_obj, tx); VERIFY0(zap_remove_int(mos, spacemap_zap, sls->sls_txg, tx)); + spa_log_summary_decrement_blkcount(spa, sls->sls_nblocks); spa->spa_unflushed_stats.sus_nblocks -= sls->sls_nblocks; kmem_free(sls, sizeof (spa_log_sm_t)); } @@ -984,12 +1017,7 @@ spa_generate_syncing_log_sm(spa_t *spa, dmu_tx_t *tx) VERIFY0(space_map_open(&spa->spa_syncing_log_sm, mos, sm_obj, 0, UINT64_MAX, SPA_MINBLOCKSHIFT)); - /* - * If the log space map feature was just enabled, the blocklimit - * has not yet been set. 
- */ - if (spa_log_sm_blocklimit(spa) == 0) - spa_log_sm_set_blocklimit(spa); + spa_log_sm_set_blocklimit(spa); } /* @@ -1115,12 +1143,18 @@ spa_ld_log_sm_cb(space_map_entry_t *sme, void *arg) panic("invalid maptype_t"); break; } + if (!metaslab_unflushed_dirty(ms)) { + metaslab_set_unflushed_dirty(ms, B_TRUE); + spa_log_summary_dirty_flushed_metaslab(spa, + metaslab_unflushed_txg(ms)); + } return (0); } static int spa_ld_log_sm_data(spa_t *spa) { + spa_log_sm_t *sls, *psls; int error = 0; /* @@ -1134,41 +1168,71 @@ spa_ld_log_sm_data(spa_t *spa) ASSERT0(spa->spa_unflushed_stats.sus_memused); hrtime_t read_logs_starttime = gethrtime(); - /* this is a no-op when we don't have space map logs */ - for (spa_log_sm_t *sls = avl_first(&spa->spa_sm_logs_by_txg); - sls; sls = AVL_NEXT(&spa->spa_sm_logs_by_txg, sls)) { - space_map_t *sm = NULL; - error = space_map_open(&sm, spa_meta_objset(spa), - sls->sls_sm_obj, 0, UINT64_MAX, SPA_MINBLOCKSHIFT); - if (error != 0) { - spa_load_failed(spa, "spa_ld_log_sm_data(): failed at " - "space_map_open(obj=%llu) [error %d]", - (u_longlong_t)sls->sls_sm_obj, error); - goto out; + + /* Prefetch log spacemaps dnodes. */ + for (sls = avl_first(&spa->spa_sm_logs_by_txg); sls; + sls = AVL_NEXT(&spa->spa_sm_logs_by_txg, sls)) { + dmu_prefetch(spa_meta_objset(spa), sls->sls_sm_obj, + 0, 0, 0, ZIO_PRIORITY_SYNC_READ); + } + + uint_t pn = 0; + uint64_t ps = 0; + psls = sls = avl_first(&spa->spa_sm_logs_by_txg); + while (sls != NULL) { + /* Prefetch log spacemaps up to 16 TXGs or MBs ahead. 
*/ + if (psls != NULL && pn < 16 && + (pn < 2 || ps < 2 * dmu_prefetch_max)) { + error = space_map_open(&psls->sls_sm, + spa_meta_objset(spa), psls->sls_sm_obj, 0, + UINT64_MAX, SPA_MINBLOCKSHIFT); + if (error != 0) { + spa_load_failed(spa, "spa_ld_log_sm_data(): " + "failed at space_map_open(obj=%llu) " + "[error %d]", + (u_longlong_t)sls->sls_sm_obj, error); + goto out; + } + dmu_prefetch(spa_meta_objset(spa), psls->sls_sm_obj, + 0, 0, space_map_length(psls->sls_sm), + ZIO_PRIORITY_ASYNC_READ); + pn++; + ps += space_map_length(psls->sls_sm); + psls = AVL_NEXT(&spa->spa_sm_logs_by_txg, psls); + continue; } + /* Load TXG log spacemap into ms_unflushed_allocs/frees. */ + cond_resched(); + ASSERT0(sls->sls_nblocks); + sls->sls_nblocks = space_map_nblocks(sls->sls_sm); + spa->spa_unflushed_stats.sus_nblocks += sls->sls_nblocks; + summary_add_data(spa, sls->sls_txg, + sls->sls_mscount, 0, sls->sls_nblocks); + struct spa_ld_log_sm_arg vla = { .slls_spa = spa, .slls_txg = sls->sls_txg }; - error = space_map_iterate(sm, space_map_length(sm), - spa_ld_log_sm_cb, &vla); + error = space_map_iterate(sls->sls_sm, + space_map_length(sls->sls_sm), spa_ld_log_sm_cb, &vla); if (error != 0) { - space_map_close(sm); spa_load_failed(spa, "spa_ld_log_sm_data(): failed " "at space_map_iterate(obj=%llu) [error %d]", (u_longlong_t)sls->sls_sm_obj, error); goto out; } - ASSERT0(sls->sls_nblocks); - sls->sls_nblocks = space_map_nblocks(sm); - spa->spa_unflushed_stats.sus_nblocks += sls->sls_nblocks; - summary_add_data(spa, sls->sls_txg, - sls->sls_mscount, sls->sls_nblocks); + pn--; + ps -= space_map_length(sls->sls_sm); + space_map_close(sls->sls_sm); + sls->sls_sm = NULL; + sls = AVL_NEXT(&spa->spa_sm_logs_by_txg, sls); - space_map_close(sm); + /* Update log block limits considering just loaded. 
*/ + spa_log_sm_set_blocklimit(spa); } + hrtime_t read_logs_endtime = gethrtime(); spa_load_note(spa, "read %llu log space maps (%llu total blocks - blksz = %llu bytes) " @@ -1178,6 +1242,18 @@ spa_ld_log_sm_data(spa_t *spa) (longlong_t)((read_logs_endtime - read_logs_starttime) / 1000000)); out: + if (error != 0) { + for (spa_log_sm_t *sls = avl_first(&spa->spa_sm_logs_by_txg); + sls; sls = AVL_NEXT(&spa->spa_sm_logs_by_txg, sls)) { + if (sls->sls_sm) { + space_map_close(sls->sls_sm); + sls->sls_sm = NULL; + } + } + } else { + ASSERT0(pn); + ASSERT0(ps); + } /* * Now that the metaslabs contain their unflushed changes: * [1] recalculate their actual allocated space @@ -1258,6 +1334,9 @@ spa_ld_unflushed_txgs(vdev_t *vd) } ms->ms_unflushed_txg = entry.msp_unflushed_txg; + ms->ms_unflushed_dirty = B_FALSE; + ASSERT(range_tree_is_empty(ms->ms_unflushed_allocs)); + ASSERT(range_tree_is_empty(ms->ms_unflushed_frees)); if (ms->ms_unflushed_txg != 0) { mutex_enter(&spa->spa_flushed_ms_lock); avl_add(&spa->spa_metaslabs_by_flushed, ms); @@ -1321,6 +1400,10 @@ ZFS_MODULE_PARAM(zfs, zfs_, unflushed_log_block_min, ULONG, ZMOD_RW, "Lower-bound limit for the maximum amount of blocks allowed in " "log spacemap (see zfs_unflushed_log_block_max)"); +ZFS_MODULE_PARAM(zfs, zfs_, unflushed_log_txg_max, ULONG, ZMOD_RW, + "Hard limit (upper-bound) in the size of the space map log " + "in terms of dirty TXGs."); + ZFS_MODULE_PARAM(zfs, zfs_, unflushed_log_block_pct, ULONG, ZMOD_RW, "Tunable used to determine the number of blocks that can be used for " "the spacemap log, expressed as a percentage of the total number of " diff --git a/module/zfs/vdev.c b/module/zfs/vdev.c index a42b79613628..cb79a8b5582c 100644 --- a/module/zfs/vdev.c +++ b/module/zfs/vdev.c @@ -1558,13 +1558,6 @@ vdev_metaslab_init(vdev_t *vd, uint64_t txg) if (txg == 0) spa_config_exit(spa, SCL_ALLOC, FTAG); - /* - * Regardless whether this vdev was just added or it is being - * expanded, the metaslab count has changed. 
Recalculate the - * block limit. - */ - spa_log_sm_set_blocklimit(spa); - return (0); } diff --git a/module/zfs/vdev_removal.c b/module/zfs/vdev_removal.c index f988ca22fa4a..5508d273758d 100644 --- a/module/zfs/vdev_removal.c +++ b/module/zfs/vdev_removal.c @@ -1386,7 +1386,6 @@ vdev_remove_complete(spa_t *spa) vdev_metaslab_fini(vd); metaslab_group_destroy(vd->vdev_mg); vd->vdev_mg = NULL; - spa_log_sm_set_blocklimit(spa); } if (vd->vdev_log_mg != NULL) { ASSERT0(vd->vdev_ms_count); @@ -2131,7 +2130,6 @@ spa_vdev_remove_log(vdev_t *vd, uint64_t *txg) * metaslab_class_histogram_verify() */ vdev_metaslab_fini(vd); - spa_log_sm_set_blocklimit(spa); spa_vdev_config_exit(spa, NULL, *txg, 0, FTAG); *txg = spa_vdev_config_enter(spa); diff --git a/module/zfs/zfeature.c b/module/zfs/zfeature.c index 99a316ae5a90..666d19f2a109 100644 --- a/module/zfs/zfeature.c +++ b/module/zfs/zfeature.c @@ -396,6 +396,13 @@ feature_enable_sync(spa_t *spa, zfeature_info_t *feature, dmu_tx_t *tx) !spa_feature_is_active(spa, SPA_FEATURE_ENCRYPTION) && feature->fi_feature == SPA_FEATURE_BOOKMARK_V2) spa->spa_errata = 0; + + /* + * Convert the old on-disk error log to the new format when activating + * the head_errlog feature. 
+ */ + if (feature->fi_feature == SPA_FEATURE_HEAD_ERRLOG) + spa_upgrade_errlog(spa, tx); } static void diff --git a/module/zfs/zfs_ioctl.c b/module/zfs/zfs_ioctl.c index 5fb6a3071924..e2d6d9dea030 100644 --- a/module/zfs/zfs_ioctl.c +++ b/module/zfs/zfs_ioctl.c @@ -5671,7 +5671,7 @@ zfs_ioc_error_log(zfs_cmd_t *zc) { spa_t *spa; int error; - size_t count = (size_t)zc->zc_nvlist_dst_size; + uint64_t count = zc->zc_nvlist_dst_size; if ((error = spa_open(zc->zc_name, &spa, FTAG)) != 0) return (error); diff --git a/module/zstd/Makefile.in b/module/zstd/Makefile.in deleted file mode 100644 index 80096c3e379d..000000000000 --- a/module/zstd/Makefile.in +++ /dev/null @@ -1,69 +0,0 @@ -ifneq ($(KBUILD_EXTMOD),) -src = @abs_srcdir@ -obj = @abs_builddir@ -zstd_include = $(src)/include -else -zstd_include = $(srctree)/$(src)/include -endif - -MODULE := zzstd - -obj-$(CONFIG_ZFS) := $(MODULE).o - -asflags-y := -I$(zstd_include) -ccflags-y := -I$(zstd_include) - -# Zstd uses -O3 by default, so we should follow -ccflags-y += -O3 - -# -fno-tree-vectorize gets set for gcc in zstd/common/compiler.h -# Set it for other compilers, too. 
-common_flags := -fno-tree-vectorize - -# SSE register return with SSE disabled if -march=znverX is passed -common_flags += -U__BMI__ - -# Quiet warnings about frame size due to unused code in unmodified zstd lib -common_flags += -Wframe-larger-than=20480 - -ccflags-y += $(common_flags) - -vanilla-objs := lib/common/entropy_common.o \ - lib/common/error_private.o \ - lib/common/fse_decompress.o \ - lib/common/pool.o \ - lib/common/zstd_common.o \ - lib/compress/fse_compress.o \ - lib/compress/hist.o \ - lib/compress/huf_compress.o \ - lib/compress/zstd_compress_literals.o \ - lib/compress/zstd_compress_sequences.o \ - lib/compress/zstd_compress_superblock.o \ - lib/compress/zstd_compress.o \ - lib/compress/zstd_double_fast.o \ - lib/compress/zstd_fast.o \ - lib/compress/zstd_lazy.o \ - lib/compress/zstd_ldm.o \ - lib/compress/zstd_opt.o \ - lib/decompress/huf_decompress.o \ - lib/decompress/zstd_ddict.o \ - lib/decompress/zstd_decompress.o \ - lib/decompress/zstd_decompress_block.o - -# Disable aarch64 neon SIMD instructions for kernel mode -$(addprefix $(obj)/,$(vanilla-objs)) : ccflags-y += -include $(zstd_include)/aarch64_compat.h -include $(zstd_include)/zstd_compat_wrapper.h -Wp,-w $(common_flags) - -$(obj)/zfs_zstd.o: ccflags-y += -include $(zstd_include)/zstd_compat_wrapper.h $(common_flags) - -$(MODULE)-objs += zfs_zstd.o -$(MODULE)-objs += zstd_sparc.o -$(MODULE)-objs += $(vanilla-objs) - -all: - mkdir -p lib/common lib/compress lib/decompress - -gensymbols: - for obj in $(vanilla-objs); do echo; echo "/* $$obj: */"; @OBJDUMP@ -t $$obj | awk '$$2 == "g" && !/ zfs_/ {print "#define\t" $$6 " zfs_" $$6}' | sort; done >> include/zstd_compat_wrapper.h - -checksymbols: - @OBJDUMP@ -t $(vanilla-objs) | awk '/file format/ {print} $$2 == "g" && !/ zfs_/ {++ret; print} END {exit ret}' diff --git a/module/zstd/README.md b/module/zstd/README.md index 26d618b61b6e..7ad00e0bd804 100644 --- a/module/zstd/README.md +++ b/module/zstd/README.md @@ -9,7 +9,7 @@ library, 
besides upgrading to a newer ZSTD release. Tree structure: -* `zfs_zstd.c` is the actual `zzstd` kernel module. +* `zfs_zstd.c` are the actual `zfs` kernel module hooks. * `lib/` contains the unmodified version of the `Zstandard` library * `zstd-in.c` is our template file for generating the single-file library * `include/`: This directory contains supplemental includes for platform @@ -25,16 +25,7 @@ To update ZSTD the following steps need to be taken: `grep include [path to zstd]/contrib/single_file_libs/zstd-in.c | awk '{ print $2 }'` 3. Remove debug.c, threading.c, and zstdmt_compress.c. 4. Update Makefiles with resulting file lists. - -~~~ - -Note: if the zstd library for zfs is updated to a newer version, -the macro list in include/zstd_compat_wrapper.h usually needs to be updated. -this can be done with some hand crafting of the output of the following -script (on the object file generated from the "single-file library" script in zstd's -contrib/single_file_libs): -`nm zstd.o | awk '{print "#define "$3 " zfs_" $3}' > macrotable` - +5. Follow symbol renaming notes in `include/zstd_compat_wrapper.h` ## Altering ZSTD and breaking changes diff --git a/module/zstd/include/zstd_compat_wrapper.h b/module/zstd/include/zstd_compat_wrapper.h index de428175c7df..2c4baad27d4e 100644 --- a/module/zstd/include/zstd_compat_wrapper.h +++ b/module/zstd/include/zstd_compat_wrapper.h @@ -38,7 +38,7 @@ * This will cause a symbol collision with the older in-kernel zstd library. * * On update, truncate this file at the scissor line, rebuild the module, - * and make gensymbols. + * and make gen-zstd-symbols. 
*/ #define MEM_MODULE diff --git a/module/zstd/zfs_zstd.c b/module/zstd/zfs_zstd.c index 5ae164663988..04e52ae3cec6 100644 --- a/module/zstd/zfs_zstd.c +++ b/module/zstd/zfs_zstd.c @@ -702,7 +702,7 @@ zstd_meminit(void) } /* Release object from pool and free memory */ -static void __exit +static void release_pool(struct zstd_pool *pool) { mutex_destroy(&pool->barrier); @@ -712,7 +712,7 @@ release_pool(struct zstd_pool *pool) } /* Release memory pool objects */ -static void __exit +static void zstd_mempool_deinit(void) { for (int i = 0; i < ZSTD_POOL_MAX; i++) { @@ -758,7 +758,7 @@ zstd_init(void) return (0); } -extern void __exit +extern void zstd_fini(void) { /* Deinitialize kstat */ @@ -776,12 +776,10 @@ zstd_fini(void) } #if defined(_KERNEL) +#ifdef __FreeBSD__ module_init(zstd_init); module_exit(zstd_fini); - -ZFS_MODULE_DESCRIPTION("ZSTD Compression for ZFS"); -ZFS_MODULE_LICENSE("Dual BSD/GPL"); -ZFS_MODULE_VERSION(ZSTD_VERSION_STRING "a"); +#endif EXPORT_SYMBOL(zfs_zstd_compress); EXPORT_SYMBOL(zfs_zstd_decompress_level); diff --git a/rpm/generic/zfs-kmod.spec.in b/rpm/generic/zfs-kmod.spec.in index 9cfec8ae9d30..d07000cbd638 100644 --- a/rpm/generic/zfs-kmod.spec.in +++ b/rpm/generic/zfs-kmod.spec.in @@ -185,7 +185,7 @@ for kernel_version in %{?kernel_versions}; do cd .. 
done # find-debuginfo.sh only considers executables -chmod u+x ${RPM_BUILD_ROOT}%{kmodinstdir_prefix}/*/extra/*/*/* +chmod u+x ${RPM_BUILD_ROOT}%{kmodinstdir_prefix}/*/extra/*/* %{?akmod_install} diff --git a/rpm/redhat/zfs-kmod.spec.in b/rpm/redhat/zfs-kmod.spec.in index 7162f79b6779..03933bb40a86 100644 --- a/rpm/redhat/zfs-kmod.spec.in +++ b/rpm/redhat/zfs-kmod.spec.in @@ -105,7 +105,7 @@ make install \ %{__rm} -f %{buildroot}/lib/modules/%{kverrel}/modules.* # find-debuginfo.sh only considers executables -%{__chmod} u+x %{buildroot}/lib/modules/%{kverrel}/extra/*/*/* +%{__chmod} u+x %{buildroot}/lib/modules/%{kverrel}/extra/*/* %clean rm -rf $RPM_BUILD_ROOT diff --git a/scripts/Makefile.am b/scripts/Makefile.am index e2772cf1d605..fffeb6b212d5 100644 --- a/scripts/Makefile.am +++ b/scripts/Makefile.am @@ -54,16 +54,9 @@ export INSTALL_MOUNT_HELPER_DIR=@mounthelperdir@ export INSTALL_SYSCONF_DIR=@sysconfdir@ export INSTALL_PYTHON_DIR=@pythonsitedir@ -export KMOD_SPL=@abs_top_builddir@/module/spl/spl.ko -export KMOD_ZAVL=@abs_top_builddir@/module/avl/zavl.ko -export KMOD_ZNVPAIR=@abs_top_builddir@/module/nvpair/znvpair.ko -export KMOD_ZUNICODE=@abs_top_builddir@/module/unicode/zunicode.ko -export KMOD_ZCOMMON=@abs_top_builddir@/module/zcommon/zcommon.ko -export KMOD_ZLUA=@abs_top_builddir@/module/lua/zlua.ko -export KMOD_ICP=@abs_top_builddir@/module/icp/icp.ko -export KMOD_ZFS=@abs_top_builddir@/module/zfs/zfs.ko +export KMOD_SPL=@abs_top_builddir@/module/spl.ko +export KMOD_ZFS=@abs_top_builddir@/module/zfs.ko export KMOD_FREEBSD=@abs_top_builddir@/module/openzfs.ko -export KMOD_ZZSTD=@abs_top_builddir@/module/zstd/zzstd.ko endef export EXTRA_ENVIRONMENT diff --git a/scripts/dkms.mkconf b/scripts/dkms.mkconf index 4090efa087f7..0bd383420435 100755 --- a/scripts/dkms.mkconf +++ b/scripts/dkms.mkconf @@ -77,38 +77,10 @@ STRIP[0]="\$( && echo -n no )" STRIP[1]="\${STRIP[0]}" -STRIP[2]="\${STRIP[0]}" -STRIP[3]="\${STRIP[0]}" -STRIP[4]="\${STRIP[0]}" 
-STRIP[5]="\${STRIP[0]}" -STRIP[6]="\${STRIP[0]}" -STRIP[7]="\${STRIP[0]}" -STRIP[8]="\${STRIP[0]}" -BUILT_MODULE_NAME[0]="zavl" -BUILT_MODULE_LOCATION[0]="module/avl/" -DEST_MODULE_LOCATION[0]="/extra/avl/avl" -BUILT_MODULE_NAME[1]="znvpair" -BUILT_MODULE_LOCATION[1]="module/nvpair/" -DEST_MODULE_LOCATION[1]="/extra/nvpair/znvpair" -BUILT_MODULE_NAME[2]="zunicode" -BUILT_MODULE_LOCATION[2]="module/unicode/" -DEST_MODULE_LOCATION[2]="/extra/unicode/zunicode" -BUILT_MODULE_NAME[3]="zcommon" -BUILT_MODULE_LOCATION[3]="module/zcommon/" -DEST_MODULE_LOCATION[3]="/extra/zcommon/zcommon" -BUILT_MODULE_NAME[4]="zfs" -BUILT_MODULE_LOCATION[4]="module/zfs/" -DEST_MODULE_LOCATION[4]="/extra/zfs/zfs" -BUILT_MODULE_NAME[5]="icp" -BUILT_MODULE_LOCATION[5]="module/icp/" -DEST_MODULE_LOCATION[5]="/extra/icp/icp" -BUILT_MODULE_NAME[6]="zlua" -BUILT_MODULE_LOCATION[6]="module/lua/" -DEST_MODULE_LOCATION[6]="/extra/lua/zlua" -BUILT_MODULE_NAME[7]="spl" -BUILT_MODULE_LOCATION[7]="module/spl/" -DEST_MODULE_LOCATION[7]="/extra/spl/spl" -BUILT_MODULE_NAME[8]="zzstd" -BUILT_MODULE_LOCATION[8]="module/zstd/" -DEST_MODULE_LOCATION[8]="/extra/zstd/zzstd" +BUILT_MODULE_NAME[0]="zfs" +BUILT_MODULE_LOCATION[0]="module/" +DEST_MODULE_LOCATION[0]="/extra" +BUILT_MODULE_NAME[1]="spl" +BUILT_MODULE_LOCATION[1]="module/" +DEST_MODULE_LOCATION[1]="/extra" EOF diff --git a/scripts/zfs.sh b/scripts/zfs.sh index edce2cbd4c64..502c5430ab05 100755 --- a/scripts/zfs.sh +++ b/scripts/zfs.sh @@ -3,7 +3,7 @@ # A simple script to load/unload the ZFS module stack. # -BASE_DIR=$(dirname "$0") +BASE_DIR=${0%/*} SCRIPT_COMMON=common.sh if [ -f "${BASE_DIR}/${SCRIPT_COMMON}" ]; then . 
"${BASE_DIR}/${SCRIPT_COMMON}" @@ -11,7 +11,6 @@ else echo "Missing helper script ${SCRIPT_COMMON}" && exit 1 fi -PROG=zfs.sh VERBOSE="no" UNLOAD="no" LOAD="yes" @@ -19,44 +18,35 @@ STACK_TRACER="no" ZED_PIDFILE=${ZED_PIDFILE:-/var/run/zed.pid} LDMOD=${LDMOD:-/sbin/modprobe} +DELMOD=${DELMOD:-/sbin/rmmod} KMOD_ZLIB_DEFLATE=${KMOD_ZLIB_DEFLATE:-zlib_deflate} KMOD_ZLIB_INFLATE=${KMOD_ZLIB_INFLATE:-zlib_inflate} KMOD_SPL=${KMOD_SPL:-spl} -KMOD_ZAVL=${KMOD_ZAVL:-zavl} -KMOD_ZNVPAIR=${KMOD_ZNVPAIR:-znvpair} -KMOD_ZUNICODE=${KMOD_ZUNICODE:-zunicode} -KMOD_ZCOMMON=${KMOD_ZCOMMON:-zcommon} -KMOD_ZLUA=${KMOD_ZLUA:-zlua} -KMOD_ICP=${KMOD_ICP:-icp} KMOD_ZFS=${KMOD_ZFS:-zfs} KMOD_FREEBSD=${KMOD_FREEBSD:-openzfs} -KMOD_ZZSTD=${KMOD_ZZSTD:-zzstd} usage() { -cat << EOF + cat << EOF USAGE: -$0 [hvudS] [module-options] +$0 [hvudS] DESCRIPTION: Load/unload the ZFS module stack. OPTIONS: - -h Show this message - -v Verbose + -h Show this message + -v Verbose -r Reload modules - -u Unload modules - -S Enable kernel stack tracer + -u Unload modules + -S Enable kernel stack tracer EOF + exit 1 } while getopts 'hvruS' OPTION; do case $OPTION in - h) - usage - exit 1 - ;; v) VERBOSE="yes" ;; @@ -71,18 +61,17 @@ while getopts 'hvruS' OPTION; do S) STACK_TRACER="yes" ;; - ?) 
- usage - exit - ;; *) + usage ;; esac done +shift $(( OPTIND - 1 )) +[ $# -eq 0 ] || usage kill_zed() { if [ -f "$ZED_PIDFILE" ]; then - PID=$(cat "$ZED_PIDFILE") + read -r PID <"$ZED_PIDFILE" kill "$PID" fi } @@ -91,8 +80,7 @@ check_modules_linux() { LOADED_MODULES="" MISSING_MODULES="" - for KMOD in $KMOD_SPL $KMOD_ZAVL $KMOD_ZNVPAIR $KMOD_ZUNICODE $KMOD_ZCOMMON \ - $KMOD_ZLUA $KMOD_ZZSTD $KMOD_ICP $KMOD_ZFS; do + for KMOD in $KMOD_SPL $KMOD_ZFS; do NAME="${KMOD##*/}" NAME="${NAME%.ko}" @@ -106,7 +94,7 @@ check_modules_linux() { done if [ -n "$LOADED_MODULES" ]; then - printf "Unload the kernel modules by running '%s -u':\n" "$PROG" + printf "Unload the kernel modules by running '%s -u':\n" "$0" printf "%b" "$LOADED_MODULES" exit 1 fi @@ -123,10 +111,11 @@ check_modules_linux() { load_module_linux() { KMOD=$1 - FILE=$(modinfo "$KMOD" | awk '/^filename:/ {print $2}') - VERSION=$(modinfo "$KMOD" | awk '/^version:/ {print $2}') + FILE=$(modinfo "$KMOD" 2>&1 | awk 'NR == 1 && /zlib/ && /not found/ {print "(builtin)"; exit} /^filename:/ {print $2}') + [ "$FILE" = "(builtin)" ] && return if [ "$VERBOSE" = "yes" ]; then + VERSION=$(modinfo "$KMOD" | awk '/^version:/ {print $2}') echo "Loading: $FILE ($VERSION)" fi @@ -151,17 +140,7 @@ load_modules_freebsd() { load_modules_linux() { mkdir -p /etc/zfs - if modinfo "$KMOD_ZLIB_DEFLATE" >/dev/null 2>&1; then - modprobe "$KMOD_ZLIB_DEFLATE" >/dev/null 2>&1 - fi - - if modinfo "$KMOD_ZLIB_INFLATE">/dev/null 2>&1; then - modprobe "$KMOD_ZLIB_INFLATE" >/dev/null 2>&1 - fi - - for KMOD in $KMOD_SPL $KMOD_ZAVL $KMOD_ZNVPAIR \ - $KMOD_ZUNICODE $KMOD_ZCOMMON $KMOD_ZLUA $KMOD_ZZSTD \ - $KMOD_ICP $KMOD_ZFS; do + for KMOD in "$KMOD_ZLIB_DEFLATE" "$KMOD_ZLIB_INFLATE" $KMOD_SPL $KMOD_ZFS; do load_module_linux "$KMOD" || return 1 done @@ -172,23 +151,6 @@ load_modules_linux() { return 0 } -unload_module_linux() { - KMOD=$1 - - NAME="${KMOD##*/}" - NAME="${NAME%.ko}" - FILE=$(modinfo "$KMOD" | awk '/^filename:/ {print $2}') - 
VERSION=$(modinfo "$KMOD" | awk '/^version:/ {print $2}') - - if [ "$VERBOSE" = "yes" ]; then - echo "Unloading: $KMOD ($VERSION)" - fi - - rmmod "$NAME" || echo "Failed to unload $NAME" - - return 0 -} - unload_modules_freebsd() { kldunload "$KMOD_FREEBSD" || echo "Failed to unload $KMOD_FREEBSD" @@ -200,33 +162,16 @@ unload_modules_freebsd() { } unload_modules_linux() { - for KMOD in $KMOD_ZFS $KMOD_ICP $KMOD_ZZSTD $KMOD_ZLUA $KMOD_ZCOMMON \ - $KMOD_ZUNICODE $KMOD_ZNVPAIR $KMOD_ZAVL $KMOD_SPL; do + legacy_kmods="icp zzstd zlua zcommon zunicode znvpair zavl" + for KMOD in "$KMOD_ZFS" $legacy_kmods "$KMOD_SPL"; do NAME="${KMOD##*/}" NAME="${NAME%.ko}" - USE_COUNT=$(lsmod | awk '/^'"${NAME}"'/ {print $3}') - - if [ "$USE_COUNT" = "0" ] ; then - unload_module_linux "$KMOD" || return 1 - elif [ "$USE_COUNT" != "" ] ; then - echo "Module ${NAME} is still in use!" - return 1 - fi + ! [ -d "/sys/module/$NAME" ] || $DELMOD "$NAME" || return done - if modinfo "$KMOD_ZLIB_DEFLATE" >/dev/null 2>&1; then - modprobe -r "$KMOD_ZLIB_DEFLATE" >/dev/null 2>&1 - fi - - if modinfo "$KMOD_ZLIB_INFLATE">/dev/null 2>&1; then - modprobe -r "$KMOD_ZLIB_INFLATE" >/dev/null 2>&1 - fi - if [ "$VERBOSE" = "yes" ]; then echo "Successfully unloaded ZFS module stack" fi - - return 0 } stack_clear_linux() { @@ -245,8 +190,7 @@ stack_check_linux() { STACK_LIMIT=15362 if [ -e "$STACK_MAX_SIZE" ]; then - STACK_SIZE=$(cat "$STACK_MAX_SIZE") - + read -r STACK_SIZE <"$STACK_MAX_SIZE" if [ "$STACK_SIZE" -ge "$STACK_LIMIT" ]; then echo echo "Warning: max stack size $STACK_SIZE bytes" @@ -267,15 +211,15 @@ if [ "$UNLOAD" = "yes" ]; then umount -t zfs -a case $UNAME in FreeBSD) - unload_modules_freebsd + unload_modules_freebsd ;; Linux) - stack_check_linux - unload_modules_linux + stack_check_linux + unload_modules_linux ;; *) - echo "unknown system: $UNAME" >&2 - exit 1 + echo "unknown system: $UNAME" >&2 + exit 1 ;; esac fi @@ -287,13 +231,13 @@ if [ "$LOAD" = "yes" ]; then Linux) stack_clear_linux 
check_modules_linux - load_modules_linux "$@" + load_modules_linux udevadm trigger udevadm settle ;; *) - echo "unknown system: $UNAME" >&2 - exit 1 + echo "unknown system: $UNAME" >&2 + exit 1 ;; esac fi diff --git a/scripts/zfs2zol-patch.sed b/scripts/zfs2zol-patch.sed index 99824d6dd4af..2d744cd5de52 100755 --- a/scripts/zfs2zol-patch.sed +++ b/scripts/zfs2zol-patch.sed @@ -19,7 +19,7 @@ s:usr/src/test/zfs-tests/runfiles:tests/runfiles:g s:usr/src/test/zfs-tests/tests/functional:tests/zfs-tests/tests/functional:g s:usr/src/test/zfs-tests/tests/perf:tests/zfs-tests/tests/perf:g s:usr/src/test/test-runner/cmd/run.py:tests/test-runner/cmd/test-runner.py:g -s/usr\/src\/common\/zfs\/\(.*\)\.c/module\/zcommon\/\1.c/g +s:usr/src/common/zfs/\(.*\)\.c:module/zcommon/\1.c:g # crypto framework s:usr/src/common/crypto:module/icp/algs:g diff --git a/tests/runfiles/common.run b/tests/runfiles/common.run index 32a9688dcc96..5f6880b25198 100644 --- a/tests/runfiles/common.run +++ b/tests/runfiles/common.run @@ -490,6 +490,7 @@ tags = ['functional', 'cli_root', 'zpool_split'] [tests/functional/cli_root/zpool_status] tests = ['zpool_status_001_pos', 'zpool_status_002_pos', + 'zpool_status_003_pos', 'zpool_status_004_pos', 'zpool_status_features_001_pos'] tags = ['functional', 'cli_root', 'zpool_status'] diff --git a/tests/runfiles/linux.run b/tests/runfiles/linux.run index 8412a7ea5a95..3985da146044 100644 --- a/tests/runfiles/linux.run +++ b/tests/runfiles/linux.run @@ -90,7 +90,7 @@ tests = ['events_001_pos', 'events_002_pos', 'zed_rc_filter', 'zed_fd_spill'] tags = ['functional', 'events'] [tests/functional/fallocate:Linux] -tests = ['fallocate_prealloc'] +tests = ['fallocate_prealloc', 'fallocate_zero-range'] tags = ['functional', 'fallocate'] [tests/functional/fault:Linux] diff --git a/tests/zfs-tests/cmd/file/file_write.c b/tests/zfs-tests/cmd/file/file_write.c index 61101b7b0653..8791c151f66f 100644 --- a/tests/zfs-tests/cmd/file/file_write.c +++ 
b/tests/zfs-tests/cmd/file/file_write.c @@ -251,7 +251,7 @@ usage(char *prog) "\t[-s offset] [-c write_count] [-d data]\n\n" "Where [data] equal to zero causes chars " "0->%d to be repeated throughout, or [data]\n" - "equal to 'R' for psudorandom data.\n", + "equal to 'R' for pseudorandom data.\n", prog, DATA_RANGE); exit(1); diff --git a/tests/zfs-tests/include/libtest.shlib b/tests/zfs-tests/include/libtest.shlib index bf8e2e1a41c7..7ad58ff26d33 100644 --- a/tests/zfs-tests/include/libtest.shlib +++ b/tests/zfs-tests/include/libtest.shlib @@ -3705,7 +3705,6 @@ function set_tunable_impl typeset name="$1" typeset value="$2" typeset mdb_cmd="$3" - typeset module="${4:-zfs}" eval "typeset tunable=\$$name" case "$tunable" in @@ -3724,14 +3723,13 @@ function set_tunable_impl case "$UNAME" in Linux) - typeset zfs_tunables="/sys/module/$module/parameters" + typeset zfs_tunables="/sys/module/zfs/parameters" echo "$value" >"$zfs_tunables/$tunable" ;; FreeBSD) sysctl vfs.zfs.$tunable=$value ;; SunOS) - [[ "$module" -eq "zfs" ]] || return 1 echo "${tunable}/${mdb_cmd}0t${value}" | mdb -kw ;; esac @@ -4045,6 +4043,22 @@ function punch_hole # offset length file esac } +function zero_range # offset length file +{ + typeset offset=$1 + typeset length=$2 + typeset file=$3 + + case "$UNAME" in + Linux) + fallocate --zero-range --offset $offset --length $length "$file" + ;; + *) + false + ;; + esac +} + # # Wait for the specified arcstat to reach non-zero quiescence. 
# If echo is 1 echo the value after reaching quiescence, otherwise diff --git a/tests/zfs-tests/tests/functional/cli_root/zfs_rename/zfs_rename_014_neg.ksh b/tests/zfs-tests/tests/functional/cli_root/zfs_rename/zfs_rename_014_neg.ksh index 1c962608d784..57bae24277e1 100755 --- a/tests/zfs-tests/tests/functional/cli_root/zfs_rename/zfs_rename_014_neg.ksh +++ b/tests/zfs-tests/tests/functional/cli_root/zfs_rename/zfs_rename_014_neg.ksh @@ -81,7 +81,7 @@ function nesting_cleanup # before resetting it, it will be left at the modified # value for the remaining tests. That's the reason # we reset it again here just in case. - log_must set_tunable_impl MAX_DATASET_NESTING 50 Z zcommon + log_must set_tunable64 MAX_DATASET_NESTING 50 Z } log_onexit nesting_cleanup @@ -93,13 +93,13 @@ log_must zfs create -p $TESTPOOL/$dsC16 log_mustnot zfs rename $TESTPOOL/$dsA02 $TESTPOOL/$dsB15A # extend limit -log_must set_tunable_impl MAX_DATASET_NESTING 64 Z zcommon +log_must set_tunable64 MAX_DATASET_NESTING 64 Z log_mustnot zfs rename $TESTPOOL/$dsA02 $TESTPOOL/$dsB16A log_must zfs rename $TESTPOOL/$dsA02 $TESTPOOL/$dsB15A # bring back old limit -log_must set_tunable_impl MAX_DATASET_NESTING 50 Z zcommon +log_must set_tunable64 MAX_DATASET_NESTING 50 Z log_mustnot zfs rename $TESTPOOL/$dsC01 $TESTPOOL/$dsB15A47C log_must zfs rename $TESTPOOL/$dsB15A47A $TESTPOOL/$dsB15A47B diff --git a/tests/zfs-tests/tests/functional/cli_root/zpool_get/zpool_get.cfg b/tests/zfs-tests/tests/functional/cli_root/zpool_get/zpool_get.cfg index 9db993a7e82e..34c74b49190c 100644 --- a/tests/zfs-tests/tests/functional/cli_root/zpool_get/zpool_get.cfg +++ b/tests/zfs-tests/tests/functional/cli_root/zpool_get/zpool_get.cfg @@ -108,5 +108,6 @@ if is_linux || is_freebsd; then "feature@livelist" "feature@zstd_compress" "feature@zilsaxattr" + "feature@head_errlog" ) fi diff --git a/tests/zfs-tests/tests/functional/cli_root/zpool_status/Makefile.am 
b/tests/zfs-tests/tests/functional/cli_root/zpool_status/Makefile.am index 5553061c67b3..538c5d6c163c 100644 --- a/tests/zfs-tests/tests/functional/cli_root/zpool_status/Makefile.am +++ b/tests/zfs-tests/tests/functional/cli_root/zpool_status/Makefile.am @@ -4,4 +4,6 @@ dist_pkgdata_SCRIPTS = \ cleanup.ksh \ zpool_status_001_pos.ksh \ zpool_status_002_pos.ksh \ + zpool_status_003_pos.ksh \ + zpool_status_004_pos.ksh \ zpool_status_features_001_pos.ksh diff --git a/tests/zfs-tests/tests/functional/cli_root/zpool_status/zpool_status_003_pos.ksh b/tests/zfs-tests/tests/functional/cli_root/zpool_status/zpool_status_003_pos.ksh new file mode 100755 index 000000000000..e0c2ed669e32 --- /dev/null +++ b/tests/zfs-tests/tests/functional/cli_root/zpool_status/zpool_status_003_pos.ksh @@ -0,0 +1,70 @@ +#!/bin/ksh -p +# +# CDDL HEADER START +# +# The contents of this file are subject to the terms of the +# Common Development and Distribution License (the "License"). +# You may not use this file except in compliance with the License. +# +# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE +# or http://www.opensolaris.org/os/licensing. +# See the License for the specific language governing permissions +# and limitations under the License. +# +# When distributing Covered Code, include this CDDL HEADER in each +# file and include the License file at usr/src/OPENSOLARIS.LICENSE. +# If applicable, add the following below this CDDL HEADER, with the +# fields enclosed by brackets "[]" replaced with your own identifying +# information: Portions Copyright [yyyy] [name of copyright owner] +# +# CDDL HEADER END +# + +# +# Copyright (c) 2019, Delphix. All rights reserved. +# Copyright (c) 2021, George Amanakis. All rights reserved. +# + +. $STF_SUITE/include/libtest.shlib + +# +# DESCRIPTION: +# Verify correct output with 'zpool status -v' after corrupting a file +# +# STRATEGY: +# 1. Create a pool and a file +# 2. zinject checksum errors +# 3. Read the file +# 4. 
Take a snapshot and make a clone +# 5. Verify we see "snapshot, clone and filesystem" output in 'zpool status -v' + +function cleanup +{ + log_must zinject -c all + datasetexists $TESTPOOL2 && log_must zpool destroy $TESTPOOL2 + rm -f $TESTDIR/vdev_a +} + +verify_runnable "both" + +log_assert "Verify correct 'zpool status -v' output with a corrupted file" +log_onexit cleanup + +truncate -s $MINVDEVSIZE $TESTDIR/vdev_a +log_must zpool create -f $TESTPOOL2 $TESTDIR/vdev_a + +log_must fio --rw=write --name=job --size=10M --filename=/$TESTPOOL2/10m_file +log_must zinject -t data -e checksum -f 100 -am /$TESTPOOL2/10m_file + +# Try to read 10m_file +dd if=/$TESTPOOL2/10m_file bs=1M || true + +log_must zfs snapshot $TESTPOOL2@snap +log_must zfs clone $TESTPOOL2@snap $TESTPOOL2/clone + +# Check that the snapshot, clone and filesystem all report errors for the file +log_must eval "zpool status -v | grep '$TESTPOOL2@snap:/10m_file'" +log_must eval "zpool status -v | grep '$TESTPOOL2/clone/10m_file'" +log_must eval "zpool status -v | grep '$TESTPOOL2/10m_file'" + +log_pass "'zpool status -v' outputs affected filesystem, snapshot & clone" diff --git a/tests/zfs-tests/tests/functional/cli_root/zpool_status/zpool_status_004_pos.ksh b/tests/zfs-tests/tests/functional/cli_root/zpool_status/zpool_status_004_pos.ksh new file mode 100755 index 000000000000..6d8571950eec --- /dev/null +++ b/tests/zfs-tests/tests/functional/cli_root/zpool_status/zpool_status_004_pos.ksh @@ -0,0 +1,81 @@ +#!/bin/ksh -p +# +# CDDL HEADER START +# +# The contents of this file are subject to the terms of the +# Common Development and Distribution License (the "License"). +# You may not use this file except in compliance with the License. +# +# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE +# or http://www.opensolaris.org/os/licensing. +# See the License for the specific language governing permissions +# and limitations under the License. 
+# +# When distributing Covered Code, include this CDDL HEADER in each +# file and include the License file at usr/src/OPENSOLARIS.LICENSE. +# If applicable, add the following below this CDDL HEADER, with the +# fields enclosed by brackets "[]" replaced with your own identifying +# information: Portions Copyright [yyyy] [name of copyright owner] +# +# CDDL HEADER END +# + +# +# Copyright (c) 2019, by Delphix. All rights reserved. +# Copyright (c) 2021, George Amanakis. All rights reserved. +# + +. $STF_SUITE/include/libtest.shlib + +# +# DESCRIPTION: +# Verify feature@head_errlog=disabled works. +# +# STRATEGY: +# 1. Create a pool with feature@head_errlog=disabled and a file +# 2. zinject checksum errors +# 3. Read the file +# 4. Take a snapshot and make a clone +# 5. Verify that zpool status displays the old behaviour. + +function cleanup +{ + log_must zinject -c all + datasetexists $TESTPOOL2 && log_must zpool destroy $TESTPOOL2 + rm -f $TESTDIR/vdev_a +} + +verify_runnable "both" + +log_assert "Verify 'zpool status -v' with feature@head_errlog=disabled works" +log_onexit cleanup + +truncate -s $MINVDEVSIZE $TESTDIR/vdev_a +log_must zpool create -f -o feature@head_errlog=disabled $TESTPOOL2 $TESTDIR/vdev_a + +state=$(zpool list -Ho feature@head_errlog $TESTPOOL2) +if [[ "$state" != "disabled" ]]; then + log_fail "head_errlog has state $state" +fi + +log_must fio --rw=write --name=job --size=10M --filename=/$TESTPOOL2/10m_file +log_must zinject -t data -e checksum -f 100 -am /$TESTPOOL2/10m_file + +# Try to read the file +dd if=/$TESTPOOL2/10m_file bs=1M || true + +log_must zfs snapshot $TESTPOOL2@snap +log_must zfs clone $TESTPOOL2@snap $TESTPOOL2/clone + +# Check that snapshot and clone do not report the error. 
+log_mustnot eval "zpool status -v | grep '$TESTPOOL2@snap:/10m_file'"
+log_mustnot eval "zpool status -v | grep '$TESTPOOL2/clone/10m_file'"
+log_must eval "zpool status -v | grep '$TESTPOOL2/10m_file'"
+
+# Check that enabling the feature reports the error properly.
+log_must zpool set feature@head_errlog=enabled $TESTPOOL2
+log_must eval "zpool status -v | grep '$TESTPOOL2@snap:/10m_file'"
+log_must eval "zpool status -v | grep '$TESTPOOL2/clone/10m_file'"
+log_must eval "zpool status -v | grep '$TESTPOOL2/10m_file'"
+
+log_pass "'zpool status -v' with feature@head_errlog=disabled works"
diff --git a/tests/zfs-tests/tests/functional/cli_user/misc/zfs_001_neg.ksh b/tests/zfs-tests/tests/functional/cli_user/misc/zfs_001_neg.ksh
index 9188e4ba6350..ec91ded976ff 100755
--- a/tests/zfs-tests/tests/functional/cli_user/misc/zfs_001_neg.ksh
+++ b/tests/zfs-tests/tests/functional/cli_user/misc/zfs_001_neg.ksh
@@ -55,6 +55,6 @@ TEMPFILE="$TEST_BASE_DIR/zfs_001_neg.$$.txt"
 zfs > $TEMPFILE 2>&1
 log_must grep "usage: zfs command args" "$TEMPFILE"
 
-log_must awk '{if (length($0) > 80) exit 1}' $TEMPFILE
+log_must awk 'length($0) > 80 {print; ++err} END {exit err}' $TEMPFILE
 
 log_pass "zfs shows a usage message when run as a user"
diff --git a/tests/zfs-tests/tests/functional/fallocate/Makefile.am b/tests/zfs-tests/tests/functional/fallocate/Makefile.am
index 5ff366d2482c..86364d7895dd 100644
--- a/tests/zfs-tests/tests/functional/fallocate/Makefile.am
+++ b/tests/zfs-tests/tests/functional/fallocate/Makefile.am
@@ -3,4 +3,5 @@ dist_pkgdata_SCRIPTS = \
 	setup.ksh \
 	cleanup.ksh \
 	fallocate_prealloc.ksh \
-	fallocate_punch-hole.ksh
+	fallocate_punch-hole.ksh \
+	fallocate_zero-range.ksh
diff --git a/tests/zfs-tests/tests/functional/fallocate/fallocate_punch-hole.ksh b/tests/zfs-tests/tests/functional/fallocate/fallocate_punch-hole.ksh
index ed83561bd556..92f4552f5bd7 100755
--- a/tests/zfs-tests/tests/functional/fallocate/fallocate_punch-hole.ksh
+++ b/tests/zfs-tests/tests/functional/fallocate/fallocate_punch-hole.ksh
@@ -60,13 +60,17 @@ function cleanup
 	[[ -e $TESTDIR ]] && log_must rm -f $FILE
 }
 
-function check_disk_size
+function check_reported_size
 {
 	typeset expected_size=$1
 
-	disk_size=$(du $TESTDIR/file | awk '{print $1}')
-	if [ $disk_size -ne $expected_size ]; then
-		log_fail "Incorrect size: $disk_size != $expected_size"
+	if ! [ -e "${FILE}" ]; then
+		log_fail "$FILE does not exist"
+	fi
+
+	reported_size=$(du "${FILE}" | awk '{print $1}')
+	if [ "$reported_size" != "$expected_size" ]; then
+		log_fail "Incorrect reported size: $reported_size != $expected_size"
 	fi
 }
 
@@ -74,9 +78,9 @@ function check_apparent_size
 {
 	typeset expected_size=$1
 
-	apparent_size=$(stat_size)
-	if [ $apparent_size -ne $expected_size ]; then
-		log_fail "Incorrect size: $apparent_size != $expected_size"
+	apparent_size=$(stat_size "${FILE}")
+	if [ "$apparent_size" != "$expected_size" ]; then
+		log_fail "Incorrect apparent size: $apparent_size != $expected_size"
 	fi
 }
 
@@ -86,25 +90,30 @@ log_onexit cleanup
 
 # Create a dense file and check it is the correct size.
 log_must file_write -o create -f $FILE -b $BLKSZ -c 8
-log_must check_disk_size $((131072 * 8))
+sync_pool $TESTPOOL
+log_must check_reported_size 1027
 
 # Punch a hole for the first full block.
 log_must punch_hole 0 $BLKSZ $FILE
-log_must check_disk_size $((131072 * 7))
+sync_pool $TESTPOOL
+log_must check_reported_size 899
 
 # Partially punch a hole in the second block.
 log_must punch_hole $BLKSZ $((BLKSZ / 2)) $FILE
-log_must check_disk_size $((131072 * 7))
+sync_pool $TESTPOOL
+log_must check_reported_size 899
 
-# Punch a hole which overlaps the third and forth block.
+# Punch a hole which overlaps the third and fourth block.
 log_must punch_hole $(((BLKSZ * 2) + (BLKSZ / 2))) $((BLKSZ)) $FILE
-log_must check_disk_size $((131072 * 7))
+sync_pool $TESTPOOL
+log_must check_reported_size 899
 
 # Punch a hole from the fifth block past the end of file. The apparent
 # file size should not change since --keep-size is implied.
 apparent_size=$(stat_size $FILE)
 log_must punch_hole $((BLKSZ * 4)) $((BLKSZ * 10)) $FILE
-log_must check_disk_size $((131072 * 4))
+sync_pool $TESTPOOL
+log_must check_reported_size 387
 log_must check_apparent_size $apparent_size
 
 log_pass "Ensure holes can be punched in files making them sparse"
diff --git a/tests/zfs-tests/tests/functional/fallocate/fallocate_zero-range.ksh b/tests/zfs-tests/tests/functional/fallocate/fallocate_zero-range.ksh
new file mode 100755
index 000000000000..e907b0f5d4c4
--- /dev/null
+++ b/tests/zfs-tests/tests/functional/fallocate/fallocate_zero-range.ksh
@@ -0,0 +1,119 @@
+#!/bin/ksh -p
+#
+# CDDL HEADER START
+#
+# The contents of this file are subject to the terms of the
+# Common Development and Distribution License (the "License").
+# You may not use this file except in compliance with the License.
+#
+# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
+# or http://www.opensolaris.org/os/licensing.
+# See the License for the specific language governing permissions
+# and limitations under the License.
+#
+# When distributing Covered Code, include this CDDL HEADER in each
+# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
+# If applicable, add the following below this CDDL HEADER, with the
+# fields enclosed by brackets "[]" replaced with your own identifying
+# information: Portions Copyright [yyyy] [name of copyright owner]
+#
+# CDDL HEADER END
+#
+
+#
+# Copyright (c) 2020 by Lawrence Livermore National Security, LLC.
+# Copyright (c) 2021 by The FreeBSD Foundation.
+#
+
+. $STF_SUITE/include/libtest.shlib
+
+#
+# DESCRIPTION:
+# Test FALLOC_FL_ZERO_RANGE functionality
+#
+# STRATEGY:
+# 1. Create a dense file
+# 2. Zero various ranges in the file and verify the result.
+#
+
+verify_runnable "global"
+
+if is_freebsd; then
+	log_unsupported "FreeBSD does not implement an analogue to ZERO_RANGE."
+fi
+
+FILE=$TESTDIR/$TESTFILE0
+BLKSZ=$(get_prop recordsize $TESTPOOL)
+
+function cleanup
+{
+	[[ -e $TESTDIR ]] && log_must rm -f $FILE
+}
+
+# Helpfully, this function expects kilobytes, and check_apparent_size expects bytes.
+function check_reported_size
+{
+	typeset expected_size=$1
+
+	if ! [ -e "${FILE}" ]; then
+		log_fail "$FILE does not exist"
+	fi
+
+	reported_size=$(du "${FILE}" | awk '{print $1}')
+	if [ "$reported_size" != "$expected_size" ]; then
+		log_fail "Incorrect reported size: $reported_size != $expected_size"
+	fi
+}
+
+function check_apparent_size
+{
+	typeset expected_size=$1
+
+	apparent_size=$(stat_size "${FILE}")
+	if [ "$apparent_size" != "$expected_size" ]; then
+		log_fail "Incorrect apparent size: $apparent_size != $expected_size"
+	fi
+}
+
+log_assert "Ensure ranges can be zeroed in files"
+
+log_onexit cleanup
+
+# Create a dense file and check it is the correct size.
+log_must file_write -o create -f $FILE -b $BLKSZ -c 8
+sync_pool $TESTPOOL
+log_must check_reported_size 1027
+
+# Zero a range covering the first full block.
+log_must zero_range 0 $BLKSZ $FILE
+sync_pool $TESTPOOL
+log_must check_reported_size 899
+
+# Partially zero a range in the second block.
+log_must zero_range $BLKSZ $((BLKSZ / 2)) $FILE
+sync_pool $TESTPOOL
+log_must check_reported_size 899
+
+# Zero range which overlaps the third and fourth block.
+log_must zero_range $(((BLKSZ * 2) + (BLKSZ / 2))) $((BLKSZ)) $FILE
+sync_pool $TESTPOOL
+log_must check_reported_size 899
+
+# Zero range from the fifth block past the end of file, with --keep-size.
+# The apparent file size must not change, since we did specify --keep-size.
+apparent_size=$(stat_size $FILE)
+log_must fallocate --keep-size --zero-range --offset $((BLKSZ * 4)) --length $((BLKSZ * 10)) "$FILE"
+sync_pool $TESTPOOL
+log_must check_reported_size 387
+log_must check_apparent_size $apparent_size
+
+# Zero range from the fifth block past the end of file. The apparent
+# file size should change since --keep-size is not implied, unlike
+# with PUNCH_HOLE.
+apparent_size=$(stat_size $FILE)
+log_must zero_range $((BLKSZ * 4)) $((BLKSZ * 10)) $FILE
+sync_pool $TESTPOOL
+log_must check_reported_size 387
+log_must check_apparent_size $((BLKSZ * 14))
+
+log_pass "Ensure ranges can be zeroed in files"
diff --git a/tests/zfs-tests/tests/functional/fallocate/setup.ksh b/tests/zfs-tests/tests/functional/fallocate/setup.ksh
index 32334d396865..586ac026aa43 100755
--- a/tests/zfs-tests/tests/functional/fallocate/setup.ksh
+++ b/tests/zfs-tests/tests/functional/fallocate/setup.ksh
@@ -26,4 +26,7 @@
 . $STF_SUITE/include/libtest.shlib
 
 DISK=${DISKS%% *}
-default_setup $DISK
+default_setup_noexit $DISK
+log_must zfs set compression=off $TESTPOOL
+log_pass
+
diff --git a/tests/zfs-tests/tests/functional/simd/simd_supported.ksh b/tests/zfs-tests/tests/functional/simd/simd_supported.ksh
index 8b45e51bc257..1c89824e02fd 100755
--- a/tests/zfs-tests/tests/functional/simd/simd_supported.ksh
+++ b/tests/zfs-tests/tests/functional/simd/simd_supported.ksh
@@ -32,7 +32,7 @@
 #
 # STRATEGY:
 # 1. Test if we are running on a Linux x86 system with SSE support
-# 2. If so, check if the zfs_fletcher_4_impl module parameter contains
+# 2. If so, check if the zfs_fletcher_4_impl module parameter contains
 # a sse implementation
 # 3. If not fail the test, otherwise pass it
 
@@ -44,7 +44,7 @@ fi
 
 case "$(uname -m)" in
 i?86|x86_64)
-	typeset -R modparam="/sys/module/zcommon/parameters/zfs_fletcher_4_impl"
+	typeset -R modparam="/sys/module/zfs/parameters/zfs_fletcher_4_impl"
 	if awk '/^flags/ {exit !/sse/}' /proc/cpuinfo; then
 		log_must grep -q sse "$modparam"
 		log_pass "SIMD instructions supported"