Fixes for spurious failures of resilver_restart_001 test #1

jwpoduska · 2019-12-09T12:55:45Z

Signed-off-by: John Poduska [email protected]
Closes openzfs#9677

Motivation and Context

This PR fixes spurious failures to a new test introduced by openzfs#9588, as reported in openzfs#9677.

Description

The resilver restart test was reported as failing about 2% of the time.

Issues found:

The event log wasn't large enough, so resilver events were missing
One 'zpool sync' wasn't enough for resilver to start after zinject
Comment capitalization inconsistencies we fixed as well

How Has This Been Tested?

The test has been run by itself hundreds of times without failure.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the ZFS on Linux code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
All new and existing tests passed.
All commit messages are properly formatted and contain Signed-off-by.

Currently the best way to wait for the completion of a long-running operation in a pool, like a scrub or device removal, is to poll 'zpool status' and parse its output, which is neither efficient nor convenient. This change adds a 'wait' subcommand to the zpool command. When invoked, 'zpool wait' will block until a specified type of background activity completes. Currently, this subcommand can wait for any of the following: - Scrubs or resilvers to complete - Devices to initialized - Devices to be replaced - Devices to be removed - Checkpoints to be discarded - Background freeing to complete For example, a scrub that is in progress could be waited for by running zpool wait -t scrub <pool> This also adds a -w flag to the attach, checkpoint, initialize, replace, remove, and scrub subcommands. When used, this flag makes the operations kicked off by these subcommands synchronous instead of asynchronous. This functionality is implemented using a new ioctl. The type of activity to wait for is provided as input to the ioctl, and the ioctl blocks until all activity of that type has completed. An ioctl was used over other methods of kernel-userspace communiction primarily for the sake of portability. Porting Notes: This is ported from Delphix OS change DLPX-44432. The following changes were made while porting: - Added ZoL-style ioctl input declaration. - Reorganized error handling in zpool_initialize in libzfs to integrate better with changes made for TRIM support. - Fixed check for whether a checkpoint discard is in progress. Previously it also waited if the pool had a checkpoint, instead of just if a checkpoint was being discarded. - Exposed zfs_initialize_chunk_size as a ZoL-style tunable. - Updated more existing tests to make use of new 'zpool wait' functionality, tests that don't exist in Delphix OS. - Used existing ZoL tunable zfs_scan_suspend_progress, together with zinject, in place of a new tunable zfs_scan_max_blks_per_txg. - Added support for a non-integral interval argument to zpool wait. Future work: ZoL has support for trimming devices, which Delphix OS does not. In the future, 'zpool wait' could be extended to add the ability to wait for trim operations to complete. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: John Gallagher <[email protected]> Closes openzfs#9162

openzfs#9321) Originally the zfs_vdev_elevator module option was added as a convenience so the requested elevator would be automatically set on the underlying block devices. At the time this was simple because the kernel provided an API function which did exactly this. This API was then removed in the Linux 4.12 kernel which prompted us to add compatibly code to set the elevator via a usermodehelper. Unfortunately changing the evelator via usermodehelper requires reading some userland binaries, most notably modprobe(8) or sh(1), from a zfs dataset on systems with root-on-zfs. This can deadlock the system if used during the following call path because it may need, if the data is not already cached in the ARC, reading directly from disk while holding the spa config lock as a writer: zfs_ioc_pool_scan() -> spa_scan() -> spa_scan() -> vdev_reopen() -> vdev_elevator_switch() -> call_usermodehelper() While the usermodehelper waits sh(1), modprobe(8) is blocked in the ZIO pipeline trying to read from disk: INFO: task modprobe:2650 blocked for more than 10 seconds. Tainted: P OE 5.2.14 modprobe D 0 2650 206 0x00000000 Call Trace: ? __schedule+0x244/0x5f0 schedule+0x2f/0xa0 cv_wait_common+0x156/0x290 [spl] ? do_wait_intr_irq+0xb0/0xb0 spa_config_enter+0x13b/0x1e0 [zfs] zio_vdev_io_start+0x51d/0x590 [zfs] ? tsd_get_by_thread+0x3b/0x80 [spl] zio_nowait+0x142/0x2f0 [zfs] arc_read+0xb2d/0x19d0 [zfs] ... zpl_iter_read+0xfa/0x170 [zfs] new_sync_read+0x124/0x1b0 vfs_read+0x91/0x140 ksys_read+0x59/0xd0 do_syscall_64+0x4f/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This commit changes how we use the usermodehelper functionality from synchronous (UMH_WAIT_PROC) to asynchronous (UMH_NO_WAIT) which prevents scrubs, and other vdev_elevator_switch() consumers, from triggering the aforementioned issue. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Issue openzfs#8664 Closes openzfs#9321

Currently, spa_keystore_change_key_sync_impl() does not recurse into clones when updating encryption roots for either a call to 'zfs promote' or 'zfs change-key'. This can cause children of these clones to end up in a state where they point to the wrong dataset as the encryption root. It can also trigger ASSERTs in some cases where the code checks reference counts on wrapping keys. This patch fixes this issue by ensuring that this function properly recurses into clones during processing. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes openzfs#9267 Closes openzfs#9294

Since 4f342e4 env(1) must be able to find a "python2" executable in the "constrained path" on systems configured with --with-python=2.x otherwise the ZFS Test Suite won't be able to use Python scripts. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: loli10K <[email protected]> Closes openzfs#9325

This commit fixes the following build failure detected on Debian9 (GCC 6.3.0): CC [M] module/zfs/spa.o module/zfs/spa.c: In function ‘spa_wait_common.part.31’: module/zfs/spa.c:9468:6: error: ‘in_progress’ may be used uninitialized in this function [-Werror=maybe-uninitialized] if (!in_progress || spa->spa_waiters_cancel || error) ^ cc1: all warnings being treated as errors Reviewed-by: Chris Dunlop <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Gallagher <[email protected]> Signed-off-by: loli10K <[email protected]> Closes openzfs#9326

This commit fixes a NULL pointer dereference triggered in spa_vdev_remove_top_check() by trying to "zpool remove" an indirect vdev. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes openzfs#9327

The was incorrect with respect to swapping dataset IDs both in the on-disk ZAP object and the in-memory queue. In both cases, if ds1 was already present, then it would be first replaced with ds2 and then ds would be replaced back with ds1. Also, both cases did not properly handle a situation where both ds1 and ds2 are already queued. A duplicate insertion would be attempted and its failure would result in a panic. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Andriy Gapon <[email protected]> Closes openzfs#9140 Closes openzfs#9163

Move the trailing newlines from the error message strings to the format strings to more closely match the other error messages. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#9330

When a disk is replaced with another on a pool with the resilver_defer feature present, but not enabled the resilver activity restarts during each spa_sync. This patch checks to make sure that the resilver_defer feature is first enabled before requesting a deferred resilver. This was originally fixed in illumos-joyent as OS-7982. Reviewed-by: Chris Dunlop <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Reviewed by: Jerry Jelinek <[email protected]> Signed-off-by: Kody A Kantor <[email protected]> External-issue: illumos-joyent OS-7982 Closes openzfs#9299 Closes openzfs#9338

The difference between the sizes could be positive or negative. Leaving the types as unsigned means the result overflows when the difference is negative and removing the labs() means we'll have introduced a bug. The subtraction results in the correct value when the unsigned integer is interpreted as a signed integer by labs(). Clang doesn't see that we're doing a subtraction and abusing the types. It sees the result of the subtraction, an unsigned value, being passed to an absolute value function and emits a warning which we treat as an error. Reviewed by: Youzhong Yang <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#9355

Trying to 'zfs diff' a snapshot with large dnodes will incorrectly try to access its interior slots when dnodesize > sizeof(dnode_phys_t). This is normally not an issue because the interior slots are zero-filled, which report_dnode() handles calling report_free_dnode_range(). However this is not the case for encrypted large dnodes or filesystem using many SA based xattrs where the extra data past the legacy dnode size boundary is interpreted as a dnode_phys_t. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: loli10K <[email protected]> Closes openzfs#7678 Closes openzfs#8931 Closes openzfs#9343

Refactor the zvol in to platform dependent and independent bits. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9295

Originally the zfs_vdev_elevator module option was added as a convenience so the requested elevator would be automatically set on the underlying block devices. At the time this was simple because the kernel provided an API function which did exactly this. This API was then removed in the Linux 4.12 kernel which prompted us to add compatibly code to set the elevator via a usermodehelper. While well intentioned this introduced a bug which could cause a system hang, that issue was subsequently fixed by commit 2a0d418. In order to avoid future bugs in this area, and to simplify the code, this functionality is being deprecated. A console warning has been added to notify any existing consumers and the documentation updated accordingly. This option will remain for the lifetime of the 0.8.x series for compatibility but if planned to be phased out of master. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#8664 Closes openzfs#9317

When the xattr/cleanup.ksh script is unable to remove the test group due to an active process then it will not call default_cleanup. This will result in a zvol_ENOSPC/setup failure when attempting to create the /mnt/testdir directory which will already exist. Resolve the issue by performing the default_cleanup before removing the test user and group to ensure this step always happens. Also allow one more retry to further minimize the likelihood of the cleanup failing. Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#9358

lot_must -> log_must Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed by: Sara Hartse <[email protected]> Reviewed-by: John Kennedy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#9362

Currently, the recv_fix_encryption_hierarchy() function accepts 'destsnap' as one of its parameters. Originally, this was intended to be the top-level dataset of a receive (whether or not the receive was recursive). Unfortunately, this parameter actually is simply the input that is passed in from the command line. When the user specifies 'zfs recv -d', this string is actually only the name of the receiving pool since the rest of the name is derived from the send stream. This causes the function to fail, leaving some datasets with an invalid encryption hierarchy. This patch resolves this problem by passing in the top_zfs variable instead. In order to make this work, this patch also includes some changes that ensure the value is always present when we need it. Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes openzfs#9273 Closes openzfs#9309

Allow ZED notification via slack incoming webhook. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Elling <[email protected]> Signed-off-by: Ben McGough <[email protected]> Closes openzfs#9076 Closes openzfs#9350

Refactor the zfs ioctls in to platform dependent and independent bits. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Sean Eric Fagan <[email protected]> Signed-off-by: Matthew Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#9301

Factor Linux specific functions out of the zpool command. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Sean Eric Fagan <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9333

We've seen cases where after creating a ZVOL, the ZVOL device node in "/dev" isn't generated after 20 seconds of waiting, which is the point at which our applications gives up on waiting and reports an error. The workload when this occurs is to "refresh" 400+ ZVOLs roughly at the same time, based on a policy set by the user. This refresh operation will destroy the ZVOL, and re-create it based on a snapshot. When this occurs, we see many hundreds of entries on the "z_zvol" taskq (based on inspection of the /proc/spl/taskq-all file). Many of the entries on the taskq end up in the "zvol_remove_minors_impl" function, and I've measured the latency of that function: Function = zvol_remove_minors_impl msecs : count distribution 0 -> 1 : 0 | | 2 -> 3 : 0 | | 4 -> 7 : 1 | | 8 -> 15 : 0 | | 16 -> 31 : 0 | | 32 -> 63 : 0 | | 64 -> 127 : 1 | | 128 -> 255 : 45 |****************************************| 256 -> 511 : 5 |**** | That data is from a 10 second sample, using the BCC "funclatency" tool. As we can see, in this 10 second sample, most calls took 128ms at a minimum. Thus, some basic math tells us that in any 20 second interval, we could only process at most about 150 removals, which is much less than the 400+ that'll occur based on the workload. As a result of this, and since all ZVOL minor operations will go through the single threaded "z_zvol" taskq, the latency for creating a single ZVOL device can be unreasonably large due to other ZVOL activity on the system. In our case, it's large enough to cause the application to generate an error and fail the operation. When profiling the "zvol_remove_minors_impl" function, I saw that most of the time in the function was spent off-cpu, blocked in the function "taskq_wait_outstanding". How this works, is "zvol_remove_minors_impl" will dispatch calls to "zvol_free" using the "system_taskq", and then the "taskq_wait_outstanding" function is used to wait for all of those dispatched calls to occur before "zvol_remove_minors_impl" will return. As far as I can tell, "zvol_remove_minors_impl" doesn't necessarily have to wait for all calls to "zvol_free" to occur before it returns. Thus, this change removes the call to "taskq_wait_oustanding", so that calls to "zvol_free" don't affect the latency of "zvol_remove_minors_impl". Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Gallagher <[email protected]> Signed-off-by: Prakash Surya <[email protected]> Closes openzfs#9380

Reduce the time required for ./configure to perform the needed KABI checks by allowing kbuild to compile multiple test cases in parallel. This was accomplished by splitting each test's source code from the logic handling whether that code could be compiled or not. By introducing this split it's possible to minimize the number of times kbuild needs to be invoked. As importantly, it means all of the tests can be built in parallel. This does require a little extra care since we expect some tests to fail, so the --keep-going (-k) option must be provided otherwise some tests may not get compiled. Furthermore, since a failure during the kbuild modpost phase will result in an early exit; the final linking phase is limited to tests which passed the initial compilation and produced an object file. Once everything has been built the configure script proceeds as previously. The only significant difference is that it now merely needs to test for the existence of a .ko file to determine the result of a given test. This vastly speeds up the entire process. New test cases should use ZFS_LINUX_TEST_SRC to declare their test source code and ZFS_LINUX_TEST_RESULT to check the result. All of the existing kernel-*.m4 files have been updated accordingly, see config/kernel-current-time.m4 for a basic example. The legacy ZFS_LINUX_TRY_COMPILE macro has been kept to handle special cases but it's use is not encouraged. master (secs) patched (secs) ------------- ---------------- autogen.sh 61 68 configure 137 24 (~17% of current run time) make -j $(nproc) 44 44 make rpms 287 150 Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#8547 Closes openzfs#9132 Closes openzfs#9341

Line 31 and 32 overwrote the ${root} variable which broke mount-zfs.sh We have create a new variable for the dataset instead of overwriting the ${root} variable in zfs-load-key.sh${root} variable in zfs-load-key.sh Reviewed-by: Kash Pande <[email protected]> Reviewed-by: Garrett Fields <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Dacian Reece-Stremtan <[email protected]> Closes openzfs#8913 Closes openzfs#9379

Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9385

Make arc_stats visible to platform code. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9386

Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9389

Factor Linux specific pieces out of libspl. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Sean Eric Fagan <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9336

If /var/lib is a dataset not under <pool>/ROOT/<root_dataset>, as proposed in the ubuntu root on zfs upstream guide (https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS), we end up with a race where some services, like systemd-random-seed are writing under /var/lib, while zfs-mount is called. zfs mount will then potentially fail because of /var/lib isn't empty and so, can't be mounted. Order those 2 units for now (more may be needed) as we can't declare virtually a provide mount point to match "RequiresMountsFor=/var/lib/systemd/random-seed" from systemd-random-seed.service. The optional generator for zfs 0.8 fixes it, but it's not enabled by default nor necessarily required. Example: - rpool/ROOT/ubuntu (mountpoint = /) - rpool/var/ (mountpoint = /var) - rpool/var/lib (mountpoint = /var/lib) Both zfs-mount.service and systemd-random-seed.service are starting After=systemd-remount-fs.service. zfs-mount.service should be done before local-fs.target while systemd-random-seed.service should finish before sysinit.target (which is a later target). Ideally, we would have a way for zfs mount -a unit to declare all paths or move systemd-random-seed after local-fs.target. Reviewed-by: Antonio Russo <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Didier Roche <[email protected]> Closes openzfs#9360

Update cleanup_upgrade to use destroy_dataset and destroy_pool when performing cleanup. These wrappers retry if the pool is busy preventing occasional failures like those observed when running tests upgrade_readonly_pool. For example: SUCCESS: test enabled == enabled User accounting upgrade is not executed on readonly pool NOTE: Performing local cleanup via log_onexit (cleanup_upgrade) cannot destroy 'testpool': pool is busy ERROR: zpool destroy testpool exited 1 Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#9400

Factor Linux specific functionality out of libzutil. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#9356

Factor Linux specific functionality out of libzfs. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Macy <[email protected]> Closes openzfs#9377

FreeBSD needs to be able to pass the jail id to the jail/unjail ioctls and the struct file in the device structure is unused. Reviewed-by: Kjeld Schouten <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9625

Minimal compatibility changes for FreeBSD. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9631

fallocate(2) is a Linux-specific system call which in unavailable on other platforms. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9633

Include the required headers for FreeBSD. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9634

Adding the FreeBSD code allows arc_summary and arcstat to be used on FreeBSD. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#9641

KM_PUSHPAGE is an Illumosism - On FreeBSD it's aliased to the same malloc flag as KM_SLEEP. The compiler naturally rejects multiple case statements with the same value. This is effectively a no-op since all callers pass a specific KM_* flag. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9643

Linux and FreeBSD use different names for suid / setuid. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9632

FreeBSD needs to cope with multiple version of the zfs_cmd_t structure. Allowing the platform code to pre and post process the cmd structure makes it possible to work with legacy tooling. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9624

Remove the specific gitignore rules for module left-overs and add a generic one in modules/. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Kjeld Schouten <[email protected]> Signed-off-by: Michael Niewöhner <[email protected]> Closes openzfs#9656

Moving qsort to the platform header allows each platform to provide an appropriate sorting implementation. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9663

The write_record() function is private and should be marked as such. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9665

The module_param_call() functionality is currently still Linux-specific and should be wrapped accordingly. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9666

There may be circumstances where it's desirable that all blocks in a specified dataset be stored on the special device. Relax the artificial 128K limit and allow the special_small_blocks property to be set up to 1M. When blocks >1MB have been enabled via the zfs_max_recordsize module option, this limit is increased accordingly. Reviewed-by: Don Brady <[email protected]> Reviewed-by: Kjeld Schouten <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#9131 Closes openzfs#9355

In case L2ARC read failed, l2arc_read_done() creates _different_ ZIO to read data from the original storage device. Unfortunately pointer to the failed ZIO remains in hdr->b_l1hdr.b_acb->acb_zio_head, and if some other read try to bump the ZIO priority, it will crash. The problem is reproducible by corrupting L2ARC content and reading some data with prefetch if l2arc_noprefetch tunable is changed to 0. With the default setting the issue is probably not reproducible now. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes openzfs#9648

Modify the Codecov settings to provide a more realistic and stable report. The following change were made: - Precision has been limited to whole percents only, but will round to nearest. This means 0.0-0.49 will round to zero (no change) and 0.51 will round to 1%. - Exclude the tests/zfs-tests directory from the report. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Kjeld Schouten-Lebbing <[email protected]> Closes openzfs#9650

In gcm_mode_decrypt_contiguous_blocks(), if vmem_alloc() fails, bcopy is called with a NULL pointer destination and a length > 0. This results in undefined behavior. Further ctx->gcm_pt_buf is freed but not set to NULL, leading to a potential write after free and a double free due to missing return value handling in crypto_update_uio(). The code as is may write to ctx->gcm_pt_buf in gcm_decrypt_final() and may free ctx->gcm_pt_buf again in aes_decrypt_atomic(). The fix is to slightly rework error handling and check the return value in crypto_update_uio(). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Kjeld Schouten <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Closes openzfs#9659

The checksum display code of zdb_read_block uses a zio to read in the block and then calls zio_checksum_compute. Use a new zio in the call to zio_checksum_compute not the zio from the read which has been destroyed by zio_wait. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Paul Zuchowski <[email protected]> Closes openzfs#9644 Closes openzfs#9657

- Moves compression algorithms for tests to properties.shlib - Removes all compression algorithms levels from general tests - Replaces on with lz4 for compression tests - Removes random algorithm selection, if not needed - Cleans copyright header formatting Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Michael Niewöhner <[email protected]> Signed-off-by: Kjeld Schouten-Lebbing <[email protected]> Closes openzfs#9645

- on Linux move Linux specific headers to zfs_context_os.h - on FreeBSD move FreeBSD specific definitions to zfs_context_os.h - remove duplicate tsd_ definitions - remove unused AT_TYPE Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Don Brady <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9668

arc_summary3 reports L2ARC hits and misses as Bytes, whereas they should be reported as events. arc_summary2 reports these correctly. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Kjeld Schouten <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes openzfs#9669

Remove the ASSERTV macro and handle suppressing unused compiler warnings for variables only in ASSERTs using the __attribute__((unused)) compiler annotation. The annotation is understood by both gcc and clang. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9671

Update zfs_deadman_failmode to use the ZFS_MODULE_PARAM_CALL wrapper, and split the common and platform specific portions. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9670

FreeBSD requires three additional ioctls, they are ZFS_IOC_NEXTBOOT, ZFS_IOC_JAIL, and ZFS_IOC_UNJAIL. These have been added after the Linux-specific ioctls. The range 0x80-0xFF has been reserved for future optional platform-specific ioctls. Any platform may choose to implement these as appropriate. None of the existing ioctl numbers have been changed to maintain compatibility. For Linux no vectors have been registered for the new ioctls and they are reported as unsupported. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9667

FreeBSD uses its own crypto framework in-kernel which, at this time, has no EDONR implementation. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#9664

The resilver restart test was reported as failing about 2% of the time. Two issues were found: - The event log wasn't large enough, so resilver events were missing - One 'zpool sync' wasn't enough for resilver to start after zinject Signed-off-by: John Poduska <[email protected]> Closes openzfs#9677

After spa_vdev_remove_aux() is called, the config nvlist is no longer valid, as it's been replaced by the new one (with the specified device removed). Therefore any pointers into the nvlist are no longer valid. So we can't save the result of `fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH)` (in vd_path) across the call to spa_vdev_remove_aux(). Instead, use spa_strdup() to save a copy of the string before calling spa_vdev_remove_aux. Found by AddressSanitizer: ERROR: AddressSanitizer: heap-use-after-free on address ... READ of size 34 at 0x608000a1fcd0 thread T686 #0 0x7fe88b0c166d (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5166d) #1 0x7fe88a5acd6e in spa_strdup spa_misc.c:1447 openzfs#2 0x7fe88a688034 in spa_vdev_remove vdev_removal.c:2259 openzfs#3 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229 openzfs#4 0x55ffbc769fba in ztest_execute ztest.c:6714 openzfs#5 0x55ffbc779a90 in ztest_thread ztest.c:6761 openzfs#6 0x7fe889cbc6da in start_thread openzfs#7 0x7fe8899e588e in __clone 0x608000a1fcd0 is located 48 bytes inside of 88-byte region freed by thread T686 here: #0 0x7fe88b14e7b8 in __interceptor_free #1 0x7fe88ae541c5 in nvlist_free nvpair.c:874 openzfs#2 0x7fe88ae543ba in nvpair_free nvpair.c:844 openzfs#3 0x7fe88ae57400 in nvlist_remove_nvpair nvpair.c:978 openzfs#4 0x7fe88a683c81 in spa_vdev_remove_aux vdev_removal.c:185 openzfs#5 0x7fe88a68857c in spa_vdev_remove vdev_removal.c:2221 openzfs#6 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229 openzfs#7 0x55ffbc769fba in ztest_execute ztest.c:6714 openzfs#8 0x55ffbc779a90 in ztest_thread ztest.c:6761 openzfs#9 0x7fe889cbc6da in start_thread Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes openzfs#9706

`zpool_do_import()` passes `argv[0]`, (optionally) `argv[1]`, and `pool_specified` to `import_pools()`. If `pool_specified==FALSE`, the `argv[]` arguments are not used. However, these values may be off the end of the `argv[]` array, so loading them could dereference unmapped memory. This error is reported by the asan build: ``` ================================================================= ==6003==ERROR: AddressSanitizer: heap-buffer-overflow READ of size 8 at 0x6030000004a8 thread T0 #0 0x562a078b50eb in zpool_do_import zpool_main.c:3796 #1 0x562a078858c5 in main zpool_main.c:10709 openzfs#2 0x7f5115231bf6 in __libc_start_main openzfs#3 0x562a07885eb9 in _start 0x6030000004a8 is located 0 bytes to the right of 24-byte region allocated by thread T0 here: #0 0x7f5116ac6b40 in __interceptor_malloc #1 0x562a07885770 in main zpool_main.c:10699 openzfs#2 0x7f5115231bf6 in __libc_start_main ``` This commit passes NULL for these arguments if they are off the end of the `argv[]` array. Reviewed-by: George Wilson <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Allan Jude <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes openzfs#12339

Before this patch, in zfs_domount, if zfs_root or d_make_root fails, we leave zfsvfs != NULL. This will lead to execution of the error handling `if` statement at the `out` label, and hence to a call to dmu_objset_disown and zfsvfs_free. However, zfs_umount, which we call upon failure of zfs_root and d_make_root already does dmu_objset_disown and zfsvfs_free. I suppose this patch rather adds to the brittleness of this part of the code base, but I don't want to invest more time in this right now. To add a regression test, we'd need some kind of fault injection facility for zfs_root or d_make_root, which doesn't exist right now. And even then, I think that regression test would be too closely tied to the implementation. To repro the double-disown / double-free, do the following: 1. patch zfs_root to always return an error 2. mount a ZFS filesystem Here's the stack trace you would see then: VERIFY3(ds->ds_owner == tag) failed (0000000000000000 == ffff9142361e8000) PANIC at dsl_dataset.c:1003:dsl_dataset_disown() Showing stack for process 28332 CPU: 2 PID: 28332 Comm: zpool Tainted: G O 5.10.103-1.nutanix.el7.x86_64 #1 Call Trace: dump_stack+0x74/0x92 spl_dumpstack+0x29/0x2b [spl] spl_panic+0xd4/0xfc [spl] dsl_dataset_disown+0xe9/0x150 [zfs] dmu_objset_disown+0xd6/0x150 [zfs] zfs_domount+0x17b/0x4b0 [zfs] zpl_mount+0x174/0x220 [zfs] legacy_get_tree+0x2b/0x50 vfs_get_tree+0x2a/0xc0 path_mount+0x2fa/0xa70 do_mount+0x7c/0xa0 __x64_sys_mount+0x8b/0xe0 do_syscall_64+0x38/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Co-authored-by: Christian Schwarz <[email protected]> Signed-off-by: Christian Schwarz <[email protected]> Closes openzfs#14025

jgallag88 and others added 30 commits September 13, 2019 18:09

OpenZFS restructuring - arc_stats

13a4027

Make arc_stats visible to platform code. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9386

Add inode accessors to common code

6360e27

Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9389

OpenZFS restructuring - libspl

d31277a

Factor Linux specific pieces out of libspl. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Sean Eric Fagan <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9336

OpenZFS restructuring - libzutil

7c5eff9

Factor Linux specific functionality out of libzutil. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#9356

mattmacy and others added 25 commits November 30, 2019 15:35

Add FreeBSD support to zio_crypto.h

77323bc

Minimal compatibility changes for FreeBSD. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9631

Make asm-x86_64/atomic.S build on FreeBSD

c54687c

Include the required headers for FreeBSD. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9634

Add FreeBSD code to arc_summary and arcstat

101f9b1

Adding the FreeBSD code allows arc_summary and arcstat to be used on FreeBSD. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes openzfs#9641

Add FreeBSD required defines to mntent.h

42a826e

Linux and FreeBSD use different names for suid / setuid. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9632

Mark write_record static

bff8fb3

The write_record() function is private and should be marked as such. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes openzfs#9665

jwpoduska closed this Dec 9, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes for spurious failures of resilver_restart_001 test #1

Fixes for spurious failures of resilver_restart_001 test #1

jwpoduska commented Dec 9, 2019

Fixes for spurious failures of resilver_restart_001 test #1

Fixes for spurious failures of resilver_restart_001 test #1

Conversation

jwpoduska commented Dec 9, 2019

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist: