Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.2] cmd: add 'help' subcommand to zpool and zfs #15342

Closed
wants to merge 40 commits into from

Conversation

robn
Copy link
Member

@robn robn commented Oct 3, 2023

Description

2.2 backport of #15288.

How Has This Been Tested?

Build and quick functional check.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

amotin and others added 30 commits July 20, 2023 08:58
Starting approximately from version 1302506 vn_lock_pair() grown two
additional arguments following head.  There is a one week hole, but
that is closet reference point we have.

Reviewed-by: Mateusz Guzik <[email protected]>
Signed-off-by:  Alexander Motin <[email protected]>
Sponsored by:   iXsystems, Inc.
Closes openzfs#15047
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Mateusz Guzik <[email protected]>
Closes openzfs#15036
It seems 9c5167d "Project Quota on ZFS" missed to add prefetch
for DMU_PROJECTUSED_OBJECT during scan (scrub/resilver).  It should
not cause visible problems, but may affect scub/resilver performance.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15024
Since we are already iterating the ZAP, we have exact string key to
remove, we do not need to call zap_remove_int() with the int key we
just converted, we can call zap_remove() for the original string.

This should make no functional change, only a micro-optimization.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15056

(cherry picked from commit fdba8cb)
Unlike regular receive, raw receive require destination to have the
same block structure as the source.  In case of dnode reclaim this
triggers two special cases, requiring special handling:
 - If dn_nlevels == 1, we can change the ibs, but dnode_set_blksz()
should not dirty the data buffer if block size does not change, or
durign receive dbuf_dirty_lightweight() will trigger assertion.
 - If dn_nlevels > 1, we just can't change the ibs, dnode_set_blksz()
would fail and receive_object would trigger assertion, so we should
destroy and recreate the dnode from scratch.

Reviewed-by: Paul Dagnelie <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15039

(cherry picked from commit c4e8742)
My analysis in PR openzfs#14716 was incorrect.  Each histogram bucket contains
the number of incorrect bits, by position in a 64-bit word, over the
entire record.  8-bit buckets can overflow for record sizes above 2k.
To forestall that, saturate each bucket at 255.  That should still get
the point across: either all bits are equally wrong, or just a couple
are.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Sponsored-by: Axcient
Closes openzfs#15049
As it turns out having autotrim default to 'on' on FreeBSD never really
worked due to mess with defines where userland and kernel module were
getting different default values (userland was defaulting to 'off',
module was thinking it's 'on').

Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Yuri Pankov <[email protected]>
Closes openzfs#15079
Make the version here match that elsewhere in the kernel and system
headers.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15058
Signed-off-by: Brian Behlendorf <[email protected]>
This change was introduced in Linux commit
7ba150834b840f6f5cdd07ca69a4ccf39df59a66

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15059
The disk_check_media_change() function was added which replaces
bdev_check_media_change.  This change was introduced in 6.5rc1
444aa2c58cb3b6cfe3b7cc7db6c294d73393a894 and the new function takes a
gendisk* as its argument, no longer a block_device*. Thus, bdev->bd_disk
is now used to pass the expected data.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15060
Set ARC_FLAG_NO_BUF when prefetching data L1 buffers for scan.  We
do not prefetch data L0 buffers, so we do not need the L1 buffers,
only want them to be ready in ARC. This saves some CPU time on the
buffers decompression.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15029
On my machines I observe random failures caused by rollback happening 
after zfs root is mounted. I've observed two types of failures:

- zfs-rollback-bootfs.service fails saying that rollback must be
  done just before mounting the dataset
- boot process fails and rescue console is entered.

After making this modification and testing it for couple of days 
none of those problems have been observed anymore.

I don't know if `dracut-mount.service` is still needed in the 
`After` directive. Maybe someone else is able to address this?

Reviewed-by: Gregory Bartholomew <[email protected]>
Signed-off-by: Wojciech Małota-Wójcik <[email protected]>
Closes openzfs#15025
Ashift can be set for a vdev only during its creation, and the
top-level vdev does not change when a vdev is attached or replaced.
The ashift property should not be used during attachment, as it
does not allow attaching/replacing a vdev if the pool's ashift
property is increased after the existing vdev was created. Instead,
we should be able to attach the vdev if the attached vdev can
satisfy the ashift requirement with its parent.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ameer Hamza <[email protected]>
Closes openzfs#15061
Check that vdev has valid zap and bail out early.

While here, move objid selection out of the loop, it's not going to
change.

Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Yuri Pankov <[email protected]>
Closes openzfs#15063
Since spa_min_alloc may not be a power of 2, unlike ashifts, in the
case of DRAID, we should not select the minimal value among several
vdevs. Rounding to a multiple of it is unlikely to work for other
vdevs. Instead, using the greatest common divisor produces smaller
yet more reasonable results.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ameer Hamza <[email protected]>
Closes openzfs#15067
We cannot call zpl_enter in zpl_test_super, because zpl_test_super is
under spinlock so we can't sleep, and also because zpl_test_super is
called without sb->s_umount taken, so it's possible we would race with
zfs_umount and call zpl_enter on freed zfsvfs.

Here's an stack trace when this happens:
[ 2379.114837] VERIFY(cvp->cv_magic == CV_MAGIC) failed
[ 2379.114845] PANIC at spl-condvar.c:497:__cv_broadcast()
[ 2379.114854] Kernel panic - not syncing: VERIFY(cvp->cv_magic == CV_MAGIC) failed
[ 2379.115012] Call Trace:
[ 2379.115019]  dump_stack+0x74/0x96
[ 2379.115024]  panic+0x114/0x2f6
[ 2379.115035]  spl_panic+0xcf/0xfc [spl]
[ 2379.115477]  __cv_broadcast+0x68/0xa0 [spl]
[ 2379.115585]  rrw_exit+0xb8/0x310 [zfs]
[ 2379.115696]  rrm_exit+0x4a/0x80 [zfs]
[ 2379.115808]  zpl_test_super+0xa9/0xd0 [zfs]
[ 2379.115920]  sget+0xd1/0x230
[ 2379.116033]  zpl_mount+0xdc/0x230 [zfs]
[ 2379.116037]  legacy_get_tree+0x28/0x50
[ 2379.116039]  vfs_get_tree+0x27/0xc0
[ 2379.116045]  path_mount+0x2fe/0xa70
[ 2379.116048]  do_mount+0x80/0xa0
[ 2379.116050]  __x64_sys_mount+0x8b/0xe0
[ 2379.116052]  do_syscall_64+0x35/0x50
[ 2379.116054]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 2379.116057] RIP: 0033:0x7f9912e8b26a

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes openzfs#15077
We would see zed assert on one of our systems if we powered off a
slot.  Further examination showed zfs_retire_recv() was reporting
a GUID of 0, which in turn would return a NULL nvlist.  Add
in a check for a zero GUID.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#15084
The checksum histograms were intended to be used with ATA and parallel
SCSI, which are obsolete.  With modern storage hardware, they will
almost always look like white noise; all bits will be wrong.  They only
serve to bloat the event.  That's a particular problem on FreeBSD, where
events must fit into a 1016 byte buffer.

This fixes issue openzfs#14717 for RAIDZ pools, but not for mirror pools.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Sponsored-by: Axcient
Closes openzfs#15052
With anything but fletcher-4, even a tiny change in the input will cause
the checksum value to change completely.  So knowing the actual and
expected checksums doesn't provide much more information than "they
don't match".  The harm in sending them is simply that they bloat the
event.  In particular, on FreeBSD the event must fit into a 1016 byte
buffer.

Fixes openzfs#14717 for mirrored pools.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Sponsored-by: Axcient
Closes openzfs#14717
Closes openzfs#15052
To simplify error handling bpobj_iterate_blkptrs() iterates through
the list of block pointers backwards.  Unfortunately speculative
prefetcher is currently unable to detect such patterns, that makes
each block read there synchronous and very slow on HDD pools.

According to my tests, added explicit prefetch reduces time needed
to asynchronously delete 8 snapshots of 4 million blocks each from
20 seconds to less than one, that should free sync thread for other
useful work, such as async writes, scrub, etc.

While there, plug one memory leak in case of bpobj_open() error and
harmonize some variable names.

Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15071
- Reduce maximum prefetch distance for 32bit platforms to 8MB as it
was previously.  Those systems didn't grow much probably, so better
stay conservative there.
 - Retire array_rd_sz tunable, blocking prefetch for large requests.
We should not penalize applications trying to be more efficient. The
speculative prefetcher by itself has reasonable distance limits, and
1MB is not much at all these days.

Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15072
metaslab_force_ganging isn't enough to actually force ganging, because
it still only forces 3% of the time. This adds
metaslab_force_ganging_pct so we can configure how often to force
ganging.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes openzfs#15088
This new check in 0.9.0 appears to have some issues with various forms
of "early return", like trap, exit and return. This is tripping up (at
least):

  cmd/zed/zed.d/history_event-zfs-list-cacher.sh
  /etc/zfs/zfs-functions

Its not obvious what its complaining about or what the remedy is, so it
seems sensible to disable this check for now.

See also:

  https://www.shellcheck.net/wiki/SC2317
  koalaman/shellcheck#2542
  koalaman/shellcheck#2613

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#15089
Signed-off-by: Brian Behlendorf <[email protected]>
This commit updates changelog for native Debian packages for
OpenZFS 2.2.0 release.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes openzfs#15104
When we have some LWBs closed and their ZIOs ready to be issued, we
can not afford sleeping on config lock if somebody else try to lock
it as writer, or it will cause a deadlock.

To solve it, move spa_config_enter() from zil_lwb_write_issue() to
zil_lwb_write_close() under zl_issuer_lock to enforce lock ordering
with other threads.  Now if we can't immediately lock config, issue
all previously closed LWBs so that they could drop their config
locks after completion, and only then allow sleeping on our lock.

Reviewed-by: Mark Maybee <[email protected]>
Reviewed-by: Prakash Surya <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes openzfs#15078
Closes openzfs#15080
This locking was recently added as part of openzfs#14979. But appears it
is illegal to take zl_issuer_lock while holding dp_config_rwlock,
taken by dsl_pool_hold().  It causes deadlock with sync thread in
spa_sync_upgrades().  On a second thought, we should not
need this locking, since zil_commit_impl() we call below takes
zl_issuer_lock, that should sufficiently protect zl_suspend reads,
combined with other logic from openzfs#14979.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15103
Update the META file to reflect compatibility with the 6.4 kernel.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#15095
For large JBODs the log message "zfs_iter_vdev: no match" can
account for the bulk of the log messages (over 70%).  Since this
message is purely informational and not that useful we remove it.

Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#15086
Closes openzfs#15094
Just silencing the warning about large allocations.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes openzfs#15050
robn and others added 10 commits July 26, 2023 08:46
bv_entcount can be a relatively large allocation (see comment for
BRT_RANGESIZE), so get it from the big allocator.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes openzfs#15050
dbuf_undirty() will (correctly) only removed dirty records for the given
(open) txg. If there is a dirty record for an earlier closed txg that
has not been synced out yet, then db_dirty_records will still have
entries on it, tripping the assertion.

Instead, change the assertion to only consider the current txg. To some
extent this is redundant, as its really just saying "did dbuf_undirty()
work?", but it it doesn't hurt and accurately expresses our
expectations.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Original-patch-by: Kay Pedersen <[email protected]>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes openzfs#15050
Block cloning introduced a new state transition from DB_NOFILL to
DB_READ. This occurs when a block is cloned and then read on the
current txg.

In this case, the clone will move the dbuf to DB_NOFILL, and then the
read will be issued for the overidden block pointer. If that read is
still outstanding when it comes time to write, the dbuf will be in
DB_READ, which is not handled by the checks in dbuf_sync_leaf, thus
tripping the assertions.

This updates those checks to allow DB_READ as a valid state iff the
dirty record is for a BRT write and there is a override block pointer.
This is a safe situation because the block already exists, so there's
nothing that could change from underneath the read.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Original-patch-by: Kay Pedersen <[email protected]>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes openzfs#15050
This implements the Linux VFS ops required to service the file
copy/clone APIs:

  .copy_file_range    (4.5+)
  .clone_file_range   (4.5-4.19)
  .dedupe_file_range  (4.5-4.19)
  .remap_file_range   (4.20+)

Note that dedupe_file_range() and remap_file_range(REMAP_FILE_DEDUP) are
hooked up here, but are not implemented yet.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes openzfs#15050
Prior to Linux 4.5, the FICLONE etc ioctls were specific to BTRFS, and
were implemented as regular filesystem-specific ioctls. This implements
those ioctls directly in OpenZFS, allowing cloning to work on older
kernels.

There's no need to gate these behind version checks; on later kernels
Linux will simply never deliver these ioctls, instead calling the
approprate VFS op.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes openzfs#15050
Redhat have backported copy_file_range and clone_file_range to the EL7
kernel using an "extended file operations" wrapper structure. This
connects all that up to let cloning work there too.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes openzfs#15050
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes openzfs#15050
Closes openzfs#405
Closes openzfs#13349
Return the more descriptive EOPNOTSUPP instead of EXDEV when the
storage pool doesn't support block cloning.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Kay Pedersen <[email protected]>
Closes openzfs#15097
Signed-off-by: Brian Behlendorf <[email protected]>
'program help subcommand' is a reasonably common pattern for
multifunction command-line programs. This commit adds support for that
style to the zpool and zfs commands.

When run as 'zpool help [<topic>]' or 'zfs help [<topic>]', executes the
'man' program on the PATH with the most likely manpage name for the
requested topic: "zpool-<topic>" or "zfs-<topic>" for subcommands, or
"zpool<topic>" or "zfs<topic>" for the "concepts" and "props" topics.
If no topic is supplied, uses the top "zpool" or "zfs" pages.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#15288
@robn
Copy link
Member Author

robn commented Oct 3, 2023

Ahh, crap.

@robn robn closed this Oct 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.