Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: large_microzap #16593

Merged
merged 2 commits into from
Oct 3, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
feature: large_microzap
In a4b21ea we added the zap_micro_max_size tuneable to raise the size
at which "micro" (single-block) ZAPs are upgraded to "fat" (multi-block)
ZAPs. Before this, a microZAP was limited to 128KiB, which was the old
largest block size. The side effect of raising the max size past 128KiB
is that it be stored in a large block, requiring the large_blocks
feature.

Unfortunately, this means that a backup stream created without the
--large-block (-L) flag to zfs send would split the microZAP block into
smaller blocks and send those, as is normal behaviour for large blocks.
This would be received correctly, but since microZAPs are limited to the
first block in the object by definition, the entries in the later blocks
would be inaccessible. For directory ZAPs, this gives the appearance of
files being lost.

This commit adds a feature flag, large_microzap, that must be enabled
for microZAPs to grow beyond 128KiB, and which will be activated the
first time that occurs. This feature is later checked when generating
the stream and if active, the send operation will abort unless
--large-block has also been requested.

Changing the limit still requires zap_micro_max_size to be changed. The
state of this flag effectively sets the upper value for this tuneable,
that is, if the feature is disabled, the tuneable will be clamped to
128KiB.

A stream flag is also added to ensure that the receiver also activates
its own feature flag upon receiving the stream. This is not strictly
necessary to _use_ the received microZAP, since it doesn't care how
large its block is, but it is required to send the microZAP object on,
otherwise the original problem occurs again.

Because it's difficult to reliably distinguish a microZAP from a fatZAP
from outside the ZAP code, and because it seems unlikely that most
users are affected (a fairly niche tuneable combined with what should be
an uncommon use of send), and for the sake of expediency, this change
activates the feature the first time a microZAP grows to use a large
block, and is never deactivated after that. This can be improved in the
future.

This commit changes nothing for existing pools that already have large
microZAPs. The feature will not be retroactively applied, but will be
activated the next time a microZAP grows past the limit.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
robn committed Oct 2, 2024
commit 97f39af3d06bc6a69b38762a1f30ef7bf31ada71
2 changes: 2 additions & 0 deletions include/sys/fs/zfs.h
Original file line number Diff line number Diff line change
@@ -30,6 +30,7 @@
* Portions Copyright 2010 Robert Milkowski
* Copyright (c) 2021, Colm Buckley <[email protected]>
* Copyright (c) 2022 Hewlett Packard Enterprise Development LP.
* Copyright (c) 2024, Klara, Inc.
*/

#ifndef _SYS_FS_ZFS_H
@@ -1631,6 +1632,7 @@ typedef enum {
ZFS_ERR_CRYPTO_NOTSUP,
ZFS_ERR_RAIDZ_EXPAND_IN_PROGRESS,
ZFS_ERR_ASHIFT_MISMATCH,
ZFS_ERR_STREAM_LARGE_MICROZAP,
} zfs_errno_t;

/*
4 changes: 3 additions & 1 deletion include/sys/zap_impl.h
Original file line number Diff line number Diff line change
@@ -24,6 +24,7 @@
* Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
* Copyright (c) 2013, 2016 by Delphix. All rights reserved.
* Copyright 2017 Nexenta Systems, Inc.
* Copyright (c) 2024, Klara, Inc.
*/

#ifndef _SYS_ZAP_IMPL_H
@@ -45,7 +46,6 @@ extern int fzap_default_block_shift;

#define MZAP_ENT_LEN 64
#define MZAP_NAME_LEN (MZAP_ENT_LEN - 8 - 4 - 2)
#define MZAP_MAX_BLKSZ SPA_OLD_MAXBLOCKSIZE

#define ZAP_NEED_CD (-1U)

@@ -210,6 +210,8 @@ int zap_hashbits(zap_t *zap);
uint32_t zap_maxcd(zap_t *zap);
uint64_t zap_getflags(zap_t *zap);

uint64_t zap_get_micro_max_size(spa_t *spa);

#define ZAP_HASH_IDX(hash, n) (((n) == 0) ? 0 : ((hash) >> (64 - (n))))

void fzap_byteswap(void *buf, size_t size);
5 changes: 4 additions & 1 deletion include/sys/zfs_ioctl.h
Original file line number Diff line number Diff line change
@@ -23,6 +23,7 @@
* Copyright (c) 2012, 2024 by Delphix. All rights reserved.
* Copyright 2016 RackTop Systems.
* Copyright (c) 2017, Intel Corporation.
* Copyright (c) 2024, Klara, Inc.
*/

#ifndef _SYS_ZFS_IOCTL_H
@@ -145,6 +146,7 @@ typedef enum drr_headertype {
*/
#define DMU_BACKUP_FEATURE_SWITCH_TO_LARGE_BLOCKS (1 << 27)
#define DMU_BACKUP_FEATURE_LONGNAME (1 << 28)
#define DMU_BACKUP_FEATURE_LARGE_MICROZAP (1 << 29)

/*
* Mask of all supported backup features
@@ -155,7 +157,8 @@ typedef enum drr_headertype {
DMU_BACKUP_FEATURE_COMPRESSED | DMU_BACKUP_FEATURE_LARGE_DNODE | \
DMU_BACKUP_FEATURE_RAW | DMU_BACKUP_FEATURE_HOLDS | \
DMU_BACKUP_FEATURE_REDACTED | DMU_BACKUP_FEATURE_SWITCH_TO_LARGE_BLOCKS | \
DMU_BACKUP_FEATURE_ZSTD | DMU_BACKUP_FEATURE_LONGNAME)
DMU_BACKUP_FEATURE_ZSTD | DMU_BACKUP_FEATURE_LONGNAME | \
DMU_BACKUP_FEATURE_LARGE_MICROZAP)

/* Are all features in the given flag word currently supported? */
#define DMU_STREAM_SUPPORTED(x) (!((x) & ~DMU_BACKUP_FEATURE_MASK))
2 changes: 2 additions & 0 deletions include/zfeature_common.h
Original file line number Diff line number Diff line change
@@ -24,6 +24,7 @@
* Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
* Copyright (c) 2013, Joyent, Inc. All rights reserved.
* Copyright (c) 2017, Intel Corporation.
* Copyright (c) 2024, Klara, Inc.
*/

#ifndef _ZFEATURE_COMMON_H
@@ -84,6 +85,7 @@ typedef enum spa_feature {
SPA_FEATURE_RAIDZ_EXPANSION,
SPA_FEATURE_FAST_DEDUP,
SPA_FEATURE_LONGNAME,
SPA_FEATURE_LARGE_MICROZAP,
SPA_FEATURES
} spa_feature_t;

11 changes: 6 additions & 5 deletions lib/libzfs/libzfs.abi
Original file line number Diff line number Diff line change
@@ -629,7 +629,7 @@
<elf-symbol name='fletcher_4_superscalar_ops' size='128' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='libzfs_config_ops' size='16' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='sa_protocol_names' size='16' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='spa_feature_table' size='2408' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='spa_feature_table' size='2464' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfeature_checks_disable' size='4' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_deleg_perm_tab' size='512' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_history_event_names' size='328' type='object-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
@@ -6194,7 +6194,8 @@
<enumerator name='SPA_FEATURE_RAIDZ_EXPANSION' value='40'/>
<enumerator name='SPA_FEATURE_FAST_DEDUP' value='41'/>
<enumerator name='SPA_FEATURE_LONGNAME' value='42'/>
<enumerator name='SPA_FEATURES' value='43'/>
<enumerator name='SPA_FEATURE_LARGE_MICROZAP' value='43'/>
<enumerator name='SPA_FEATURES' value='44'/>
</enum-decl>
<typedef-decl name='spa_feature_t' type-id='33ecb627' id='d6618c78'/>
<qualified-type-def type-id='80f4b756' const='yes' id='b99c00c9'/>
@@ -9373,8 +9374,8 @@
</function-decl>
</abi-instr>
<abi-instr address-size='64' path='module/zcommon/zfeature_common.c' language='LANG_C99'>
<array-type-def dimensions='1' type-id='83f29ca2' size-in-bits='19264' id='bd39d632'>
<subrange length='43' type-id='7359adad' id='8f7e73a2'/>
<array-type-def dimensions='1' type-id='83f29ca2' size-in-bits='19712' id='fd4573e5'>
<subrange length='44' type-id='7359adad' id='cf8ba455'/>
</array-type-def>
<enum-decl name='zfeature_flags' id='6db816a4'>
<underlying-type type-id='9cac1fee'/>
@@ -9451,7 +9452,7 @@
<pointer-type-def type-id='611586a1' size-in-bits='64' id='2e243169'/>
<qualified-type-def type-id='eaa32e2f' const='yes' id='83be723c'/>
<pointer-type-def type-id='83be723c' size-in-bits='64' id='7acd98a2'/>
<var-decl name='spa_feature_table' type-id='bd39d632' mangled-name='spa_feature_table' visibility='default' elf-symbol-id='spa_feature_table'/>
<var-decl name='spa_feature_table' type-id='fd4573e5' mangled-name='spa_feature_table' visibility='default' elf-symbol-id='spa_feature_table'/>
<var-decl name='zfeature_checks_disable' type-id='c19b74c3' mangled-name='zfeature_checks_disable' visibility='default' elf-symbol-id='zfeature_checks_disable'/>
<function-decl name='opendir' visibility='default' binding='global' size-in-bits='64'>
<parameter type-id='80f4b756'/>
8 changes: 7 additions & 1 deletion lib/libzfs/libzfs_sendrecv.c
Original file line number Diff line number Diff line change
@@ -30,6 +30,7 @@
* Copyright 2016 Igor Kozhukhov <[email protected]>
* Copyright (c) 2018, loli10K <[email protected]>. All rights reserved.
* Copyright (c) 2019 Datto Inc.
* Copyright (c) 2024, Klara, Inc.
*/

#include <assert.h>
@@ -2828,7 +2829,12 @@ zfs_send_one_cb_impl(zfs_handle_t *zhp, const char *from, int fd,
case EROFS:
zfs_error_aux(hdl, "%s", zfs_strerror(errno));
return (zfs_error(hdl, EZFS_BADBACKUP, errbuf));

case ZFS_ERR_STREAM_LARGE_MICROZAP:
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
"source snapshot contains large microzaps, "
"need -L (--large-block) or -w (--raw) to "
"generate stream"));
return (zfs_error(hdl, EZFS_BADBACKUP, errbuf));
default:
return (zfs_standard_error(hdl, errno, errbuf));
}
10 changes: 8 additions & 2 deletions man/man4/zfs.4
Original file line number Diff line number Diff line change
@@ -16,7 +16,9 @@
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.\"
.Dd June 27, 2024
.\" Copyright (c) 2024, Klara, Inc.
.\"
.Dd October 2, 2024
.Dt ZFS 4
.Os
.
@@ -614,7 +616,11 @@ However, this is limited by
.
.It Sy zap_micro_max_size Ns = Ns Sy 131072 Ns B Po 128 KiB Pc Pq int
Maximum micro ZAP size.
A micro ZAP is upgraded to a fat ZAP, once it grows beyond the specified size.
A "micro" ZAP is upgraded to a "fat" ZAP once it grows beyond the specified
size.
Sizes higher than 128KiB will be clamped to 128KiB unless the
.Sy large_microzap
feature is enabled.
.
.It Sy zap_shrink_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
If set, adjacent empty ZAP blocks will be collapsed, reducing disk space.
23 changes: 20 additions & 3 deletions man/man7/zpool-features.7
Original file line number Diff line number Diff line change
@@ -14,12 +14,11 @@
.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.\" Copyright (c) 2019, Klara Inc.
.\" Copyright (c) 2019, 2023, 2024, Klara, Inc.
.\" Copyright (c) 2019, Allan Jude
.\" Copyright (c) 2021, Colm Buckley <[email protected]>
.\" Copyright (c) 2023, Klara Inc.
.\"
.Dd February 14, 2024
.Dd October 2, 2024
.Dt ZPOOL-FEATURES 7
.Os
.
@@ -706,6 +705,24 @@ are destroyed.
Large dnodes allow more data to be stored in the bonus buffer,
thus potentially improving performance by avoiding the use of spill blocks.
.
.feature com.klarasystems large_microzap yes extensible_dataset large_blocks
This feature allows "micro" ZAPs to grow larger than 128 KiB without being
upgraded to "fat" ZAPs.
.Pp
This feature becomes
.Sy active
the first time a micro ZAP grows larger than 128KiB.
It will only be returned to the
.Sy enabled
state when all datasets that ever had a large micro ZAP are destroyed.
.Pp
Note that even when this feature is enabled, micro ZAPs cannot grow larger
than 128 KiB without also changing the
.Sy zap_micro_max_size
module parameter.
See
.Xr zfs 4 .
.
.feature com.delphix livelist yes extensible_dataset
This feature allows clones to be deleted faster than the traditional method
when a large number of random/sparse writes have been made to the clone.
6 changes: 5 additions & 1 deletion man/man8/zfs-send.8
Original file line number Diff line number Diff line change
@@ -28,8 +28,9 @@
.\" Copyright 2019 Richard Laager. All rights reserved.
.\" Copyright 2018 Nexenta Systems, Inc.
.\" Copyright 2019 Joyent, Inc.
.\" Copyright (c) 2024, Klara, Inc.
.\"
.Dd July 27, 2023
.Dd October 2, 2024
.Dt ZFS-SEND 8
.Os
.
@@ -111,6 +112,9 @@ property of this filesystem has never been set above 128 KiB.
The receiving system must have the
.Sy large_blocks
pool feature enabled as well.
This flag is required if the
.Sy large_microzap
pool feature is active.
See
.Xr zpool-features 7
for details on ZFS feature flags and the
15 changes: 14 additions & 1 deletion module/zcommon/zfeature_common.c
Original file line number Diff line number Diff line change
@@ -25,7 +25,7 @@
* Copyright (c) 2013, Joyent, Inc. All rights reserved.
* Copyright (c) 2014, Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2017, Intel Corporation.
* Copyright (c) 2019, Klara Inc.
* Copyright (c) 2019, 2024, Klara, Inc.
* Copyright (c) 2019, Allan Jude
*/

@@ -772,6 +772,19 @@ zpool_feature_init(void)
longname_deps, sfeatures);
}

{
static const spa_feature_t large_microzap_deps[] = {
SPA_FEATURE_EXTENSIBLE_DATASET,
SPA_FEATURE_LARGE_BLOCKS,
SPA_FEATURE_NONE
};
zfeature_register(SPA_FEATURE_LARGE_MICROZAP,
"com.klarasystems:large_microzap", "large_microzap",
"Support for microzaps larger than 128KB.",
ZFEATURE_FLAG_PER_DATASET | ZFEATURE_FLAG_READONLY_COMPAT,
ZFEATURE_TYPE_BOOLEAN, large_microzap_deps, sfeatures);
}

zfs_mod_list_supported_free(sfeatures);
}

23 changes: 22 additions & 1 deletion module/zfs/dmu_recv.c
Original file line number Diff line number Diff line change
@@ -25,7 +25,7 @@
* Copyright (c) 2014, Joyent, Inc. All rights reserved.
* Copyright 2014 HybridCluster. All rights reserved.
* Copyright (c) 2018, loli10K <[email protected]>. All rights reserved.
* Copyright (c) 2019, Klara Inc.
* Copyright (c) 2019, 2024, Klara, Inc.
* Copyright (c) 2019, Allan Jude
* Copyright (c) 2019 Datto Inc.
* Copyright (c) 2022 Axcient.
@@ -593,6 +593,9 @@ recv_begin_check_feature_flags_impl(uint64_t featureflags, spa_t *spa)
if ((featureflags & DMU_BACKUP_FEATURE_LARGE_DNODE) &&
!spa_feature_is_enabled(spa, SPA_FEATURE_LARGE_DNODE))
return (SET_ERROR(ENOTSUP));
if ((featureflags & DMU_BACKUP_FEATURE_LARGE_MICROZAP) &&
!spa_feature_is_enabled(spa, SPA_FEATURE_LARGE_MICROZAP))
return (SET_ERROR(ENOTSUP));

/*
* Receiving redacted streams requires that redacted datasets are
@@ -994,6 +997,24 @@ dmu_recv_begin_sync(void *arg, dmu_tx_t *tx)
numredactsnaps, tx);
}

if (featureflags & DMU_BACKUP_FEATURE_LARGE_MICROZAP) {
/*
* The source has seen a large microzap at least once in its
* life, so we activate the feature here to match. It's not
* strictly necessary since a large microzap is usable without
* the feature active, but if that object is sent on from here,
* we need this info to know to add the stream feature.
*
* There may be no large microzap in the incoming stream, or
* ever again, but this is a very niche feature and its very
* difficult to spot a large microzap in the stream, so its
* not worth the effort of trying harder to activate the
* feature at first use.
*/
dsl_dataset_activate_feature(dsobj, SPA_FEATURE_LARGE_MICROZAP,
(void *)B_TRUE, tx);
}

dmu_buf_will_dirty(newds->ds_dbuf, tx);
dsl_dataset_phys(newds)->ds_flags |= DS_FLAG_INCONSISTENT;

13 changes: 12 additions & 1 deletion module/zfs/dmu_send.c
Original file line number Diff line number Diff line change
@@ -26,7 +26,7 @@
* Copyright 2014 HybridCluster. All rights reserved.
* Copyright 2016 RackTop Systems.
* Copyright (c) 2016 Actifio, Inc. All rights reserved.
* Copyright (c) 2019, Klara Inc.
* Copyright (c) 2019, 2024, Klara, Inc.
* Copyright (c) 2019, Allan Jude
*/

@@ -2015,6 +2015,17 @@ setup_featureflags(struct dmu_send_params *dspp, objset_t *os,
if (dsl_dataset_feature_is_active(to_ds, SPA_FEATURE_LONGNAME)) {
*featureflags |= DMU_BACKUP_FEATURE_LONGNAME;
}

if (dsl_dataset_feature_is_active(to_ds, SPA_FEATURE_LARGE_MICROZAP)) {
/*
* We must never split a large microzap block, so we can only
* send large microzaps if LARGE_BLOCKS is already enabled.
*/
if (!(*featureflags & DMU_BACKUP_FEATURE_LARGE_BLOCKS))
return (SET_ERROR(ZFS_ERR_STREAM_LARGE_MICROZAP));
*featureflags |= DMU_BACKUP_FEATURE_LARGE_MICROZAP;
}

return (0);
}

4 changes: 2 additions & 2 deletions module/zfs/dmu_tx.c
Original file line number Diff line number Diff line change
@@ -22,6 +22,7 @@
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright 2011 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2012, 2017 by Delphix. All rights reserved.
* Copyright (c) 2024, Klara, Inc.
*/

#include <sys/dmu.h>
@@ -575,7 +576,6 @@ dmu_tx_hold_zap_impl(dmu_tx_hold_t *txh, const char *name)
dmu_tx_t *tx = txh->txh_tx;
dnode_t *dn = txh->txh_dnode;
int err;
extern int zap_micro_max_size;

ASSERT(tx->tx_txg == 0);

@@ -591,7 +591,7 @@ dmu_tx_hold_zap_impl(dmu_tx_hold_t *txh, const char *name)
* - 2 grown ptrtbl blocks
*/
(void) zfs_refcount_add_many(&txh->txh_space_towrite,
zap_micro_max_size, FTAG);
zap_get_micro_max_size(tx->tx_pool->dp_spa), FTAG);

if (dn == NULL)
return;
Loading