From 2055e9f8f6326ed765747ac533924b24a6276a48 Mon Sep 17 00:00:00 2001
From: Ned Bass
Date: Sat, 26 Mar 2016 07:15:30 +0000
Subject: [PATCH] Backfilling metadnode degrades object create rates

Object creation rates may be degraded when dmu_object_alloc() tries
to backfill the metadnode array by restarting its search at offset 0.
The method of searching the dnode space for holes is inefficient and
unreliable, leading to many failed attempts to obtain a dnode hold.
These failed attempts are expensive and limit overall system
throughput. This patch changes the default behavior to disable
backfilling, and it adds a zfs_metadnode_backfill module parameter to
allow the old behavior to be enabled.

The search offset restart happens at most once per call to
dmu_object_alloc() when the previously allocated object number is a
multiple of 4096. If the hold on the requested object fails because
the object is allocated, dmu_object_next() is called to find the next
hole. That function should theoretically identify the next free
object that the next loop iteration can successfully obtain a hold
on. In practice, however, dmu_object_next() may falsely identify a
recently allocated dnode as free because the in-memory copy of the
dnode_phys_t is not up to date. The next hold attempt then fails, and
this process repeats for up to 4096 loop iterations before the search
skips ahead to a sparse region of the metadnode. A similar pathology
occurs if dmu_object_next() returns ESRCH when it fails to find a
hole in the current dnode block. In this case dmu_object_alloc()
simply increments the object number and retries, resulting again in
up to 4096 failed dnode hold attempts.

We can avoid these pathologies by not attempting to backfill the
metadnode array. This may result in sparse dnode blocks, potentially
costing disk space, memory overhead, and increased disk I/O. These
penalties appear to be outweighed by the performance cost of the
current approach.
Future work could implement a more efficient means to search for
holes and allow us to reenable backfilling by default.

=== Benchmark Results ===

We measured a 46% increase in average file creation rate by setting
zfs_metadnode_backfill=0. The createmany benchmark used is available
at http://github.com/nedbass/createmany. It used 32 threads to create
16 million files over 16 iterations. The pool was freshly created for
each of the two tests. The test system was a d2.xlarge Amazon AWS
virtual machine with 3 2TB disks in a raidz pool.

zfs_metadnode_backfill    Average creates/second
----------------------    ----------------------
                     0    43879
                     1    30040

$ zpool create tank raidz /dev/xvd{b,c,d}
$ echo 0 > /sys/module/zfs/parameters/zfs_metadnode_backfill
$ for ((i=0;i<16;i++)) ; do ./createmany -o -t 32 -D $(mktemp -d /tank/XXXXX) 1000000 ; done
total: 1000000 creates in 21.142829 seconds: 47297.359852 creates/second
total: 1000000 creates in 21.421943 seconds: 46681.108566 creates/second
total: 1000000 creates in 21.996960 seconds: 45460.826977 creates/second
total: 1000000 creates in 22.031947 seconds: 45388.637143 creates/second
total: 1000000 creates in 21.597262 seconds: 46302.165727 creates/second
total: 1000000 creates in 21.194397 seconds: 47182.281302 creates/second
total: 1000000 creates in 23.844561 seconds: 41938.285457 creates/second
total: 1000000 creates in 25.678497 seconds: 38943.089478 creates/second
total: 1000000 creates in 22.400553 seconds: 44641.757449 creates/second
total: 1000000 creates in 22.011262 seconds: 45431.290857 creates/second
total: 1000000 creates in 21.848749 seconds: 45769.211022 creates/second
total: 1000000 creates in 26.574808 seconds: 37629.622928 creates/second
total: 1000000 creates in 22.326124 seconds: 44790.580077 creates/second
total: 1000000 creates in 23.562593 seconds: 42440.152541 creates/second
total: 1000000 creates in 26.825597 seconds: 37277.828270 creates/second
total: 1000000 creates in 22.277026 seconds: 44889.297413 creates/second

$ zpool destroy tank
$ zpool create tank raidz /dev/xvd{b,c,d}
$ echo 1 > /sys/module/zfs/parameters/zfs_metadnode_backfill
$ for ((i=0;i<16;i++)) ; do ./createmany -o -t 32 -D $(mktemp -d /tank/XXXXX) 1000000 ; done
total: 1000000 creates in 31.947285 seconds: 31301.564265 creates/second
total: 1000000 creates in 31.511260 seconds: 31734.687822 creates/second
total: 1000000 creates in 31.984121 seconds: 31265.515618 creates/second
total: 1000000 creates in 31.960720 seconds: 31288.406458 creates/second
total: 1000000 creates in 32.651408 seconds: 30626.550663 creates/second
total: 1000000 creates in 32.579218 seconds: 30694.414826 creates/second
total: 1000000 creates in 36.163562 seconds: 27652.143474 creates/second
total: 1000000 creates in 33.621352 seconds: 29743.003829 creates/second
total: 1000000 creates in 33.097268 seconds: 30213.974061 creates/second
total: 1000000 creates in 34.419482 seconds: 29053.313476 creates/second
total: 1000000 creates in 34.014244 seconds: 29399.448204 creates/second
total: 1000000 creates in 32.972573 seconds: 30328.236705 creates/second
total: 1000000 creates in 34.757156 seconds: 28771.054526 creates/second
total: 1000000 creates in 32.194859 seconds: 31060.859951 creates/second
total: 1000000 creates in 32.464407 seconds: 30802.966165 creates/second
total: 1000000 creates in 37.443681 seconds: 26706.776650 creates/second

Signed-off-by: Ned Bass
---
 man/man5/zfs-module-parameters.5 | 13 +++++++++++++
 module/zfs/dmu_object.c          | 10 +++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/man/man5/zfs-module-parameters.5 b/man/man5/zfs-module-parameters.5
index 2d565dc191f9..c8fa10581faa 100644
--- a/man/man5/zfs-module-parameters.5
+++ b/man/man5/zfs-module-parameters.5
@@ -1170,6 +1170,19 @@ Disable meta data compression
 Use \fB1\fR for yes and \fB0\fR for no (default).
 .RE
 
+.sp
+.ne 2
+.na
+\fBzfs_metadnode_backfill\fR (int)
+.ad
+.RS 12n
+Enable backfilling of the metadnode array to avoid sparse dnode blocks.
+Sparse blocks can cost disk space, memory overhead, and increased disk
+I/O, while backfilling can limit overall object creation rates.
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
 .sp
 .ne 2
 .na
diff --git a/module/zfs/dmu_object.c b/module/zfs/dmu_object.c
index 5faecafc7d86..05e50d310d91 100644
--- a/module/zfs/dmu_object.c
+++ b/module/zfs/dmu_object.c
@@ -31,6 +31,8 @@
 #include
 #include
 
+int zfs_metadnode_backfill = 0;
+
 uint64_t
 dmu_object_alloc(objset_t *os, dmu_object_type_t ot, int blocksize,
     dmu_object_type_t bonustype, int bonuslen, dmu_tx_t *tx)
@@ -58,7 +60,9 @@ dmu_object_alloc(objset_t *os, dmu_object_type_t ot, int blocksize,
 		 * described in traverse_visitbp.
 		 */
 		if (P2PHASE(object, L2_dnode_count) == 0) {
-			uint64_t offset = restarted ? object << DNODE_SHIFT : 0;
+			uint64_t offset =
+			    (restarted || !zfs_metadnode_backfill) ?
+			    object << DNODE_SHIFT : 0;
 			int error = dnode_next_offset(DMU_META_DNODE(os),
 			    DNODE_FIND_HOLE,
 			    &offset, 2, DNODES_PER_BLOCK >> 2, 0);
@@ -225,6 +229,10 @@ dmu_object_free_zapified(objset_t *mos, uint64_t object, dmu_tx_t *tx)
 }
 
 #if defined(_KERNEL) && defined(HAVE_SPL)
+module_param(zfs_metadnode_backfill, int, 0644);
+MODULE_PARM_DESC(zfs_metadnode_backfill,
+	"Enable backfilling of the metadnode array");
+
 EXPORT_SYMBOL(dmu_object_alloc);
 EXPORT_SYMBOL(dmu_object_claim);
 EXPORT_SYMBOL(dmu_object_reclaim);
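
Illustrative note (not part of the patch): the change to the search
offset in dmu_object_alloc() can be modeled in userspace roughly as
below. This is a minimal sketch under stated assumptions, not the
kernel code. The helper names p2phase() and search_start() are
hypothetical, DNODE_SHIFT is assumed to be 9 (512-byte dnodes), and
L2_DNODE_COUNT is fixed at 4096 to match the figure quoted in the
commit message; in the real code the count is derived from the
metadnode's indirect block size.

```c
#include <stdint.h>

/* Assumed values for illustration only; the real ones come from ZFS headers. */
#define DNODE_SHIFT	9		/* assumes 512-byte dnodes */
#define L2_DNODE_COUNT	4096ULL		/* figure quoted in the commit message */

/* Hypothetical stand-in for P2PHASE(x, align), align being a power of two. */
static uint64_t
p2phase(uint64_t x, uint64_t align)
{
	return (x & (align - 1));
}

/*
 * Hypothetical model of the patched offset selection. Returns 1 and
 * stores the byte offset at which the hole search would begin when
 * `object` falls on an L2 boundary; returns 0 (leaving *offset alone)
 * when no search is started for this object number.
 */
static int
search_start(uint64_t object, int restarted, int backfill, uint64_t *offset)
{
	if (p2phase(object, L2_DNODE_COUNT) != 0)
		return (0);

	/* Search from offset 0 only on the first pass with backfill on. */
	*offset = (restarted || !backfill) ? object << DNODE_SHIFT : 0;
	return (1);
}
```

With these assumptions, object 8192 with backfilling disabled starts
its search at byte offset 8192 << 9 = 4194304, that is, at the current
position. With backfilling enabled and no prior restart it starts at
offset 0, which is the rescan the commit message describes as costly.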