Avoid 128K kmem allocations in mzap_upgrade()
As originally implemented, the mzap_upgrade() function performs
allocations of up to SPA_MAXBLOCKSIZE (128K) using kmem_alloc().
These large allocations can potentially block indefinitely
if contiguous memory is not available.  Since the allocation
is done under the zap->zap_rwlock, it can appear as if there is
a deadlock in zap_lockdir().  The stack traces below show this.
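
To make the contiguous-memory concern concrete: the first trace
shows kmem_alloc_debug() calling __kmalloc(), which hands out
physically contiguous memory.  The following is only a sketch of
the arithmetic, assuming 4 KiB pages (an assumption, the commit
does not state the page size).  TOY_PAGE_SIZE and toy_page_order()
are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages. */
#define TOY_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: return the smallest order such that
 * (TOY_PAGE_SIZE << order) >= size, i.e. how many contiguous
 * pages (as a power of two) a request of `size` bytes needs.
 */
static unsigned int
toy_page_order(size_t size)
{
	unsigned int order = 0;
	size_t span = TOY_PAGE_SIZE;

	assert(size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return (order);
}
```

Under that assumption toy_page_order(128 * 1024) is 5, so a 128K
request needs 32 physically contiguous pages, which a fragmented
system under memory pressure may not be able to supply promptly.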

The optimal fix for this would be to rework mzap_upgrade()
such that no large allocations are required.  This could be
done, but it would result in us diverging further from the other
implementations.  Therefore I've opted against doing this
unless it becomes absolutely necessary.

Instead, mzap_upgrade() has been updated to use zio_buf_alloc(),
which can reliably provide buffers of up to SPA_MAXBLOCKSIZE.
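
The swap is mechanical because both interfaces take the buffer
size again on free, so each kmem_free(mzp, sz) maps directly to
zio_buf_free(mzp, sz).  The toy userspace model below sketches
the general idea of size-class buffer caches, where a request is
mapped to a per-size cache and freed buffers are reused.  It is
not the ZFS implementation: every toy_* name and constant is
hypothetical, and malloc() stands in for whatever backs the real
caches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical constants: 512-byte classes up to a 128K maximum. */
#define TOY_MINBLOCKSHIFT 9
#define TOY_MAXBLOCKSIZE  (128 * 1024)
#define TOY_NCLASSES      (TOY_MAXBLOCKSIZE >> TOY_MINBLOCKSHIFT)

/* A freed buffer is threaded onto the free list of its size class. */
struct toy_node {
	struct toy_node *next;
};

static struct toy_node *toy_free_list[TOY_NCLASSES];

/* Map a request size (1..TOY_MAXBLOCKSIZE) to a size-class index. */
static size_t
toy_class_index(size_t size)
{
	assert(size > 0 && size <= TOY_MAXBLOCKSIZE);
	return ((size - 1) >> TOY_MINBLOCKSHIFT);
}

/* Return a buffer of at least `size` bytes, reusing a cached one. */
static void *
toy_buf_alloc(size_t size)
{
	size_t c = toy_class_index(size);
	struct toy_node *n = toy_free_list[c];

	if (n != NULL) {
		toy_free_list[c] = n->next;
		return (n);
	}
	/* Cache miss: allocate a buffer of the full class size. */
	return (malloc((c + 1) << TOY_MINBLOCKSHIFT));
}

/* Return a buffer to its class; `size` must match the allocation. */
static void
toy_buf_free(void *buf, size_t size)
{
	struct toy_node *n = buf;
	size_t c = toy_class_index(size);

	n->next = toy_free_list[c];
	toy_free_list[c] = n;
}
```

Passing a different size to the free routine than was used for the
allocation would file the buffer under the wrong class, which is
why the allocation and every matching free are changed together.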

filebench     R  running task        0 15523  15521 0x0000008c
Call Trace:
 [<ffffffff8116eeea>] fallback_alloc+0x1ba/0x270
 [<ffffffff8116e93f>] cache_grow+0x2cf/0x320
 [<ffffffff8116ec69>] ____cache_alloc_node+0x99/0x160
 [<ffffffffa0233171>] kmem_alloc_debug+0x251/0x490 [spl]
 [<ffffffff8116fa39>] __kmalloc+0x189/0x220
 [<ffffffffa0233171>] kmem_alloc_debug+0x251/0x490 [spl]
 [<ffffffffa04c5d5a>] mzap_upgrade+0xca/0x310 [zfs]
 [<ffffffffa04c6c59>] zap_lockdir+0xab9/0xbb0 [zfs]
 [<ffffffffa04c7ca0>] zap_add+0x50/0x1c0 [zfs]
 [<ffffffffa04be17a>] zap_add_int+0x7a/0xa0 [zfs]
 [<ffffffffa04d254f>] zfs_unlinked_add+0x5f/0x110 [zfs]
 [<ffffffffa04d3051>] zfs_rmnode+0x1f1/0x410 [zfs]
 [<ffffffffa04fa0ae>] zfs_zinactive+0xfe/0x200 [zfs]
 [<ffffffffa04f4d1f>] zfs_inactive+0x7f/0x370 [zfs]
 [<ffffffffa0513b90>] zpl_inode_delete+0x0/0x30 [zfs]
 [<ffffffffa0513a1e>] zpl_clear_inode+0xe/0x10 [zfs]
 [<ffffffff811a659c>] clear_inode+0xac/0x140
 [<ffffffffa0513bb0>] zpl_inode_delete+0x20/0x30 [zfs]
 [<ffffffff811a6c9e>] generic_delete_inode+0xde/0x1d0
 [<ffffffff811a6df5>] generic_drop_inode+0x65/0x80
 [<ffffffff811a5c42>] iput+0x62/0x70
 [<ffffffff8119ab09>] do_unlinkat+0x1a9/0x260
 [<ffffffff8119abd6>] sys_unlink+0x16/0x20
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

kswapd0       D 0000000000000000     0    59      2 0x00000000
Call Trace:
 [<ffffffff8152b275>] rwsem_down_failed_common+0x95/0x1d0
 [<ffffffffa04871b6>] ? refcount_remove+0x16/0x20 [zfs]
 [<ffffffff8152b3d3>] rwsem_down_write_failed+0x23/0x30
 [<ffffffff8128f383>] call_rwsem_down_write_failed+0x13/0x20
 [<ffffffffa04c63c9>] zap_lockdir+0x229/0xbb0 [zfs]
 [<ffffffffa04c8868>] zap_remove_norm+0x48/0x2d0 [zfs]
 [<ffffffffa04c8b03>] zap_remove+0x13/0x20 [zfs]
 [<ffffffffa04bde31>] zap_remove_int+0x61/0x90 [zfs]
 [<ffffffffa04d306c>] zfs_rmnode+0x20c/0x410 [zfs]
 [<ffffffffa04fa0ae>] zfs_zinactive+0xfe/0x200 [zfs]
 [<ffffffffa04f4d1f>] zfs_inactive+0x7f/0x370 [zfs]
 [<ffffffffa0513a1e>] zpl_clear_inode+0xe/0x10 [zfs]
 [<ffffffff811a659c>] clear_inode+0xac/0x140
 [<ffffffff811a6670>] dispose_list+0x40/0x120
 [<ffffffff811a69c4>] shrink_icache_memory+0x274/0x2e0
 [<ffffffff81138a4a>] shrink_slab+0x12a/0x1a0
 [<ffffffff8113bd6a>] balance_pgdat+0x59a/0x820
 [<ffffffff8113c124>] kswapd+0x134/0x3b0
 [<ffffffff8109abf6>] kthread+0x96/0xa0

Signed-off-by: Brian Behlendorf <[email protected]>
behlendorf committed Aug 5, 2014
1 parent fbeddd6 commit 26cb948
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions module/zfs/zap_micro.c
@@ -533,15 +533,15 @@ mzap_upgrade(zap_t **zapp, dmu_tx_t *tx, zap_flags_t flags)
 	ASSERT(RW_WRITE_HELD(&zap->zap_rwlock));
 
 	sz = zap->zap_dbuf->db_size;
-	mzp = kmem_alloc(sz, KM_PUSHPAGE | KM_NODEBUG);
+	mzp = zio_buf_alloc(sz);
 	bcopy(zap->zap_dbuf->db_data, mzp, sz);
 	nchunks = zap->zap_m.zap_num_chunks;
 
 	if (!flags) {
 		err = dmu_object_set_blocksize(zap->zap_objset, zap->zap_object,
 		    1ULL << fzap_default_block_shift, 0, tx);
 		if (err) {
-			kmem_free(mzp, sz);
+			zio_buf_free(mzp, sz);
 			return (err);
 		}
 	}
@@ -567,7 +567,7 @@ mzap_upgrade(zap_t **zapp, dmu_tx_t *tx, zap_flags_t flags)
 		if (err)
 			break;
 	}
-	kmem_free(mzp, sz);
+	zio_buf_free(mzp, sz);
 	*zapp = zap;
 	return (err);
 }