Fix large mzap_upgrade() allocation #2580

behlendorf · 2014-08-08T16:09:12Z

Avoid 128K kmem allocations in mzap_upgrade()

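As originally implemented, mzap_upgrade() performs allocations of up to SPA_MAXBLOCKSIZE (128K) using kmem_alloc(). These large allocations can block indefinitely if contiguous memory is not available. Because the allocation is done while holding zap->zap_rwlock, other threads waiting on that lock make it look as if there is a deadlock in zap_lockdir(). In the traces below, filebench is still running inside the allocation called from mzap_upgrade(), while kswapd0 is blocked in zap_lockdir() waiting for the write lock.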

    filebench     R  running task        0 15523  15521 0x0000008c
    Call Trace:
     [<ffffffff8116eeea>] fallback_alloc+0x1ba/0x270
     [<ffffffff8116e93f>] cache_grow+0x2cf/0x320
     [<ffffffff8116ec69>] ____cache_alloc_node+0x99/0x160
     [<ffffffffa0233171>] kmem_alloc_debug+0x251/0x490 [spl]
     [<ffffffff8116fa39>] __kmalloc+0x189/0x220
     [<ffffffffa0233171>] kmem_alloc_debug+0x251/0x490 [spl]
     [<ffffffffa04c5d5a>] mzap_upgrade+0xca/0x310 [zfs]
     [<ffffffffa04c6c59>] zap_lockdir+0xab9/0xbb0 [zfs]
     [<ffffffffa04c7ca0>] zap_add+0x50/0x1c0 [zfs]
     [<ffffffffa04be17a>] zap_add_int+0x7a/0xa0 [zfs]
     [<ffffffffa04d254f>] zfs_unlinked_add+0x5f/0x110 [zfs]
     [<ffffffffa04d3051>] zfs_rmnode+0x1f1/0x410 [zfs]
     [<ffffffffa04fa0ae>] zfs_zinactive+0xfe/0x200 [zfs]
     [<ffffffffa04f4d1f>] zfs_inactive+0x7f/0x370 [zfs]
     [<ffffffffa0513b90>] zpl_inode_delete+0x0/0x30 [zfs]
     [<ffffffffa0513a1e>] zpl_clear_inode+0xe/0x10 [zfs]
     [<ffffffff811a659c>] clear_inode+0xac/0x140
     [<ffffffffa0513bb0>] zpl_inode_delete+0x20/0x30 [zfs]
     [<ffffffff811a6c9e>] generic_delete_inode+0xde/0x1d0
     [<ffffffff811a6df5>] generic_drop_inode+0x65/0x80
     [<ffffffff811a5c42>] iput+0x62/0x70
     [<ffffffff8119ab09>] do_unlinkat+0x1a9/0x260
     [<ffffffff8119abd6>] sys_unlink+0x16/0x20
     [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

    kswapd0       D 0000000000000000     0    59      2 0x00000000
    Call Trace:
     [<ffffffff8152b275>] rwsem_down_failed_common+0x95/0x1d0
     [<ffffffffa04871b6>] ? refcount_remove+0x16/0x20 [zfs]
     [<ffffffff8152b3d3>] rwsem_down_write_failed+0x23/0x30
     [<ffffffff8128f383>] call_rwsem_down_write_failed+0x13/0x20
     [<ffffffffa04c63c9>] zap_lockdir+0x229/0xbb0 [zfs]
     [<ffffffffa04c8868>] zap_remove_norm+0x48/0x2d0 [zfs]
     [<ffffffffa04c8b03>] zap_remove+0x13/0x20 [zfs]
     [<ffffffffa04bde31>] zap_remove_int+0x61/0x90 [zfs]
     [<ffffffffa04d306c>] zfs_rmnode+0x20c/0x410 [zfs]
     [<ffffffffa04fa0ae>] zfs_zinactive+0xfe/0x200 [zfs]
     [<ffffffffa04f4d1f>] zfs_inactive+0x7f/0x370 [zfs]
     [<ffffffffa0513a1e>] zpl_clear_inode+0xe/0x10 [zfs]
     [<ffffffff811a659c>] clear_inode+0xac/0x140
     [<ffffffff811a6670>] dispose_list+0x40/0x120
     [<ffffffff811a69c4>] shrink_icache_memory+0x274/0x2e0
     [<ffffffff81138a4a>] shrink_slab+0x12a/0x1a0
     [<ffffffff8113bd6a>] balance_pgdat+0x59a/0x820
     [<ffffffff8113c124>] kswapd+0x134/0x3b0
     [<ffffffff8109abf6>] kthread+0x96/0xa0

As originally implemented the mzap_upgrade() function will perform up to SPA_MAXBLOCKSIZE allocations using kmem_alloc(). These large allocations can potentially block indefinitely if contiguous memory is not available. Since this allocation is done under the zap->zap_rwlock it can appear as if there is a deadlock in zap_lockdir(). This is shown below.

The optimal fix for this would be to rework mzap_upgrade() such that no large allocations are required. This could be done but it would result in us diverging further from the other implementations. Therefore I've opted against doing this unless it becomes absolutely necessary.

Instead mzap_upgrade() has been updated to use zio_buf_alloc() which can reliably provide buffers of up to SPA_MAXBLOCKSIZE.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#2580

As originally implemented the mzap_upgrade() function will perform up to SPA_MAXBLOCKSIZE allocations using kmem_alloc(). These large allocations can potentially block indefinitely if contiguous memory is not available. Since this allocation is done under the zap->zap_rwlock it can appear as if there is a deadlock in zap_lockdir(). This is shown below.

The optimal fix for this would be to rework mzap_upgrade() such that no large allocations are required. This could be done but it would result in us diverging further from the other implementations. Therefore I've opted against doing this unless it becomes absolutely necessary.

Instead mzap_upgrade() has been updated to use zio_buf_alloc() which can reliably provide buffers of up to SPA_MAXBLOCKSIZE.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Close openzfs#2580
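
For readers unfamiliar with the pattern, here is a rough user-space sketch of the change the commit message describes: the temporary copy of the old block comes from a dedicated buffer allocator and is released with the same size it was allocated with. This is not the actual patch. The `sketch_*` names are hypothetical stand-ins that simply wrap malloc()/free(), and the "upgrade" logic is a toy in which every non-zero byte counts as an entry.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space stand-ins for the kernel's zio_buf_alloc() and
 * zio_buf_free(). They only wrap malloc()/free() so the sketch can run
 * outside the kernel; the size argument mirrors the alloc/free pairing.
 */
static void *sketch_zio_buf_alloc(size_t size)
{
    return malloc(size);
}

static void sketch_zio_buf_free(void *buf, size_t size)
{
    (void)size; /* unused here; kept to show the matching size on free */
    free(buf);
}

/*
 * Toy "upgrade": snapshot the old block into a temporary buffer,
 * reinitialize the block, then re-add each non-empty entry from the
 * snapshot. Returns the number of entries re-added, or -1 if the
 * temporary buffer could not be allocated (possible with malloc()).
 */
static int sketch_mzap_upgrade(unsigned char *block, size_t size)
{
    assert(size > 0);

    /* Temporary copy of the whole block (up to 128K in the real case). */
    unsigned char *snap = sketch_zio_buf_alloc(size);
    if (snap == NULL)
        return -1;
    memcpy(snap, block, size);

    /* The original block is reinitialized before entries are re-added. */
    memset(block, 0, size);

    int readded = 0;
    for (size_t i = 0; i < size; i++) {
        if (snap[i] != 0)
            block[readded++] = snap[i];
    }

    /* Free with the same size that was passed to the allocator. */
    sketch_zio_buf_free(snap, size);
    return readded;
}
```

Calling `sketch_mzap_upgrade()` on a small array such as `{0, 5, 0, 7}` compacts the two non-zero entries to the front of the block. The point of the sketch is only the allocation pairing around the temporary copy; where the real buffers come from and how entries are re-added are details of the ZFS source, not of this example.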

behlendorf added this to the 0.6.4 milestone Aug 8, 2014

behlendorf added the Bug label Aug 8, 2014

behlendorf closed this as completed in 4dd1893 Aug 11, 2014
