Fix large mzap_upgrade() allocation #2580

Closed
behlendorf opened this issue Aug 8, 2014 · 0 comments

Avoid 128K kmem allocations in mzap_upgrade()

As originally implemented, mzap_upgrade() uses kmem_alloc() for a buffer that can be as large as SPA_MAXBLOCKSIZE (128K). An allocation of this size can block indefinitely when contiguous memory is not available. Because the allocation is made while holding zap->zap_rwlock, other threads stall in zap_lockdir() and the system can appear to be deadlocked. In the traces below, filebench is still running in the allocator under mzap_upgrade(), while kswapd0 is blocked on the rwsem in zap_lockdir().

    filebench     R  running task        0 15523  15521 0x0000008c
    Call Trace:
     [<ffffffff8116eeea>] fallback_alloc+0x1ba/0x270
     [<ffffffff8116e93f>] cache_grow+0x2cf/0x320
     [<ffffffff8116ec69>] ____cache_alloc_node+0x99/0x160
     [<ffffffffa0233171>] kmem_alloc_debug+0x251/0x490 [spl]
     [<ffffffff8116fa39>] __kmalloc+0x189/0x220
     [<ffffffffa0233171>] kmem_alloc_debug+0x251/0x490 [spl]
     [<ffffffffa04c5d5a>] mzap_upgrade+0xca/0x310 [zfs]
     [<ffffffffa04c6c59>] zap_lockdir+0xab9/0xbb0 [zfs]
     [<ffffffffa04c7ca0>] zap_add+0x50/0x1c0 [zfs]
     [<ffffffffa04be17a>] zap_add_int+0x7a/0xa0 [zfs]
     [<ffffffffa04d254f>] zfs_unlinked_add+0x5f/0x110 [zfs]
     [<ffffffffa04d3051>] zfs_rmnode+0x1f1/0x410 [zfs]
     [<ffffffffa04fa0ae>] zfs_zinactive+0xfe/0x200 [zfs]
     [<ffffffffa04f4d1f>] zfs_inactive+0x7f/0x370 [zfs]
     [<ffffffffa0513b90>] zpl_inode_delete+0x0/0x30 [zfs]
     [<ffffffffa0513a1e>] zpl_clear_inode+0xe/0x10 [zfs]
     [<ffffffff811a659c>] clear_inode+0xac/0x140
     [<ffffffffa0513bb0>] zpl_inode_delete+0x20/0x30 [zfs]
     [<ffffffff811a6c9e>] generic_delete_inode+0xde/0x1d0
     [<ffffffff811a6df5>] generic_drop_inode+0x65/0x80
     [<ffffffff811a5c42>] iput+0x62/0x70
     [<ffffffff8119ab09>] do_unlinkat+0x1a9/0x260
     [<ffffffff8119abd6>] sys_unlink+0x16/0x20
     [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

    kswapd0       D 0000000000000000     0    59      2 0x00000000
    Call Trace:
     [<ffffffff8152b275>] rwsem_down_failed_common+0x95/0x1d0
     [<ffffffffa04871b6>] ? refcount_remove+0x16/0x20 [zfs]
     [<ffffffff8152b3d3>] rwsem_down_write_failed+0x23/0x30
     [<ffffffff8128f383>] call_rwsem_down_write_failed+0x13/0x20
     [<ffffffffa04c63c9>] zap_lockdir+0x229/0xbb0 [zfs]
     [<ffffffffa04c8868>] zap_remove_norm+0x48/0x2d0 [zfs]
     [<ffffffffa04c8b03>] zap_remove+0x13/0x20 [zfs]
     [<ffffffffa04bde31>] zap_remove_int+0x61/0x90 [zfs]
     [<ffffffffa04d306c>] zfs_rmnode+0x20c/0x410 [zfs]
     [<ffffffffa04fa0ae>] zfs_zinactive+0xfe/0x200 [zfs]
     [<ffffffffa04f4d1f>] zfs_inactive+0x7f/0x370 [zfs]
     [<ffffffffa0513a1e>] zpl_clear_inode+0xe/0x10 [zfs]
     [<ffffffff811a659c>] clear_inode+0xac/0x140
     [<ffffffff811a6670>] dispose_list+0x40/0x120
     [<ffffffff811a69c4>] shrink_icache_memory+0x274/0x2e0
     [<ffffffff81138a4a>] shrink_slab+0x12a/0x1a0
     [<ffffffff8113bd6a>] balance_pgdat+0x59a/0x820
     [<ffffffff8113c124>] kswapd+0x134/0x3b0
     [<ffffffff8109abf6>] kthread+0x96/0xa0
@behlendorf behlendorf added this to the 0.6.4 milestone Aug 8, 2014
@behlendorf behlendorf added the Bug label Aug 8, 2014
behlendorf added a commit to behlendorf/zfs that referenced this issue Aug 8, 2014
As originally implemented, mzap_upgrade() uses kmem_alloc() for
a buffer that can be as large as SPA_MAXBLOCKSIZE.  An allocation
of this size can block indefinitely if contiguous memory is not
available.  Because the allocation is made while holding the
zap->zap_rwlock, it can appear as if there is a deadlock in
zap_lockdir().  The stack traces in the issue show this.

The optimal fix for this would be to rework mzap_upgrade()
such that no large allocations are required.  This could be
done but it would result in us diverging further from the other
implementations.  Therefore I've opted against doing this
unless it becomes absolutely necessary.

Instead mzap_upgrade() has been updated to use zio_buf_alloc()
which can reliably provide buffers of up to SPA_MAXBLOCKSIZE.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#2580
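
The allocator swap described in the commit message can be illustrated with a small user-space sketch. It is not the ZFS implementation: every `sketch_*` name is hypothetical, `malloc()` stands in for the kernel slab allocator, and the 512-byte class granularity and 128K maximum are assumptions chosen to match the sizes mentioned in this issue. The idea shown is that buffers are served from per-size-class caches, so a large buffer that has been freed once can be handed back without asking the general-purpose allocator for a fresh contiguous region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed values for illustration: 512-byte classes, 128K maximum. */
#define SKETCH_MINBLOCKSHIFT 9
#define SKETCH_MAXBLOCKSIZE  ((size_t)128 * 1024)
#define SKETCH_NCLASSES      (SKETCH_MAXBLOCKSIZE >> SKETCH_MINBLOCKSHIFT)

/* A freed buffer is reused as a free-list node for its size class. */
struct sketch_buf {
	struct sketch_buf *next;
};

static struct sketch_buf *sketch_cache[SKETCH_NCLASSES];

/* Map a request size to its class index: 1..512 -> 0, 513..1024 -> 1, ... */
size_t
sketch_class(size_t size)
{
	return ((size - 1) >> SKETCH_MINBLOCKSHIFT);
}

/* Hypothetical stand-in for a size-class buffer allocator. */
void *
sketch_buf_alloc(size_t size)
{
	assert(size > 0 && size <= SKETCH_MAXBLOCKSIZE);

	size_t c = sketch_class(size);
	struct sketch_buf *b = sketch_cache[c];

	if (b != NULL) {
		/* Reuse a cached buffer of this class. */
		sketch_cache[c] = b->next;
		return (b);
	}

	/* Cache miss: allocate a buffer rounded up to the class size. */
	return (malloc((c + 1) << SKETCH_MINBLOCKSHIFT));
}

/* Return a buffer to the free list of its class instead of releasing it. */
void
sketch_buf_free(void *buf, size_t size)
{
	size_t c = sketch_class(size);
	struct sketch_buf *b = buf;

	b->next = sketch_cache[c];
	sketch_cache[c] = b;
}
```

In this sketch a caller would pair `sketch_buf_alloc(sz)` with `sketch_buf_free(buf, sz)`, passing the same size to both so the buffer returns to the class it came from. Note that this differs from a plain `malloc()`/`free()` pair, where the size is not needed on release.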
ryao pushed a commit to ryao/zfs that referenced this issue Aug 11, 2014
ryao pushed a commit to ryao/zfs that referenced this issue Oct 8, 2014
ryao pushed a commit to ryao/zfs that referenced this issue Nov 29, 2014
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Close openzfs#2580