Prevent direct reclaim from recursively locking hash_lock
Chris Siebenmann provided backtraces from his system for a deadlock with the latest HEAD. Of particular interest was this one:

    l2arc_feed D ffff88083f593cc0 0 642 2 0x00000000
     ffff880036697198 0000000000000046 ffff88080d9fa740 0000000000013cc0
     ffff880036697fd8 0000000000013cc0 ffff88080fd10000 ffff88080d9fa740
     000000003f496b20 ffffffffa04fe8d8 ffffffffa04fe8dc ffff88080d9fa740
    Call Trace:
     [<ffffffff81760619>] schedule_preempt_disabled+0x29/0x70
     [<ffffffff81762433>] __mutex_lock_slowpath+0xb3/0x120
     [<ffffffff817624c3>] mutex_lock+0x23/0x40
     [<ffffffffa0391f39>] remove_reference.isra.10+0x59/0xc0 [zfs]
     [<ffffffffa039690d>] arc_buf_remove_ref+0xbd/0x140 [zfs]
     [<ffffffffa039cb2f>] dbuf_rele_and_unlock+0x15f/0x3b0 [zfs]
     [<ffffffffa03b870e>] ? dnode_setdirty+0x12e/0x190 [zfs]
     [<ffffffffa039f572>] ? dbuf_dirty+0x492/0x9b0 [zfs]
     [<ffffffffa039cec6>] dbuf_rele+0x26/0x30 [zfs]
     [<ffffffffa03b85c2>] dnode_rele+0x72/0x90 [zfs]
     [<ffffffffa039cd0a>] dbuf_rele_and_unlock+0x33a/0x3b0 [zfs]
     [<ffffffff811f4e9d>] ? __slab_free+0xbd/0x300
     [<ffffffff817624b6>] ? mutex_lock+0x16/0x40
     [<ffffffffa039dd87>] ? dmu_buf_update_user+0x57/0xb0 [zfs]
     [<ffffffffa039d126>] dmu_buf_rele+0x26/0x30 [zfs]
     [<ffffffffa03dc378>] sa_handle_destroy+0x68/0xb0 [zfs]
     [<ffffffffa0437c0e>] zfs_zinactive+0xce/0x160 [zfs]
     [<ffffffffa0431244>] zfs_inactive+0x64/0x200 [zfs]
     [<ffffffff810da5a0>] ? autoremove_wake_function+0x40/0x40
     [<ffffffffa0448518>] zpl_evict_inode+0x28/0x30 [zfs]
     [<ffffffff81232227>] evict+0xa7/0x190
     [<ffffffff8123234e>] dispose_list+0x3e/0x60
     [<ffffffff81233386>] prune_icache_sb+0x56/0x80
     [<ffffffff81218fd5>] super_cache_scan+0x115/0x180
     [<ffffffff811aa299>] shrink_slab_node+0x129/0x2b0
     [<ffffffff81205cfb>] ? mem_cgroup_iter+0x12b/0x430
     [<ffffffff811ac0bb>] shrink_slab+0x8b/0x170
     [<ffffffff811af057>] shrink_zones+0x357/0x470
     [<ffffffff811af22b>] do_try_to_free_pages+0xbb/0x140
     [<ffffffff811af38a>] try_to_free_pages+0xda/0x170
     [<ffffffff811a2177>] __alloc_pages_nodemask+0x647/0xb30
     [<ffffffff811ea59c>] alloc_pages_current+0x9c/0x120
     [<ffffffff811f586b>] new_slab+0x3ab/0x4c0
     [<ffffffff811f5de4>] __slab_alloc+0x464/0x5e0
     [<ffffffffa02f49be>] ? spl_kmem_cache_alloc+0x8e/0x870 [spl]
     [<ffffffff810d0126>] ? dequeue_task_fair+0x3d6/0x680
     [<ffffffff810d0f01>] ? put_prev_entity+0x31/0x400
     [<ffffffff811f6323>] kmem_cache_alloc+0x1a3/0x1f0
     [<ffffffffa02f49be>] ? spl_kmem_cache_alloc+0x8e/0x870 [spl]
     [<ffffffffa02f49be>] spl_kmem_cache_alloc+0x8e/0x870 [spl]
     [<ffffffff810fe9ee>] ? try_to_del_timer_sync+0x5e/0x90
     [<ffffffffa043fb82>] zio_create+0x42/0x5d0 [zfs]
     [<ffffffffa0440341>] zio_null+0x61/0x70 [zfs]
     [<ffffffffa0396070>] ? l2arc_feed_thread+0xbb0/0xbb0 [zfs]
     [<ffffffffa044036e>] zio_root+0x1e/0x20 [zfs]
     [<ffffffffa0395dbc>] l2arc_feed_thread+0x8fc/0xbb0 [zfs]
     [<ffffffff810d3fee>] ? pick_next_task_fair+0x1be/0x8c0
     [<ffffffffa03954c0>] ? l2arc_evict+0x3a0/0x3a0 [zfs]
     [<ffffffffa02f53fa>] thread_generic_wrapper+0x7a/0x90 [spl]
     [<ffffffffa02f5380>] ? __thread_exit+0x20/0x20 [spl]
     [<ffffffff810b7fea>] kthread+0xea/0x100
     [<ffffffff810b7f00>] ? kthread_create_on_node+0x1b0/0x1b0
     [<ffffffff817646fc>] ret_from_fork+0x7c/0xb0
     [<ffffffff810b7f00>] ? kthread_create_on_node+0x1b0/0x1b0

It appears that the l2arc_feed thread entered direct reclaim while holding a hash_lock: zio_root() allocated memory through spl_kmem_cache_alloc(), reclaim pruned the inode cache, and the resulting zfs_inactive() teardown dropped a reference on an ARC buffer protected by that same lock. remove_reference() then blocked in mutex_lock() on a mutex the thread already held, so the thread deadlocked against itself.

Wrapping the hash_lock critical sections that might allocate memory in spl_fstrans_mark()/spl_fstrans_unmark() should prevent direct reclaim from recursively acquiring hash_lock.

Closes openzfs#3050

Reported-by: Chris Siebenmann <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
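The failure mode and the effect of marking can be illustrated with a small user-space sketch in C. It does not use the SPL or ZFS: `sim_hash_lock`, `sim_fstrans_mark()`, `sim_fstrans_unmark()`, `sim_reclaim()` and `sim_run()` are hypothetical stand-ins, and the sketch assumes a marked section simply tells reclaim not to re-enter code that takes the lock. A POSIX error-checking mutex reports the recursive acquisition as `EDEADLK`, where a kernel mutex such as hash_lock would just hang. The real mechanism inside the SPL and the kernel allocator is not modeled here.

```c
#define _XOPEN_SOURCE 700 /* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for an ARC hash_lock. An error-checking mutex
 * returns EDEADLK on a recursive lock instead of hanging. */
static pthread_mutex_t sim_hash_lock;

/* Hypothetical per-thread "inside a marked section" flag. */
static _Thread_local int sim_fstrans;

typedef int sim_cookie_t;

/* Mark the current thread; the cookie records the previous state. */
static sim_cookie_t sim_fstrans_mark(void)
{
    sim_cookie_t cookie = sim_fstrans;
    sim_fstrans = 1;
    return cookie;
}

/* Restore the state saved by the matching mark, so nesting unwinds. */
static void sim_fstrans_unmark(sim_cookie_t cookie)
{
    sim_fstrans = cookie;
}

/* Hypothetical reclaim callback: evicting a cached buffer needs the lock.
 * Returns 1 if it evicted, 0 if it skipped, -EDEADLK on a recursive lock. */
static int sim_reclaim(void)
{
    if (sim_fstrans)
        return 0; /* caller is in a marked section: do not re-enter */
    int rc = pthread_mutex_lock(&sim_hash_lock);
    if (rc != 0)
        return -rc; /* EDEADLK: this thread already owns the lock */
    /* ... evict a buffer here ... */
    pthread_mutex_unlock(&sim_hash_lock);
    return 1;
}

/* An allocation under memory pressure runs reclaim synchronously in the
 * calling thread, i.e. "direct reclaim". */
static int sim_alloc_under_pressure(void)
{
    return sim_reclaim();
}

/* Allocate with or without holding the lock, optionally inside a mark. */
int sim_run(int hold_lock, int use_mark)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&sim_hash_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    if (hold_lock)
        pthread_mutex_lock(&sim_hash_lock);
    sim_cookie_t cookie = 0;
    if (use_mark)
        cookie = sim_fstrans_mark();
    int result = sim_alloc_under_pressure();
    if (use_mark)
        sim_fstrans_unmark(cookie);
    if (hold_lock)
        pthread_mutex_unlock(&sim_hash_lock);

    pthread_mutex_destroy(&sim_hash_lock);
    return result;
}
```

Here `sim_run(1, 0)` has the shape of the backtrace above and returns `-EDEADLK`; `sim_run(1, 1)` returns 0 because reclaim is skipped inside the marked section; `sim_run(0, 0)` returns 1, showing reclaim still proceeds normally outside the critical section. Having the mark return a cookie with the previous state is this sketch's own choice, so that nested marked sections unwind correctly.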