rmmod: "sleeping function called from invalid context" #771

Closed
dechamps opened this issue Jun 5, 2012 · 3 comments

@dechamps (Contributor) commented Jun 5, 2012

When rmmod'ing zfs-0.6.0-rc8 on a kernel with debug facilities enabled:

 BUG: sleeping function called from invalid context at kernel/rwsem.c:48
 in_atomic(): 1, irqs_disabled(): 0, pid: 5983, name: rmmod
 1 lock held by rmmod/5983:
 #0:  (vn_file_lock){+.+...}, at: [<ffffffffa000d24d>] spl_vn_fini+0x2d/0x1e0 [spl]
 Pid: 5983, comm: rmmod Tainted: G           O 3.2.18--std-ipv6-64 #5
 Call Trace:
 [<ffffffff8107fcbf>] __might_sleep+0xdf/0x110
 [<ffffffff81b766ef>] down_write+0x1f/0x70
 [<ffffffff810e7a29>] ? __call_rcu+0xb9/0x170
 [<ffffffffa00080a0>] spl_kmem_cache_destroy+0x90/0x600 [spl]
 [<ffffffff810bb1fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffffa000d24d>] ? spl_vn_fini+0x2d/0x1e0 [spl]
 [<ffffffffa000d3e0>] spl_vn_fini+0x1c0/0x1e0 [spl]
 [<ffffffffa00025ca>] ? spl_proc_fini+0x7a/0x150 [spl]
 [<ffffffffa00122ba>] spl_fini+0x5a/0xc0 [spl]
 [<ffffffff81b75429>] ? mutex_unlock+0x9/0x10
 [<ffffffff810ca8ac>] sys_delete_module+0x17c/0x270
 [<ffffffff810bb1fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff81b7867b>] system_call_fastpath+0x16/0x1b
 BUG: scheduling while atomic: rmmod/5983/0x00000002
 1 lock held by rmmod/5983:
 #0:  (vn_file_lock){+.+...}, at: [<ffffffffa000d24d>] spl_vn_fini+0x2d/0x1e0 [spl]
 Modules linked in: spl(O-) [last unloaded: zunicode]
 Pid: 5983, comm: rmmod Tainted: G           O 3.2.18--std-ipv6-64 #5
 Call Trace:
 [<ffffffff81081e67>] __schedule_bug+0x77/0x80
 [<ffffffff81b74a5f>] __schedule+0x9af/0x9e0
 [<ffffffff810b978f>] ? save_trace+0x3f/0xc0
 [<ffffffff810bddda>] ? __lock_acquire+0x127a/0x1d80
 [<ffffffff81b74b6a>] schedule+0x3a/0x60
 [<ffffffff81b74fb5>] schedule_timeout+0x1b5/0x210
 [<ffffffff81b77b8b>] ? _raw_spin_unlock_irq+0x2b/0x50
 [<ffffffff810bb1fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff81b73e67>] wait_for_common+0xd7/0x180
 [<ffffffff81b7538b>] ? __mutex_unlock_slowpath+0xcb/0x160
 [<ffffffff810842c0>] ? try_to_wake_up+0x2e0/0x2e0
 [<ffffffff810bb29d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81b73fb8>] wait_for_completion+0x18/0x20
 [<ffffffff810a14c0>] flush_workqueue+0x230/0x560
 [<ffffffff810a1290>] ? cancel_work_sync+0x10/0x10
 [<ffffffff810a1800>] flush_scheduled_work+0x10/0x20
 [<ffffffffa000813d>] spl_kmem_cache_destroy+0x12d/0x600 [spl]
 [<ffffffff810bb1fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffffa000d24d>] ? spl_vn_fini+0x2d/0x1e0 [spl]
 [<ffffffffa000d3e0>] spl_vn_fini+0x1c0/0x1e0 [spl]
 [<ffffffffa00025ca>] ? spl_proc_fini+0x7a/0x150 [spl]
 [<ffffffffa00122ba>] spl_fini+0x5a/0xc0 [spl]
 [<ffffffff81b75429>] ? mutex_unlock+0x9/0x10
 [<ffffffff810ca8ac>] sys_delete_module+0x17c/0x270
 [<ffffffff810bb1fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff81b7867b>] system_call_fastpath+0x16/0x1b
 BUG: sleeping function called from invalid context at kernel/mutex.c:271
 in_atomic(): 1, irqs_disabled(): 0, pid: 5983, name: rmmod
 1 lock held by rmmod/5983:
 #0:  (vn_file_lock){+.+...}, at: [<ffffffffa000d24d>] spl_vn_fini+0x2d/0x1e0 [spl]
 Pid: 5983, comm: rmmod Tainted: G           O 3.2.18--std-ipv6-64 #5
 Call Trace:
 [<ffffffff8107fcbf>] __might_sleep+0xdf/0x110
 [<ffffffff81b7610c>] mutex_lock_nested+0x3c/0x370
 [<ffffffff810842c0>] ? try_to_wake_up+0x2e0/0x2e0
 [<ffffffff810bb29d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff810a156a>] flush_workqueue+0x2da/0x560
 [<ffffffff810a1290>] ? cancel_work_sync+0x10/0x10
 [<ffffffff810a1800>] flush_scheduled_work+0x10/0x20
 [<ffffffffa000813d>] spl_kmem_cache_destroy+0x12d/0x600 [spl]
 [<ffffffff810bb1fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffffa000d24d>] ? spl_vn_fini+0x2d/0x1e0 [spl]
 [<ffffffffa000d3e0>] spl_vn_fini+0x1c0/0x1e0 [spl]
 [<ffffffffa00025ca>] ? spl_proc_fini+0x7a/0x150 [spl]
 [<ffffffffa00122ba>] spl_fini+0x5a/0xc0 [spl]
 [<ffffffff81b75429>] ? mutex_unlock+0x9/0x10
 [<ffffffff810ca8ac>] sys_delete_module+0x17c/0x270
 [<ffffffff810bb1fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff81b7867b>] system_call_fastpath+0x16/0x1b
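The exact kernel configuration isn't shown above, but splats like these (the might_sleep() warnings plus the "1 lock held by ..." lockdep output) normally require a debug build with options along these lines — a guess at the relevant switches, not the actual .config used:

CONFIG_DEBUG_ATOMIC_SLEEP=y   # enables the might_sleep() "sleeping function called from invalid context" check
CONFIG_PROVE_LOCKING=y        # lockdep lock-dependency validation (prints the {+.+...} lock-class annotations)
CONFIG_LOCKDEP=y              # selected by the above; provides the "N lock(s) held by <task>" reports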
@behlendorf (Contributor) commented:

On first glance that doesn't make sense... the call path referenced shouldn't be an atomic context; it's the generic shutdown path, where it's entirely reasonable to sleep. However, looking more closely I see what the kernel is rightfully upset about: we're destroying the kmem_cache under a spin lock, which is an atomic context, so we do risk a very unlikely deadlock here.
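To illustrate the pattern the debug kernel is flagging — a simplified sketch with made-up names, not the actual spl-vnode.c code — kmem_cache_destroy() may sleep (it flushes scheduled work, as the second trace shows), so calling it with a spin lock held puts a sleeping call inside an atomic section:

#include <linux/spinlock.h>
#include <linux/slab.h>

static DEFINE_SPINLOCK(example_lock);
static struct kmem_cache *example_cache;   /* assume created elsewhere */

static void broken_teardown(void)
{
        spin_lock(&example_lock);              /* enters atomic context */
        kmem_cache_destroy(example_cache);     /* may sleep -> "sleeping function called from invalid context" */
        spin_unlock(&example_lock);
}

static void fixed_teardown(void)
{
        spin_lock(&example_lock);
        /* only non-sleeping bookkeeping under the lock */
        spin_unlock(&example_lock);

        kmem_cache_destroy(example_cache);     /* safe: no spin lock held */
}

The diff below applies the same reordering to spl_vn_fini(): the leak accounting stays under vn_file_lock, and kmem_cache_destroy() moves after spin_unlock().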

Can you try making the following change to the SPL to address the issue?

diff --git a/module/spl/spl-vnode.c b/module/spl/spl-vnode.c
index cd0fa2c..2e55b00 100644
--- a/module/spl/spl-vnode.c
+++ b/module/spl/spl-vnode.c
@@ -845,13 +845,12 @@ spl_vn_fini(void)
                leaked++;
        }
 
-       kmem_cache_destroy(vn_file_cache);
-       vn_file_cache = NULL;
        spin_unlock(&vn_file_lock);
 
        if (leaked > 0)
                SWARN("Warning %d files leaked\n", leaked);
 
+       kmem_cache_destroy(vn_file_cache);
        kmem_cache_destroy(vn_cache);
 
        SEXIT;

@dechamps (Contributor, Author) commented:

Your patch fixes the issue, thanks.

By the way, sorry for posting this issue in the ZFS project; I should have posted it under SPL.

behlendorf added a commit to openzfs/spl that referenced this issue Jun 11, 2012
In the module unload path the vn_file_cache was being destroyed
under a spin lock.  Because this operation might sleep it was
possible, although very unlikely, that this could result
in a deadlock.

This issue was identified by using a Linux debug kernel and
has been fixed by moving the kmem_cache_destroy() out from under
the spin lock.  There is no need to hold the lock for this
operation.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs/zfs#771
@behlendorf (Contributor) commented:

No problem, I've just merged this simple fix.
