rmmod: "sleeping function called from invalid context" #771

Closed
dechamps opened this issue Jun 5, 2012 · 3 comments

@dechamps (Contributor) commented Jun 5, 2012

When rmmod'ing zfs-0.6.0-rc8 on a kernel with debug facilities enabled:

 BUG: sleeping function called from invalid context at kernel/rwsem.c:48
 in_atomic(): 1, irqs_disabled(): 0, pid: 5983, name: rmmod
 1 lock held by rmmod/5983:
 #0:  (vn_file_lock){+.+...}, at: [<ffffffffa000d24d>] spl_vn_fini+0x2d/0x1e0 [spl]
 Pid: 5983, comm: rmmod Tainted: G           O 3.2.18--std-ipv6-64 #5
 Call Trace:
 [<ffffffff8107fcbf>] __might_sleep+0xdf/0x110
 [<ffffffff81b766ef>] down_write+0x1f/0x70
 [<ffffffff810e7a29>] ? __call_rcu+0xb9/0x170
 [<ffffffffa00080a0>] spl_kmem_cache_destroy+0x90/0x600 [spl]
 [<ffffffff810bb1fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffffa000d24d>] ? spl_vn_fini+0x2d/0x1e0 [spl]
 [<ffffffffa000d3e0>] spl_vn_fini+0x1c0/0x1e0 [spl]
 [<ffffffffa00025ca>] ? spl_proc_fini+0x7a/0x150 [spl]
 [<ffffffffa00122ba>] spl_fini+0x5a/0xc0 [spl]
 [<ffffffff81b75429>] ? mutex_unlock+0x9/0x10
 [<ffffffff810ca8ac>] sys_delete_module+0x17c/0x270
 [<ffffffff810bb1fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff81b7867b>] system_call_fastpath+0x16/0x1b
 BUG: scheduling while atomic: rmmod/5983/0x00000002
 1 lock held by rmmod/5983:
 #0:  (vn_file_lock){+.+...}, at: [<ffffffffa000d24d>] spl_vn_fini+0x2d/0x1e0 [spl]
 Modules linked in: spl(O-) [last unloaded: zunicode]
 Pid: 5983, comm: rmmod Tainted: G           O 3.2.18--std-ipv6-64 #5
 Call Trace:
 [<ffffffff81081e67>] __schedule_bug+0x77/0x80
 [<ffffffff81b74a5f>] __schedule+0x9af/0x9e0
 [<ffffffff810b978f>] ? save_trace+0x3f/0xc0
 [<ffffffff810bddda>] ? __lock_acquire+0x127a/0x1d80
 [<ffffffff81b74b6a>] schedule+0x3a/0x60
 [<ffffffff81b74fb5>] schedule_timeout+0x1b5/0x210
 [<ffffffff81b77b8b>] ? _raw_spin_unlock_irq+0x2b/0x50
 [<ffffffff810bb1fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff81b73e67>] wait_for_common+0xd7/0x180
 [<ffffffff81b7538b>] ? __mutex_unlock_slowpath+0xcb/0x160
 [<ffffffff810842c0>] ? try_to_wake_up+0x2e0/0x2e0
 [<ffffffff810bb29d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81b73fb8>] wait_for_completion+0x18/0x20
 [<ffffffff810a14c0>] flush_workqueue+0x230/0x560
 [<ffffffff810a1290>] ? cancel_work_sync+0x10/0x10
 [<ffffffff810a1800>] flush_scheduled_work+0x10/0x20
 [<ffffffffa000813d>] spl_kmem_cache_destroy+0x12d/0x600 [spl]
 [<ffffffff810bb1fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffffa000d24d>] ? spl_vn_fini+0x2d/0x1e0 [spl]
 [<ffffffffa000d3e0>] spl_vn_fini+0x1c0/0x1e0 [spl]
 [<ffffffffa00025ca>] ? spl_proc_fini+0x7a/0x150 [spl]
 [<ffffffffa00122ba>] spl_fini+0x5a/0xc0 [spl]
 [<ffffffff81b75429>] ? mutex_unlock+0x9/0x10
 [<ffffffff810ca8ac>] sys_delete_module+0x17c/0x270
 [<ffffffff810bb1fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff81b7867b>] system_call_fastpath+0x16/0x1b
 BUG: sleeping function called from invalid context at kernel/mutex.c:271
 in_atomic(): 1, irqs_disabled(): 0, pid: 5983, name: rmmod
 1 lock held by rmmod/5983:
 #0:  (vn_file_lock){+.+...}, at: [<ffffffffa000d24d>] spl_vn_fini+0x2d/0x1e0 [spl]
 Pid: 5983, comm: rmmod Tainted: G           O 3.2.18--std-ipv6-64 #5
 Call Trace:
 [<ffffffff8107fcbf>] __might_sleep+0xdf/0x110
 [<ffffffff81b7610c>] mutex_lock_nested+0x3c/0x370
 [<ffffffff810842c0>] ? try_to_wake_up+0x2e0/0x2e0
 [<ffffffff810bb29d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff810a156a>] flush_workqueue+0x2da/0x560
 [<ffffffff810a1290>] ? cancel_work_sync+0x10/0x10
 [<ffffffff810a1800>] flush_scheduled_work+0x10/0x20
 [<ffffffffa000813d>] spl_kmem_cache_destroy+0x12d/0x600 [spl]
 [<ffffffff810bb1fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffffa000d24d>] ? spl_vn_fini+0x2d/0x1e0 [spl]
 [<ffffffffa000d3e0>] spl_vn_fini+0x1c0/0x1e0 [spl]
 [<ffffffffa00025ca>] ? spl_proc_fini+0x7a/0x150 [spl]
 [<ffffffffa00122ba>] spl_fini+0x5a/0xc0 [spl]
 [<ffffffff81b75429>] ? mutex_unlock+0x9/0x10
 [<ffffffff810ca8ac>] sys_delete_module+0x17c/0x270
 [<ffffffff810bb1fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff81b7867b>] system_call_fastpath+0x16/0x1b
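The exact kernel configuration isn't shown above, but splats like these (the might_sleep() warnings plus the "1 lock held by ..." lockdep output) normally require a debug build with options along these lines — a guess at the relevant switches, not the actual .config used:

CONFIG_DEBUG_ATOMIC_SLEEP=y   # enables the might_sleep() "sleeping function called from invalid context" check
CONFIG_PROVE_LOCKING=y        # lockdep lock-dependency validation (prints the {+.+...} lock-class annotations)
CONFIG_LOCKDEP=y              # selected by the above; provides the "N lock(s) held by <task>" reports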
@behlendorf (Contributor) commented:

On first glance that doesn't make sense... the call path referenced shouldn't be an atomic context; it's the generic shutdown path, where it's entirely reasonable to sleep. However, looking more closely I see what the kernel is rightfully upset about: we're destroying the kmem_cache under a spin lock, which is an atomic context, so we do risk a very unlikely deadlock here.
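To illustrate the pattern the debug kernel is flagging — a simplified sketch with made-up names, not the actual spl-vnode.c code — kmem_cache_destroy() may sleep (it flushes scheduled work, as the second trace shows), so calling it with a spin lock held puts a sleeping call inside an atomic section:

#include <linux/spinlock.h>
#include <linux/slab.h>

static DEFINE_SPINLOCK(example_lock);
static struct kmem_cache *example_cache;   /* assume created elsewhere */

static void broken_teardown(void)
{
        spin_lock(&example_lock);              /* enters atomic context */
        kmem_cache_destroy(example_cache);     /* may sleep -> "sleeping function called from invalid context" */
        spin_unlock(&example_lock);
}

static void fixed_teardown(void)
{
        spin_lock(&example_lock);
        /* only non-sleeping bookkeeping under the lock */
        spin_unlock(&example_lock);

        kmem_cache_destroy(example_cache);     /* safe: no spin lock held */
}

The diff below applies the same reordering to spl_vn_fini(): the leak accounting stays under vn_file_lock, and kmem_cache_destroy() moves after spin_unlock().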

Can you try making the following change to the SPL to address the issue?

diff --git a/module/spl/spl-vnode.c b/module/spl/spl-vnode.c
index cd0fa2c..2e55b00 100644
--- a/module/spl/spl-vnode.c
+++ b/module/spl/spl-vnode.c
@@ -845,13 +845,12 @@ spl_vn_fini(void)
                leaked++;
        }
 
-       kmem_cache_destroy(vn_file_cache);
-       vn_file_cache = NULL;
        spin_unlock(&vn_file_lock);
 
        if (leaked > 0)
                SWARN("Warning %d files leaked\n", leaked);
 
+       kmem_cache_destroy(vn_file_cache);
        kmem_cache_destroy(vn_cache);
 
        SEXIT;

@dechamps (Contributor, Author) commented:

Your patch fixes the issue, thanks.

By the way, sorry for posting this issue in the ZFS project; I should have posted it under SPL.

behlendorf added a commit to openzfs/spl that referenced this issue Jun 11, 2012
In the module unload path the vn_file_cache was being destroyed
under a spin lock.  Because this operation might sleep it was
possible, although very unlikely, that this could result
in a deadlock.

This issue was identified by using a Linux debug kernel and
has been fixed by moving the kmem_cache_destroy() out from under
the spin lock.  There is no need to hold the lock for this
operation.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs/zfs#771
@behlendorf (Contributor) commented:

No problem, I've just merged this simple fix.
