Deadlock with CONFIG_DEBUG_MUTEX (ArchLinux) #167
Comments
Thanks for the bug report. There is one well-known deadlock which can occur, but unfortunately it's an upstream kernel bug. If you're comfortable with rebuilding your kernel you can try the following patch: https://bugzilla.kernel.org/attachment.cgi?id=50802 In the meantime I've been investigating a way to avoid the issue without patching the kernel. |
I do not see any vmalloc in the tracebacks. Are you sure this is related to this kernel bug? |
I didn't either, but the traces also looked incomplete, so it was an educated guess. What was there in the traces didn't clearly show the root cause. |
The same has happened to me twice. mount: action that caused the deadlock: [kki@l0cutus ~]$ cp /home/kki/.jd/downloads/* /mnt/tank/fish1/ and at that point it stopped; the only solution was a reboot. kernel: 2.6.37 thanks! |
I have been following the long long thread on Issue 154. Claims there are that we don't need this kernel patch anymore with recent code. I tested most recent code from yesterday in my VirtualBox play environment (2GB ram), but I still see this deadlock, simply by running around 3 bonnie++ on a fresh ZFS. Unfortunately bugzilla is down. Could someone drop the patch from above in a different location? Is that the patch in question http://permalink.gmane.org/gmane.linux.kernel.mm/59658 ? I would highly appreciate a two line update on that issue. Thank you! |
Are you sure you are still seeing this deadlock? There are still a couple of slightly different deadlocks lurking, particularly for machines with only 2GB of RAM. That's one of the major reasons I'm still calling this a release candidate. If you can open a new issue with the console stacks from your deadlock I'm happy to take a look. |
I am sorry. Where are my manners? Here is a full log: It has the same anatomy as the stack trace posted in this issue. The ZFS volume was fresh from zpool create + zfs primarycache=none. |
This was determined to be a different deadlock than that described in #154. Your stack matches the one in this bug, but the original root cause determination was wrong. In this case the deadlock looks to be related to dsl_dir_diduse_space(). From your stacks:
    z_wr_int/4:1099
    mutex_lock+0x11/0x30
    dsl_dir_diduse_space+0x119/0x160 [zfs]
    dsl_dataset_block_born+0x132/0x200 [zfs]
    dbuf_write_done+0x1a4/0x250 [zfs]
This results in a deadlock. It's not clear how this is possible since !MUTEX_HELD() works well in the rest of the code. However, that's what the stack clearly shows. We'll make sure we dig in to how this can happen but it may take us a little bit to get to it. |
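For readers unfamiliar with the pattern being discussed, here is a minimal single-threaded sketch of conditional locking driven by an ownership check. It is not the ZFS source: `toy_mutex_t`, `toy_held()`, `use_space()` and `nested_call_succeeds()` are hypothetical stand-ins, and the lock is modeled as a plain flag so that the failure shows up as a return value instead of a hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy mutex: a flag plus the owner a MUTEX_HELD-style check reads. */
typedef struct {
    bool        locked;
    const void *owner;
} toy_mutex_t;

/* Models MUTEX_HELD(): true only if the recorded owner is the caller. */
static bool toy_held(const toy_mutex_t *mp, const void *self)
{
    return mp->owner == self;
}

/* Non-blocking acquire; returns false where a real mutex_lock() would block. */
static bool toy_try_enter(toy_mutex_t *mp, const void *self)
{
    if (mp->locked)
        return false;
    mp->locked = true;
    mp->owner = self;
    return true;
}

static void toy_exit(toy_mutex_t *mp)
{
    assert(mp->locked);
    mp->owner = NULL;
    mp->locked = false;
}

/*
 * Conditional-locking pattern: take the lock only if the caller does not
 * already hold it. Returns false if the caller would have blocked.
 */
static bool use_space(toy_mutex_t *mp, const void *self)
{
    bool needlock = !toy_held(mp, self);

    if (needlock && !toy_try_enter(mp, self))
        return false;   /* a real mutex_lock() would hang here */

    /* ... critical section ... */

    if (needlock)
        toy_exit(mp);
    return true;
}

/* Scenario: the caller holds the lock; optionally its owner field is wiped. */
static bool nested_call_succeeds(bool owner_wiped)
{
    toy_mutex_t m = { false, NULL };
    int self;   /* only the address is used, as a thread identity */

    toy_try_enter(&m, &self);
    if (owner_wiped)
        m.owner = NULL;     /* simulate incorrect owner bookkeeping */
    return use_space(&m, &self);
}
```

With correct bookkeeping the nested call skips the lock and succeeds. Once the owner field no longer names the holder, the same call decides it needs the lock and tries to take a mutex its own caller already holds, which is the self-deadlock shape the stack suggests.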
Thank you for the ultra quick response. Just to make sure that I get what you are saying (I have trouble following multiple uses of "this"):
It would be interesting for me, if other users have that problem. I simply have to run 3 concurrent bonnie++-s and I almost immediately run into this deadlock. If your system survives that test, please leave me a note. I would really like to get to the source of a potential workaround. (My kernel is PREEMPT_NONE btw - if that would help, I can provide my VirtualBox vdi image) |
You have it exactly right. Thus far I've only heard of one or two other people who have encountered this. Since it hasn't been that common it has not moved to the top of the list of things to get fixed. I personally haven't seen this while running bonnie++, so it's likely related to how your kernel is built. I see you're running a 2.6.37 kernel with archlinux. Since this looks like it's related to MUTEX_HELD behaving incorrectly, it would be useful to know how HAVE_MUTEX_OWNER is set in spl_config.h. |
I am also using the Arch kernel with PREEMPT_NONE btw. This may be linked somehow. |
I was suspecting the arch kernel too, so I ran inconclusive systematic tests two months ago with the KQ branch, where I ran into this problem too: https://spreadsheets2.google.com/a/endorphin.org/ccc?hl=en&hl=en&key=t9tjFCJ9fngkBzHFBXk9D3w&authkey=CIKig4MK#gid=0 The only configuration I found sorta stable was Ubuntu kernel + Ubuntu config + 1 CPU. At some point however, I found that configuration failing too, and I gave up in frustration. Unfortunately, I don't have test hardware available to do more systematic tests... maybe I can buy some. I want to get this working so desperately. |
For what it's worth I'm using Darik's Ubuntu PPA for my home NAS and it's working quite well. He does a nice job tracking the latest fixes which haven't made it into an -rc tag just yet. Back to this bug: we'll get it fixed, but to be honest we haven't focused on it yet. It sounds like what I need to do is install a VM for ARCH Linux and do a little testing there. There are quite a few ARCH users now, so it would be good to ensure it works for them. |
I dug some more: http://clemens.endorphin.org/arch-kill-diff - that's the minimal configuration diff I could find that discriminates lockup and no-lockup. It is basically CONFIG_DEBUG_MUTEXES=n which triggers a lot of interesting other defines including: CONFIG_MUTEX_SPIN_ON_OWNER=y, CONFIG_INLINE_(SPIN|READ|WRITE)_UNLOCK(_IRQ)?=y. Result:
I am not sure, if the more aggressive mutex setting just replaces a lock-up with a race... :/ |
Thanks for doing the legwork on this. CONFIG_DEBUG_MUTEXES makes sense as the likely cause of the deadlock. My hunch is that for some reason this causes MUTEX_HELD to return incorrect results, which will trigger the deadlock. As for the segfault, that sounds like it may be a different issue. ZFS running in the kernel would be hard pressed to cause a user application to segfault. All it can do is make a system call behave incorrectly. This could lead to a segfault if the application doesn't correctly handle the system call error. |
After I compiled the latest archlinux kernel and the latest spl/zfs git, rsync doesn't 'die' anymore, BUT the system becomes unusable (too CPU- and RAM-intensive). Same operation:
Number of files: 29940
sent 2.91G bytes received 508.15K bytes 6.31M bytes/sec
[root@l0cutus ~]# zpool get all test
NAME PROPERTY VALUE SOURCE
[root@l0cutus ~]# zfs get all test
NAME PROPERTY VALUE SOURCE
[root@l0cutus ~]# uname -a
Linux l0cutus 2.6.37-ARCH #1 SMP PREEMPT Fri Mar 25 15:10:00 CET 2011 x86_64 Genuine Intel(R) CPU U2300 @ 1.20GHz GenuineIntel GNU/Linux |
behlendorf: It's most likely a bug in bonnie++ that's causing the segfault. It's just handling the I/O error incorrectly. And btw I changed to 4GB on bare metal (no VirtualBox) for this testing. |
I can confirm this problem on 2.6.38-ARCH / zfs 0.6.0rc4 (8GB on bare metal with 3x2TB in raidz). I can also confirm that using a kernel configured with clefru's config remedies the situation (thank you clefru!). |
After rereading the history in this bug I'm going to reopen it. While we have a workaround I'd rather have a real fix. |
Actually I did not realize I closed it with my last comment. May have clicked on "Comment & Close" instead of "Comment" when replying... |
I can reliably reproduce this with "/usr/local/libexec/zfs/zconfig.sh -c -t 5". The problem is that we're clearing the mutex owner while not holding the mutex: Thread 1: Thread 2: Thread 1: Thread 2: |
Indeed, you're right... I remember this case. When I was working on this code I wasn't able to 100% close this race in mutex_exit() when (HAVE_MUTEX_OWNER && CONFIG_SMP && CONFIG_DEBUG_MUTEXES) are set. The problem is that when CONFIG_DEBUG_MUTEXES is set mutex_unlock() doesn't clear the owner, so I clear it just after the lock is dropped (hence the race). We could clear it just before the lock is dropped, however I think there may be a substantial time interval when the lock is then still technically held but has no owner. Still, that may be preferable. If you have a solid reproducer could you try this?
    diff --git a/include/sys/mutex.h b/include/sys/mutex.h
    index 659214f..9cad228 100644
    --- a/include/sys/mutex.h
    +++ b/include/sys/mutex.h
    @@ -84,8 +84,8 @@ mutex_owner(kmutex_t *mp)
     #ifdef CONFIG_DEBUG_MUTEXES
     # define mutex_exit(mp) \
     ({ \
    - mutex_unlock(&(mp)->m); \
     (mp)->m.owner = NULL; \
    + mutex_unlock(&(mp)->m); \
     })
     #else
     # define mutex_exit(mp) mutex_unlock(&(mp)->m)
|
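To make the two orderings concrete, the following single-threaded model replays the problematic interleaving by hand. It is only a sketch under assumptions: `model_mutex_t`, the `step_*` functions and `owner_survives()` are hypothetical names that mimic the owner bookkeeping, not the kernel's mutex_unlock() or any real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a mutex with an owner field (not the kernel struct). */
typedef struct {
    bool        locked;
    const void *owner;
} model_mutex_t;

/* Lock step: only valid when the mutex is free; records the new owner. */
static void step_lock(model_mutex_t *mp, const void *self)
{
    assert(!mp->locked);
    mp->locked = true;
    mp->owner = self;
}

/* Unlock step as described for CONFIG_DEBUG_MUTEXES: the owner is NOT cleared. */
static void step_unlock(model_mutex_t *mp)
{
    mp->locked = false;
}

/* Explicit owner-clearing step, as done by the mutex_exit() macro. */
static void step_clear_owner(model_mutex_t *mp)
{
    mp->owner = NULL;
}

/*
 * Replays: thread A exits the mutex while thread B acquires it as soon as
 * it becomes free. Returns true if B is still recorded as the owner at the
 * end, i.e. a MUTEX_HELD-style check made by B would be correct.
 */
static bool owner_survives(bool clear_before_unlock)
{
    model_mutex_t m = { false, NULL };
    int a, b;   /* addresses stand in for thread identities */

    step_lock(&m, &a);

    if (clear_before_unlock) {
        step_clear_owner(&m);   /* A: still holds the lock */
        step_unlock(&m);        /* A: releases */
        step_lock(&m, &b);      /* B: acquires */
    } else {
        step_unlock(&m);        /* A: releases */
        step_lock(&m, &b);      /* B: acquires inside the window */
        step_clear_owner(&m);   /* A: wipes B's ownership */
    }

    return m.locked && m.owner == &b;
}
```

Here `owner_survives(false)` corresponds to the existing macro (unlock, then clear) and `owner_survives(true)` to the ordering in the patch above. The model deliberately leaves out any owner check inside the unlock path, so it says nothing about debug warnings.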
Well, this almost works - except for the fact where debug_mutex_unlock() checks the lock's owner (which by that time has been set to NULL):
I don't think we can safely rely on the m.owner variable in this case. Clearing it before mutex_unlock() would cause warnings - while clearing it afterwards causes deadlocks. Here's another option (i.e. using the m_owner variable):
I believe we can also safely remove the spinlocks in spl_mutex_set_owner/spl_mutex_clear_owner because these functions are only called while holding the mutex. mutex_owner() could be made safe using ACCESS_ONCE (instead of the spinlock). Another option would be to unify the mutex codebase and get rid of the m.owner special case. But I guess the additional memory accesses for m_owner might have a bit of a performance impact. This would need further testing. (Oh, and btw: the MUTEX #define in mutex.h:104 is weird - it works as intended due to how the struct is laid out in memory but should probably be changed to match the kmutex_t struct). Hmm, I guess I should set up a branch for this and get started on those patches. :) |
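As a rough illustration of the m_owner idea, here is a userspace sketch in which the wrapper keeps its own owner field, sets and clears it only while the lock is held, and reads it with a volatile access in the spirit of ACCESS_ONCE. Everything in it is an assumption made for illustration: `wrap_mutex_t`, `inner_mutex_t`, `READ_ONCE_PTR` and the function names are hypothetical, the inner lock is a plain flag, and this is not the patch that was eventually merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Volatile read in the spirit of the kernel's ACCESS_ONCE(); hypothetical helper. */
#define READ_ONCE_PTR(p)   (*(const void * volatile *)&(p))

/* Stand-in for the kernel mutex, with a debug owner the wrapper never modifies. */
typedef struct {
    bool        locked;
    const void *debug_owner;
} inner_mutex_t;

/* Wrapper with a separate m_owner field (layout assumed for the sketch). */
typedef struct {
    inner_mutex_t m;
    const void   *m_owner;
} wrap_mutex_t;

static void inner_lock(inner_mutex_t *m, const void *self)
{
    assert(!m->locked);     /* a real lock would block instead */
    m->locked = true;
    m->debug_owner = self;
}

/* Returns false where a debug kernel would complain about a non-owner unlock. */
static bool inner_unlock(inner_mutex_t *m, const void *self)
{
    bool owner_ok = (m->debug_owner == self);

    m->locked = false;      /* debug_owner intentionally left untouched */
    return owner_ok;
}

static void wrap_enter(wrap_mutex_t *mp, const void *self)
{
    inner_lock(&mp->m, self);
    mp->m_owner = self;     /* set only while holding the lock */
}

static bool wrap_exit(wrap_mutex_t *mp, const void *self)
{
    mp->m_owner = NULL;     /* cleared while still holding the lock */
    return inner_unlock(&mp->m, self);
}

static const void *wrap_owner(wrap_mutex_t *mp)
{
    return READ_ONCE_PTR(mp->m_owner);
}

/* Exercises enter/exit for two callers and checks the owner at each step. */
static bool wrapper_roundtrip_ok(void)
{
    wrap_mutex_t mx = { { false, NULL }, NULL };
    int a, b;   /* addresses stand in for thread identities */
    bool ok = true;

    wrap_enter(&mx, &a);
    ok = ok && wrap_owner(&mx) == &a;
    ok = ok && wrap_exit(&mx, &a);      /* inner owner check still passes */
    ok = ok && wrap_owner(&mx) == NULL;

    wrap_enter(&mx, &b);
    ok = ok && wrap_owner(&mx) == &b;
    ok = ok && wrap_exit(&mx, &b);
    return ok;
}
```

Because the wrapper never writes the inner owner, the inner unlock check sees the real holder, and because m_owner is only written under the lock, a later holder cannot have its ownership wiped by an earlier one. The cost is the extra field and the extra memory accesses mentioned above.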
Ahh yes, it's all coming back to me! Since we can't seem to safely use the existing m.owner when CONFIG_DEBUG_MUTEXES is set, it does seem like the only safe thing to do is fall back to the !HAVE_MUTEX_OWNER compatibility implementation. This is clearly the common case anyway since most default distribution kernels are built with this disabled. Gunnar, if you can propose and test two patches for these mutex fixes I'm happy to retest and commit them. Proposed changes:
As for the MUTEX define on 104.4, I don't think that's too weird. By design we are relying on how this structure gets organized in memory. |
Actually, it looks like wait_lock isn't held while the mutex is locked and other threads will lock/unlock wait_lock when they're trying to acquire the mutex - which means spin_trylock() is actually likely to fail in spl_mutex_set_owner/spl_mutex_clear_owner. I'll probably have the patches finished sometime tomorrow. |
Everything looks good. I've reviewed and merged the fixes into the spl, and my usual regression test suite passed on the 12 distributions I always test. Closing this issue, thanks Gunnar! |
On kernels with CONFIG_DEBUG_MUTEXES mutex_exit() clears the mutex owner after releasing the mutex. This would cause mutex_owner() to return an incorrect owner if another thread managed to lock the mutex before mutex_exit() had a chance to clear the owner. Signed-off-by: Brian Behlendorf <[email protected]> Closes ZFS issue #167
See dechamps/zfs@cc6cd40 for details. This harmless addition was merged to simplify testing the ZFS TRIM support patches. Signed-off-by: Brian Behlendorf <[email protected]> Closes #167
Signed-off-by: Pawan <[email protected]>
The ZFS module deadlocked while I was rsync-ing a large (35GB) file to a ZFS mountpoint.
I'm using a 2.6.38 kernel with the last Git version of SPL and ZFS.
Here is the backtrace shown in dmesg: