txg_sync thread stuck in zio_wait() for more than 12 hours with no pending/running IO #2850
I've seen this rarely and it's been hard to reproduce. I'd start by dumping all the kernel stacks and then taking a close look at the various ZFS threads.
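As a side note, here is a minimal C sketch of pulling one task's kernel stack out of procfs, the same data as `cat /proc/<pid>/stack` (or `echo t > /proc/sysrq-trigger` to dump every task at once via dmesg). It is only an illustration of the suggested approach, not code from this thread; it assumes a kernel that exposes /proc/<pid>/stack and root privileges.

```c
/* Illustration only: print the current kernel stack of one task. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	char path[64], line[256];
	FILE *fp;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid-or-tid>\n", argv[0]);
		return 1;
	}

	/* /proc/<tid>/stack holds the task's current kernel stack trace. */
	snprintf(path, sizeof(path), "/proc/%s/stack", argv[1]);
	fp = fopen(path, "r");
	if (fp == NULL) {
		perror(path);
		return 1;
	}

	while (fgets(line, sizeof(line), fp) != NULL)
		fputs(line, stdout);

	fclose(fp);
	return 0;
}
```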
I checked all the ZFS threads. Only z_null_iss/0 looked suspicious: it's always in the running state, but its stack always stays the same:
This might be another instance of #2523, which has been difficult to pin down. Usually in this case you'll be able to identify a process spinning in mutex_unlock_slowpath.
Thanks, I'll try /pull/2828. The good news is we hit it quite often (at about 6%) during a particular test.
I've been looking for a good reproducer. What's the easiest way to hit this in a VM with Lustre?
It's most often hit in sanity test_132; more details are in LU-5242. I was also able to hit it a couple of times with VMs configured with only 1 CPU, so it's likely not caused by any race.
Is this kernel log a symptom of this bug?
Or should I open a separate bug for this? It happened under low load as far as I can tell, a couple of days in a row, and required a hard power-down of the machine to reboot every time (X hung after it, which I needed, so...). It might be worth mentioning that I've had this running smoothly on Arch Linux for ages, until kernel 3.17 came out and the zfs package got updated to a newer git tag (to support 3.17 kernels).
@thegreatgazoo if you get a chance, could you verify that the fix in openzfs/spl#421 resolves this hang?
It is known that mutexes in Linux are not safe when using them to synchronize the freeing of the object in which the mutex is embedded: http://lwn.net/Articles/575477/

The known places in ZFS which are suspected to suffer from the race condition are zio->io_lock and dbuf->db_mtx.

* zio uses zio->io_lock and zio->io_cv to synchronize freeing between zio_wait() and zio_done().
* dbuf uses dbuf->db_mtx to protect reference counting.

This patch fixes this kind of race by forcing serialization on mutex_exit() with a spin lock, making the mutex safe by sacrificing a bit of performance and memory overhead.

This issue most commonly manifests itself as a deadlock in the zio pipeline caused by a process spinning on the damaged mutex. Similar deadlocks have been reported for the dbuf->db_mtx mutex. It can also cause a NULL dereference or bad paging request under the right circumstances.

This issue and many like it are linked off the openzfs/zfs#2523 issue. Specifically, this fix resolves at least the following outstanding issues: openzfs/zfs#401 openzfs/zfs#2523 openzfs/zfs#2679 openzfs/zfs#2684 openzfs/zfs#2704 openzfs/zfs#2708 openzfs/zfs#2517 openzfs/zfs#2827 openzfs/zfs#2850 openzfs/zfs#2891 openzfs/zfs#2897 openzfs/zfs#2247 openzfs/zfs#2939

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #421
This issue, which is a duplicate of #2523, was resolved by the following commit; full details can be found in the commit message and the related LWN article: openzfs/spl@a3c1eb7 "mutex: force serialization on mutex_exit() to fix races"
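For readers who want to see the shape of the bug, here is a minimal, hypothetical sketch of the unsafe pattern the commit message above describes, modelled with POSIX threads rather than the kernel's kmutex_t/kcondvar_t. The struct and function names (zio_like_t, waker, waiter) are made up for illustration; only the pattern itself comes from the commit message.

```c
/* Illustration only (not SPL/ZFS code): a mutex embedded in an object
 * that the waiting side frees as soon as it is woken. */
#include <pthread.h>
#include <stdlib.h>

typedef struct zio_like {
	pthread_mutex_t io_lock;	/* embedded in the object being freed */
	pthread_cond_t  io_cv;
	int             io_done;
} zio_like_t;

/* zio_done()-style side: marks completion and wakes the waiter. */
static void *waker(void *arg)
{
	zio_like_t *z = arg;

	pthread_mutex_lock(&z->io_lock);
	z->io_done = 1;
	pthread_cond_broadcast(&z->io_cv);
	/*
	 * Hazard: the unlock implementation may still touch z->io_lock
	 * after the waiter has already reacquired it, returned, and freed z.
	 */
	pthread_mutex_unlock(&z->io_lock);
	return NULL;
}

/* zio_wait()-style side: waits for completion, then frees the object. */
static void waiter(zio_like_t *z)
{
	pthread_mutex_lock(&z->io_lock);
	while (!z->io_done)
		pthread_cond_wait(&z->io_cv, &z->io_lock);
	pthread_mutex_unlock(&z->io_lock);
	free(z);	/* frees memory the waker may still be touching */
}

int main(void)
{
	pthread_t t;
	zio_like_t *z = calloc(1, sizeof (*z));

	if (z == NULL)
		return 1;
	pthread_mutex_init(&z->io_lock, NULL);
	pthread_cond_init(&z->io_cv, NULL);
	pthread_create(&t, NULL, waker, z);
	waiter(z);
	pthread_join(t, NULL);
	return 0;
}
```

As I read the commit, the SPL fix guards against this by serializing mutex_exit() with an extra spin lock, so a thread that has just been granted the mutex cannot go on to free the object while another thread is still finishing its unlock.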
@behlendorf We tested /pull/2828, which didn't eliminate the timeouts. We're now testing openzfs/spl@a3c1eb7 and will report back.
That's what I'd expect. It was unlikely that you were hitting the issue which the first patch addressed. However, I'm confident the second patch will resolve the issue. I'd also suggest you audit the Lustre code for similar issues; it wouldn't at all surprise me if Lustre is using a mutex in a similarly unsafe fashion. In fact, we've seen very similar symptoms on some Lustre clients, and those symptoms went away when mutex debugging was enabled, which is consistent with this kind of issue.
Yes, I just screamed at everyone to scrutinize their code.
@behlendorf @justinkb's issue is not a duplicate of #2523. Instead, it is a duplicate of #3091, provided that it occurred on a multi-socket system.
@behlendorf @ryao Let me explain our setup. In the above setup, during a ZFS umount, we see txg_sync hung for a long time while using a lot of CPU and memory. Do you think the two are related? Should I open a new bug for it?
Could you file a new issue for the remaining bug? That would make it easier to track.
It happened on CentOS 6.5 (kernel 2.6.32-431.29.2), ZFS version 0.6.3-1, srcversion 533BB7E5866E52F63B9ACCB. The txg_sync thread has been hung for more than 12 hours now:
But /proc/spl/kstat/zfs/lustre-ost2/io showed wcnt=0 and rcnt=0, which I took to mean that there's no ZIO queued at the vdev (the pool sits on a single /dev/vda). /proc/diskstats also showed 0 I/Os currently in progress (a quick way to check that field is sketched at the end of this report). I checked zpool events -v and found nothing about any zio. The last event on the pool was:
But the hang began at about Oct 29 2014 01:40.
The system still has about 1/3 of its 1.9G of memory free, and arcsz is only 20M. Swap is not on ZFS. So it's probably not a VM-related deadlock.
I've hit this several times already, but this time I'll be able to keep the system live for as long as it takes to troubleshoot. Any hints on debugging?
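For completeness, here is a hypothetical C sketch of the /proc/diskstats check mentioned above: it prints the "I/Os currently in progress" field for a single device (vda by default). The helper and its argument handling are made up for illustration; only the documented field layout of /proc/diskstats is assumed.

```c
/* Illustration only: report in-flight I/Os for one block device. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	const char *dev = (argc > 1) ? argv[1] : "vda";
	char line[512], name[64];
	unsigned int major, minor;
	unsigned long long s[9];
	FILE *fp = fopen("/proc/diskstats", "r");

	if (fp == NULL) {
		perror("/proc/diskstats");
		return 1;
	}

	/* Fields after the device name: reads, reads merged, sectors read,
	 * ms reading, writes, writes merged, sectors written, ms writing,
	 * I/Os currently in progress (s[8]), ... */
	while (fgets(line, sizeof(line), fp) != NULL) {
		if (sscanf(line,
		    "%u %u %63s %llu %llu %llu %llu %llu %llu %llu %llu %llu",
		    &major, &minor, name, &s[0], &s[1], &s[2], &s[3], &s[4],
		    &s[5], &s[6], &s[7], &s[8]) != 12)
			continue;
		if (strcmp(name, dev) == 0)
			printf("%s: %llu I/Os currently in progress\n",
			    name, s[8]);
	}

	fclose(fp);
	return 0;
}
```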