txg_sync hung in zio_wait() #1928
Comments
Record the type, priority, and reexecute fields of a ZIO in the zevent log. This may provide helpful debugging information if a ZIO is hung. Signed-off-by: Ned Bass <[email protected]> Issue openzfs#1928
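A hedged sketch of what recording these fields might look like, assuming they are appended to the zevent ereport via fm_payload_set(); the payload name strings and surrounding code are illustrative rather than the exact diff:

```c
/*
 * Illustrative only: add the zio type, priority, and reexecute fields
 * to the zevent payload so a hung zio leaves more context behind.
 * The payload name strings are assumptions, not the merged macros.
 */
if (zio != NULL) {
	fm_payload_set(ereport,
	    "zio_type", DATA_TYPE_INT32, (int32_t)zio->io_type,
	    "zio_priority", DATA_TYPE_INT32, (int32_t)zio->io_priority,
	    "zio_reexecute", DATA_TYPE_UINT8, zio->io_reexecute,
	    NULL);
}
```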
With the above patch applied I reproduced the issue, and in addition to zio_flags, zio_stage, and zio_pipeline we have:
If a zio hangs, log its address in the zevent so we can chase it with a kernel debugger. This is a debug-only patch since we wouldn't normally want user tools to expose kernel memory addresses. Signed-off-by: Ned Bass <[email protected]> Issue openzfs#1928
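A debug-only sketch along the same lines, assuming the pointer is stuffed into the payload as a uint64; the field name is an assumption:

```c
/*
 * Debug-only, illustrative: record the kernel address of the zio in
 * the zevent so a hung zio can be inspected with crash or kgdb.
 */
#ifdef ZFS_DEBUG
	fm_payload_set(ereport,
	    "zio_object", DATA_TYPE_UINT64, (uint64_t)(uintptr_t)zio,
	    NULL);
#endif
```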
I reproduced this and got a dump of the missing zio using crash.
@nedbass Now that is interesting. The IO which is hung is a child IO for the mirror, and it appears never to have been dispatched to the Linux block layer. That means that either … What should happen is … If you can reproduce this, it would be interesting to instrument the error paths in …
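One hypothetical way to instrument such an error path; the variable names are assumptions and this is not code from the openzfs tree:

```c
/*
 * Hypothetical instrumentation: if handing the child zio to the block
 * layer fails, log enough context to dmesg that a silently dropped
 * zio can be spotted and matched against the zevent data.
 */
if (error != 0) {
	printk(KERN_WARNING "ZFS: zio %p not dispatched, error %d, "
	    "stage 0x%x, pipeline 0x%x\n", zio, error,
	    (unsigned int)zio->io_stage, (unsigned int)zio->io_pipeline);
}
```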
Also, this strikes me as suspicious:
especially since the pool is not a mirror but a single file vdev. Should that not be …
It seems like the taskq threads are deadlocking trying to do I/O during memory reclaim. I have a system in this state, and two …
So if … Should …
The vdev_file_io_start() function may be processing a zio that the txg_sync thread is waiting on. In this case it is not safe to perform memory allocations that may generate new I/O since this could cause a deadlock. To avoid this, call taskq_dispatch() with TQ_PUSHPAGE instead of TQ_SLEEP. Issue openzfs#1928
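A minimal sketch of the change the commit describes; the taskq and callback names here are assumptions rather than a verbatim copy of the diff:

```c
/*
 * Sketch only: dispatch the file vdev I/O with TQ_PUSHPAGE so the
 * allocation inside taskq_dispatch() cannot enter direct reclaim and
 * recurse back into the filesystem while txg_sync waits on this zio.
 */
(void) taskq_dispatch(vdev_file_taskq, vdev_file_io_strategy,
    zio, TQ_PUSHPAGE /* was TQ_SLEEP */);
```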
When the txg_sync thread issues an I/O and blocks in zio_wait(), it is critical that all processes involved in handling that I/O use KM_PUSHPAGE when performing allocations. If they use KM_SLEEP, then during a low memory condition direct reclaim may be invoked and may attempt to flush dirty data to the file system. The thread will then attempt to assign a TX and will block until the txg_sync thread completes. The end result is a deadlock, with the txg_sync thread blocked on the original I/O and the taskq thread blocked on the txg_sync thread. To prevent developers from accidentally introducing this type of deadlock, this patch passes the TQ_PUSHPAGE_THREAD flag when dispatching an I/O to a taskq which the txg_sync thread is waiting on. This causes the taskq thread to set the PF_NOFS flag in its task struct while the zio_execute() function is being executed, thereby ensuring that any accidental misuse of KM_SLEEP is quickly caught and fixed. Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#1928
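On the ZFS side, the dispatch described in the patch might look roughly like the following. This is a hedged sketch; TQ_PUSHPAGE_THREAD is the flag proposed in this issue, not a merged SPL interface:

```c
/*
 * Sketch: hand zio_execute() to the issue taskq with the proposed
 * TQ_PUSHPAGE_THREAD flag so the servicing thread runs it with
 * PF_NOFS set and any stray KM_SLEEP allocation is caught.
 */
taskq_dispatch_ent(tq, (task_func_t *)zio_execute, zio,
    TQ_PUSHPAGE_THREAD, &zio->io_tqent);
```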
When the TQ_PUSHPAGE_THREAD flag is passed to taskq_dispatch(), the PF_NOFS bit will be set while the passed function is executing. This makes it possible to detect when KM_SLEEP is being used inappropriately, even for delegated work. Signed-off-by: Brian Behlendorf <[email protected]> openzfs/zfs#1928
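On the SPL side, the taskq thread would presumably bracket the delegated function with the PF_NOFS debug flag. A rough sketch, with the helper names being hypothetical:

```c
/*
 * Hypothetical sketch of the SPL taskq servicing loop: mark the
 * thread PF_NOFS for the duration of the function when the proposed
 * TQ_PUSHPAGE_THREAD flag was used at dispatch time.
 */
if (t->tqent_flags & TQ_PUSHPAGE_THREAD)
	sanitize_flags_push(PF_NOFS);	/* hypothetical helper */

t->tqent_func(t->tqent_arg);

if (t->tqent_flags & TQ_PUSHPAGE_THREAD)
	sanitize_flags_pop();		/* hypothetical helper */
```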
@nedbass Yes, that's exactly what's going on. I've pushed two additional patches for review which extend the existing infrastructure so we can automatically detect cases like this.
After fixing the above …
@nedbass Both the zfs_iput_taskq and kswapd0 threads look like additional collateral damage to me. Neither should be holding any resources which would prevent the I/O from completing.
When the txg_sync thread issues an I/O and blocks in zio_wait(), it is critical that all processes involved in handling that I/O use KM_PUSHPAGE when performing allocations. If they use KM_SLEEP, then during a low memory condition direct reclaim may be invoked and may attempt to flush dirty data to the file system. The thread will then attempt to assign a TX and will block until the txg_sync thread completes. The end result is a deadlock, with the txg_sync thread blocked on the original I/O and the taskq thread blocked on the txg_sync thread. To prevent developers from accidentally introducing this type of deadlock, this patch passes the TQ_PUSHPAGE_THREAD flag when dispatching an I/O to a taskq which the txg_sync thread is waiting on. This causes the taskq thread to set the PF_NOFS flag in its task struct while the zio_execute() function is being executed, thereby ensuring that any accidental misuse of KM_SLEEP is quickly caught and fixed. Finally, this patch addresses the vdev_file_io_start() function, which may be processing a zio that the txg_sync thread is waiting on. In this case it is not safe to perform memory allocations that may generate new I/O, since this could cause a deadlock. To avoid this, call taskq_dispatch() with TQ_PUSHPAGE instead of TQ_SLEEP. Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#1928
The vdev_file_io_start() function may be processing a zio that the txg_sync thread is waiting on. In this case it is not safe to perform memory allocations that may generate new I/O since this could cause a deadlock. To avoid this, call taskq_dispatch() with TQ_PUSHPAGE instead of TQ_SLEEP. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #1928
@nedbass I've merged your initial KM_PUSHPAGE fix into master, which is clearly correct and addresses the first deadlock described above. I'm going to hold off merging any of the more involved debugging patches until we have a better handle on the other, similar issue you encountered. 04aa2de vdev_file_io_start() to use taskq_dispatch(TQ_PUSHPAGE)
The vdev_file_io_start() function may be processing a zio that the txg_sync thread is waiting on. In this case it is not safe to perform memory allocations that may generate new I/O since this could cause a deadlock. To avoid this, call taskq_dispatch() with TQ_PUSHPAGE instead of TQ_SLEEP. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#1928
@ryao I suspect you're right, but I'll need to spend more time looking at this to say for certain. However, we have stopped seeing this failure, which does suggest it was fixed.
Closing. This was almost certainly addressed by openzfs/spl@a3c1eb7.
I've been working on a system with @eolson78 which shows virtually identical delay events and very similar stacks. It can be reproduced rather easily on a fairly large EC2 instance against which a pair of NFS clients are running the "fileserver" workload from filebench, and on which snapshot and zfs send operations are also being performed. The blocked process stacks from syslog and the event log are available at https://gist.github.com/dweeezil/c17b9b759935bec0045a. The installation of ZFS is mostly 0.6.3, but with the mutex_exit serialization and "pipeline invoke next stage" patches added as well as a few other post-0.6.3 patches. It should very closely match the code at https://github.com/dweeezil/zfs/tree/softnas and https://github.com/dweeezil/spl/tree/softnas. After the IO is blocked for a sufficiently long time, the NFS clients are disconnected, and once that happens the system returns to normal operation fairly quickly. This may well be a completely unrelated problem, but since it looked so similar to this issue I figured a followup would be warranted even though the issue is closed.
Record the type, priority, and reexecute fields of a ZIO in the zevent log. This may provide helpful debugging information if a ZIO is hung. Signed-off-by: Ned Bass <[email protected]> Issue openzfs#1928
If a zio hangs, log its address in the zevent so we can chase it with a kernel debugger. This is a debug-only patch since we wouldn't normally want user tools to expose kernel memory addresses. Signed-off-by: Ned Bass <[email protected]> Issue openzfs#1928
This is probably unrelated, but I thought it worth mentioning that EC2 does do things like IO limiting. Perhaps they're doing something else in their IO stack which is causing the issue. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
@behlendorf I'm not sure it's unrelated at all. I've become concerned about either the Xen blkfront driver itself, or a related problem in its mating driver on the back end (rate limiting?). During one of the apparent hang periods, I noticed the inflight counters for the (single vdev) xvd device seemingly "stuck":
and a suspiciously matching counter in the io kstat for the pool:
I posted the followup to this closed issue because it appeared to almost precisely match @nedbass' original data points: stacks, zio pipeline & flags, etc. It just dawned on me that I might try some non-ZFS I/O to the same device during these periods where the I/O seems to be hung up.
In testing #1696, a virtual machine running Fedora Core 19 hung while running the filebench test. The txg_sync thread is stuck in zio_wait(). Here are zpool events -v and task state output.
Decoding some of the zio fields from the zevent data, we have