write IO hangs when uiomove triggers page fault in zfs_write context. #7512
Comments
@YanyunGao please try the latest stable version, 0.7.8, first. |
@gmelikov Thanks for the quick reply. Could you hint at which potential fixes are relevant to this issue? |
I noticed that there is a comment in dmu.c, in dmu_write_uio_dnode():
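For reference, the comment in question sits directly above the uiomove() call in dmu_write_uio_dnode(); quoted from memory, so the exact wording may vary by version:

```c
/*
 * XXX uiomove could block forever (eg. nfs-backed
 * pages).  There needs to be a uiolockdown() function
 * to lock the pages in memory, so that uiomove won't
 * block.
 */
err = uiomove((char *)db->db_data + bufoff, tocopy,
    UIO_WRITE, uio);
```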
It seems we hit that case: uiomove blocks forever in our scenario. The issue report lists two backtraces, one in page_fault context and the other in write syscall context. The page_fault context holds the semaphore but is stuck in txg_wait_open (waiting for the quiescing txg to complete; unfortunately, that txg never completes because of the other backtrace). The write syscall context has increased tc_count on the quiescing txg but fails to down_read the semaphore, which in turn blocks the quiesce thread. It looks like a deadlock. |
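To make the cycle explicit, the dependency between the two contexts looks like this (a sketch assembled from the two backtraces quoted below; the "semaphore" above is the process's mmap_sem):

```
Context A: page fault path               Context B: write syscall path
--------------------------               -----------------------------
down_read(mmap_sem)          taken       dmu_tx_assign(txg "n")   tc_count++
zfs_dirty_inode()                        zfs_write() -> dmu_write_uio_dnode()
  dmu_tx_assign() -> wait                  uiomove() -> page fault
  txg_wait_open()                            down_read(mmap_sem)
    waits for txg "n" to quiesce,              waits for Context A,
    but B still holds tc_count                 which waits for B  => deadlock
```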
The 0.7 branch has a big bunch of changes, and one of them is ABD 7657def. And, secondly, I think it's good form to try the latest stable version first. You can look at #7339 too, it looks similar. |
Thanks for pointing out #7339, it's relevant, but it won't fix our problem. The interesting thing is that we might be hitting the same symptom as removing the prefault code, which @trisk mentioned in the comments of #7339. In our case prefault is definitely called, so I think this is the case where the page is evicted after the prefault. I saw there is a proposal to create a separate issue to track this kind of problem. |
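In other words, the prefault in zfs_write() only narrows the window; under memory pressure the user pages can be reclaimed again before uiomove() touches them. Roughly (a simplified sketch of the write-loop shape, not verbatim source):

```c
uio_prefaultpages(MIN(n, max_blksz), uio);   /* fault the user pages in now */

tx = dmu_tx_create(zfsvfs->z_os);
dmu_tx_hold_write(tx, zp->z_id, woff, MIN(n, max_blksz));
error = dmu_tx_assign(tx, TXG_WAIT);         /* from here on, txg "n" is held */

/*
 * Window: under memory pressure a prefaulted page can be reclaimed
 * right here, so the copy below page-faults while txg "n" is held.
 */
error = dmu_write_uio_dbuf(sa_get_db(zp->z_sa_hdl), uio, nbytes, tx);
```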
After diving into the details of the problem, we figured out the issue more clearly. This is a deadlock on the mm_sem between two thread contexts of one process. The application has two threads: one is doing a write syscall, and the other is manipulating a memory address that is mapped to a file.
So context #1 and context #2 trap into the deadlock. If my understanding is correct, the prefaulted page being faulted again in step 3 of the time sequence below is the culprit of the deadlock.
What do you guys think? |
The bug time sequence:
1. Context #1: zfs_write assigns a txg "n".
2. In the same process, context #2: an mmap page fault occurs (which means `mm_sem` is held); `zfs_dirty_inode` fails to open a txg and waits for the previous txg "n" to complete.
3. Context #1 calls `uiomove` to write; however, a page fault occurs inside `uiomove`, which means it needs `mm_sem`. But `mm_sem` is held by context #2, so it gets stuck and can't complete, and txg "n" will never complete.
So context #1 and context #2 trap into the "dead lock".
Signed-off-by: Grady Wong <[email protected]>
Closes #7512
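The fix referenced above presumably breaks the cycle by never letting uiomove() fault while the txg is held: the copy runs with page faults disabled, and on EFAULT the tx is dropped, the pages are prefaulted again, and the write is retried. A sketch of that pattern (field and helper names such as `uio_fault_disable` are assumptions about the patch, not confirmed from this thread):

```c
uio->uio_fault_disable = B_TRUE;    /* uiomove returns EFAULT instead of faulting */
error = dmu_write_uio_dbuf(sa_get_db(zp->z_sa_hdl), uio, nbytes, tx);
uio->uio_fault_disable = B_FALSE;

if (error == EFAULT) {
	dmu_tx_commit(tx);                          /* release txg "n" first */
	uio_prefaultpages(MIN(n, max_blksz), uio);  /* fault pages in with no txg held */
	continue;                                   /* retry this chunk */
}
```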
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions. |
System information
Describe the problem you're observing
Write IO hangs on a pool, with a lot of backtraces in the system log. One write request is stuck in page_fault context, which causes the quiesce thread to never complete. As a result, the following txgs cannot be synced.
Describe how to reproduce the problem
We hit the problem twice in one week but do not yet know how to reproduce it. The workload mixes mmap and regular read/write operations. We noticed that the problem may be easier to reproduce when there is not much free memory left.
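Based on the analysis in the comments, a plausible (unconfirmed) reproducer shape is two threads in one process racing a write(2) against stores into an mmap'ed region of the same ZFS file, under memory pressure. A hypothetical sketch; the path and sizes are placeholders:

```c
/* Hypothetical reproducer sketch -- NOT a confirmed repro. */
#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

#define FILE_SZ (1UL << 20)   /* 1 MiB file */
#define SRC_SZ  (1UL << 30)   /* large source buffer, easier to evict */

static int fd;
static char *map;             /* file-backed mapping of the same file */
static char *src;             /* anonymous pages used as the write source */

/* Thread 1: write(2) path -> zfs_write -> uiomove faults on src pages. */
static void *writer(void *arg)
{
	size_t off = 0;
	for (;;) {
		pwrite(fd, src + off, 8192, 0);
		off = (off + 8192) % (SRC_SZ - 8192);
	}
	return NULL;
}

/* Thread 2: dirties the mapping -> page fault -> zfs_dirty_inode. */
static void *toucher(void *arg)
{
	for (;;)
		for (size_t off = 0; off < FILE_SZ; off += 4096)
			map[off]++;
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	fd = open("/tank/victim", O_CREAT | O_RDWR, 0644); /* file on ZFS (assumed path) */
	ftruncate(fd, FILE_SZ);
	map = mmap(NULL, FILE_SZ, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	src = mmap(NULL, SRC_SZ, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	pthread_create(&t1, NULL, writer, NULL);
	pthread_create(&t2, NULL, toucher, NULL);
	pthread_join(t1, NULL);   /* the hang manifests as blocked tasks, not a crash */
	return 0;
}
```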
Include any warning/errors/backtraces from the system logs
May 7 14:39:28 xt4 kernel: ffff88074283b918 0000000000000086 ffff880814623ec0 ffff88074283bfd8
May 7 14:39:28 xt4 kernel: ffff88074283bfd8 ffff88074283bfd8 ffff880814623ec0 ffff880814623ec0
May 7 14:39:28 xt4 kernel: ffff88046d70de38 ffffffffffffffff ffff88046d70de40 ffff88046d70ddc0
May 7 14:39:28 xt4 kernel: Call Trace:
May 7 14:39:28 xt4 kernel: [] schedule+0x29/0x70
May 7 14:39:28 xt4 kernel: [] rwsem_down_read_failed+0xf5/0x170
May 7 14:39:28 xt4 kernel: [] call_rwsem_down_read_failed+0x18/0x30
May 7 14:39:28 xt4 kernel: [] ? spl_kmem_zalloc+0xc0/0x170 [spl]
May 7 14:39:28 xt4 kernel: [] down_read+0x20/0x30
May 7 14:39:28 xt4 kernel: [] __do_page_fault+0x380/0x450
May 7 14:39:28 xt4 kernel: [] do_page_fault+0x35/0x90
May 7 14:39:28 xt4 kernel: [] page_fault+0x28/0x30
May 7 14:39:28 xt4 kernel: [] ? spl_kmem_zalloc+0xc0/0x170 [spl]
May 7 14:39:28 xt4 kernel: [] ? copy_user_enhanced_fast_string+0x9/0x20
May 7 14:39:28 xt4 kernel: [] ? uiomove+0x155/0x2b0 [zcommon]
May 7 14:39:28 xt4 kernel: [] dmu_write_uio_dnode+0x90/0x150 [zfs]
May 7 14:39:28 xt4 kernel: [] dmu_write_uio_dbuf+0x4f/0x70 [zfs]
May 7 14:39:28 xt4 kernel: [] zfs_write+0xb31/0xc40 [zfs]
May 7 14:39:28 xt4 kernel: [] ? nvlist_lookup_common.part.71+0xa2/0xb0 [znvpair]
May 7 14:39:28 xt4 kernel: [] ? nvlist_lookup_byte_array+0x26/0x30 [znvpair]
May 7 14:39:28 xt4 kernel: [] ? strfree+0xe/0x10 [spl]
May 7 14:39:28 xt4 kernel: [] ? vfs_getxattr+0x88/0xb0
May 7 14:39:28 xt4 kernel: [] zpl_write_common_iovec.constprop.8+0x95/0x100 [zfs]
May 7 14:39:28 xt4 kernel: [] zpl_write+0x7e/0xb0 [zfs]
May 7 14:39:28 xt4 kernel: [] vfs_write+0xbd/0x1e0
May 7 14:39:28 xt4 kernel: [] SyS_pwrite64+0x92/0xc0
May 7 14:39:28 xt4 kernel: [] system_call_fastpath+0x16/0x1b
May 7 14:39:28 xt4 kernel: glusterfsd D 0000000000000000 0 28771 29196 0x00000080
May 7 14:39:28 xt4 kernel: ffff880561ffb9b8 0000000000000086 ffff8808652b5e20 ffff880561ffbfd8
May 7 14:39:28 xt4 kernel: ffff880561ffbfd8 ffff880561ffbfd8 ffff8808652b5e20 ffff88046f406b68
May 7 14:39:28 xt4 kernel: ffff88046f406a20 ffff88046f406b70 ffff88046f406a48 0000000000000000
May 7 14:39:28 xt4 kernel: Call Trace:
May 7 14:39:28 xt4 kernel: [] schedule+0x29/0x70
May 7 14:39:28 xt4 kernel: [] cv_wait_common+0x125/0x150 [spl]
May 7 14:39:28 xt4 kernel: [] ? wake_up_atomic_t+0x30/0x30
May 7 14:39:28 xt4 kernel: [] __cv_wait+0x15/0x20 [spl]
May 7 14:39:28 xt4 kernel: [] txg_wait_open+0xc3/0x110 [zfs]
May 7 14:39:28 xt4 kernel: [] dmu_tx_wait+0x3a8/0x3c0 [zfs]
May 7 14:39:28 xt4 kernel: [] dmu_tx_assign+0x9a/0x510 [zfs]
May 7 14:39:28 xt4 kernel: [] zfs_dirty_inode+0xf7/0x320 [zfs]
May 7 14:39:28 xt4 kernel: [] ? dequeue_entity+0x11c/0x5d0
May 7 14:39:28 xt4 kernel: [] ? __d_lookup+0x120/0x160
May 7 14:39:28 xt4 kernel: [] ? dequeue_task_fair+0x41e/0x660
May 7 14:39:28 xt4 kernel: [] ? sched_clock_cpu+0x85/0xc0
May 7 14:39:28 xt4 kernel: [] ? __switch_to+0x15c/0x4c0
May 7 14:39:28 xt4 kernel: [] zpl_dirty_inode+0x2c/0x40 [zfs]
May 7 14:39:28 xt4 kernel: [] __mark_inode_dirty+0xca/0x290
May 7 14:39:28 xt4 kernel: [] update_time+0x81/0xd0
May 7 14:39:28 xt4 kernel: [] ? __sb_start_write+0x58/0x110
May 7 14:39:28 xt4 kernel: [] file_update_time+0xa0/0xf0
May 7 14:39:28 xt4 kernel: [] filemap_page_mkwrite+0x40/0xb0
May 7 14:39:28 xt4 kernel: [] do_page_mkwrite+0x54/0xa0
May 7 14:39:28 xt4 kernel: [] do_shared_fault.isra.46+0x77/0x1e0
May 7 14:39:28 xt4 kernel: [] handle_mm_fault+0x61e/0x1000
May 7 14:39:28 xt4 kernel: [] ? do_futex+0x122/0x5b0
May 7 14:39:28 xt4 kernel: [] __do_page_fault+0x154/0x450
May 7 14:39:28 xt4 kernel: [] do_page_fault+0x35/0x90
May 7 14:39:28 xt4 kernel: [] page_fault+0x28/0x30