Process stuck at zfs_fsync #10440
Comments
By the way, there is a process stuck at the moment which I did not kill, so if any additional data can be obtained from it, I'm happy to help.
Thanks, I'll try recompiling with that fix and let you know if it occurs again.
This looks very similar to openzfsonosx/openzfs#3. There's a tiny proposed fix in openzfsonosx/openzfs@adeb452. This code was added in the 0.8.x releases, so that's consistent with your observation that it didn't happen with the earlier release.
This bug is still relevant for me with the current zfs master (as of today); lockdep reveals the following, which is probably relevant, and I'm investigating further:
Proposed fix, testing:
Oh, it's not going to be that simple :-D
After
There are three locks at play when handling ZIL records. First is the zl_issuer_lock, which is supposed to be held whenever we're issuing a new record to the ZIL. Then there is the zl_lock, to actually protect the state in the zilog struct. And then there's a last lock, zcw_lock, which is used to protect the lists of ZIL callback waiters. The order of locking is supposed to be zl_issuer_lock -> zcw_lock -> zl_lock, as implied by their usage in the zil_alloc_lwb and zil_commit_waiter_timeout functions. The function zil_commit_waiter_link_lwb goes against this: it expects to be entered with zl_lock already held and only then takes the zcw_lock. This patch straightens out the locking in zil_commit_waiter_link_lwb so that it takes the zl_lock on its own, correcting the locking order. Following is the attached message from the Linux lockdep mechanism describing the potential deadlock - in our production it was a matter of a few hours to hit this one... (redacted to fit 72 cols)

systemd-journal/5561 is trying to acquire lock:
 (&zilog->zl_lock){+.+.}-{4:4}, at: zil_alloc_lwb+0x1df/0x3e0

but task is already holding lock:
 (&zcw->zcw_lock){+.+.}-{4:4}, at: zil_commit_impl+0x52b/0x1850

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&zcw->zcw_lock){+.+.}-{4:4}:
       __mutex_lock+0xac/0x9e0
       mutex_lock_nested+0x1b/0x20
       zil_commit_waiter_link_lwb+0x51/0x1a0 [zfs]
       zil_commit_impl+0x12b0/0x1850 [zfs]
       zil_commit+0x43/0x60 [zfs]
       zpl_writepages+0xf8/0x1a0 [zfs]
       do_writepages+0x43/0xf0
       __filemap_fdatawrite_range+0xd5/0x110
       filemap_write_and_wait_range+0x4b/0xb0
       zpl_fsync+0x4d/0xb0 [zfs]
       vfs_fsync_range+0x49/0x80
       do_fsync+0x3d/0x70
       __x64_sys_fsync+0x14/0x20
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&zilog->zl_lock){+.+.}-{4:4}:
       __lock_acquire+0x12b6/0x2480
       lock_acquire+0xab/0x380
       __mutex_lock+0xac/0x9e0
       mutex_lock_nested+0x1b/0x20
       zil_alloc_lwb+0x1df/0x3e0 [zfs]
       zil_lwb_write_issue+0x265/0x3f0 [zfs]
       zil_commit_impl+0x577/0x1850 [zfs]
       zil_commit+0x43/0x60 [zfs]
       zpl_writepages+0xf8/0x1a0 [zfs]
       do_writepages+0x43/0xf0
       __filemap_fdatawrite_range+0xd5/0x110
       filemap_write_and_wait_range+0x4b/0xb0
       zpl_fsync+0x4d/0xb0 [zfs]
       vfs_fsync_range+0x49/0x80
       do_fsync+0x3d/0x70
       __x64_sys_fsync+0x14/0x20
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&zcw->zcw_lock);
                               lock(&zilog->zl_lock);
                               lock(&zcw->zcw_lock);
  lock(&zilog->zl_lock);

 *** DEADLOCK ***

2 locks held by systemd-journal/5561:
 #0: (&zilog->zl_issuer_lock){+.+.}-{4:4}, at: zil_commit_impl+0x4dc...
 #1: (&zcw->zcw_lock){+.+.}-{4:4}, at: zil_commit_impl+0x52b/0x1850

Fixes openzfs#7038
Fixes openzfs#10440

Signed-off-by: Pavel Snajdr <[email protected]>
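To make the corrected ordering easier to follow, here is a minimal user-space sketch of the zl_issuer_lock -> zcw_lock -> zl_lock order the patch enforces. This is an illustration only, not the actual zil.c code: the struct layouts, the link_waiter function, and the use of pthread mutexes in place of the kernel primitives are all simplifying assumptions.

```c
/*
 * Minimal sketch of the documented lock order, NOT the actual OpenZFS code:
 *     zl_issuer_lock -> zcw_lock -> zl_lock
 * Struct layouts and function names are simplified assumptions.
 */
#include <pthread.h>
#include <stdio.h>

typedef struct {
	pthread_mutex_t zl_issuer_lock;	/* held while issuing ZIL records */
	pthread_mutex_t zl_lock;	/* protects zilog state */
} zilog_sketch_t;

typedef struct {
	pthread_mutex_t zcw_lock;	/* protects the commit waiter */
	int zcw_linked;
} zcw_sketch_t;

/*
 * Pre-fix behavior: the link function was entered with zl_lock already
 * held and then took zcw_lock, giving zl_lock -> zcw_lock and inverting
 * the documented order.  The corrected flow takes zcw_lock first and
 * acquires zl_lock on its own.
 */
static void
link_waiter(zilog_sketch_t *zilog, zcw_sketch_t *zcw)
{
	pthread_mutex_lock(&zcw->zcw_lock);	/* 2nd lock in the order */
	pthread_mutex_lock(&zilog->zl_lock);	/* 3rd lock in the order */
	zcw->zcw_linked = 1;			/* "link" the waiter */
	pthread_mutex_unlock(&zilog->zl_lock);
	pthread_mutex_unlock(&zcw->zcw_lock);
}

int
main(void)
{
	zilog_sketch_t zilog = {
		PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
	};
	zcw_sketch_t zcw = { PTHREAD_MUTEX_INITIALIZER, 0 };

	pthread_mutex_lock(&zilog.zl_issuer_lock);	/* 1st lock in the order */
	link_waiter(&zilog, &zcw);
	pthread_mutex_unlock(&zilog.zl_issuer_lock);
	printf("waiter linked: %d\n", zcw.zcw_linked);
	return 0;
}
```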
It took a while, but the problem has manifested again, running with a ~2-month-old git HEAD:
Well, our issue seems to be only relevant when core-dumping - #10875 (comment)
The real problem does exist - linux-next 20211118 + current zfs. You can catch this deadlock when debugging; use this config:
I will report the result after testing this fix.
The situation changed after the patch: the TRAP disappeared, but the deadlock remained.
@AndyLavr I'm not sure the issues are actually related; it's been a while and all I recall now is a good salad from the mix of info I had at the time :D Are you experiencing this with any Linux revisions other than the pretty recent -next? Btw, do you think you'd be able to distill a reproducer from the workload you have?
I was debugging another issue and accidentally saw this. I only use -next.
This is one message during boot. I will send you dmesg by email.
Ah, OK, I see, that's a lockdep warning about a potential deadlock; if you're not really hitting it, I guess you can ignore that one :) I haven't seen the actual deadlock occur again, so it must have been resolved for quite some time now. :)
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
I'm using the tool backy2 to back up a Ceph cluster to a raidz ZFS volume. The backy2 process uses an SQLite database stored in a dataset on the backup pool. Sometimes during the backup, the process gets stuck (D state). The stack trace shows that it is stuck in a ZFS task:
The problem does not happen every day, but it occurs on a regular basis. It is quite annoying because it requires a reboot of the cluster node. The cluster is a Proxmox 6 cluster. The same setup worked for a long time on a Proxmox 5 cluster (kernel 4.15.18-29-pve, zfs 0.7.13-pve1~bpo2, Debian Stretch), and the problems only started after the upgrade.