Fix zfs_fsync deadlock #10875
Can you comment on why it's safe to drop this lock here, and then reacquire it in `zil_commit_waiter_link_lwb`? For example, could anything bad happen if `zilog->zl_last_lwb_opened` were to change between the drop and the reacquire?

I don't mean to imply this is wrong; I'm just trying to refresh my memory on whether this is safe, since the interaction between these three locks is subtle.

Would it have been possible for the lwb assigned above to have been issued, and perhaps completed, between dropping the lock here and acquiring it later? In which case, the `lwb_t` structure pointed to by `last_lwb` may no longer be valid?

Looking at the code, I don't think that's possible, since we will be holding the `zl_issuer_lock` here, and it also has to be held when issuing the `zl_last_lwb_opened`; so there's no chance for `last_lwb` to be freed between the drop/acquire of `zl_lock`. Thoughts?

And just to be clear, my concern is whether the lwb we pass into `zil_commit_waiter_link_lwb` can be freed by the time it's used in that function, due to us dropping `zl_lock` and racing with some other thread.

Looking closer, the lwb wouldn't necessarily have to have been freed for this to break; we just need to ensure it doesn't transition into the `LWB_STATE_FLUSH_DONE` state during the drop/acquire, since at that point we would have wanted to skip the itx rather than add it to the lwb.

Further, I believe that at this point `last_lwb` should be in the `LWB_STATE_OPENED` state, and another thread can't transition it to `LWB_STATE_ISSUED` (and thus it can't transition to `LWB_STATE_FLUSH_DONE` either), because we're still holding the `zl_issuer_lock`.

If I'm thinking about all this correctly, then I think this change makes sense to me.

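To make that invariant concrete, here is a minimal sketch using the upstream lock and field names from `zil_impl.h`; it is illustrative only, and the helper function is hypothetical, not part of the patch:

```c
#include <sys/zil_impl.h>

/*
 * Hypothetical helper (not in the tree) illustrating the invariant:
 * while zl_issuer_lock is held, an OPENED lwb cannot be issued by any
 * other thread, so dropping and re-taking zl_lock does not allow
 * last_lwb to reach LWB_STATE_FLUSH_DONE or be freed in between.
 */
static void
zil_last_lwb_stays_opened(zilog_t *zilog)
{
	ASSERT(MUTEX_HELD(&zilog->zl_issuer_lock));

	mutex_enter(&zilog->zl_lock);
	lwb_t *last_lwb = zilog->zl_last_lwb_opened;
	ASSERT3S(last_lwb->lwb_state, ==, LWB_STATE_OPENED);
	mutex_exit(&zilog->zl_lock);

	/*
	 * zl_lock is dropped here, but only the zl_issuer_lock holder
	 * may move an lwb from OPENED to ISSUED, so last_lwb is still
	 * OPENED (and still allocated) when zl_lock is re-taken.
	 */
	mutex_enter(&zilog->zl_lock);
	ASSERT3S(last_lwb->lwb_state, ==, LWB_STATE_OPENED);
	mutex_exit(&zilog->zl_lock);
}
```
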
The reason I put the unlock after the brackets in both if/else bodies is exactly so that `zilog->zl_last_lwb_opened` doesn't change. The function is then called with the `*itx` pointer, not the lwb. I think this is legit, but it for sure needs more eyeballs on it.

Also, looking at these two stacks from the PR description, don't they both require the single (per-ZIL) `zl_issuer_lock`? If so, we could not actually have two threads in these two stacks simultaneously, and thus there isn't any actual deadlock concern (even if the lock ordering is confusing)?

I think it'd be good to get stack information from the system when the actual deadlock occurs, since I'm not convinced that lockdep is reporting a legitimate deadlock scenario.

The actual lockdep message is a bit confusing, that is true. The main concern here is the `zil_commit_waiter_timeout` function, which releases `zcw->zcw_lock` for LWBs that aren't finished yet and then takes the `zl_issuer_lock`, but doesn't care about the `zl_lock` at all, which might be held by a different thread. My main concern is the order of locking in that function:
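(Roughly, and paraphrasing rather than quoting the `zil_commit_waiter_timeout` of that era, the sequence in question is:)

```c
/* Paraphrase of the locking order in zil_commit_waiter_timeout(); not an exact quote. */
ASSERT(MUTEX_HELD(&zcw->zcw_lock));

mutex_exit(&zcw->zcw_lock);		/* drop the waiter's lock ...        */
mutex_enter(&zilog->zl_issuer_lock);	/* ... take the issuer lock ...      */
mutex_enter(&zcw->zcw_lock);		/* ... and re-take the waiter's lock */

/* zl_lock is not taken here, even though another thread may hold it. */
```
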
Now if we combine that with functions that take the `zl_lock` in combination with the above locks, and the fact that they do so in a different order (sometimes `zcw_lock` is taken first, sometimes `zl_lock`)... I don't know, I think this is rather opaque, and I like the idea of a separate lock protecting the `zl_lwb_list` best.

I'll paste the stacks of the stuck processes later today; they are still present on the production machines we run. We just clone the workload into a new dataset every time this happens, so that we don't have to reboot the whole box. This is precisely "enabled" by the `zcw_lock` being specific to the callback waiter.

So the node with the stuck processes has been cycled in the meantime; I'll have to wait a bit for this to turn up again... should have saved the stacks when I had the chance :-D

I don't necessarily disagree with this sentiment, but I'd like to determine whether we're making a change to eliminate a legitimate deadlock scenario, or a change to make the code easier to read/comprehend/etc. (and/or to silence a lockdep false positive).

Which is why I'd like to see the state of the system when the actual deadlock occurs, so we can verify whether that deadlock is the same issue being raised here, or something different. If it is different, I'd still be OK with making some changes to the code that make this logic easier to understand.

I'm not sure how much of the ZIL logic and/or locking you're familiar with, so just to make sure we're all on the same page (and sorry if this isn't already documented in the code): the `zl_issuer_lock` is more about ensuring that only a single thread is ever issuing ZIL writes to disk than about protecting structures in memory.

Due to how the ZIL is laid out on disk, via a linked list of blocks and block pointers, we can only have a single thread issue ZIL writes to disk, as the thread that issues these writes is also the thread that allocates the "next" ZIL block on disk; and there can only ever be a single "next" block at any given time. So we use the `zl_issuer_lock` to ensure there aren't two threads racing when allocating that "next" block.

Taking my explanation above into account: here we have to acquire the `zl_issuer_lock` because we're about to call `zil_lwb_write_issue`, and `zl_issuer_lock` is effectively used to wrap calls to `zil_lwb_write_issue` rather than to protect any particular structure in memory.

Additionally, after we acquire `zl_issuer_lock` we have to re-acquire `zcw_lock` to ensure the waiter that we're concerned about doesn't transition into the "done" state while we're about to issue that waiter's lwb (this is also why we check `zcw_done` after we re-acquire the `zcw_lock`, since it could have been marked done after we dropped the lock).

We don't acquire the `zl_lock` at this point since we're not making any direct modifications to the `zilog` structure. Further, we may not need that lock at all, for example in the case where `zcw_done` is already set, in which case acquiring `zl_lock` unnecessarily may hinder ZIL performance without any benefit (e.g. if acquiring this lock here conflicts with acquiring it in `zil_lwb_flush_vdevs_done`).

I'm more than happy to discuss this more; please let me know if this makes sense, and/or if I need to add any more details, clarification, etc...

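For illustration, here is a minimal sketch of the sequence described above, using the upstream names; the helper function is hypothetical and this is not the actual `zil.c` code:

```c
/*
 * Hypothetical helper illustrating the re-acquire-and-recheck pattern
 * described above; not a quote of the actual zil.c code.
 */
static void
zil_issue_waiter_lwb_sketch(zilog_t *zilog, zil_commit_waiter_t *zcw)
{
	mutex_enter(&zilog->zl_issuer_lock);	/* wraps lwb issuance */
	mutex_enter(&zcw->zcw_lock);

	if (zcw->zcw_done) {
		/*
		 * The waiter was marked done while we held neither lock;
		 * nothing is left to issue, and zl_lock was never needed.
		 */
		mutex_exit(&zcw->zcw_lock);
		mutex_exit(&zilog->zl_issuer_lock);
		return;
	}

	/*
	 * Otherwise the waiter's lwb can be issued here (via
	 * zil_lwb_write_issue()) under zl_issuer_lock, without taking
	 * zl_lock, since no zilog fields are modified directly.
	 */
	mutex_exit(&zcw->zcw_lock);
	mutex_exit(&zilog->zl_issuer_lock);
}
```
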
Also, FWIW, there's some commentary on `zl_issuer_lock` and `zl_lock` in `zil_impl.h`, above the definitions of the `lwb_state_t`, `lwb_t`, and `zil_commit_waiter_t` structures.
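For reference, the lwb state machine discussed throughout this thread is declared there as well; approximately (members recalled from the OpenZFS of that era, so treat this as indicative rather than exact):

```c
/* Approximate reproduction of lwb_state_t from zil_impl.h of that era. */
typedef enum {
	LWB_STATE_CLOSED,	/* lwb allocated, not yet open for itxs     */
	LWB_STATE_OPENED,	/* itxs may be assigned to this lwb         */
	LWB_STATE_ISSUED,	/* the lwb's write zio has been issued      */
	LWB_STATE_WRITE_DONE,	/* the write zio has completed              */
	LWB_STATE_FLUSH_DONE,	/* the vdev flush zio has completed         */
	LWB_NUM_STATES
} lwb_state_t;
```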