Fix data race between zil_commit() and zil_suspend() #14514
This appears to be a regression introduced by 1ce23dc.
I have rebased this on master and repushed.
I would call this lock zl_suspend_lock, since that is what it protects.
Also, I am worried about one more contention point per ZIL. Even though in most cases it won't block, it is still a couple of atomic operations on a shared variable. This reminds me of FreeBSD's rms_rlock() primitives, but maybe there are other solutions too.
I will change the name when I do my next push.
I could not see a better way to close the race.
openzfsonwindows#206 found that it is possible to trip `VERIFY(list_is_empty(&lwb->lwb_itxs))` when a `zil_commit()` is delayed by the scheduler long enough for a parallel `zil_suspend()` operation to exit `zil_commit_impl()`. This is a data race. To prevent this, we introduce a `zilog->zl_suspend_lock` rwlock to ensure that all outstanding `zil_commit()` operations finish before `zil_suspend()` begins and that subsequent operations fall back to `txg_wait_synced()` after `zil_suspend()` has begun.

On `PREEMPT_RT` Linux kernels, the `rw_enter()` implementation suffers from writer starvation. This means that a ZIL-intensive system can delay `zil_suspend()` indefinitely. This is a pre-existing problem that affects everything that uses rw locks, so it needs to be addressed in the SPL. However, builds against `PREEMPT_RT` Linux kernels are currently broken due to a GPL symbol issue (openzfs#11097), so we can safely disregard that issue for now.

Reported-by: Arun KV <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
The latest push rebases on master and addresses both of @amotin's comments. |
#206 found that it is possible to trip `VERIFY(list_is_empty(&lwb->lwb_itxs))` when a `zil_commit()` is delayed by the scheduler long enough for a parallel `zil_suspend()` operation to exit `zil_commit_impl()`. This is a data race. To prevent this, we introduce a `zilog->zl_suspend_lock` rwlock to ensure that all outstanding `zil_commit()` operations finish before `zil_suspend()` begins and that subsequent operations fall back to `txg_wait_synced()` after `zil_suspend()` has begun.

On `PREEMPT_RT` Linux kernels, the `rw_enter()` implementation suffers from writer starvation. This means that a ZIL-intensive system can delay `zil_suspend()` indefinitely. This is a pre-existing problem that affects everything that uses rw locks, so it needs to be addressed in the SPL. However, builds against `PREEMPT_RT` Linux kernels are currently broken due to a GPL symbol issue (openzfs#11097), so we can safely disregard that issue for now.

Reported-by: Arun KV <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes openzfs#14514
Motivation and Context
openzfsonwindows#206 found that it is possible to trip `VERIFY(list_is_empty(&lwb->lwb_itxs))` when a `zil_commit()` is delayed by the scheduler long enough for a parallel `zil_suspend()` operation to exit `zil_commit_impl()`. This is a data race. To prevent this, we introduce a `zilog->zl_suspend_lock` rwlock to ensure that all outstanding `zil_commit()` operations finish before `zil_suspend()` begins and that subsequent operations fall back to `txg_wait_synced()` after `zil_suspend()` has begun.

On `PREEMPT_RT` Linux kernels, the `rw_enter()` implementation suffers from writer starvation. This means that a ZIL-intensive system can delay `zil_suspend()` indefinitely. This is a pre-existing problem that affects everything that uses rw locks, so it needs to be addressed in the SPL. However, builds against `PREEMPT_RT` Linux kernels are currently broken due to a GPL symbol issue (#11097), so we can safely disregard that issue for now.

Description
We modify `zil_commit()` to grab a read lock for both its suspend check and the full duration of `zil_commit_impl()`, although not for the duration of `txg_wait_synced()` when the suspend check shows that the ZIL is suspended. We also modify `zil_suspend()` to grab a write lock before grabbing `zilog->zl_lock` and to release it after it has incremented `zilog->zl_suspend`. The result is that all outstanding `zil_commit()` operations finish before `zil_suspend()` begins, and subsequent `zil_commit()` operations fall back to `txg_wait_synced()` as expected. This prevents the scheduler from adding arbitrarily long waits to `zil_commit()` that would cause it to run `zil_commit_impl()` on a ZIL that is either already suspending or already suspended.

Note that grabbing the write lock before grabbing `zilog->zl_lock` is intended to prevent a lock inversion deadlock between `zil_commit()` and `zil_suspend()` on the two locks.

How Has This Been Tested?
The buildbot can test it.