You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
There are many Assertion failed: !(*take_lock) appearing successively in our test and production environemnts.
After preliminary investigation, it seems that there is a race between __afr_eager_lock_handle and afr_wakeup_same_fd_delayed_op. In the lifecycle of eager lock, when transactions complete fop phase, they will be moved to inode->post_op list, which is expected to be waked up as the timer expired or the last transaction on this inode get finished. However, such a wakeup process will be triggered each time a FLUSH is issued. When the first fop finished, afr_flush is called and it will wakeup delayed post_op on the same inode from afr_wakeup_same_fd_delayed_op. Once this operation succeed, lock->delay_timer = NULL will be set, which means all subsequent transactions need to wait and will be put into next lifecycle.
When a new write transaction on the same inode comes at the same time, howerver, it will be incorrectly added to owners list due to lock->release == false, lock->delay_timer == NULL, list_empty(&lock->owners) == true and issues an extra lock request in current lifecycle, which makes stale inodelk.
From the code, I guess both lock->release and lock->delay_timer are marks to indicate whether a new transaction can to be added in current lifecycle, because the two conditions seem to hold at the same time. (e.g. lock->release = _gf_true and lock->delay_timer = NULL are set in afr_delayed_changelog_wake_up_cbk, when current transaction is the last owner of lock).
After verification, it is found that setting lock->release = _gf_true in afr_wakeup_same_fd_delayed_op will handle this problem. It is the simplest way to avoid stale inodelks while the root cause of the race is still unsolved.
In fact, the root cause of this issue is that afr_wakeup_same_fd_delayed_op is triggered too often. Ideally, it should only be triggered once in the last transaction of inode in a eager lock lifecycle as described at #418 . If so, this race is gone. However, it is hard to determine whether a transaction is the last without introducing more complexity.
2、create and start a normal replicated volume test, mount this volume. (e.g. mount -t glusterfs localhost:test /mnt)
3、write a same file simultaneously with 2 threads.
#!/usr/bin/python
import sys
import random
import time
import multiprocessing
def write_files(file_name):
while True:
with open(file_name, "w") as f:
random.seed(time.time())
result = random.random()
f.write(str(result) * 1024)
def main(argv):
thread_num = 2
file_name = '/mnt/testfile'
for num in range(int(thread_num)):
proc = multiprocessing.Process(target=write_files, args=(file_name,))
proc.start()
if __name__ == "__main__":
main(sys.argv[1:])
while True:
time.sleep(1)
After Assertion failed: !(*take_lock) is printed in mount log, stale inodelk occurs, and can be checked with statedump. The full output of the command that failed:
Expected results:
No assertion failures or stale inodelk when a same file is written simultaneously.
**- Is there any crash ? Provide the backtrace and coredump
Assert failure will cause crash in DEBUG mode.
Additional info:
- The operating system / glusterfs version:
Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration
The text was updated successfully, but these errors were encountered:
…delayed_op (gluster#4425)
Race between __afr_eager_lock_handle and afr_wakeup_same_fd_delayed_op will lead to
stale inodelks, which may cause subsequent IO blocked.
To avoid this race, set lock->release = _gf_true also in afr_wakeup_same_fd_delayed_op so
that a new transaction can be correctly handled as expected in __afr_eager_lock_handle.
Fixes: gluster#4425
Signed-off-by: chenjinhao <[email protected]>
Hi, @pranithk , can we simply set lock->release = _gf_true to avoid this issue? When this issue is triggered, inodelk request from other mountpoint will be blocked,which greatly effect the availability of a replicated volume.
Will such a fix bring a worse effection? I noticed you provided this method in #3065, but no further modification was made.
Description of problem:
Same as #3065 .
There are many
Assertion failed: !(*take_lock)
appearing successively in our test and production environemnts.After preliminary investigation, it seems that there is a race between
__afr_eager_lock_handle
andafr_wakeup_same_fd_delayed_op
. In the lifecycle of eager lock, when transactions complete fop phase, they will be moved toinode->post_op
list, which is expected to be waked up as the timer expired or the last transaction on this inode get finished. However, such a wakeup process will be triggered each time a FLUSH is issued. When the first fop finished,afr_flush
is called and it will wakeup delayed post_op on the same inode fromafr_wakeup_same_fd_delayed_op
. Once this operation succeed,lock->delay_timer = NULL
will be set, which means all subsequent transactions need to wait and will be put into next lifecycle.When a new write transaction on the same inode comes at the same time, howerver, it will be incorrectly added to owners list due to
lock->release == false, lock->delay_timer == NULL, list_empty(&lock->owners) == true
and issues an extra lock request incurrent
lifecycle, which makesstale inodelk
.From the code, I guess both
lock->release
andlock->delay_timer
are marks to indicate whether a new transaction can to be added in current lifecycle, because the two conditions seem to hold at the same time. (e.g.lock->release = _gf_true
andlock->delay_timer = NULL
are set inafr_delayed_changelog_wake_up_cbk
, when current transaction is thelast owner
of lock).After verification, it is found that setting
lock->release = _gf_true
inafr_wakeup_same_fd_delayed_op
will handle this problem. It is the simplest way to avoid stale inodelks while the root cause of the race is still unsolved.In fact, the root cause of this issue is that
afr_wakeup_same_fd_delayed_op
is triggered too often. Ideally, it should only be triggered once in the last transaction of inode in a eager lock lifecycle as described at #418 . If so, this race is gone. However, it is hard to determine whether a transaction is the last without introducing more complexity.The exact command to reproduce the issue:
1、modify glusterfs code to intensify race.
2、create and start a normal replicated volume
test
, mount this volume. (e.g.mount -t glusterfs localhost:test /mnt
)3、write a same file simultaneously with 2 threads.
After
Assertion failed: !(*take_lock)
is printed in mount log, stale inodelk occurs, and can be checked with statedump.The full output of the command that failed:
Expected results:
No assertion failures or stale inodelk when a same file is written simultaneously.
**- Is there any crash ? Provide the backtrace and coredump
Assert failure will cause crash in DEBUG mode.
Additional info:
- The operating system / glusterfs version:
Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration
The text was updated successfully, but these errors were encountered: