Assertion failed: !(*take_lock) will cause stale inodelk and block subsequent IO #4425

Open
chen1195585098 opened this issue Oct 22, 2024 · 1 comment · May be fixed by #4426

@chen1195585098
Contributor

Description of problem:
Same as #3065 .

Many Assertion failed: !(*take_lock) messages appear repeatedly in our test and production environments.

After a preliminary investigation, there appears to be a race between __afr_eager_lock_handle and afr_wakeup_same_fd_delayed_op. In the eager-lock lifecycle, when a transaction completes its fop phase it is moved to the inode->post_op list, where it is expected to be woken up when the delay timer expires or when the last transaction on the inode finishes. However, this wakeup is also triggered every time a FLUSH is issued: when the first fop finishes, afr_flush is called and wakes up the delayed post_op on the same inode via afr_wakeup_same_fd_delayed_op. Once that wakeup succeeds, lock->delay_timer is set to NULL, which means all subsequent transactions should wait and be moved into the next lifecycle.

When a new write transaction on the same inode arrives at the same time, however, it is incorrectly added to the owners list, because lock->release == false, lock->delay_timer == NULL, and list_empty(&lock->owners) == true all hold. It then issues an extra lock request in the current lifecycle, which leaves a stale inodelk behind.

[2024-10-21 06:52:46.969255] I [afr-transaction.c:2776:__afr_eager_lock_handle] 0-LOCK_TEST: TID=4209133248, ctx = 0x7f04f400d690, lock = 0x7f04f400d6e0, owners = 1, waitings = 0, frozens = 0, reason = lock->release = FALSE, lock->delay_timer reset to NLL, add to owners, take_lock = 0, do_pre_op = 1, lock->acquried = 1
[2024-10-21 06:52:47.694238] I [afr-common.c:4199:afr_wakeup_same_fd_delayed_op] 0-LOCK_TEST: TID=4217525952, set lock->delay_timer = NULL  # retrieve post_op here
[2024-10-21 06:52:47.696090] E [afr-transaction.c:2773:__afr_eager_lock_handle] (-->/usr/local/lib/glusterfs/8.2/xlator/cluster/replicate.so(+0x39dab) [0x7f04fa53fdab] -->/usr/local/lib/glusterfs/8.2/xlator/cluster/replicate.so(+0x3984c) [0x7f04fa53f84c] -->/usr/local/lib/glusterfs/8.2/xlator/cluster/replicate.so(+0x39796) [0x7f04fa53f796] ) 0-: Assertion failed: !(*take_lock)
[2024-10-21 06:52:47.696106] I [afr-transaction.c:2776:__afr_eager_lock_handle] 0-LOCK_TEST: TID=4187313856, ctx = 0x7f04f400d690, lock = 0x7f04f400d6e0, owners = 1, waitings = 0, frozens = 0, reason = , take_lock = 1, do_pre_op = 0, lock->acquried = 1

From the code, I guess both lock->release and lock->delay_timer are marks indicating whether a new transaction can be added in the current lifecycle, because the two conditions seem intended to change together (e.g. lock->release = _gf_true and lock->delay_timer = NULL are both set in afr_delayed_changelog_wake_up_cbk when the current transaction is the last owner of the lock).

After verification, setting lock->release = _gf_true in afr_wakeup_same_fd_delayed_op handles this problem. It is the simplest way to avoid stale inodelks, although the root cause of the race remains unsolved.
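The proposed change would look roughly like the following (a sketch in the same diff style as below; the exact context lines in afr_wakeup_same_fd_delayed_op are approximate):

```diff
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ afr_wakeup_same_fd_delayed_op
             delay_timer = lock->delay_timer;
             lock->delay_timer = NULL;
+            /* also mark the lifecycle for release, so a concurrent new
+               transaction is frozen into the next lifecycle instead of
+               joining owners and issuing a second inodelk */
+            lock->release = _gf_true;
```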

In fact, the root cause of this issue is that afr_wakeup_same_fd_delayed_op is triggered too often. Ideally, it should be triggered only once, by the last transaction on the inode in an eager-lock lifecycle, as described in #418. That would eliminate the race entirely. However, it is hard to determine whether a transaction is the last one without introducing more complexity.

The exact command to reproduce the issue:

1. Modify the glusterfs code to intensify the race:

diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index 23505b7..0deb933 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -4225,6 +4225,7 @@ afr_delayed_changelog_wake_resume(xlator_t *this, inode_t *inode,
     }
     UNLOCK(&inode->lock);
 
+    usleep(((random() % 20) * 40 + 200) * 1000); // randomly sleep
     if (data_local) {
         data_local->transaction.resume_stub = stub;
     } else if (metadata_local) {
diff --git a/xlators/cluster/afr/src/afr-transaction.c b/xlators/cluster/afr/src/afr-transaction.c
index c29083a..7fa2427 100644
--- a/xlators/cluster/afr/src/afr-transaction.c
+++ b/xlators/cluster/afr/src/afr-transaction.c
@@ -2711,6 +2711,7 @@ __afr_eager_lock_handle(afr_local_t *local, gf_boolean_t *take_lock,
         }
     }
 
+    usleep(((random() % 20) * 40 + 200) * 1000); // randomly sleep
     if (lock->release) {
         list_add_tail(&local->transaction.wait_list, &lock->frozen);
         *take_lock = _gf_false;

2. Create and start a normal replicated volume test, then mount it (e.g. mount -t glusterfs localhost:test /mnt).
3. Write the same file simultaneously from 2 processes:

#!/usr/bin/python

import sys
import random
import time
import multiprocessing

def write_files(file_name):
    while True:
        with open(file_name, "w") as f:
            random.seed(time.time())
            result = random.random()
            f.write(str(result) * 1024)

def main(argv):
    thread_num = 2
    file_name = '/mnt/testfile'

    for num in range(int(thread_num)):
        proc = multiprocessing.Process(target=write_files, args=(file_name,))
        proc.start()

if __name__ == "__main__":
    main(sys.argv[1:])
    while True:
        time.sleep(1)

After Assertion failed: !(*take_lock) is printed in the mount log, a stale inodelk has occurred, and it can be confirmed with a statedump.
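For the statedump check, something like the following should work (the output directory is the default /var/run/gluster; adjust for your install):

```
# Ask the brick processes to dump their state
gluster volume statedump test

# Look for granted inodelk entries on the test file's inode
grep -A 2 inodelk /var/run/gluster/*.dump.*
```

A stale inodelk shows up as a granted entry that never goes away even though no I/O is in flight.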
The full output of the command that failed:

Expected results:
No assertion failures or stale inodelk when a same file is written simultaneously.

- Is there any crash? Provide the backtrace and coredump:
The assertion failure causes a crash in DEBUG builds.

Additional info:

- The operating system / glusterfs version:

Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration

chen1195585098 pushed a commit to chen1195585098/glusterfs that referenced this issue Oct 22, 2024
…delayed_op (gluster#4425)

Race between __afr_eager_lock_handle and afr_wakeup_same_fd_delayed_op will lead to
stale inodelks, which may cause subsequent IO blocked.
To avoid this race, set lock->release = _gf_true also in afr_wakeup_same_fd_delayed_op so
that a new transaction can be correctly handled as expected in __afr_eager_lock_handle.

Fixes: gluster#4425

Signed-off-by: chenjinhao <[email protected]>

chen1195585098 commented Oct 24, 2024

Hi @pranithk, can we simply set lock->release = _gf_true to avoid this issue? When the issue is triggered, inodelk requests from other mount points are blocked, which greatly affects the availability of a replicated volume.

Could such a fix have worse side effects? I noticed you suggested this approach in #3065, but no further changes were made.
