Extend eager-lock to Metadata transactions #418

Closed
pranithk opened this issue Mar 14, 2018 · 2 comments

@pranithk
Member

1. The eager-lock implementation in AFR currently supports only data transactions. When replication is used along with shard, and shard issues fxattrops to update size/blocks, high contention is observed.
2. We also observed in some use cases that AFR's eager-lock implementation doesn't handle conflicting writes from the same fd in parallel.

In contrast, EC enables eager-lock for both data and metadata operations. It also handles 2) well: when there are conflicting writes, it doesn't trigger unlocks/post-ops, and it tries to do as much of the work as possible under the same lock. This implementation should be borrowed into AFR as well.

EC uses the same inode lock for both data and metadata domains, but AFR can't do that because it has to keep backward compatibility.
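
As an illustration only, the separation could look roughly like the sketch below; the names and the domain-string convention are hypothetical and are not the actual AFR code.

```c
/* Minimal sketch (hypothetical names and domain convention, not the actual
 * AFR code): keep data and metadata eager-locks in separate inodelk domains
 * so the on-the-wire locking stays compatible with older clients. */
#include <stdio.h>

typedef enum {
    AFR_DATA_TRANSACTION,
    AFR_METADATA_TRANSACTION,
} afr_transaction_type;

/* Hypothetical helper: pick the inodelk domain string per transaction type. */
static const char *
eager_lock_domain(const char *xl_name, afr_transaction_type type)
{
    static char buf[256];

    if (type == AFR_DATA_TRANSACTION)
        return xl_name; /* base domain, as older clients expect */

    snprintf(buf, sizeof(buf), "%s:metadata", xl_name); /* assumed naming */
    return buf;
}

int
main(void)
{
    printf("data domain:     %s\n",
           eager_lock_domain("testvol-replicate-0", AFR_DATA_TRANSACTION));
    printf("metadata domain: %s\n",
           eager_lock_domain("testvol-replicate-0", AFR_METADATA_TRANSACTION));
    return 0;
}
```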

Lifecycle of lock:

1. The first transaction is added to the inode->owners list and an inodelk is sent on the wire. All subsequent transactions are put in the inode->waiters list until the first transaction completes both the inodelk and the [f]xattrop.
2. Once the [f]xattrop completes, every request in the inode->waiters list is checked for conflicts with the existing locks in the inode->owners list; the ones that don't conflict are added to inode->owners and resume their transactions.
3. When these transactions complete the fop phase, they are moved to the inode->post_op list and the transactions that were paused because of conflicts are resumed.
4. Post-op and unlock are not issued on the wire until the transaction is the last one on that inode. When the last transaction has to perform the post-op, it can choose to sleep for the delayed-post-op-secs value. If another transaction arrives during that time, it wakes the sleeping transaction, takes over ownership of the lock, and the cycle continues.
5. If delayed-post-op-secs expires, the timer thread wakes the sleeping transaction, which sets lock->release to true and starts the post-op followed by the unlock. Any transactions that arrive during this time are put in the inode->frozen list.
6. Once the unlock completes, the frozen list is moved to the waiters list, the first element of the waiters list is moved to the owners list, the lock is attempted, and the cycle continues.

This is the general idea (see the sketch after this list). There is additional logic, at delay time, when a new transaction arrives, or in the flush fop, to wake up existing sleeping transactions or to decide whether to delay a transaction; this is subject to change with future enhancements.

At the moment, when more than one fd is open, AFR falls back to non-eager-lock mode.
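
A minimal sketch of the lists involved and of where an incoming transaction lands; the names are simplified and hypothetical (the real per-inode lock context in AFR carries more state than this):

```c
/* Minimal sketch, hypothetical simplified names: the four per-inode lists
 * from the lifecycle above, plus the decision of where a new transaction
 * is placed.  Not the actual AFR data structures. */
#include <stdbool.h>
#include <stdio.h>

struct txn; /* one AFR transaction (write, [f]xattrop, ...) */

typedef struct eager_lock {
    struct txn *owners;   /* lock held, fop phase in progress             */
    struct txn *waiters;  /* queued until inodelk + pre-op have completed */
    struct txn *post_op;  /* fop done, waiting for the delayed post-op    */
    struct txn *frozen;   /* arrived while post-op/unlock is on the wire  */
    bool        acquired; /* inodelk has been granted                     */
    bool        release;  /* delayed-post-op timer fired: winding down    */
} eager_lock_t;

/* Hypothetical helper: decide which list an incoming transaction joins. */
static struct txn **
classify_txn(eager_lock_t *lock, bool conflicts_with_owner)
{
    if (lock->release)
        return &lock->frozen;  /* unlock in flight: park until it returns */
    if (!lock->acquired || conflicts_with_owner)
        return &lock->waiters; /* wait for the lock or for the owner      */
    return &lock->owners;      /* piggy-back on the already-held lock     */
}

int
main(void)
{
    eager_lock_t lk = {0};
    lk.acquired = true; /* pretend the first transaction already holds it */

    struct txn **list = classify_txn(&lk, /*conflicts_with_owner=*/false);
    printf("non-conflicting txn joins the %s list\n",
           list == &lk.owners ? "owners" :
           list == &lk.waiters ? "waiters" : "frozen");
    return 0;
}
```

A transaction parked in waiters because of a conflict is resumed once the transaction it conflicts with completes its fop phase and moves to post_op (step 3 above).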

@pranithk
Member Author

https://review.gluster.org/19711 has the spec. Documentation needs to be added when the release notes are written.

@nigelbabu I did add "Updates #418" to the patch, but the update is not showing up on the GitHub issue. Am I doing something wrong?

@gluster-ant
Collaborator

A patch https://review.gluster.org/19503 has been posted that references this issue.
Commit message: cluster/afr: Make AFR eager-locking similar to EC

ShyamsundarR added this to the Release 4.1 (LTM) milestone May 7, 2018
amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Sep 11, 2018
Problem:
1) AFR's eager-lock only works for data transactions.
2) When there are conflicting writes, the write with the conflicting
region initiates an unlock of the eager-lock, leading to extra pre-ops
and post-ops on the file. When eager-lock goes off, it leads to extra
fsyncs for random-write workloads in AFR.

Solution (that is modeled after EC):
In EC, when there is a conflicting write, it waits for the current write
to complete before it winds the conflicting write. This leads to better
utilization of network and disk, because we will not be doing extra
xattrops, FSYNCs, and inodelk/unlock. Moved fd-based counters to
inode-based counters.

I tried to model the solution on EC's locking, but it is not the same,
because AFR has to keep backward compatibility.

Lifecycle of lock:
==================
The first transaction is added to the inode->owners list and an inodelk
will be sent on the wire. All subsequent transactions will be put in the
inode->waiters list until the first transaction completes both the
inodelk and the [f]xattrop. Once the [f]xattrop completes, all the
requests in the inode->waiters list are checked for conflicts with the
existing locks in the inode->owners list, and the ones that don't
conflict are added to the inode->owners list and resume their
transactions. When these transactions complete the fop phase they will
be moved to the inode->post_op list, and the transactions that were
paused because of conflicts are resumed. Post-op and unlock will not be
issued on the wire until that is the last transaction on that inode.
The last transaction, when it has to perform the post-op, can choose to
sleep for the delayed-post-op-secs value. During that time, if any other
transaction comes, it will wake up the sleeping transaction, take over
the ownership of the lock, and the cycle continues. If
delayed-post-op-secs expires, then the timer thread will wake up the
sleeping transaction, which will set lock->release to true and start
doing the post-op and then the unlock. During this time, if any other
transactions come, they will be put in the inode->frozen list. Once the
previous unlock completes, it will move the frozen list to the waiters
list, move the first element from this waiters list to the owners list,
attempt the lock, and the cycle continues. This is the general idea.
There is logic at the time of delaying, at the time of a new
transaction, or in the flush fop to wake up existing sleeping
transactions or to choose whether to delay a transaction, etc., which is
subject to change based on future enhancements.

Fixes: gluster#418
BUG: 1549606
Change-Id: I88b570bbcf332a27c82d2767dfa82472f60055dc
Signed-off-by: Pranith Kumar K <[email protected]>
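
The "conflicting write" the commit message refers to boils down to a byte-range overlap check; a minimal sketch, assuming half-open [offset, offset + len) ranges and hypothetical names (not the patch's actual code):

```c
/* Minimal sketch, hypothetical names: two writes conflict when their byte
 * ranges overlap (half-open [offset, offset + len) ranges assumed).  A
 * conflicting write is queued behind the in-flight one instead of forcing
 * an unlock and extra pre-/post-ops. */
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

typedef struct {
    off_t  offset;
    size_t len;
} write_region_t;

static bool
regions_conflict(write_region_t a, write_region_t b)
{
    return a.offset < (off_t)(b.offset + b.len) &&
           b.offset < (off_t)(a.offset + a.len);
}

int
main(void)
{
    write_region_t inflight = { .offset = 0,    .len = 4096 };
    write_region_t incoming = { .offset = 2048, .len = 4096 };

    if (regions_conflict(inflight, incoming))
        printf("conflict: queue the incoming write behind the in-flight one\n");
    else
        printf("no conflict: wind both writes under the same eager lock\n");
    return 0;
}
```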
guihecheng pushed a commit to guihecheng/glusterfs that referenced this issue Nov 13, 2019

 >Fixes: gluster#418
 >BUG: 1549606

Upstream-patch: https://review.gluster.org/19503
BUG: 1491785
Change-Id: I88b570bbcf332a27c82d2767dfa82472f60055dc
Signed-off-by: Pranith Kumar K <[email protected]>
Reviewed-on: https://code.engineering.redhat.com/gerrit/131945
Tested-by: RHGS Build Bot <[email protected]>