Skip to content

Commit

Permalink
If we're doing a writeback with WB_SYNC_ALL and there's an active wri…
Browse files Browse the repository at this point in the history
…teback, do a commit to speed up its completion

If we're doing a writeback with WB_SYNC_NONE, add a small delay to make sure we catch any WB_SYNC_ALL waiters due to a race condition in filemap_write_and_wait_range()
Introduce a parameter named zfs_page_writeback_no_sync_delay_us to control the small delay.

Signed-off-by: Shaan Nobee <[email protected]>
Closes openzfs#12662
  • Loading branch information
shaan1337 committed Nov 25, 2021
1 parent ded851b commit 8ab9c8f
Show file tree
Hide file tree
Showing 2 changed files with 37 additions and 0 deletions.
6 changes: 6 additions & 0 deletions man/man4/zfs.4
Original file line number Diff line number Diff line change
Expand Up @@ -952,6 +952,12 @@ are not created per-object and instead a hashtable is used where collisions
will result in objects waiting when there is not actually contention on the
same object.
.
.It Sy zfs_page_writeback_no_sync_delay_us Ns = Ns Sy 2 Ns us Pq uint
Number of microseconds to delay non-sync page writebacks to catch any concurrent
sync page writeback waiters. If you're seeing rare but long multi-second delays
during msync() calls in your applications, increasing the value of this tunable
by a few microseconds may help.
.
.It Sy zfs_slow_io_events_per_second Ns = Ns Sy 20 Ns /s Pq int
Rate limit delay and deadman zevents (which report slow I/Os) to this many per
second.
Expand Down
31 changes: 31 additions & 0 deletions module/os/linux/zfs/zfs_vnops_os.c
Original file line number Diff line number Diff line change
Expand Up @@ -175,6 +175,8 @@
* return (error); // done, report error
*/

unsigned int zfs_page_writeback_no_sync_delay_us = 2;

/* ARGSUSED */
int
zfs_open(struct inode *ip, int mode, int flag, cred_t *cr)
Expand Down Expand Up @@ -3530,6 +3532,14 @@ zfs_putpage(struct inode *ip, struct page *pp, struct writeback_control *wbc)
zfs_rangelock_exit(lr);

if (wbc->sync_mode != WB_SYNC_NONE) {
/*
* Do a commit to speed up the registered commit callback in order to get
* the page out of the writeback state as soon as possible otherwise we may
* need to wait for several seconds until the transaction group closes if the
* currently active writeback was done with WB_SYNC_NONE.
*/
zil_commit(zfsvfs->z_log, zp->z_id);

if (PageWriteback(pp))
wait_on_page_bit(pp, PG_writeback);
}
Expand Down Expand Up @@ -3604,6 +3614,22 @@ zfs_putpage(struct inode *ip, struct page *pp, struct writeback_control *wbc)
* performance reasons.
*/
zil_commit(zfsvfs->z_log, zp->z_id);
} else {
/*
* This is a workaround for an inherent race condition in filemap_write_and_wait_range()
* where a new page writeback with WB_SYNC_NONE (ourself - the current thread) can start after
* __filemap_fdatawrite_range() (the above WB_SYNC_ALL zil_commit() to be more precise)
* but before filemap_fdatawait_range() (wait_on_page_writeback() to be more precise).
*
* By adding a small delay, we ensure that we can catch any waiters. If there are any,
* we immediately do a commit to avoid making them wait for potentially several seconds
* until the transaction group closes.
*/
if (zfs_page_writeback_no_sync_delay_us > 0)
udelay(MIN(zfs_page_writeback_no_sync_delay_us, 10));

if (PageWaiters(pp))
zil_commit(zfsvfs->z_log, zp->z_id);
}

ZFS_EXIT(zfsvfs);
Expand Down Expand Up @@ -3997,3 +4023,8 @@ MODULE_PARM_DESC(zfs_delete_blocks, "Delete files larger than N blocks async");
/* END CSTYLED */

#endif

/* BEGIN CSTYLED */
ZFS_MODULE_PARAM(zfs, zfs_page_writeback_, no_sync_delay_us, UINT, ZMOD_RW,
"Number of microseconds to delay non-sync page writebacks to catch any concurrent sync page writeback waiters");
/* END CSTYLED */

0 comments on commit 8ab9c8f

Please sign in to comment.