Illumos #4347 ZPL can use dmu_tx_assign(TXG_WAIT)
Fix a lock contention issue by allowing threads not holding
ZPL locks to block when waiting to assign a transaction.

Porting Notes:

zfs_putpage() still uses TXG_NOWAIT, unlike the upstream version.  This
case may be a contention point just like zfs_write(), however it is not
safe to block here since it may be called during memory reclaim.

Reviewed by: George Wilson <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Reviewed by: Boris Protopopov <[email protected]>
Approved by: Dan McDonald <[email protected]>

References:
  https://www.illumos.org/issues/4347
  illumos/illumos-gate@e722410

Ported-by: Ned Bass <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
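
For readers unfamiliar with the two idioms, the sketch below condenses what the hunks in this commit trade between: the TXG_NOWAIT retry loop that must remain wherever ZPL locks are held (and in zfs_putpage(), per the porting note), and the TXG_WAIT form adopted here, where dmu_tx_assign() itself waits for the next open transaction group. It is illustrative only, not code from this commit; the tx/zp/zsb/error declarations are assumed from the surrounding functions.

	/* Old idiom: the caller may hold ZPL locks, so it must not block in assign. */
top:
	tx = dmu_tx_create(zsb->z_os);
	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
	error = dmu_tx_assign(tx, TXG_NOWAIT);
	if (error) {
		if (error == ERESTART) {
			/* The open txg is full: wait for the next one and retry. */
			dmu_tx_wait(tx);
			dmu_tx_abort(tx);
			goto top;
		}
		dmu_tx_abort(tx);
		return (error);
	}

	/* New idiom: no ZPL locks held, so blocking inside the assign is safe. */
	tx = dmu_tx_create(zsb->z_os);
	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
	error = dmu_tx_assign(tx, TXG_WAIT);
	if (error) {
		/* ERESTART is handled inside dmu_tx_assign(); only hard errors reach here. */
		dmu_tx_abort(tx);
		return (error);
	}

With TXG_WAIT the retry label disappears, which is the simplification repeated in each hunk below.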
ahrens authored and behlendorf committed Dec 6, 2013
1 parent 7292105 commit 384f8a0
Showing 3 changed files with 18 additions and 37 deletions.
8 changes: 1 addition & 7 deletions module/zfs/zfs_dir.c
@@ -973,7 +973,6 @@ zfs_make_xattrdir(znode_t *zp, vattr_t *vap, struct inode **xipp, cred_t *cr)
 		return (SET_ERROR(EDQUOT));
 	}
 
-top:
 	tx = dmu_tx_create(zsb->z_os);
 	dmu_tx_hold_sa_create(tx, acl_ids.z_aclp->z_acl_bytes +
 	    ZFS_SA_BASE_ATTR_SIZE);
@@ -982,13 +981,8 @@ zfs_make_xattrdir(znode_t *zp, vattr_t *vap, struct inode **xipp, cred_t *cr)
 	fuid_dirtied = zsb->z_fuid_dirty;
 	if (fuid_dirtied)
 		zfs_fuid_txhold(zsb, tx);
-	error = dmu_tx_assign(tx, TXG_NOWAIT);
+	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
-		if (error == ERESTART) {
-			dmu_tx_wait(tx);
-			dmu_tx_abort(tx);
-			goto top;
-		}
 		zfs_acl_ids_free(&acl_ids);
 		dmu_tx_abort(tx);
 		return (error);
32 changes: 15 additions & 17 deletions module/zfs/zfs_vnops.c
@@ -106,11 +106,18 @@
  * (3) All range locks must be grabbed before calling dmu_tx_assign(),
  *	as they can span dmu_tx_assign() calls.
  *
- * (4) Always pass TXG_NOWAIT as the second argument to dmu_tx_assign().
- *	This is critical because we don't want to block while holding locks.
- *	Note, in particular, that if a lock is sometimes acquired before
- *	the tx assigns, and sometimes after (e.g. z_lock), then failing to
- *	use a non-blocking assign can deadlock the system. The scenario:
+ * (4) If ZPL locks are held, pass TXG_NOWAIT as the second argument to
+ *	dmu_tx_assign(). This is critical because we don't want to block
+ *	while holding locks.
+ *
+ *	If no ZPL locks are held (aside from ZFS_ENTER()), use TXG_WAIT. This
+ *	reduces lock contention and CPU usage when we must wait (note that if
+ *	throughput is constrained by the storage, nearly every transaction
+ *	must wait).
+ *
+ *	Note, in particular, that if a lock is sometimes acquired before
+ *	the tx assigns, and sometimes after (e.g. z_lock), then failing
+ *	to use a non-blocking assign can deadlock the system. The scenario:
  *
  *	Thread A has grabbed a lock before calling dmu_tx_assign().
  *	Thread B is in an already-assigned tx, and blocks for this lock.
@@ -712,7 +719,6 @@ zfs_write(struct inode *ip, uio_t *uio, int ioflag, cred_t *cr)
 	while (n > 0) {
 		abuf = NULL;
 		woff = uio->uio_loffset;
-again:
 		if (zfs_owner_overquota(zsb, zp, B_FALSE) ||
 		    zfs_owner_overquota(zsb, zp, B_TRUE)) {
 			if (abuf != NULL)
@@ -762,13 +768,8 @@ zfs_write(struct inode *ip, uio_t *uio, int ioflag, cred_t *cr)
 		dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
 		dmu_tx_hold_write(tx, zp->z_id, woff, MIN(n, max_blksz));
 		zfs_sa_upgrade_txholds(tx, zp);
-		error = dmu_tx_assign(tx, TXG_NOWAIT);
+		error = dmu_tx_assign(tx, TXG_WAIT);
 		if (error) {
-			if (error == ERESTART) {
-				dmu_tx_wait(tx);
-				dmu_tx_abort(tx);
-				goto again;
-			}
 			dmu_tx_abort(tx);
 			if (abuf != NULL)
 				dmu_return_arcbuf(abuf);
@@ -2833,12 +2834,9 @@ zfs_setattr(struct inode *ip, vattr_t *vap, int flags, cred_t *cr)
 
 	zfs_sa_upgrade_txholds(tx, zp);
 
-	err = dmu_tx_assign(tx, TXG_NOWAIT);
-	if (err) {
-		if (err == ERESTART)
-			dmu_tx_wait(tx);
+	err = dmu_tx_assign(tx, TXG_WAIT);
+	if (err)
 		goto out;
-	}
 
 	count = 0;
 	/*
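The deadlock that rule (4) in the comment above guards against can be sketched as two threads (illustrative pseudo-C, not part of this commit; z_lock stands in for any ZPL lock taken on both paths):

	/* Thread A: takes a ZPL lock first, then assigns. */
	mutex_enter(&zp->z_lock);
	tx = dmu_tx_create(zsb->z_os);
	error = dmu_tx_assign(tx, TXG_WAIT);	/* may sleep waiting for the
						 * open txg while z_lock is held */

	/* Thread B: already assigned into the open txg, then takes the lock. */
	tx = dmu_tx_create(zsb->z_os);
	error = dmu_tx_assign(tx, TXG_NOWAIT);
	mutex_enter(&zp->z_lock);		/* blocks behind thread A */

	/*
	 * The open txg cannot quiesce until B's tx commits, B cannot commit
	 * until A drops z_lock, and A is asleep in dmu_tx_assign() waiting on
	 * that txg. Hence the rule: TXG_NOWAIT while ZPL locks are held,
	 * TXG_WAIT otherwise.
	 */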
15 changes: 2 additions & 13 deletions module/zfs/zfs_znode.c
@@ -1205,7 +1205,6 @@ zfs_extend(znode_t *zp, uint64_t end)
 		zfs_range_unlock(rl);
 		return (0);
 	}
-top:
 	tx = dmu_tx_create(zsb->z_os);
 	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
 	zfs_sa_upgrade_txholds(tx, zp);
@@ -1225,13 +1224,8 @@ zfs_extend(znode_t *zp, uint64_t end)
 		newblksz = 0;
 	}
 
-	error = dmu_tx_assign(tx, TXG_NOWAIT);
+	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
-		if (error == ERESTART) {
-			dmu_tx_wait(tx);
-			dmu_tx_abort(tx);
-			goto top;
-		}
 		dmu_tx_abort(tx);
 		zfs_range_unlock(rl);
 		return (error);
@@ -1419,13 +1413,8 @@ zfs_freesp(znode_t *zp, uint64_t off, uint64_t len, int flag, boolean_t log)
 	tx = dmu_tx_create(zsb->z_os);
 	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
 	zfs_sa_upgrade_txholds(tx, zp);
-	error = dmu_tx_assign(tx, TXG_NOWAIT);
+	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
-		if (error == ERESTART) {
-			dmu_tx_wait(tx);
-			dmu_tx_abort(tx);
-			goto log;
-		}
 		dmu_tx_abort(tx);
 		return (error);
 	}
