
Large fsync's starve smaller fsync's #4603

Closed
hhhappe opened this issue May 6, 2016 · 3 comments
Labels
Status: Inactive (not being actively updated)
Status: Stale (no recent activity for issue)
Type: Performance (performance improvement or performance problem)

Comments


hhhappe commented May 6, 2016

This was discovered with Lustre on ZFS (0.6.5.4). Users were experiencing long delays when reading log files written from other nodes. Lustre flushes the new data to disk before a client is allowed to read it.

I managed to recreate what I think is the problem using plain ZFS (0.6.5.6 on CentOS 6.7) on a single block device, by running ioping in one process while dd'ing a large file with conv=fsync in another. ioping just does one random synced 4 KiB write per second within a 1 MiB file.

The ioping command:

ioping -WWW /test0/1m

The large file dd:

dd of=/test0/f0 if=/dev/zero bs=1M count=14000 conv=fsync

The server has 128 GB of memory, so the 14 GB file is larger than zfs_dirty_data_max, in order to create a worst case.
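To make the concurrency explicit, a minimal sketch of running the two workloads together (same paths and sizes as above; the backgrounding and cleanup are an addition for illustration, not part of the original report):

# Background: one random synced 4 KiB write per second in a 1 MiB file, reporting latency.
ioping -WWW /test0/1m &
# Foreground: write 14 GB and fsync it at the end, so a large dirty backlog must be flushed.
dd of=/test0/f0 if=/dev/zero bs=1M count=14000 conv=fsync
# Stop the background ioping once dd has finished.
kill %1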

Output from ioping when fsync kicks in:

4.0 KiB from /test0/1m (zfs test0): request=28 time=186.9 ms
4.0 KiB from /test0/1m (zfs test0): request=29 time=766.4 ms
4.0 KiB from /test0/1m (zfs test0): request=30 time=419.8 ms
/* fsync */
4.0 KiB from /test0/1m (zfs test0): request=31 time=1.5 min
4.0 KiB from /test0/1m (zfs test0): request=32 time=20.7 ms
4.0 KiB from /test0/1m (zfs test0): request=33 time=11.1 ms

After 5 seconds of dd (zfs_txg_timeout) the latency goes up, but there is still progress. Once the dd's final fsync kicks in, the next ping stalls for 1.5 minutes while the large file is flushed. The disk writes at ~160 MB/s.
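For context, the two tunables mentioned above are runtime module parameters on ZoL and can be inspected under /sys/module/zfs/parameters. The values below are only illustrative; lowering zfs_dirty_data_max is a possible mitigation sketch to shrink the backlog a single large fsync has to flush, not something proposed in the issue:

cat /sys/module/zfs/parameters/zfs_txg_timeout      # txg commit interval, 5 s by default
cat /sys/module/zfs/parameters/zfs_dirty_data_max   # maximum dirty data, in bytes
# Example only: cap dirty data at 1 GiB (assumed value, chosen for illustration)
echo $((1024*1024*1024)) > /sys/module/zfs/parameters/zfs_dirty_data_max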

hhhappe changed the title from "Large fsync's starve smaller fsyncs" to "Large fsync's starve smaller fsync's" on May 6, 2016
behlendorf added the Type: Performance label on May 6, 2016

stale bot commented Aug 24, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale label on Aug 24, 2020
stale bot closed this as completed on Nov 25, 2020

devZer0 commented Dec 25, 2020

I cannot reproduce this with ZoL 0.8.6; I see no stall >1 s, so I think it's resolved.

@scineram

#6191 and #9409 are both in 0.8.3
