
Large fsync's starve smaller fsync's #4603

Closed
hhhappe opened this issue May 6, 2016 · 3 comments
Labels
Status: Inactive (not being actively updated)
Status: Stale (no recent activity for issue)
Type: Performance (performance improvement or performance problem)

Comments


hhhappe commented May 6, 2016

This was discovered with Lustre on ZFS (0.6.5.4). Users were experiencing long delays when reading log files written from other nodes. Lustre flushes the new data to disk before a client is allowed to read it.

I managed to recreate what I think is the problem using plain ZFS (0.6.5.6 on CentOS 6.7) on a single block device, by running ioping in one process while dd'ing a large file with conv=fsync in another. ioping just does one random synced 4 KiB write per second within a 1 MiB file.

The ioping command:

ioping -WWW /test0/1m

The large file dd:

dd of=/test0/f0 if=/dev/zero bs=1M count=14000 conv=fsync

The server has 128 GB of memory, so the 14 GB file is larger than zfs_dirty_data_max, in order to create a worst case.
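To make the concurrency explicit, a minimal sketch of running the two workloads together (same paths and sizes as above; the backgrounding and cleanup are an addition for illustration, not part of the original report):

# Background: one random synced 4 KiB write per second in a 1 MiB file, reporting latency.
ioping -WWW /test0/1m &
# Foreground: write 14 GB and fsync it at the end, so a large dirty backlog must be flushed.
dd of=/test0/f0 if=/dev/zero bs=1M count=14000 conv=fsync
# Stop the background ioping once dd has finished.
kill %1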

Output from ioping when fsync kicks in:

4.0 KiB from /test0/1m (zfs test0): request=28 time=186.9 ms
4.0 KiB from /test0/1m (zfs test0): request=29 time=766.4 ms
4.0 KiB from /test0/1m (zfs test0): request=30 time=419.8 ms
/* fsync */
4.0 KiB from /test0/1m (zfs test0): request=31 time=1.5 min
4.0 KiB from /test0/1m (zfs test0): request=32 time=20.7 ms
4.0 KiB from /test0/1m (zfs test0): request=33 time=11.1 ms

After 5 seconds of dd (zfs_txg_timeout) the latency goes up, but there is still progress. Once the dd's final fsync kicks in, the next ping stalls for 1.5 minutes while the large file is flushed. The disk writes at ~160 MB/s.
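For context, the two tunables mentioned above are runtime module parameters on ZoL and can be inspected under /sys/module/zfs/parameters. The values below are only illustrative; lowering zfs_dirty_data_max is a possible mitigation sketch to shrink the backlog a single large fsync has to flush, not something proposed in the issue:

cat /sys/module/zfs/parameters/zfs_txg_timeout      # txg commit interval, 5 s by default
cat /sys/module/zfs/parameters/zfs_dirty_data_max   # maximum dirty data, in bytes
# Example only: cap dirty data at 1 GiB (assumed value, chosen for illustration)
echo $((1024*1024*1024)) > /sys/module/zfs/parameters/zfs_dirty_data_max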

hhhappe changed the title from "Large fsync's starve smaller fsyncs" to "Large fsync's starve smaller fsync's" on May 6, 2016
behlendorf added the Type: Performance label on May 6, 2016

stale bot commented Aug 24, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale label on Aug 24, 2020
stale bot closed this as completed on Nov 25, 2020

devZer0 commented Dec 25, 2020

I cannot reproduce this with ZoL 0.8.6; I see no stall >1 s, so I think it's resolved.

@scineram

#6191 and #9409 are both in 0.8.3
