-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Module Parameters Regarding Log Size Limit: Throttle write when the limit reached. #11876
Conversation
This should be a serious concern for any server running zfs. During the test, I find vm can trickle down fsync to the host. If there is no disk throttle in placed on the vm, the host could experience long hang. |
8cd5a3d
to
df43d02
Compare
|
||
if (write_state == WR_COPIED || write_state == WR_NEED_COPY) { | ||
dsl_pool_wrlog_delay(size, zilog->zl_dmu_pool, tx->tx_txg); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unconditionally blocking here on the txg sync may introduce some subtle problems since it will result in zfs_write()
holding open its transaction handle until the txg sync completes. That can be problematic since if the pool gets suspended and the txg can't be written there's no way to unwind the in progress write.
A better place to wait (and kick) would be in dmu_tx_assign()
which is already set up to delay when we're over the dirty limit and on error all callers are setup to unwind any open transactions. However, updating the wrlog accounting there would be tricky so it probably makes good sense to leave that bit here.
24d74f8
to
ee5f60c
Compare
zfs_wrlog_data_max The upper limit of TX_WRITE log data. Once it is reached, write operation is blocked, until log data is cleared out after txg sync. It only counts TX_WRITE log with WR_COPIED or WR_NEED_COPY. This defaults to the same value of zfs_dirty_data_max zfs_wrlog_data_sync_percent The least TX_WRITE log data (as a percentage of zfs_wrlog_data_max) to kick a txg sync. Add zfs_wrlog_data_max and zfs_wrlog_data_sync_percent to man/man5/zfs-module-parameters.5 Add write-transaction zil log data counter at end of the body of zfs_log_write() and zvol_log_write(). Add delay logic into dmu_tx_try_assign(). Merge remote-tracking branch 'upstream/master' Signed-off-by: jxdking <[email protected]>
Accidentally merged with others changes in to my fork. This pull request is closing. |
I add 2 module parameters regarding total log data size. These can prevent slog device overflow, and long hang on fsync.
zfs_wrlog_data_max
zfs_wrlog_data_sync_percent
Motivation and Context
It resolves long hang regarding fsync under some specific work load. The issue can be reliably produced with fio.
Here is the script.
fio --name=fio-seq-write --rw=write --bs=1M --direct=1 --numjobs=1 --time_based=1 --runtime=30 --fsync=1000 --filename=fio-seq-write --size=20M --ioengine=libaio --iodepth=16
The command repeatedly overwrites a small file. Since the dirty data space never exceed zfs_dirty_data_sync_percent, write operations are cached in the memory and are never throttled. You will see over 1GB/s write speed on reasonable hardware. If slog device cannot keep up the speed, all the write operations are backed up in ram. That cause long system hang and huge memory usage.
The solution is find a way to "kick" txg sync based on current total size of zil log data (log could be either on memory, or on the slog device), and also throttle write when txg sync doesn't help.
Description
I add 2 module parameters regarding total log data size.
zfs_wrlog_data_max
The upper limit of TX_WRITE log data. Once it is reached, write operation is blocked, until log data is cleared out after txg sync.
It only counts TX_WRITE log with WR_COPIED or WR_NEED_COPY.
This defaults to the same value of zfs_dirty_data_max
zfs_wrlog_data_sync_percent
The least TX_WRITE log data (as a percentage of zfs_wrlog_data_max) to kick a txg sync.
How Has This Been Tested?
Tested with fio, please see the example above.
System:
Ubuntu groovy running in kvm, 8 cores, 2GB memory. Host is using HDD (not SSD)
Created a zpool based on file with 1GB slog.
Run fio on the pool.
Types of changes
Checklist:
Signed-off-by
.