Add Module Parameter Regarding Log Size Limit #12284
Conversation
zfs_wrlog_data_max: the upper limit of TX_WRITE log data. Once it is reached, write operations are blocked until log data is cleared out after a txg sync. It only counts TX_WRITE logs with WR_COPIED or WR_NEED_COPY. Adds a write-transaction log data counter at the end of the body of zfs_log_write() and zvol_log_write(), and adds delay logic into dmu_tx_try_assign().
Signed-off-by: jxdking <[email protected]>
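For context, a minimal sketch of the counting step that commit describes, placed at the tail of zfs_log_write()/zvol_log_write(). The dp_wrlog_total and dp_wrlog_pertxg[] fields are named later in this conversation; the helper name and the early-sync check are illustrative assumptions, not the merged code.

```c
#include <sys/dsl_pool.h>
#include <sys/txg.h>
#include <sys/aggsum.h>

extern uint64_t zfs_wrlog_data_max;	/* module parameter added by this PR */

/*
 * Sketch only: account the bytes of a TX_WRITE record (WR_COPIED or
 * WR_NEED_COPY) against pool-wide and per-txg counters so the sync and
 * throttle logic can see how much log data is outstanding.
 */
static void
wrlog_count_sketch(dsl_pool_t *dp, int64_t size, uint64_t txg)
{
	aggsum_add(&dp->dp_wrlog_pertxg[txg & TXG_MASK], size);
	aggsum_add(&dp->dp_wrlog_total, size);

	/*
	 * If this txg has already accumulated a lot of log data, it is
	 * cheaper to start syncing now than to block writers later, so
	 * an early txg sync could be requested at this point.
	 */
	if (aggsum_compare(&dp->dp_wrlog_pertxg[txg & TXG_MASK],
	    zfs_wrlog_data_max / 2) > 0) {
		/* request an early txg sync, e.g. via txg_kick() */
	}
}
```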
I removed the module parameter for the sync threshold of zilog size from the original commit. I don't think it is useful to users. Thus, right now, I only add one module parameter for the total zilog size.
Thanks for refactoring this; the rationale here makes good sense to me, just a few suggested changes.
@jxdking unrelated to this PR, but since you've been working to improve the txg sync / ZIL performance, I thought you might also be interested in taking a look at issue 11140. Thus far it hasn't been carefully analyzed, but initial indications are that for this workload the txg sync / ZIL handling is seriously impacting performance.
I find ZFS is vulnerable to fsync "abuse". When applications call fsync, they expect most of the dirty data to already be on stable media, and the fsync call just flushes the last portion in the cache. In the case of ZFS, unfortunately, most of the dirty data may still be in memory (like a cache). That may cause delays.
@behlendorf, I just pushed a new commit in which I resolved all the suggestions of the code review. Thanks.
Updated man/man4/zfs.4.
Added dmu_tx_wrlog_over_max to the dmu_tx kstat.
Added ASSERT0 for dp_wrlog_total and dp_wrlog_pertxg[] on dsl_pool_close().
Fixed some grammar in comments.
Signed-off-by: jxdking <[email protected]>
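As a rough illustration of the dsl_pool_close() assertions mentioned in that commit (the surrounding code is assumed):

```c
/*
 * Sketch: by the time the pool is closed, every counted TX_WRITE log
 * byte should already have been synced out, so both counters must be
 * zero.  'dp' is the dsl_pool_t being torn down.
 */
for (int t = 0; t < TXG_SIZE; t++)
	ASSERT0(aggsum_value(&dp->dp_wrlog_pertxg[t]));
ASSERT0(aggsum_value(&dp->dp_wrlog_total));
```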
I haven't independently tested this, but just looking at the code changes, this looks reasonable to me.
Defaults to
.Sy zfs_dirty_data_max*2
With this default value, what do we expect the performance impact to be? I'm just wondering if there are any cases where a user may not tune this for their workload, resulting in worse performance than prior to this change.
If a workload hits the zfs_dirty_data_max limit first, you will not see any performance difference after this patch.
Can a workload hit the zfs_wrlog_data_max limit before hitting zfs_dirty_data_max?
Yes. If a workload writes and then rewrites the same data space within the same txg, the zilog size will be greater than the dirty data size. Such a workload will see worse performance after this patch if it hits the limit.
I find most workloads of this kind come from synthetic benchmarks; benchmark tools often write and rewrite over a pre-allocated file.
In the real world, database workloads may fall into this category. I have heard of putting an entire database in memory (L1 ARC), where repeatedly writing without triggering zfs_dirty_data_sync_percent is one of the key features. If that is the case, zfs_wrlog_data_max should be adjusted to the size of the slog device.
The reason to limit the zilog size is as follows.
As mentioned above, a workload can accumulate a huge zilog (tens of GB) through repeated overwrites without hitting zfs_dirty_data_sync_percent. Once an fsync occurs, there is no way to write out that amount of zilog data in a short period of time, which may cause the system to hang.
The proposed new module parameter is more of a safety feature than a performance improvement.
I am open to changing the initial value, for example to 25% of the total available memory. I find the zilog can sometimes overflow memory if memory cannot hold that amount of data.
Yeah, makes sense to me. I'm not necessarily opposed to this default value, just wanted to ask. I'm good as-is.
The original Log Size Limit implementation blocked all writes once the limit was reached, until the TXG was committed and the log was freed. It caused huge delays and subsequent speed spikes in application writes. This implementation instead smoothly throttles writes, using exactly the same mechanism as is used for dirty data.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: jxdking <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
Issue #12284
Closes #13476
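A minimal sketch of the reworked behaviour that commit describes, under the assumption that outstanding log bytes are scaled onto the same delay curve already used for dirty data rather than blocking outright (the scaling below is illustrative, not the exact merged math):

```c
/*
 * Sketch only: express outstanding wrlog bytes as an equivalent amount
 * of dirty data, then let the existing dmu_tx_delay() curve slow
 * writers down smoothly as either resource fills up.
 */
uint64_t wrlog = aggsum_upper_bound(&dp->dp_wrlog_total);
uint64_t wrlog_as_dirty = wrlog * zfs_dirty_data_max / zfs_wrlog_data_max;
dirty = MAX(dirty, wrlog_as_dirty);	/* 'dirty' then feeds dmu_tx_delay() */
```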
zfs_wrlog_data_max: the upper limit of TX_WRITE log data. Once it is reached, write operations are blocked until log data is cleared out after a txg sync. It only counts TX_WRITE logs with WR_COPIED or WR_NEED_COPY.
Reviewed-by: Prakash Surya <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: jxdking <[email protected]>
Closes openzfs#12284
This pull request is based on the original #11876
I add a module parameter for the total log data size. It can prevent slog device overflow and long hangs on fsync.
zfs_wrlog_data_max
Motivation and Context
It resolves a long hang on fsync under a specific workload. The issue can be reliably reproduced with fio.
Here is the script.
fio --name=fio-seq-write --rw=write --bs=1M --direct=1 --numjobs=1 --time_based=1 --runtime=30 --fsync=1000 --filename=fio-seq-write --size=20M --ioengine=libaio --iodepth=16
The command repeatedly overwrites a small file. Since the dirty data space never exceeds zfs_dirty_data_sync_percent, write operations are cached in memory and are never throttled. You will see over 1 GB/s write speed on reasonable hardware. If the slog device cannot keep up with that speed, all the write operations back up in RAM, which causes long system hangs and huge memory usage.
The solution is to find a way to "kick" a txg sync based on the current total size of ZIL log data (the log could be either in memory or on the slog device), and also to throttle writes when a txg sync doesn't help.
Description
I add a module parameter regarding total log data size.
zfs_wrlog_data_max
The upper limit of TX_WRITE log data. Once it is reached,
write operations are blocked until log data is cleared out
after a txg sync. It only counts TX_WRITE logs with WR_COPIED
or WR_NEED_COPY.
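A minimal sketch of how that blocking could be wired into dmu_tx_try_assign(), mirroring the existing zfs_dirty_data_max handling. The helper name below is hypothetical; the dmu_tx_wrlog_over_max kstat name comes from the commit message earlier in this conversation.

```c
/* Hypothetical helper: has the pool-wide TX_WRITE log budget been used up? */
static boolean_t
wrlog_over_max_sketch(dsl_pool_t *dp)
{
	return (aggsum_compare(&dp->dp_wrlog_total, zfs_wrlog_data_max) > 0);
}

/*
 * Inside dmu_tx_try_assign(): if the limit is exceeded, count the event
 * and return ERESTART so the caller waits for the next txg sync, after
 * which the counters are cleared and the write can proceed.
 */
if (wrlog_over_max_sketch(tx->tx_pool)) {
	DMU_TX_STAT_BUMP(dmu_tx_wrlog_over_max);
	return (SET_ERROR(ERESTART));
}
```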
How Has This Been Tested?
Tested with fio (see the example above), observing the slog device usage.