Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix MMP write frequency for large pools #7289

Merged
merged 1 commit into from
Mar 12, 2018

Conversation

behlendorf
Copy link
Contributor

Description

When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays.

This issue was reported on Arch Linux where HZ defaults to only 100 and thus could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 thus are unlikely to be impacted.

Motivation and Context

Issue #7205

How Has This Been Tested?

Locally created a pool with 2000 vdevs on a system with CONFIG_HZ=1000, enabled multihost and observed an IO rate of ~2000 IO/s (one per-vdev per second). With out the patch the mmp thread would spin.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

When a single pool contains more vdevs than the CONFIG_HZ for
for the kernel the mmp thread will not delay properly.  Switch
to using cv_timedwait_sig_hires() to handle higher resolution
delays.

This issue was reported on Arch Linux where HZ defaults to only
100 and thus could be fairly easily reproduced with a reasonably
large pool.  Most distribution kernels set CONFIG_HZ=250 or
CONFIG_HZ=1000 thus are unlikely to be impacted.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#7205
@behlendorf
Copy link
Contributor Author

@tonyhutter @ofaaland we should also consider this one for 0.7.7.

@codecov
Copy link

codecov bot commented Mar 10, 2018

Codecov Report

Merging #7289 into master will decrease coverage by 0.1%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7289      +/-   ##
==========================================
- Coverage    76.5%    76.4%   -0.11%     
==========================================
  Files         328      328              
  Lines      104005   104005              
==========================================
- Hits        79572    79461     -111     
- Misses      24433    24544     +111
Flag Coverage Δ
#kernel 76.41% <100%> (-0.07%) ⬇️
#user 65.69% <100%> (-0.22%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ee220b...514af1a. Read the comment docs.

Copy link
Contributor

@ofaaland ofaaland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find!

@behlendorf behlendorf merged commit b7eec00 into openzfs:master Mar 12, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Mar 12, 2018
When a single pool contains more vdevs than the CONFIG_HZ for
for the kernel the mmp thread will not delay properly.  Switch
to using cv_timedwait_sig_hires() to handle higher resolution
delays.

This issue was reported on Arch Linux where HZ defaults to only
100 and this could be fairly easily reproduced with a reasonably
large pool.  Most distribution kernels set CONFIG_HZ=250 or
CONFIG_HZ=1000 and thus are unlikely to be impacted.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#7205 
Closes openzfs#7289
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Mar 12, 2018
When a single pool contains more vdevs than the CONFIG_HZ for
for the kernel the mmp thread will not delay properly.  Switch
to using cv_timedwait_sig_hires() to handle higher resolution
delays.

This issue was reported on Arch Linux where HZ defaults to only
100 and this could be fairly easily reproduced with a reasonably
large pool.  Most distribution kernels set CONFIG_HZ=250 or
CONFIG_HZ=1000 and thus are unlikely to be impacted.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#7205
Closes openzfs#7289
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Mar 13, 2018
When a single pool contains more vdevs than the CONFIG_HZ for
for the kernel the mmp thread will not delay properly.  Switch
to using cv_timedwait_sig_hires() to handle higher resolution
delays.

This issue was reported on Arch Linux where HZ defaults to only
100 and this could be fairly easily reproduced with a reasonably
large pool.  Most distribution kernels set CONFIG_HZ=250 or
CONFIG_HZ=1000 and thus are unlikely to be impacted.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#7205
Closes openzfs#7289
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Mar 13, 2018
This is a squashed patchset for zfs-0.7.7.  The individual commits are
in the tonyhutter:zfs-0.7.7-hutter branch.  I squashed the commits so
that buildbot wouldn't have to run against each one, and because
github/builbot seem to have a maximum limit of 30 commits they can
test from a PR.

- Fix MMP write frequency for large pools openzfs#7205 openzfs#7289
- Handle zio_resume and mmp => off openzfs#7286
- Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284
- zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261
- Take user namespaces into account in policy checks openzfs#6800 openzfs#7270
- Detect long config lock acquisition in mmp openzfs#7212
- Linux 4.16 compat: get_disk_and_module() openzfs#7264
- Change checksum & IO delay ratelimit values openzfs#7252
- Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176
- Fix some typos openzfs#7237
- Fix zpool(8) list example to match actual format openzfs#7244
- Add SMART self-test results to zpool status -c openzfs#7178
- Add scrub after resilver zed script openzfs#4662 openzfs#7086
- Fix free memory calculation on v3.14+ openzfs#7170
- Report duration and error in mmp_history entries openzfs#7190
- Do not initiate MMP writes while pool is suspended openzfs#7182
- Linux 4.16 compat: use correct *_dec_and_test()
- Allow modprobe to fail when called within systemd openzfs#7174
- Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193
- Correct count_uberblocks in mmp.kshlib openzfs#7191
- Fix config issues: frame size and headers openzfs#7169
- Clarify zinject(8) explanation of -e openzfs#7172
- OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168
- 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154
- contrib/initramfs: add missing conf.d/zfs openzfs#7158
- mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155
- Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054
- Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464
- Fix zdb -E segfault openzfs#7099
- Fix zdb -R decompression openzfs#7099 openzfs#4984
- Fix racy assignment of zcb.zcb_haderrors openzfs#7099
- Fix zle_decompress out of bound access openzfs#7099
- Fix zdb -c traverse stop on damaged objset root openzfs#7099
- Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148
- Linux 4.16 compat: inode_set_iversion() openzfs#7148
- OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141
- Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135
- Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101
- Bug fix in qat_compress.c for vmalloc addr check openzfs#7125
- Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100
- Emit an error message before MMP suspends pool openzfs#7048
- ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977
- Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963
- Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753
- Add support for "--enable-code-coverage" option openzfs#6670
- Make "-fno-inline" compile option more accessible openzfs#6605
- Add configure option to enable gcov analysis openzfs#6642
- Implement --enable-debuginfo to force debuginfo openzfs#2734
- Make --enable-debug fail when given bogus args openzfs#2734

Signed-off-by: Tony Hutter <[email protected]>
Requires-spl: refs/pull/690/head
@tonyhutter tonyhutter mentioned this pull request Mar 13, 2018
13 tasks
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Mar 13, 2018
This is a squashed patchset for zfs-0.7.7.  The individual commits are
in the tonyhutter:zfs-0.7.7-hutter branch.  I squashed the commits so
that buildbot wouldn't have to run against each one, and because
github/builbot seem to have a maximum limit of 30 commits they can
test from a PR.

- Fix MMP write frequency for large pools openzfs#7205 openzfs#7289
- Handle zio_resume and mmp => off openzfs#7286
- Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284
- zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261
- Take user namespaces into account in policy checks openzfs#6800 openzfs#7270
- Detect long config lock acquisition in mmp openzfs#7212
- Linux 4.16 compat: get_disk_and_module() openzfs#7264
- Change checksum & IO delay ratelimit values openzfs#7252
- Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176
- Fix some typos openzfs#7237
- Fix zpool(8) list example to match actual format openzfs#7244
- Add SMART self-test results to zpool status -c openzfs#7178
- Add scrub after resilver zed script openzfs#4662 openzfs#7086
- Fix free memory calculation on v3.14+ openzfs#7170
- Report duration and error in mmp_history entries openzfs#7190
- Do not initiate MMP writes while pool is suspended openzfs#7182
- Linux 4.16 compat: use correct *_dec_and_test()
- Allow modprobe to fail when called within systemd openzfs#7174
- Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193
- Correct count_uberblocks in mmp.kshlib openzfs#7191
- Fix config issues: frame size and headers openzfs#7169
- Clarify zinject(8) explanation of -e openzfs#7172
- OpenZFS 8857 - zio_remove_child() panic due to already destroyed
  parent zio openzfs#7168
- 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154
- contrib/initramfs: add missing conf.d/zfs openzfs#7158
- mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155
- Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054
- Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464
- Fix zdb -E segfault openzfs#7099
- Fix zdb -R decompression openzfs#7099 openzfs#4984
- Fix racy assignment of zcb.zcb_haderrors openzfs#7099
- Fix zle_decompress out of bound access openzfs#7099
- Fix zdb -c traverse stop on damaged objset root openzfs#7099
- Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148
- Linux 4.16 compat: inode_set_iversion() openzfs#7148
- OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common
  contains a use after end of the lifetime of a local variable openzfs#7141
- Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135
- Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101
- Bug fix in qat_compress.c for vmalloc addr check openzfs#7125
- Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074
  openzfs#7100
- Emit an error message before MMP suspends pool openzfs#7048
- ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977
- Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963
- Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753
- Add support for "--enable-code-coverage" option openzfs#6670
- Make "-fno-inline" compile option more accessible openzfs#6605
- Add configure option to enable gcov analysis openzfs#6642
- Implement --enable-debuginfo to force debuginfo openzfs#2734
- Make --enable-debug fail when given bogus args openzfs#2734

Signed-off-by: Tony Hutter <[email protected]>
Requires-spl: refs/pull/690/head
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Mar 13, 2018
When a single pool contains more vdevs than the CONFIG_HZ for
for the kernel the mmp thread will not delay properly.  Switch
to using cv_timedwait_sig_hires() to handle higher resolution
delays.

This issue was reported on Arch Linux where HZ defaults to only
100 and this could be fairly easily reproduced with a reasonably
large pool.  Most distribution kernels set CONFIG_HZ=250 or
CONFIG_HZ=1000 and thus are unlikely to be impacted.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#7205
Closes openzfs#7289
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Mar 13, 2018
This is a squashed patchset for zfs-0.7.7.  The individual commits are
in the tonyhutter:zfs-0.7.7-hutter branch.  I squashed the commits so
that buildbot wouldn't have to run against each one, and because
github/builbot seem to have a maximum limit of 30 commits they can
test from a PR.

- Fix MMP write frequency for large pools openzfs#7205 openzfs#7289
- Handle zio_resume and mmp => off openzfs#7286
- Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284
- zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261
- Take user namespaces into account in policy checks openzfs#6800 openzfs#7270
- Detect long config lock acquisition in mmp openzfs#7212
- Linux 4.16 compat: get_disk_and_module() openzfs#7264
- Change checksum & IO delay ratelimit values openzfs#7252
- Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176
- Fix some typos openzfs#7237
- Fix zpool(8) list example to match actual format openzfs#7244
- Add SMART self-test results to zpool status -c openzfs#7178
- Add scrub after resilver zed script openzfs#4662 openzfs#7086
- Fix free memory calculation on v3.14+ openzfs#7170
- Report duration and error in mmp_history entries openzfs#7190
- Do not initiate MMP writes while pool is suspended openzfs#7182
- Linux 4.16 compat: use correct *_dec_and_test()
- Allow modprobe to fail when called within systemd openzfs#7174
- Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193
- Correct count_uberblocks in mmp.kshlib openzfs#7191
- Fix config issues: frame size and headers openzfs#7169
- Clarify zinject(8) explanation of -e openzfs#7172
- OpenZFS 8857 - zio_remove_child() panic due to already destroyed
  parent zio openzfs#7168
- 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154
- contrib/initramfs: add missing conf.d/zfs openzfs#7158
- mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155
- Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054
- Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464
- Fix zdb -E segfault openzfs#7099
- Fix zdb -R decompression openzfs#7099 openzfs#4984
- Fix racy assignment of zcb.zcb_haderrors openzfs#7099
- Fix zle_decompress out of bound access openzfs#7099
- Fix zdb -c traverse stop on damaged objset root openzfs#7099
- Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148
- Linux 4.16 compat: inode_set_iversion() openzfs#7148
- OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common
  contains a use after end of the lifetime of a local variable openzfs#7141
- Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135
- Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101
- Bug fix in qat_compress.c for vmalloc addr check openzfs#7125
- Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074
  openzfs#7100
- Emit an error message before MMP suspends pool openzfs#7048
- ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977
- Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963
- Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753
- Fix "--enable-code-coverage" debug build openzfs#6674
- Update codecov.yml openzfs#6669
- Add support for "--enable-code-coverage" option openzfs#6670
- Make "-fno-inline" compile option more accessible openzfs#6605
- Add configure option to enable gcov analysis openzfs#6642
- Implement --enable-debuginfo to force debuginfo openzfs#2734
- Make --enable-debug fail when given bogus args openzfs#2734

Signed-off-by: Tony Hutter <[email protected]>
Requires-spl: refs/pull/690/head
tonyhutter pushed a commit that referenced this pull request Mar 19, 2018
When a single pool contains more vdevs than the CONFIG_HZ for
for the kernel the mmp thread will not delay properly.  Switch
to using cv_timedwait_sig_hires() to handle higher resolution
delays.

This issue was reported on Arch Linux where HZ defaults to only
100 and this could be fairly easily reproduced with a reasonably
large pool.  Most distribution kernels set CONFIG_HZ=250 or
CONFIG_HZ=1000 and thus are unlikely to be impacted.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7205
Closes #7289
@behlendorf behlendorf deleted the issue-7205 branch April 19, 2021 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants