-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CPU soft lockup & z_null_iss + mmp high cpu usage #7205
Comments
@morphinz if the system is still in this state could you place grab backtraces for the
|
@behlendorf Yes system still running. Just ask me anything you want.
And process still are there.
|
Today I tried to export the pool but when i start "zpool export" I got cpu lockups again. BTW: I have 2 pool on the server "ssdpool and clspool". Everything almost same on these pools. I will add pool information when i get the pool back.
System logs:
|
@behlendorf what happens here is mmp creates very high I/O if there are enough large number of disks in the pool. I am not sure the exact count but I observed that mmp is good with at 50 disks pool and behaves with at 165 disks pool. A full ssd pool of 165 disk has more than 33.000 I/O when set multihost=on and it becomes idle as soon as multihost is set off. Moreover it is impossible to export this pool unless multihost is set off. May I ask this bugfix with the very next release? I think 0.8 milestone could be too far to fix this one. Let me know if I can put more help in this issue. |
@morphinz, can you provide the output of |
Multihost=off
Htop Multihost=on
As you can see the usage of the pool goes up directly "120M++"
|
@morphinz to be clear, you're saying you didn't observe this behavior in 0.7.5-4? Only 0.7.6? |
@behlendorf Yes. I'm sure. On 0.7.5 I/O usage was 1 for per disk. So I have seen 140-160 I/O for mmp. |
@morphinz on a clean ARCH Linux install running the |
@behlendorf I am runing zfs commit 639b189 and 4.14.23-1-lts kernel. fkmmedium has 93 disks. Here is the pool status.
fkmbig has 167 disks. Here is the pool status.
fkmmedium has no problem setting multihost=on.
fkmbig has issues as I said before.
If I set zfs_multihost_interval>=1621 issue disappears. zpool iostat and htop is normal. However if I set zfs_multihost_interval=<1620 issue arises again. On my other cluster system I have 4.7.6 & 4.14.20-1-lts and 165 disk pool + 225 disk pool. Both pool have the problem. My opinion the problem occurs when you have more than 100++ disk. @behlendorf this is very easy to reproduce. I can set private access to server if needed. Please contact me at [email protected] |
When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays. This issue was reported on Arch Linux where HZ defaults to only 100 and thus could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 thus are unlikely to be impacted. Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#7205
@morphinz I see the issue. The Arch Linux LTS kernel defaults to CONFIG_HZ_100 which causes mmp not to fully delay between writes when the number of vdevs goes over 100. Our testing was all done with a CONFIG_HZ=1000 which pushes that threshhold up to 1000 vdevs per pool. I've opened #7289 which fixes this issue by switching to the hires timers. This will properly handle low values of HZ and pool with very high numbers of vdevs. The patch is safe to cherry-pick if you'ld like to verify it resolves your issue. |
When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays. This issue was reported on Arch Linux where HZ defaults to only 100 and this could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 and thus are unlikely to be impacted. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#7205 Closes openzfs#7289
When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays. This issue was reported on Arch Linux where HZ defaults to only 100 and this could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 and thus are unlikely to be impacted. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#7205 Closes openzfs#7289
When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays. This issue was reported on Arch Linux where HZ defaults to only 100 and this could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 and thus are unlikely to be impacted. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#7205 Closes openzfs#7289
This is a squashed patchset for zfs-0.7.7. The individual commits are in the tonyhutter:zfs-0.7.7-hutter branch. I squashed the commits so that buildbot wouldn't have to run against each one, and because github/builbot seem to have a maximum limit of 30 commits they can test from a PR. - Fix MMP write frequency for large pools openzfs#7205 openzfs#7289 - Handle zio_resume and mmp => off openzfs#7286 - Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284 - zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261 - Take user namespaces into account in policy checks openzfs#6800 openzfs#7270 - Detect long config lock acquisition in mmp openzfs#7212 - Linux 4.16 compat: get_disk_and_module() openzfs#7264 - Change checksum & IO delay ratelimit values openzfs#7252 - Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176 - Fix some typos openzfs#7237 - Fix zpool(8) list example to match actual format openzfs#7244 - Add SMART self-test results to zpool status -c openzfs#7178 - Add scrub after resilver zed script openzfs#4662 openzfs#7086 - Fix free memory calculation on v3.14+ openzfs#7170 - Report duration and error in mmp_history entries openzfs#7190 - Do not initiate MMP writes while pool is suspended openzfs#7182 - Linux 4.16 compat: use correct *_dec_and_test() - Allow modprobe to fail when called within systemd openzfs#7174 - Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193 - Correct count_uberblocks in mmp.kshlib openzfs#7191 - Fix config issues: frame size and headers openzfs#7169 - Clarify zinject(8) explanation of -e openzfs#7172 - OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168 - 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154 - contrib/initramfs: add missing conf.d/zfs openzfs#7158 - mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155 - Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054 - Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464 - Fix zdb -E segfault openzfs#7099 - Fix zdb -R decompression openzfs#7099 openzfs#4984 - Fix racy assignment of zcb.zcb_haderrors openzfs#7099 - Fix zle_decompress out of bound access openzfs#7099 - Fix zdb -c traverse stop on damaged objset root openzfs#7099 - Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148 - Linux 4.16 compat: inode_set_iversion() openzfs#7148 - OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141 - Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135 - Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101 - Bug fix in qat_compress.c for vmalloc addr check openzfs#7125 - Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100 - Emit an error message before MMP suspends pool openzfs#7048 - ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977 - Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963 - Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753 - Add support for "--enable-code-coverage" option openzfs#6670 - Make "-fno-inline" compile option more accessible openzfs#6605 - Add configure option to enable gcov analysis openzfs#6642 - Implement --enable-debuginfo to force debuginfo openzfs#2734 - Make --enable-debug fail when given bogus args openzfs#2734 Signed-off-by: Tony Hutter <[email protected]> Requires-spl: refs/pull/690/head
This is a squashed patchset for zfs-0.7.7. The individual commits are in the tonyhutter:zfs-0.7.7-hutter branch. I squashed the commits so that buildbot wouldn't have to run against each one, and because github/builbot seem to have a maximum limit of 30 commits they can test from a PR. - Fix MMP write frequency for large pools openzfs#7205 openzfs#7289 - Handle zio_resume and mmp => off openzfs#7286 - Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284 - zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261 - Take user namespaces into account in policy checks openzfs#6800 openzfs#7270 - Detect long config lock acquisition in mmp openzfs#7212 - Linux 4.16 compat: get_disk_and_module() openzfs#7264 - Change checksum & IO delay ratelimit values openzfs#7252 - Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176 - Fix some typos openzfs#7237 - Fix zpool(8) list example to match actual format openzfs#7244 - Add SMART self-test results to zpool status -c openzfs#7178 - Add scrub after resilver zed script openzfs#4662 openzfs#7086 - Fix free memory calculation on v3.14+ openzfs#7170 - Report duration and error in mmp_history entries openzfs#7190 - Do not initiate MMP writes while pool is suspended openzfs#7182 - Linux 4.16 compat: use correct *_dec_and_test() - Allow modprobe to fail when called within systemd openzfs#7174 - Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193 - Correct count_uberblocks in mmp.kshlib openzfs#7191 - Fix config issues: frame size and headers openzfs#7169 - Clarify zinject(8) explanation of -e openzfs#7172 - OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168 - 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154 - contrib/initramfs: add missing conf.d/zfs openzfs#7158 - mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155 - Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054 - Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464 - Fix zdb -E segfault openzfs#7099 - Fix zdb -R decompression openzfs#7099 openzfs#4984 - Fix racy assignment of zcb.zcb_haderrors openzfs#7099 - Fix zle_decompress out of bound access openzfs#7099 - Fix zdb -c traverse stop on damaged objset root openzfs#7099 - Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148 - Linux 4.16 compat: inode_set_iversion() openzfs#7148 - OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141 - Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135 - Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101 - Bug fix in qat_compress.c for vmalloc addr check openzfs#7125 - Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100 - Emit an error message before MMP suspends pool openzfs#7048 - ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977 - Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963 - Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753 - Add support for "--enable-code-coverage" option openzfs#6670 - Make "-fno-inline" compile option more accessible openzfs#6605 - Add configure option to enable gcov analysis openzfs#6642 - Implement --enable-debuginfo to force debuginfo openzfs#2734 - Make --enable-debug fail when given bogus args openzfs#2734 Signed-off-by: Tony Hutter <[email protected]> Requires-spl: refs/pull/690/head
When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays. This issue was reported on Arch Linux where HZ defaults to only 100 and this could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 and thus are unlikely to be impacted. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#7205 Closes openzfs#7289
This is a squashed patchset for zfs-0.7.7. The individual commits are in the tonyhutter:zfs-0.7.7-hutter branch. I squashed the commits so that buildbot wouldn't have to run against each one, and because github/builbot seem to have a maximum limit of 30 commits they can test from a PR. - Fix MMP write frequency for large pools openzfs#7205 openzfs#7289 - Handle zio_resume and mmp => off openzfs#7286 - Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284 - zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261 - Take user namespaces into account in policy checks openzfs#6800 openzfs#7270 - Detect long config lock acquisition in mmp openzfs#7212 - Linux 4.16 compat: get_disk_and_module() openzfs#7264 - Change checksum & IO delay ratelimit values openzfs#7252 - Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176 - Fix some typos openzfs#7237 - Fix zpool(8) list example to match actual format openzfs#7244 - Add SMART self-test results to zpool status -c openzfs#7178 - Add scrub after resilver zed script openzfs#4662 openzfs#7086 - Fix free memory calculation on v3.14+ openzfs#7170 - Report duration and error in mmp_history entries openzfs#7190 - Do not initiate MMP writes while pool is suspended openzfs#7182 - Linux 4.16 compat: use correct *_dec_and_test() - Allow modprobe to fail when called within systemd openzfs#7174 - Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193 - Correct count_uberblocks in mmp.kshlib openzfs#7191 - Fix config issues: frame size and headers openzfs#7169 - Clarify zinject(8) explanation of -e openzfs#7172 - OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168 - 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154 - contrib/initramfs: add missing conf.d/zfs openzfs#7158 - mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155 - Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054 - Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464 - Fix zdb -E segfault openzfs#7099 - Fix zdb -R decompression openzfs#7099 openzfs#4984 - Fix racy assignment of zcb.zcb_haderrors openzfs#7099 - Fix zle_decompress out of bound access openzfs#7099 - Fix zdb -c traverse stop on damaged objset root openzfs#7099 - Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148 - Linux 4.16 compat: inode_set_iversion() openzfs#7148 - OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141 - Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135 - Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101 - Bug fix in qat_compress.c for vmalloc addr check openzfs#7125 - Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100 - Emit an error message before MMP suspends pool openzfs#7048 - ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977 - Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963 - Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753 - Fix "--enable-code-coverage" debug build openzfs#6674 - Update codecov.yml openzfs#6669 - Add support for "--enable-code-coverage" option openzfs#6670 - Make "-fno-inline" compile option more accessible openzfs#6605 - Add configure option to enable gcov analysis openzfs#6642 - Implement --enable-debuginfo to force debuginfo openzfs#2734 - Make --enable-debug fail when given bogus args openzfs#2734 Signed-off-by: Tony Hutter <[email protected]> Requires-spl: refs/pull/690/head
@behlendorf I was too busy these days. |
@morphinz no problem, the fix will be part of 0.7.7 so you might just want to wait until you can test the tagged release. |
When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays. This issue was reported on Arch Linux where HZ defaults to only 100 and this could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 and thus are unlikely to be impacted. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7205 Closes #7289
System information
Describe the problem you're observing
After upgrade ZFS & kernel from "zfs-linux 0.7.5.4.14.12.1-1" to "0.7.6-1" and "4.14.12-1-ARCH" to "4.14.19-1-lts" I got the error while importing a pool:
This is the import time:
[2018-02-20 21:32:19] zpool import ssdpool
After 30 sec i got this lockup:
Feb 20 21:32:59 DEV1 kernel: watchdog: BUG: soft lockup - CPU#22 stuck for 21s! [z_null_iss:28262]
Everything works right now but i have lockup cpu and %100 endless cpu usage on some cores.
My ZFS settings:
-
multihost=on
-
zfs_multihost_fail_intervals = 0
Boot settings:
-
pti=off earlymodules=xhci_hcd modules-load=xhci_hcd scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y ipv6.disable_ipv6=1 nomodeset vga=791 quiet loglevel=3 udev.log-priority=3
Include any warning/errors/backtraces from the system logs
The text was updated successfully, but these errors were encountered: