
CPU soft lockup & z_null_iss + mmp high cpu usage #7205

Closed
morphinz opened this issue Feb 21, 2018 · 14 comments

morphinz commented Feb 21, 2018

System information

Type Version/Name
Distribution Name Arch Linux
Distribution Version SMP Tue Feb 13 13:11:52 CET 2018 x86_64 GNU/Linux
Linux Kernel 4.14.19-1-lts
Architecture x86_64
ZFS Version zfs-linux 0.7.6-1
SPL Version spl-linux 0.7.6-1

Describe the problem you're observing

After upgrading ZFS from "zfs-linux 0.7.5.4.14.12.1-1" to "0.7.6-1" and the kernel from "4.14.12-1-ARCH" to "4.14.19-1-lts", I got the following error while importing a pool:

This is the import time:
[2018-02-20 21:32:19] zpool import ssdpool

About 30 seconds later I got this lockup:
Feb 20 21:32:59 DEV1 kernel: watchdog: BUG: soft lockup - CPU#22 stuck for 21s! [z_null_iss:28262]

Everything works right now, but I keep getting CPU lockups and endless 100% CPU usage on some cores.

39780 root        0 -20     0     0     0 R 100.  0.0 14h56:58 z_null_iss
31356 root       20   0     0     0     0 R 100.  0.0 15h01:36 mmp
28262 root        0 -20     0     0     0 R 100.  0.0 15h01:45 z_null_iss
6786 root       20   0     0     0     0 R 100.  0.0 14h59:34 mmp

My ZFS settings:
-multihost=on
-zfs_multihost_fail_intervals = 0
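For reference, the effective MMP-related module parameters can be inspected like this (a sketch; /sys/module/zfs/parameters is the standard location for these tunables, but the files only exist while the zfs module is loaded):

```shell
# Inspect the MMP-related zfs module parameters (sketch; requires the
# zfs module to be loaded for the sysfs files to exist).
param_dir=/sys/module/zfs/parameters
for p in zfs_multihost_fail_intervals zfs_multihost_interval zfs_multihost_import_intervals; do
    if [ -r "$param_dir/$p" ]; then
        printf '%s=%s\n' "$p" "$(cat "$param_dir/$p")"
    else
        printf '%s=<zfs module not loaded>\n' "$p"
    fi
done
```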

Boot settings:
-pti=off earlymodules=xhci_hcd modules-load=xhci_hcd scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y ipv6.disable_ipv6=1 nomodeset vga=791 quiet loglevel=3 udev.log-priority=3

Include any warning/errors/backtraces from the system logs

Feb 20 21:28:26 DEV1 kernel: pci 0000:82:00.0: reg 0x30: [mem 0xfb900000-0xfb9fffff pref]
Feb 20 21:28:26 DEV1 kernel: pci 0000:82:00.0: enabling Extended Tags
Feb 20 21:28:26 DEV1 kernel: pci 0000:82:00.0: supports D1 D2
Feb 20 21:28:26 DEV1 kernel: pci 0000:82:00.0: reg 0x174: [mem 0x00000000-0x0000ffff 64bit]
Feb 20 21:28:26 DEV1 kernel: pci 0000:82:00.0: VF(n) BAR0 space: [mem 0x00000000-0x000fffff 64bit] (contains BAR0 for 16 VFs)
...skipping...
Feb 20 21:32:59 DEV1 kernel: watchdog: BUG: soft lockup - CPU#22 stuck for 21s! [z_null_iss:28262]
Feb 20 21:32:59 DEV1 kernel: Modules linked in: msr dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ses enclosure 8021q mrp intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp i
Feb 20 21:32:59 DEV1 kernel:  mptscsih mptbase hpsa hid_generic usbhid zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) sd_mod mpt3sas raid_class scsi_transport_sas megaraid_sas ahci
Feb 20 21:32:59 DEV1 kernel: CPU: 22 PID: 28262 Comm: z_null_iss Tainted: P           O    4.14.19-1-lts #1
Feb 20 21:32:59 DEV1 kernel: Hardware name: Supermicro X10DRH LN4/X10DRH-CLN4, BIOS 2.0 01/30/2016
Feb 20 21:32:59 DEV1 kernel: task: ffff8a76a0083a00 task.stack: ffffa61a8c074000
Feb 20 21:32:59 DEV1 kernel: RIP: 0010:SHA256TransformBlocks+0xf14/0x1300 [icp]
Feb 20 21:32:59 DEV1 kernel: RSP: 0018:ffffa61a8c077a00 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff10
Feb 20 21:32:59 DEV1 kernel: RAX: 00000000ac8afa43 RBX: 0000000021fcf6b8 RCX: 00000000ed685c94
Feb 20 21:32:59 DEV1 kernel: RDX: 00000000d3515c9e RSI: ffff8a769df35bc0 RDI: 000000000000003a
Feb 20 21:32:59 DEV1 kernel: RBP: ffffffffc0642400 R08: 00000000d2693222 R09: 00000000d75e1120
Feb 20 21:32:59 DEV1 kernel: R10: 0000000073c4bb28 R11: 0000000037a3b700 R12: 000000007c107125
Feb 20 21:32:59 DEV1 kernel: R13: 0000000012eca1cf R14: 00000000ffcefb6b R15: 0000000073c4bb28
Feb 20 21:32:59 DEV1 kernel: FS:  0000000000000000(0000) GS:ffff8a78bfb00000(0000) knlGS:0000000000000000
Feb 20 21:32:59 DEV1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 20 21:32:59 DEV1 kernel: CR2: 00007f5dfd784728 CR3: 00000051c000a003 CR4: 00000000003606e0
Feb 20 21:32:59 DEV1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 20 21:32:59 DEV1 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 20 21:32:59 DEV1 kernel: Call Trace:
Feb 20 21:32:59 DEV1 kernel:  ? SHA2Update+0x1c3/0x1e0 [icp]
Feb 20 21:32:59 DEV1 kernel:  ? sa_register_update_callback_locked+0x20/0x20 [zfs]
Feb 20 21:32:59 DEV1 kernel:  ? sha_incremental+0x16/0x20 [zfs]
Feb 20 21:32:59 DEV1 kernel:  ? abd_iterate_func+0xa4/0x130 [zfs]
Feb 20 21:32:59 DEV1 kernel:  ? sha_incremental+0x20/0x20 [zfs]
Feb 20 21:32:59 DEV1 kernel:  ? abd_checksum_SHA256+0x51/0xb0 [zfs]
Feb 20 21:32:59 DEV1 kernel:  ? blk_mq_flush_plug_list+0x1c9/0x240
Feb 20 21:32:59 DEV1 kernel:  ? abd_iterate_func+0x73/0x130 [zfs]
Feb 20 21:32:59 DEV1 kernel:  ? zio_checksum_compute+0x1ae/0x300 [zfs]
Feb 20 21:32:59 DEV1 kernel:  ? zio_checksum_generate+0x3f/0x60 [zfs]
Feb 20 21:32:59 DEV1 kernel:  ? zio_execute+0x87/0xe0 [zfs]
Feb 20 21:32:59 DEV1 kernel:  ? taskq_thread+0x295/0x4a0 [spl]
Feb 20 21:32:59 DEV1 kernel:  ? wake_up_q+0x70/0x70
Feb 20 21:32:59 DEV1 kernel:  ? zio_reexecute+0x370/0x370 [zfs]
Feb 20 21:32:59 DEV1 kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Feb 20 21:32:59 DEV1 kernel:  ? kthread+0x11a/0x130
Feb 20 21:32:59 DEV1 kernel:  ? kthread_create_on_node+0x70/0x70
Feb 20 21:32:59 DEV1 kernel:  ? kthread_create_on_node+0x70/0x70
Feb 20 21:32:59 DEV1 kernel:  ? ret_from_fork+0x35/0x40
Feb 20 21:32:59 DEV1 kernel: Code: 89 d1 45 01 ec 45 01 fc 45 89 d5 45 89 d6 41 c1 c9 02 41 c1 cd 0d 45 89 d7 44 03 64 bd 00 45 31 e9 41 c1 cd 09 41 09 c6 45 31 e9 <41> 21 c7 44 01 e3 45 21 de 45 01 e1
Feb 20 21:34:08 DEV1 kernel: ZFS: Unable to set "noop" scheduler for /dev/mapper/35000c5003022e31b (dm-80): 256
Feb 20 21:34:08 DEV1 kernel: ZFS: Unable to set "noop" scheduler for /dev/mapper/35000c5003023ff67 (dm-2): 256
Feb 20 21:34:08 DEV1 kernel: ZFS: Unable to set "noop" scheduler for /dev/mapper/35000c5003022e3bb (dm-313): 256
Feb 20 21:34:08 DEV1 kernel: ZFS: Unable to set "noop" scheduler for /dev/mapper/35000c5003022e2e7 (dm-158): 256
Feb 20 21:34:08 DEV1 kernel: ZFS: Unable to set "noop" scheduler for /dev/mapper/35000c5003022e26f (dm-235): 256
Feb 20 21:34:08 DEV1 kernel: ZFS: Unable to set "noop" scheduler for /dev/mapper/35000c50093dd9ba3 (dm-377): 256
Feb 20 21:34:08 DEV1 kernel: ZFS: Unable to set "noop" scheduler for /dev/mapper/35000c50093cb7353 (dm-334): 256
Feb 20 21:34:08 DEV1 kernel: ZFS: Unable to set "noop" scheduler for /dev/mapper/35000c50093d28e0b (dm-273): 256
Feb 20 21:34:08 DEV1 kernel: ZFS: Unable to set "noop" scheduler for /dev/mapper/35000c50093ddc013 (dm-202): 256
Feb 20 21:34:08 DEV1 kernel: ZFS: Unable to set "noop" scheduler for /dev/mapper/35000c50093dda863 (dm-67): 256
Feb 20 21:34:08 DEV1 kernel: ZFS: Unable to set "noop" scheduler for /dev/mapper/35000c50093d51bf3 (dm-285): 256
behlendorf (Contributor) commented

@morphinz if the system is still in this state, could you please grab backtraces for the z_null_iss and mmp processes if possible:

cat /proc/<pid>/stack
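A quick loop to capture all of the affected threads in one pass (a sketch; pgrep matches kernel thread comm names, and reading /proc/&lt;pid&gt;/stack requires root):

```shell
# Capture kernel stacks for every z_null_iss and mmp thread in one pass
# (sketch; a thread spinning in kernel mode may report only
# 0xffffffffffffffff, since its stack cannot be unwound while running).
echo "kernel stacks for z_null_iss/mmp threads:"
for pid in $(pgrep 'z_null_iss|mmp'); do
    printf '=== %s (%s) ===\n' "$pid" "$(cat "/proc/$pid/comm")"
    cat "/proc/$pid/stack" 2>/dev/null || echo "(unreadable: need root)"
done
```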

morphinz (Author) commented Feb 22, 2018

@behlendorf Yes, the system is still running. Ask me for anything you need.

cat /proc/39780/stack
[<ffffffffffffffff>] 0xffffffffffffffff
cat /proc/31356/stack
[<ffffffffffffffff>] 0xffffffffffffffff
cat /proc/28262/stack
[<ffffffffffffffff>] 0xffffffffffffffff
cat /proc/6786/stack
[<ffffffffffffffff>] 0xffffffffffffffff

And the processes are still there.

  PID  STIME+  USER      PRI  NI  VIRT   RES   SHR S CPU%    DISK R/W MEM%   TIME+  NLWP     CGROUP IO Command
28262 37h46:40 root        0 -20     0     0     0 R 100.    0.00 B/s  0.0 37h46:40    1 ::/?       B0 z_null_iss
31356 37h45:57 root       20   0     0     0     0 R 101.    0.00 B/s  0.0 37h45:57    1 ::/?       B4 mmp
 6786 37h44:01 root       20   0     0     0     0 R 100.    0.00 B/s  0.0 37h44:01    1 ::/?       B4 mmp
39780 37h40:28 root        0 -20     0     0     0 R 101.    0.00 B/s  0.0 37h40:28    1 ::/?       B0 z_null_iss

morphinz (Author) commented Feb 23, 2018

Today I tried to export the pool, but when I started "zpool export" I got CPU lockups again.
(I still have not rebooted the server since opening this issue, but I am going to take the server down and run failover.)
I waited 40 minutes but nothing changed; the "zpool export" command is still hanging.
"zpool status", "zfs list", etc. are not working either.
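When zpool commands hang like this, one common way to see where they are stuck is a SysRq-w blocked-task dump (a sketch, not something from this thread; it needs root and kernel.sysrq enabled, and writes the D-state task stacks into the kernel log):

```shell
# Dump the stacks of all blocked (D-state) tasks into the kernel log
# (sketch; requires root and the sysrq 'w' function to be enabled).
echo "requesting blocked-task dump via sysrq-w"
if [ -w /proc/sysrq-trigger ]; then
    echo w > /proc/sysrq-trigger
    echo "stacks written to the kernel log; read them with: dmesg | tail"
else
    echo "/proc/sysrq-trigger not writable (need root)"
fi
```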

BTW: I have 2 pools on this server, "ssdpool" and "clspool".
When I upgraded ZFS and the kernel, I tried to import ssdpool first and got these lockups. About 2 minutes later I tried to import clspool and everything was normal; I did not get any lockups while importing clspool.

These pools are configured almost identically. I will add the pool information when I get the pool back.

  PID  STIME+  USER      PRI  NI  VIRT   RES   SHR S CPU%    DISK R/W MEM%   TIME+  NLWP     CGROUP IO Command
28262 70h13:51 root        0 -20     0     0     0 R 100.    0.00 B/s  0.0 70h13:51    1 ::/?       B0 z_null_iss
 6786 70h08:35 root       20   0     0     0     0 R 100.    0.00 B/s  0.0 70h08:35    1 ::/?       B4 mmp
31356 70h11:14 root       20   0     0     0     0 R 100.    0.00 B/s  0.0 70h11:14    1 ::/?       B4 mmp
39780 70h04:45 root        0 -20     0     0     0 R 100.    0.00 B/s  0.0 70h04:45    1 ::/?       B0 z_null_iss
39781  8h14:14 root        0 -20     0     0     0 S 10.6    0.00 B/s  0.0  8h14:14    1 ::/?       B0 z_null_int
28263  8h10:49 root        0 -20     0     0     0 S 10.6    0.00 B/s  0.0  8h10:49    1 ::/?       B0 z_null_int
[<ffffffffffffffff>] 0xffffffffffffffff
[root@DEV1-no-kid ~]# cat /proc/39780/stack
[<ffffffffffffffff>] 0xffffffffffffffff
[root@DEV1-no-kid ~]# cat /proc/31356/stack
[<ffffffffffffffff>] 0xffffffffffffffff
[root@DEV1-no-kid ~]# cat /proc/6786/stack
[<ffffffffffffffff>] 0xffffffffffffffff
[root@DEV1-no-kid ~]# cat /proc/28263/stack
[<ffffffffc05a64f5>] taskq_thread+0x3d5/0x4a0 [spl]
[<ffffffffb209d0da>] kthread+0x11a/0x130
[<ffffffffb28001b5>] ret_from_fork+0x35/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

System logs:

Feb 23 19:32:50 DEV1 sshd[1377]: pam_unix(sshd:session): session opened for user root by (uid=0)
Feb 23 19:32:50 DEV1 systemd-logind[16685]: New session c53 of user root.
Feb 23 19:32:50 DEV1 systemd[1]: Started Session c53 of user root.
Feb 23 19:33:08 DEV1 Route(sm1-defaultgw)[11395]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:33:08 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:11346:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:33:28 DEV1 Route(sm1-defaultgw)[19613]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:33:28 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:19575:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:33:35 DEV1 kernel: watchdog: BUG: soft lockup - CPU#30 stuck for 22s! [z_null_iss:28262]
Feb 23 19:33:35 DEV1 kernel: Modules linked in: msr dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ses enclosure 8021q mrp intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel mxm_wmi ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate pcspkr intel_rapl_perf ttm drm_kms_helper drm agpgart syscopyarea sysfillrect evdev input_leds sysimgblt mei_me joydev mousedev led_class igb mei fb_sys_fops i2c_i801 mac_hid ioatdma i2c_algo_bit lpc_ich ipmi_ssif dca i40e ptp pps_core shpchp bonding ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad button nfsd auth_rpcgss oid_registry nfs_acl lockd grace sch_fq_codel sunrpc ip_tables x_tables dm_mod mptsas
Feb 23 19:33:35 DEV1 kernel:  mptscsih mptbase hpsa hid_generic usbhid zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) sd_mod mpt3sas raid_class scsi_transport_sas megaraid_sas ahci libahci libata virtio_scsi virtio_ring virtio vmw_vsock_vmci_transport vsock vmw_pvscsi scsi_mod vmw_vmci hid xhci_pci ehci_pci ehci_hcd xhci_hcd usbcore usb_common
Feb 23 19:33:35 DEV1 kernel: CPU: 30 PID: 28262 Comm: z_null_iss Tainted: P           O L  4.14.19-1-lts #1
Feb 23 19:33:35 DEV1 kernel: Hardware name: Supermicro X10DRH LN4/X10DRH-CLN4, BIOS 2.0 01/30/2016
Feb 23 19:33:35 DEV1 kernel: task: ffff8a76a0083a00 task.stack: ffffa61a8c074000
Feb 23 19:33:35 DEV1 kernel: RIP: 0010:SHA256TransformBlocks+0xdc9/0x1300 [icp]
Feb 23 19:33:35 DEV1 kernel: RSP: 0018:ffffa61a8c077a00 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff10
Feb 23 19:33:35 DEV1 kernel: RAX: 0000000023b6901d RBX: 000000004ca3a4e3 RCX: 00000000010ea0b6
Feb 23 19:33:35 DEV1 kernel: RDX: 00000000796cb531 RSI: ffff8a5ee878c540 RDI: 0000000000000039
Feb 23 19:33:35 DEV1 kernel: RBP: ffffffffc0642400 R08: 0000000031dcd1c9 R09: 000000006e1f3ae6
Feb 23 19:33:35 DEV1 kernel: R10: 0000000029e8633a R11: 00000000141e2787 R12: 0000000001d95a4a
Feb 23 19:33:35 DEV1 kernel: R13: 00000000da40748e R14: 0000000001a6a0b7 R15: 0000000001068014
Feb 23 19:33:35 DEV1 kernel: FS:  0000000000000000(0000) GS:ffff8ab8bf280000(0000) knlGS:0000000000000000
Feb 23 19:33:35 DEV1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 23 19:33:35 DEV1 kernel: CR2: 00007fe59ad307b8 CR3: 00000051c000a005 CR4: 00000000003606e0
Feb 23 19:33:35 DEV1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 23 19:33:35 DEV1 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 23 19:33:35 DEV1 kernel: Call Trace:
Feb 23 19:33:35 DEV1 kernel:  ? SHA2Update+0x1c3/0x1e0 [icp]
Feb 23 19:33:35 DEV1 kernel:  ? sa_register_update_callback_locked+0x20/0x20 [zfs]
Feb 23 19:33:35 DEV1 kernel:  ? sha_incremental+0x16/0x20 [zfs]
Feb 23 19:33:35 DEV1 kernel:  ? abd_iterate_func+0xa4/0x130 [zfs]
Feb 23 19:33:35 DEV1 kernel:  ? sha_incremental+0x20/0x20 [zfs]
Feb 23 19:33:35 DEV1 kernel:  ? abd_checksum_SHA256+0x51/0xb0 [zfs]
Feb 23 19:33:35 DEV1 kernel:  ? blk_mq_flush_plug_list+0x1c9/0x240
Feb 23 19:33:35 DEV1 kernel:  ? abd_iterate_func+0x73/0x130 [zfs]
Feb 23 19:33:35 DEV1 kernel:  ? zio_checksum_compute+0x1ae/0x300 [zfs]
Feb 23 19:33:35 DEV1 kernel:  ? zio_checksum_generate+0x3f/0x60 [zfs]
Feb 23 19:33:35 DEV1 kernel:  ? zio_execute+0x87/0xe0 [zfs]
Feb 23 19:33:35 DEV1 kernel:  ? taskq_thread+0x295/0x4a0 [spl]
Feb 23 19:33:35 DEV1 kernel:  ? wake_up_q+0x70/0x70
Feb 23 19:33:35 DEV1 kernel:  ? zio_reexecute+0x370/0x370 [zfs]
Feb 23 19:33:35 DEV1 kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Feb 23 19:33:35 DEV1 kernel:  ? kthread+0x11a/0x130
Feb 23 19:33:35 DEV1 kernel:  ? kthread_create_on_node+0x70/0x70
Feb 23 19:33:35 DEV1 kernel:  ? kthread_create_on_node+0x70/0x70
Feb 23 19:33:35 DEV1 kernel:  ? ret_from_fork+0x35/0x40
Feb 23 19:33:35 DEV1 kernel: Code: c1 cd 0d 41 89 c7 44 03 64 bd 00 45 31 eb 41 c1 cd 09 41 09 ce 45 31 eb 41 21 cf 44 01 e2 41 21 de 45 01 e3 45 09 fe 48 8d 7f 01 <45> 01 f3 44 8b 6c 24 28 44 8b 64 24 1c 45 89 ef 41 c1 ed 03 41
Feb 23 19:33:48 DEV1 Route(sm1-defaultgw)[27722]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:33:48 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:27686:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:34:08 DEV1 Route(sm1-defaultgw)[35842]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:34:08 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:35818:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:34:11 DEV1 kernel: watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [z_null_iss:28262]
Feb 23 19:34:11 DEV1 kernel: Modules linked in: msr dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ses enclosure 8021q mrp intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel mxm_wmi ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate pcspkr intel_rapl_perf ttm drm_kms_helper drm agpgart syscopyarea sysfillrect evdev input_leds sysimgblt mei_me joydev mousedev led_class igb mei fb_sys_fops i2c_i801 mac_hid ioatdma i2c_algo_bit lpc_ich ipmi_ssif dca i40e ptp pps_core shpchp bonding ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad button nfsd auth_rpcgss oid_registry nfs_acl lockd grace sch_fq_codel sunrpc ip_tables x_tables dm_mod mptsas
Feb 23 19:34:11 DEV1 kernel:  mptscsih mptbase hpsa hid_generic usbhid zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) sd_mod mpt3sas raid_class scsi_transport_sas megaraid_sas ahci libahci libata virtio_scsi virtio_ring virtio vmw_vsock_vmci_transport vsock vmw_pvscsi scsi_mod vmw_vmci hid xhci_pci ehci_pci ehci_hcd xhci_hcd usbcore usb_common
Feb 23 19:34:11 DEV1 kernel: CPU: 14 PID: 28262 Comm: z_null_iss Tainted: P           O L  4.14.19-1-lts #1
Feb 23 19:34:11 DEV1 kernel: Hardware name: Supermicro X10DRH LN4/X10DRH-CLN4, BIOS 2.0 01/30/2016
Feb 23 19:34:11 DEV1 kernel: task: ffff8a76a0083a00 task.stack: ffffa61a8c074000
Feb 23 19:34:11 DEV1 kernel: RIP: 0010:SHA256TransformBlocks+0xcac/0x1300 [icp]
Feb 23 19:34:11 DEV1 kernel: RSP: 0018:ffffa61a8c077a00 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
Feb 23 19:34:11 DEV1 kernel: RAX: 00000000733f915b RBX: 000000007872adee RCX: 0000000067101088
Feb 23 19:34:11 DEV1 kernel: RDX: 00000000e5066895 RSI: ffff8a62298d9b00 RDI: 0000000000000017
Feb 23 19:34:11 DEV1 kernel: RBP: ffffffffc0642400 R08: 00000000279ead77 R09: 00000000c9bf9c67
Feb 23 19:34:11 DEV1 kernel: R10: 00000000d71f9f78 R11: 000000002900da79 R12: 0000000000000000
Feb 23 19:34:11 DEV1 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Feb 23 19:34:11 DEV1 kernel: FS:  0000000000000000(0000) GS:ffff8ab8bf100000(0000) knlGS:0000000000000000
Feb 23 19:34:11 DEV1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 23 19:34:11 DEV1 kernel: CR2: 00007f8fa7d67498 CR3: 00000051c000a002 CR4: 00000000003606e0
Feb 23 19:34:11 DEV1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 23 19:34:11 DEV1 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 23 19:34:11 DEV1 kernel: Call Trace:
Feb 23 19:34:11 DEV1 kernel:  ? SHA2Update+0x1c3/0x1e0 [icp]
Feb 23 19:34:11 DEV1 kernel:  ? sa_register_update_callback_locked+0x20/0x20 [zfs]
Feb 23 19:34:11 DEV1 kernel:  ? sha_incremental+0x16/0x20 [zfs]
Feb 23 19:34:11 DEV1 kernel:  ? abd_iterate_func+0xa4/0x130 [zfs]
Feb 23 19:34:11 DEV1 kernel:  ? sha_incremental+0x20/0x20 [zfs]
Feb 23 19:34:11 DEV1 kernel:  ? abd_checksum_SHA256+0x51/0xb0 [zfs]
Feb 23 19:34:11 DEV1 kernel:  ? blk_mq_flush_plug_list+0x1c9/0x240
Feb 23 19:34:11 DEV1 kernel:  ? abd_iterate_func+0x73/0x130 [zfs]
Feb 23 19:34:11 DEV1 kernel:  ? zio_checksum_compute+0x1ae/0x300 [zfs]
Feb 23 19:34:11 DEV1 kernel:  ? zio_checksum_generate+0x3f/0x60 [zfs]
Feb 23 19:34:11 DEV1 kernel:  ? zio_execute+0x87/0xe0 [zfs]
Feb 23 19:34:11 DEV1 kernel:  ? taskq_thread+0x295/0x4a0 [spl]
Feb 23 19:34:11 DEV1 kernel:  ? wake_up_q+0x70/0x70
Feb 23 19:34:11 DEV1 kernel:  ? zio_reexecute+0x370/0x370 [zfs]
Feb 23 19:34:11 DEV1 kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Feb 23 19:34:11 DEV1 kernel:  ? kthread+0x11a/0x130
Feb 23 19:34:11 DEV1 kernel:  ? kthread_create_on_node+0x70/0x70
Feb 23 19:34:11 DEV1 kernel:  ? kthread_create_on_node+0x70/0x70
Feb 23 19:34:11 DEV1 kernel:  ? ret_from_fork+0x35/0x40
Feb 23 19:34:11 DEV1 kernel: Code: 03 41 c1 cf 07 45 31 fd 41 c1 cf 0b 45 31 fd 45 89 e6 41 c1 ec 0a 41 c1 ce 11 45 31 f4 41 c1 ce 02 45 31 f4 45 01 ec 44 03 24 24 <44> 03 64 24 1c 45 89 cd 45 89 ce 45 89 d7 41 c1 cd 06 41 c1 ce
Feb 23 19:34:28 DEV1 Route(sm1-defaultgw)[3156]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:34:28 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:3081:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:34:48 DEV1 Route(sm1-defaultgw)[11263]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:34:48 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:11234:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:35:09 DEV1 Route(sm1-defaultgw)[20449]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:35:09 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:20395:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:35:29 DEV1 Route(sm1-defaultgw)[27701]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:35:29 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:27649:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:35:31 DEV1 kernel: watchdog: BUG: soft lockup - CPU#38 stuck for 23s! [z_null_iss:28262]
Feb 23 19:35:31 DEV1 kernel: Modules linked in: msr dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ses enclosure 8021q mrp intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel mxm_wmi ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate pcspkr intel_rapl_perf ttm drm_kms_helper drm agpgart syscopyarea sysfillrect evdev input_leds sysimgblt mei_me joydev mousedev led_class igb mei fb_sys_fops i2c_i801 mac_hid ioatdma i2c_algo_bit lpc_ich ipmi_ssif dca i40e ptp pps_core shpchp bonding ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad button nfsd auth_rpcgss oid_registry nfs_acl lockd grace sch_fq_codel sunrpc ip_tables x_tables dm_mod mptsas
Feb 23 19:35:31 DEV1 kernel:  mptscsih mptbase hpsa hid_generic usbhid zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) sd_mod mpt3sas raid_class scsi_transport_sas megaraid_sas ahci libahci libata virtio_scsi virtio_ring virtio vmw_vsock_vmci_transport vsock vmw_pvscsi scsi_mod vmw_vmci hid xhci_pci ehci_pci ehci_hcd xhci_hcd usbcore usb_common
Feb 23 19:35:31 DEV1 kernel: CPU: 38 PID: 28262 Comm: z_null_iss Tainted: P           O L  4.14.19-1-lts #1
Feb 23 19:35:31 DEV1 kernel: Hardware name: Supermicro X10DRH LN4/X10DRH-CLN4, BIOS 2.0 01/30/2016
Feb 23 19:35:31 DEV1 kernel: task: ffff8a76a0083a00 task.stack: ffffa61a8c074000
Feb 23 19:35:31 DEV1 kernel: RIP: 0010:SHA256TransformBlocks+0xe3f/0x1300 [icp]
Feb 23 19:35:31 DEV1 kernel: RSP: 0018:ffffa61a8c077a00 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff10
Feb 23 19:35:31 DEV1 kernel: RAX: 00000000c8979591 RBX: 000000005f0b83b4 RCX: 00000000e2afb886
Feb 23 19:35:31 DEV1 kernel: RDX: 00000000e09b6bbc RSI: ffff8a753d1ae280 RDI: 0000000000000039
Feb 23 19:35:31 DEV1 kernel: RBP: ffffffffc0642400 R08: 0000000032be68c6 R09: 00000000b5d8272f
Feb 23 19:35:31 DEV1 kernel: R10: 00000000865aa4bb R11: 00000000865aa4bb R12: 00000000caacb494
Feb 23 19:35:31 DEV1 kernel: R13: 00000000c9aba0b3 R14: 000000004db5de70 R15: 0000000035da6c87
Feb 23 19:35:31 DEV1 kernel: FS:  0000000000000000(0000) GS:ffff8ab8bf480000(0000) knlGS:0000000000000000
Feb 23 19:35:31 DEV1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 23 19:35:31 DEV1 kernel: CR2: 00007fe524d23000 CR3: 00000051c000a002 CR4: 00000000003606e0
Feb 23 19:35:31 DEV1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 23 19:35:31 DEV1 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 23 19:35:31 DEV1 kernel: Call Trace:
Feb 23 19:35:31 DEV1 kernel:  ? SHA2Update+0x1c3/0x1e0 [icp]
Feb 23 19:35:31 DEV1 kernel:  ? sa_register_update_callback_locked+0x20/0x20 [zfs]
Feb 23 19:35:31 DEV1 kernel:  ? sha_incremental+0x16/0x20 [zfs]
Feb 23 19:35:31 DEV1 kernel:  ? abd_iterate_func+0xa4/0x130 [zfs]
Feb 23 19:35:31 DEV1 kernel:  ? sha_incremental+0x20/0x20 [zfs]
Feb 23 19:35:31 DEV1 kernel:  ? abd_checksum_SHA256+0x51/0xb0 [zfs]
Feb 23 19:35:31 DEV1 kernel:  ? blk_mq_flush_plug_list+0x1c9/0x240
Feb 23 19:35:31 DEV1 kernel:  ? abd_iterate_func+0x73/0x130 [zfs]
Feb 23 19:35:31 DEV1 kernel:  ? zio_checksum_compute+0x1ae/0x300 [zfs]
Feb 23 19:35:31 DEV1 kernel:  ? zio_checksum_generate+0x3f/0x60 [zfs]
Feb 23 19:35:31 DEV1 kernel:  ? zio_execute+0x87/0xe0 [zfs]
Feb 23 19:35:31 DEV1 kernel:  ? taskq_thread+0x295/0x4a0 [spl]
Feb 23 19:35:31 DEV1 kernel:  ? wake_up_q+0x70/0x70
Feb 23 19:35:31 DEV1 kernel:  ? zio_reexecute+0x370/0x370 [zfs]
Feb 23 19:35:31 DEV1 kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Feb 23 19:35:31 DEV1 kernel:  ? kthread+0x11a/0x130
Feb 23 19:35:31 DEV1 kernel:  ? kthread_create_on_node+0x70/0x70
Feb 23 19:35:31 DEV1 kernel:  ? kthread_create_on_node+0x70/0x70
Feb 23 19:35:31 DEV1 kernel:  ? ret_from_fork+0x35/0x40
Feb 23 19:35:31 DEV1 kernel: Code: 89 c7 41 c1 cd 06 41 c1 ce 0b 45 31 cf 45 31 f5 41 c1 ce 0e 41 21 d7 44 89 64 24 24 45 31 f5 45 31 cf 45 01 d4 45 89 da 45 01 ec <45> 01 fc 45 89 dd 45 89 de 41 c1 ca 02 41 c1 cd 0d 45 89 df 44
Feb 23 19:35:49 DEV1 Route(sm1-defaultgw)[36951]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:35:49 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:36922:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:36:02 DEV1 kernel: rpc-srv/tcp: nfsd: got error -104 when sending 24 bytes - shutting down socket
Feb 23 19:36:09 DEV1 Route(sm1-defaultgw)[4659]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:36:09 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:4633:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:36:16 DEV1 sshd[7283]: Accepted publickey for root from 192.168.170.6 port 49597 ssh2: RSA SHA256:DChmaJmC/ZqAoykkJm5fXzNJ1GSwvJuMxGer3T3RxMk
Feb 23 19:36:16 DEV1 sshd[7283]: pam_unix(sshd:session): session opened for user root by (uid=0)
Feb 23 19:36:16 DEV1 systemd-logind[16685]: New session c54 of user root.
Feb 23 19:36:16 DEV1 systemd[1]: Started Session c54 of user root.
Feb 23 19:36:29 DEV1 Route(sm1-defaultgw)[12447]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:36:29 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:12397:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:36:49 DEV1 Route(sm1-defaultgw)[20807]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:36:49 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:20769:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:37:09 DEV1 Route(sm1-defaultgw)[28984]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:37:09 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:28952:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:37:29 DEV1 Route(sm1-defaultgw)[36948]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:37:29 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:36850:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:37:43 DEV1 kernel: watchdog: BUG: soft lockup - CPU#19 stuck for 22s! [z_null_iss:28262]
Feb 23 19:37:43 DEV1 kernel: Modules linked in: msr dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ses enclosure 8021q mrp intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel mxm_wmi ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate pcspkr intel_rapl_perf ttm drm_kms_helper drm agpgart syscopyarea sysfillrect evdev input_leds sysimgblt mei_me joydev mousedev led_class igb mei fb_sys_fops i2c_i801 mac_hid ioatdma i2c_algo_bit lpc_ich ipmi_ssif dca i40e ptp pps_core shpchp bonding ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad button nfsd auth_rpcgss oid_registry nfs_acl lockd grace sch_fq_codel sunrpc ip_tables x_tables dm_mod mptsas
Feb 23 19:37:43 DEV1 kernel:  mptscsih mptbase hpsa hid_generic usbhid zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) sd_mod mpt3sas raid_class scsi_transport_sas megaraid_sas ahci libahci libata virtio_scsi virtio_ring virtio vmw_vsock_vmci_transport vsock vmw_pvscsi scsi_mod vmw_vmci hid xhci_pci ehci_pci ehci_hcd xhci_hcd usbcore usb_common
Feb 23 19:37:43 DEV1 kernel: CPU: 19 PID: 28262 Comm: z_null_iss Tainted: P           O L  4.14.19-1-lts #1
Feb 23 19:37:43 DEV1 kernel: Hardware name: Supermicro X10DRH LN4/X10DRH-CLN4, BIOS 2.0 01/30/2016
Feb 23 19:37:43 DEV1 kernel: task: ffff8a76a0083a00 task.stack: ffffa61a8c074000
Feb 23 19:37:43 DEV1 kernel: RIP: 0010:SHA256TransformBlocks+0x101a/0x1300 [icp]
Feb 23 19:37:43 DEV1 kernel: RSP: 0018:ffffa61a8c077a00 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
Feb 23 19:37:43 DEV1 kernel: RAX: 00000000fae423fb RBX: 000000001ccc5b3a RCX: 00000000b77db7b9
Feb 23 19:37:43 DEV1 kernel: RDX: 00000000c036aeac RSI: ffff8a74afc7e700 RDI: 000000000000001c
Feb 23 19:37:43 DEV1 kernel: RBP: ffffffffc0642400 R08: 00000000193ad78a R09: 00000000826790e0
Feb 23 19:37:43 DEV1 kernel: R10: 00000000e61142f8 R11: 00000000bc4bd95b R12: 0000000000000000
Feb 23 19:37:43 DEV1 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Feb 23 19:37:43 DEV1 kernel: FS:  0000000000000000(0000) GS:ffff8ab8bf240000(0000) knlGS:0000000000000000
Feb 23 19:37:43 DEV1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 23 19:37:43 DEV1 kernel: CR2: 000055ee8ea5b018 CR3: 00000051c000a006 CR4: 00000000003606e0
Feb 23 19:37:43 DEV1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 23 19:37:43 DEV1 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 23 19:37:43 DEV1 kernel: Call Trace:
Feb 23 19:37:43 DEV1 kernel:  ? SHA2Update+0x1c3/0x1e0 [icp]
Feb 23 19:37:43 DEV1 kernel:  ? sa_register_update_callback_locked+0x20/0x20 [zfs]
Feb 23 19:37:43 DEV1 kernel:  ? sha_incremental+0x16/0x20 [zfs]
Feb 23 19:37:43 DEV1 kernel:  ? abd_iterate_func+0xa4/0x130 [zfs]
Feb 23 19:37:43 DEV1 kernel:  ? sha_incremental+0x20/0x20 [zfs]
Feb 23 19:37:43 DEV1 kernel:  ? abd_checksum_SHA256+0x51/0xb0 [zfs]
Feb 23 19:37:43 DEV1 kernel:  ? blk_mq_flush_plug_list+0x1c9/0x240
Feb 23 19:37:43 DEV1 kernel:  ? abd_iterate_func+0x73/0x130 [zfs]
Feb 23 19:37:43 DEV1 kernel:  ? zio_checksum_compute+0x1ae/0x300 [zfs]
Feb 23 19:37:43 DEV1 kernel:  ? zio_checksum_generate+0x3f/0x60 [zfs]
Feb 23 19:37:43 DEV1 kernel:  ? zio_execute+0x87/0xe0 [zfs]
Feb 23 19:37:43 DEV1 kernel:  ? taskq_thread+0x295/0x4a0 [spl]
Feb 23 19:37:43 DEV1 kernel:  ? wake_up_q+0x70/0x70
Feb 23 19:37:43 DEV1 kernel:  ? zio_reexecute+0x370/0x370 [zfs]
Feb 23 19:37:43 DEV1 kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Feb 23 19:37:43 DEV1 kernel:  ? kthread+0x11a/0x130
Feb 23 19:37:43 DEV1 kernel:  ? kthread_create_on_node+0x70/0x70
Feb 23 19:37:43 DEV1 kernel:  ? kthread_create_on_node+0x70/0x70
Feb 23 19:37:43 DEV1 kernel:  ? ret_from_fork+0x35/0x40
Feb 23 19:37:43 DEV1 kernel: Code: 31 fd 41 c1 cf 0b 45 31 fd 45 89 e6 41 c1 ec 0a 41 c1 ce 11 45 31 f4 41 c1 ce 02 45 31 f4 45 01 ec 44 03 64 24 14 44 03 64 24 30 <41> 89 c5 41 89 c6 41 89 df 41 c1 cd 06 41 c1 ce 0b 41 31 cf 45
Feb 23 19:37:49 DEV1 Route(sm1-defaultgw)[4790]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:37:49 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:4758:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:38:09 DEV1 Route(sm1-defaultgw)[13391]: ERROR: Address family detection requires a numeric destination address.
Feb 23 19:38:09 DEV1 lrmd[18392]:   notice: sm1-defaultgw_monitor_20000:13361:stderr [ ocf-exit-reason:Address family detection requires a numeric destination address. ]
Feb 23 19:38:27 DEV1 kernel: watchdog: BUG: soft lockup - CPU#39 stuck for 22s! [z_null_iss:28262]
Feb 23 19:38:27 DEV1 kernel: Modules linked in: msr dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ses enclosure 8021q mrp intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel mxm_wmi ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate pcspkr intel_rapl_perf ttm drm_kms_helper drm agpgart syscopyarea sysfillrect evdev input_leds sysimgblt mei_me joydev mousedev led_class igb mei fb_sys_fops i2c_i801 mac_hid ioatdma i2c_algo_bit lpc_ich ipmi_ssif dca i40e ptp pps_core shpchp bonding ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad button nfsd auth_rpcgss oid_registry nfs_acl lockd grace sch_fq_codel sunrpc ip_tables x_tables dm_mod mptsas
Feb 23 19:38:27 DEV1 kernel:  mptscsih mptbase hpsa hid_generic usbhid zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) sd_mod mpt3sas raid_class scsi_transport_sas megaraid_sas ahci libahci libata virtio_scsi virtio_ring virtio vmw_vsock_vmci_transport vsock vmw_pvscsi scsi_mod vmw_vmci hid xhci_pci ehci_pci ehci_hcd xhci_hcd usbcore usb_common
Feb 23 19:38:27 DEV1 kernel: CPU: 39 PID: 28262 Comm: z_null_iss Tainted: P           O L  4.14.19-1-lts #1
Feb 23 19:38:27 DEV1 kernel: Hardware name: Supermicro X10DRH LN4/X10DRH-CLN4, BIOS 2.0 01/30/2016
Feb 23 19:38:27 DEV1 kernel: task: ffff8a76a0083a00 task.stack: ffffa61a8c074000
Feb 23 19:38:27 DEV1 kernel: RIP: 0010:SHA256TransformBlocks+0xf2f/0x1300 [icp]
Feb 23 19:38:27 DEV1 kernel: RSP: 0018:ffffa61a8c077a00 EFLAGS: 00000203 ORIG_RAX: ffffffffffffff10
Feb 23 19:38:27 DEV1 kernel: RAX: 00000000462410c0 RBX: 00000000f84e7b7b RCX: 0000000037f258bc
Feb 23 19:38:27 DEV1 kernel: RDX: 00000000b3e643a2 RSI: ffff8a6dc112dcc0 RDI: 000000000000001b
Feb 23 19:38:27 DEV1 kernel: RBP: ffffffffc0642400 R08: 00000000e17572a3 R09: 000000001547e237
Feb 23 19:38:27 DEV1 kernel: R10: 0000000039383997 R11: 000000003d8aebfd R12: 00000000f60e0d22
Feb 23 19:38:27 DEV1 kernel: R13: 0000000000000000 R14: 000000003d2839d5 R15: 0000000000201080
Feb 23 19:38:27 DEV1 kernel: FS:  0000000000000000(0000) GS:ffff8ab8bf4c0000(0000) knlGS:0000000000000000
Feb 23 19:38:27 DEV1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 23 19:38:27 DEV1 kernel: CR2: 00007fe524de8000 CR3: 00000051c000a003 CR4: 00000000003606e0
Feb 23 19:38:27 DEV1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 23 19:38:27 DEV1 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 23 19:38:27 DEV1 kernel: Call Trace:
Feb 23 19:38:27 DEV1 kernel:  ? SHA2Update+0x1c3/0x1e0 [icp]
Feb 23 19:38:27 DEV1 kernel:  ? sa_register_update_callback_locked+0x20/0x20 [zfs]
Feb 23 19:38:27 DEV1 kernel:  ? sha_incremental+0x16/0x20 [zfs]
Feb 23 19:38:27 DEV1 kernel:  ? abd_iterate_func+0xa4/0x130 [zfs]
Feb 23 19:38:27 DEV1 kernel:  ? sha_incremental+0x20/0x20 [zfs]
Feb 23 19:38:27 DEV1 kernel:  ? abd_checksum_SHA256+0x51/0xb0 [zfs]
Feb 23 19:38:27 DEV1 kernel:  ? blk_mq_flush_plug_list+0x1c9/0x240
Feb 23 19:38:27 DEV1 kernel:  ? abd_iterate_func+0x73/0x130 [zfs]
Feb 23 19:38:27 DEV1 kernel:  ? zio_checksum_compute+0x1ae/0x300 [zfs]
Feb 23 19:38:27 DEV1 kernel:  ? zio_checksum_generate+0x3f/0x60 [zfs]
Feb 23 19:38:27 DEV1 kernel:  ? zio_execute+0x87/0xe0 [zfs]
Feb 23 19:38:27 DEV1 kernel:  ? taskq_thread+0x295/0x4a0 [spl]
Feb 23 19:38:27 DEV1 kernel:  ? wake_up_q+0x70/0x70
Feb 23 19:38:27 DEV1 kernel:  ? zio_reexecute+0x370/0x370 [zfs]
Feb 23 19:38:27 DEV1 kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Feb 23 19:38:27 DEV1 kernel:  ? kthread+0x11a/0x130
Feb 23 19:38:27 DEV1 kernel:  ? kthread_create_on_node+0x70/0x70
Feb 23 19:38:27 DEV1 kernel:  ? kthread_create_on_node+0x70/0x70
Feb 23 19:38:27 DEV1 kernel:  ? ret_from_fork+0x35/0x40
Feb 23 19:38:27 DEV1 kernel: Code: 64 bd 00 45 31 e9 41 c1 cd 09 41 09 c6 45 31 e9 41 21 c7 44 01 e3 45 21 de 45 01 e1 45 09 fe 48 8d 7f 01 45 01 f1 44 8b 6c 24 30 <44> 8b 64 24 24 45 89 ef 41 c1 ed 03 41 c1 cf 07 45 31 fd 41 c1

@behlendorf behlendorf added this to the 0.8.0 milestone Feb 23, 2018
@morphinz
Copy link
Author

@behlendorf what happens here is that mmp generates very high I/O once a pool contains a large enough number of disks. I am not sure of the exact count, but I observed that mmp behaves well with a 50-disk pool and misbehaves with a 165-disk pool. A full SSD pool of 165 disks shows more than 33,000 I/O when multihost=on, and it becomes idle as soon as multihost is set off. Moreover, it is impossible to export this pool unless multihost is set off first.
BTW, when I set multihost=off, the 100% CPU usage of "z_null_iss" and "mmp" (which causes the soft lockups) drops.
These are all very repeatable.

May I ask for this bugfix in the very next release? I think the 0.8 milestone could be too far off to wait for this one. Let me know if I can help further with this issue.

@behlendorf
Copy link
Contributor

@morphinz thanks for the hint that this is mmp related. Let me tag @ofaaland so he's aware of this issue; he's been looking into resolving mmp issues exactly along these lines. Assuming the fixes don't end up being too invasive, there's a good chance we'll get them into a point release.

@ofaaland
Copy link
Contributor

@morphinz, can you provide the output of
head /sys/module/zfs/parameters/*multihost*
on the machine while it has the 165-disk pool imported and multihost=on? And to confirm, you are seeing 33,000 I/O per second in that configuration?

@morphinz
Copy link
Author

@ofaaland

Multihost=off

[root@A-server~]# zpool get multihost ssdpool
NAME     PROPERTY   VALUE      SOURCE
ssdpool  multihost  off        default
[root@A-server~]# zpool iostat -yl ssdpool 1
              capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub
pool        alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
ssdpool     9.28T   130T  1.28K     94  18.8M   378K  265us  143us  260us  138us    1us    1us    4us      -      -
ssdpool     9.28T   130T  1.41K    105  20.2M   424K  247us   87us  245us   85us    1us    1us    2us      -      -
ssdpool     9.28T   130T  1.39K     47  19.8M   189K  235us   73us  232us   72us    1us    1us    1us      -      -
ssdpool     9.28T   130T  1.32K     58  19.0M   236K  231us   95us  229us   95us    1us    2us    2us      -      -
ssdpool     9.28T   130T  1.53K  2.13K  22.1M  48.1M  222us    3ms  220us    1ms    1us    1us    1us    2ms      -
ssdpool     9.28T   130T  1.51K  3.34K  23.0M  60.9M  260us    2ms  256us  840us    1us    1us    3us    1ms      -
ssdpool     9.28T   130T  1.40K  4.16K  21.3M  63.8M  253us    1ms  249us  412us    1us    1us    4us  672us      -
ssdpool     9.28T   130T  1.76K  5.65K  26.9M   110M  284us    1ms  272us  676us    3us    1us   13us    1ms      -
ssdpool     9.28T   130T  1.41K     47  20.5M   189K  243us   68us  241us   68us    1us    1us    1us      -      -
ssdpool     9.28T   130T  1.43K  2.52K  21.1M  59.1M  242us    2ms  239us    1ms    1us    1us    1us    1ms      -
ssdpool     9.28T   130T  1.41K     58  21.0M   236K  252us   79us  250us   77us    1us    1us    1us      -      -
ssdpool     9.28T   130T  1.38K      0  19.9M      0  231us      -  228us      -    1us      -    2us      -      -
ssdpool     9.28T   130T  1.61K     47  24.3M   189K  227us   81us  225us   80us    1us    1us    2us      -      -

Htop

screenshot 2018-02-28 11 23 32

Multihost=on

[root@A-server~]# zpool set multihost=on ssdpool
[root@A-server~]# zpool iostat -yl ssdpool 1
              capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub
pool        alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
ssdpool     9.28T   130T  1.41K  34.5K  21.6M   173M  330us    1ms  323us    1ms    1us    1us    6us  856us      -
ssdpool     9.28T   130T  1.39K  32.3K  20.9M   129M  261us    1ms  257us    1ms    1us    1us    4us      -      -
ssdpool     9.28T   130T  1.34K  32.5K  19.4M   130M  240us    2ms  236us    2ms    1us    1us    2us      -      -
ssdpool     9.28T   130T  1.29K  31.1K  19.4M   125M  238us    1ms  234us    1ms    1us    1us    2us      -      -
ssdpool     9.28T   130T  1.56K  32.4K  22.8M   130M  237us    1ms  235us    1ms    1us    1us    2us      -      -
ssdpool     9.28T   130T  1.24K  33.5K  18.9M   177M  242us    1ms  236us    1ms    1us    1us    4us    1ms      -
ssdpool     9.28T   130T  1.50K  30.9K  22.7M   124M  244us    2ms  236us    2ms  965ns    1us    6us      -      -

screenshot 2018-02-28 11 28 59

As you can see, the pool's write bandwidth immediately jumps to 120M+.
Before the update (I mean from 0.7.5-4 to 0.7.6) the usage was 100-140 I/O. Right now I see 30K+.

[root@A-server~]# head /sys/module/zfs/parameters/*multihost*
==> /sys/module/zfs/parameters/zfs_multihost_fail_intervals <==
0

==> /sys/module/zfs/parameters/zfs_multihost_history <==
0

==> /sys/module/zfs/parameters/zfs_multihost_import_intervals <==
10

==> /sys/module/zfs/parameters/zfs_multihost_interval <==
1000
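For comparison, my understanding of healthy MMP behavior (a back-of-envelope sketch, not the exact mmp.c logic) is that it touches each leaf vdev about once per zfs_multihost_interval:

```python
# Back-of-envelope estimate of the expected MMP write rate (an
# illustration, not the actual in-kernel calculation): MMP aims to
# write one uberblock per leaf vdev per zfs_multihost_interval.
def expected_mmp_iops(leaf_vdevs, interval_ms=1000):
    """Approximate steady-state MMP writes per second for a healthy pool."""
    return leaf_vdevs * 1000.0 / interval_ms

# A 165-disk pool at the default 1000 ms interval should see ~165
# writes/s (about 1 per disk per second), nowhere near the 30K+
# write IOPS reported above.
print(expected_mmp_iops(165))  # 165.0
```

That rough formula matches the "1 I/O per second per disk" rate reported on healthy systems, which makes the 30K+ figure here stand out as pathological.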

@behlendorf
Copy link
Contributor

behlendorf commented Feb 28, 2018

@morphinz to be clear, you're saying you didn't observe this behavior in 0.7.5-4? Only 0.7.6?

@morphinz
Copy link
Author

@behlendorf Yes, I'm sure. On 0.7.5 the I/O usage was 1 per disk, so I saw 140-160 I/O for mmp.

@behlendorf
Copy link
Contributor

@morphinz on a clean ARCH Linux install running the 4.14.20-1-lts kernel and 0.7.6 from the zfs-linux-lts AUR I was unable to reproduce this issue. I see the expected 1 I/O per second per disk when multihost=on. Is there something else specific about your configuration? Does this issue persist after upgrading the kernel and rebuilding 0.7.6?

@morphinz
Copy link
Author

morphinz commented Mar 9, 2018

@behlendorf I am running ZFS commit 639b189 and the 4.14.23-1-lts kernel.
I have two different pools, fkmmedium and fkmbig.

fkmmedium has 93 disks. Here is the pool status.

zpool status fkmmedium
  pool: fkmmedium
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(5) for details.
  scan: none requested
config:

	NAME                   STATE     READ WRITE CKSUM
	fkmmedium              ONLINE       0     0     0
	  raidz2-0             ONLINE       0     0     0
	    35000c500844bbd4b  ONLINE       0     0     0
	    35000c50095f3fa87  ONLINE       0     0     0
	    35000c500844b8017  ONLINE       0     0     0
	    35000c5009604e493  ONLINE       0     0     0
	    35000c500844b9e1f  ONLINE       0     0     0
	    35000c50095f9e53f  ONLINE       0     0     0
	    35000c50095ff4abf  ONLINE       0     0     0
	  raidz2-1             ONLINE       0     0     0
	    35000c50095bf136b  ONLINE       0     0     0
	    35000c5009604c2ab  ONLINE       0     0     0
	    35000c500844bd82f  ONLINE       0     0     0
	    35000c50095f852b7  ONLINE       0     0     0
	    35000c50095f4e74f  ONLINE       0     0     0
	    35000c5009604c7af  ONLINE       0     0     0
	    35000c50095fe3b53  ONLINE       0     0     0
	  raidz2-2             ONLINE       0     0     0
	    35000c50095f38377  ONLINE       0     0     0
	    35000c50096071d7f  ONLINE       0     0     0
	    35000c500960f1e23  ONLINE       0     0     0
	    35000c500844c14bf  ONLINE       0     0     0
	    35000c5009609d11f  ONLINE       0     0     0
	    35000c500960c0f4b  ONLINE       0     0     0
	    35000c500844bae13  ONLINE       0     0     0
	  raidz2-3             ONLINE       0     0     0
	    35000c500960f3507  ONLINE       0     0     0
	    35000c50095f5e8c7  ONLINE       0     0     0
	    35000c50095f60dd3  ONLINE       0     0     0
	    35000c500844a7353  ONLINE       0     0     0
	    35000c50095f7deb3  ONLINE       0     0     0
	    35000c500960c67e7  ONLINE       0     0     0
	    35000c500960c4a3f  ONLINE       0     0     0
	  raidz2-4             ONLINE       0     0     0
	    35000c5009603592b  ONLINE       0     0     0
	    35000c500844ad437  ONLINE       0     0     0
	    35000c50095f33f07  ONLINE       0     0     0
	    35000c50096034fcb  ONLINE       0     0     0
	    35000c500960925ff  ONLINE       0     0     0
	    35000c500844be673  ONLINE       0     0     0
	    35000c50096125517  ONLINE       0     0     0
	  raidz2-5             ONLINE       0     0     0
	    35000c50095f92327  ONLINE       0     0     0
	    35000c500844bca53  ONLINE       0     0     0
	    35000c50095f4b277  ONLINE       0     0     0
	    35000c500844bd9d3  ONLINE       0     0     0
	    35000c50095f7df27  ONLINE       0     0     0
	    35000c500844bd4bb  ONLINE       0     0     0
	    35000c5009604ee5f  ONLINE       0     0     0
	  raidz2-6             ONLINE       0     0     0
	    35000c50095ddc787  ONLINE       0     0     0
	    35000c50095ff45df  ONLINE       0     0     0
	    35000c50096071ebb  ONLINE       0     0     0
	    35000c5009609dc8b  ONLINE       0     0     0
	    35000c50095f3e27b  ONLINE       0     0     0
	    35000c500844bc42f  ONLINE       0     0     0
	    35000c50095fa4327  ONLINE       0     0     0
	  raidz2-7             ONLINE       0     0     0
	    35000c500960eb903  ONLINE       0     0     0
	    35000c500844b114b  ONLINE       0     0     0
	    35000c50095e30d63  ONLINE       0     0     0
	    35000c500960e81e3  ONLINE       0     0     0
	    35000c500844ba6ff  ONLINE       0     0     0
	    35000c500844ad27f  ONLINE       0     0     0
	    35000c50095f5afef  ONLINE       0     0     0
	  raidz2-8             ONLINE       0     0     0
	    35000c50095f9b4f7  ONLINE       0     0     0
	    35000c500844b127f  ONLINE       0     0     0
	    35000c50095f7d71f  ONLINE       0     0     0
	    35000c50095e27c17  ONLINE       0     0     0
	    35000c500844bdc1f  ONLINE       0     0     0
	    35000c500844b5fab  ONLINE       0     0     0
	    35000c50095e1a507  ONLINE       0     0     0
	  raidz2-9             ONLINE       0     0     0
	    35000c500844bef67  ONLINE       0     0     0
	    35000c50095ede39f  ONLINE       0     0     0
	    35000c50095f843af  ONLINE       0     0     0
	    35000c500844bc45f  ONLINE       0     0     0
	    35000c50095f4b45b  ONLINE       0     0     0
	    35000c500844b81b7  ONLINE       0     0     0
	    35000c500844bc053  ONLINE       0     0     0
	  raidz2-10            ONLINE       0     0     0
	    35000c50095f92a8f  ONLINE       0     0     0
	    35000c50095f8ba23  ONLINE       0     0     0
	    35000c500844be667  ONLINE       0     0     0
	    35000c5009606ed9b  ONLINE       0     0     0
	    35000c500844bf083  ONLINE       0     0     0
	    35000c50095f56c9f  ONLINE       0     0     0
	    35000c500844ad1bb  ONLINE       0     0     0
	  raidz2-11            ONLINE       0     0     0
	    35000c50095e1a287  ONLINE       0     0     0
	    35000c50095f8f597  ONLINE       0     0     0
	    35000c500960f3df3  ONLINE       0     0     0
	    35000c50095fb12b7  ONLINE       0     0     0
	    35000c50095febecf  ONLINE       0     0     0
	    35000c500844b839f  ONLINE       0     0     0
	    35000c5009609287f  ONLINE       0     0     0
	logs
	  35000c5003023fe73    ONLINE       0     0     0
	  35000c5003022e38f    ONLINE       0     0     0
	cache
	  35000c5003023feeb    ONLINE       0     0     0
	  35000c5003023feef    ONLINE       0     0     0
	  35000c5003022e327    ONLINE       0     0     0
	  35000c5003022e2f3    ONLINE       0     0     0
	  35000c5003022e2c3    ONLINE       0     0     0
	spares
	  35000c50095f7b9c3    AVAIL
	  35000c50086636db3    AVAIL

errors: No known data errors

fkmbig has 167 disks. Here is the pool status.

zpool status fkmbig
  pool: fkmbig
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0 days 19:01:45 with 0 errors on Mon Feb 26 14:22:26 2018
config:

	NAME                   STATE     READ WRITE CKSUM
	fkmbig                 ONLINE       0     0     0
	  raidz2-0             ONLINE       0     0     0
	    35000c50093d4c3db  ONLINE       0     0     0
	    35000c50093dda10b  ONLINE       0     0     0
	    35000c50093d4ec27  ONLINE       0     0     0
	    35000c50093d4c8f3  ONLINE       0     0     0
	    35000c50093dd9f43  ONLINE       0     0     0
	    35000c50093d51c43  ONLINE       0     0     0
	    35000c50093cc22ef  ONLINE       0     0     0
	    35000c50093d4ef53  ONLINE       0     0     0
	    35000c50093d4e7c7  ONLINE       0     0     0
	    35000c50093ddcf1f  ONLINE       0     0     0
	  raidz2-1             ONLINE       0     0     0
	    35000c50093d4d0a3  ONLINE       0     0     0
	    35000c50093d5177f  ONLINE       0     0     0
	    35000c50093d5254f  ONLINE       0     0     0
	    35000c50093d50c7b  ONLINE       0     0     0
	    35000c50093cb15b3  ONLINE       0     0     0
	    35000c50093ddd3d7  ONLINE       0     0     0
	    35000c50093d5135f  ONLINE       0     0     0
	    35000c50093d4e0ff  ONLINE       0     0     0
	    35000c50093cbfdc3  ONLINE       0     0     0
	    35000c50093d4d1c3  ONLINE       0     0     0
	  raidz2-2             ONLINE       0     0     0
	    35000c50093cb013f  ONLINE       0     0     0
	    35000c50093d4ff23  ONLINE       0     0     0
	    35000c50093cafcdb  ONLINE       0     0     0
	    35000c50093d51a27  ONLINE       0     0     0
	    35000c50093cac95f  ONLINE       0     0     0
	    35000c50093d51bf7  ONLINE       0     0     0
	    35000c50093d5039f  ONLINE       0     0     0
	    35000c50093d4c6eb  ONLINE       0     0     0
	    35000c50093d4dc8f  ONLINE       0     0     0
	    35000c50093dd7f8f  ONLINE       0     0     0
	  raidz2-3             ONLINE       0     0     0
	    35000c50093d4f027  ONLINE       0     0     0
	    35000c50093cabf6b  ONLINE       0     0     0
	    35000c50093d4c57f  ONLINE       0     0     0
	    35000c50093d4ff47  ONLINE       0     0     0
	    35000c50093d4f127  ONLINE       0     0     0
	    35000c50093cb69d3  ONLINE       0     0     0
	    35000c50093d4dd1b  ONLINE       0     0     0
	    35000c50093ddb0c3  ONLINE       0     0     0
	    35000c50093d51213  ONLINE       0     0     0
	    35000c50093cb4763  ONLINE       0     0     0
	  raidz2-4             ONLINE       0     0     0
	    35000c50093d51aa3  ONLINE       0     0     0
	    35000c50093ddaacf  ONLINE       0     0     0
	    35000c50093d4fdc7  ONLINE       0     0     0
	    35000c50093d4c09f  ONLINE       0     0     0
	    35000c50093d4c353  ONLINE       0     0     0
	    35000c50093caf60f  ONLINE       0     0     0
	    35000c50093d4cba7  ONLINE       0     0     0
	    35000c50093cafc97  ONLINE       0     0     0
	    35000c50093d50313  ONLINE       0     0     0
	    35000c50093cb11b7  ONLINE       0     0     0
	  raidz2-5             ONLINE       0     0     0
	    35000c50093d4fbbb  ONLINE       0     0     0
	    35000c50093d5151f  ONLINE       0     0     0
	    35000c50093d516df  ONLINE       0     0     0
	    35000c50093d506eb  ONLINE       0     0     0
	    35000c50093d50437  ONLINE       0     0     0
	    35000c50099287da7  ONLINE       0     0     0
	    35000c50093dde247  ONLINE       0     0     0
	    35000c50093d4fbf3  ONLINE       0     0     0
	    35000c50093d4dcff  ONLINE       0     0     0
	    35000c50093d5035b  ONLINE       0     0     0
	  raidz2-6             ONLINE       0     0     0
	    35000c50093d4d123  ONLINE       0     0     0
	    35000c50093d4d19f  ONLINE       0     0     0
	    35000c50093cac377  ONLINE       0     0     0
	    35000c50093ddb1bb  ONLINE       0     0     0
	    35000c50093d5063f  ONLINE       0     0     0
	    35000c50093d4d21f  ONLINE       0     0     0
	    35000c50093d4f2f3  ONLINE       0     0     0
	    35000c50093cb8623  ONLINE       0     0     0
	    35000c50093ddeef3  ONLINE       0     0     0
	    35000c50093cacf6f  ONLINE       0     0     0
	  raidz2-7             ONLINE       0     0     0
	    35000c50093d4ebf3  ONLINE       0     0     0
	    35000c50093d51a33  ONLINE       0     0     0
	    35000c50093dd6d1f  ONLINE       0     0     0
	    35000c50093cddcab  ONLINE       0     0     0
	    35000c50093d5208f  ONLINE       0     0     0
	    35000c50093caefe3  ONLINE       0     0     0
	    35000c50093d4db73  ONLINE       0     0     0
	    35000c50093caf9b7  ONLINE       0     0     0
	    35000c50093d4d26f  ONLINE       0     0     0
	    35000c50093cac297  ONLINE       0     0     0
	  raidz2-8             ONLINE       0     0     0
	    35000c50093d50d77  ONLINE       0     0     0
	    35000c50093cacd73  ONLINE       0     0     0
	    35000c50093d502f7  ONLINE       0     0     0
	    35000c50093cb1457  ONLINE       0     0     0
	    35000c50093d4ee0f  ONLINE       0     0     0
	    35000c50093d4c787  ONLINE       0     0     0
	    35000c50093dd8ed7  ONLINE       0     0     0
	    35000c50093cb08f7  ONLINE       0     0     0
	    35000c50093d504d7  ONLINE       0     0     0
	    35000c50093cde84f  ONLINE       0     0     0
	  raidz2-9             ONLINE       0     0     0
	    35000c50093d521d3  ONLINE       0     0     0
	    35000c50093d4d0b3  ONLINE       0     0     0
	    35000c50093d4f7c7  ONLINE       0     0     0
	    35000c50093ca46f7  ONLINE       0     0     0
	    35000c50093d50ed3  ONLINE       0     0     0
	    35000c50093cb9dcb  ONLINE       0     0     0
	    35000c50093dda56b  ONLINE       0     0     0
	    35000c50093cb17fb  ONLINE       0     0     0
	    35000c50093d51417  ONLINE       0     0     0
	    35000c50093ddc737  ONLINE       0     0     0
	  raidz2-10            ONLINE       0     0     0
	    35000c50093d4ea97  ONLINE       0     0     0
	    35000c50093cb7f93  ONLINE       0     0     0
	    35000c50093dd7acf  ONLINE       0     0     0
	    35000c50093cadf9f  ONLINE       0     0     0
	    35000c50093d511a3  ONLINE       0     0     0
	    35000c50093ca8f63  ONLINE       0     0     0
	    35000c50093d4fb9f  ONLINE       0     0     0
	    35000c50093cb9e87  ONLINE       0     0     0
	    35000c50093ddad77  ONLINE       0     0     0
	    35000c50093caf45f  ONLINE       0     0     0
	  raidz2-11            ONLINE       0     0     0
	    35000c50093ddae8b  ONLINE       0     0     0
	    35000c50093caf05b  ONLINE       0     0     0
	    35000c50093dda18b  ONLINE       0     0     0
	    35000c50093cafdf7  ONLINE       0     0     0
	    35000c50093cae9bf  ONLINE       0     0     0
	    35000c50093cade1b  ONLINE       0     0     0
	    35000c50093dd9fb3  ONLINE       0     0     0
	    35000c50093cafafb  ONLINE       0     0     0
	    35000c50093d4fe07  ONLINE       0     0     0
	    35000c50093cb0a9f  ONLINE       0     0     0
	  raidz2-12            ONLINE       0     0     0
	    35000c50093dd989f  ONLINE       0     0     0
	    35000c50093cabf8b  ONLINE       0     0     0
	    35000c50093dd9503  ONLINE       0     0     0
	    35000c50093cb738f  ONLINE       0     0     0
	    35000c50093d4d0ab  ONLINE       0     0     0
	    35000c50093cb896b  ONLINE       0     0     0
	    35000c50093dda993  ONLINE       0     0     0
	    35000c50093d4df8f  ONLINE       0     0     0
	    35000c50093ddb9af  ONLINE       0     0     0
	    35000c50093cac847  ONLINE       0     0     0
	  raidz2-13            ONLINE       0     0     0
	    35000c50093dde76b  ONLINE       0     0     0
	    35000c50093dd943f  ONLINE       0     0     0
	    35000c50093d50b8f  ONLINE       0     0     0
	    35000c50093d4d95f  ONLINE       0     0     0
	    35000c50093ddbf63  ONLINE       0     0     0
	    35000c50093d5024b  ONLINE       0     0     0
	    35000c50093dd9f5f  ONLINE       0     0     0
	    35000c50093d4ebe3  ONLINE       0     0     0
	    35000c50093dd98d7  ONLINE       0     0     0
	    35000c50093d50bd7  ONLINE       0     0     0
	  raidz2-14            ONLINE       0     0     0
	    35000c50093ddbb83  ONLINE       0     0     0
	    35000c50093d4c0eb  ONLINE       0     0     0
	    35000c50093cace4f  ONLINE       0     0     0
	    35000c50098f99333  ONLINE       0     0     0
	    35000c50093d4c77f  ONLINE       0     0     0
	    35000c50093dda193  ONLINE       0     0     0
	    35000c50093dda293  ONLINE       0     0     0
	    35000c50093d4f8ef  ONLINE       0     0     0
	    35000c50093d29e03  ONLINE       0     0     0
	    35000c50093dd6da7  ONLINE       0     0     0
	  raidz2-15            ONLINE       0     0     0
	    35000c50093d4c6f7  ONLINE       0     0     0
	    35000c50093dd863b  ONLINE       0     0     0
	    35000c50093ddc253  ONLINE       0     0     0
	    35000c50093cacbeb  ONLINE       0     0     0
	    35000c50093d5017b  ONLINE       0     0     0
	    35000c50093d3c05f  ONLINE       0     0     0
	    35000c50093d4f10b  ONLINE       0     0     0
	    35000c50093d4fd1b  ONLINE       0     0     0
	    35000c50093d51197  ONLINE       0     0     0
	    35000c50093d51447  ONLINE       0     0     0
	logs
	  35000c50030240077    ONLINE       0     0     0
	  35000c5003023ff07    ONLINE       0     0     0
	cache
	  35000c5003023fef3    ONLINE       0     0     0
	  35000c5003022e38b    ONLINE       0     0     0
	  35000c5003022e2a7    ONLINE       0     0     0
	  35000c50030240053    ONLINE       0     0     0
	  35000c5003024035b    ONLINE       0     0     0

errors: No known data errors

fkmmedium has no problem setting multihost=on.

zpool set multihost=on fkmmedium

zpool iostat 1 fkmmedium shows 50 write IOPS. htop shows no CPU usage.

fkmbig has issues as I said before.

zpool import -o cachefile=none fkmbig
zpool set multihost=on fkmbig

zpool iostat 1 fkmbig shows 32K write IOPS. htop shows 2 of 32 cores stuck at ~100%.

If I set zfs_multihost_interval >= 1621 the issue disappears; zpool iostat and htop are normal. However, if I set zfs_multihost_interval <= 1620 the issue arises again.

On my other cluster system I have 0.7.6 with the 4.14.20-1-lts kernel, plus a 165-disk pool and a 225-disk pool. Both pools have the problem. In my opinion the problem occurs when a pool has more than ~100 disks, so you should not see it unless a single pool contains more than ~100 disks.

@behlendorf this is very easy to reproduce. I can set up private access to the server if needed. Please contact me at [email protected]
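The ~1620 ms threshold is consistent with tick quantization on a CONFIG_HZ=100 kernel. A minimal sketch of the suspected mechanism, using hypothetical names (the real code lives in mmp.c and converts milliseconds to scheduler ticks):

```python
# Hedged sketch of the suspected busy loop (illustrative names, not the
# actual mmp.c code): the per-vdev sleep is interval/leaf_count ms, and
# converting that to scheduler ticks truncates to whole jiffies.
def mmp_sleep_ticks(interval_ms, leaf_vdevs, hz):
    delay_ms = interval_ms // leaf_vdevs  # gap between successive MMP writes
    return delay_ms * hz // 1000          # ms -> ticks, rounding down

# Arch LTS (CONFIG_HZ=100, one jiffy = 10 ms): 1000 ms / 165 leaves is
# ~6 ms, under one jiffy, so the delay truncates to 0 and the thread spins.
print(mmp_sleep_ticks(1000, 165, hz=100))   # 0
# The 93-disk fkmmedium pool still gets a 1-tick delay, so it behaves.
print(mmp_sleep_ticks(1000, 93, hz=100))    # 1
# A CONFIG_HZ=1000 kernel would sleep ~6 ticks for the same large pool.
print(mmp_sleep_ticks(1000, 165, hz=1000))  # 6
```

Under this reading, raising zfs_multihost_interval far enough pushes the per-vdev delay back over one jiffy, which would explain why the symptom vanishes above the observed threshold.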

behlendorf added a commit to behlendorf/zfs that referenced this issue Mar 9, 2018
When a single pool contains more vdevs than the CONFIG_HZ
for the kernel the mmp thread will not delay properly.  Switch
to using cv_timedwait_sig_hires() to handle higher resolution
delays.

This issue was reported on Arch Linux where HZ defaults to only
100 and thus it could be fairly easily reproduced with a reasonably
large pool.  Most distribution kernels set CONFIG_HZ=250 or
CONFIG_HZ=1000 thus are unlikely to be impacted.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#7205
@behlendorf
Copy link
Contributor

@morphinz I see the issue. The Arch Linux LTS kernel defaults to CONFIG_HZ_100, which causes mmp not to fully delay between writes when the number of vdevs goes over 100. Our testing was all done with CONFIG_HZ=1000, which pushes that threshold up to 1000 vdevs per pool.

I've opened #7289, which fixes this issue by switching to the hires timers. This will properly handle low values of HZ and pools with very high numbers of vdevs. The patch is safe to cherry-pick if you'd like to verify it resolves your issue.
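To put a number on that threshold: under the old tick-based delay, the per-vdev sleep stays at least one tick only while the leaf-vdev count is at most interval_ms * HZ / 1000. A hedged sketch (a simplification, not the exact kernel arithmetic):

```python
# Simplified break-even point for the pre-#7289 tick-based MMP delay:
# beyond this many leaf vdevs the per-vdev sleep truncates to zero ticks.
def max_vdevs_without_busy_loop(hz, interval_ms=1000):
    return interval_ms * hz // 1000

print(max_vdevs_without_busy_loop(100))   # 100  (Arch LTS default HZ)
print(max_vdevs_without_busy_loop(1000))  # 1000 (typical distro kernels)
```

With interval_ms around 1620 at HZ=100 this lands near the 162 data+log leaves of the fkmbig pool, which is consistent with the interval threshold reported above.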

tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Mar 12, 2018
When a single pool contains more vdevs than the CONFIG_HZ
for the kernel the mmp thread will not delay properly.  Switch
to using cv_timedwait_sig_hires() to handle higher resolution
delays.

This issue was reported on Arch Linux where HZ defaults to only
100 and this could be fairly easily reproduced with a reasonably
large pool.  Most distribution kernels set CONFIG_HZ=250 or
CONFIG_HZ=1000 and thus are unlikely to be impacted.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#7205 
Closes openzfs#7289
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Mar 12, 2018
When a single pool contains more vdevs than the CONFIG_HZ
for the kernel the mmp thread will not delay properly.  Switch
to using cv_timedwait_sig_hires() to handle higher resolution
delays.

This issue was reported on Arch Linux where HZ defaults to only
100 and this could be fairly easily reproduced with a reasonably
large pool.  Most distribution kernels set CONFIG_HZ=250 or
CONFIG_HZ=1000 and thus are unlikely to be impacted.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#7205
Closes openzfs#7289
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Mar 13, 2018
When a single pool contains more vdevs than the CONFIG_HZ
for the kernel the mmp thread will not delay properly.  Switch
to using cv_timedwait_sig_hires() to handle higher resolution
delays.

This issue was reported on Arch Linux where HZ defaults to only
100 and this could be fairly easily reproduced with a reasonably
large pool.  Most distribution kernels set CONFIG_HZ=250 or
CONFIG_HZ=1000 and thus are unlikely to be impacted.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#7205
Closes openzfs#7289
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Mar 13, 2018
This is a squashed patchset for zfs-0.7.7.  The individual commits are
in the tonyhutter:zfs-0.7.7-hutter branch.  I squashed the commits so
that buildbot wouldn't have to run against each one, and because
github/builbot seem to have a maximum limit of 30 commits they can
test from a PR.

- Fix MMP write frequency for large pools openzfs#7205 openzfs#7289
- Handle zio_resume and mmp => off openzfs#7286
- Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284
- zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261
- Take user namespaces into account in policy checks openzfs#6800 openzfs#7270
- Detect long config lock acquisition in mmp openzfs#7212
- Linux 4.16 compat: get_disk_and_module() openzfs#7264
- Change checksum & IO delay ratelimit values openzfs#7252
- Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176
- Fix some typos openzfs#7237
- Fix zpool(8) list example to match actual format openzfs#7244
- Add SMART self-test results to zpool status -c openzfs#7178
- Add scrub after resilver zed script openzfs#4662 openzfs#7086
- Fix free memory calculation on v3.14+ openzfs#7170
- Report duration and error in mmp_history entries openzfs#7190
- Do not initiate MMP writes while pool is suspended openzfs#7182
- Linux 4.16 compat: use correct *_dec_and_test()
- Allow modprobe to fail when called within systemd openzfs#7174
- Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193
- Correct count_uberblocks in mmp.kshlib openzfs#7191
- Fix config issues: frame size and headers openzfs#7169
- Clarify zinject(8) explanation of -e openzfs#7172
- OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168
- 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154
- contrib/initramfs: add missing conf.d/zfs openzfs#7158
- mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155
- Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054
- Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464
- Fix zdb -E segfault openzfs#7099
- Fix zdb -R decompression openzfs#7099 openzfs#4984
- Fix racy assignment of zcb.zcb_haderrors openzfs#7099
- Fix zle_decompress out of bound access openzfs#7099
- Fix zdb -c traverse stop on damaged objset root openzfs#7099
- Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148
- Linux 4.16 compat: inode_set_iversion() openzfs#7148
- OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141
- Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135
- Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101
- Bug fix in qat_compress.c for vmalloc addr check openzfs#7125
- Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100
- Emit an error message before MMP suspends pool openzfs#7048
- ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977
- Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963
- Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753
- Add support for "--enable-code-coverage" option openzfs#6670
- Make "-fno-inline" compile option more accessible openzfs#6605
- Add configure option to enable gcov analysis openzfs#6642
- Implement --enable-debuginfo to force debuginfo openzfs#2734
- Make --enable-debug fail when given bogus args openzfs#2734

Signed-off-by: Tony Hutter <[email protected]>
Requires-spl: refs/pull/690/head
@morphinz

@behlendorf I've been too busy these days.
I will try the patch as soon as I can.
Thank you for the patch btw.

@behlendorf

@morphinz no problem, the fix will be part of 0.7.7 so you might just want to wait until you can test the tagged release.

tonyhutter pushed a commit that referenced this issue Mar 19, 2018
When a single pool contains more vdevs than the CONFIG_HZ for
the kernel, the mmp thread will not delay properly.  Switch
to using cv_timedwait_sig_hires() to handle higher resolution
delays.

This issue was reported on Arch Linux where HZ defaults to only
100 and this could be fairly easily reproduced with a reasonably
large pool.  Most distribution kernels set CONFIG_HZ=250 or
CONFIG_HZ=1000 and thus are unlikely to be impacted.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7205
Closes #7289