[PROXMOX] ZFS replication crashes the pool and servers #15375

djgerry300 · 2023-10-09T15:02:54Z

System information

Describe the problem you're observing

ZFS crashes completely when running a zfs send/recv command (booting a VM can sometimes trigger the same crash).
As a result, ZFS replication between servers hangs, the zpool eventually crashes and takes the server down with it, making ZFS completely unusable as a hypervisor backend.

I have 9 servers in this Proxmox cluster, and this message appears only on the 2 most recently rented servers.
I have never had an issue like this before, where the whole server was rendered unusable.

Describe how to reproduce the problem

Do a clean install of ZFS via Proxmox and zfs send a VM image (100GB+) to this server via replication. Next, I do a ZFS replication send to another healthy server in our cluster. Booting the VM mid-transfer sometimes triggers this behaviour, although it's not required.
The crash might occur after 2MB transferred, or it might happen after 75GB.

Include any warning/errors/backtraces from the system logs

I was able to capture the following kernel messages via serial-over-LAN:

[  449.713210] BUG: unable to handle page fault for address: ff40d10c8841ecff
[  449.721134] #PF: supervisor write access in kernel mode
[  449.727145] #PF: error_code(0x0003) - permissions violation
[  449.733576] PGD 6fc7c01067 P4D 6fc7c02067 PUD 1415cd063 PMD 148605063 PTE 800000014841e161
[  449.743073] Oops: 0003 [#1] PREEMPT SMP NOPTI
[  449.748125] CPU: 63 PID: 1541979 Comm: z_rd_int_6 Tainted: P        W  O       6.2.16-15-pve #1
[  449.758118] Hardware name: To Be Filled By O.E.M. SPC741D8QM3-NL/SPC741D8QM3-NL, BIOS 1.05.OV01 06/07/2023
[  449.769200] RIP: 0010:kfpu_begin+0x31/0xa0 [zcommon]
[  449.774947] Code: 3f 48 89 e5 fa 0f 1f 44 00 00 48 8b 15 88 89 00 00 65 8b 05 6d b5 4b 3f 48 98 48 8b 0c c2 0f 1f 44 00 00 b8 ff ff ff ff 89 c2 <0f> c7 29 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc 0f 1f 44
[  449.796396] RSP: 0018:ff778a5309da37f0 EFLAGS: 00010082
[  449.802402] RAX: 00000000ffffffff RBX: ff40d10e30a08000 RCX: ff40d10c8841c000
[  449.810610] RDX: 00000000ffffffff RSI: ff40d10e30a08000 RDI: ff778a5309da3940
[  449.818810] RBP: ff778a5309da37f0 R08: 0000000000000000 R09: 000001767c2bc000
[  449.827018] R10: ff40d10df4f1f500 R11: 0000000000000000 R12: ff40d10e30a0a000
[  449.835208] R13: ff778a5309da3940 R14: 0000000000002000 R15: 0000000000000000
[  449.846413] FS:  0000000000000000(0000) GS:ff40d189bffc0000(0000) knlGS:0000000000000000
[  449.858790] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  449.868474] CR2: ff40d10c8841ecff CR3: 0000006fc6210006 CR4: 0000000000773ee0
[  449.879722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  449.890944] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  449.902084] PKRU: 55555554
[  449.908228] Call Trace:
[  449.914079]  <TASK>
[  449.919531]  ? show_regs+0x6d/0x80
[  449.926378]  ? __die+0x24/0x80
[  449.932777]  ? page_fault_oops+0x176/0x500
[  449.940327]  ? kfpu_begin+0x31/0xa0 [zcommon]
[  449.948169]  ? kernelmode_fixup_or_oops+0xb2/0x140
[  449.956493]  ? __bad_area_nosemaphore+0x1a5/0x2c0
[  449.964742]  ? bad_area_nosemaphore+0x16/0x30
[  449.972530]  ? do_kern_addr_fault+0x7b/0xa0
[  449.980104]  ? exc_page_fault+0x10a/0x1b0
[  449.987429]  ? asm_exc_page_fault+0x27/0x30
[  449.994895]  ? kfpu_begin+0x31/0xa0 [zcommon]
[  450.002506]  fletcher_4_avx512f_native+0x1d/0xb0 [zcommon]
[  450.011397]  abd_fletcher_4_iter+0x71/0xe0 [zcommon]
[  450.019701]  abd_iterate_func+0x104/0x1e0 [zfs]
[  450.027546]  ? __pfx_abd_fletcher_4_iter+0x10/0x10 [zcommon]
[  450.036540]  ? __pfx_abd_fletcher_4_native+0x10/0x10 [zfs]
[  450.045392]  abd_fletcher_4_native+0x89/0xd0 [zfs]
[  450.053384]  zio_checksum_error_impl+0x1b3/0x800 [zfs]
[  450.061709]  ? __slab_free+0xe9/0x2f0
[  450.068256]  ? update_load_avg+0x82/0x810
[  450.075105]  ? __slab_free+0xe9/0x2f0
[  450.081495]  zio_checksum_error+0x6e/0xf0 [zfs]
[  450.088908]  vdev_raidz_io_done+0x225/0x810 [zfs]
[  450.096496]  zio_vdev_io_done+0x81/0x240 [zfs]
[  450.103765]  zio_execute+0x94/0x170 [zfs]
[  450.110540]  taskq_thread+0x2ac/0x4d0 [spl]
[  450.117366]  ? __pfx_default_wake_function+0x10/0x10
[  450.125113]  ? __pfx_zio_execute+0x10/0x10 [zfs]
[  450.132558]  ? __pfx_taskq_thread+0x10/0x10 [spl]
[  450.139982]  kthread+0xe6/0x110
[  450.145583]  ? __pfx_kthread+0x10/0x10
[  450.151828]  ret_from_fork+0x29/0x50
[  450.157847]  </TASK>
[  450.162304] Modules linked in: tcp_diag inet_diag ceph libceph fscache netfs ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ipt_REJECT nf_reject_ipv4 xt_physdev xt_addrtype xt_multiport xt_conntrack xt_comment xt_tcpudp xt_set xt_mark ip_set_hash_net ip_set sctp ip6_udp_tunnel udp_tunnel nvme_fabrics iptable_filter bpfilter softdog nfnetlink_cttimeout bonding openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sunrpc binfmt_misc nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd drm_shmem_helper cryptd cmdlinepart drm_kms_helper pmt_telemetry pmt_crashlog i2c_algo_bit spi_nor mei_me pmt_class intel_sdsi syscopyarea idxd rapl isst_if_mmio
[  450.162359]  isst_if_mbox_pci sysfillrect acpi_ipmi mtd intel_vsec idxd_bus mei isst_if_common intel_cstate sysimgblt ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter joydev input_leds mac_hid isofs zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore drm dmi_sysfs ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear simplefb mlx5_ib ib_uverbs ib_core usbkbd hid_generic usbmouse usbhid hid cdc_ether usbnet mii raid1 mlx5_core xhci_pci nvme mlxfw xhci_pci_renesas psample crc32_pclmul nvme_core tls xhci_hcd ahci spi_intel_pci pci_hyperv_intf i2c_i801 nvme_common libahci spi_intel i2c_ismt i2c_smbus wmi pinctrl_emmitsburg
[  450.397619] CR2: ff40d10c8841ecff
[  450.404964] ---[ end trace 0000000000000000 ]---
[  450.474413] RIP: 0010:kfpu_begin+0x31/0xa0 [zcommon]
[  450.483643] Code: 3f 48 89 e5 fa 0f 1f 44 00 00 48 8b 15 88 89 00 00 65 8b 05 6d b5 4b 3f 48 98 48 8b 0c c2 0f 1f 44 00 00 b8 ff ff ff ff 89 c2 <0f> c7 29 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc 0f 1f 44
[  450.515984] RSP: 0018:ff778a5309da37f0 EFLAGS: 00010082
[  450.525720] RAX: 00000000ffffffff RBX: ff40d10e30a08000 RCX: ff40d10c8841c000
[  450.537675] RDX: 00000000ffffffff RSI: ff40d10e30a08000 RDI: ff778a5309da3940
[  450.549600] RBP: ff778a5309da37f0 R08: 0000000000000000 R09: 000001767c2bc000
[  450.561547] R10: ff40d10df4f1f500 R11: 0000000000000000 R12: ff40d10e30a0a000
[  450.573501] R13: ff778a5309da3940 R14: 0000000000002000 R15: 0000000000000000
[  450.585457] FS:  0000000000000000(0000) GS:ff40d189bffc0000(0000) knlGS:0000000000000000
[  450.598549] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  450.609005] CR2: ff40d10c8841ecff CR3: 0000006fc6210006 CR4: 0000000000773ee0
[  450.621098] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  450.633201] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  450.645216] PKRU: 55555554
[  450.652143] note: z_rd_int_6[1541979] exited with irqs disabled
[  450.662720] note: z_rd_int_6[1541979] exited with preempt_count 1
[  450.673666] x86/split lock detection: #AC: CPU 1/KVM/1539989 took a split_lock trap at address: 0x7ef1e050
[  450.673669] x86/split lock detection: #AC: CPU 2/KVM/1540003 took a split_lock trap at address: 0x7ef1e050
[  450.680318] BUG: unable to handle page fault for address: ff40d10c8841ecff
[  450.680320] #PF: supervisor write access in kernel mode
[  450.680321] #PF: error_code(0x0003) - permissions violation
[  450.680323] PGD 6fc7c01067 P4D 6fc7c02067 PUD 1415cd063 PMD 148605063 PTE 800000014841e161
[  450.680326] Oops: 0003 [#2] PREEMPT SMP NOPTI
[  450.680328] CPU: 63 PID: 1548599 Comm: z_rd_int_5 Tainted: P      D W  O       6.2.16-15-pve #1
[  450.680330] Hardware name: To Be Filled By O.E.M. SPC741D8QM3-NL/SPC741D8QM3-NL, BIOS 1.05.OV01 06/07/2023
[  450.680331] RIP: 0010:kfpu_begin+0x31/0xa0 [zcommon]
[  450.680341] Code: 3f 48 89 e5 fa 0f 1f 44 00 00 48 8b 15 88 89 00 00 65 8b 05 6d b5 4b 3f 48 98 48 8b 0c c2 0f 1f 44 00 00 b8 ff ff ff ff 89 c2 <0f> c7 29 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc 0f 1f 44
[  450.680342] RSP: 0018:ff778a531c8e77f0 EFLAGS: 00010082
[  450.680344] RAX: 00000000ffffffff RBX: ff40d10e2e8c9000 RCX: ff40d10c8841c000
[  450.680345] RDX: 00000000ffffffff RSI: ff40d10e2e8c9000 RDI: ff778a531c8e7940
[  450.680346] RBP: ff778a531c8e77f0 R08: 0000000000000000 R09: 00000035675f6000
[  450.680347] R10: ff40d10e07569d40 R11: 0000000000000000 R12: ff40d10e2e8ca000
[  450.680348] R13: ff778a531c8e7940 R14: 0000000000001000 R15: 0000000000000000
[  450.680348] FS:  0000000000000000(0000) GS:ff40d189bffc0000(0000) knlGS:0000000000000000
[  450.680350] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  450.680351] CR2: ff40d10c8841ecff CR3: 0000006fc6210006 CR4: 0000000000773ee0
[  450.680352] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  450.680353] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  450.680353] PKRU: 55555554
[  450.680354] Call Trace:
[  450.680355]  <TASK>
[  450.680356]  ? show_regs+0x6d/0x80
[  450.680362]  ? __die+0x24/0x80
[  450.680365]  ? page_fault_oops+0x176/0x500
[  450.680368]  ? kfpu_begin+0x31/0xa0 [zcommon]
[  450.680375]  ? kernelmode_fixup_or_oops+0xb2/0x140
[  450.680377]  ? __bad_area_nosemaphore+0x1a5/0x2c0
[  450.680380]  ? bad_area_nosemaphore+0x16/0x30
[  450.680382]  ? do_kern_addr_fault+0x7b/0xa0
[  450.680384]  ? exc_page_fault+0x10a/0x1b0
[  450.680388]  ? asm_exc_page_fault+0x27/0x30
[  450.680392]  ? kfpu_begin+0x31/0xa0 [zcommon]
[  450.680398]  fletcher_4_avx512f_native+0x1d/0xb0 [zcommon]
[  450.680405]  abd_fletcher_4_iter+0x71/0xe0 [zcommon]
[  450.680413]  abd_iterate_func+0x104/0x1e0 [zfs]
[  450.680490]  ? __pfx_abd_fletcher_4_iter+0x10/0x10 [zcommon]
[  450.680497]  ? __pfx_abd_fletcher_4_native+0x10/0x10 [zfs]
[  450.680623]  abd_fletcher_4_native+0x89/0xd0 [zfs]
[  450.680728]  ? update_sd_lb_stats.constprop.0+0x18f/0xf30
[  450.680733]  zio_checksum_error_impl+0x1b3/0x800 [zfs]
[  450.680834]  ? sched_clock+0x9/0x10
[  450.680837]  ? __slab_free+0xe9/0x2f0
[  450.680840]  ? __slab_free+0xe9/0x2f0
[  450.680842]  zio_checksum_error+0x6e/0xf0 [zfs]
[  450.680938]  vdev_raidz_io_done+0x225/0x810 [zfs]
[  450.681057]  zio_vdev_io_done+0x81/0x240 [zfs]
[  450.681167]  zio_execute+0x94/0x170 [zfs]
[  450.681269]  taskq_thread+0x2ac/0x4d0 [spl]
[  450.681280]  ? __pfx_default_wake_function+0x10/0x10
[  450.681283]  ? __pfx_zio_execute+0x10/0x10 [zfs]
[  450.681384]  ? __pfx_taskq_thread+0x10/0x10 [spl]
[  450.681393]  kthread+0xe6/0x110
[  450.681396]  ? __pfx_kthread+0x10/0x10
[  450.681399]  ret_from_fork+0x29/0x50
[  450.681401]  </TASK>
[  450.681402] Modules linked in: tcp_diag inet_diag ceph libceph fscache netfs ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ipt_REJECT nf_reject_ipv4 xt_physdev xt_addrtype xt_multiport xt_conntrack xt_comment xt_tcpudp xt_set xt_mark ip_set_hash_net ip_set sctp ip6_udp_tunnel udp_tunnel nvme_fabrics iptable_filter bpfilter softdog nfnetlink_cttimeout bonding openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sunrpc binfmt_misc nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd drm_shmem_helper cryptd cmdlinepart drm_kms_helper pmt_telemetry pmt_crashlog i2c_algo_bit spi_nor mei_me pmt_class intel_sdsi syscopyarea idxd rapl isst_if_mmio
[  450.681435]  isst_if_mbox_pci sysfillrect acpi_ipmi mtd intel_vsec idxd_bus mei isst_if_common intel_cstate sysimgblt ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter joydev input_leds mac_hid isofs zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore drm dmi_sysfs ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear simplefb mlx5_ib ib_uverbs ib_core usbkbd hid_generic usbmouse usbhid hid cdc_ether usbnet mii raid1 mlx5_core xhci_pci nvme mlxfw xhci_pci_renesas psample crc32_pclmul nvme_core tls xhci_hcd ahci spi_intel_pci pci_hyperv_intf i2c_i801 nvme_common libahci spi_intel i2c_ismt i2c_smbus wmi pinctrl_emmitsburg
[  450.681468] CR2: ff40d10c8841ecff
[  450.681469] ---[ end trace 0000000000000000 ]---
[  450.764515] RIP: 0010:kfpu_begin+0x31/0xa0 [zcommon]
[  450.764527] Code: 3f 48 89 e5 fa 0f 1f 44 00 00 48 8b 15 88 89 00 00 65 8b 05 6d b5 4b 3f 48 98 48 8b 0c c2 0f 1f 44 00 00 b8 ff ff ff ff 89 c2 <0f> c7 29 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc 0f 1f 44
[  450.764528] RSP: 0018:ff778a5309da37f0 EFLAGS: 00010082
[  450.764530] RAX: 00000000ffffffff RBX: ff40d10e30a08000 RCX: ff40d10c8841c000
[  450.764531] RDX: 00000000ffffffff RSI: ff40d10e30a08000 RDI: ff778a5309da3940
[  450.764532] RBP: ff778a5309da37f0 R08: 0000000000000000 R09: 000001767c2bc000
[  450.764533] R10: ff40d10df4f1f500 R11: 0000000000000000 R12: ff40d10e30a0a000
[  450.764535] R13: ff778a5309da3940 R14: 0000000000002000 R15: 0000000000000000
[  450.764536] FS:  0000000000000000(0000) GS:ff40d189bffc0000(0000) knlGS:0000000000000000
[  450.764537] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  450.764538] CR2: ff40d10c8841ecff CR3: 0000006fc6210006 CR4: 0000000000773ee0
[  450.764540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  450.764549] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  450.764550] PKRU: 55555554
[  450.764551] note: z_rd_int_5[1548599] exited with irqs disabled
[  450.764557] note: z_rd_int_5[1548599] exited with preempt_count 1
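
For triage, the repeated oopses in a capture like the one above can be condensed into a count per faulting function. The following is a minimal sketch in Python, assuming the serial console output has been saved as text; `summarize_oops` and both regular expressions are illustrative names written for this log format, not part of any ZFS or kernel tooling.

```python
import re
from collections import Counter

# Matches lines such as:
#   [  449.743073] Oops: 0003 [#1] PREEMPT SMP NOPTI
OOPS_RE = re.compile(r"Oops: (?P<code>[0-9a-f]{4}) \[#(?P<n>\d+)\]")

# Matches lines such as:
#   [  449.769200] RIP: 0010:kfpu_begin+0x31/0xa0 [zcommon]
# The trailing "[module]" part is optional (absent for built-in code).
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?"
)


def summarize_oops(log_text):
    """Return the number of distinct oopses and a count of RIP sites.

    Note: the kernel prints the RIP line more than once per oops
    (once in the initial dump, once after the end-of-trace marker),
    so the per-site counts are line counts, not oops counts.
    """
    oops_numbers = set()
    rip_sites = Counter()
    for line in log_text.splitlines():
        oops = OOPS_RE.search(line)
        if oops:
            oops_numbers.add(int(oops.group("n")))
        rip = RIP_RE.search(line)
        if rip:
            rip_sites[(rip.group("sym"), rip.group("mod"))] += 1
    return {"oops_count": len(oops_numbers), "rip_sites": rip_sites}
```

Calling `summarize_oops(open("console.log").read())` (the file name is hypothetical) gives a quick way to check whether every fault in a capture lands in the same function.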

rincebrain · 2023-10-09T18:36:28Z

Smells like #14989 to me. Try #15168.

djgerry300 · 2023-10-10T09:35:02Z

I compiled and installed ZFS from the 2.2 master branch, and this indeed fixes the issue.
I copied some data (100GB+) between the 2 Sapphire Rapids servers and booted VMs without any issues.
Will this fix be backported to 2.1? I don't know whether Proxmox will follow the 2.2 release schedule and provide these packages.

ThomasLamprecht · 2023-10-13T15:10:21Z

FYI, we backported this fix to our kernel, and doing so resolved at least our local reproducer.

W.r.t. backporting: There are only minimal changes in the variable referenced in the changed line between 2.1.13 and master, so it should be relatively easy to backport when assembling the next ZFS 2.1 stable release – but I can send one too if that's preferred.

FWIW, in Proxmox VE the fix is shipped in proxmox-kernel-6.2.16-17-pve version 6.2.16-17 (or newer).
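
To check a host against the version mentioned above, the kernel release string can be compared numerically. This is a rough sketch, assuming release strings of the form `X.Y.Z-N-pve` (as in the `6.2.16-15-pve` shown in the oops); `parse_pve_release` and `has_fix` are hypothetical helper names, and the check only looks at the running kernel's release string, not at which patches a given build actually contains.

```python
import re

# Threshold taken from the comment above: 6.2.16-17 or newer.
FIXED_RELEASE = (6, 2, 16, 17)


def parse_pve_release(release):
    """Split a string like '6.2.16-15-pve' into (6, 2, 16, 15)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-(\d+)-pve", release)
    if match is None:
        raise ValueError(f"unexpected kernel release format: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(release):
    """True if the release is numerically >= the fixed release.

    Assumption: newer releases keep the fix, per the
    "(or newer)" wording in the comment above.
    """
    return parse_pve_release(release) >= FIXED_RELEASE
```

On a Proxmox VE host, `has_fix(platform.release())` (after `import platform`) would apply the check to the running kernel. Comparing integer tuples rather than strings avoids ordering `6.2.16-9-pve` after `6.2.16-17-pve`.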

djgerry300 added the Type: Defect Incorrect behavior (e.g. crash, hang) label Oct 9, 2023

djgerry300 closed this as completed Oct 10, 2023
