General Protection Fault during scrub #3917

Closed · dasjoe opened this issue Oct 13, 2015 · 5 comments

dasjoe (Contributor) commented Oct 13, 2015

I saw segmentation faults from cron running munin-cron during a scrub, and after SSHing in I found that some commands segfault. syslog recorded GPFs earlier this morning; here are the first few of the 1031 GPF traces recorded so far:

Oct 13 09:11:25 sol kernel: [83706.613786] general protection fault: 0000 [#1] SMP 
Oct 13 09:11:25 sol kernel: [83706.613813] Modules linked in: intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ast sb_edac ttm edac_core drm_kms_helper drm i2c_algo_bit syscopyarea sysfillrect sysimgblt joydev lpc_ich mei_me mei ioatdma ipmi_ssif ipmi_si ipmi_msghandler 8250_fintek wmi shpchp acpi_power_meter acpi_pad mac_hid zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE) spl(OE) zavl(POE) ses enclosure hid_generic ixgbe dca vxlan ip6_udp_tunnel usbhid udp_tunnel ahci mpt2sas ptp hid pps_core libahci raid_class mdio scsi_transport_sas
Oct 13 09:11:25 sol kernel: [83706.614041] CPU: 19 PID: 4592 Comm: z_rd_iss Tainted: P           OE  3.19.0-30-generic #34~14.04.1-Ubuntu
Oct 13 09:11:25 sol kernel: [83706.614069] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.1 04/14/2015
Oct 13 09:11:25 sol kernel: [83706.614091] task: ffff88103020e220 ti: ffff880abcb10000 task.ti: ffff880abcb10000
Oct 13 09:11:25 sol kernel: [83706.614114] RIP: 0010:[<ffffffff811d0a3b>]  [<ffffffff811d0a3b>] __kmalloc_node+0xfb/0x2e0
Oct 13 09:11:25 sol kernel: [83706.614143] RSP: 0018:ffff880abcb139e8  EFLAGS: 00010246
Oct 13 09:11:25 sol kernel: [83706.614160] RAX: 0000000000000000 RBX: 000000000000c210 RCX: 0000000003eb15af
Oct 13 09:11:25 sol kernel: [83706.614181] RDX: 0000000003eb15ae RSI: 0000000000000000 RDI: 0000000000017180
Oct 13 09:11:25 sol kernel: [83706.614202] RBP: ffff880abcb13a38 R08: ffff88207fcf7180 R09: 00ffff880a58e4f4
Oct 13 09:11:25 sol kernel: [83706.614223] R10: ffff88103f803800 R11: ffffffffc03d9b50 R12: 000000000000c210
Oct 13 09:11:25 sol kernel: [83706.614244] R13: 00000000000000c0 R14: 00000000ffffffff R15: ffff88103f803800
Oct 13 09:11:25 sol kernel: [83706.614265] FS:  0000000000000000(0000) GS:ffff88207fce0000(0000) knlGS:0000000000000000
Oct 13 09:11:25 sol kernel: [83706.614288] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 09:11:25 sol kernel: [83706.614305] CR2: 00007f307fab7000 CR3: 0000000001c16000 CR4: 00000000001407e0
Oct 13 09:11:25 sol kernel: [83706.614326] Stack:
Oct 13 09:11:25 sol kernel: [83706.614335]  ffff880abcb139f8 ffffffff8101df59 ffffffffc03d9b50 ffff88103f803800
Oct 13 09:11:25 sol kernel: [83706.614360]  ffff880c073ed780 0000000000000000 000000000000c210 00000000000000c0
Oct 13 09:11:25 sol kernel: [83706.614384]  0000000000000000 0000000000000000 ffff880abcb13a78 ffffffffc03d9b50
Oct 13 09:11:25 sol kernel: [83706.614409] Call Trace:
Oct 13 09:11:25 sol kernel: [83706.614423]  [<ffffffff8101df59>] ? read_tsc+0x9/0x10
Oct 13 09:11:25 sol kernel: [83706.614444]  [<ffffffffc03d9b50>] ? spl_kmem_zalloc+0xd0/0x180 [spl]
Oct 13 09:11:25 sol kernel: [83706.614466]  [<ffffffffc03d9b50>] spl_kmem_zalloc+0xd0/0x180 [spl]
Oct 13 09:11:25 sol kernel: [83706.614513]  [<ffffffffc05718b7>] __vdev_disk_physio+0x67/0x450 [zfs]
Oct 13 09:11:25 sol kernel: [83706.614551]  [<ffffffffc0572192>] vdev_disk_io_start+0xa2/0x200 [zfs]
Oct 13 09:11:25 sol kernel: [83706.614592]  [<ffffffffc05b0277>] zio_vdev_io_start+0xa7/0x2f0 [zfs]
Oct 13 09:11:25 sol kernel: [83706.614637]  [<ffffffffc05b3936>] zio_nowait+0xc6/0x1b0 [zfs]
Oct 13 09:11:25 sol kernel: [83706.614678]  [<ffffffffc0578ed4>] vdev_raidz_io_start+0x174/0x2f0 [zfs]
Oct 13 09:11:25 sol kernel: [83706.614722]  [<ffffffffc05768f0>] ? vdev_raidz_asize+0x60/0x60 [zfs]
Oct 13 09:11:25 sol kernel: [83706.614766]  [<ffffffffc05b0277>] zio_vdev_io_start+0xa7/0x2f0 [zfs]
Oct 13 09:11:25 sol kernel: [83706.614809]  [<ffffffffc05b3936>] zio_nowait+0xc6/0x1b0 [zfs]
Oct 13 09:11:25 sol kernel: [83706.614850]  [<ffffffffc05754f3>] vdev_mirror_io_start+0x163/0x190 [zfs]
Oct 13 09:11:25 sol kernel: [83706.614894]  [<ffffffffc0574a80>] ? vdev_mirror_worst_error+0x80/0x80 [zfs]
Oct 13 09:11:25 sol kernel: [83706.614940]  [<ffffffffc05b03bd>] zio_vdev_io_start+0x1ed/0x2f0 [zfs]
Oct 13 09:11:25 sol kernel: [83706.614984]  [<ffffffffc05b1428>] zio_execute+0xc8/0x180 [zfs]
Oct 13 09:11:25 sol kernel: [83706.615009]  [<ffffffffc03dd10d>] taskq_thread+0x20d/0x410 [spl]
Oct 13 09:11:25 sol kernel: [83706.615033]  [<ffffffff810a0a90>] ? wake_up_state+0x20/0x20
Oct 13 09:11:25 sol kernel: [83706.615057]  [<ffffffffc03dcf00>] ? taskq_cancel_id+0x120/0x120 [spl]
Oct 13 09:11:25 sol kernel: [83706.615083]  [<ffffffff81093822>] kthread+0xd2/0xf0
Oct 13 09:11:25 sol kernel: [83706.615102]  [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0
Oct 13 09:11:25 sol kernel: [83706.615129]  [<ffffffff817b6d98>] ret_from_fork+0x58/0x90
Oct 13 09:11:25 sol kernel: [83706.615151]  [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0
Oct 13 09:11:25 sol kernel: [83706.615173] Code: 48 89 45 c8 0f 1f 44 00 00 48 83 c4 28 48 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 49 63 47 20 48 8d 4a 01 49 8b 3f <49> 8b 1c 01 4c 89 c8 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 53 ff 
Oct 13 09:11:25 sol kernel: [83706.616813] RIP  [<ffffffff811d0a3b>] __kmalloc_node+0xfb/0x2e0
Oct 13 09:11:25 sol kernel: [83706.617553]  RSP <ffff880abcb139e8>
Oct 13 09:11:25 sol kernel: [83706.639850] ---[ end trace bc1706637b26c28b ]---
Oct 13 09:11:26 sol kernel: [83706.835664] general protection fault: 0000 [#2] SMP 
Oct 13 09:11:26 sol kernel: [83706.837046] Modules linked in: intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ast sb_edac ttm edac_core drm_kms_helper drm i2c_algo_bit syscopyarea sysfillrect sysimgblt joydev lpc_ich mei_me mei ioatdma ipmi_ssif ipmi_si ipmi_msghandler 8250_fintek wmi shpchp acpi_power_meter acpi_pad mac_hid zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE) spl(OE) zavl(POE) ses enclosure hid_generic ixgbe dca vxlan ip6_udp_tunnel usbhid udp_tunnel ahci mpt2sas ptp hid pps_core libahci raid_class mdio scsi_transport_sas
Oct 13 09:11:26 sol kernel: [83706.843025] CPU: 19 PID: 4699 Comm: z_rd_int_5 Tainted: P      D    OE  3.19.0-30-generic #34~14.04.1-Ubuntu
Oct 13 09:11:26 sol kernel: [83706.843988] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.1 04/14/2015
Oct 13 09:11:26 sol kernel: [83706.844904] task: ffff8820335d8000 ti: ffff88044e834000 task.ti: ffff88044e834000
Oct 13 09:11:26 sol kernel: [83706.845807] RIP: 0010:[<ffffffff811d0a3b>]  [<ffffffff811d0a3b>] __kmalloc_node+0xfb/0x2e0
Oct 13 09:11:26 sol kernel: [83706.846704] RSP: 0018:ffff88044e837b08  EFLAGS: 00010246
Oct 13 09:11:26 sol kernel: [83706.847588] RAX: 0000000000000000 RBX: 000000000000c210 RCX: 0000000003eb15b1
Oct 13 09:11:26 sol kernel: [83706.848467] RDX: 0000000003eb15b0 RSI: 0000000000000000 RDI: 0000000000017180
Oct 13 09:11:26 sol kernel: [83706.849335] RBP: ffff88044e837b58 R08: ffff88207fcf7180 R09: 00ffff880a58e4f4
Oct 13 09:11:26 sol kernel: [83706.850223] R10: ffff88103f803800 R11: ffffffffc03d9b50 R12: 000000000000c210
Oct 13 09:11:26 sol kernel: [83706.851044] R13: 00000000000000c0 R14: 00000000ffffffff R15: ffff88103f803800
Oct 13 09:11:26 sol kernel: [83706.851861] FS:  0000000000000000(0000) GS:ffff88207fce0000(0000) knlGS:0000000000000000
Oct 13 09:11:26 sol kernel: [83706.852684] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 09:11:26 sol kernel: [83706.853499] CR2: 00007f307fab7000 CR3: 0000000001c16000 CR4: 00000000001407e0
Oct 13 09:11:26 sol kernel: [83706.854354] Stack:
Oct 13 09:11:26 sol kernel: [83706.855195]  ffff882038fb7c00 000000000002e3d1 ffffffffc03d9b50 ffff88103f803800
Oct 13 09:11:26 sol kernel: [83706.856106]  00000000000037ca 0000000000000000 000000000000c210 00000000000000c0
Oct 13 09:11:26 sol kernel: [83706.856991]  0000000000000000 0000000000000000 ffff88044e837b98 ffffffffc03d9b50
Oct 13 09:11:26 sol kernel: [83706.857852] Call Trace:
Oct 13 09:11:26 sol kernel: [83706.858706]  [<ffffffffc03d9b50>] ? spl_kmem_zalloc+0xd0/0x180 [spl]
Oct 13 09:11:26 sol kernel: [83706.859577]  [<ffffffffc03d9b50>] spl_kmem_zalloc+0xd0/0x180 [spl]
Oct 13 09:11:26 sol kernel: [83706.860438]  [<ffffffffc05718b7>] __vdev_disk_physio+0x67/0x450 [zfs]
Oct 13 09:11:26 sol kernel: [83706.861265]  [<ffffffffc03d018d>] ? avl_find+0x5d/0xa0 [zavl]
Oct 13 09:11:26 sol kernel: [83706.862187]  [<ffffffff8101df59>] ? read_tsc+0x9/0x10
Oct 13 09:11:26 sol kernel: [83706.863100]  [<ffffffff810e2936>] ? getrawmonotonic64+0x36/0xd0
Oct 13 09:11:26 sol kernel: [83706.864001]  [<ffffffffc0572192>] vdev_disk_io_start+0xa2/0x200 [zfs]
Oct 13 09:11:26 sol kernel: [83706.864814]  [<ffffffffc05b0277>] zio_vdev_io_start+0xa7/0x2f0 [zfs]
Oct 13 09:11:26 sol kernel: [83706.865639]  [<ffffffffc05b1428>] zio_execute+0xc8/0x180 [zfs]
Oct 13 09:11:26 sol kernel: [83706.866467]  [<ffffffffc05765dd>] vdev_queue_io_done+0x17d/0x270 [zfs]
Oct 13 09:11:26 sol kernel: [83706.867256]  [<ffffffff8101359c>] ? __switch_to+0xdc/0x570
Oct 13 09:11:26 sol kernel: [83706.868070]  [<ffffffffc05b00d8>] zio_vdev_io_done+0x88/0x180 [zfs]
Oct 13 09:11:26 sol kernel: [83706.868939]  [<ffffffffc05b1428>] zio_execute+0xc8/0x180 [zfs]
Oct 13 09:11:26 sol kernel: [83706.869839]  [<ffffffffc03dd10d>] taskq_thread+0x20d/0x410 [spl]
Oct 13 09:11:26 sol kernel: [83706.870750]  [<ffffffff810a0a90>] ? wake_up_state+0x20/0x20
Oct 13 09:11:26 sol kernel: [83706.871637]  [<ffffffffc03dcf00>] ? taskq_cancel_id+0x120/0x120 [spl]
Oct 13 09:11:26 sol kernel: [83706.872522]  [<ffffffff81093822>] kthread+0xd2/0xf0
Oct 13 09:11:26 sol kernel: [83706.873351]  [<ffffffff8109d0a7>] ? finish_task_switch+0x67/0x140
Oct 13 09:11:26 sol kernel: [83706.874197]  [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0
Oct 13 09:11:26 sol kernel: [83706.875057]  [<ffffffff817b6d98>] ret_from_fork+0x58/0x90
Oct 13 09:11:26 sol kernel: [83706.875894]  [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0
Oct 13 09:11:26 sol kernel: [83706.876721] Code: 48 89 45 c8 0f 1f 44 00 00 48 83 c4 28 48 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 49 63 47 20 48 8d 4a 01 49 8b 3f <49> 8b 1c 01 4c 89 c8 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 53 ff 
Oct 13 09:11:26 sol kernel: [83706.878478] RIP  [<ffffffff811d0a3b>] __kmalloc_node+0xfb/0x2e0
Oct 13 09:11:26 sol kernel: [83706.879255]  RSP <ffff88044e837b08>
Oct 13 09:11:26 sol kernel: [83706.880023] ---[ end trace bc1706637b26c28c ]---
Oct 13 09:11:26 sol kernel: [83706.980359] general protection fault: 0000 [#3] SMP 
Oct 13 09:11:26 sol kernel: [83706.981513] Modules linked in: intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ast sb_edac ttm edac_core drm_kms_helper drm i2c_algo_bit syscopyarea sysfillrect sysimgblt joydev lpc_ich mei_me mei ioatdma ipmi_ssif ipmi_si ipmi_msghandler 8250_fintek wmi shpchp acpi_power_meter acpi_pad mac_hid zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE) spl(OE) zavl(POE) ses enclosure hid_generic ixgbe dca vxlan ip6_udp_tunnel usbhid udp_tunnel ahci mpt2sas ptp hid pps_core libahci raid_class mdio scsi_transport_sas
Oct 13 09:11:26 sol kernel: [83706.986904] CPU: 19 PID: 7144 Comm: z_rd_int_4 Tainted: P      D    OE  3.19.0-30-generic #34~14.04.1-Ubuntu
Oct 13 09:11:26 sol kernel: [83706.987700] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.1 04/14/2015
Oct 13 09:11:26 sol kernel: [83706.988489] task: ffff881034ba93a0 ti: ffff882037434000 task.ti: ffff882037434000
Oct 13 09:11:26 sol kernel: [83706.989359] RIP: 0010:[<ffffffff811d0a3b>]  [<ffffffff811d0a3b>] __kmalloc_node+0xfb/0x2e0
Oct 13 09:11:26 sol kernel: [83706.990235] RSP: 0018:ffff882037437b08  EFLAGS: 00010246
Oct 13 09:11:26 sol kernel: [83706.991081] RAX: 0000000000000000 RBX: 000000000000c210 RCX: 0000000003eb15b9
Oct 13 09:11:26 sol kernel: [83706.991909] RDX: 0000000003eb15b8 RSI: 0000000000000000 RDI: 0000000000017180
Oct 13 09:11:26 sol kernel: [83706.992726] RBP: ffff882037437b58 R08: ffff88207fcf7180 R09: 00ffff880a58e4f4
Oct 13 09:11:26 sol kernel: [83706.993507] R10: ffff88103f803800 R11: ffffffffc03d9b50 R12: 000000000000c210
Oct 13 09:11:26 sol kernel: [83706.994283] R13: 00000000000000c0 R14: 00000000ffffffff R15: ffff88103f803800
Oct 13 09:11:26 sol kernel: [83706.995062] FS:  0000000000000000(0000) GS:ffff88207fce0000(0000) knlGS:0000000000000000
Oct 13 09:11:26 sol kernel: [83706.995838] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 09:11:26 sol kernel: [83706.996605] CR2: 00007f307fab7000 CR3: 0000000001c16000 CR4: 00000000001407e0
Oct 13 09:11:26 sol kernel: [83706.997377] Stack:
Oct 13 09:11:26 sol kernel: [83706.998160]  0000000000000000 0000000000000000 ffffffffc03d9b50 ffff88103f803800
Oct 13 09:11:26 sol kernel: [83706.998946]  ffff882037437b98 0000000000000000 000000000000c210 00000000000000c0
Oct 13 09:11:26 sol kernel: [83706.999845]  0000000000000000 0000000000000000 ffff882037437b98 ffffffffc03d9b50
Oct 13 09:11:26 sol kernel: [83707.000636] Call Trace:
Oct 13 09:11:26 sol kernel: [83707.001422]  [<ffffffffc03d9b50>] ? spl_kmem_zalloc+0xd0/0x180 [spl]
Oct 13 09:11:26 sol kernel: [83707.002234]  [<ffffffffc03d9b50>] spl_kmem_zalloc+0xd0/0x180 [spl]
Oct 13 09:11:26 sol kernel: [83707.003070]  [<ffffffffc05718b7>] __vdev_disk_physio+0x67/0x450 [zfs]
Oct 13 09:11:26 sol kernel: [83707.003865]  [<ffffffff817b4c2b>] ? __mutex_lock_slowpath+0x2b/0x100
Oct 13 09:11:26 sol kernel: [83707.004694]  [<ffffffff8101df59>] ? read_tsc+0x9/0x10
Oct 13 09:11:26 sol kernel: [83707.005483]  [<ffffffff810e2936>] ? getrawmonotonic64+0x36/0xd0
Oct 13 09:11:26 sol kernel: [83707.006359]  [<ffffffffc0572192>] vdev_disk_io_start+0xa2/0x200 [zfs]
Oct 13 09:11:26 sol kernel: [83707.007174]  [<ffffffffc05b0277>] zio_vdev_io_start+0xa7/0x2f0 [zfs]
Oct 13 09:11:26 sol kernel: [83707.007987]  [<ffffffffc05b3936>] zio_nowait+0xc6/0x1b0 [zfs]
Oct 13 09:11:26 sol kernel: [83707.008795]  [<ffffffffc0576635>] vdev_queue_io_done+0x1d5/0x270 [zfs]
Oct 13 09:11:26 sol kernel: [83707.009610]  [<ffffffffc05b00d8>] zio_vdev_io_done+0x88/0x180 [zfs]
Oct 13 09:11:26 sol kernel: [83707.010469]  [<ffffffffc05b1428>] zio_execute+0xc8/0x180 [zfs]
Oct 13 09:11:26 sol kernel: [83707.011225]  [<ffffffffc03dd10d>] taskq_thread+0x20d/0x410 [spl]
Oct 13 09:11:26 sol kernel: [83707.011942]  [<ffffffff810a0a90>] ? wake_up_state+0x20/0x20
Oct 13 09:11:26 sol kernel: [83707.012659]  [<ffffffffc03dcf00>] ? taskq_cancel_id+0x120/0x120 [spl]
Oct 13 09:11:26 sol kernel: [83707.013377]  [<ffffffff81093822>] kthread+0xd2/0xf0
Oct 13 09:11:26 sol kernel: [83707.014093]  [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0
Oct 13 09:11:26 sol kernel: [83707.014812]  [<ffffffff817b6d98>] ret_from_fork+0x58/0x90
Oct 13 09:11:26 sol kernel: [83707.015527]  [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0
Oct 13 09:11:26 sol kernel: [83707.016231] Code: 48 89 45 c8 0f 1f 44 00 00 48 83 c4 28 48 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 49 63 47 20 48 8d 4a 01 49 8b 3f <49> 8b 1c 01 4c 89 c8 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 53 ff 
Oct 13 09:11:26 sol kernel: [83707.017730] RIP  [<ffffffff811d0a3b>] __kmalloc_node+0xfb/0x2e0
Oct 13 09:11:26 sol kernel: [83707.018446]  RSP <ffff882037437b08>
Oct 13 09:11:26 sol kernel: [83707.019170] ---[ end trace bc1706637b26c28d ]---

Current zpool status:

  pool: sol-hot
 state: ONLINE
  scan: scrub repaired 0 in 3h50m with 0 errors on Tue Oct 13 01:45:48 2015
config:

    NAME                                            STATE     READ WRITE CKSUM
    sol-hot                                         ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK1130YAG16LBT  ONLINE       0     0     0
        ata-ST2000DM001-9YN164_Z2F02YQN             ONLINE       0     0     0
      mirror-1                                      ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK1130YAGK23BT  ONLINE       0     0     0
        ata-ST2000DM001-9YN164_W1E08XTH             ONLINE       0     0     0
      mirror-2                                      ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK1130YAG435ET  ONLINE       0     0     0
        ata-ST2000DM001-1CH164_W2412010             ONLINE       0     0     0
      mirror-3                                      ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK1130YAG5Y3ET  ONLINE       0     0     0
        ata-ST2000DM001-1CH164_W1E2HNTX             ONLINE       0     0     0
      mirror-4                                      ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK1130YAG5XVET  ONLINE       0     0     0
        ata-ST2000DM001-9YN164_Z1E05KTH             ONLINE       0     0     0
      mirror-5                                      ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK1130YAGK354T  ONLINE       0     0     0
        ata-SAMSUNG_HD204UI_S2H7J90B805022          ONLINE       0     0     0
      mirror-6                                      ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK1130YAGK5XGT  ONLINE       0     0     0
        ata-SAMSUNG_HD204UI_S2H7J90B805157          ONLINE       0     0     0
      mirror-7                                      ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK1130YAG5XZET  ONLINE       0     0     0
        ata-SAMSUNG_HD204UI_S2H7J9JBA00854          ONLINE       0     0     0
      mirror-8                                      ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK1130YAG5XTGT  ONLINE       0     0     0
        ata-SAMSUNG_HD204UI_S2H7J9JBA00855          ONLINE       0     0     0
      mirror-9                                      ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK11B1BFJ4N0TF  ONLINE       0     0     0
        ata-SAMSUNG_HD204UI_S2H7J90B805222          ONLINE       0     0     0
      mirror-10                                     ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK1130YAGK4N7T  ONLINE       0     0     0
        ata-SAMSUNG_HD204UI_S2H7J90B804865          ONLINE       0     0     0
      mirror-11                                     ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK1130YAG137RT  ONLINE       0     0     0
        ata-SAMSUNG_HD204UI_S2H7J9JBA00853          ONLINE       0     0     0
      mirror-12                                     ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK1130YAG5U93T  ONLINE       0     0     0
        ata-SAMSUNG_HD204UI_S2H7J90B805012          ONLINE       0     0     0
      mirror-13                                     ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK1130YAG17T0T  ONLINE       0     0     0
        ata-SAMSUNG_HD204UI_S2H7J90B805197          ONLINE       0     0     0
      mirror-14                                     ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_JK1130YAG562VT  ONLINE       0     0     0
        ata-SAMSUNG_HD204UI_S2H7J90B805053          ONLINE       0     0     0

errors: No known data errors

  pool: sol-rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon Oct 12 21:55:21 2015
config:

    NAME                                                     STATE     READ WRITE CKSUM
    sol-rpool                                                ONLINE       0     0     0
      mirror-0                                               ONLINE       0     0     0
        ata-Samsung_SSD_850_PRO_256GB_S1SUNSAG336649A-part1  ONLINE       0     0     0
        ata-Samsung_SSD_850_PRO_256GB_S1SUNSAG336664B-part1  ONLINE       0     0     0

errors: No known data errors

  pool: sol-tank
 state: ONLINE
  scan: scrub in progress since Mon Oct 12 21:55:17 2015
    59.1T scanned out of 167T at 1.25G/s, 24h31m to go
    0 repaired, 35.38% done
config:

    NAME                           STATE     READ WRITE CKSUM
    sol-tank                       ONLINE       0     0     0
      raidz2-0                     ONLINE       0     0     0
        scsi-35000c500636bfd73     ONLINE       0     0     0
        scsi-35000c500636fc9e3     ONLINE       0     0     0
        scsi-35000c5007eed0cb3     ONLINE       0     0     0
        scsi-35000c5007eec3b5f     ONLINE       0     0     0
        scsi-35000c5006349068b     ONLINE       0     0     0
        scsi-35000c5007ef12397     ONLINE       0     0     0
      raidz2-1                     ONLINE       0     0     0
        scsi-35000c500636fbb83     ONLINE       0     0     0
        scsi-35000c500636fbc63     ONLINE       0     0     0
        scsi-35000c5007ef122cb     ONLINE       0     0     0
        scsi-35000c5006346d1e7     ONLINE       0     0     0
        scsi-35000c500634650af     ONLINE       0     0     0
        scsi-35000c500593110d3     ONLINE       0     0     0
      raidz2-2                     ONLINE       0     0     0
        scsi-35000c500634b6693     ONLINE       0     0     0
        scsi-35000c500636bfadf     ONLINE       0     0     0
        scsi-35000c5005930de5b     ONLINE       0     0     0
        scsi-35000c5007eec3c5f     ONLINE       0     0     0
        scsi-35000c50063446fcf     ONLINE       0     0     0
        scsi-35000c5007eec3abb     ONLINE       0     0     0
      raidz2-3                     ONLINE       0     0     0
        scsi-35000c500634aa9d7     ONLINE       0     0     0
        scsi-35000c5006349d1c7     ONLINE       0     0     0
        scsi-35000c5007eed2223     ONLINE       0     0     0
        scsi-35000c5007eed3147     ONLINE       0     0     0
        scsi-35000c5006348bcdf     ONLINE       0     0     0
        scsi-35000c5007eec5373     ONLINE       0     0     0
      raidz2-4                     ONLINE       0     0     0
        scsi-35000c500634b6523     ONLINE       0     0     0
        scsi-35000c50063450067     ONLINE       0     0     0
        scsi-35000c5007eed395b     ONLINE       0     0     0
        scsi-35000c5007eed1aff     ONLINE       0     0     0
        scsi-35000c500636c085f     ONLINE       0     0     0
        scsi-35000c50063490b07     ONLINE       0     0     0
      raidz2-5                     ONLINE       0     0     0
        scsi-35000c50063476e8f     ONLINE       0     0     0
        scsi-35000c500634b7e73     ONLINE       0     0     0
        scsi-35000c5007ee8331f     ONLINE       0     0     0
        scsi-35000c5007eec559f     ONLINE       0     0     0
        scsi-35000c500636fb9ff     ONLINE       0     0     0
        scsi-35000c5007eec59bb     ONLINE       0     0     0
      raidz2-6                     ONLINE       0     0     0
        scsi-35000c500636fbd07     ONLINE       0     0     0
        scsi-35000c50063452fcb     ONLINE       0     0     0
        scsi-35000c5005930c9c7     ONLINE       0     0     0
        scsi-35000c5006340d98b     ONLINE       0     0     0
        scsi-35000c50059312feb     ONLINE       0     0     0
        scsi-35000c5007eec45df     ONLINE       0     0     0
      raidz2-7                     ONLINE       0     0     0
        scsi-35000c500636c1d27     ONLINE       0     0     0
        scsi-35000c5006349e257     ONLINE       0     0     0
        scsi-35000c500636bfc9b     ONLINE       0     0     0
        scsi-35000c5005930bfe7     ONLINE       0     0     0
        scsi-35000c5007ee847a7     ONLINE       0     0     0
        scsi-35000c5007ef12403     ONLINE       0     0     0
      raidz2-8                     ONLINE       0     0     0
        scsi-35000c500636bfb5f     ONLINE       0     0     0
        scsi-35000c5006348840b     ONLINE       0     0     0
        scsi-35000c5007eec5e8b     ONLINE       0     0     0
        scsi-35000c5005930b3bb     ONLINE       0     0     0
        scsi-35000c5006349336b     ONLINE       0     0     0
        scsi-35000c5007eed1e97     ONLINE       0     0     0
    logs
      mirror-9                     ONLINE       0     0     0
        SSD_S1SUNSAG336649A-part3  ONLINE       0     0     0
        SSD_S1SUNSAG336664B-part3  ONLINE       0     0     0
    cache
      SSD_S1SUNSAG336649A-part4    ONLINE       0     0     0
      SSD_S1SUNSAG336664B-part4    ONLINE       0     0     0

errors: No known data errors
sluitz commented Oct 13, 2015

I have been getting similar GPFs during a scrub/resilver. This is with 0.6.5.2-1 on CentOS 7 / 3.10.0-229.14.1.el7.x86_64. The pool was created on the same machine with 0.6.4.2, and there were no problems before upgrading to 0.6.5.2.

[32955.630307] general protection fault: 0000 [#1] SMP
[32955.630351] Modules linked in: bonding ext4 mbcache jbd2 powernow_k8 pcspkr serio_raw i2c_amd756 i2c_amd8111 i2c_core amd64_edac_mod edac_mce_amd k8temp edac_core amd_rng shpchp binfmt_misc xfs libcrc32c raid1 sr_mod cdrom sd_mod crc_t10dif crct10dif_common usb_storage ata_generic pata_acpi sata_mv pata_amd e1000 libata dm_mirror dm_region_hash dm_log dm_mod zfs(POF) zunicode(POF) zavl(POF) zcommon(POF) znvpair(POF) spl(OF) zlib_deflate
[32955.630654] CPU: 3 PID: 14912 Comm: z_rd_iss Tainted: PF O-------------- 3.10.0-229.14.1.el7.x86_64 #1
[32955.630704] Hardware name: Sun Microsystems Sun Fire X4500/Sun Fire X4500, BIOS 080010 05/24/2007
[32955.630747] task: ffff8803efb716c0 ti: ffff88033ad80000 task.ti: ffff88033ad80000
[32955.630783] RIP: 0010:[] [] __kmalloc_node+0x103/0x280
[32955.630831] RSP: 0018:ffff88033ad83a88 EFLAGS: 00010246
[32955.630856] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000032f01de
[32955.630889] RDX: 00000000032f01dd RSI: 0000000000000000 RDI: 0000000000016440
[32955.630922] RBP: ffff88033ad83ac8 R08: ffff8803ffd16440 R09: ffff8801f7801800
[32955.630956] R10: ffff8801f7801800 R11: ffffffffa000ab60 R12: 000000000000c210
[32955.630988] R13: 00000000000000c0 R14: 00ffff8803efb7a9 R15: 00000000ffffffff
[32955.631014] FS: 00007f1a835fe780(0000) GS:ffff8803ffd00000(0000) knlGS:0000000000000000
[32955.631336] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[32955.631336] CR2: 00007f1a8360b000 CR3: 000000000190a000 CR4: 00000000000007e0
[32955.631647] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[32955.631647] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[32955.631647] Stack:
[32955.631647] ffff8803ee8a9000 ffffffffa000ab60 ffff8801f7801800 0000000000000000
[32955.631647] 000000000000c210 00000000000000c0 0000000000000000 0000000000000000
[32955.631647] ffff88033ad83b00 ffffffffa000ab60 ffff88033a411ea0 ffff8800dce1d000
[32955.631647] Call Trace:
[32955.631647] [] ? spl_kmem_zalloc+0xc0/0x170 [spl]
[32955.631647] [] spl_kmem_zalloc+0xc0/0x170 [spl]
[32955.631647] [] __vdev_disk_physio+0x64/0x440 [zfs]
[32955.631647] [] vdev_disk_io_start+0xa2/0x200 [zfs]
[32955.631647] [] zio_vdev_io_start+0x9f/0x2e0 [zfs]
[32955.631647] [] zio_nowait+0xc6/0x1b0 [zfs]
[32955.631647] [] vdev_raidz_io_start+0x174/0x2f0 [zfs]
[32955.631647] [] ? vdev_raidz_asize+0x60/0x60 [zfs]
[32955.631647] [] zio_vdev_io_start+0x9f/0x2e0 [zfs]
[32955.631647] [] zio_nowait+0xc6/0x1b0 [zfs]
[32955.631647] [] vdev_mirror_io_start+0xa7/0x1a0 [zfs]
[32955.631647] [] ? vdev_config_sync+0x140/0x140 [zfs]
[32955.631647] [] zio_vdev_io_start+0x1dd/0x2e0 [zfs]
[32955.631647] [] zio_execute+0xc8/0x180 [zfs]
[32955.631647] [] taskq_thread+0x21e/0x420 [spl]
[32955.631647] [] ? wake_up_state+0x20/0x20
[32955.631647] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[32955.631647] [] kthread+0xcf/0xe0
[32955.631647] [] ? finish_task_switch+0x53/0x170
[32955.631647] [] ? kthread_create_on_node+0x140/0x140
[32955.631647] [] ret_from_fork+0x58/0x90
[32955.631647] [] ? kthread_create_on_node+0x140/0x140
[32955.631647] Code: 48 89 45 d0 66 66 66 66 90 4c 89 f0 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 49 63 42 20 48 8d 4a 01 49 8b 3a <49> 8b 1c 06 4c 89 f0 48 8d 37 e8 3e 35 13 00 84 c0 0f 84 52 ff
[32955.631647] RIP [] __kmalloc_node+0x103/0x280
[32955.631647] RSP

[root@lz-nfs02-data 127.0.0.1-2015.10.05-13:48:12]# zpool status
  pool: z
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
  scan: scrub repaired 4K in 17h53m with 0 errors on Wed Oct 7 11:21:59 2015
config:

NAME                                                  STATE     READ WRITE CKSUM
z                                                     ONLINE       0     0     0
  raidz3-0                                            ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_084752PXB6_9QJ2PXB6  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_09025328QG_9QJ328QG  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_091153EGNW_9QJ3EGNW  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_091153EXZP_9QJ3EXZP  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_091153F3Z4_9QJ3F3Z4  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_091153GJX1_9QJ3GJX1  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_091253CLPR_9QJ3CLPR  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_091253HM9N_9QJ3HM9N  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_091653GQQQ_9QJ3GQQQ  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_091853TT2S_9QJ3TT2S  ONLINE       0     0     0
  raidz3-1                                            ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_091853V479_9QJ3V479  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_102758A760_9QJ8A760  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_091953ZB4T_9QJ3ZB4T  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_092053ZPK8_9QJ3ZPK8  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_093354WQAW_9QJ4WQAW  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_0936551YG3_9QJ51YG3  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_093655360V_9QJ5360V  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_0936554ETD_9QJ54ETD  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_094155JWQK_9QJ5JWQK  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_094455RVH5_9QJ5RVH5  ONLINE       0     0     0
  raidz3-2                                            ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_094455RVJX_9QJ5RVJX  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_094455RVZZ_9QJ5RVZZ  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_094455RW4Q_9QJ5RW4Q  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_094455RW68_9QJ5RW68  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_094455TMLF_9QJ5TMLF  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_094555P0WT_9QJ5P0WT  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_094555QFJ4_9QJ5QFJ4  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_091153EASG_9QJ3EASG  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_094555RCHF_9QJ5RCHF  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_101657YV5X_9QJ7YV5X  ONLINE       0     0     0
  raidz3-3                                            ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_095156ENYA_9QJ6ENYA  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_095356Q5HJ_9QJ6Q5HJ  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_095356QA1M_9QJ6QA1M  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_100156SKJ4_9QJ6SKJ4  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_1003570CXN_9QJ70CXN  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_094555TV7W_9QJ5TV7W  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_101857Z0L2_9QJ7Z0L2  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_101857Z67A_9QJ7Z67A  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_1026589Q0H_9QJ89Q0H  ONLINE       0     0     0
    ata-SEAGATE_ST31000NSSUN1.0T_102658AGNC_9QJ8AGNC  ONLINE       0     0     0

errors: No known data errors

tuxoko (Contributor) commented Oct 13, 2015

Seems to be the same as #3880, although the reporter closed that issue.

@dasjoe @sluitz Can you try running with the slub_debug boot option? It will slow down the system, but hopefully, when the problem occurs again, we'll know what's wrong.

@behlendorf behlendorf added the Bug label Oct 13, 2015
@behlendorf behlendorf added this to the 0.7.0 milestone Oct 13, 2015
tuxoko (Contributor) commented Oct 13, 2015

Ok, this issue is extremely interesting.

From #3880, we see R14: 02ffff88069b84c9.
Here we see R09: 00ffff880a58e4f4 and R14: 00ffff8803efb7a9

This is too much of a coincidence. There must be something going on here, because it looks like the pointers have been offset by one: each value shows the familiar ffff88... kernel-address pattern shifted over by one byte.

dasjoe (Contributor, Author) commented Oct 13, 2015

@tuxoko sure, I will edit GRUB's configuration to include slub_debug and reboot to make it active when I am at the machine tomorrow. This is a main production box, but slowing it down a bit should be fine.
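
For reference, a minimal sketch of that edit, assuming the stock Ubuntu 14.04 GRUB 2 layout (the existing GRUB_CMDLINE_LINUX_DEFAULT contents will differ from system to system):

    # /etc/default/grub: append slub_debug to the kernel command line
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash slub_debug"

    # regenerate grub.cfg and reboot to make it take effect
    sudo update-grub
    sudo reboot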

behlendorf pushed a commit to behlendorf/zfs that referenced this issue Oct 14, 2015
Currently, vdev_disk_physio_completion will try to wake up a waiter without first checking that one exists. This creates a race window in which complete is called after dr is freed.

We add dr_wait to dio_request to indicate the existence of a waiter. Also, remove dr_rw since no one is using it, and reorder dr_ref to make the struct more compact on 64-bit.

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3917
Issue openzfs#3880
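
A minimal userspace sketch of the pattern the commit describes, assuming nothing beyond the commit message itself: dio_request, dr_ref and dr_wait are its names, while the helpers and the pthread primitives (standing in for the kernel completion API) are purely illustrative, not the actual OpenZFS code.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdlib.h>

    struct dio_request {
        atomic_int      dr_ref;    /* submitter (if waiting) + in-flight bios */
        bool            dr_wait;   /* the fix: set only when a waiter exists */
        int             dr_error;
        pthread_mutex_t dr_lock;
        pthread_cond_t  dr_cv;     /* stands in for the kernel completion */
        bool            dr_done;
    };

    /* Drop one reference; the last holder frees the request. */
    static void dio_put(struct dio_request *dr)
    {
        if (atomic_fetch_sub(&dr->dr_ref, 1) == 1)
            free(dr);
    }

    /* Per-bio completion callback (the vdev_disk_physio_completion analogue). */
    static void dio_complete(struct dio_request *dr, int error)
    {
        bool wait = dr->dr_wait;   /* the fix: sample before dropping the ref */

        if (error != 0)
            dr->dr_error = error;

        dio_put(dr);               /* for an async request this may free dr */

        /*
         * Pre-fix, the waiter was signalled unconditionally at this point,
         * i.e. potentially after dio_put() above had already freed dr.
         * Writing into the recycled slab object corrupts it, which can
         * surface later as general protection faults inside the allocator,
         * much like the __kmalloc_node traces above.  With dr_wait checked,
         * dr is only touched when a waiter exists, and that waiter still
         * holds its own reference, so dr cannot have been freed yet.
         */
        if (wait) {
            pthread_mutex_lock(&dr->dr_lock);
            dr->dr_done = true;
            pthread_cond_signal(&dr->dr_cv);
            pthread_mutex_unlock(&dr->dr_lock);
        }
    }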
behlendorf (Contributor) commented

Likely fixed in 0.6.5.3 by:

aa159af Fix use-after-free in vdev_disk_physio_completion
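
After upgrading, the module version actually loaded can be confirmed with, for example (assuming a standard ZFS on Linux install with the module loaded):

    cat /sys/module/zfs/version
    # or
    modinfo zfs | grep '^version:'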
