BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0 in zio_remove_child #8390

Closed
tstackhouse opened this issue Feb 9, 2019 · 3 comments
Labels
Status: Stale No recent activity for issue

Comments

@tstackhouse

System information

Type                   Version/Name
Distribution Name      Ubuntu Server (Xenial)
Distribution Version   16.04 LTS
Linux Kernel           4.15.0-15-generic
Architecture           x86_64
ZFS Version            0.7.12-1ubuntu5~16.04.york0
SPL Version            0.7.12-1ubuntu2~16.04.york0

Describe the problem you're observing

Kernel oops with a stack trace, followed by a system reboot (I believe this system is configured to reboot after kernel failures).

Describe how to reproduce the problem

It seems to occur randomly. A scrub was running at the time, but running a scrub does not reliably trigger the issue.

Include any warning/errors/backtraces from the system logs

-- Logs begin at Mon 2019-02-04 12:43:18 EST, end at Sat 2019-02-09 12:00:27 EST. --
Feb 09 11:41:25 tarkus kernel: BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0
Feb 09 11:41:25 tarkus kernel: IP: mutex_lock+0x1d/0x40
Feb 09 11:41:25 tarkus kernel: PGD 0 P4D 0 
Feb 09 11:41:25 tarkus kernel: Oops: 0002 [#1] SMP PTI
Feb 09 11:41:25 tarkus kernel: Modules linked in: ses enclosure scsi_transport_sas uas usb_storage veth xt_nat xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs overlay zfs(POE) zunicode(PO) zavl(PO) icp(POE) zcommon(POE) znvpair(POE) spl(OE) gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm snd_hda_codec_hdmi irqbypass intel_cstate snd_hda_codec_realtek snd_hda_codec_generic intel_rapl_perf snd_hda_intel serio_raw snd_hda_codec joydev input_leds snd_hda_core snd_hwdep snd_pcm snd_timer mei_me snd soundcore mei i2c_i801 shpchp lpc_ich mac_hid ib_iser rdma_cm iw_cm ib_cm
Feb 09 11:41:25 tarkus kernel:  ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic i915 mxm_wmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc video i2c_algo_bit drm_kms_helper syscopyarea sysfillrect aesni_intel usbhid aes_x86_64 crypto_simd sysimgblt glue_helper cryptd hid fb_sys_fops r8169 mii drm pata_acpi wmi
Feb 09 11:41:25 tarkus kernel: CPU: 0 PID: 879 Comm: z_rd_int_6 Tainted: P           OE    4.15.0-15-generic #16~16.04.1-Ubuntu
Feb 09 11:41:25 tarkus kernel: Hardware name: Gigabyte Technology Co., Ltd. Z68XP-UD3/Z68XP-UD3, BIOS F10 03/20/2012
Feb 09 11:41:25 tarkus kernel: RIP: 0010:mutex_lock+0x1d/0x40
Feb 09 11:41:25 tarkus kernel: RSP: 0018:ffff9a39c2243ce0 EFLAGS: 00010246
Feb 09 11:41:25 tarkus kernel: RAX: 0000000000000000 RBX: 00000000000003b0 RCX: 0000000000000000
Feb 09 11:41:25 tarkus kernel: RDX: ffff8e9a4c1f1700 RSI: ffff8e997f717b30 RDI: 00000000000003b0
Feb 09 11:41:25 tarkus kernel: RBP: ffff9a39c2243ce8 R08: 00000000032f7e01 R09: 0000000180550045
Feb 09 11:41:25 tarkus kernel: R10: ffff9a39c2243c98 R11: 0000000000000300 R12: 0000000000000030
Feb 09 11:41:25 tarkus kernel: R13: ffff8e997f704580 R14: ffff8e997f717eb0 R15: 00000000000003b0
Feb 09 11:41:25 tarkus kernel: FS:  0000000000000000(0000) GS:ffff8e9a5fa00000(0000) knlGS:0000000000000000
Feb 09 11:41:25 tarkus kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 09 11:41:25 tarkus kernel: CR2: 00000000000003b0 CR3: 00000001ec20a005 CR4: 00000000000606f0
Feb 09 11:41:25 tarkus kernel: Call Trace:
Feb 09 11:41:25 tarkus kernel:  zio_remove_child+0x52/0x140 [zfs]
Feb 09 11:41:25 tarkus kernel:  zio_done+0x4c4/0xe30 [zfs]
Feb 09 11:41:25 tarkus kernel:  zio_execute+0x95/0xf0 [zfs]
Feb 09 11:41:25 tarkus kernel:  taskq_thread+0x2b0/0x4e0 [spl]
Feb 09 11:41:25 tarkus kernel:  ? wake_up_q+0x70/0x70
Feb 09 11:41:25 tarkus kernel:  ? zio_reexecute+0x3a0/0x3a0 [zfs]
Feb 09 11:41:25 tarkus kernel:  kthread+0x105/0x140
Feb 09 11:41:25 tarkus kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Feb 09 11:41:25 tarkus kernel:  ? kthread_associate_blkcg+0xa0/0xa0
Feb 09 11:41:25 tarkus kernel:  ? do_syscall_64+0x73/0x130
Feb 09 11:41:25 tarkus kernel:  ? SyS_exit_group+0x14/0x20
Feb 09 11:41:25 tarkus kernel:  ret_from_fork+0x35/0x40
Feb 09 11:41:25 tarkus kernel: Code: ff 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 89 fb e8 ae e4 ff ff 65 48 8b 14 25 00 5c 01 00 31 c0 <f0> 48 0f b1 13 48 85 c0 74 08 48 89 df e8 b1 ff ff ff 5b 5d c3 
Feb 09 11:41:25 tarkus kernel: RIP: mutex_lock+0x1d/0x40 RSP: ffff9a39c2243ce0
Feb 09 11:41:25 tarkus kernel: CR2: 00000000000003b0
Feb 09 11:41:25 tarkus kernel: ---[ end trace c727d68c07413430 ]---
@dweeezil
Contributor

This looks like a duplicate of #7168; however, version 0.7.12 already includes the fix, which appears to have been introduced in 0.7.7. AFAIK, there haven't been any similar reports since. If you don't have a reliable reproducer that a developer could use, the best way to diagnose the situation would be to analyze a kernel core image, similar to the analysis done in https://www.illumos.org/issues/8857 (which uses mdb under illumos).

@tstackhouse
Author

It's possible that this was related to a hardware failure my machine was experiencing: the old machine uses non-ECC RAM and was suffering memory failures. I'm waiting on more hardware so I can migrate the array to a new host machine.

@stale

stale bot commented Aug 24, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Aug 24, 2020
@stale stale bot closed this as completed Nov 25, 2020