General protection fault - userquota_updates_task -- still #9104

Closed
rfehren opened this issue Jul 31, 2019 · 1 comment
Labels
Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)

rfehren commented Jul 31, 2019

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Qlustar |
| Distribution Version | 11.0 |
| Linux Kernel | 4.19.46 |
| Architecture | x86_64 |
| ZFS Version | 0.7.13 |
| SPL Version | 0.7.13 |

Describe the problem you're observing

Same issue as in #7147, and unfortunately not solved by b32f127.

Describe how to reproduce the problem

Triggered randomly on a Lustre MDT, bringing the whole Lustre FS to a halt. Because this happened on MDT 0, the MDT would not finish recovery after a reboot, complaining that the operation "quota_acquire" failed. As a result, recovery had to be aborted, which left a large amount of corrupt data in the FS. In other words, this bug is highly critical.

Include any warning/errors/backtraces from the system logs

Jul 30 14:37:39 cpcsrvh kernel: [4655955.544654] general protection fault: 0000 [#1] SMP NOPTI
Jul 30 14:37:39 cpcsrvh kernel: [4655955.550296] CPU: 7 PID: 7300 Comm: dp_sync_taskq Tainted: P           O      4.19.46-ql-generic-11.0-7 #1
Jul 30 14:37:39 cpcsrvh kernel: [4655955.560073] Hardware name: Supermicro Super Server/H11SSL-C, BIOS 1.0c 10/04/2018
Jul 30 14:37:39 cpcsrvh kernel: [4655955.567793] RIP: 0010:multilist_sublist_remove+0xb/0x30 [zfs]
Jul 30 14:37:39 cpcsrvh kernel: [4655955.573743] Code: 38 49 8d 04 11 48 01 d6 48 8b 50 08 48 89 70 08 48 89 06 48 89 56 08 48 89 32 f3 c3 0f 1f 00 48 03 77 38 48 8b 46 08 48 8b 16 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 06 48 05
Jul 30 14:37:39 cpcsrvh kernel: [4655955.594718] RSP: 0018:ffffad0f8488fd30 EFLAGS: 00010286
Jul 30 14:37:39 cpcsrvh kernel: [4655955.602054] RAX: ffff9d28b70e2340 RBX: ffff9d39fa952800 RCX: 0000000000000000
Jul 30 14:37:39 cpcsrvh kernel: [4655955.611446] RDX: dead000000000100 RSI: ffff9d2b65b2d628 RDI: ffff9d28b70e2300
Jul 30 14:37:39 cpcsrvh kernel: [4655955.620724] RBP: ffff9d2b65b2d648 R08: 0000000000000020 R09: 0000000000000008
Jul 30 14:37:39 cpcsrvh kernel: [4655955.629920] R10: ffffad0f8488fc40 R11: ffffad0f8488fcff R12: ffff9d2b65b2d668
Jul 30 14:37:39 cpcsrvh kernel: [4655955.639233] R13: ffff9d2b65b2d550 R14: ffff9d28b70e2300 R15: ffff9d39fd6a29c0
Jul 30 14:37:39 cpcsrvh kernel: [4655955.648478] FS:  0000000000000000(0000) GS:ffff9d29ffac0000(0000) knlGS:0000000000000000
Jul 30 14:37:39 cpcsrvh kernel: [4655955.658615] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 30 14:37:39 cpcsrvh kernel: [4655955.666349] CR2: 00007ffef54e4038 CR3: 00000007fc20e000 CR4: 00000000003406e0
Jul 30 14:37:39 cpcsrvh kernel: [4655955.675435] Call Trace:
Jul 30 14:37:39 cpcsrvh kernel: [4655955.679826]  userquota_updates_task+0xee/0x510 [zfs]
Jul 30 14:37:39 cpcsrvh kernel: [4655955.686688]  ? __switch_to_asm+0x41/0x70
Jul 30 14:37:39 cpcsrvh kernel: [4655955.692492]  ? dmu_objset_userobjspace_upgradable+0x50/0x50 [zfs]
Jul 30 14:37:39 cpcsrvh kernel: [4655955.700451]  ? dmu_objset_userobjspace_upgradable+0x50/0x50 [zfs]
Jul 30 14:37:39 cpcsrvh kernel: [4655955.708361]  taskq_thread+0x290/0x4a0 [spl]
Jul 30 14:37:39 cpcsrvh kernel: [4655955.714329]  ? wake_up_q+0x70/0x70
Jul 30 14:37:39 cpcsrvh kernel: [4655955.719485]  ? taskq_thread_should_stop+0x70/0x70 [spl]
Jul 30 14:37:39 cpcsrvh kernel: [4655955.726440]  kthread+0x109/0x120
Jul 30 14:37:39 cpcsrvh kernel: [4655955.731372]  ? kthread_park+0x80/0x80
Jul 30 14:37:39 cpcsrvh kernel: [4655955.736708]  ret_from_fork+0x22/0x40
Jul 30 14:37:39 cpcsrvh kernel: [4655955.741924] Modules linked in: ofd(O) ost(O) osp(O) mdd(O) lod(O) mdt(O) lfsck(O) mgs(O) mgc(O) osd_zfs(O) lquota(O) fid(O) fld(O) ptlrpc(O) obdclass(O) ksocklnd(O) lnet(O) libcfs(O) 8021q garp mrp ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 zfs(PO) zunicode(PO) amd64_edac_mod zavl(PO) edac_mce_amd icp(PO) bridge kvm_amd stp llc kvm irqbypass crc32_pclmul pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper zcommon(PO) znvpair(PO) spl(O) ast drm_kms_helper ttm drm sg drm_panel_orientation_quirks agpgart cfbfillrect cfbimgblt cfbcopyarea fb_sys_fops syscopyarea sysfillrect sysimgblt fb font fbdev k10temp sp5100_tco ipmi_si ipmi_devintf ipmi_msghandler evdev pcc_cpufreq mac_hid acpi_cpufreq sch_fq_codel vhost_net tun vhost tap nfsd ip_tables ipv6 ext4
Jul 30 14:37:39 cpcsrvh kernel: [4655955.823719]  jbd2 fscrypto dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid xhci_pci xhci_hcd mpt3sas ixgbe igb ahci i2c_algo_bit mdio crc32c_intel scsi_transport_sas usbcore i2c_piix4 libahci dca usb_common
Jul 30 14:37:39 cpcsrvh kernel: [4655955.855898] ---[ end trace fc6382abad652748 ]---
Jul 30 14:37:39 cpcsrvh kernel: [4655956.000173] RIP: 0010:multilist_sublist_remove+0xb/0x30 [zfs]
Jul 30 14:37:39 cpcsrvh kernel: [4655956.008016] Code: 38 49 8d 04 11 48 01 d6 48 8b 50 08 48 89 70 08 48 89 06 48 89 56 08 48 89 32 f3 c3 0f 1f 00 48 03 77 38 48 8b 46 08 48 8b 16 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 06 48 05
Jul 30 14:37:39 cpcsrvh kernel: [4655956.030259] RSP: 0018:ffffad0f8488fd30 EFLAGS: 00010286
Jul 30 14:37:39 cpcsrvh kernel: [4655956.037296] RAX: ffff9d28b70e2340 RBX: ffff9d39fa952800 RCX: 0000000000000000
Jul 30 14:37:39 cpcsrvh kernel: [4655956.046035] RDX: dead000000000100 RSI: ffff9d2b65b2d628 RDI: ffff9d28b70e2300
Jul 30 14:37:39 cpcsrvh kernel: [4655956.054747] RBP: ffff9d2b65b2d648 R08: 0000000000000020 R09: 0000000000000008
Jul 30 14:37:39 cpcsrvh kernel: [4655956.063558] R10: ffffad0f8488fc40 R11: ffffad0f8488fcff R12: ffff9d2b65b2d668
Jul 30 14:37:39 cpcsrvh kernel: [4655956.072329] R13: ffff9d2b65b2d550 R14: ffff9d28b70e2300 R15: ffff9d39fd6a29c0
Jul 30 14:37:39 cpcsrvh kernel: [4655956.081083] FS:  0000000000000000(0000) GS:ffff9d29ffac0000(0000) knlGS:0000000000000000
Jul 30 14:37:39 cpcsrvh kernel: [4655956.090690] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 30 14:37:39 cpcsrvh kernel: [4655956.097929] CR2: 00007ffef54e4038 CR3: 00000007fc20e000 CR4: 00000000003406e0
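
One detail in the register dump above is worth checking mechanically: RDX holds dead000000000100. On x86_64 kernels built with the default poison offset, that matches LIST_POISON1, the value list_del() writes into the next pointer of a node it has just unlinked. Seeing it at the faulting instruction may indicate that the node was removed from the sublist a second time, though the log alone does not prove that. Below is a minimal Python sketch that scans oops text for that value. The function name, the regular expression and the one-entry poison table are illustrative assumptions, not part of ZFS or the kernel.

```python
import re

# Assumption: x86_64 with the default CONFIG_ILLEGAL_POINTER_VALUE, where
# LIST_POISON1 is 0x100 + 0xdead000000000000 (see include/linux/poison.h).
LIST_POISON = {
    0xDEAD000000000100: "LIST_POISON1",
}

# Matches general-purpose register dumps such as "RDX: dead000000000100".
# RSP/RIP lines carry a segment prefix ("0018:...") and are skipped on purpose.
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}):\s*([0-9a-f]{16})\b")


def find_poisoned_registers(log_text):
    """Return {register: poison_name} for registers holding a known poison value."""
    hits = {}
    for name, value in REG_RE.findall(log_text):
        label = LIST_POISON.get(int(value, 16))
        if label is not None:
            hits[name] = label
    return hits


if __name__ == "__main__":
    # Two register lines copied from the trace above, timestamps stripped.
    sample = (
        "RAX: ffff9d28b70e2340 RBX: ffff9d39fa952800 RCX: 0000000000000000\n"
        "RDX: dead000000000100 RSI: ffff9d2b65b2d628 RDI: ffff9d28b70e2300\n"
    )
    print(find_poisoned_registers(sample))
```

The same function can be fed the full journal output; it ignores the syslog prefix because the pattern only anchors on the register name.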
behlendorf added the "Type: Defect" label (incorrect behavior, e.g. crash, hang) on Aug 21, 2019

stale bot commented Aug 24, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the "Status: Stale" label (no recent activity for issue) on Aug 24, 2020
stale bot closed this as completed on Nov 26, 2020