BUG: unable to handle kernel paging request #6334

Closed
edillmann opened this issue Jul 10, 2017 · 5 comments
Labels
Status: Stale No recent activity for issue

Comments

@edillmann
Contributor

System information

Type Version/Name
Distribution Name Proxmox
Distribution Version 5.0
Linux Kernel 4.10.15-15
Architecture amd64
ZFS Version 0.7.0-rc4_96_g0ea05c64f
SPL Version 0.7.0-rc4_5_g7a35f2b

Describe the problem you're observing

I'm observing the following stack trace on the receiving side of a zfs send, but this time while receiving a regular dataset.
I don't know whether this is related to #6330.

Describe how to reproduce the problem

Set up a regular send to a remote system (one send per hour).
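
A minimal sketch of such an hourly job (the dataset, snapshot, and host names here are illustrative, not the exact setup):

# run hourly from cron: snapshot, then incrementally send to the remote box
NOW=$(date +%Y%m%d%H)
zfs snapshot tank/data@$NOW
zfs send -i tank/data@prev tank/data@$NOW | ssh backuphost zfs receive -Fu backup/data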

Include any warning/errors/backtraces from the system logs

[192656.325619] BUG: unable to handle kernel paging request at 0000000004ed4fe1
[192656.326546] IP: dbuf_find+0x92/0x1a0 [zfs]
[192656.327948] PGD 0 

[192656.330805] Oops: 0000 [#1] SMP
[192656.332225] Modules linked in: sch_sfq cls_u32 sch_htb xt_REDIRECT nf_nat_redirect iptable_nat nf_nat_ipv4 nf_nat nf_log_ipv6 xt_hl ip6t_rt nf_log_ipv4 nf_log_common xt_LOG xt_limit veth ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_mark xt_addrtype xt_multiport xt_conntrack ip_set_hash_net ip_set tpm_rng iptable_filter softdog nfnetlink_log nfnetlink arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi pcbc iwlmvm mac80211 aesni_intel iwlwifi aes_x86_64 crypto_simd glue_helper cryptd btusb btrtl btbcm intel_cstate btintel i915 intel_rapl_perf bluetooth snd_hda_codec_realtek
[192656.343321]  snd_hda_codec_generic cfg80211 drm_kms_helper snd_hda_intel pcspkr snd_hda_codec mei_me drm snd_hda_core i2c_algo_bit fb_sys_fops syscopyarea lpc_ich sysfillrect mei sysimgblt snd_hwdep shpchp snd_soc_rt5640 snd_soc_ssm4567 snd_soc_rl6231 snd_soc_core snd_compress ac97_bus dw_dmac snd_pcm_dmaengine snd_soc_sst_acpi video dw_dmac_core snd_pcm snd_soc_sst_match snd_timer elan_i2c snd soundcore acpi_als 8250_dw kfifo_buf mac_hid spi_pxa2xx_platform tpm_crb industrialio acpi_pad vhost_net vhost macvtap macvlan ib_iser rdma_cm iw_cm ib_cm ib_core sunrpc configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_vs_wrr ip_vs nf_conntrack libcrc32c ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) i2c_i801 ahci libahci r8169 mii sdhci_acpi
[192656.356619]  sdhci fjes i2c_hid hid i2c_designware_platform i2c_designware_core
[192656.358622] CPU: 0 PID: 13036 Comm: receive_writer Tainted: P           O    4.10.15-1-pve #1
[192656.360700] Hardware name: GIGABYTE GB-BXi3-5010/MQLP3AP-00, BIOS F4 12/01/2015
[192656.362788] task: ffff88c47cc98000 task.stack: ffffb11cc91c8000
[192656.364922] RIP: 0010:dbuf_find+0x92/0x1a0 [zfs]
[192656.367043] RSP: 0018:ffffb11cc91cbb68 EFLAGS: 00010206
[192656.369142] RAX: ffff88c47cc98000 RBX: 0000000000006010 RCX: 0000000000006000
[192656.371289] RDX: ffffb11ccabf0000 RSI: ffff88c47cc98000 RDI: ffffffffc066ebd0
[192656.373773] RBP: ffffb11cc91cbbc0 R08: 000000000000008c R09: ffff88c60e403040
[192656.376475] R10: 0000000000000000 R11: ffff88c46d1189c0 R12: ffffffffc066ebd0
[192656.378565] R13: 0000000000000000 R14: ffff88c30a0dd800 R15: 0000000004ed4fe1
[192656.380638] FS:  0000000000000000(0000) GS:ffff88c61ec00000(0000) knlGS:0000000000000000
[192656.382663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[192656.384668] CR2: 0000000004ed4fe1 CR3: 0000000292a09000 CR4: 00000000003406f0
[192656.386634] Call Trace:
[192656.388614]  ? dbuf_rele_and_unlock+0x27b/0x470 [zfs]
[192656.390548]  __dbuf_hold_impl+0x42/0x5f0 [zfs]
[192656.392423]  ? spl_kmem_alloc+0x9b/0x170 [spl]
[192656.394266]  dbuf_hold_impl+0xa8/0xd0 [zfs]
[192656.396111]  dnode_next_offset_level+0xb7/0x400 [zfs]
[192656.397989]  dnode_next_offset+0xf0/0x1e0 [zfs]
[192656.399883]  dmu_object_next+0xc2/0x100 [zfs]
[192656.401724]  ? dnode_hold+0x1b/0x20 [zfs]
[192656.403573]  receive_freeobjects.isra.14+0x70/0xd0 [zfs]
[192656.405384]  ? spl_kmem_free+0x33/0x40 [spl]
[192656.407193]  ? mutex_lock+0x12/0x40
[192656.409015]  ? bqueue_dequeue+0xc7/0xf0 [zfs]
[192656.410889]  receive_writer_thread+0x262/0x6e0 [zfs]
[192656.412669]  ? set_curr_task_fair+0x2b/0x60
[192656.414427]  ? spl_kmem_free+0x33/0x40 [spl]
[192656.416224]  ? receive_freeobjects.isra.14+0xd0/0xd0 [zfs]
[192656.417979]  thread_generic_wrapper+0x72/0x80 [spl]
[192656.419769]  kthread+0x109/0x140
[192656.421538]  ? __thread_exit+0x20/0x20 [spl]
[192656.423294]  ? kthread_create_on_node+0x60/0x60
[192656.424989]  ret_from_fork+0x2c/0x40
[192656.426658] Code: 04 25 00 d3 00 00 48 89 83 e8 8b 66 c0 48 89 c6 4e 8b 3c fa 4d 85 ff 75 12 e9 a8 00 00 00 4d 8b 7f 38 4d 85 ff 0f 84 9b 00 00 00 <4d> 3b 2f 75 ee 4d 3b 77 20 75 e8 0f b6 45 cf 41 3a 47 50 75 de 
[192656.430546] RIP: dbuf_find+0x92/0x1a0 [zfs] RSP: ffffb11cc91cbb68
[192656.432228] CR2: 0000000004ed4fe1
[192656.433827] ---[ end trace 64f65c6a692083d5 ]---
@thirtythreeforty

thirtythreeforty commented Aug 20, 2017

I can confirm this issue on CentOS 7 with Linux 3.10.0-514.26.2.el7.x86_64 and ZFS 0.7.1. I'd be happy to send a kernel crash dump to someone interested. In the interest of full disclosure, I am also running a proprietary FusionIO storage driver on an ioDrive Octal, but that does not seem to be causing any issues here—correct me if I'm wrong.

This occurs when receiving a stream created by the same system from a different pool with the same name as the pool I'm receiving into ("octal"). The crash is very reliable and almost immediate. I am receiving pretty fast, saturating gigabit Ethernet from a stream saved on a different machine, although the same thing happens when I rate-limit it to 40MB/s.
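
Roughly what the receive pipeline looks like (host, file, and pool/dataset names are placeholders; pv -L is how I apply the 40MB/s limit):

ssh otherhost cat /streams/octal-full.zstream | pv -L 40m | zfs receive -F octal/restore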

Here's my log:

[  247.661527] BUG: unable to handle kernel paging request at ffffc90027dc6ff8
[  247.661666] IP: [<ffffffff81327313>] memmove+0x73/0x1a0
[  247.661768] PGD 17fc58067 PUD 497c03067 PMD 793273067 PTE 0
[  247.661884] Oops: 0000 [#1] SMP 
[  247.661952] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate intel_powerclamp coretemp kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel ipmi_ssif lrw gf128mul iomemory_vsl(POE) glue_helper ablk_helper cryptd
[  247.663461]  sg iTCO_wdt gpio_ich acpi_power_meter iTCO_vendor_support hpwdt hpilo ipmi_si i7core_edac ipmi_msghandler shpchp pcspkr edac_core tpm_infineon lpc_ich acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic amdkfd amd_iommu_v2 radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common drm crc32c_intel tg3 serio_raw ptp i2c_core pps_core hpsa scsi_transport_sas fjes dm_mirror dm_region_hash dm_log dm_mod
[  247.664538] CPU: 14 PID: 20036 Comm: receive_writer Tainted: P          IOE  ------------   3.10.0-514.26.2.el7.x86_64 #1
[  247.664707] Hardware name: HP ProLiant ML350 G6, BIOS D22 07/02/2013
[  247.664805] task: ffff88075d7f6dd0 ti: ffff88075cf60000 task.ti: ffff88075cf60000
[  247.664921] RIP: 0010:[<ffffffff81327313>]  [<ffffffff81327313>] memmove+0x73/0x1a0
[  247.665051] RSP: 0018:ffff88075cf63c90  EFLAGS: 00010246
[  247.665135] RAX: ffffc90031735000 RBX: ffff8802c3c16000 RCX: 0000000000020000
[  247.665245] RDX: 0000000000020000 RSI: ffffc90027da7000 RDI: ffffc90031735000
[  247.665355] RBP: ffff88075cf63cc8 R08: ffffc90027dc7000 R09: 0000000000000000
[  247.665467] R10: 000000003e801f00 R11: ffffffffa05ce037 R12: 0000000000000000
[  247.665578] R13: 0000000000003c63 R14: 0000000000000000 R15: ffff880313ea1dd0
[  247.665690] FS:  0000000000000000(0000) GS:ffff880407bc0000(0000) knlGS:0000000000000000
[  247.665816] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  247.665906] CR2: ffffc90027dc6ff8 CR3: 0000000766b69000 CR4: 00000000000007e0
[  247.666019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  247.666131] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  247.666242] Stack:
[  247.666276]  ffffffffa1310ae1 ffffffffa05ce037 000000025cf63e30 ffff8802c3c16000
[  247.666417]  0000000000000000 0000000000003c63 0000000000000000 ffff88075cf63d10
[  247.666556]  ffffffffa1310d09 0000000000000000 ffffffffa144be30 ffff88075cf63e30
[  247.666696] Call Trace:
[  247.666804]  [<ffffffffa1310ae1>] ? __dbuf_hold_impl+0x4a1/0x620 [zfs]
[  247.666919]  [<ffffffffa05ce037>] ? spl_kmem_alloc+0xc7/0x170 [spl]
[  247.667060]  [<ffffffffa1310d09>] dbuf_hold_impl+0xa9/0xd0 [zfs]
[  247.667204]  [<ffffffffa1310d65>] dbuf_hold_level+0x35/0x60 [zfs]
[  247.671721]  [<ffffffffa1310da6>] dbuf_hold+0x16/0x20 [zfs]
[  247.676245]  [<ffffffffa131ad79>] dmu_buf_hold_noread+0x89/0x120 [zfs]
[  247.680782]  [<ffffffffa131ae3f>] dmu_buf_hold+0x2f/0x80 [zfs]
[  247.685297]  [<ffffffffa1327e13>] receive_writer_thread+0x3d3/0xa40 [zfs]
[  247.689817]  [<ffffffff810c8395>] ? sched_clock_cpu+0x85/0xc0
[  247.694402]  [<ffffffffa1327a40>] ? receive_freeobjects.isra.12+0x120/0x120 [zfs]
[  247.699028]  [<ffffffffa05cddaa>] ? spl_kmem_free+0x2a/0x40 [spl]
[  247.703728]  [<ffffffffa1327a40>] ? receive_freeobjects.isra.12+0x120/0x120 [zfs]
[  247.708435]  [<ffffffffa05cff91>] thread_generic_wrapper+0x71/0x80 [spl]
[  247.713080]  [<ffffffffa05cff20>] ? __thread_exit+0x20/0x20 [spl]
[  247.717657]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[  247.722201]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[  247.726771]  [<ffffffff81697758>] ret_from_fork+0x58/0x90
[  247.731408]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[  247.735795] Code: 76 20 4c 89 1f 4c 89 57 08 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 a8 00 00 00 0f 1f 84 00 00 00 00 00 48 89 d1 <4c> 8b 5c 16 f8 4c 8d 54 17 f8 48 c1 e9 03 f3 48 a5 4d 89 1a e9 
[  247.745130] RIP  [<ffffffff81327313>] memmove+0x73/0x1a0
[  247.749479]  RSP <ffff88075cf63c90>
[  247.753684] CR2: ffffc90027dc6ff8

Again, let me know if someone wants kdump files.

@DHowett

DHowett commented Sep 9, 2017

I've hit this in 0.7.1 on Linux antares 4.12.0-1-amd64 #1 SMP Debian 4.12.6-1 (2017-08-12) x86_64 GNU/Linux.

It triggers consistently on receipt of a deduplicated compressed stream (from zfs send -Dc), but not on receipt of a compressed-only stream.

Unlike @thirtythreeforty, I'm sending and receiving on the same pool and am not using any unusual hardware or proprietary drivers.
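
To make the two cases concrete (pool/dataset and snapshot names are placeholders; only the first form triggers the panic for me):

# deduplicated + compressed stream: crashes the receiver
zfs send -Dc rpool/data@snap | zfs receive rpool/data-copy

# compressed-only stream: receives fine
zfs send -c rpool/data@snap | zfs receive rpool/data-copy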

@DHowett

DHowett commented Sep 11, 2017

Additional info:

Type Version/Name
Distribution Name Debian
Distribution Version unstable
Linux Kernel 4.12.0-1-amd64
Architecture amd64
ZFS Version 0.7.1-1
SPL Version 0.7.1-1

Panic log from pstore:

Panic#2 Part1
<4>[1806390.773530] R13: 0000000000000000 R14: 0000000000000006 R15: 0000000000000000
<4>[1806390.773557] FS:  0000000000000000(0000) GS:ffff93c45fb00000(0000) knlGS:0000000000000000
<4>[1806390.773586] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[1806390.773608] CR2: ffffa48544948000 CR3: 00000001b2225000 CR4: 00000000001406e0
<4>[1806390.773635] Call Trace:
<4>[1806390.773690]  ? __dbuf_hold_impl+0x4da/0x630 [zfs]
<4>[1806390.773738]  ? dbuf_hold_impl+0xa2/0xd0 [zfs]
<4>[1806390.773784]  ? dbuf_hold+0x2c/0x60 [zfs]
<4>[1806390.773835]  ? dmu_buf_hold_noread+0x87/0x100 [zfs]
<4>[1806390.773884]  ? dmu_buf_hold+0x2a/0x80 [zfs]
<4>[1806390.773931]  ? receive_writer_thread+0x56f/0x970 [zfs]
<4>[1806390.773954]  ? set_curr_task_fair+0x26/0x50
<4>[1806390.773977]  ? thread_generic_wrapper+0x62/0x80 [spl]
<4>[1806390.774028]  ? receive_freeobjects.isra.12+0xf0/0xf0 [zfs]
<4>[1806390.774053]  ? thread_generic_wrapper+0x6d/0x80 [spl]
<4>[1806390.774076]  ? kthread+0xfc/0x130
<4>[1806390.774094]  ? __thread_exit+0x20/0x20 [spl]
<4>[1806390.774113]  ? kthread_create_on_node+0x70/0x70
<4>[1806390.774133]  ? kthread_create_on_node+0x70/0x70
<4>[1806390.774153]  ? ret_from_fork+0x25/0x30
<4>[1806390.774171] Code: 90 90 90 90 90 90 90 48 89 f8 48 83 fa 20 0f 82 03 01 00 00 48 39 fe 7d 0f 49 89 f0 49 01 d0 49 39 f8 0f 8f 9f 00 00 00 48 89 d1 <f3> a4 c3 48 81 fa a8 02 00 00 72 05 40 38 fe 74 3b 48 83 ea 20
<1>[1806390.774272] RIP: __memmove+0x24/0x1a0 RSP: ffffa4852c433d68
<4>[1806390.774293] CR2: ffffa48544948000
<4>[1806390.789765] ---[ end trace 7afb36f89a2b8dfa ]---
<0>[1806391.274347] Kernel panic - not syncing: Fatal exception
<0>[1806391.274377] Kernel Offset: 0x38200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

@chris13524

Getting a similar error: https://gist.github.com/chris13524/440fdcd810895cdfbc3ea233d43d3ce5

It seems to have been caused by a MariaDB Docker container hanging (I'm using the ZFS storage driver for Docker; see the driver check at the end of this comment).

Also, my system now no longer sleeps.

On Ubuntu 17.10.

uname -a
Linux lemon 4.13.0-45-generic #50-Ubuntu SMP Wed May 30 08:23:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
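
For reference, the storage driver is selected the usual way (the daemon.json snippet is just the standard configuration, and docker info reports the driver actually in use):

# /etc/docker/daemon.json (assumes /var/lib/docker lives on a ZFS dataset):
#   { "storage-driver": "zfs" }
docker info --format '{{.Driver}}'
# expected output: zfs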

@stale

stale bot commented Aug 25, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Aug 25, 2020
@stale stale bot closed this as completed Nov 24, 2020