This repository has been archived by the owner on Apr 18, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 338
V0.88 kernel-3.11.3 BUG: unable to handle kernel NULL pointer dereference at 0000000000000014 #28
Comments
Duplicate of #27 |
duncanje
pushed a commit
to bluca/mptcp
that referenced
this issue
May 20, 2014
Oleksii reported that he had seen an oops similar to this: BUG: unable to handle kernel NULL pointer dereference at 0000000000000088 IP: [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0 PGD 0 Oops: 0000 [multipath-tcp#1] PREEMPT SMP Modules linked in: ipt_MASQUERADE xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables carl9170 ath usb_storage f2fs nfnetlink_log nfnetlink md4 cifs dns_resolver hid_generic usbhid hid af_packet uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev rfcomm btusb bnep bluetooth qmi_wwan qcserial cdc_wdm usb_wwan usbnet usbserial mii snd_hda_codec_hdmi snd_hda_codec_realtek iwldvm mac80211 coretemp intel_powerclamp kvm_intel kvm iwlwifi snd_hda_intel cfg80211 snd_hda_codec xhci_hcd e1000e ehci_pci snd_hwdep sdhci_pci snd_pcm ehci_hcd microcode psmouse sdhci thinkpad_acpi mmc_core i2c_i801 pcspkr usbcore hwmon snd_timer snd_page_alloc snd ptp rfkill pps_core soundcore evdev usb_common vboxnetflt(O) vboxdrv(O)Oops#2 Part8 loop tun binfmt_misc fuse msr acpi_call(O) ipv6 autofs4 CPU: 0 PID: 21612 Comm: kworker/0:1 Tainted: G W O 3.10.1SIGN multipath-tcp#28 Hardware name: LENOVO 2306CTO/2306CTO, BIOS G2ET92WW (2.52 ) 02/22/2013 Workqueue: cifsiod cifs_echo_request [cifs] task: ffff8801e1f416f0 ti: ffff880148744000 task.ti: ffff880148744000 RIP: 0010:[<ffffffff814dcc13>] [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0 RSP: 0000:ffff880148745b00 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880148745b78 RCX: 0000000000000048 RDX: ffff880148745c90 RSI: ffff880181864a00 RDI: ffff880148745b78 RBP: ffff880148745c48 R08: 0000000000000048 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880181864a00 R13: ffff880148745c90 R14: 0000000000000048 R15: 0000000000000048 FS: 0000000000000000(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000088 CR3: 000000020c42c000 CR4: 00000000001407b0 Oops#2 Part7 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff880148745b30 ffffffff810c4af9 0000004848745b30 ffff880181864a00 ffffffff81ffbc40 0000000000000000 ffff880148745c90 ffffffff810a5aab ffff880148745bc0 ffffffff81ffbc40 ffff880148745b60 ffffffff815a9fb8 Call Trace: [<ffffffff810c4af9>] ? finish_task_switch+0x49/0xe0 [<ffffffff810a5aab>] ? lock_timer_base.isra.36+0x2b/0x50 [<ffffffff815a9fb8>] ? _raw_spin_unlock_irqrestore+0x18/0x40 [<ffffffff810a673f>] ? try_to_del_timer_sync+0x4f/0x70 [<ffffffff815aa38f>] ? _raw_spin_unlock_bh+0x1f/0x30 [<ffffffff814dcc87>] kernel_sendmsg+0x37/0x50 [<ffffffffa081a0e0>] smb_send_kvec+0xd0/0x1d0 [cifs] [<ffffffffa081a263>] smb_send_rqst+0x83/0x1f0 [cifs] [<ffffffffa081ab6c>] cifs_call_async+0xec/0x1b0 [cifs] [<ffffffffa08245e0>] ? free_rsp_buf+0x40/0x40 [cifs] Oops#2 Part6 [<ffffffffa082606e>] SMB2_echo+0x8e/0xb0 [cifs] [<ffffffffa0808789>] cifs_echo_request+0x79/0xa0 [cifs] [<ffffffff810b45b3>] process_one_work+0x173/0x4a0 [<ffffffff810b52a1>] worker_thread+0x121/0x3a0 [<ffffffff810b5180>] ? manage_workers.isra.27+0x2b0/0x2b0 [<ffffffff810bae00>] kthread+0xc0/0xd0 [<ffffffff810bad40>] ? kthread_create_on_node+0x120/0x120 [<ffffffff815b199c>] ret_from_fork+0x7c/0xb0 [<ffffffff810bad40>] ? kthread_create_on_node+0x120/0x120 Code: 84 24 b8 00 00 00 4c 89 f1 4c 89 ea 4c 89 e6 48 89 df 4c 89 60 18 48 c7 40 28 00 00 00 00 4c 89 68 30 44 89 70 14 49 8b 44 24 28 <ff> 90 88 00 00 00 3d ef fd ff ff 74 10 48 8d 65 e0 5b 41 5c 41 RIP [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0 RSP <ffff880148745b00> CR2: 0000000000000088 The client was in the middle of trying to send a frame when the server->ssocket pointer got zeroed out. In most places, that we access that pointer, the srv_mutex is held. There's only one spot that I see that the server->ssocket pointer gets set and the srv_mutex isn't held. This patch corrects that. The upstream bug report was here: https://bugzilla.kernel.org/show_bug.cgi?id=60557 Cc: <[email protected]> Reported-by: Oleksii Shevchuk <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Steve French <[email protected]>
duncanje
pushed a commit
to bluca/mptcp
that referenced
this issue
May 20, 2014
As the new x86 CPU bootup printout format code maintainer, I am taking immediate action to improve and clean (and thus indulge my OCD) the reporting of the cores when coming up online. Fix padding to a right-hand alignment, cleanup code and bind reporting width to the max number of supported CPUs on the system, like this: [ 0.074509] smpboot: Booting Node 0, Processors: multipath-tcp#1 multipath-tcp#2 multipath-tcp#3 multipath-tcp#4 multipath-tcp#5 multipath-tcp#6 multipath-tcp#7 OK [ 0.644008] smpboot: Booting Node 1, Processors: multipath-tcp#8 multipath-tcp#9 multipath-tcp#10 multipath-tcp#11 multipath-tcp#12 multipath-tcp#13 multipath-tcp#14 multipath-tcp#15 OK [ 1.245006] smpboot: Booting Node 2, Processors: multipath-tcp#16 multipath-tcp#17 multipath-tcp#18 multipath-tcp#19 multipath-tcp#20 multipath-tcp#21 multipath-tcp#22 multipath-tcp#23 OK [ 1.864005] smpboot: Booting Node 3, Processors: multipath-tcp#24 multipath-tcp#25 multipath-tcp#26 multipath-tcp#27 multipath-tcp#28 multipath-tcp#29 multipath-tcp#30 multipath-tcp#31 OK [ 2.489005] smpboot: Booting Node 4, Processors: multipath-tcp#32 multipath-tcp#33 multipath-tcp#34 multipath-tcp#35 multipath-tcp#36 multipath-tcp#37 multipath-tcp#38 multipath-tcp#39 OK [ 3.093005] smpboot: Booting Node 5, Processors: multipath-tcp#40 multipath-tcp#41 multipath-tcp#42 multipath-tcp#43 multipath-tcp#44 multipath-tcp#45 multipath-tcp#46 multipath-tcp#47 OK [ 3.698005] smpboot: Booting Node 6, Processors: multipath-tcp#48 multipath-tcp#49 multipath-tcp#50 multipath-tcp#51 multipath-tcp#52 multipath-tcp#53 multipath-tcp#54 multipath-tcp#55 OK [ 4.304005] smpboot: Booting Node 7, Processors: multipath-tcp#56 multipath-tcp#57 multipath-tcp#58 multipath-tcp#59 multipath-tcp#60 multipath-tcp#61 multipath-tcp#62 multipath-tcp#63 OK [ 4.961413] Brought up 64 CPUs and this: [ 0.072367] smpboot: Booting Node 0, Processors: multipath-tcp#1 multipath-tcp#2 multipath-tcp#3 multipath-tcp#4 multipath-tcp#5 multipath-tcp#6 multipath-tcp#7 OK [ 0.686329] Brought up 8 CPUs Signed-off-by: Borislav Petkov <[email protected]> Cc: Libin <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
duncanje
pushed a commit
to bluca/mptcp
that referenced
this issue
May 20, 2014
Turn it into (for example): [ 0.073380] x86: Booting SMP configuration: [ 0.074005] .... node #0, CPUs: multipath-tcp#1 multipath-tcp#2 multipath-tcp#3 multipath-tcp#4 multipath-tcp#5 multipath-tcp#6 multipath-tcp#7 [ 0.603005] .... node multipath-tcp#1, CPUs: multipath-tcp#8 multipath-tcp#9 multipath-tcp#10 multipath-tcp#11 multipath-tcp#12 multipath-tcp#13 multipath-tcp#14 multipath-tcp#15 [ 1.200005] .... node multipath-tcp#2, CPUs: multipath-tcp#16 multipath-tcp#17 multipath-tcp#18 multipath-tcp#19 multipath-tcp#20 multipath-tcp#21 multipath-tcp#22 multipath-tcp#23 [ 1.796005] .... node multipath-tcp#3, CPUs: multipath-tcp#24 multipath-tcp#25 multipath-tcp#26 multipath-tcp#27 multipath-tcp#28 multipath-tcp#29 multipath-tcp#30 multipath-tcp#31 [ 2.393005] .... node multipath-tcp#4, CPUs: multipath-tcp#32 multipath-tcp#33 multipath-tcp#34 multipath-tcp#35 multipath-tcp#36 multipath-tcp#37 multipath-tcp#38 multipath-tcp#39 [ 2.996005] .... node multipath-tcp#5, CPUs: multipath-tcp#40 multipath-tcp#41 multipath-tcp#42 multipath-tcp#43 multipath-tcp#44 multipath-tcp#45 multipath-tcp#46 multipath-tcp#47 [ 3.600005] .... node multipath-tcp#6, CPUs: multipath-tcp#48 multipath-tcp#49 multipath-tcp#50 multipath-tcp#51 multipath-tcp#52 multipath-tcp#53 multipath-tcp#54 multipath-tcp#55 [ 4.202005] .... node multipath-tcp#7, CPUs: multipath-tcp#56 multipath-tcp#57 multipath-tcp#58 multipath-tcp#59 multipath-tcp#60 multipath-tcp#61 multipath-tcp#62 multipath-tcp#63 [ 4.811005] .... node multipath-tcp#8, CPUs: multipath-tcp#64 multipath-tcp#65 multipath-tcp#66 multipath-tcp#67 multipath-tcp#68 multipath-tcp#69 multipath-tcp#70 multipath-tcp#71 [ 5.421006] .... node multipath-tcp#9, CPUs: multipath-tcp#72 multipath-tcp#73 multipath-tcp#74 multipath-tcp#75 multipath-tcp#76 multipath-tcp#77 multipath-tcp#78 multipath-tcp#79 [ 6.032005] .... node multipath-tcp#10, CPUs: multipath-tcp#80 multipath-tcp#81 multipath-tcp#82 multipath-tcp#83 multipath-tcp#84 multipath-tcp#85 multipath-tcp#86 multipath-tcp#87 [ 6.648006] .... node multipath-tcp#11, CPUs: multipath-tcp#88 multipath-tcp#89 multipath-tcp#90 multipath-tcp#91 multipath-tcp#92 multipath-tcp#93 multipath-tcp#94 multipath-tcp#95 [ 7.262005] .... node multipath-tcp#12, CPUs: multipath-tcp#96 multipath-tcp#97 multipath-tcp#98 multipath-tcp#99 multipath-tcp#100 multipath-tcp#101 multipath-tcp#102 multipath-tcp#103 [ 7.865005] .... node multipath-tcp#13, CPUs: multipath-tcp#104 multipath-tcp#105 multipath-tcp#106 multipath-tcp#107 multipath-tcp#108 multipath-tcp#109 multipath-tcp#110 multipath-tcp#111 [ 8.466005] .... node multipath-tcp#14, CPUs: multipath-tcp#112 multipath-tcp#113 multipath-tcp#114 multipath-tcp#115 multipath-tcp#116 multipath-tcp#117 multipath-tcp#118 multipath-tcp#119 [ 9.073006] .... node multipath-tcp#15, CPUs: multipath-tcp#120 multipath-tcp#121 multipath-tcp#122 multipath-tcp#123 multipath-tcp#124 multipath-tcp#125 multipath-tcp#126 multipath-tcp#127 [ 9.679901] x86: Booted up 16 nodes, 128 CPUs and drop useless elements. Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening. Signed-off-by: Borislav Petkov <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
duncanje
pushed a commit
to bluca/mptcp
that referenced
this issue
May 20, 2014
The 'driver' field of the i2c_client struct is redundant. The same data can be accessed through to_i2c_driver(client->dev.driver). The generated code for both approaches in more or less the same. E.g. on ARM the expression client->driver->command(...) generates ... ldr r3, [r0, multipath-tcp#28] ldr r3, [r3, multipath-tcp#32] blx r3 ... and the expression to_i2c_driver(client->dev.driver)->command(...) generates ... ldr r3, [r0, multipath-tcp#160] ldr r3, [r3, #-4] blx r3 ... Other architectures will generate similar code. All users of the 'driver' field outside of the I2C core have already been converted. So this only leaves the core itself. This patch converts the remaining few users in the I2C core and then removes the 'driver' field from the i2c_client struct. Signed-off-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
duncanje
pushed a commit
to bluca/mptcp
that referenced
this issue
May 20, 2014
We need to defer the free request until the object/vma is capable of being freed - or else we have a problem when we try to destroy the context. The exact same issue is described and fixed here: commit e207804 Author: Ben Widawsky <[email protected]> Date: Fri Dec 6 14:11:22 2013 -0800 drm/i915: Defer request freeing I had this fix previously, but decided not to keep it for some reason I can no longer remember. gem_reset_stats is a really good test at hitting the problem. For the inquisitive: [ 170.516392] ------------[ cut here ]------------ [ 170.517227] WARNING: CPU: 1 PID: 105 at drivers/gpu/drm/drm_mm.c:578 drm_mm_takedown+0x2e/0x30 [drm]() [ 170.518064] Memory manager not clean during takedown. [ 170.518941] CPU: 1 PID: 105 Comm: kworker/1:1 Not tainted 3.13.0-rc4-BEN+ multipath-tcp#28 [ 170.519787] Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.02 04/27/2012 [ 170.520662] Call Trace: [ 170.521517] [<ffffffff814f0589>] dump_stack+0x4e/0x7a [ 170.522373] [<ffffffff81049e6d>] warn_slowpath_common+0x7d/0xa0 [ 170.523227] [<ffffffff81049edc>] warn_slowpath_fmt+0x4c/0x50 [ 170.524079] [<ffffffffa06c414e>] drm_mm_takedown+0x2e/0x30 [drm] [ 170.524934] [<ffffffffa07213f3>] gen6_ppgtt_cleanup+0x23/0x110 [i915] [ 170.525777] [<ffffffffa07837ed>] ppgtt_release.part.5+0x24/0x29 [i915] [ 170.526603] [<ffffffffa071aaa5>] i915_gem_context_free+0x195/0x1a0 [i915] [ 170.527423] [<ffffffffa071189d>] i915_gem_free_request+0x9d/0xb0 [i915] [ 170.528247] [<ffffffffa0718af9>] i915_gem_reset+0x1f9/0x3f0 [i915] [ 170.529065] [<ffffffffa0700cce>] i915_reset+0x4e/0x180 [i915] [ 170.529870] [<ffffffffa070829d>] i915_error_work_func+0xcd/0x120 [i915] [ 170.530666] [<ffffffff8106c13a>] process_one_work+0x1fa/0x6d0 [ 170.531453] [<ffffffff8106c0d8>] ? process_one_work+0x198/0x6d0 [ 170.532230] [<ffffffff8106c72b>] worker_thread+0x11b/0x3a0 [ 170.532996] [<ffffffff8106c610>] ? process_one_work+0x6d0/0x6d0 [ 170.533771] [<ffffffff810743ef>] kthread+0xff/0x120 [ 170.534548] [<ffffffff810742f0>] ? insert_kthread_work+0x80/0x80 [ 170.535322] [<ffffffff814f97ac>] ret_from_fork+0x7c/0xb0 [ 170.536089] [<ffffffff810742f0>] ? insert_kthread_work+0x80/0x80 [ 170.536847] ---[ end trace 3d4c12892e42d58f ]--- v2: Whitespace fix. (Chris) Note: This is a bug that only hits the ppgtt topic branch but I've figured that doing the request cleanup in this order is generally the right thing to do. Signed-off-by: Ben Widawsky <[email protected]> [danvet: Add a code comment to clarify what's actually going on since the lifetime rules aroung ppgtt cleanup are ... fuzzy a best atm. Also add a note about why we need this.] Signed-off-by: Daniel Vetter <[email protected]>
cpaasch
pushed a commit
that referenced
this issue
Jan 13, 2016
[ Upstream commit ecf5fc6 ] Nikolay has reported a hang when a memcg reclaim got stuck with the following backtrace: PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync" #0 __schedule at ffffffff815ab152 #1 schedule at ffffffff815ab76e #2 schedule_timeout at ffffffff815ae5e5 #3 io_schedule_timeout at ffffffff815aad6a #4 bit_wait_io at ffffffff815abfc6 #5 __wait_on_bit at ffffffff815abda5 #6 wait_on_page_bit at ffffffff8111fd4f #7 shrink_page_list at ffffffff81135445 #8 shrink_inactive_list at ffffffff81135845 #9 shrink_lruvec at ffffffff81135ead #10 shrink_zone at ffffffff811360c3 #11 shrink_zones at ffffffff81136eff #12 do_try_to_free_pages at ffffffff8113712f #13 try_to_free_mem_cgroup_pages at ffffffff811372be #14 try_charge at ffffffff81189423 #15 mem_cgroup_try_charge at ffffffff8118c6f5 #16 __add_to_page_cache_locked at ffffffff8112137d #17 add_to_page_cache_lru at ffffffff81121618 #18 pagecache_get_page at ffffffff8112170b #19 grow_dev_page at ffffffff811c8297 #20 __getblk_slow at ffffffff811c91d6 #21 __getblk_gfp at ffffffff811c92c1 #22 ext4_ext_grow_indepth at ffffffff8124565c #23 ext4_ext_create_new_leaf at ffffffff81246ca8 #24 ext4_ext_insert_extent at ffffffff81246f09 #25 ext4_ext_map_blocks at ffffffff8124a848 #26 ext4_map_blocks at ffffffff8121a5b7 #27 mpage_map_one_extent at ffffffff8121b1fa #28 mpage_map_and_submit_extent at ffffffff8121f07b #29 ext4_writepages at ffffffff8121f6d5 #30 do_writepages at ffffffff8112c490 #31 __filemap_fdatawrite_range at ffffffff81120199 #32 filemap_flush at ffffffff8112041c #33 ext4_alloc_da_blocks at ffffffff81219da1 #34 ext4_rename at ffffffff81229b91 #35 ext4_rename2 at ffffffff81229e32 #36 vfs_rename at ffffffff811a08a5 #37 SYSC_renameat2 at ffffffff811a3ffc #38 sys_renameat2 at ffffffff811a408e #39 sys_rename at ffffffff8119e51e #40 system_call_fastpath at ffffffff815afa89 Dave Chinner has properly pointed out that this is a deadlock in the reclaim code because ext4 doesn't submit pages which are marked by PG_writeback right away. The heuristic was introduced by commit e62e384 ("memcg: prevent OOM with too many dirty pages") and it was applied only when may_enter_fs was specified. The code has been changed by c3b94f4 ("memcg: further prevent OOM with too many dirty pages") which has removed the __GFP_FS restriction with a reasoning that we do not get into the fs code. But this is not sufficient apparently because the fs doesn't necessarily submit pages marked PG_writeback for IO right away. ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily submit the bio. Instead it tries to map more pages into the bio and mpage_map_one_extent might trigger memcg charge which might end up waiting on a page which is marked PG_writeback but hasn't been submitted yet so we would end up waiting for something that never finishes. Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2) before we go to wait on the writeback. The page fault path, which is the only path that triggers memcg oom killer since 3.12, shouldn't require GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue which was originally addressed by the heuristic. As per David Chinner the xfs is doing similar thing since 2.6.15 already so ext4 is not the only affected filesystem. Moreover he notes: : For example: IO completion might require unwritten extent conversion : which executes filesystem transactions and GFP_NOFS allocations. The : writeback flag on the pages can not be cleared until unwritten : extent conversion completes. Hence memory reclaim cannot wait on : page writeback to complete in GFP_NOFS context because it is not : safe to do so, memcg reclaim or otherwise. Cc: [email protected] # 3.9+ [[email protected]: corrected the control flow] Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages") Reported-by: Nikolay Borisov <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
cpaasch
pushed a commit
that referenced
this issue
Jan 13, 2016
[ Upstream commit eefd6b0 ] We register wildcard mmio eventfd on two buses, once for KVM_MMIO_BUS and once on KVM_FAST_MMIO_BUS but with a single iodev instance. This will lead to an issue: kvm_io_bus_destroy() knows nothing about the devices on two buses pointing to a single dev. Which will lead to double free[1] during exit. Fix this by allocating two instances of iodevs then registering one on KVM_MMIO_BUS and another on KVM_FAST_MMIO_BUS. CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013 task: ffff88009ae0c4b0 ti: ffff88020e7f0000 task.ti: ffff88020e7f0000 RIP: 0010:[<ffffffffc07e25d8>] [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm] RSP: 0018:ffff88020e7f3bc8 EFLAGS: 00010292 RAX: dead000000200200 RBX: ffff8801ec19c900 RCX: 000000018200016d RDX: ffff8801ec19cf80 RSI: ffffea0008bf1d40 RDI: ffff8801ec19c900 RBP: ffff88020e7f3bd8 R08: 000000002fc75a01 R09: 000000018200016d R10: ffffffffc07df6ae R11: ffff88022fc75a98 R12: ffff88021e7cc000 R13: ffff88021e7cca48 R14: ffff88021e7cca50 R15: ffff8801ec19c880 FS: 00007fc1ee3e6700(0000) GS:ffff88023e240000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8f389d8000 CR3: 000000023dc13000 CR4: 00000000001427e0 Stack: ffff88021e7cc000 0000000000000000 ffff88020e7f3be8 ffffffffc07e2622 ffff88020e7f3c38 ffffffffc07df69a ffff880232524160 ffff88020e792d80 0000000000000000 ffff880219b78c00 0000000000000008 ffff8802321686a8 Call Trace: [<ffffffffc07e2622>] ioeventfd_destructor+0x12/0x20 [kvm] [<ffffffffc07df69a>] kvm_put_kvm+0xca/0x210 [kvm] [<ffffffffc07df818>] kvm_vcpu_release+0x18/0x20 [kvm] [<ffffffff811f69f7>] __fput+0xe7/0x250 [<ffffffff811f6bae>] ____fput+0xe/0x10 [<ffffffff81093f04>] task_work_run+0xd4/0xf0 [<ffffffff81079358>] do_exit+0x368/0xa50 [<ffffffff81082c8f>] ? recalc_sigpending+0x1f/0x60 [<ffffffff81079ad5>] do_group_exit+0x45/0xb0 [<ffffffff81085c71>] get_signal+0x291/0x750 [<ffffffff810144d8>] do_signal+0x28/0xab0 [<ffffffff810f3a3b>] ? do_futex+0xdb/0x5d0 [<ffffffff810b7028>] ? __wake_up_locked_key+0x18/0x20 [<ffffffff810f3fa6>] ? SyS_futex+0x76/0x170 [<ffffffff81014fc9>] do_notify_resume+0x69/0xb0 [<ffffffff817cb9af>] int_signal+0x12/0x17 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00 RIP [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm] RSP <ffff88020e7f3bc8> Cc: [email protected] Cc: Gleb Natapov <[email protected]> Cc: Paolo Bonzini <[email protected]> Signed-off-by: Jason Wang <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
cpaasch
pushed a commit
that referenced
this issue
Mar 19, 2016
commit ecf5fc6 upstream. Nikolay has reported a hang when a memcg reclaim got stuck with the following backtrace: PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync" #0 __schedule at ffffffff815ab152 #1 schedule at ffffffff815ab76e #2 schedule_timeout at ffffffff815ae5e5 #3 io_schedule_timeout at ffffffff815aad6a #4 bit_wait_io at ffffffff815abfc6 #5 __wait_on_bit at ffffffff815abda5 #6 wait_on_page_bit at ffffffff8111fd4f #7 shrink_page_list at ffffffff81135445 #8 shrink_inactive_list at ffffffff81135845 #9 shrink_lruvec at ffffffff81135ead #10 shrink_zone at ffffffff811360c3 #11 shrink_zones at ffffffff81136eff #12 do_try_to_free_pages at ffffffff8113712f #13 try_to_free_mem_cgroup_pages at ffffffff811372be #14 try_charge at ffffffff81189423 #15 mem_cgroup_try_charge at ffffffff8118c6f5 #16 __add_to_page_cache_locked at ffffffff8112137d #17 add_to_page_cache_lru at ffffffff81121618 #18 pagecache_get_page at ffffffff8112170b #19 grow_dev_page at ffffffff811c8297 #20 __getblk_slow at ffffffff811c91d6 #21 __getblk_gfp at ffffffff811c92c1 #22 ext4_ext_grow_indepth at ffffffff8124565c #23 ext4_ext_create_new_leaf at ffffffff81246ca8 #24 ext4_ext_insert_extent at ffffffff81246f09 #25 ext4_ext_map_blocks at ffffffff8124a848 #26 ext4_map_blocks at ffffffff8121a5b7 #27 mpage_map_one_extent at ffffffff8121b1fa #28 mpage_map_and_submit_extent at ffffffff8121f07b #29 ext4_writepages at ffffffff8121f6d5 #30 do_writepages at ffffffff8112c490 #31 __filemap_fdatawrite_range at ffffffff81120199 #32 filemap_flush at ffffffff8112041c #33 ext4_alloc_da_blocks at ffffffff81219da1 #34 ext4_rename at ffffffff81229b91 #35 ext4_rename2 at ffffffff81229e32 #36 vfs_rename at ffffffff811a08a5 #37 SYSC_renameat2 at ffffffff811a3ffc #38 sys_renameat2 at ffffffff811a408e #39 sys_rename at ffffffff8119e51e #40 system_call_fastpath at ffffffff815afa89 Dave Chinner has properly pointed out that this is a deadlock in the reclaim code because ext4 doesn't submit pages which are marked by PG_writeback right away. The heuristic was introduced by commit e62e384 ("memcg: prevent OOM with too many dirty pages") and it was applied only when may_enter_fs was specified. The code has been changed by c3b94f4 ("memcg: further prevent OOM with too many dirty pages") which has removed the __GFP_FS restriction with a reasoning that we do not get into the fs code. But this is not sufficient apparently because the fs doesn't necessarily submit pages marked PG_writeback for IO right away. ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily submit the bio. Instead it tries to map more pages into the bio and mpage_map_one_extent might trigger memcg charge which might end up waiting on a page which is marked PG_writeback but hasn't been submitted yet so we would end up waiting for something that never finishes. Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2) before we go to wait on the writeback. The page fault path, which is the only path that triggers memcg oom killer since 3.12, shouldn't require GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue which was originally addressed by the heuristic. As per David Chinner the xfs is doing similar thing since 2.6.15 already so ext4 is not the only affected filesystem. Moreover he notes: : For example: IO completion might require unwritten extent conversion : which executes filesystem transactions and GFP_NOFS allocations. The : writeback flag on the pages can not be cleared until unwritten : extent conversion completes. Hence memory reclaim cannot wait on : page writeback to complete in GFP_NOFS context because it is not : safe to do so, memcg reclaim or otherwise. Cc: [email protected] # 3.9+ [[email protected]: corrected the control flow] Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages") Reported-by: Nikolay Borisov <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
cpaasch
pushed a commit
that referenced
this issue
Mar 19, 2016
commit eefd6b0 upstream. We register wildcard mmio eventfd on two buses, once for KVM_MMIO_BUS and once on KVM_FAST_MMIO_BUS but with a single iodev instance. This will lead to an issue: kvm_io_bus_destroy() knows nothing about the devices on two buses pointing to a single dev. Which will lead to double free[1] during exit. Fix this by allocating two instances of iodevs then registering one on KVM_MMIO_BUS and another on KVM_FAST_MMIO_BUS. CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013 task: ffff88009ae0c4b0 ti: ffff88020e7f0000 task.ti: ffff88020e7f0000 RIP: 0010:[<ffffffffc07e25d8>] [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm] RSP: 0018:ffff88020e7f3bc8 EFLAGS: 00010292 RAX: dead000000200200 RBX: ffff8801ec19c900 RCX: 000000018200016d RDX: ffff8801ec19cf80 RSI: ffffea0008bf1d40 RDI: ffff8801ec19c900 RBP: ffff88020e7f3bd8 R08: 000000002fc75a01 R09: 000000018200016d R10: ffffffffc07df6ae R11: ffff88022fc75a98 R12: ffff88021e7cc000 R13: ffff88021e7cca48 R14: ffff88021e7cca50 R15: ffff8801ec19c880 FS: 00007fc1ee3e6700(0000) GS:ffff88023e240000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8f389d8000 CR3: 000000023dc13000 CR4: 00000000001427e0 Stack: ffff88021e7cc000 0000000000000000 ffff88020e7f3be8 ffffffffc07e2622 ffff88020e7f3c38 ffffffffc07df69a ffff880232524160 ffff88020e792d80 0000000000000000 ffff880219b78c00 0000000000000008 ffff8802321686a8 Call Trace: [<ffffffffc07e2622>] ioeventfd_destructor+0x12/0x20 [kvm] [<ffffffffc07df69a>] kvm_put_kvm+0xca/0x210 [kvm] [<ffffffffc07df818>] kvm_vcpu_release+0x18/0x20 [kvm] [<ffffffff811f69f7>] __fput+0xe7/0x250 [<ffffffff811f6bae>] ____fput+0xe/0x10 [<ffffffff81093f04>] task_work_run+0xd4/0xf0 [<ffffffff81079358>] do_exit+0x368/0xa50 [<ffffffff81082c8f>] ? recalc_sigpending+0x1f/0x60 [<ffffffff81079ad5>] do_group_exit+0x45/0xb0 [<ffffffff81085c71>] get_signal+0x291/0x750 [<ffffffff810144d8>] do_signal+0x28/0xab0 [<ffffffff810f3a3b>] ? do_futex+0xdb/0x5d0 [<ffffffff810b7028>] ? __wake_up_locked_key+0x18/0x20 [<ffffffff810f3fa6>] ? SyS_futex+0x76/0x170 [<ffffffff81014fc9>] do_notify_resume+0x69/0xb0 [<ffffffff817cb9af>] int_signal+0x12/0x17 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00 RIP [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm] RSP <ffff88020e7f3bc8> Cc: Gleb Natapov <[email protected]> Cc: Paolo Bonzini <[email protected]> Signed-off-by: Jason Wang <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
dreibh
pushed a commit
to dreibh/mptcp
that referenced
this issue
Feb 6, 2018
commit b888fb6 upstream. Move the workaround from stmpe_gpio_irq_unmask() which is executed in atomic context to stmpe_gpio_irq_sync_unlock() which is not. It fixes the following issue: [ 1.500000] BUG: scheduling while atomic: swapper/1/0x00000002 [ 1.500000] CPU: 0 PID: 1 Comm: swapper Not tainted 4.15.0-rc2-00020-gbd4301f-dirty multipath-tcp#28 [ 1.520000] Hardware name: STM32 (Device Tree Support) [ 1.520000] [<0000bfc9>] (unwind_backtrace) from [<0000b347>] (show_stack+0xb/0xc) [ 1.530000] [<0000b347>] (show_stack) from [<0001fc49>] (__schedule_bug+0x39/0x58) [ 1.530000] [<0001fc49>] (__schedule_bug) from [<00168211>] (__schedule+0x23/0x2b2) [ 1.550000] [<00168211>] (__schedule) from [<001684f7>] (schedule+0x57/0x64) [ 1.550000] [<001684f7>] (schedule) from [<0016a513>] (schedule_timeout+0x137/0x164) [ 1.550000] [<0016a513>] (schedule_timeout) from [<00168b91>] (wait_for_common+0x8d/0xfc) [ 1.570000] [<00168b91>] (wait_for_common) from [<00139753>] (stm32f4_i2c_xfer+0xe9/0xfe) [ 1.580000] [<00139753>] (stm32f4_i2c_xfer) from [<00138545>] (__i2c_transfer+0x111/0x148) [ 1.590000] [<00138545>] (__i2c_transfer) from [<001385cf>] (i2c_transfer+0x53/0x70) [ 1.590000] [<001385cf>] (i2c_transfer) from [<001388a5>] (i2c_smbus_xfer+0x12f/0x36e) [ 1.600000] [<001388a5>] (i2c_smbus_xfer) from [<00138b49>] (i2c_smbus_read_byte_data+0x1f/0x2a) [ 1.610000] [<00138b49>] (i2c_smbus_read_byte_data) from [<00124fdd>] (__stmpe_reg_read+0xd/0x24) [ 1.620000] [<00124fdd>] (__stmpe_reg_read) from [<001252b3>] (stmpe_reg_read+0x19/0x24) [ 1.630000] [<001252b3>] (stmpe_reg_read) from [<0002c4d1>] (unmask_irq+0x17/0x22) [ 1.640000] [<0002c4d1>] (unmask_irq) from [<0002c57f>] (irq_startup+0x6f/0x78) [ 1.650000] [<0002c57f>] (irq_startup) from [<0002b7a1>] (__setup_irq+0x319/0x47c) [ 1.650000] [<0002b7a1>] (__setup_irq) from [<0002bad3>] (request_threaded_irq+0x6b/0xe8) [ 1.660000] [<0002bad3>] (request_threaded_irq) from [<0002d0b9>] (devm_request_threaded_irq+0x3b/0x6a) [ 1.670000] [<0002d0b9>] (devm_request_threaded_irq) from [<001446e7>] (mmc_gpiod_request_cd_irq+0x49/0x8a) [ 1.680000] [<001446e7>] (mmc_gpiod_request_cd_irq) from [<0013d45d>] (mmc_start_host+0x49/0x60) [ 1.690000] [<0013d45d>] (mmc_start_host) from [<0013e40b>] (mmc_add_host+0x3b/0x54) [ 1.700000] [<0013e40b>] (mmc_add_host) from [<00148119>] (mmci_probe+0x4d1/0x60c) [ 1.710000] [<00148119>] (mmci_probe) from [<000f903b>] (amba_probe+0x7b/0xbe) [ 1.720000] [<000f903b>] (amba_probe) from [<001170e5>] (driver_probe_device+0x169/0x1f8) [ 1.730000] [<001170e5>] (driver_probe_device) from [<001171b7>] (__driver_attach+0x43/0x5c) [ 1.740000] [<001171b7>] (__driver_attach) from [<0011618d>] (bus_for_each_dev+0x3d/0x46) [ 1.740000] [<0011618d>] (bus_for_each_dev) from [<001165cd>] (bus_add_driver+0xcd/0x124) [ 1.740000] [<001165cd>] (bus_add_driver) from [<00117713>] (driver_register+0x4d/0x7a) [ 1.760000] [<00117713>] (driver_register) from [<001fc765>] (do_one_initcall+0xbd/0xe8) [ 1.770000] [<001fc765>] (do_one_initcall) from [<001fc88b>] (kernel_init_freeable+0xfb/0x134) [ 1.780000] [<001fc88b>] (kernel_init_freeable) from [<00167ee3>] (kernel_init+0x7/0x9c) [ 1.790000] [<00167ee3>] (kernel_init) from [<00009b65>] (ret_from_fork+0x11/0x2c) Signed-off-by: Alexandre TORGUE <[email protected]> Signed-off-by: Patrice Chotard <[email protected]> Signed-off-by: Linus Walleij <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
cpaasch
pushed a commit
that referenced
this issue
Feb 27, 2018
commit b888fb6 upstream. Move the workaround from stmpe_gpio_irq_unmask() which is executed in atomic context to stmpe_gpio_irq_sync_unlock() which is not. It fixes the following issue: [ 1.500000] BUG: scheduling while atomic: swapper/1/0x00000002 [ 1.500000] CPU: 0 PID: 1 Comm: swapper Not tainted 4.15.0-rc2-00020-gbd4301f-dirty #28 [ 1.520000] Hardware name: STM32 (Device Tree Support) [ 1.520000] [<0000bfc9>] (unwind_backtrace) from [<0000b347>] (show_stack+0xb/0xc) [ 1.530000] [<0000b347>] (show_stack) from [<0001fc49>] (__schedule_bug+0x39/0x58) [ 1.530000] [<0001fc49>] (__schedule_bug) from [<00168211>] (__schedule+0x23/0x2b2) [ 1.550000] [<00168211>] (__schedule) from [<001684f7>] (schedule+0x57/0x64) [ 1.550000] [<001684f7>] (schedule) from [<0016a513>] (schedule_timeout+0x137/0x164) [ 1.550000] [<0016a513>] (schedule_timeout) from [<00168b91>] (wait_for_common+0x8d/0xfc) [ 1.570000] [<00168b91>] (wait_for_common) from [<00139753>] (stm32f4_i2c_xfer+0xe9/0xfe) [ 1.580000] [<00139753>] (stm32f4_i2c_xfer) from [<00138545>] (__i2c_transfer+0x111/0x148) [ 1.590000] [<00138545>] (__i2c_transfer) from [<001385cf>] (i2c_transfer+0x53/0x70) [ 1.590000] [<001385cf>] (i2c_transfer) from [<001388a5>] (i2c_smbus_xfer+0x12f/0x36e) [ 1.600000] [<001388a5>] (i2c_smbus_xfer) from [<00138b49>] (i2c_smbus_read_byte_data+0x1f/0x2a) [ 1.610000] [<00138b49>] (i2c_smbus_read_byte_data) from [<00124fdd>] (__stmpe_reg_read+0xd/0x24) [ 1.620000] [<00124fdd>] (__stmpe_reg_read) from [<001252b3>] (stmpe_reg_read+0x19/0x24) [ 1.630000] [<001252b3>] (stmpe_reg_read) from [<0002c4d1>] (unmask_irq+0x17/0x22) [ 1.640000] [<0002c4d1>] (unmask_irq) from [<0002c57f>] (irq_startup+0x6f/0x78) [ 1.650000] [<0002c57f>] (irq_startup) from [<0002b7a1>] (__setup_irq+0x319/0x47c) [ 1.650000] [<0002b7a1>] (__setup_irq) from [<0002bad3>] (request_threaded_irq+0x6b/0xe8) [ 1.660000] [<0002bad3>] (request_threaded_irq) from [<0002d0b9>] (devm_request_threaded_irq+0x3b/0x6a) [ 1.670000] [<0002d0b9>] (devm_request_threaded_irq) from [<001446e7>] (mmc_gpiod_request_cd_irq+0x49/0x8a) [ 1.680000] [<001446e7>] (mmc_gpiod_request_cd_irq) from [<0013d45d>] (mmc_start_host+0x49/0x60) [ 1.690000] [<0013d45d>] (mmc_start_host) from [<0013e40b>] (mmc_add_host+0x3b/0x54) [ 1.700000] [<0013e40b>] (mmc_add_host) from [<00148119>] (mmci_probe+0x4d1/0x60c) [ 1.710000] [<00148119>] (mmci_probe) from [<000f903b>] (amba_probe+0x7b/0xbe) [ 1.720000] [<000f903b>] (amba_probe) from [<001170e5>] (driver_probe_device+0x169/0x1f8) [ 1.730000] [<001170e5>] (driver_probe_device) from [<001171b7>] (__driver_attach+0x43/0x5c) [ 1.740000] [<001171b7>] (__driver_attach) from [<0011618d>] (bus_for_each_dev+0x3d/0x46) [ 1.740000] [<0011618d>] (bus_for_each_dev) from [<001165cd>] (bus_add_driver+0xcd/0x124) [ 1.740000] [<001165cd>] (bus_add_driver) from [<00117713>] (driver_register+0x4d/0x7a) [ 1.760000] [<00117713>] (driver_register) from [<001fc765>] (do_one_initcall+0xbd/0xe8) [ 1.770000] [<001fc765>] (do_one_initcall) from [<001fc88b>] (kernel_init_freeable+0xfb/0x134) [ 1.780000] [<001fc88b>] (kernel_init_freeable) from [<00167ee3>] (kernel_init+0x7/0x9c) [ 1.790000] [<00167ee3>] (kernel_init) from [<00009b65>] (ret_from_fork+0x11/0x2c) Signed-off-by: Alexandre TORGUE <[email protected]> Signed-off-by: Patrice Chotard <[email protected]> Signed-off-by: Linus Walleij <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
dreibh
pushed a commit
to dreibh/mptcp
that referenced
this issue
Apr 24, 2018
commit 28b0f8a upstream. A tty is hung up by __tty_hangup() setting file->f_op to hung_up_tty_fops, which is skipped on ttys whose write operation isn't tty_write(). This means that, for example, /dev/console whose write op is redirected_tty_write() is never actually marked hung up. Because n_tty_read() uses the hung up status to decide whether to abort the waiting readers, the lack of hung-up marking can lead to the following scenario. 1. A session contains two processes. The leader and its child. The child ignores SIGHUP. 2. The leader exits and starts disassociating from the controlling terminal (/dev/console). 3. __tty_hangup() skips setting f_op to hung_up_tty_fops. 4. SIGHUP is delivered and ignored. 5. tty_ldisc_hangup() is invoked. It wakes up the waits which should clear the read lockers of tty->ldisc_sem. 6. The reader wakes up but because tty_hung_up_p() is false, it doesn't abort and goes back to sleep while read-holding tty->ldisc_sem. 7. The leader progresses to tty_ldisc_lock() in tty_ldisc_hangup() and is now stuck in D sleep indefinitely waiting for tty->ldisc_sem. The following is Alan's explanation on why some ttys aren't hung up. http://lkml.kernel.org/r/20171101170908.6ad08580@alans-desktop 1. It broke the serial consoles because they would hang up and close down the hardware. With tty_port that *should* be fixable properly for any cases remaining. 2. The console layer was (and still is) completely broken and doens't refcount properly. So if you turn on console hangups it breaks (as indeed does freeing consoles and half a dozen other things). As neither can be fixed quickly, this patch works around the problem by introducing a new flag, TTY_HUPPING, which is used solely to tell n_tty_read() that hang-up is in progress for the console and the readers should be aborted regardless of the hung-up status of the device. The following is a sample hung task warning caused by this issue. INFO: task agetty:2662 blocked for more than 120 seconds. Not tainted 4.11.3-dbg-tty-lockup-02478-gfd6c7ee-dirty multipath-tcp#28 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 0 2662 1 0x00000086 Call Trace: __schedule+0x267/0x890 schedule+0x36/0x80 schedule_timeout+0x23c/0x2e0 ldsem_down_write+0xce/0x1f6 tty_ldisc_lock+0x16/0x30 tty_ldisc_hangup+0xb3/0x1b0 __tty_hangup+0x300/0x410 disassociate_ctty+0x6c/0x290 do_exit+0x7ef/0xb00 do_group_exit+0x3f/0xa0 get_signal+0x1b3/0x5d0 do_signal+0x28/0x660 exit_to_usermode_loop+0x46/0x86 do_syscall_64+0x9c/0xb0 entry_SYSCALL64_slow_path+0x25/0x25 The following is the repro. Run "$PROG /dev/console". The parent process hangs in D state. #include <sys/types.h> #include <sys/stat.h> #include <sys/wait.h> #include <sys/ioctl.h> #include <fcntl.h> #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <errno.h> #include <signal.h> #include <time.h> #include <termios.h> int main(int argc, char **argv) { struct sigaction sact = { .sa_handler = SIG_IGN }; struct timespec ts1s = { .tv_sec = 1 }; pid_t pid; int fd; if (argc < 2) { fprintf(stderr, "test-hung-tty /dev/$TTY\n"); return 1; } /* fork a child to ensure that it isn't already the session leader */ pid = fork(); if (pid < 0) { perror("fork"); return 1; } if (pid > 0) { /* top parent, wait for everyone */ while (waitpid(-1, NULL, 0) >= 0) ; if (errno != ECHILD) perror("waitpid"); return 0; } /* new session, start a new session and set the controlling tty */ if (setsid() < 0) { perror("setsid"); return 1; } fd = open(argv[1], O_RDWR); if (fd < 0) { perror("open"); return 1; } if (ioctl(fd, TIOCSCTTY, 1) < 0) { perror("ioctl"); return 1; } /* fork a child, sleep a bit and exit */ pid = fork(); if (pid < 0) { perror("fork"); return 1; } if (pid > 0) { nanosleep(&ts1s, NULL); printf("Session leader exiting\n"); exit(0); } /* * The child ignores SIGHUP and keeps reading from the controlling * tty. Because SIGHUP is ignored, the child doesn't get killed on * parent exit and the bug in n_tty makes the read(2) block the * parent's control terminal hangup attempt. The parent ends up in * D sleep until the child is explicitly killed. */ sigaction(SIGHUP, &sact, NULL); printf("Child reading tty\n"); while (1) { char buf[1024]; if (read(fd, buf, sizeof(buf)) < 0) { perror("read"); return 1; } } return 0; } Signed-off-by: Tejun Heo <[email protected]> Cc: Alan Cox <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
cpaasch
pushed a commit
that referenced
this issue
May 1, 2018
A tty is hung up by __tty_hangup() setting file->f_op to hung_up_tty_fops, which is skipped on ttys whose write operation isn't tty_write(). This means that, for example, /dev/console whose write op is redirected_tty_write() is never actually marked hung up. Because n_tty_read() uses the hung up status to decide whether to abort the waiting readers, the lack of hung-up marking can lead to the following scenario. 1. A session contains two processes. The leader and its child. The child ignores SIGHUP. 2. The leader exits and starts disassociating from the controlling terminal (/dev/console). 3. __tty_hangup() skips setting f_op to hung_up_tty_fops. 4. SIGHUP is delivered and ignored. 5. tty_ldisc_hangup() is invoked. It wakes up the waits which should clear the read lockers of tty->ldisc_sem. 6. The reader wakes up but because tty_hung_up_p() is false, it doesn't abort and goes back to sleep while read-holding tty->ldisc_sem. 7. The leader progresses to tty_ldisc_lock() in tty_ldisc_hangup() and is now stuck in D sleep indefinitely waiting for tty->ldisc_sem. The following is Alan's explanation on why some ttys aren't hung up. http://lkml.kernel.org/r/20171101170908.6ad08580@alans-desktop 1. It broke the serial consoles because they would hang up and close down the hardware. With tty_port that *should* be fixable properly for any cases remaining. 2. The console layer was (and still is) completely broken and doens't refcount properly. So if you turn on console hangups it breaks (as indeed does freeing consoles and half a dozen other things). As neither can be fixed quickly, this patch works around the problem by introducing a new flag, TTY_HUPPING, which is used solely to tell n_tty_read() that hang-up is in progress for the console and the readers should be aborted regardless of the hung-up status of the device. The following is a sample hung task warning caused by this issue. INFO: task agetty:2662 blocked for more than 120 seconds. Not tainted 4.11.3-dbg-tty-lockup-02478-gfd6c7ee-dirty #28 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 0 2662 1 0x00000086 Call Trace: __schedule+0x267/0x890 schedule+0x36/0x80 schedule_timeout+0x23c/0x2e0 ldsem_down_write+0xce/0x1f6 tty_ldisc_lock+0x16/0x30 tty_ldisc_hangup+0xb3/0x1b0 __tty_hangup+0x300/0x410 disassociate_ctty+0x6c/0x290 do_exit+0x7ef/0xb00 do_group_exit+0x3f/0xa0 get_signal+0x1b3/0x5d0 do_signal+0x28/0x660 exit_to_usermode_loop+0x46/0x86 do_syscall_64+0x9c/0xb0 entry_SYSCALL64_slow_path+0x25/0x25 The following is the repro. Run "$PROG /dev/console". The parent process hangs in D state. #include <sys/types.h> #include <sys/stat.h> #include <sys/wait.h> #include <sys/ioctl.h> #include <fcntl.h> #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <errno.h> #include <signal.h> #include <time.h> #include <termios.h> int main(int argc, char **argv) { struct sigaction sact = { .sa_handler = SIG_IGN }; struct timespec ts1s = { .tv_sec = 1 }; pid_t pid; int fd; if (argc < 2) { fprintf(stderr, "test-hung-tty /dev/$TTY\n"); return 1; } /* fork a child to ensure that it isn't already the session leader */ pid = fork(); if (pid < 0) { perror("fork"); return 1; } if (pid > 0) { /* top parent, wait for everyone */ while (waitpid(-1, NULL, 0) >= 0) ; if (errno != ECHILD) perror("waitpid"); return 0; } /* new session, start a new session and set the controlling tty */ if (setsid() < 0) { perror("setsid"); return 1; } fd = open(argv[1], O_RDWR); if (fd < 0) { perror("open"); return 1; } if (ioctl(fd, TIOCSCTTY, 1) < 0) { perror("ioctl"); return 1; } /* fork a child, sleep a bit and exit */ pid = fork(); if (pid < 0) { perror("fork"); return 1; } if (pid > 0) { nanosleep(&ts1s, NULL); printf("Session leader exiting\n"); exit(0); } /* * The child ignores SIGHUP and keeps reading from the controlling * tty. Because SIGHUP is ignored, the child doesn't get killed on * parent exit and the bug in n_tty makes the read(2) block the * parent's control terminal hangup attempt. The parent ends up in * D sleep until the child is explicitly killed. */ sigaction(SIGHUP, &sact, NULL); printf("Child reading tty\n"); while (1) { char buf[1024]; if (read(fd, buf, sizeof(buf)) < 0) { perror("read"); return 1; } } return 0; } Signed-off-by: Tejun Heo <[email protected]> Cc: Alan Cox <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
cpaasch
pushed a commit
that referenced
this issue
May 16, 2018
commit 28b0f8a upstream. A tty is hung up by __tty_hangup() setting file->f_op to hung_up_tty_fops, which is skipped on ttys whose write operation isn't tty_write(). This means that, for example, /dev/console whose write op is redirected_tty_write() is never actually marked hung up. Because n_tty_read() uses the hung up status to decide whether to abort the waiting readers, the lack of hung-up marking can lead to the following scenario. 1. A session contains two processes. The leader and its child. The child ignores SIGHUP. 2. The leader exits and starts disassociating from the controlling terminal (/dev/console). 3. __tty_hangup() skips setting f_op to hung_up_tty_fops. 4. SIGHUP is delivered and ignored. 5. tty_ldisc_hangup() is invoked. It wakes up the waits which should clear the read lockers of tty->ldisc_sem. 6. The reader wakes up but because tty_hung_up_p() is false, it doesn't abort and goes back to sleep while read-holding tty->ldisc_sem. 7. The leader progresses to tty_ldisc_lock() in tty_ldisc_hangup() and is now stuck in D sleep indefinitely waiting for tty->ldisc_sem. The following is Alan's explanation on why some ttys aren't hung up. http://lkml.kernel.org/r/20171101170908.6ad08580@alans-desktop 1. It broke the serial consoles because they would hang up and close down the hardware. With tty_port that *should* be fixable properly for any cases remaining. 2. The console layer was (and still is) completely broken and doens't refcount properly. So if you turn on console hangups it breaks (as indeed does freeing consoles and half a dozen other things). As neither can be fixed quickly, this patch works around the problem by introducing a new flag, TTY_HUPPING, which is used solely to tell n_tty_read() that hang-up is in progress for the console and the readers should be aborted regardless of the hung-up status of the device. The following is a sample hung task warning caused by this issue. INFO: task agetty:2662 blocked for more than 120 seconds. Not tainted 4.11.3-dbg-tty-lockup-02478-gfd6c7ee-dirty #28 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 0 2662 1 0x00000086 Call Trace: __schedule+0x267/0x890 schedule+0x36/0x80 schedule_timeout+0x23c/0x2e0 ldsem_down_write+0xce/0x1f6 tty_ldisc_lock+0x16/0x30 tty_ldisc_hangup+0xb3/0x1b0 __tty_hangup+0x300/0x410 disassociate_ctty+0x6c/0x290 do_exit+0x7ef/0xb00 do_group_exit+0x3f/0xa0 get_signal+0x1b3/0x5d0 do_signal+0x28/0x660 exit_to_usermode_loop+0x46/0x86 do_syscall_64+0x9c/0xb0 entry_SYSCALL64_slow_path+0x25/0x25 The following is the repro. Run "$PROG /dev/console". The parent process hangs in D state. #include <sys/types.h> #include <sys/stat.h> #include <sys/wait.h> #include <sys/ioctl.h> #include <fcntl.h> #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <errno.h> #include <signal.h> #include <time.h> #include <termios.h> int main(int argc, char **argv) { struct sigaction sact = { .sa_handler = SIG_IGN }; struct timespec ts1s = { .tv_sec = 1 }; pid_t pid; int fd; if (argc < 2) { fprintf(stderr, "test-hung-tty /dev/$TTY\n"); return 1; } /* fork a child to ensure that it isn't already the session leader */ pid = fork(); if (pid < 0) { perror("fork"); return 1; } if (pid > 0) { /* top parent, wait for everyone */ while (waitpid(-1, NULL, 0) >= 0) ; if (errno != ECHILD) perror("waitpid"); return 0; } /* new session, start a new session and set the controlling tty */ if (setsid() < 0) { perror("setsid"); return 1; } fd = open(argv[1], O_RDWR); if (fd < 0) { perror("open"); return 1; } if (ioctl(fd, TIOCSCTTY, 1) < 0) { perror("ioctl"); return 1; } /* fork a child, sleep a bit and exit */ pid = fork(); if (pid < 0) { perror("fork"); return 1; } if (pid > 0) { nanosleep(&ts1s, NULL); printf("Session leader exiting\n"); exit(0); } /* * The child ignores SIGHUP and keeps reading from the controlling * tty. Because SIGHUP is ignored, the child doesn't get killed on * parent exit and the bug in n_tty makes the read(2) block the * parent's control terminal hangup attempt. The parent ends up in * D sleep until the child is explicitly killed. */ sigaction(SIGHUP, &sact, NULL); printf("Child reading tty\n"); while (1) { char buf[1024]; if (read(fd, buf, sizeof(buf)) < 0) { perror("read"); return 1; } } return 0; } Signed-off-by: Tejun Heo <[email protected]> Cc: Alan Cox <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
dreibh
pushed a commit
to dreibh/mptcp
that referenced
this issue
May 24, 2018
[ Upstream commit af50e4b ] syzbot caught an infinite recursion in nsh_gso_segment(). Problem here is that we need to make sure the NSH header is of reasonable length. BUG: MAX_LOCK_DEPTH too low! turning off the locking correctness validator. depth: 48 max: 48! 48 locks held by syz-executor0/10189: #0: (ptrval) (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x30f/0x34c0 net/core/dev.c:3517 #1: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #1: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#2: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#2: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#3: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#3: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#4: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#4: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#5: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#5: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#6: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#6: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#7: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#7: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#8: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#8: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#9: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#9: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#10: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#10: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#11: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#11: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#12: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#12: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#13: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#13: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#14: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#14: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#15: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#15: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#16: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#16: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#17: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#17: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#18: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#18: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#19: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#19: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#20: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#20: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#21: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#21: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#22: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#22: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#23: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#23: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#24: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#24: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#25: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#25: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#26: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#26: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#27: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#27: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#28: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#28: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#29: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#29: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#30: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#30: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#31: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#31: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 dccp_close: ABORT with 65423 bytes unread multipath-tcp#32: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#32: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#33: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#33: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#34: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#34: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#35: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#35: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#36: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#36: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#37: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#37: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#38: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#38: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#39: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#39: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#40: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#40: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#41: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#41: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#42: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#42: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#43: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#43: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#44: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#44: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#45: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#45: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#46: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#46: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 multipath-tcp#47: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] multipath-tcp#47: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 INFO: lockdep is turned off. CPU: 1 PID: 10189 Comm: syz-executor0 Not tainted 4.17.0-rc2+ multipath-tcp#26 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 __lock_acquire+0x1788/0x5140 kernel/locking/lockdep.c:3449 lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920 rcu_lock_acquire include/linux/rcupdate.h:246 [inline] rcu_read_lock include/linux/rcupdate.h:632 [inline] skb_mac_gso_segment+0x25b/0x720 net/core/dev.c:2789 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 __skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865 skb_gso_segment include/linux/netdevice.h:4025 [inline] validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3118 validate_xmit_skb_list+0xbf/0x120 net/core/dev.c:3168 sch_direct_xmit+0x354/0x11e0 net/sched/sch_generic.c:312 qdisc_restart net/sched/sch_generic.c:399 [inline] __qdisc_run+0x741/0x1af0 net/sched/sch_generic.c:410 __dev_xmit_skb net/core/dev.c:3243 [inline] __dev_queue_xmit+0x28ea/0x34c0 net/core/dev.c:3551 dev_queue_xmit+0x17/0x20 net/core/dev.c:3616 packet_snd net/packet/af_packet.c:2951 [inline] packet_sendmsg+0x40f8/0x6070 net/packet/af_packet.c:2976 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:639 __sys_sendto+0x3d7/0x670 net/socket.c:1789 __do_sys_sendto net/socket.c:1801 [inline] __se_sys_sendto net/socket.c:1797 [inline] __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: c411ed8 ("nsh: add GSO support") Signed-off-by: Eric Dumazet <[email protected]> Cc: Jiri Benc <[email protected]> Reported-by: syzbot <[email protected]> Acked-by: Jiri Benc <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
dreibh
pushed a commit
to dreibh/mptcp
that referenced
this issue
Oct 12, 2018
commit 16c4cb1 upstream. Applying dynamic usbcore quirks in early booting when the slab is not yet ready would cause kernel panic of null pointer dereference because the quirk_count has been counted as 1 while the quirk_list was failed to allocate. i.e., [ 1.044970] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1.044995] IP: [<ffffffffb0953ec7>] usb_detect_quirks+0x88/0xd1 [ 1.045016] PGD 0 [ 1.045026] Oops: 0000 [#1] PREEMPT SMP [ 1.046986] gsmi: Log Shutdown Reason 0x03 [ 1.046995] Modules linked in: [ 1.047008] CPU: 0 PID: 81 Comm: kworker/0:3 Not tainted 4.4.154 multipath-tcp#28 [ 1.047016] Hardware name: Google Coral/Coral, BIOS Google_Coral.10068.27.0 12/04/2017 [ 1.047028] Workqueue: usb_hub_wq hub_event [ 1.047037] task: ffff88017a321c80 task.stack: ffff88017a384000 [ 1.047044] RIP: 0010:[<ffffffffb0953ec7>] [<ffffffffb0953ec7>] usb_detect_quirks+0x88/0xd1 To tackle this odd, let's balance the quirk_count to 0 when the kcalloc call fails, and defer the quirk setting into a lower level callback which ensures that the kernel memory management has been initialized. Fixes: 027bd6c ("usb: core: Add "quirks" parameter for usbcore") Signed-off-by: Harry Pan <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
teto
pushed a commit
to teto/mptcp
that referenced
this issue
Apr 14, 2019
Applying dynamic usbcore quirks in early booting when the slab is not yet ready would cause kernel panic of null pointer dereference because the quirk_count has been counted as 1 while the quirk_list was failed to allocate. i.e., [ 1.044970] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1.044995] IP: [<ffffffffb0953ec7>] usb_detect_quirks+0x88/0xd1 [ 1.045016] PGD 0 [ 1.045026] Oops: 0000 [multipath-tcp#1] PREEMPT SMP [ 1.046986] gsmi: Log Shutdown Reason 0x03 [ 1.046995] Modules linked in: [ 1.047008] CPU: 0 PID: 81 Comm: kworker/0:3 Not tainted 4.4.154 multipath-tcp#28 [ 1.047016] Hardware name: Google Coral/Coral, BIOS Google_Coral.10068.27.0 12/04/2017 [ 1.047028] Workqueue: usb_hub_wq hub_event [ 1.047037] task: ffff88017a321c80 task.stack: ffff88017a384000 [ 1.047044] RIP: 0010:[<ffffffffb0953ec7>] [<ffffffffb0953ec7>] usb_detect_quirks+0x88/0xd1 To tackle this odd, let's balance the quirk_count to 0 when the kcalloc call fails, and defer the quirk setting into a lower level callback which ensures that the kernel memory management has been initialized. Fixes: 027bd6c ("usb: core: Add "quirks" parameter for usbcore") Signed-off-by: Harry Pan <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
cpaasch
pushed a commit
that referenced
this issue
Jun 22, 2019
[ Upstream commit 12e750b ] If alloc_workqueue fails in alua_init, it should return -ENOMEM, otherwise it will trigger null-ptr-deref while unloading module which calls destroy_workqueue dereference wq->lock like this: BUG: KASAN: null-ptr-deref in __lock_acquire+0x6b4/0x1ee0 Read of size 8 at addr 0000000000000080 by task syz-executor.0/7045 CPU: 0 PID: 7045 Comm: syz-executor.0 Tainted: G C 5.1.0+ #28 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 Call Trace: dump_stack+0xa9/0x10e __kasan_report+0x171/0x18d ? __lock_acquire+0x6b4/0x1ee0 kasan_report+0xe/0x20 __lock_acquire+0x6b4/0x1ee0 lock_acquire+0xb4/0x1b0 __mutex_lock+0xd8/0xb90 drain_workqueue+0x25/0x290 destroy_workqueue+0x1f/0x3f0 __x64_sys_delete_module+0x244/0x330 do_syscall_64+0x72/0x2a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Reported-by: Hulk Robot <[email protected]> Fixes: 03197b6 ("scsi_dh_alua: Use workqueue for RTPG") Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
matttbe
pushed a commit
that referenced
this issue
Jul 12, 2019
[ Upstream commit 12e750b ] If alloc_workqueue fails in alua_init, it should return -ENOMEM, otherwise it will trigger null-ptr-deref while unloading module which calls destroy_workqueue dereference wq->lock like this: BUG: KASAN: null-ptr-deref in __lock_acquire+0x6b4/0x1ee0 Read of size 8 at addr 0000000000000080 by task syz-executor.0/7045 CPU: 0 PID: 7045 Comm: syz-executor.0 Tainted: G C 5.1.0+ #28 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 Call Trace: dump_stack+0xa9/0x10e __kasan_report+0x171/0x18d ? __lock_acquire+0x6b4/0x1ee0 kasan_report+0xe/0x20 __lock_acquire+0x6b4/0x1ee0 lock_acquire+0xb4/0x1b0 __mutex_lock+0xd8/0xb90 drain_workqueue+0x25/0x290 destroy_workqueue+0x1f/0x3f0 __x64_sys_delete_module+0x244/0x330 do_syscall_64+0x72/0x2a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Reported-by: Hulk Robot <[email protected]> Fixes: 03197b6 ("scsi_dh_alua: Use workqueue for RTPG") Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
matttbe
pushed a commit
that referenced
this issue
Jul 12, 2019
[ Upstream commit 5caaf29 ] If spi_register_master fails in spi_bitbang_start because device_add failure, We should return the error code other than 0, otherwise calling spi_bitbang_stop may trigger NULL pointer dereference like this: BUG: KASAN: null-ptr-deref in __list_del_entry_valid+0x45/0xd0 Read of size 8 at addr 0000000000000000 by task syz-executor.0/3661 CPU: 0 PID: 3661 Comm: syz-executor.0 Not tainted 5.1.0+ #28 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: dump_stack+0xa9/0x10e ? __list_del_entry_valid+0x45/0xd0 ? __list_del_entry_valid+0x45/0xd0 __kasan_report+0x171/0x18d ? __list_del_entry_valid+0x45/0xd0 kasan_report+0xe/0x20 __list_del_entry_valid+0x45/0xd0 spi_unregister_controller+0x99/0x1b0 spi_lm70llp_attach+0x3ae/0x4b0 [spi_lm70llp] ? 0xffffffffc1128000 ? klist_next+0x131/0x1e0 ? driver_detach+0x40/0x40 [parport] port_check+0x3b/0x50 [parport] bus_for_each_dev+0x115/0x180 ? subsys_dev_iter_exit+0x20/0x20 __parport_register_driver+0x1f0/0x210 [parport] ? 0xffffffffc1150000 do_one_initcall+0xb9/0x3b5 ? perf_trace_initcall_level+0x270/0x270 ? kasan_unpoison_shadow+0x30/0x40 ? kasan_unpoison_shadow+0x30/0x40 do_init_module+0xe0/0x330 load_module+0x38eb/0x4270 ? module_frob_arch_sections+0x20/0x20 ? kernel_read_file+0x188/0x3f0 ? find_held_lock+0x6d/0xd0 ? fput_many+0x1a/0xe0 ? __do_sys_finit_module+0x162/0x190 __do_sys_finit_module+0x162/0x190 ? __ia32_sys_init_module+0x40/0x40 ? __mutex_unlock_slowpath+0xb4/0x3f0 ? wait_for_completion+0x240/0x240 ? vfs_write+0x160/0x2a0 ? lockdep_hardirqs_off+0xb5/0x100 ? mark_held_locks+0x1a/0x90 ? do_syscall_64+0x14/0x2a0 do_syscall_64+0x72/0x2a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Reported-by: Hulk Robot <[email protected]> Fixes: 702a487 ("spi: bitbang: Let spi_bitbang_start() take a reference to master") Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Axel Lin <[email protected]> Reviewed-by: Mukesh Ojha <[email protected]> Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
dreibh
pushed a commit
to dreibh/mptcp
that referenced
this issue
Jul 16, 2019
[ Upstream commit 5caaf29 ] If spi_register_master fails in spi_bitbang_start because device_add failure, We should return the error code other than 0, otherwise calling spi_bitbang_stop may trigger NULL pointer dereference like this: BUG: KASAN: null-ptr-deref in __list_del_entry_valid+0x45/0xd0 Read of size 8 at addr 0000000000000000 by task syz-executor.0/3661 CPU: 0 PID: 3661 Comm: syz-executor.0 Not tainted 5.1.0+ multipath-tcp#28 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: dump_stack+0xa9/0x10e ? __list_del_entry_valid+0x45/0xd0 ? __list_del_entry_valid+0x45/0xd0 __kasan_report+0x171/0x18d ? __list_del_entry_valid+0x45/0xd0 kasan_report+0xe/0x20 __list_del_entry_valid+0x45/0xd0 spi_unregister_controller+0x99/0x1b0 spi_lm70llp_attach+0x3ae/0x4b0 [spi_lm70llp] ? 0xffffffffc1128000 ? klist_next+0x131/0x1e0 ? driver_detach+0x40/0x40 [parport] port_check+0x3b/0x50 [parport] bus_for_each_dev+0x115/0x180 ? subsys_dev_iter_exit+0x20/0x20 __parport_register_driver+0x1f0/0x210 [parport] ? 0xffffffffc1150000 do_one_initcall+0xb9/0x3b5 ? perf_trace_initcall_level+0x270/0x270 ? kasan_unpoison_shadow+0x30/0x40 ? kasan_unpoison_shadow+0x30/0x40 do_init_module+0xe0/0x330 load_module+0x38eb/0x4270 ? module_frob_arch_sections+0x20/0x20 ? kernel_read_file+0x188/0x3f0 ? find_held_lock+0x6d/0xd0 ? fput_many+0x1a/0xe0 ? __do_sys_finit_module+0x162/0x190 __do_sys_finit_module+0x162/0x190 ? __ia32_sys_init_module+0x40/0x40 ? __mutex_unlock_slowpath+0xb4/0x3f0 ? wait_for_completion+0x240/0x240 ? vfs_write+0x160/0x2a0 ? lockdep_hardirqs_off+0xb5/0x100 ? mark_held_locks+0x1a/0x90 ? do_syscall_64+0x14/0x2a0 do_syscall_64+0x72/0x2a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Reported-by: Hulk Robot <[email protected]> Fixes: 702a487 ("spi: bitbang: Let spi_bitbang_start() take a reference to master") Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Axel Lin <[email protected]> Reviewed-by: Mukesh Ojha <[email protected]> Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
matttbe
pushed a commit
that referenced
this issue
Sep 21, 2019
commit 3dd550a upstream. The syzbot fuzzer provoked a slab-out-of-bounds error in the USB core: BUG: KASAN: slab-out-of-bounds in memcmp+0xa6/0xb0 lib/string.c:904 Read of size 1 at addr ffff8881d175bed6 by task kworker/0:3/2746 CPU: 0 PID: 2746 Comm: kworker/0:3 Not tainted 5.3.0-rc5+ #28 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: usb_hub_wq hub_event Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xca/0x13e lib/dump_stack.c:113 print_address_description+0x6a/0x32c mm/kasan/report.c:351 __kasan_report.cold+0x1a/0x33 mm/kasan/report.c:482 kasan_report+0xe/0x12 mm/kasan/common.c:612 memcmp+0xa6/0xb0 lib/string.c:904 memcmp include/linux/string.h:400 [inline] descriptors_changed drivers/usb/core/hub.c:5579 [inline] usb_reset_and_verify_device+0x564/0x1300 drivers/usb/core/hub.c:5729 usb_reset_device+0x4c1/0x920 drivers/usb/core/hub.c:5898 rt2x00usb_probe+0x53/0x7af drivers/net/wireless/ralink/rt2x00/rt2x00usb.c:806 The error occurs when the descriptors_changed() routine (called during a device reset) attempts to compare the old and new BOS and capability descriptors. The length it uses for the comparison is the wTotalLength value stored in BOS descriptor, but this value is not necessarily the same as the length actually allocated for the descriptors. If it is larger the routine will call memcmp() with a length that is too big, thus reading beyond the end of the allocated region and leading to this fault. The kernel reads the BOS descriptor twice: first to get the total length of all the capability descriptors, and second to read it along with all those other descriptors. A malicious (or very faulty) device may send different values for the BOS descriptor fields each time. The memory area will be allocated using the wTotalLength value read the first time, but stored within it will be the value read the second time. To prevent this possibility from causing any errors, this patch modifies the BOS descriptor after it has been read the second time: It sets the wTotalLength field to the actual length of the descriptors that were read in and validated. Then the memcpy() call, or any other code using these descriptors, will be able to rely on wTotalLength being valid. Reported-and-tested-by: [email protected] Signed-off-by: Alan Stern <[email protected]> CC: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
cpaasch
pushed a commit
that referenced
this issue
Oct 28, 2019
commit 3dd550a upstream. The syzbot fuzzer provoked a slab-out-of-bounds error in the USB core: BUG: KASAN: slab-out-of-bounds in memcmp+0xa6/0xb0 lib/string.c:904 Read of size 1 at addr ffff8881d175bed6 by task kworker/0:3/2746 CPU: 0 PID: 2746 Comm: kworker/0:3 Not tainted 5.3.0-rc5+ #28 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: usb_hub_wq hub_event Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xca/0x13e lib/dump_stack.c:113 print_address_description+0x6a/0x32c mm/kasan/report.c:351 __kasan_report.cold+0x1a/0x33 mm/kasan/report.c:482 kasan_report+0xe/0x12 mm/kasan/common.c:612 memcmp+0xa6/0xb0 lib/string.c:904 memcmp include/linux/string.h:400 [inline] descriptors_changed drivers/usb/core/hub.c:5579 [inline] usb_reset_and_verify_device+0x564/0x1300 drivers/usb/core/hub.c:5729 usb_reset_device+0x4c1/0x920 drivers/usb/core/hub.c:5898 rt2x00usb_probe+0x53/0x7af drivers/net/wireless/ralink/rt2x00/rt2x00usb.c:806 The error occurs when the descriptors_changed() routine (called during a device reset) attempts to compare the old and new BOS and capability descriptors. The length it uses for the comparison is the wTotalLength value stored in BOS descriptor, but this value is not necessarily the same as the length actually allocated for the descriptors. If it is larger the routine will call memcmp() with a length that is too big, thus reading beyond the end of the allocated region and leading to this fault. The kernel reads the BOS descriptor twice: first to get the total length of all the capability descriptors, and second to read it along with all those other descriptors. A malicious (or very faulty) device may send different values for the BOS descriptor fields each time. The memory area will be allocated using the wTotalLength value read the first time, but stored within it will be the value read the second time. To prevent this possibility from causing any errors, this patch modifies the BOS descriptor after it has been read the second time: It sets the wTotalLength field to the actual length of the descriptors that were read in and validated. Then the memcpy() call, or any other code using these descriptors, will be able to rely on wTotalLength being valid. Reported-and-tested-by: [email protected] Signed-off-by: Alan Stern <[email protected]> CC: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
cpaasch
pushed a commit
that referenced
this issue
Oct 28, 2019
commit 98375b8 upstream. The syzbot fuzzer provoked a general protection fault in the hid-prodikeys driver: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.3.0-rc5+ #28 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: usb_hub_wq hub_event RIP: 0010:pcmidi_submit_output_report drivers/hid/hid-prodikeys.c:300 [inline] RIP: 0010:pcmidi_set_operational drivers/hid/hid-prodikeys.c:558 [inline] RIP: 0010:pcmidi_snd_initialise drivers/hid/hid-prodikeys.c:686 [inline] RIP: 0010:pk_probe+0xb51/0xfd0 drivers/hid/hid-prodikeys.c:836 Code: 0f 85 50 04 00 00 48 8b 04 24 4c 89 7d 10 48 8b 58 08 e8 b2 53 e4 fc 48 8b 54 24 20 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 13 04 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b The problem is caused by the fact that pcmidi_get_output_report() will return an error if the HID device doesn't provide the right sort of output report, but pcmidi_set_operational() doesn't bother to check the return code and assumes the function call always succeeds. This patch adds the missing check and aborts the probe operation if necessary. Reported-and-tested-by: [email protected] Signed-off-by: Alan Stern <[email protected]> CC: <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
cpaasch
pushed a commit
that referenced
this issue
Oct 28, 2019
[ Upstream commit 0ce772f ] The p9_tag_alloc() does not initialize the transport error t_err field. The struct p9_req_t *req is allocated and stored in a struct p9_client variable. The field t_err is never initialized before p9_conn_cancel() checks its value. KUMSAN(KernelUninitializedMemorySantizer, a new error detection tool) reports this bug. ================================================================== BUG: KUMSAN: use of uninitialized memory in p9_conn_cancel+0x2d9/0x3b0 Read of size 4 at addr ffff88805f9b600c by task kworker/1:2/1216 CPU: 1 PID: 1216 Comm: kworker/1:2 Not tainted 5.2.0-rc4+ #28 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Workqueue: events p9_write_work Call Trace: dump_stack+0x75/0xae __kumsan_report+0x17c/0x3e6 kumsan_report+0xe/0x20 p9_conn_cancel+0x2d9/0x3b0 p9_write_work+0x183/0x4a0 process_one_work+0x4d1/0x8c0 worker_thread+0x6e/0x780 kthread+0x1ca/0x1f0 ret_from_fork+0x35/0x40 Allocated by task 1979: save_stack+0x19/0x80 __kumsan_kmalloc.constprop.3+0xbc/0x120 kmem_cache_alloc+0xa7/0x170 p9_client_prepare_req.part.9+0x3b/0x380 p9_client_rpc+0x15e/0x880 p9_client_create+0x3d0/0xac0 v9fs_session_init+0x192/0xc80 v9fs_mount+0x67/0x470 legacy_get_tree+0x70/0xd0 vfs_get_tree+0x4a/0x1c0 do_mount+0xba9/0xf90 ksys_mount+0xa8/0x120 __x64_sys_mount+0x62/0x70 do_syscall_64+0x6d/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff88805f9b6008 which belongs to the cache p9_req_t of size 144 The buggy address is located 4 bytes inside of 144-byte region [ffff88805f9b6008, ffff88805f9b6098) The buggy address belongs to the page: page:ffffea00017e6d80 refcount:1 mapcount:0 mapping:ffff888068b63740 index:0xffff88805f9b7d90 compound_mapcount: 0 flags: 0x100000000010200(slab|head) raw: 0100000000010200 ffff888068b66450 ffff888068b66450 ffff888068b63740 raw: ffff88805f9b7d90 0000000000100001 00000001ffffffff 0000000000000000 page dumped because: kumsan: bad access detected ================================================================== Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Lu Shuaibing <[email protected]> [[email protected]: grouped the added init with the others] Signed-off-by: Dominique Martinet <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
cpaasch
pushed a commit
that referenced
this issue
Oct 28, 2019
commit 98375b8 upstream. The syzbot fuzzer provoked a general protection fault in the hid-prodikeys driver: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.3.0-rc5+ #28 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: usb_hub_wq hub_event RIP: 0010:pcmidi_submit_output_report drivers/hid/hid-prodikeys.c:300 [inline] RIP: 0010:pcmidi_set_operational drivers/hid/hid-prodikeys.c:558 [inline] RIP: 0010:pcmidi_snd_initialise drivers/hid/hid-prodikeys.c:686 [inline] RIP: 0010:pk_probe+0xb51/0xfd0 drivers/hid/hid-prodikeys.c:836 Code: 0f 85 50 04 00 00 48 8b 04 24 4c 89 7d 10 48 8b 58 08 e8 b2 53 e4 fc 48 8b 54 24 20 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 13 04 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b The problem is caused by the fact that pcmidi_get_output_report() will return an error if the HID device doesn't provide the right sort of output report, but pcmidi_set_operational() doesn't bother to check the return code and assumes the function call always succeeds. This patch adds the missing check and aborts the probe operation if necessary. Reported-and-tested-by: [email protected] Signed-off-by: Alan Stern <[email protected]> CC: <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
dreibh
pushed a commit
to dreibh/mptcp
that referenced
this issue
Aug 4, 2020
[ Upstream commit d9d5420 ] We should not trigger a warning when a memory allocation fails. Remove the WARN_ON(). The warning is constantly triggered by syzkaller when it is injecting faults: [ 2230.758664] FAULT_INJECTION: forcing a failure. [ 2230.758664] name failslab, interval 1, probability 0, space 0, times 0 [ 2230.762329] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ multipath-tcp#28 ... [ 2230.898175] WARNING: CPU: 3 PID: 1407 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:6265 mlxsw_sp_router_fib_event+0xfad/0x13e0 [ 2230.898179] Kernel panic - not syncing: panic_on_warn set ... [ 2230.898183] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ multipath-tcp#28 [ 2230.898190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Fixes: 3057224 ("mlxsw: spectrum_router: Implement FIB offload in deferred work") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
dreibh
pushed a commit
to dreibh/mptcp
that referenced
this issue
Oct 22, 2020
[ Upstream commit 96298f6 ] According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5, the incoming L2CAP_ConfigReq should be handled during OPEN state. The section below shows the btmon trace when running L2CAP/COS/CFD/BV-12-C before and after this change. === Before === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 multipath-tcp#22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 multipath-tcp#23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 multipath-tcp#24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 multipath-tcp#25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 multipath-tcp#26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 multipath-tcp#27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 multipath-tcp#28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 multipath-tcp#29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 multipath-tcp#30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 multipath-tcp#31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ...... < ACL Data TX: Handle 256 flags 0x00 dlen 14 multipath-tcp#32 L2CAP: Command Reject (0x01) ident 3 len 6 Reason: Invalid CID in request (0x0002) Destination CID: 64 Source CID: 65 > HCI Event: Number of Completed Packets (0x13) plen 5 multipath-tcp#33 Num handles: 1 Handle: 256 Count: 1 ... === After === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 multipath-tcp#22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 multipath-tcp#23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 multipath-tcp#24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 multipath-tcp#25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 multipath-tcp#26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 multipath-tcp#27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 multipath-tcp#28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 multipath-tcp#29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 multipath-tcp#30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 multipath-tcp#31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ..... < ACL Data TX: Handle 256 flags 0x00 dlen 18 multipath-tcp#32 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 < ACL Data TX: Handle 256 flags 0x00 dlen 12 multipath-tcp#33 L2CAP: Configure Request (0x04) ident 3 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 multipath-tcp#34 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 multipath-tcp#35 Num handles: 1 Handle: 256 Count: 1 ... Signed-off-by: Howard Chung <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
matttbe
pushed a commit
that referenced
this issue
Nov 10, 2020
[ Upstream commit 96298f6 ] According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5, the incoming L2CAP_ConfigReq should be handled during OPEN state. The section below shows the btmon trace when running L2CAP/COS/CFD/BV-12-C before and after this change. === Before === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 #22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 #23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 #24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 #27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 #28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 #29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 #30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 #31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ...... < ACL Data TX: Handle 256 flags 0x00 dlen 14 #32 L2CAP: Command Reject (0x01) ident 3 len 6 Reason: Invalid CID in request (0x0002) Destination CID: 64 Source CID: 65 > HCI Event: Number of Completed Packets (0x13) plen 5 #33 Num handles: 1 Handle: 256 Count: 1 ... === After === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 #22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 #23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 #24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 #27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 #28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 #29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 #30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 #31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ..... < ACL Data TX: Handle 256 flags 0x00 dlen 18 #32 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 < ACL Data TX: Handle 256 flags 0x00 dlen 12 #33 L2CAP: Configure Request (0x04) ident 3 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #34 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #35 Num handles: 1 Handle: 256 Count: 1 ... Signed-off-by: Howard Chung <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
dreibh
pushed a commit
to dreibh/mptcp
that referenced
this issue
Jun 24, 2021
[ Upstream commit d5027ca ] Ritesh reported a bug [1] against UML, noting that it crashed on startup. The backtrace shows the following (heavily redacted): (gdb) bt ... multipath-tcp#26 0x0000000060015b5d in sem_init () at ipc/sem.c:268 multipath-tcp#27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2 multipath-tcp#28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72 ... multipath-tcp#40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359 ... multipath-tcp#44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486 multipath-tcp#45 0x00007f8990968b85 in __getgrnam_r [...] multipath-tcp#46 0x00007f89909d6b77 in grantpt [...] multipath-tcp#47 0x00007f8990a9394e in __GI_openpty [...] multipath-tcp#48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407 multipath-tcp#49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598 multipath-tcp#50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45 multipath-tcp#51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334 multipath-tcp#52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144 indicating that the UML function openpty_cb() calls openpty(), which internally calls __getgrnam_r(), which causes the nsswitch machinery to get started. This loads, through lots of indirection that I snipped, the libcom_err.so.2 library, which (in an unknown function, "??") calls sem_init(). Now, of course it wants to get libpthread's sem_init(), since it's linked against libpthread. However, the dynamic linker looks up that symbol against the binary first, and gets the kernel's sem_init(). Hajime Tazaki noted that "objcopy -L" can localize a symbol, so the dynamic linker wouldn't do the lookup this way. I tried, but for some reason that didn't seem to work. Doing the same thing in the linker script instead does seem to work, though I cannot entirely explain - it *also* works if I just add "VERSION { { global: *; }; }" instead, indicating that something else is happening that I don't really understand. It may be that explicitly doing that marks them with some kind of empty version, and that's different from the default. Explicitly marking them with a version breaks kallsyms, so that doesn't seem to be possible. Marking all the symbols as local seems correct, and does seem to address the issue, so do that. Also do it for static link, nsswitch libraries could still be loaded there. [1] https://bugs.debian.org/983379 Reported-by: Ritesh Raj Sarraf <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Tested-By: Ritesh Raj Sarraf <[email protected]> Signed-off-by: Richard Weinberger <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
pk910
pushed a commit
to pk910/mptcp
that referenced
this issue
Jun 28, 2021
[ Upstream commit d5027ca ] Ritesh reported a bug [1] against UML, noting that it crashed on startup. The backtrace shows the following (heavily redacted): (gdb) bt ... multipath-tcp#26 0x0000000060015b5d in sem_init () at ipc/sem.c:268 multipath-tcp#27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2 multipath-tcp#28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72 ... multipath-tcp#40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359 ... multipath-tcp#44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486 multipath-tcp#45 0x00007f8990968b85 in __getgrnam_r [...] multipath-tcp#46 0x00007f89909d6b77 in grantpt [...] multipath-tcp#47 0x00007f8990a9394e in __GI_openpty [...] multipath-tcp#48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407 multipath-tcp#49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598 multipath-tcp#50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45 multipath-tcp#51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334 multipath-tcp#52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144 indicating that the UML function openpty_cb() calls openpty(), which internally calls __getgrnam_r(), which causes the nsswitch machinery to get started. This loads, through lots of indirection that I snipped, the libcom_err.so.2 library, which (in an unknown function, "??") calls sem_init(). Now, of course it wants to get libpthread's sem_init(), since it's linked against libpthread. However, the dynamic linker looks up that symbol against the binary first, and gets the kernel's sem_init(). Hajime Tazaki noted that "objcopy -L" can localize a symbol, so the dynamic linker wouldn't do the lookup this way. I tried, but for some reason that didn't seem to work. Doing the same thing in the linker script instead does seem to work, though I cannot entirely explain - it *also* works if I just add "VERSION { { global: *; }; }" instead, indicating that something else is happening that I don't really understand. It may be that explicitly doing that marks them with some kind of empty version, and that's different from the default. Explicitly marking them with a version breaks kallsyms, so that doesn't seem to be possible. Marking all the symbols as local seems correct, and does seem to address the issue, so do that. Also do it for static link, nsswitch libraries could still be loaded there. [1] https://bugs.debian.org/983379 Reported-by: Ritesh Raj Sarraf <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Tested-By: Ritesh Raj Sarraf <[email protected]> Signed-off-by: Richard Weinberger <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
matttbe
pushed a commit
that referenced
this issue
Jul 5, 2021
[ Upstream commit d5027ca ] Ritesh reported a bug [1] against UML, noting that it crashed on startup. The backtrace shows the following (heavily redacted): (gdb) bt ... #26 0x0000000060015b5d in sem_init () at ipc/sem.c:268 #27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2 #28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72 ... #40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359 ... #44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486 #45 0x00007f8990968b85 in __getgrnam_r [...] #46 0x00007f89909d6b77 in grantpt [...] #47 0x00007f8990a9394e in __GI_openpty [...] #48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407 #49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598 #50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45 #51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334 #52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144 indicating that the UML function openpty_cb() calls openpty(), which internally calls __getgrnam_r(), which causes the nsswitch machinery to get started. This loads, through lots of indirection that I snipped, the libcom_err.so.2 library, which (in an unknown function, "??") calls sem_init(). Now, of course it wants to get libpthread's sem_init(), since it's linked against libpthread. However, the dynamic linker looks up that symbol against the binary first, and gets the kernel's sem_init(). Hajime Tazaki noted that "objcopy -L" can localize a symbol, so the dynamic linker wouldn't do the lookup this way. I tried, but for some reason that didn't seem to work. Doing the same thing in the linker script instead does seem to work, though I cannot entirely explain - it *also* works if I just add "VERSION { { global: *; }; }" instead, indicating that something else is happening that I don't really understand. It may be that explicitly doing that marks them with some kind of empty version, and that's different from the default. Explicitly marking them with a version breaks kallsyms, so that doesn't seem to be possible. Marking all the symbols as local seems correct, and does seem to address the issue, so do that. Also do it for static link, nsswitch libraries could still be loaded there. [1] https://bugs.debian.org/983379 Reported-by: Ritesh Raj Sarraf <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Tested-By: Ritesh Raj Sarraf <[email protected]> Signed-off-by: Richard Weinberger <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
matttbe
pushed a commit
that referenced
this issue
Sep 6, 2021
[ Upstream commit b5befe8 ] An srcu_struct structure that is initialized before rcu_init_geometry() will have its srcu_node hierarchy based on CONFIG_NR_CPUS. Once rcu_init_geometry() is called, this hierarchy is compressed as needed for the actual maximum number of CPUs for this system. Later on, that srcu_struct structure is confused, sometimes referring to its initial CONFIG_NR_CPUS-based hierarchy, and sometimes instead to the new num_possible_cpus() hierarchy. For example, each of its ->mynode fields continues to reference the original leaf rcu_node structures, some of which might no longer exist. On the other hand, srcu_for_each_node_breadth_first() traverses to the new node hierarchy. There are at least two bad possible outcomes to this: 1) a) A callback enqueued early on an srcu_data structure (call it *sdp) is recorded pending on sdp->mynode->srcu_data_have_cbs in srcu_funnel_gp_start() with sdp->mynode pointing to a deep leaf (say 3 levels). b) The grace period ends after rcu_init_geometry() shrinks the nodes level to a single one. srcu_gp_end() walks through the new srcu_node hierarchy without ever reaching the old leaves so the callback is never executed. This is easily reproduced on an 8 CPUs machine with CONFIG_NR_CPUS >= 32 and "rcupdate.rcu_self_test=1". The srcu_barrier() after early tests verification never completes and the boot hangs: [ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds. [ 5413.147564] Not tainted 5.12.0-rc4+ #28 [ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 5413.159753] task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00004000 [ 5413.168099] Call Trace: [ 5413.170555] __schedule+0x36c/0x930 [ 5413.174057] ? wait_for_completion+0x88/0x110 [ 5413.178423] schedule+0x46/0xf0 [ 5413.181575] schedule_timeout+0x284/0x380 [ 5413.185591] ? wait_for_completion+0x88/0x110 [ 5413.189957] ? mark_held_locks+0x61/0x80 [ 5413.193882] ? mark_held_locks+0x61/0x80 [ 5413.197809] ? _raw_spin_unlock_irq+0x24/0x50 [ 5413.202173] ? wait_for_completion+0x88/0x110 [ 5413.206535] wait_for_completion+0xb4/0x110 [ 5413.210724] ? srcu_torture_stats_print+0x110/0x110 [ 5413.215610] srcu_barrier+0x187/0x200 [ 5413.219277] ? rcu_tasks_verify_self_tests+0x50/0x50 [ 5413.224244] ? rdinit_setup+0x2b/0x2b [ 5413.227907] rcu_verify_early_boot_tests+0x2d/0x40 [ 5413.232700] do_one_initcall+0x63/0x310 [ 5413.236541] ? rdinit_setup+0x2b/0x2b [ 5413.240207] ? rcu_read_lock_sched_held+0x52/0x80 [ 5413.244912] kernel_init_freeable+0x253/0x28f [ 5413.249273] ? rest_init+0x250/0x250 [ 5413.252846] kernel_init+0xa/0x110 [ 5413.256257] ret_from_fork+0x22/0x30 2) An srcu_struct structure that is initialized before rcu_init_geometry() and used afterward will always have stale rdp->mynode references, resulting in callbacks to be missed in srcu_gp_end(), just like in the previous scenario. This commit therefore causes init_srcu_struct_nodes to initialize the geometry, if needed. This ensures that the srcu_node hierarchy is properly built and distributed from the get-go. Suggested-by: Paul E. McKenney <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Uladzislau Rezki <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
matttbe
pushed a commit
that referenced
this issue
Aug 5, 2022
commit d0be8347c623e0ac4202a1d4e0373882821f56b0 upstream. This fixes the following trace which is caused by hci_rx_work starting up *after* the final channel reference has been put() during sock_close() but *before* the references to the channel have been destroyed, so instead the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to prevent referencing a channel that is about to be destroyed. refcount_t: increment on 0; use-after-free. BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0 Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705 CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT) Workqueue: hci0 hci_rx_work Call trace: dump_backtrace+0x0/0x378 show_stack+0x20/0x2c dump_stack+0x124/0x148 print_address_description+0x80/0x2e8 __kasan_report+0x168/0x188 kasan_report+0x10/0x18 __asan_load4+0x84/0x8c refcount_dec_and_test+0x20/0xd0 l2cap_chan_put+0x48/0x12c l2cap_recv_frame+0x4770/0x6550 l2cap_recv_acldata+0x44c/0x7a4 hci_acldata_packet+0x100/0x188 hci_rx_work+0x178/0x23c process_one_work+0x35c/0x95c worker_thread+0x4cc/0x960 kthread+0x1a8/0x1c4 ret_from_fork+0x10/0x18 Cc: [email protected] Reported-by: Lee Jones <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Tested-by: Lee Jones <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
dreibh
pushed a commit
to dreibh/mptcp
that referenced
this issue
Aug 24, 2022
commit d0be8347c623e0ac4202a1d4e0373882821f56b0 upstream. This fixes the following trace which is caused by hci_rx_work starting up *after* the final channel reference has been put() during sock_close() but *before* the references to the channel have been destroyed, so instead the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to prevent referencing a channel that is about to be destroyed. refcount_t: increment on 0; use-after-free. BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0 Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705 CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty multipath-tcp#28 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT) Workqueue: hci0 hci_rx_work Call trace: dump_backtrace+0x0/0x378 show_stack+0x20/0x2c dump_stack+0x124/0x148 print_address_description+0x80/0x2e8 __kasan_report+0x168/0x188 kasan_report+0x10/0x18 __asan_load4+0x84/0x8c refcount_dec_and_test+0x20/0xd0 l2cap_chan_put+0x48/0x12c l2cap_recv_frame+0x4770/0x6550 l2cap_recv_acldata+0x44c/0x7a4 hci_acldata_packet+0x100/0x188 hci_rx_work+0x178/0x23c process_one_work+0x35c/0x95c worker_thread+0x4cc/0x960 kthread+0x1a8/0x1c4 ret_from_fork+0x10/0x18 Cc: [email protected] Reported-by: Lee Jones <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Tested-by: Lee Jones <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
matttbe
pushed a commit
that referenced
this issue
Sep 2, 2022
commit d0be8347c623e0ac4202a1d4e0373882821f56b0 upstream. This fixes the following trace which is caused by hci_rx_work starting up *after* the final channel reference has been put() during sock_close() but *before* the references to the channel have been destroyed, so instead the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to prevent referencing a channel that is about to be destroyed. refcount_t: increment on 0; use-after-free. BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0 Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705 CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT) Workqueue: hci0 hci_rx_work Call trace: dump_backtrace+0x0/0x378 show_stack+0x20/0x2c dump_stack+0x124/0x148 print_address_description+0x80/0x2e8 __kasan_report+0x168/0x188 kasan_report+0x10/0x18 __asan_load4+0x84/0x8c refcount_dec_and_test+0x20/0xd0 l2cap_chan_put+0x48/0x12c l2cap_recv_frame+0x4770/0x6550 l2cap_recv_acldata+0x44c/0x7a4 hci_acldata_packet+0x100/0x188 hci_rx_work+0x178/0x23c process_one_work+0x35c/0x95c worker_thread+0x4cc/0x960 kthread+0x1a8/0x1c4 ret_from_fork+0x10/0x18 Cc: [email protected] Reported-by: Lee Jones <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Tested-by: Lee Jones <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
dreibh
pushed a commit
to dreibh/mptcp
that referenced
this issue
Sep 9, 2022
commit 9c80e79906b4ca440d09e7f116609262bb747909 upstream. The assumption in __disable_kprobe() is wrong, and it could try to disarm an already disarmed kprobe and fire the WARN_ONCE() below. [0] We can easily reproduce this issue. 1. Write 0 to /sys/kernel/debug/kprobes/enabled. # echo 0 > /sys/kernel/debug/kprobes/enabled 2. Run execsnoop. At this time, one kprobe is disabled. # /usr/share/bcc/tools/execsnoop & [1] 2460 PCOMM PID PPID RET ARGS # cat /sys/kernel/debug/kprobes/list ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE] ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE] 3. Write 1 to /sys/kernel/debug/kprobes/enabled, which changes kprobes_all_disarmed to false but does not arm the disabled kprobe. # echo 1 > /sys/kernel/debug/kprobes/enabled # cat /sys/kernel/debug/kprobes/list ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE] ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE] 4. Kill execsnoop, when __disable_kprobe() calls disarm_kprobe() for the disabled kprobe and hits the WARN_ONCE() in __disarm_kprobe_ftrace(). # fg /usr/share/bcc/tools/execsnoop ^C Actually, WARN_ONCE() is fired twice, and __unregister_kprobe_top() misses some cleanups and leaves the aggregated kprobe in the hash table. Then, __unregister_trace_kprobe() initialises tk->rp.kp.list and creates an infinite loop like this. aggregated kprobe.list -> kprobe.list -. ^ | '.__.' In this situation, these commands fall into the infinite loop and result in RCU stall or soft lockup. cat /sys/kernel/debug/kprobes/list : show_kprobe_addr() enters into the infinite loop with RCU. /usr/share/bcc/tools/execsnoop : warn_kprobe_rereg() holds kprobe_mutex, and __get_valid_kprobe() is stuck in the loop. To avoid the issue, make sure we don't call disarm_kprobe() for disabled kprobes. [0] Failed to disarm kprobe-ftrace at __x64_sys_execve+0x0/0x40 (error -2) WARNING: CPU: 6 PID: 2460 at kernel/kprobes.c:1130 __disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129) Modules linked in: ena CPU: 6 PID: 2460 Comm: execsnoop Not tainted 5.19.0+ multipath-tcp#28 Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017 RIP: 0010:__disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129) Code: 24 8b 02 eb c1 80 3d c4 83 f2 01 00 75 d4 48 8b 75 00 89 c2 48 c7 c7 90 fa 0f 92 89 04 24 c6 05 ab 83 01 e8 e4 94 f0 ff <0f> 0b 8b 04 24 eb b1 89 c6 48 c7 c7 60 fa 0f 92 89 04 24 e8 cc 94 RSP: 0018:ffff9e6ec154bd98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff930f7b00 RCX: 0000000000000001 RDX: 0000000080000001 RSI: ffffffff921461c5 RDI: 00000000ffffffff RBP: ffff89c504286da8 R08: 0000000000000000 R09: c0000000fffeffff R10: 0000000000000000 R11: ffff9e6ec154bc28 R12: ffff89c502394e40 R13: ffff89c502394c00 R14: ffff9e6ec154bc00 R15: 0000000000000000 FS: 00007fe800398740(0000) GS:ffff89c812d80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c00057f010 CR3: 0000000103b54006 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> __disable_kprobe (kernel/kprobes.c:1716) disable_kprobe (kernel/kprobes.c:2392) __disable_trace_kprobe (kernel/trace/trace_kprobe.c:340) disable_trace_kprobe (kernel/trace/trace_kprobe.c:429) perf_trace_event_unreg.isra.2 (./include/linux/tracepoint.h:93 kernel/trace/trace_event_perf.c:168) perf_kprobe_destroy (kernel/trace/trace_event_perf.c:295) _free_event (kernel/events/core.c:4971) perf_event_release_kernel (kernel/events/core.c:5176) perf_release (kernel/events/core.c:5186) __fput (fs/file_table.c:321) task_work_run (./include/linux/sched.h:2056 (discriminator 1) kernel/task_work.c:179 (discriminator 1)) exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:169 kernel/entry/common.c:201) syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:384 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296) do_syscall_64 (arch/x86/entry/common.c:87) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) RIP: 0033:0x7fe7ff210654 Code: 15 79 89 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb be 0f 1f 00 8b 05 9a cd 20 00 48 63 ff 85 c0 75 11 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3a f3 c3 48 83 ec 18 48 89 7c 24 08 e8 34 fc RSP: 002b:00007ffdbd1d3538 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007fe7ff210654 RDX: 0000000000000000 RSI: 0000000000002401 RDI: 0000000000000008 RBP: 0000000000000000 R08: 94ae31d6fda838a4 R0900007fe8001c9d30 R10: 00007ffdbd1d34b0 R11: 0000000000000246 R12: 00007ffdbd1d3600 R13: 0000000000000000 R14: fffffffffffffffc R15: 00007ffdbd1d3560 </TASK> Link: https://lkml.kernel.org/r/[email protected] Fixes: 69d54b9 ("kprobes: makes kprobes/enabled works correctly for optimized kprobes.") Signed-off-by: Kuniyuki Iwashima <[email protected]> Reported-by: Ayushman Dutta <[email protected]> Cc: "Naveen N. Rao" <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Wang Nan <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Cc: Ayushman Dutta <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
dreibh
pushed a commit
to dreibh/mptcp
that referenced
this issue
Sep 9, 2022
commit 9c80e79906b4ca440d09e7f116609262bb747909 upstream. The assumption in __disable_kprobe() is wrong, and it could try to disarm an already disarmed kprobe and fire the WARN_ONCE() below. [0] We can easily reproduce this issue. 1. Write 0 to /sys/kernel/debug/kprobes/enabled. # echo 0 > /sys/kernel/debug/kprobes/enabled 2. Run execsnoop. At this time, one kprobe is disabled. # /usr/share/bcc/tools/execsnoop & [1] 2460 PCOMM PID PPID RET ARGS # cat /sys/kernel/debug/kprobes/list ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE] ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE] 3. Write 1 to /sys/kernel/debug/kprobes/enabled, which changes kprobes_all_disarmed to false but does not arm the disabled kprobe. # echo 1 > /sys/kernel/debug/kprobes/enabled # cat /sys/kernel/debug/kprobes/list ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE] ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE] 4. Kill execsnoop, when __disable_kprobe() calls disarm_kprobe() for the disabled kprobe and hits the WARN_ONCE() in __disarm_kprobe_ftrace(). # fg /usr/share/bcc/tools/execsnoop ^C Actually, WARN_ONCE() is fired twice, and __unregister_kprobe_top() misses some cleanups and leaves the aggregated kprobe in the hash table. Then, __unregister_trace_kprobe() initialises tk->rp.kp.list and creates an infinite loop like this. aggregated kprobe.list -> kprobe.list -. ^ | '.__.' In this situation, these commands fall into the infinite loop and result in RCU stall or soft lockup. cat /sys/kernel/debug/kprobes/list : show_kprobe_addr() enters into the infinite loop with RCU. /usr/share/bcc/tools/execsnoop : warn_kprobe_rereg() holds kprobe_mutex, and __get_valid_kprobe() is stuck in the loop. To avoid the issue, make sure we don't call disarm_kprobe() for disabled kprobes. [0] Failed to disarm kprobe-ftrace at __x64_sys_execve+0x0/0x40 (error -2) WARNING: CPU: 6 PID: 2460 at kernel/kprobes.c:1130 __disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129) Modules linked in: ena CPU: 6 PID: 2460 Comm: execsnoop Not tainted 5.19.0+ multipath-tcp#28 Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017 RIP: 0010:__disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129) Code: 24 8b 02 eb c1 80 3d c4 83 f2 01 00 75 d4 48 8b 75 00 89 c2 48 c7 c7 90 fa 0f 92 89 04 24 c6 05 ab 83 01 e8 e4 94 f0 ff <0f> 0b 8b 04 24 eb b1 89 c6 48 c7 c7 60 fa 0f 92 89 04 24 e8 cc 94 RSP: 0018:ffff9e6ec154bd98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff930f7b00 RCX: 0000000000000001 RDX: 0000000080000001 RSI: ffffffff921461c5 RDI: 00000000ffffffff RBP: ffff89c504286da8 R08: 0000000000000000 R09: c0000000fffeffff R10: 0000000000000000 R11: ffff9e6ec154bc28 R12: ffff89c502394e40 R13: ffff89c502394c00 R14: ffff9e6ec154bc00 R15: 0000000000000000 FS: 00007fe800398740(0000) GS:ffff89c812d80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c00057f010 CR3: 0000000103b54006 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> __disable_kprobe (kernel/kprobes.c:1716) disable_kprobe (kernel/kprobes.c:2392) __disable_trace_kprobe (kernel/trace/trace_kprobe.c:340) disable_trace_kprobe (kernel/trace/trace_kprobe.c:429) perf_trace_event_unreg.isra.2 (./include/linux/tracepoint.h:93 kernel/trace/trace_event_perf.c:168) perf_kprobe_destroy (kernel/trace/trace_event_perf.c:295) _free_event (kernel/events/core.c:4971) perf_event_release_kernel (kernel/events/core.c:5176) perf_release (kernel/events/core.c:5186) __fput (fs/file_table.c:321) task_work_run (./include/linux/sched.h:2056 (discriminator 1) kernel/task_work.c:179 (discriminator 1)) exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:169 kernel/entry/common.c:201) syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:384 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296) do_syscall_64 (arch/x86/entry/common.c:87) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) RIP: 0033:0x7fe7ff210654 Code: 15 79 89 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb be 0f 1f 00 8b 05 9a cd 20 00 48 63 ff 85 c0 75 11 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3a f3 c3 48 83 ec 18 48 89 7c 24 08 e8 34 fc RSP: 002b:00007ffdbd1d3538 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007fe7ff210654 RDX: 0000000000000000 RSI: 0000000000002401 RDI: 0000000000000008 RBP: 0000000000000000 R08: 94ae31d6fda838a4 R0900007fe8001c9d30 R10: 00007ffdbd1d34b0 R11: 0000000000000246 R12: 00007ffdbd1d3600 R13: 0000000000000000 R14: fffffffffffffffc R15: 00007ffdbd1d3560 </TASK> Link: https://lkml.kernel.org/r/[email protected] Fixes: 69d54b9 ("kprobes: makes kprobes/enabled works correctly for optimized kprobes.") Signed-off-by: Kuniyuki Iwashima <[email protected]> Reported-by: Ayushman Dutta <[email protected]> Cc: "Naveen N. Rao" <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Wang Nan <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Cc: Ayushman Dutta <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
matttbe
pushed a commit
that referenced
this issue
Oct 7, 2022
commit 9c80e79906b4ca440d09e7f116609262bb747909 upstream. The assumption in __disable_kprobe() is wrong, and it could try to disarm an already disarmed kprobe and fire the WARN_ONCE() below. [0] We can easily reproduce this issue. 1. Write 0 to /sys/kernel/debug/kprobes/enabled. # echo 0 > /sys/kernel/debug/kprobes/enabled 2. Run execsnoop. At this time, one kprobe is disabled. # /usr/share/bcc/tools/execsnoop & [1] 2460 PCOMM PID PPID RET ARGS # cat /sys/kernel/debug/kprobes/list ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE] ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE] 3. Write 1 to /sys/kernel/debug/kprobes/enabled, which changes kprobes_all_disarmed to false but does not arm the disabled kprobe. # echo 1 > /sys/kernel/debug/kprobes/enabled # cat /sys/kernel/debug/kprobes/list ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE] ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE] 4. Kill execsnoop, when __disable_kprobe() calls disarm_kprobe() for the disabled kprobe and hits the WARN_ONCE() in __disarm_kprobe_ftrace(). # fg /usr/share/bcc/tools/execsnoop ^C Actually, WARN_ONCE() is fired twice, and __unregister_kprobe_top() misses some cleanups and leaves the aggregated kprobe in the hash table. Then, __unregister_trace_kprobe() initialises tk->rp.kp.list and creates an infinite loop like this. aggregated kprobe.list -> kprobe.list -. ^ | '.__.' In this situation, these commands fall into the infinite loop and result in RCU stall or soft lockup. cat /sys/kernel/debug/kprobes/list : show_kprobe_addr() enters into the infinite loop with RCU. /usr/share/bcc/tools/execsnoop : warn_kprobe_rereg() holds kprobe_mutex, and __get_valid_kprobe() is stuck in the loop. To avoid the issue, make sure we don't call disarm_kprobe() for disabled kprobes. [0] Failed to disarm kprobe-ftrace at __x64_sys_execve+0x0/0x40 (error -2) WARNING: CPU: 6 PID: 2460 at kernel/kprobes.c:1130 __disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129) Modules linked in: ena CPU: 6 PID: 2460 Comm: execsnoop Not tainted 5.19.0+ #28 Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017 RIP: 0010:__disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129) Code: 24 8b 02 eb c1 80 3d c4 83 f2 01 00 75 d4 48 8b 75 00 89 c2 48 c7 c7 90 fa 0f 92 89 04 24 c6 05 ab 83 01 e8 e4 94 f0 ff <0f> 0b 8b 04 24 eb b1 89 c6 48 c7 c7 60 fa 0f 92 89 04 24 e8 cc 94 RSP: 0018:ffff9e6ec154bd98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff930f7b00 RCX: 0000000000000001 RDX: 0000000080000001 RSI: ffffffff921461c5 RDI: 00000000ffffffff RBP: ffff89c504286da8 R08: 0000000000000000 R09: c0000000fffeffff R10: 0000000000000000 R11: ffff9e6ec154bc28 R12: ffff89c502394e40 R13: ffff89c502394c00 R14: ffff9e6ec154bc00 R15: 0000000000000000 FS: 00007fe800398740(0000) GS:ffff89c812d80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c00057f010 CR3: 0000000103b54006 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> __disable_kprobe (kernel/kprobes.c:1716) disable_kprobe (kernel/kprobes.c:2392) __disable_trace_kprobe (kernel/trace/trace_kprobe.c:340) disable_trace_kprobe (kernel/trace/trace_kprobe.c:429) perf_trace_event_unreg.isra.2 (./include/linux/tracepoint.h:93 kernel/trace/trace_event_perf.c:168) perf_kprobe_destroy (kernel/trace/trace_event_perf.c:295) _free_event (kernel/events/core.c:4971) perf_event_release_kernel (kernel/events/core.c:5176) perf_release (kernel/events/core.c:5186) __fput (fs/file_table.c:321) task_work_run (./include/linux/sched.h:2056 (discriminator 1) kernel/task_work.c:179 (discriminator 1)) exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:169 kernel/entry/common.c:201) syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:384 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296) do_syscall_64 (arch/x86/entry/common.c:87) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) RIP: 0033:0x7fe7ff210654 Code: 15 79 89 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb be 0f 1f 00 8b 05 9a cd 20 00 48 63 ff 85 c0 75 11 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3a f3 c3 48 83 ec 18 48 89 7c 24 08 e8 34 fc RSP: 002b:00007ffdbd1d3538 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007fe7ff210654 RDX: 0000000000000000 RSI: 0000000000002401 RDI: 0000000000000008 RBP: 0000000000000000 R08: 94ae31d6fda838a4 R0900007fe8001c9d30 R10: 00007ffdbd1d34b0 R11: 0000000000000246 R12: 00007ffdbd1d3600 R13: 0000000000000000 R14: fffffffffffffffc R15: 00007ffdbd1d3560 </TASK> Link: https://lkml.kernel.org/r/[email protected] Fixes: 69d54b9 ("kprobes: makes kprobes/enabled works correctly for optimized kprobes.") Signed-off-by: Kuniyuki Iwashima <[email protected]> Reported-by: Ayushman Dutta <[email protected]> Cc: "Naveen N. Rao" <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Wang Nan <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Cc: Ayushman Dutta <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
matttbe
pushed a commit
that referenced
this issue
May 19, 2023
[ Upstream commit e8b74453555872851bdd7ea43a7c0ec39659834f ] For quite some time we were chasing a bug which looked like a sudden permanent failure of networking and mmc on some of our devices. The bug was very sensitive to any software changes and even more to any kernel debug options. Finally we got a setup where the problem was reproducible with CONFIG_DMA_API_DEBUG=y and it revealed the issue with the rx dma: [ 16.992082] ------------[ cut here ]------------ [ 16.996779] DMA-API: macb ff0b0000.ethernet: device driver tries to free DMA memory it has not allocated [device address=0x0000000875e3e244] [size=1536 bytes] [ 17.011049] WARNING: CPU: 0 PID: 85 at kernel/dma/debug.c:1011 check_unmap+0x6a0/0x900 [ 17.018977] Modules linked in: xxxxx [ 17.038823] CPU: 0 PID: 85 Comm: irq/55-8000f000 Not tainted 5.4.0 #28 [ 17.045345] Hardware name: xxxxx [ 17.049528] pstate: 60000005 (nZCv daif -PAN -UAO) [ 17.054322] pc : check_unmap+0x6a0/0x900 [ 17.058243] lr : check_unmap+0x6a0/0x900 [ 17.062163] sp : ffffffc010003c40 [ 17.065470] x29: ffffffc010003c40 x28: 000000004000c03c [ 17.070783] x27: ffffffc010da7048 x26: ffffff8878e38800 [ 17.076095] x25: ffffff8879d22810 x24: ffffffc010003cc8 [ 17.081407] x23: 0000000000000000 x22: ffffffc010a08750 [ 17.086719] x21: ffffff8878e3c7c0 x20: ffffffc010acb000 [ 17.092032] x19: 0000000875e3e244 x18: 0000000000000010 [ 17.097343] x17: 0000000000000000 x16: 0000000000000000 [ 17.102647] x15: ffffff8879e4a988 x14: 0720072007200720 [ 17.107959] x13: 0720072007200720 x12: 0720072007200720 [ 17.113261] x11: 0720072007200720 x10: 0720072007200720 [ 17.118565] x9 : 0720072007200720 x8 : 000000000000022d [ 17.123869] x7 : 0000000000000015 x6 : 0000000000000098 [ 17.129173] x5 : 0000000000000000 x4 : 0000000000000000 [ 17.134475] x3 : 00000000ffffffff x2 : ffffffc010a1d370 [ 17.139778] x1 : b420c9d75d27bb00 x0 : 0000000000000000 [ 17.145082] Call trace: [ 17.147524] check_unmap+0x6a0/0x900 [ 17.151091] debug_dma_unmap_page+0x88/0x90 [ 17.155266] gem_rx+0x114/0x2f0 [ 17.158396] macb_poll+0x58/0x100 [ 17.161705] net_rx_action+0x118/0x400 [ 17.165445] __do_softirq+0x138/0x36c [ 17.169100] irq_exit+0x98/0xc0 [ 17.172234] __handle_domain_irq+0x64/0xc0 [ 17.176320] gic_handle_irq+0x5c/0xc0 [ 17.179974] el1_irq+0xb8/0x140 [ 17.183109] xiic_process+0x5c/0xe30 [ 17.186677] irq_thread_fn+0x28/0x90 [ 17.190244] irq_thread+0x208/0x2a0 [ 17.193724] kthread+0x130/0x140 [ 17.196945] ret_from_fork+0x10/0x20 [ 17.200510] ---[ end trace 7240980785f81d6f ]--- [ 237.021490] ------------[ cut here ]------------ [ 237.026129] DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000021d79e7b [ 237.033886] WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:499 add_dma_entry+0x214/0x240 [ 237.041802] Modules linked in: xxxxx [ 237.061637] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.4.0 #28 [ 237.068941] Hardware name: xxxxx [ 237.073116] pstate: 80000085 (Nzcv daIf -PAN -UAO) [ 237.077900] pc : add_dma_entry+0x214/0x240 [ 237.081986] lr : add_dma_entry+0x214/0x240 [ 237.086072] sp : ffffffc010003c30 [ 237.089379] x29: ffffffc010003c30 x28: ffffff8878a0be00 [ 237.094683] x27: 0000000000000180 x26: ffffff8878e387c0 [ 237.099987] x25: 0000000000000002 x24: 0000000000000000 [ 237.105290] x23: 000000000000003b x22: ffffffc010a0fa00 [ 237.110594] x21: 0000000021d79e7b x20: ffffffc010abe600 [ 237.115897] x19: 00000000ffffffef x18: 0000000000000010 [ 237.121201] x17: 0000000000000000 x16: 0000000000000000 [ 237.126504] x15: ffffffc010a0fdc8 x14: 0720072007200720 [ 237.131807] x13: 0720072007200720 x12: 0720072007200720 [ 237.137111] x11: 0720072007200720 x10: 0720072007200720 [ 237.142415] x9 : 0720072007200720 x8 : 0000000000000259 [ 237.147718] x7 : 0000000000000001 x6 : 0000000000000000 [ 237.153022] x5 : ffffffc010003a20 x4 : 0000000000000001 [ 237.158325] x3 : 0000000000000006 x2 : 0000000000000007 [ 237.163628] x1 : 8ac721b3a7dc1c00 x0 : 0000000000000000 [ 237.168932] Call trace: [ 237.171373] add_dma_entry+0x214/0x240 [ 237.175115] debug_dma_map_page+0xf8/0x120 [ 237.179203] gem_rx_refill+0x190/0x280 [ 237.182942] gem_rx+0x224/0x2f0 [ 237.186075] macb_poll+0x58/0x100 [ 237.189384] net_rx_action+0x118/0x400 [ 237.193125] __do_softirq+0x138/0x36c [ 237.196780] irq_exit+0x98/0xc0 [ 237.199914] __handle_domain_irq+0x64/0xc0 [ 237.204000] gic_handle_irq+0x5c/0xc0 [ 237.207654] el1_irq+0xb8/0x140 [ 237.210789] arch_cpu_idle+0x40/0x200 [ 237.214444] default_idle_call+0x18/0x30 [ 237.218359] do_idle+0x200/0x280 [ 237.221578] cpu_startup_entry+0x20/0x30 [ 237.225493] rest_init+0xe4/0xf0 [ 237.228713] arch_call_rest_init+0xc/0x14 [ 237.232714] start_kernel+0x47c/0x4a8 [ 237.236367] ---[ end trace 7240980785f81d70 ]--- Lars was fast to find an explanation: according to the datasheet bit 2 of the rx buffer descriptor entry has a different meaning in the extended mode: Address [2] of beginning of buffer, or in extended buffer descriptor mode (DMA configuration register [28] = 1), indicates a valid timestamp in the buffer descriptor entry. The macb driver didn't mask this bit while getting an address and it eventually caused a memory corruption and a dma failure. The problem is resolved by explicitly clearing the problematic bit if hw timestamping is used. Fixes: 7b42961 ("net: macb: Add support for PTP timestamps in DMA descriptors") Signed-off-by: Roman Gushchin <[email protected]> Co-developed-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Lars-Peter Clausen <[email protected]> Acked-by: Nicolas Ferre <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
matttbe
pushed a commit
that referenced
this issue
May 19, 2023
[ Upstream commit e8b74453555872851bdd7ea43a7c0ec39659834f ] For quite some time we were chasing a bug which looked like a sudden permanent failure of networking and mmc on some of our devices. The bug was very sensitive to any software changes and even more to any kernel debug options. Finally we got a setup where the problem was reproducible with CONFIG_DMA_API_DEBUG=y and it revealed the issue with the rx dma: [ 16.992082] ------------[ cut here ]------------ [ 16.996779] DMA-API: macb ff0b0000.ethernet: device driver tries to free DMA memory it has not allocated [device address=0x0000000875e3e244] [size=1536 bytes] [ 17.011049] WARNING: CPU: 0 PID: 85 at kernel/dma/debug.c:1011 check_unmap+0x6a0/0x900 [ 17.018977] Modules linked in: xxxxx [ 17.038823] CPU: 0 PID: 85 Comm: irq/55-8000f000 Not tainted 5.4.0 #28 [ 17.045345] Hardware name: xxxxx [ 17.049528] pstate: 60000005 (nZCv daif -PAN -UAO) [ 17.054322] pc : check_unmap+0x6a0/0x900 [ 17.058243] lr : check_unmap+0x6a0/0x900 [ 17.062163] sp : ffffffc010003c40 [ 17.065470] x29: ffffffc010003c40 x28: 000000004000c03c [ 17.070783] x27: ffffffc010da7048 x26: ffffff8878e38800 [ 17.076095] x25: ffffff8879d22810 x24: ffffffc010003cc8 [ 17.081407] x23: 0000000000000000 x22: ffffffc010a08750 [ 17.086719] x21: ffffff8878e3c7c0 x20: ffffffc010acb000 [ 17.092032] x19: 0000000875e3e244 x18: 0000000000000010 [ 17.097343] x17: 0000000000000000 x16: 0000000000000000 [ 17.102647] x15: ffffff8879e4a988 x14: 0720072007200720 [ 17.107959] x13: 0720072007200720 x12: 0720072007200720 [ 17.113261] x11: 0720072007200720 x10: 0720072007200720 [ 17.118565] x9 : 0720072007200720 x8 : 000000000000022d [ 17.123869] x7 : 0000000000000015 x6 : 0000000000000098 [ 17.129173] x5 : 0000000000000000 x4 : 0000000000000000 [ 17.134475] x3 : 00000000ffffffff x2 : ffffffc010a1d370 [ 17.139778] x1 : b420c9d75d27bb00 x0 : 0000000000000000 [ 17.145082] Call trace: [ 17.147524] check_unmap+0x6a0/0x900 [ 17.151091] debug_dma_unmap_page+0x88/0x90 [ 17.155266] gem_rx+0x114/0x2f0 [ 17.158396] macb_poll+0x58/0x100 [ 17.161705] net_rx_action+0x118/0x400 [ 17.165445] __do_softirq+0x138/0x36c [ 17.169100] irq_exit+0x98/0xc0 [ 17.172234] __handle_domain_irq+0x64/0xc0 [ 17.176320] gic_handle_irq+0x5c/0xc0 [ 17.179974] el1_irq+0xb8/0x140 [ 17.183109] xiic_process+0x5c/0xe30 [ 17.186677] irq_thread_fn+0x28/0x90 [ 17.190244] irq_thread+0x208/0x2a0 [ 17.193724] kthread+0x130/0x140 [ 17.196945] ret_from_fork+0x10/0x20 [ 17.200510] ---[ end trace 7240980785f81d6f ]--- [ 237.021490] ------------[ cut here ]------------ [ 237.026129] DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000021d79e7b [ 237.033886] WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:499 add_dma_entry+0x214/0x240 [ 237.041802] Modules linked in: xxxxx [ 237.061637] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.4.0 #28 [ 237.068941] Hardware name: xxxxx [ 237.073116] pstate: 80000085 (Nzcv daIf -PAN -UAO) [ 237.077900] pc : add_dma_entry+0x214/0x240 [ 237.081986] lr : add_dma_entry+0x214/0x240 [ 237.086072] sp : ffffffc010003c30 [ 237.089379] x29: ffffffc010003c30 x28: ffffff8878a0be00 [ 237.094683] x27: 0000000000000180 x26: ffffff8878e387c0 [ 237.099987] x25: 0000000000000002 x24: 0000000000000000 [ 237.105290] x23: 000000000000003b x22: ffffffc010a0fa00 [ 237.110594] x21: 0000000021d79e7b x20: ffffffc010abe600 [ 237.115897] x19: 00000000ffffffef x18: 0000000000000010 [ 237.121201] x17: 0000000000000000 x16: 0000000000000000 [ 237.126504] x15: ffffffc010a0fdc8 x14: 0720072007200720 [ 237.131807] x13: 0720072007200720 x12: 0720072007200720 [ 237.137111] x11: 0720072007200720 x10: 0720072007200720 [ 237.142415] x9 : 0720072007200720 x8 : 0000000000000259 [ 237.147718] x7 : 0000000000000001 x6 : 0000000000000000 [ 237.153022] x5 : ffffffc010003a20 x4 : 0000000000000001 [ 237.158325] x3 : 0000000000000006 x2 : 0000000000000007 [ 237.163628] x1 : 8ac721b3a7dc1c00 x0 : 0000000000000000 [ 237.168932] Call trace: [ 237.171373] add_dma_entry+0x214/0x240 [ 237.175115] debug_dma_map_page+0xf8/0x120 [ 237.179203] gem_rx_refill+0x190/0x280 [ 237.182942] gem_rx+0x224/0x2f0 [ 237.186075] macb_poll+0x58/0x100 [ 237.189384] net_rx_action+0x118/0x400 [ 237.193125] __do_softirq+0x138/0x36c [ 237.196780] irq_exit+0x98/0xc0 [ 237.199914] __handle_domain_irq+0x64/0xc0 [ 237.204000] gic_handle_irq+0x5c/0xc0 [ 237.207654] el1_irq+0xb8/0x140 [ 237.210789] arch_cpu_idle+0x40/0x200 [ 237.214444] default_idle_call+0x18/0x30 [ 237.218359] do_idle+0x200/0x280 [ 237.221578] cpu_startup_entry+0x20/0x30 [ 237.225493] rest_init+0xe4/0xf0 [ 237.228713] arch_call_rest_init+0xc/0x14 [ 237.232714] start_kernel+0x47c/0x4a8 [ 237.236367] ---[ end trace 7240980785f81d70 ]--- Lars was fast to find an explanation: according to the datasheet bit 2 of the rx buffer descriptor entry has a different meaning in the extended mode: Address [2] of beginning of buffer, or in extended buffer descriptor mode (DMA configuration register [28] = 1), indicates a valid timestamp in the buffer descriptor entry. The macb driver didn't mask this bit while getting an address and it eventually caused a memory corruption and a dma failure. The problem is resolved by explicitly clearing the problematic bit if hw timestamping is used. Fixes: 7b42961 ("net: macb: Add support for PTP timestamps in DMA descriptors") Signed-off-by: Roman Gushchin <[email protected]> Co-developed-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Lars-Peter Clausen <[email protected]> Acked-by: Nicolas Ferre <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
matttbe
pushed a commit
that referenced
this issue
May 19, 2023
[ Upstream commit e8b74453555872851bdd7ea43a7c0ec39659834f ] For quite some time we were chasing a bug which looked like a sudden permanent failure of networking and mmc on some of our devices. The bug was very sensitive to any software changes and even more to any kernel debug options. Finally we got a setup where the problem was reproducible with CONFIG_DMA_API_DEBUG=y and it revealed the issue with the rx dma: [ 16.992082] ------------[ cut here ]------------ [ 16.996779] DMA-API: macb ff0b0000.ethernet: device driver tries to free DMA memory it has not allocated [device address=0x0000000875e3e244] [size=1536 bytes] [ 17.011049] WARNING: CPU: 0 PID: 85 at kernel/dma/debug.c:1011 check_unmap+0x6a0/0x900 [ 17.018977] Modules linked in: xxxxx [ 17.038823] CPU: 0 PID: 85 Comm: irq/55-8000f000 Not tainted 5.4.0 #28 [ 17.045345] Hardware name: xxxxx [ 17.049528] pstate: 60000005 (nZCv daif -PAN -UAO) [ 17.054322] pc : check_unmap+0x6a0/0x900 [ 17.058243] lr : check_unmap+0x6a0/0x900 [ 17.062163] sp : ffffffc010003c40 [ 17.065470] x29: ffffffc010003c40 x28: 000000004000c03c [ 17.070783] x27: ffffffc010da7048 x26: ffffff8878e38800 [ 17.076095] x25: ffffff8879d22810 x24: ffffffc010003cc8 [ 17.081407] x23: 0000000000000000 x22: ffffffc010a08750 [ 17.086719] x21: ffffff8878e3c7c0 x20: ffffffc010acb000 [ 17.092032] x19: 0000000875e3e244 x18: 0000000000000010 [ 17.097343] x17: 0000000000000000 x16: 0000000000000000 [ 17.102647] x15: ffffff8879e4a988 x14: 0720072007200720 [ 17.107959] x13: 0720072007200720 x12: 0720072007200720 [ 17.113261] x11: 0720072007200720 x10: 0720072007200720 [ 17.118565] x9 : 0720072007200720 x8 : 000000000000022d [ 17.123869] x7 : 0000000000000015 x6 : 0000000000000098 [ 17.129173] x5 : 0000000000000000 x4 : 0000000000000000 [ 17.134475] x3 : 00000000ffffffff x2 : ffffffc010a1d370 [ 17.139778] x1 : b420c9d75d27bb00 x0 : 0000000000000000 [ 17.145082] Call trace: [ 17.147524] check_unmap+0x6a0/0x900 [ 17.151091] debug_dma_unmap_page+0x88/0x90 [ 17.155266] gem_rx+0x114/0x2f0 [ 17.158396] macb_poll+0x58/0x100 [ 17.161705] net_rx_action+0x118/0x400 [ 17.165445] __do_softirq+0x138/0x36c [ 17.169100] irq_exit+0x98/0xc0 [ 17.172234] __handle_domain_irq+0x64/0xc0 [ 17.176320] gic_handle_irq+0x5c/0xc0 [ 17.179974] el1_irq+0xb8/0x140 [ 17.183109] xiic_process+0x5c/0xe30 [ 17.186677] irq_thread_fn+0x28/0x90 [ 17.190244] irq_thread+0x208/0x2a0 [ 17.193724] kthread+0x130/0x140 [ 17.196945] ret_from_fork+0x10/0x20 [ 17.200510] ---[ end trace 7240980785f81d6f ]--- [ 237.021490] ------------[ cut here ]------------ [ 237.026129] DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000021d79e7b [ 237.033886] WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:499 add_dma_entry+0x214/0x240 [ 237.041802] Modules linked in: xxxxx [ 237.061637] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.4.0 #28 [ 237.068941] Hardware name: xxxxx [ 237.073116] pstate: 80000085 (Nzcv daIf -PAN -UAO) [ 237.077900] pc : add_dma_entry+0x214/0x240 [ 237.081986] lr : add_dma_entry+0x214/0x240 [ 237.086072] sp : ffffffc010003c30 [ 237.089379] x29: ffffffc010003c30 x28: ffffff8878a0be00 [ 237.094683] x27: 0000000000000180 x26: ffffff8878e387c0 [ 237.099987] x25: 0000000000000002 x24: 0000000000000000 [ 237.105290] x23: 000000000000003b x22: ffffffc010a0fa00 [ 237.110594] x21: 0000000021d79e7b x20: ffffffc010abe600 [ 237.115897] x19: 00000000ffffffef x18: 0000000000000010 [ 237.121201] x17: 0000000000000000 x16: 0000000000000000 [ 237.126504] x15: ffffffc010a0fdc8 x14: 0720072007200720 [ 237.131807] x13: 0720072007200720 x12: 0720072007200720 [ 237.137111] x11: 0720072007200720 x10: 0720072007200720 [ 237.142415] x9 : 0720072007200720 x8 : 0000000000000259 [ 237.147718] x7 : 0000000000000001 x6 : 0000000000000000 [ 237.153022] x5 : ffffffc010003a20 x4 : 0000000000000001 [ 237.158325] x3 : 0000000000000006 x2 : 0000000000000007 [ 237.163628] x1 : 8ac721b3a7dc1c00 x0 : 0000000000000000 [ 237.168932] Call trace: [ 237.171373] add_dma_entry+0x214/0x240 [ 237.175115] debug_dma_map_page+0xf8/0x120 [ 237.179203] gem_rx_refill+0x190/0x280 [ 237.182942] gem_rx+0x224/0x2f0 [ 237.186075] macb_poll+0x58/0x100 [ 237.189384] net_rx_action+0x118/0x400 [ 237.193125] __do_softirq+0x138/0x36c [ 237.196780] irq_exit+0x98/0xc0 [ 237.199914] __handle_domain_irq+0x64/0xc0 [ 237.204000] gic_handle_irq+0x5c/0xc0 [ 237.207654] el1_irq+0xb8/0x140 [ 237.210789] arch_cpu_idle+0x40/0x200 [ 237.214444] default_idle_call+0x18/0x30 [ 237.218359] do_idle+0x200/0x280 [ 237.221578] cpu_startup_entry+0x20/0x30 [ 237.225493] rest_init+0xe4/0xf0 [ 237.228713] arch_call_rest_init+0xc/0x14 [ 237.232714] start_kernel+0x47c/0x4a8 [ 237.236367] ---[ end trace 7240980785f81d70 ]--- Lars was fast to find an explanation: according to the datasheet bit 2 of the rx buffer descriptor entry has a different meaning in the extended mode: Address [2] of beginning of buffer, or in extended buffer descriptor mode (DMA configuration register [28] = 1), indicates a valid timestamp in the buffer descriptor entry. The macb driver didn't mask this bit while getting an address and it eventually caused a memory corruption and a dma failure. The problem is resolved by explicitly clearing the problematic bit if hw timestamping is used. Fixes: 7b42961 ("net: macb: Add support for PTP timestamps in DMA descriptors") Signed-off-by: Roman Gushchin <[email protected]> Co-developed-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Lars-Peter Clausen <[email protected]> Acked-by: Nicolas Ferre <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
[ 2465.647829] TCP: mptcp_fallback_infinite 0x462ef097 will fallback - pi 1, src 122.228.228.223 dst 220.181.112.207 from tcp_rcv_established+0x439/0x543
[ 5342.803548] perf samples too long (5007 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
[ 8588.902154] device eth0 entered promiscuous mode
[ 8593.551460] device eth0 left promiscuous mode
[ 8605.082309] device eth1 entered promiscuous mode
[ 8606.025251] device eth1 left promiscuous mode
[ 8655.332253] device eth0 entered promiscuous mode
[ 8656.104308] device eth0 left promiscuous mode
[22277.587875] BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
[22277.588588] IP: [] mptcp_reqsk_find_tk+0x73/0x83
[22277.588960] PGD 0
[22277.589310] Oops: 0000 [#1] SMP
[22277.589667] Modules linked in: xt_TCPMSS xt_tcpudp iptable_mangle ip_tables x_tables autofs4 ipmi_devintf ipmi_si ipmi_msghandler cpufreq_ondemand be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi ext3 jbd dm_mirror dm_multipath scsi_dh video sbs sbshc battery ac mptcp_fullmesh tcp_cubic parport_pc lp parport sg pcspkr igb i2c_algo_bit i2c_core serio_raw button tpm_tis tpm tpm_bios rtc_cmos i7core_edac ehci_pci edac_core acpi_cpufreq freq_table mperf dm_region_hash dm_log dm_mod ahci libahci libata cciss shpchp hpsa sd_mod scsi_mod ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
[22277.593543] CPU: 6 PID: 0 Comm: swapper/6 Tainted: P 3.11.3 #25
[22277.593902] Hardware name: HP ProLiant DL160 G6 , BIOS O33 02/01/2012
[22277.594262] task: ffff880625926440 ti: ffff880625928000 task.ti: ffff880625928000
[22277.594949] RIP: 0010:[] [] mptcp_reqsk_find_tk+0x73/0x83
[22277.595631] RSP: 0018:ffff880c3e6037a8 EFLAGS: 00010246
[22277.595982] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000110bb7ac
[22277.596333] RDX: ffff880c3e60377c RSI: 00000000f77447d7 RDI: 00000000383f1ffe
[22277.596687] RBP: ffff880c3e6037b8 R08: 000000009fac8640 R09: 0000000052583c62
[22277.597042] R10: 0000000071b68f3c R11: 00000000ab7c7690 R12: 00000000383f1ffe
[22277.597403] R13: ffff880c3e603828 R14: 0000000000000000 R15: ffff880c16e590c0
[22277.597764] FS: 0000000000000000(0000) GS:ffff880c3e600000(0000) knlGS:0000000000000000
[22277.598448] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[22277.598803] CR2: 0000000000000014 CR3: 00000006080dd000 CR4: 00000000000007e0
[22277.599156] Stack:
[22277.607685] ffff880c22c3ff00 ffff880c16e590c0 ffff880c3e6037e8 ffffffff81456ef8
[22277.608394] ffffffff81456df5 ffff880c16e590e8 ffff880c3e603828 ffff880c22c3ff00
[22277.609103] ffff880c3e603938 ffffffff813d5c74 ffff880c250b38a0 ffff880c20740040
[22277.609813] Call Trace:
[22277.610156]
[22277.610169]
[22277.610518] [] mptcp_reqsk_new_mptcp+0x103/0x1db
[22277.610874] [] ? mptcp_reqsk_destructor+0x1af/0x1af
[22277.611236] [] tcp_v4_conn_request+0x3b2/0xbec
[22277.611588] [] ? _raw_spin_lock+0x6c/0x75
[22277.611945] [] ? trace_hardirqs_on+0xd/0xf
[22277.612304] [] ? local_bh_enable+0xb2/0xba
[22277.612659] [] ? dev_queue_xmit+0x6fe/0x76f
[22277.613008] [] ? dev_add_pack+0x56/0x56
[22277.613356] [] ? ip_finish_output+0x5da/0x659
[22277.613703] [] ? ip_finish_output+0x2b9/0x659
[22277.614061] [] tcp_v6_conn_request+0x48/0x66d
[22277.614414] [] ? check_leaf+0x98/0x22b
[22277.614769] [] ? fib_table_lookup+0x460/0x480
[22277.615129] [] tcp_rcv_state_process+0x80/0xe26
[22277.615485] [] ? __inet_lookup_established+0x37a/0x39b
[22277.615844] [] ? __inet_check_established+0x29e/0x29e
[22277.616206] [] ? tcp_v4_hnd_req+0x101/0x194
[22277.616564] [] ? tcp_v4_rcv+0x55b/0xaa2
[22277.616916] [] tcp_v4_do_rcv+0x59e/0x610
[22277.617266] [] ? _raw_spin_lock_nested+0x6a/0x73
[22277.617614] [] tcp_v4_rcv+0x583/0xaa2
[22277.617962] [] ? ip_local_deliver_finish+0x3e/0x3c1
[22277.618316] [] ip_local_deliver_finish+0x258/0x3c1
[22277.618664] [] ? ip_local_deliver_finish+0x3e/0x3c1
[22277.619019] [] ip_local_deliver+0x73/0x7b
[22277.619374] [] ip_rcv_finish+0x58f/0x5a7
[22277.619726] [] ip_rcv+0x28c/0x2cc
[22277.620082] [] __netif_receive_skb_core+0x988/0xa53
[22277.620436] [] ? __netif_receive_skb_core+0x25e/0xa53
[22277.620784] [] __netif_receive_skb+0x5c/0x65
[22277.621129] [] netif_receive_skb+0x1a8/0x1b3
[22277.621474] [] napi_gro_receive+0x3b/0x78
[22277.621828] [] igb_poll+0xae3/0xdab [igb]
[22277.622180] [] net_rx_action+0xbe/0x280
[22277.622532] [] __do_softirq+0x180/0x359
[22277.622887] [] irq_exit+0x54/0x62
[22277.623246] [] do_IRQ+0x9b/0xb2
[22277.623603] [] common_interrupt+0x6f/0x6f
[22277.623957]
[22277.623970]
[22277.624316] [] ? cpuidle_enter_state+0x43/0xb0
[22277.624671] [] ? cpuidle_enter_state+0x47/0xb0
[22277.625029] [] ? cpuidle_enter_state+0x43/0xb0
[22277.625388] [] cpuidle_idle_call+0x190/0x2e5
[22277.625748] [] arch_cpu_idle+0x9/0x1d
[22277.626098] [] cpu_startup_entry+0x225/0x2d8
[22277.626447] [] start_secondary+0x1c0/0x1c5
[22277.626792] Code: 74 1e e8 b0 4f c6 ff 85 c0 75 15 e8 db 4f c6 ff 84 c0 74 0c 48 c7 c7 80 64 a3 81 e8 ac 3f c3 ff 48 89 d8 a8 01 74 04 31 c0 eb 0b <44> 3b 63 14 75 b3 b8 01 00 00 00 5b 41 5c c9 c3 55 48 8b 97 90
[22277.628158] RIP [] mptcp_reqsk_find_tk+0x73/0x83
[22277.628517] RSP
[22277.628863] CR2: 0000000000000014
crash>
The text was updated successfully, but these errors were encountered: