Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report split between the beginning and end of the buffer #16

Closed
matt-auld opened this issue Feb 16, 2016 · 1 comment
Closed

Report split between the beginning and end of the buffer #16

matt-auld opened this issue Feb 16, 2016 · 1 comment

Comments

@matt-auld
Copy link

Seems to be pretty damn rare, and only tends to happen when reading OA reports for a prolonged period of time. Reproducible on oa-nextrunning on BDW.

[  534.949540] kernel BUG at drivers/gpu/drm/i915/i915_perf.c:193!
[  534.949558] invalid opcode: 0000 [#1] SMP 
[  534.949573] Modules linked in: rfcomm fuse ccm cmac xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_filter ebtable_nat ebtable_broute bridge ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_security ip6table_mangle ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw iptable_security iptable_mangle bnep joydev hid_sensor_accel_3d hid_sensor_magn_3d hid_sensor_als hid_sensor_rotation hid_sensor_gyro_3d hid_sensor_incl_3d hid_sensor_trigger hid_sensor_iio_common industrialio_triggered_buffer arc4 snd_hda_codec_hdmi hid_multitouch hid_sensor_hub iwlmvm snd_soc_sst_broadwell intel_rapl
[  534.949819]  x86_pkg_temp_thermal coretemp kvm_intel snd_soc_sst_haswell_pcm vfat mac80211 fat kvm iTCO_wdt iTCO_vendor_support snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_rt286 ppdev snd_soc_rl6347a snd_soc_core irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel iwlwifi snd_hda_intel snd_hda_codec cfg80211 snd_hwdep snd_hda_core uvcvideo snd_seq btusb videobuf2_vmalloc snd_seq_device videobuf2_memops btrtl i2c_i801 videobuf2_v4l2 videobuf2_core btbcm snd_pcm v4l2_common btintel bluetooth lpc_ich videodev mei_me mei media snd_timer cdc_acm snd processor_thermal_device intel_soc_dts_iosf iosf_mbi shpchp soundcore acpi_als kfifo_buf industrialio winbond_cir soc_button_array rc_core int3403_thermal tpm_crb dw_dmac tpm parport_pc parport int3402_thermal rfkill_gpio i2c_designware_platform int3400_thermal
[  534.950076]  i2c_designware_core rfkill snd_soc_sst_acpi dw_dmac_core acpi_thermal_rel acpi_pad int340x_thermal_zone nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915 i2c_algo_bit drm_kms_helper drm e1000e serio_raw sdhci_pci ptp sdhci_acpi pps_core sdhci 8021q mmc_core garp stp llc video mrp i2c_hid cdc_mbim cdc_wdm cdc_ncm usbnet
[  534.950190] CPU: 3 PID: 3312 Comm: glxgears Not tainted 4.4.0-drm-intel-matthew-old+ #9
[  534.950213] Hardware name: Intel Corporation Broadwell Client platform/Wilson Beach SDS, BIOS BDW-E2R1.86C.0095.R09.1410300006 10/30/2014
[  534.950246] task: ffff88023c055700 ti: ffff88008cb04000 task.ti: ffff88008cb04000
[  534.950266] RIP: 0010:[<ffffffffa02732e6>]  [<ffffffffa02732e6>] gen8_oa_read+0x186/0x260 [i915]
[  534.950314] RSP: 0018:ffff88008cb07d88  EFLAGS: 00010287
[  534.950329] RAX: 00000000000000c0 RBX: 00000000000f28dd RCX: 0000000000000000
[  534.950349] RDX: 0000000000000100 RSI: 0000000000000000 RDI: 000055d11d540b98
[  534.950369] RBP: ffff88008cb07dd0 R08: 0000000000ffff40 R09: ffff88023ef45a00
[  534.950388] R10: ffffc90004801000 R11: ffff88023c00cb00 R12: 00000000ff000000
[  534.950408] R13: 0000000000000100 R14: 0000000000ffff40 R15: ffff88023d550000
[  534.950428] FS:  00007f15cd36c700(0000) GS:ffff88024dcc0000(0000) knlGS:0000000000000000
[  534.950450] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  534.950466] CR2: 00007f7bd2de7018 CR3: 0000000200cd0000 CR4: 00000000003406e0
[  534.950485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  534.950505] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  534.950524] Stack:
[  534.950531]  ffff88023d550000 ffff88008cb07df8 ffffc90004801000 ffff88023ef45a00
[  534.950555]  ffff88023ef45a00 ffff880090a68f00 ffff88023d55a028 ffff88023d550000
[  534.950580]  000055d11d540880 ffff88008cb07de0 ffffffffa0272582 ffff88008cb07e40
[  534.950604] Call Trace:
[  534.950629]  [<ffffffffa0272582>] i915_oa_read+0x12/0x20 [i915]
[  534.950661]  [<ffffffffa0272d2c>] i915_perf_read+0x7c/0xf0 [i915]
[  534.950681]  [<ffffffff81222567>] __vfs_read+0x37/0x100
[  534.950697]  [<ffffffff81328590>] ? security_file_permission+0xa0/0xc0
[  534.950716]  [<ffffffff81222e96>] vfs_read+0x86/0x130
[  534.950731]  [<ffffffff81223be5>] SyS_read+0x55/0xc0
[  534.950746]  [<ffffffff81235e63>] ? SyS_ioctl+0x63/0x90
[  534.950763]  [<ffffffff817d1cee>] entry_SYSCALL_64_fastpath+0x12/0x71
[  534.950781] Code: 00 25 ff ff ff 00 74 25 44 39 e8 72 20 45 89 f0 b8 00 00 00 01 45 29 ec 41 81 e0 ff ff ff 00 44 29 c0 44 39 e8 0f 83 60 ff ff ff <0f> 0b 45 89 f5 41 8b 97 38 a1 00 00 4c 8b 75 b8 44 01 ea 4c 89 
[  534.950896] RIP  [<ffffffffa02732e6>] gen8_oa_read+0x186/0x260 [i915]
[  534.950929]  RSP <ffff88008cb07d88>
matt-auld pushed a commit to matt-auld/linux that referenced this issue Feb 29, 2016
The current bridge code calls switchdev_port_obj_del on a VLAN port even
if the corresponding switchdev_port_obj_add call returned -EOPNOTSUPP.

If the DSA driver doesn't return -EOPNOTSUPP for a software port VLAN in
its port_vlan_del function, the VLAN is not deleted. Unbridging the port
also generates a stack trace for the same reason.

This can be quickly tested on a VLAN filtering enabled system with:

    # brctl addbr br0
    # brctl addif br0 lan0
    # brctl addbr br1
    # brctl addif br1 lan1
    # brctl delif br1 lan1

Both bridges have a default default_pvid set to 1. lan0 uses the
hardware VLAN 1 while lan1 falls back to the software VLAN 1.

Unbridging lan1 does not delete its software VLAN, and thus generates
the following stack trace:

    [ 2991.681705] device lan1 left promiscuous mode
    [ 2991.686237] br1: port 1(lan1) entered disabled state
    [ 2991.725094] ------------[ cut here ]------------
    [ 2991.729761] WARNING: CPU: 0 PID: 869 at net/bridge/br_vlan.c:314 __vlan_group_free+0x4c/0x50()
    [ 2991.738437] Modules linked in:
    [ 2991.741546] CPU: 0 PID: 869 Comm: ip Not tainted 4.4.0 rib#16
    [ 2991.747039] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
    [ 2991.753511] Backtrace:
    [ 2991.756008] [<80014450>] (dump_backtrace) from [<8001469c>] (show_stack+0x20/0x24)
    [ 2991.763604]  r6:80512644 r5:00000009 r4:00000000 r3:00000000
    [ 2991.769343] [<8001467c>] (show_stack) from [<80268e44>] (dump_stack+0x24/0x28)
    [ 2991.776618] [<80268e20>] (dump_stack) from [<80025568>] (warn_slowpath_common+0x98/0xc4)
    [ 2991.784750] [<800254d0>] (warn_slowpath_common) from [<80025650>] (warn_slowpath_null+0x2c/0x34)
    [ 2991.793557]  r8:00000000 r7:9f786a8c r6:9f76c440 r5:9f786a00 r4:9f68ac00
    [ 2991.800366] [<80025624>] (warn_slowpath_null) from [<80512644>] (__vlan_group_free+0x4c/0x50)
    [ 2991.808946] [<805125f8>] (__vlan_group_free) from [<80514488>] (nbp_vlan_flush+0x44/0x68)
    [ 2991.817147]  r4:9f68ac00 r3:9ec70000
    [ 2991.820772] [<80514444>] (nbp_vlan_flush) from [<80506f08>] (del_nbp+0xac/0x130)
    [ 2991.828201]  r5:9f56f800 r4:9f786a00
    [ 2991.831841] [<80506e5c>] (del_nbp) from [<8050774c>] (br_del_if+0x40/0xbc)
    [ 2991.838724]  r7:80590f68 r6:00000000 r5:9ec71c38 r4:9f76c440
    [ 2991.844475] [<8050770c>] (br_del_if) from [<80503dc0>] (br_del_slave+0x1c/0x20)
    [ 2991.851802]  r5:9ec71c38 r4:9f56f800
    [ 2991.855428] [<80503da4>] (br_del_slave) from [<80484a34>] (do_setlink+0x324/0x7b8)
    [ 2991.863043] [<80484710>] (do_setlink) from [<80485e90>] (rtnl_newlink+0x508/0x6f4)
    [ 2991.870616]  r10:00000000 r9:9ec71ba8 r8:00000000 r7:00000000 r6:9f6b0400 r5:9f56f800
    [ 2991.878548]  r4:8076278c
    [ 2991.881110] [<80485988>] (rtnl_newlink) from [<80484048>] (rtnetlink_rcv_msg+0x18c/0x22c)
    [ 2991.889315]  r10:9f7d4e40 r9:00000000 r8:00000000 r7:00000000 r6:9f7d4e40 r5:9f6b0400
    [ 2991.897250]  r4:00000000
    [ 2991.899814] [<80483ebc>] (rtnetlink_rcv_msg) from [<80497c74>] (netlink_rcv_skb+0xb0/0xcc)
    [ 2991.908104]  r8:00000000 r7:9f7d4e40 r6:9f7d4e40 r5:80483ebc r4:9f6b0400
    [ 2991.914928] [<80497bc4>] (netlink_rcv_skb) from [<80483eb4>] (rtnetlink_rcv+0x34/0x3c)
    [ 2991.922874]  r6:9f5ea000 r5:00000028 r4:9f7d4e40 r3:80483e80
    [ 2991.928622] [<80483e80>] (rtnetlink_rcv) from [<80497604>] (netlink_unicast+0x180/0x200)
    [ 2991.936742]  r4:9f4edc00 r3:80483e80
    [ 2991.940362] [<80497484>] (netlink_unicast) from [<80497a88>] (netlink_sendmsg+0x33c/0x350)
    [ 2991.948648]  r8:00000000 r7:00000028 r6:00000000 r5:9f5ea000 r4:9ec71f4c
    [ 2991.955481] [<8049774c>] (netlink_sendmsg) from [<80457ff0>] (sock_sendmsg+0x24/0x34)
    [ 2991.963342]  r10:00000000 r9:9ec71e28 r8:00000000 r7:9f1e2140 r6:00000000 r5:00000000
    [ 2991.971276]  r4:9ec71f4c
    [ 2991.973849] [<80457fcc>] (sock_sendmsg) from [<80458af0>] (___sys_sendmsg+0x1fc/0x204)
    [ 2991.981809] [<804588f4>] (___sys_sendmsg) from [<804598d0>] (__sys_sendmsg+0x4c/0x7c)
    [ 2991.989640]  r10:00000000 r9:9ec70000 r8:80010824 r7:00000128 r6:7ee946c4 r5:00000000
    [ 2991.997572]  r4:9f1e2140
    [ 2992.000128] [<80459884>] (__sys_sendmsg) from [<80459918>] (SyS_sendmsg+0x18/0x1c)
    [ 2992.007725]  r6:00000000 r5:7ee9c7b8 r4:7ee946e0
    [ 2992.012430] [<80459900>] (SyS_sendmsg) from [<80010660>] (ret_fast_syscall+0x0/0x3c)
    [ 2992.020182] ---[ end trace 5d4bc29f4da04280 ]---

To fix this, return -EOPNOTSUPP in _mv88e6xxx_port_vlan_del instead of
-ENOENT if the hardware VLAN doesn't exist or the port is not a member.

Signed-off-by: Vivien Didelot <[email protected]>
Tested-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
@rib
Copy link
Owner

rib commented Mar 24, 2016

With the number of i915 perf changes since this was reported, I'm closing this for now while I at least haven't seen anything like this on SKL. Might need to re-open if this crops up a gain with more BDW testing.

@rib rib closed this as completed Mar 24, 2016
matt-auld pushed a commit to matt-auld/linux that referenced this issue Jan 26, 2017
Andrey Konovalov reports that fuzz testing with syzkaller causes a
KASAN use-after-free bug report in gadgetfs:

BUG: KASAN: use-after-free in gadgetfs_setup+0x208a/0x20e0 at addr ffff88003dfe5bf2
Read of size 2 by task syz-executor0/22994
CPU: 3 PID: 22994 Comm: syz-executor0 Not tainted 4.9.0-rc7+ rib#16
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffff88006df06a18 ffffffff81f96aba ffffffffe0528500 1ffff1000dbe0cd6
 ffffed000dbe0cce ffff88006df068f0 0000000041b58ab3 ffffffff8598b4c8
 ffffffff81f96828 1ffff1000dbe0ccd ffff88006df06708 ffff88006df06748
Call Trace:
 <IRQ> [  201.343209]  [<     inline     >] __dump_stack lib/dump_stack.c:15
 <IRQ> [  201.343209]  [<ffffffff81f96aba>] dump_stack+0x292/0x398 lib/dump_stack.c:51
 [<ffffffff817e4dec>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:159
 [<     inline     >] print_address_description mm/kasan/report.c:197
 [<ffffffff817e5080>] kasan_report_error+0x1f0/0x4e0 mm/kasan/report.c:286
 [<     inline     >] kasan_report mm/kasan/report.c:306
 [<ffffffff817e562a>] __asan_report_load_n_noabort+0x3a/0x40 mm/kasan/report.c:337
 [<     inline     >] config_buf drivers/usb/gadget/legacy/inode.c:1298
 [<ffffffff8322c8fa>] gadgetfs_setup+0x208a/0x20e0 drivers/usb/gadget/legacy/inode.c:1368
 [<ffffffff830fdcd0>] dummy_timer+0x11f0/0x36d0 drivers/usb/gadget/udc/dummy_hcd.c:1858
 [<ffffffff814807c1>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308
 [<     inline     >] expire_timers kernel/time/timer.c:1348
 [<ffffffff81482de6>] __run_timers+0xa06/0xec0 kernel/time/timer.c:1641
 [<ffffffff814832c1>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654
 [<ffffffff84f4af8b>] __do_softirq+0x2fb/0xb63 kernel/softirq.c:284

The cause of the bug is subtle.  The dev_config() routine gets called
twice by the fuzzer.  The first time, the user data contains both a
full-speed configuration descriptor and a high-speed config
descriptor, causing dev->hs_config to be set.  But it also contains an
invalid device descriptor, so the buffer containing the descriptors is
deallocated and dev_config() returns an error.

The second time dev_config() is called, the user data contains only a
full-speed config descriptor.  But dev->hs_config still has the stale
pointer remaining from the first call, causing the routine to think
that there is a valid high-speed config.  Later on, when the driver
dereferences the stale pointer to copy that descriptor, we get a
use-after-free access.

The fix is simple: Clear dev->hs_config if the passed-in data does not
contain a high-speed config descriptor.

Signed-off-by: Alan Stern <[email protected]>
Reported-by: Andrey Konovalov <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
CC: <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
rib pushed a commit that referenced this issue Feb 10, 2017
Olga Kornievskaia says: "I ran into this oops in the nfsd (below)
(4.10-rc3 kernel). To trigger this I had a client (unsuccessfully) try
to mount the server with krb5 where the server doesn't have the
rpcsec_gss_krb5 module built."

The problem is that rsci.cred is copied from a svc_cred structure that
gss_proxy didn't properly initialize.  Fix that.

[120408.542387] general protection fault: 0000 [#1] SMP
...
[120408.565724] CPU: 0 PID: 3601 Comm: nfsd Not tainted 4.10.0-rc3+ #16
[120408.567037] Hardware name: VMware, Inc. VMware Virtual =
Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[120408.569225] task: ffff8800776f95c0 task.stack: ffffc90003d58000
[120408.570483] RIP: 0010:gss_mech_put+0xb/0x20 [auth_rpcgss]
...
[120408.584946]  ? rsc_free+0x55/0x90 [auth_rpcgss]
[120408.585901]  gss_proxy_save_rsc+0xb2/0x2a0 [auth_rpcgss]
[120408.587017]  svcauth_gss_proxy_init+0x3cc/0x520 [auth_rpcgss]
[120408.588257]  ? __enqueue_entity+0x6c/0x70
[120408.589101]  svcauth_gss_accept+0x391/0xb90 [auth_rpcgss]
[120408.590212]  ? try_to_wake_up+0x4a/0x360
[120408.591036]  ? wake_up_process+0x15/0x20
[120408.592093]  ? svc_xprt_do_enqueue+0x12e/0x2d0 [sunrpc]
[120408.593177]  svc_authenticate+0xe1/0x100 [sunrpc]
[120408.594168]  svc_process_common+0x203/0x710 [sunrpc]
[120408.595220]  svc_process+0x105/0x1c0 [sunrpc]
[120408.596278]  nfsd+0xe9/0x160 [nfsd]
[120408.597060]  kthread+0x101/0x140
[120408.597734]  ? nfsd_destroy+0x60/0x60 [nfsd]
[120408.598626]  ? kthread_park+0x90/0x90
[120408.599448]  ret_from_fork+0x22/0x30

Fixes: 1d65833 "SUNRPC: Add RPC based upcall mechanism for RPCGSS auth"
Cc: [email protected]
Cc: Simo Sorce <[email protected]>
Reported-by: Olga Kornievskaia <[email protected]>
Tested-by: Olga Kornievskaia <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this issue Mar 17, 2017
The system may panic when initialisation is done when almost all the
memory is assigned to the huge pages using the kernel command line
parameter hugepage=xxxx.  Panic may occur like this:

  Unable to handle kernel paging request for data at address 0x00000000
  Faulting instruction address: 0xc000000000302b88
  Oops: Kernel access of bad area, sig: 11 [rib#1]
  SMP NR_CPUS=2048 [    0.082424] NUMA
  pSeries
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-15-generic rib#16-Ubuntu
  task: c00000021ed01600 task.stack: c00000010d108000
  NIP: c000000000302b88 LR: c000000000270e04 CTR: c00000000016cfd0
  REGS: c00000010d10b2c0 TRAP: 0300   Not tainted (4.9.0-15-generic)
  MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>[ 0.082770]   CR: 28424422  XER: 00000000
  CFAR: c0000000003d28b8 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
  GPR00: c000000000270e04 c00000010d10b540 c00000000141a300 c00000010fff6300
  GPR04: 0000000000000000 00000000026012c0 c00000010d10b630 0000000487ab0000
  GPR08: 000000010ee90000 c000000001454fd8 0000000000000000 0000000000000000
  GPR12: 0000000000004400 c00000000fb80000 00000000026012c0 00000000026012c0
  GPR16: 00000000026012c0 0000000000000000 0000000000000000 0000000000000002
  GPR20: 000000000000000c 0000000000000000 0000000000000000 00000000024200c0
  GPR24: c0000000016eef48 0000000000000000 c00000010fff7d00 00000000026012c0
  GPR28: 0000000000000000 c00000010fff7d00 c00000010fff6300 c00000010d10b6d0
  NIP mem_cgroup_soft_limit_reclaim+0xf8/0x4f0
  LR do_try_to_free_pages+0x1b4/0x450
  Call Trace:
    do_try_to_free_pages+0x1b4/0x450
    try_to_free_pages+0xf8/0x270
    __alloc_pages_nodemask+0x7a8/0xff0
    new_slab+0x104/0x8e0
    ___slab_alloc+0x620/0x700
    __slab_alloc+0x34/0x60
    kmem_cache_alloc_node_trace+0xdc/0x310
    mem_cgroup_init+0x158/0x1c8
    do_one_initcall+0x68/0x1d0
    kernel_init_freeable+0x278/0x360
    kernel_init+0x24/0x170
    ret_from_kernel_thread+0x5c/0x74
  Instruction dump:
  eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020 3d230001 e9499a42 3d220004
  3929acd8 794a1f24 7d295214 eac90100 <e9360000> 2fa90000 419eff74 3b200000
  ---[ end trace 342f5208b00d01b6 ]---

This is a chicken and egg issue where the kernel try to get free memory
when allocating per node data in mem_cgroup_init(), but in that path
mem_cgroup_soft_limit_reclaim() is called which assumes that these data
are allocated.

As mem_cgroup_soft_limit_reclaim() is best effort, it should return when
these data are not yet allocated.

This patch also fixes potential null pointer access in
mem_cgroup_remove_from_trees() and mem_cgroup_update_tree().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laurent Dufour <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
rib pushed a commit that referenced this issue Mar 23, 2017
As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
torvalds#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <[email protected]>
Cc: Hannes Sowa <[email protected]>
Signed-off-by: Jon Maxwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue Apr 26, 2017
commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 rib#9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
rib#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
rib#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
rib#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
rib#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
rib#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
rib#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
rib#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
rib#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
rib#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
rib#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
rib#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
rib#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
rib#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

Signed-off-by: Eric Sandeen <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue Apr 26, 2017
commit 9f88eb4 upstream.

When re-adding crash kernel memory within setup_resources() the
function memblock_add() is used. That function will add memory by
default to node "MAX_NUMNODES" instead of node 0, like the memory
detection code does. In case of !NUMA this will trigger this warning
when the kernel generates the vmemmap:

Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead
WARNING: CPU: 0 PID: 0 at mm/memblock.c:1261 memblock_virt_alloc_internal+0x76/0x220
CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc6 rib#16
Call Trace:
 [<0000000000d0b2e8>] memblock_virt_alloc_try_nid+0x88/0xc8
 [<000000000083c8ea>] __earlyonly_bootmem_alloc.constprop.1+0x42/0x50
 [<000000000083e7f4>] vmemmap_populate+0x1ac/0x1e0
 [<0000000000840136>] sparse_mem_map_populate+0x46/0x68
 [<0000000000d0c59c>] sparse_init+0x184/0x238
 [<0000000000cf45f6>] paging_init+0xbe/0xf8
 [<0000000000cf1d4a>] setup_arch+0xa02/0xae0
 [<0000000000ced75a>] start_kernel+0x72/0x450
 [<0000000000100020>] _stext+0x20/0x80

If NUMA is selected numa_setup_memory() will fix the node assignments
before the vmemmap will be populated; so this warning will only appear
if NUMA is not selected.

To fix this simply use memblock_add_node() and re-add crash kernel
memory explicitly to node 0.

Reported-and-tested-by: Christian Borntraeger <[email protected]>
Fixes: 4e042af ("s390/kexec: fix crash on resize of reserved memory")
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue Apr 26, 2017
commit add333a upstream.

Andrey Konovalov reports that fuzz testing with syzkaller causes a
KASAN use-after-free bug report in gadgetfs:

BUG: KASAN: use-after-free in gadgetfs_setup+0x208a/0x20e0 at addr ffff88003dfe5bf2
Read of size 2 by task syz-executor0/22994
CPU: 3 PID: 22994 Comm: syz-executor0 Not tainted 4.9.0-rc7+ rib#16
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffff88006df06a18 ffffffff81f96aba ffffffffe0528500 1ffff1000dbe0cd6
 ffffed000dbe0cce ffff88006df068f0 0000000041b58ab3 ffffffff8598b4c8
 ffffffff81f96828 1ffff1000dbe0ccd ffff88006df06708 ffff88006df06748
Call Trace:
 <IRQ> [  201.343209]  [<     inline     >] __dump_stack lib/dump_stack.c:15
 <IRQ> [  201.343209]  [<ffffffff81f96aba>] dump_stack+0x292/0x398 lib/dump_stack.c:51
 [<ffffffff817e4dec>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:159
 [<     inline     >] print_address_description mm/kasan/report.c:197
 [<ffffffff817e5080>] kasan_report_error+0x1f0/0x4e0 mm/kasan/report.c:286
 [<     inline     >] kasan_report mm/kasan/report.c:306
 [<ffffffff817e562a>] __asan_report_load_n_noabort+0x3a/0x40 mm/kasan/report.c:337
 [<     inline     >] config_buf drivers/usb/gadget/legacy/inode.c:1298
 [<ffffffff8322c8fa>] gadgetfs_setup+0x208a/0x20e0 drivers/usb/gadget/legacy/inode.c:1368
 [<ffffffff830fdcd0>] dummy_timer+0x11f0/0x36d0 drivers/usb/gadget/udc/dummy_hcd.c:1858
 [<ffffffff814807c1>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308
 [<     inline     >] expire_timers kernel/time/timer.c:1348
 [<ffffffff81482de6>] __run_timers+0xa06/0xec0 kernel/time/timer.c:1641
 [<ffffffff814832c1>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654
 [<ffffffff84f4af8b>] __do_softirq+0x2fb/0xb63 kernel/softirq.c:284

The cause of the bug is subtle.  The dev_config() routine gets called
twice by the fuzzer.  The first time, the user data contains both a
full-speed configuration descriptor and a high-speed config
descriptor, causing dev->hs_config to be set.  But it also contains an
invalid device descriptor, so the buffer containing the descriptors is
deallocated and dev_config() returns an error.

The second time dev_config() is called, the user data contains only a
full-speed config descriptor.  But dev->hs_config still has the stale
pointer remaining from the first call, causing the routine to think
that there is a valid high-speed config.  Later on, when the driver
dereferences the stale pointer to copy that descriptor, we get a
use-after-free access.

The fix is simple: Clear dev->hs_config if the passed-in data does not
contain a high-speed config descriptor.

Signed-off-by: Alan Stern <[email protected]>
Reported-by: Andrey Konovalov <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue Apr 26, 2017
commit 034dd34 upstream.

Olga Kornievskaia says: "I ran into this oops in the nfsd (below)
(4.10-rc3 kernel). To trigger this I had a client (unsuccessfully) try
to mount the server with krb5 where the server doesn't have the
rpcsec_gss_krb5 module built."

The problem is that rsci.cred is copied from a svc_cred structure that
gss_proxy didn't properly initialize.  Fix that.

[120408.542387] general protection fault: 0000 [rib#1] SMP
...
[120408.565724] CPU: 0 PID: 3601 Comm: nfsd Not tainted 4.10.0-rc3+ rib#16
[120408.567037] Hardware name: VMware, Inc. VMware Virtual =
Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[120408.569225] task: ffff8800776f95c0 task.stack: ffffc90003d58000
[120408.570483] RIP: 0010:gss_mech_put+0xb/0x20 [auth_rpcgss]
...
[120408.584946]  ? rsc_free+0x55/0x90 [auth_rpcgss]
[120408.585901]  gss_proxy_save_rsc+0xb2/0x2a0 [auth_rpcgss]
[120408.587017]  svcauth_gss_proxy_init+0x3cc/0x520 [auth_rpcgss]
[120408.588257]  ? __enqueue_entity+0x6c/0x70
[120408.589101]  svcauth_gss_accept+0x391/0xb90 [auth_rpcgss]
[120408.590212]  ? try_to_wake_up+0x4a/0x360
[120408.591036]  ? wake_up_process+0x15/0x20
[120408.592093]  ? svc_xprt_do_enqueue+0x12e/0x2d0 [sunrpc]
[120408.593177]  svc_authenticate+0xe1/0x100 [sunrpc]
[120408.594168]  svc_process_common+0x203/0x710 [sunrpc]
[120408.595220]  svc_process+0x105/0x1c0 [sunrpc]
[120408.596278]  nfsd+0xe9/0x160 [nfsd]
[120408.597060]  kthread+0x101/0x140
[120408.597734]  ? nfsd_destroy+0x60/0x60 [nfsd]
[120408.598626]  ? kthread_park+0x90/0x90
[120408.599448]  ret_from_fork+0x22/0x30

Fixes: 1d65833 "SUNRPC: Add RPC based upcall mechanism for RPCGSS auth"
Cc: Simo Sorce <[email protected]>
Reported-by: Olga Kornievskaia <[email protected]>
Tested-by: Olga Kornievskaia <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue Apr 26, 2017
[ Upstream commit 45caeaa ]

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 rib#8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 rib#9 [] tcp_rcv_established at ffffffff81580b64
rib#10 [] tcp_v4_do_rcv at ffffffff8158b54a
rib#11 [] tcp_v4_rcv at ffffffff8158cd02
rib#12 [] ip_local_deliver_finish at ffffffff815668f4
rib#13 [] ip_local_deliver at ffffffff81566bd9
rib#14 [] ip_rcv_finish at ffffffff8156656d
rib#15 [] ip_rcv at ffffffff81566f06
rib#16 [] __netif_receive_skb_core at ffffffff8152b3a2
rib#17 [] __netif_receive_skb at ffffffff8152b608
rib#18 [] netif_receive_skb at ffffffff8152b690
rib#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
rib#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
rib#21 [] net_rx_action at ffffffff8152bac2
rib#22 [] __do_softirq at ffffffff81084b4f
rib#23 [] call_softirq at ffffffff8164845c
rib#24 [] do_softirq at ffffffff81016fc5
rib#25 [] irq_exit at ffffffff81084ee5
torvalds#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <[email protected]>
Cc: Hannes Sowa <[email protected]>
Signed-off-by: Jon Maxwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue May 25, 2017
commit f094446 upstream.

Commit d52c975 ("coresight: reset "enable_sink" flag when need be")
caused a kernel panic because of the using of an invalid value: after
'for_each_cpu(cpu, mask)', value of local variable 'cpu' become invalid,
causes following 'cpu_to_node' access invalid memory area.

This patch brings the deleted 'cpu = cpumask_first(mask)' back.

Panic log:

 $ perf record -e cs_etm// ls

 Unable to handle kernel paging request at virtual address fffe801804af4f10
 pgd = ffff8017ce031600
 [fffe801804af4f10] *pgd=0000000000000000, *pud=0000000000000000
 Internal error: Oops: 96000004 [rib#1] SMP
 Modules linked in:
 CPU: 33 PID: 1619 Comm: perf Not tainted 4.7.1+ rib#16
 Hardware name: Huawei Taishan 2280 /CH05TEVBA, BIOS 1.10 11/24/2016
 task: ffff8017cb0c8400 ti: ffff8017cb154000 task.ti: ffff8017cb154000
 PC is at tmc_alloc_etf_buffer+0x60/0xd4
 LR is at tmc_alloc_etf_buffer+0x44/0xd4
 pc : [<ffff000008633df8>] lr : [<ffff000008633ddc>] pstate: 60000145
 sp : ffff8017cb157b40
 x29: ffff8017cb157b40 x28: 0000000000000000
 ...skip...
 7a60: ffff000008c64dc8 0000000000000006 0000000000000253 ffffffffffffffff
 7a80: 0000000000000000 0000000000000000 ffff0000080872cc 0000000000000001
 [<ffff000008633df8>] tmc_alloc_etf_buffer+0x60/0xd4
 [<ffff000008632b9c>] etm_setup_aux+0x1dc/0x1e8
 [<ffff00000816eed4>] rb_alloc_aux+0x2b0/0x338
 [<ffff00000816a5e4>] perf_mmap+0x414/0x568
 [<ffff0000081ab694>] mmap_region+0x324/0x544
 [<ffff0000081abbe8>] do_mmap+0x334/0x3e0
 [<ffff000008191150>] vm_mmap_pgoff+0xa4/0xc8
 [<ffff0000081a9a30>] SyS_mmap_pgoff+0xb0/0x22c
 [<ffff0000080872e4>] sys_mmap+0x18/0x28
 [<ffff0000080843f0>] el0_svc_naked+0x24/0x28
 Code: 912040a5 d0001c00 f873d821 911c6000 (b8656822)
 ---[ end trace 98933da8f92b0c9a ]---

Signed-off-by: Wang Nan <[email protected]>
Cc: Xia Kaixu <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: d52c975 ("coresight: reset "enable_sink" flag when need be")
Signed-off-by: Mathieu Poirier <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue May 25, 2017
[ Upstream commit 45caeaa ]

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 rib#8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 rib#9 [] tcp_rcv_established at ffffffff81580b64
rib#10 [] tcp_v4_do_rcv at ffffffff8158b54a
rib#11 [] tcp_v4_rcv at ffffffff8158cd02
rib#12 [] ip_local_deliver_finish at ffffffff815668f4
rib#13 [] ip_local_deliver at ffffffff81566bd9
rib#14 [] ip_rcv_finish at ffffffff8156656d
rib#15 [] ip_rcv at ffffffff81566f06
rib#16 [] __netif_receive_skb_core at ffffffff8152b3a2
rib#17 [] __netif_receive_skb at ffffffff8152b608
rib#18 [] netif_receive_skb at ffffffff8152b690
rib#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
rib#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
rib#21 [] net_rx_action at ffffffff8152bac2
rib#22 [] __do_softirq at ffffffff81084b4f
rib#23 [] call_softirq at ffffffff8164845c
rib#24 [] do_softirq at ffffffff81016fc5
rib#25 [] irq_exit at ffffffff81084ee5
torvalds#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <[email protected]>
Cc: Hannes Sowa <[email protected]>
Signed-off-by: Jon Maxwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue May 25, 2017
[ Upstream commit ddc665a ]

When the instruction right before the branch destination is
a 64 bit load immediate, we currently calculate the wrong
jump offset in the ctx->offset[] array as we only account
one instruction slot for the 64 bit load immediate although
it uses two BPF instructions. Fix it up by setting the offset
into the right slot after we incremented the index.

Before (ldimm64 test 1):

  [...]
  00000020:  52800007  mov w7, #0x0 // #0
  00000024:  d2800060  mov x0, #0x3 // rib#3
  00000028:  d2800041  mov x1, #0x2 // rib#2
  0000002c:  eb01001f  cmp x0, x1
  00000030:  54ffff82  b.cs 0x00000020
  00000034:  d29fffe7  mov x7, #0xffff // #65535
  00000038:  f2bfffe7  movk x7, #0xffff, lsl rib#16
  0000003c:  f2dfffe7  movk x7, #0xffff, lsl torvalds#32
  00000040:  f2ffffe7  movk x7, #0xffff, lsl torvalds#48
  00000044:  d29dddc7  mov x7, #0xeeee // #61166
  00000048:  f2bdddc7  movk x7, #0xeeee, lsl rib#16
  0000004c:  f2ddddc7  movk x7, #0xeeee, lsl torvalds#32
  00000050:  f2fdddc7  movk x7, #0xeeee, lsl torvalds#48
  [...]

After (ldimm64 test 1):

  [...]
  00000020:  52800007  mov w7, #0x0 // #0
  00000024:  d2800060  mov x0, #0x3 // rib#3
  00000028:  d2800041  mov x1, #0x2 // rib#2
  0000002c:  eb01001f  cmp x0, x1
  00000030:  540000a2  b.cs 0x00000044
  00000034:  d29fffe7  mov x7, #0xffff // #65535
  00000038:  f2bfffe7  movk x7, #0xffff, lsl rib#16
  0000003c:  f2dfffe7  movk x7, #0xffff, lsl torvalds#32
  00000040:  f2ffffe7  movk x7, #0xffff, lsl torvalds#48
  00000044:  d29dddc7  mov x7, #0xeeee // #61166
  00000048:  f2bdddc7  movk x7, #0xeeee, lsl rib#16
  0000004c:  f2ddddc7  movk x7, #0xeeee, lsl torvalds#32
  00000050:  f2fdddc7  movk x7, #0xeeee, lsl torvalds#48
  [...]

Also, add a couple of test cases to make sure JITs pass
this test. Tested on Cavium ThunderX ARMv8. The added
test cases all pass after the fix.

Fixes: 8eee539 ("arm64: bpf: fix out-of-bounds read in bpf2a64_offset()")
Reported-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Cc: Xi Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue Aug 2, 2017
Preempt can occur in the preemption timer expiration handler:

          CPU0                    CPU1

  preemption timer vmexit
  handle_preemption_timer(vCPU0)
    kvm_lapic_expired_hv_timer
      hv_timer_is_use == true
  sched_out
                           sched_in
                           kvm_arch_vcpu_load
                             kvm_lapic_restart_hv_timer
                               restart_apic_timer
                                 start_hv_timer
                                   already-expired timer or sw timer triggerd in the window
                                 start_sw_timer
                                   cancel_hv_timer
                           /* back in kvm_lapic_expired_hv_timer */
                           cancel_hv_timer
                             WARN_ON(!apic->lapic_timer.hv_timer_in_use);  ==> Oops

This can be reproduced if CONFIG_PREEMPT is enabled.

------------[ cut here ]------------
 WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1563 kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
 CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G           OE   4.13.0-rc2+ rib#16
 RIP: 0010:kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
Call Trace:
  handle_preemption_timer+0xe/0x20 [kvm_intel]
  vmx_handle_exit+0xb8/0xd70 [kvm_intel]
  kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
  ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
  ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
  kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  do_syscall_64+0x81/0x220
  entry_SYSCALL64_slow_path+0x25/0x25
 ------------[ cut here ]------------
 WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1498 cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
 CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0-rc2+ rib#16
 RIP: 0010:cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
Call Trace:
  kvm_lapic_expired_hv_timer+0x3e/0xb0 [kvm]
  handle_preemption_timer+0xe/0x20 [kvm_intel]
  vmx_handle_exit+0xb8/0xd70 [kvm_intel]
  kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
  ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
  ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
  kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  do_syscall_64+0x81/0x220
  entry_SYSCALL64_slow_path+0x25/0x25

This patch fixes it by making the caller of cancel_hv_timer, start_hv_timer
and start_sw_timer be in preemption-disabled regions, which trivially
avoid any reentrancy issue with preempt notifier.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
[Add more WARNs. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this issue Aug 23, 2017
syzkaller got crashes with CONFIG_HARDENED_USERCOPY=y configs.

Issue here is that recvfrom() can be used with user buffer of Z bytes,
and SO_PEEK_OFF of X bytes, from a skb with Y bytes, and following
condition :

Z < X < Y

kernel BUG at mm/usercopy.c:72!
invalid opcode: 0000 [rib#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 2917 Comm: syzkaller842281 Not tainted 4.13.0-rc3+ rib#16
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
task: ffff8801d2fa40c0 task.stack: ffff8801d1fe8000
RIP: 0010:report_usercopy mm/usercopy.c:64 [inline]
RIP: 0010:__check_object_size+0x3ad/0x500 mm/usercopy.c:264
RSP: 0018:ffff8801d1fef8a8 EFLAGS: 00010286
RAX: 0000000000000078 RBX: ffffffff847102c0 RCX: 0000000000000000
RDX: 0000000000000078 RSI: 1ffff1003a3fded5 RDI: ffffed003a3fdf09
RBP: ffff8801d1fef998 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d1ea480e
R13: fffffffffffffffa R14: ffffffff84710280 R15: dffffc0000000000
FS:  0000000001360880(0000) GS:ffff8801dc000000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000202ecfe4 CR3: 00000001d1ff8000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 check_object_size include/linux/thread_info.h:108 [inline]
 check_copy_size include/linux/thread_info.h:139 [inline]
 copy_to_iter include/linux/uio.h:105 [inline]
 copy_linear_skb include/net/udp.h:371 [inline]
 udpv6_recvmsg+0x1040/0x1af0 net/ipv6/udp.c:395
 inet_recvmsg+0x14c/0x5f0 net/ipv4/af_inet.c:793
 sock_recvmsg_nosec net/socket.c:792 [inline]
 sock_recvmsg+0xc9/0x110 net/socket.c:799
 SYSC_recvfrom+0x2d6/0x570 net/socket.c:1788
 SyS_recvfrom+0x40/0x50 net/socket.c:1760
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Fixes: b65ac44 ("udp: try to avoid 2 cache miss on dequeue")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue Dec 14, 2017
According to LS1021A RM, the value of PAL can be set so that the start of the
IP header in the receive data buffer is aligned to a 32-bit boundary. Normally,
setting PAL = 2 provides minimal padding to ensure such alignment of the IP
header.

However every incoming packet's 8-byte time stamp will be inserted into the
packet data buffer as padding alignment bytes when hardware time stamping is
enabled.

So we set the padding 8+2 here to avoid the flooded alignment faults:

root@128:~# cat /proc/cpu/alignment
User:           0
System:         17539 (inet_gro_receive+0x114/0x2c0)
Skipped:        0
Half:           0
Word:           0
DWord:          0
Multi:          17539
User faults:    2 (fixup)

Also shown when exception report enablement

CPU: 0 PID: 161 Comm: irq/66-eth1_g0_ Not tainted 4.1.21-rt13-WR8.0.0.0_preempt-rt rib#16
Hardware name: Freescale LS1021A
[<8001b420>] (unwind_backtrace) from [<8001476c>] (show_stack+0x20/0x24)
[<8001476c>] (show_stack) from [<807cfb48>] (dump_stack+0x94/0xac)
[<807cfb48>] (dump_stack) from [<80025d70>] (do_alignment+0x720/0x958)
[<80025d70>] (do_alignment) from [<80009224>] (do_DataAbort+0x40/0xbc)
[<80009224>] (do_DataAbort) from [<80015398>] (__dabt_svc+0x38/0x60)
Exception stack(0x86ad1cc0 to 0x86ad1d08)
1cc0: f9b3e080 86b3d072 2d78d287 00000000 866816c0 86b3d05e 86e785d0 00000000
1ce0: 00000011 0000000e 80840ab0 86ad1d3c 86ad1d08 86ad1d08 806d7fc0 806d806c
1d00: 40070013 ffffffff
[<80015398>] (__dabt_svc) from [<806d806c>] (inet_gro_receive+0x114/0x2c0)
[<806d806c>] (inet_gro_receive) from [<80660eec>] (dev_gro_receive+0x21c/0x3c0)
[<80660eec>] (dev_gro_receive) from [<8066133c>] (napi_gro_receive+0x44/0x17c)
[<8066133c>] (napi_gro_receive) from [<804f0538>] (gfar_clean_rx_ring+0x39c/0x7d4)
[<804f0538>] (gfar_clean_rx_ring) from [<804f0bf4>] (gfar_poll_rx_sq+0x58/0xe0)
[<804f0bf4>] (gfar_poll_rx_sq) from [<80660b10>] (net_rx_action+0x27c/0x43c)
[<80660b10>] (net_rx_action) from [<80033638>] (do_current_softirqs+0x1e0/0x3dc)
[<80033638>] (do_current_softirqs) from [<800338c4>] (__local_bh_enable+0x90/0xa8)
[<800338c4>] (__local_bh_enable) from [<8008025c>] (irq_forced_thread_fn+0x70/0x84)
[<8008025c>] (irq_forced_thread_fn) from [<800805e8>] (irq_thread+0x16c/0x244)
[<800805e8>] (irq_thread) from [<8004e490>] (kthread+0xe8/0x104)
[<8004e490>] (kthread) from [<8000fda8>] (ret_from_fork+0x14/0x2c)

Signed-off-by: Zumeng Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue Dec 6, 2018
Reported by syzkaller:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
 PGD 800000040410c067 P4D 800000040410c067 PUD 40410d067 PMD 0
 Oops: 0000 [rib#1] PREEMPT SMP PTI
 CPU: 3 PID: 2567 Comm: poc Tainted: G           OE     4.19.0-rc5 rib#16
 RIP: 0010:kvm_pv_send_ipi+0x94/0x350 [kvm]
 Call Trace:
  kvm_emulate_hypercall+0x3cc/0x700 [kvm]
  handle_vmcall+0xe/0x10 [kvm_intel]
  vmx_handle_exit+0xc1/0x11b0 [kvm_intel]
  vcpu_enter_guest+0x9fb/0x1910 [kvm]
  kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
  kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
  do_vfs_ioctl+0xa5/0x690
  ksys_ioctl+0x6d/0x80
  __x64_sys_ioctl+0x1a/0x20
  do_syscall_64+0x83/0x6e0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The reason is that the apic map has not yet been initialized, the testcase
triggers pv_send_ipi interface by vmcall which results in kvm->arch.apic_map
is dereferenced. This patch fixes it by checking whether or not apic map is
NULL and bailing out immediately if that is the case.

Fixes: 4180bf1 (KVM: X86: Implement "send IPI" hypercall)
Reported-by: Wei Wu <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Wei Wu <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue Dec 6, 2018
Reported by syzkaller:

 BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8
 PGD 80000003ec4da067 P4D 80000003ec4da067 PUD 3f7bfa067 PMD 0
 Oops: 0000 [rib#1] PREEMPT SMP PTI
 CPU: 7 PID: 5059 Comm: debug Tainted: G           OE     4.19.0-rc5 rib#16
 RIP: 0010:__lock_acquire+0x1a6/0x1990
 Call Trace:
  lock_acquire+0xdb/0x210
  _raw_spin_lock+0x38/0x70
  kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
  vcpu_enter_guest+0x167e/0x1910 [kvm]
  kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
  kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
  do_vfs_ioctl+0xa5/0x690
  ksys_ioctl+0x6d/0x80
  __x64_sys_ioctl+0x1a/0x20
  do_syscall_64+0x83/0x6e0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT6 msr
and triggers scan ioapic logic to load synic vectors into EOI exit bitmap.
However, irqchip is not initialized by this simple testcase, ioapic/apic
objects should not be accessed.
This can be triggered by the following program:

    #define _GNU_SOURCE

    #include <endian.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};

    int main(void)
    {
    	syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
    	long res = 0;
    	memcpy((void*)0x20000040, "/dev/kvm", 9);
    	res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0, 0);
    	if (res != -1)
    		r[0] = res;
    	res = syscall(__NR_ioctl, r[0], 0xae01, 0);
    	if (res != -1)
    		r[1] = res;
    	res = syscall(__NR_ioctl, r[1], 0xae41, 0);
    	if (res != -1)
    		r[2] = res;
    	memcpy(
    			(void*)0x20000080,
    			"\x01\x00\x00\x00\x00\x5b\x61\xbb\x96\x00\x00\x40\x00\x00\x00\x00\x01\x00"
    			"\x08\x00\x00\x00\x00\x00\x0b\x77\xd1\x78\x4d\xd8\x3a\xed\xb1\x5c\x2e\x43"
    			"\xaa\x43\x39\xd6\xff\xf5\xf0\xa8\x98\xf2\x3e\x37\x29\x89\xde\x88\xc6\x33"
    			"\xfc\x2a\xdb\xb7\xe1\x4c\xac\x28\x61\x7b\x9c\xa9\xbc\x0d\xa0\x63\xfe\xfe"
    			"\xe8\x75\xde\xdd\x19\x38\xdc\x34\xf5\xec\x05\xfd\xeb\x5d\xed\x2e\xaf\x22"
    			"\xfa\xab\xb7\xe4\x42\x67\xd0\xaf\x06\x1c\x6a\x35\x67\x10\x55\xcb",
    			106);
    	syscall(__NR_ioctl, r[2], 0x4008ae89, 0x20000080);
    	syscall(__NR_ioctl, r[2], 0xae80, 0);
    	return 0;
    }

This patch fixes it by bailing out scan ioapic if ioapic is not initialized in
kernel.

Reported-by: Wei Wu <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Wei Wu <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue Feb 14, 2019
memcpy_fromio() doesn't provide any control over access size.  For example,
on arm64, it is implemented using readb and readq.  This may trigger a
synchronous external abort:

[    3.729943] Internal error: synchronous external abort: 96000210 [rib#1] PREEMPT SMP
[    3.737000] Modules linked in:
[    3.744371] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G S                4.20.0-rc4 rib#16
[    3.747413] Hardware name: Qualcomm Technologies, Inc. MSM8998 v1 MTP (DT)
[    3.755295] pstate: 00000005 (nzcv daif -PAN -UAO)
[    3.761978] pc : __memcpy_fromio+0x68/0x80
[    3.766718] lr : ufshcd_dump_regs+0x50/0xb0
[    3.770767] sp : ffff00000807ba00
[    3.774830] x29: ffff00000807ba00 x28: 00000000fffffffb
[    3.778344] x27: ffff0000089db068 x26: ffff8000f6e58000
[    3.783728] x25: 000000000000000e x24: 0000000000000800
[    3.789023] x23: ffff8000f6e587c8 x22: 0000000000000800
[    3.794319] x21: ffff000008908368 x20: ffff8000f6e1ab80
[    3.799615] x19: 000000000000006c x18: ffffffffffffffff
[    3.804910] x17: 0000000000000000 x16: 0000000000000000
[    3.810206] x15: ffff000009199648 x14: ffff000089244187
[    3.815502] x13: ffff000009244195 x12: ffff0000091ab000
[    3.820797] x11: 0000000005f5e0ff x10: ffff0000091998a0
[    3.826093] x9 : 0000000000000000 x8 : ffff8000f6e1ac00
[    3.831389] x7 : 0000000000000000 x6 : 0000000000000068
[    3.836676] x5 : ffff8000f6e1abe8 x4 : 0000000000000000
[    3.841971] x3 : ffff00000928c868 x2 : ffff8000f6e1abec
[    3.847267] x1 : ffff00000928c868 x0 : ffff8000f6e1abe8
[    3.852567] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
[    3.857900] Call trace:
[    3.864473]  __memcpy_fromio+0x68/0x80
[    3.866683]  ufs_qcom_dump_dbg_regs+0x1c0/0x370
[    3.870522]  ufshcd_print_host_regs+0x168/0x190
[    3.874946]  ufshcd_init+0xd4c/0xde0
[    3.879459]  ufshcd_pltfrm_init+0x3c8/0x550
[    3.883264]  ufs_qcom_probe+0x24/0x60
[    3.887188]  platform_drv_probe+0x50/0xa0

Assuming aligned 32-bit registers, let's use readl, after making sure
that 'offset' and 'len' are indeed multiples of 4.

Fixes: ba80917 ("scsi: ufs: ufshcd_dump_regs to use memcpy_fromio")
Cc: <[email protected]>
Signed-off-by: Marc Gonzalez <[email protected]>
Acked-by: Tomas Winkler <[email protected]>
Reviewed-by: Jeffrey Hugo <[email protected]>
Reviewed-by: Bjorn Andersson <[email protected]>
Tested-by: Evan Green <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this issue Mar 29, 2019
…r-free issue

The evlist should be destroyed before the perf session.

Detected with gcc's ASan:

  =================================================================
  ==27350==ERROR: AddressSanitizer: heap-use-after-free on address 0x62b000002e38 at pc 0x5611da276999 bp 0x7ffce8f1d1a0 sp 0x7ffce8f1d190
  WRITE of size 8 at 0x62b000002e38 thread T0
      #0 0x5611da276998 in __list_del /home/work/linux/tools/include/linux/list.h:89
      rib#1 0x5611da276d4a in __list_del_entry /home/work/linux/tools/include/linux/list.h:102
      rib#2 0x5611da276e77 in list_del_init /home/work/linux/tools/include/linux/list.h:145
      rib#3 0x5611da2781cd in thread__put util/thread.c:130
      rib#4 0x5611da2cc0a8 in __thread__zput util/thread.h:68
      rib#5 0x5611da2d2dcb in hist_entry__delete util/hist.c:1148
      rib#6 0x5611da2cdf91 in hists__delete_entry util/hist.c:337
      rib#7 0x5611da2ce19e in hists__delete_entries util/hist.c:365
      rib#8 0x5611da2db2ab in hists__delete_all_entries util/hist.c:2639
      rib#9 0x5611da2db325 in hists_evsel__exit util/hist.c:2651
      rib#10 0x5611da1c5352 in perf_evsel__exit util/evsel.c:1304
      rib#11 0x5611da1c5390 in perf_evsel__delete util/evsel.c:1309
      rib#12 0x5611da1b35f0 in perf_evlist__purge util/evlist.c:124
      rib#13 0x5611da1b38e2 in perf_evlist__delete util/evlist.c:148
      rib#14 0x5611da069781 in cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1645
      rib#15 0x5611da17d038 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      rib#16 0x5611da17d577 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      rib#17 0x5611da17d97b in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      rib#18 0x5611da17e0e9 in main /home/changbin/work/linux/tools/perf/perf.c:520
      rib#19 0x7fdcc970f09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
      rib#20 0x5611d9ff35c9 in _start (/home/work/linux/tools/perf/perf+0x3e95c9)

  0x62b000002e38 is located 11320 bytes inside of 27448-byte region [0x62b000000200,0x62b000006d38)
  freed by thread T0 here:
      #0 0x7fdccb04ab70 in free (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedb70)
      rib#1 0x5611da260df4 in perf_session__delete util/session.c:201
      rib#2 0x5611da063de5 in __cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1300
      rib#3 0x5611da06973c in cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1642
      rib#4 0x5611da17d038 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      rib#5 0x5611da17d577 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      rib#6 0x5611da17d97b in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      rib#7 0x5611da17e0e9 in main /home/changbin/work/linux/tools/perf/perf.c:520
      rib#8 0x7fdcc970f09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  previously allocated by thread T0 here:
      #0 0x7fdccb04b138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      rib#1 0x5611da26010c in zalloc util/util.h:23
      rib#2 0x5611da260824 in perf_session__new util/session.c:118
      rib#3 0x5611da0633a6 in __cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1192
      rib#4 0x5611da06973c in cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1642
      rib#5 0x5611da17d038 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      rib#6 0x5611da17d577 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      rib#7 0x5611da17d97b in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      rib#8 0x5611da17e0e9 in main /home/changbin/work/linux/tools/perf/perf.c:520
      rib#9 0x7fdcc970f09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  SUMMARY: AddressSanitizer: heap-use-after-free /home/work/linux/tools/include/linux/list.h:89 in __list_del
  Shadow bytes around the buggy address:
    0x0c567fff8570: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff8580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff8590: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff85a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff85b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  =>0x0c567fff85c0: fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd fd
    0x0c567fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff85f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c567fff8610: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb
  ==27350==ABORTING

Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this issue Mar 29, 2019
Using gcc's ASan, Changbin reports:

  =================================================================
  ==7494==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 48 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      rib#1 0x5625e5330a5e in zalloc util/util.h:23
      rib#2 0x5625e5330a9b in perf_counts__new util/counts.c:10
      rib#3 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
      rib#4 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
      rib#5 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
      rib#6 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
      rib#7 0x5625e51528e6 in run_test tests/builtin-test.c:358
      rib#8 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      rib#9 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      rib#10 0x5625e515572f in cmd_test tests/builtin-test.c:722
      rib#11 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      rib#12 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      rib#13 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      rib#14 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      rib#15 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 72 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      rib#1 0x5625e532560d in zalloc util/util.h:23
      rib#2 0x5625e532566b in xyarray__new util/xyarray.c:10
      rib#3 0x5625e5330aba in perf_counts__new util/counts.c:15
      rib#4 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
      rib#5 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
      rib#6 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
      rib#7 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
      rib#8 0x5625e51528e6 in run_test tests/builtin-test.c:358
      rib#9 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      rib#10 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      rib#11 0x5625e515572f in cmd_test tests/builtin-test.c:722
      rib#12 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      rib#13 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      rib#14 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      rib#15 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      rib#16 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

His patch took care of evsel->prev_raw_counts, but the above backtraces
are about evsel->counts, so fix that instead.

Reported-by: Changbin Du <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this issue Apr 2, 2019
Calling kvm_is_visible_gfn() implies that we're parsing the memslots,
and doing this without the srcu lock is frown upon:

[12704.164532] =============================
[12704.164544] WARNING: suspicious RCU usage
[12704.164560] 5.1.0-rc1-00008-g600025238f51-dirty rib#16 Tainted: G        W
[12704.164573] -----------------------------
[12704.164589] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!
[12704.164602] other info that might help us debug this:
[12704.164616] rcu_scheduler_active = 2, debug_locks = 1
[12704.164631] 6 locks held by qemu-system-aar/13968:
[12704.164644]  #0: 000000007ebdae4f (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0
[12704.164691]  rib#1: 000000007d751022 (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0
[12704.164726]  rib#2: 00000000219d2706 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[12704.164761]  rib#3: 00000000a760aecd (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[12704.164794]  rib#4: 000000000ef8e31d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[12704.164827]  rib#5: 000000007a872093 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[12704.164861] stack backtrace:
[12704.164878] CPU: 2 PID: 13968 Comm: qemu-system-aar Tainted: G        W         5.1.0-rc1-00008-g600025238f51-dirty rib#16
[12704.164887] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019
[12704.164896] Call trace:
[12704.164910]  dump_backtrace+0x0/0x138
[12704.164920]  show_stack+0x24/0x30
[12704.164934]  dump_stack+0xbc/0x104
[12704.164946]  lockdep_rcu_suspicious+0xcc/0x110
[12704.164958]  gfn_to_memslot+0x174/0x190
[12704.164969]  kvm_is_visible_gfn+0x28/0x70
[12704.164980]  vgic_its_check_id.isra.0+0xec/0x1e8
[12704.164991]  vgic_its_save_tables_v0+0x1ac/0x330
[12704.165001]  vgic_its_set_attr+0x298/0x3a0
[12704.165012]  kvm_device_ioctl_attr+0x9c/0xd8
[12704.165022]  kvm_device_ioctl+0x8c/0xf8
[12704.165035]  do_vfs_ioctl+0xc8/0x960
[12704.165045]  ksys_ioctl+0x8c/0xa0
[12704.165055]  __arm64_sys_ioctl+0x28/0x38
[12704.165067]  el0_svc_common+0xd8/0x138
[12704.165078]  el0_svc_handler+0x38/0x78
[12704.165089]  el0_svc+0x8/0xc

Make sure the lock is taken when doing this.

Fixes: bf30824 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock")
Reviewed-by: Eric Auger <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue Jul 9, 2019
ifmsh->csa is an RCU-protected pointer. The writer context
in ieee80211_mesh_finish_csa() is already mutually
exclusive with wdev->sdata.mtx, but the RCU checker did
not know this. Use rcu_dereference_protected() to avoid a
warning.

fixes the following warning:

[   12.519089] =============================
[   12.520042] WARNING: suspicious RCU usage
[   12.520652] 5.1.0-rc7-wt+ rib#16 Tainted: G        W
[   12.521409] -----------------------------
[   12.521972] net/mac80211/mesh.c:1223 suspicious rcu_dereference_check() usage!
[   12.522928] other info that might help us debug this:
[   12.523984] rcu_scheduler_active = 2, debug_locks = 1
[   12.524855] 5 locks held by kworker/u8:2/152:
[   12.525438]  #0: 00000000057be08c ((wq_completion)phy0){+.+.}, at: process_one_work+0x1a2/0x620
[   12.526607]  rib#1: 0000000059c6b07a ((work_completion)(&sdata->csa_finalize_work)){+.+.}, at: process_one_work+0x1a2/0x620
[   12.528001]  rib#2: 00000000f184ba7d (&wdev->mtx){+.+.}, at: ieee80211_csa_finalize_work+0x2f/0x90
[   12.529116]  rib#3: 00000000831a1f54 (&local->mtx){+.+.}, at: ieee80211_csa_finalize_work+0x47/0x90
[   12.530233]  rib#4: 00000000fd06f988 (&local->chanctx_mtx){+.+.}, at: ieee80211_csa_finalize_work+0x51/0x90

Signed-off-by: Thomas Pedersen <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue Mar 17, 2020
drm_minor_unregister will invoke drm_debugfs_cleanup
to clean all the child node under primary minor node.
We don't need to invoke amdgpu_debugfs_fini and
amdgpu_debugfs_regs_cleanup to clean agian.
Otherwise, it will raise the NULL pointer like below.
[   45.046029] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[   45.047256] PGD 0 P4D 0
[   45.047713] Oops: 0002 [rib#1] SMP PTI
[   45.048198] CPU: 0 PID: 2796 Comm: modprobe Tainted: G        W  OE     4.18.0-15-generic rib#16~18.04.1-Ubuntu
[   45.049538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[   45.050651] RIP: 0010:down_write+0x1f/0x40
[   45.051194] Code: 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 ce d9 ff ff 48 ba 01 00 00 00 ff ff ff ff 48 89 d8 <f0> 48 0f c1 10 85 d2 74 05 e8 53 1c ff ff 65 48 8b 04 25 00 5c 01
[   45.053702] RSP: 0018:ffffad8f4133fd40 EFLAGS: 00010246
[   45.054384] RAX: 00000000000000a8 RBX: 00000000000000a8 RCX: ffffa011327dd814
[   45.055349] RDX: ffffffff00000001 RSI: 0000000000000001 RDI: 00000000000000a8
[   45.056346] RBP: ffffad8f4133fd48 R08: 0000000000000000 R09: ffffffffc0690a00
[   45.057326] R10: ffffad8f4133fd58 R11: 0000000000000001 R12: ffffa0113cff0300
[   45.058266] R13: ffffa0113c0a0000 R14: ffffffffc0c02a10 R15: ffffa0113e5c7860
[   45.059221] FS:  00007f60d46f9540(0000) GS:ffffa0113fc00000(0000) knlGS:0000000000000000
[   45.060809] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.061826] CR2: 00000000000000a8 CR3: 0000000136250004 CR4: 00000000003606f0
[   45.062913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   45.064404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   45.065897] Call Trace:
[   45.066426]  debugfs_remove+0x36/0xa0
[   45.067131]  amdgpu_debugfs_ring_fini+0x15/0x20 [amdgpu]
[   45.068019]  amdgpu_debugfs_fini+0x2c/0x50 [amdgpu]
[   45.068756]  amdgpu_pci_remove+0x49/0x70 [amdgpu]
[   45.069439]  pci_device_remove+0x3e/0xc0
[   45.070037]  device_release_driver_internal+0x18a/0x260
[   45.070842]  driver_detach+0x3f/0x80
[   45.071325]  bus_remove_driver+0x59/0xd0
[   45.071850]  driver_unregister+0x2c/0x40
[   45.072377]  pci_unregister_driver+0x22/0xa0
[   45.073043]  amdgpu_exit+0x15/0x57c [amdgpu]
[   45.073683]  __x64_sys_delete_module+0x146/0x280
[   45.074369]  do_syscall_64+0x5a/0x120
[   45.074916]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

v2: remove all debugfs cleanup/fini code at amdgpu
v3: squash in unused variable removal

Signed-off-by: Yintian Tao <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this issue Mar 17, 2020
If we release drm_device before amdgpu_driver_unload_kms,
then it will raise the error below. Therefore, we need to
place it before amdgpu_driver_unload_kms.
[   43.055736] Memory manager not clean during takedown.
[   43.055777] WARNING: CPU: 1 PID: 2807 at /build/linux-hwe-9KJ07q/linux-hwe-4.18.0/drivers/gpu/drm/drm_mm.c:913 drm_mm_takedown+0x24/0x30 [drm]
[   43.055778] Modules linked in: amdgpu(OE-) amd_sched(OE) amdttm(OE) amdkcl(OE) amd_iommu_v2 drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt snd_hda_codec_generic nfit kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm ghash_clmulni_intel snd_seq_midi snd_seq_midi_event pcbc snd_rawmidi snd_seq snd_seq_device aesni_intel snd_timer joydev aes_x86_64 crypto_simd cryptd glue_helper snd soundcore input_leds mac_hid serio_raw qemu_fw_cfg binfmt_misc sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic floppy usbhid psmouse hid i2c_piix4 e1000 pata_acpi
[   43.055819] CPU: 1 PID: 2807 Comm: modprobe Tainted: G           OE     4.18.0-15-generic rib#16~18.04.1-Ubuntu
[   43.055820] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[   43.055830] RIP: 0010:drm_mm_takedown+0x24/0x30 [drm]
[   43.055831] Code: 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 47 38 48 83 c7 38 48 39 c7 75 02 f3 c3 55 48 c7 c7 38 33 80 c0 48 89 e5 e8 1c 41 ec d0 <0f> 0b 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
[   43.055857] RSP: 0018:ffffae33c1393d28 EFLAGS: 00010286
[   43.055859] RAX: 0000000000000000 RBX: ffff9651b4a29800 RCX: 0000000000000006
[   43.055860] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff9651bfc964b0
[   43.055861] RBP: ffffae33c1393d28 R08: 00000000000002a6 R09: 0000000000000004
[   43.055861] R10: ffffae33c1393d20 R11: 0000000000000001 R12: ffff9651ba6cb000
[   43.055863] R13: ffff9651b7f40000 R14: ffffffffc0de3a10 R15: ffff9651ba5c6460
[   43.055864] FS:  00007f1d3c08d540(0000) GS:ffff9651bfc80000(0000) knlGS:0000000000000000
[   43.055865] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   43.055866] CR2: 00005630a5831640 CR3: 000000012e274004 CR4: 00000000003606e0
[   43.055870] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   43.055871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   43.055871] Call Trace:
[   43.055885]  drm_vma_offset_manager_destroy+0x1b/0x30 [drm]
[   43.055894]  drm_gem_destroy+0x19/0x40 [drm]
[   43.055903]  drm_dev_fini+0x7f/0x90 [drm]
[   43.055911]  drm_dev_release+0x2b/0x40 [drm]
[   43.055919]  drm_dev_unplug+0x64/0x80 [drm]
[   43.055994]  amdgpu_pci_remove+0x39/0x70 [amdgpu]
[   43.055998]  pci_device_remove+0x3e/0xc0
[   43.056001]  device_release_driver_internal+0x18a/0x260
[   43.056003]  driver_detach+0x3f/0x80
[   43.056004]  bus_remove_driver+0x59/0xd0
[   43.056006]  driver_unregister+0x2c/0x40
[   43.056008]  pci_unregister_driver+0x22/0xa0
[   43.056087]  amdgpu_exit+0x15/0x57c [amdgpu]
[   43.056090]  __x64_sys_delete_module+0x146/0x280
[   43.056094]  do_syscall_64+0x5a/0x120

v2: put drm_dev_put after pci_set_drvdata

Signed-off-by: Yintian Tao <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants