Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wip/matt auld/dynamic oa configs #2

Conversation

matt-auld
Copy link

As an initial implementation it would be great if you could give me some feedback when you get the time @rib. Thanks.

To smoke test this, I have just registered a known-to-work OA config and opened a query using its id.

Some notes for further discussion

  • When a user adds a new OA config, userland needs some way of knowing the id associated with the OA config. In the current implementation we just fill out the user struct with the id. One alternative here is to have the return value for the ioctl be the id.
  • For the validation of the OA registers I have just bundled together the address ranges from the configs you have included, for each of the MUX + FLEX + BOOL configurations respectively. I haven't constrained the validation for each GEN, I probably should? Are these all the possible addresses, is it possible to create an OA config which uses an address outside of our range?
  • When a duplicate guid is created we just bail. As you mentioned we should maybe look into updating the OA config instead.
  • The format of the OA configs passed from userland should be a block of u32 for each config. Is there a format which could be better here?

The motivation behind this new interface is expose at runtime
the creation of new OA configs which can be used as part of the
i915 perf open interface. This will enable the kernel to learn new
configs which may be experimental, or otherwise not part of the core set
currently available through the i915 perf interface.
@matt-auld matt-auld closed this Jan 22, 2016
rib pushed a commit that referenced this pull request Feb 22, 2016
Returning to delay slot, riding an interrupti, had one loose end.
AUX_USER_SP used for restoring user mode SP upon RTIE was not being
setup from orig task's saved value, causing task to use wrong SP,
leading to ProtV errors.

The reason being:
 - INTERRUPT_EPILOGUE returns to a kernel trampoline, thus not expected to restore it
 - EXCEPTION_EPILOGUE is not used at all

Fix that by restoring AUX_USER_SP explicitly in the trampoline.

This was broken in the original workaround, but the error scenarios got
reduced considerably since v3.14 due to following:

 1. The Linuxthreads.old based userspace at the time caused many more
    exceptions in delay slot than the current NPTL based one.
    Infact with current userspace the error doesn't happen at all.

 2. Return from interrupt (delay slot or otherwise) doesn't get exercised much
    after commit 4de0e52 ("Really Re-enable interrupts to avoid deadlocks")
    since IRQ_ACTIVE.active being clear means most returns are as if from pure
    kernel (even for active interrupts)

Infact the issue only happened in an experimental branch where I was tinkering with
reverted 4de0e52

Cc: [email protected] # v4.2+
Fixes: 4255b07 ("ARCv2: STAR 9000793984: Handle return from intr to Delay Slot")
Signed-off-by: Vineet Gupta <[email protected]>
rib pushed a commit that referenced this pull request Feb 22, 2016
Fixes segmentation fault using, for instance:

  (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".

 Program received signal SIGSEGV, Segmentation fault.
  0 x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
  (gdb) bt
  #0  0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
  #1  0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:433
  #2  0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:498
  #3  0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:936
  #4  0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391
  #5  0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361
  #6  0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401
  #7  0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253
  #8  0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364
  #9  0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664
  #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539
  #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264
  #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390
  #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451
  #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495
  #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618
(gdb)

Intel PT attempts to find the sched:sched_switch tracepoint but that seg
faults if tracefs is not readable, because the error reporting structure
is null, as errors are not reported when automatically adding
tracepoints.  Fix by checking before using.

Committer note:

This doesn't take place in a kernel that supports
perf_event_attr.context_switch, that is the default way that will be
used for tracking context switches, only in older kernels, like 4.2, in
a machine with Intel PT (e.g. Broadwell) for non-priviledged users.

Further info from a similar patch by Wang:

The error is in tracepoint_error: it assumes the 'e' parameter is valid.

However, there are many situation a parse_event() can be called without
parse_events_error. See result of

  $ grep 'parse_events(.*NULL)' ./tools/perf/ -r'

Signed-off-by: Adrian Hunter <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Tong Zhang <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: [email protected] # v4.4+
Fixes: 1965817 ("perf tools: Enhance parsing events tracepoint error output")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
rib pushed a commit that referenced this pull request Feb 22, 2016
The starting node for a klist iteration is often passed in from
somewhere way above the klist infrastructure, meaning there's no
guarantee the node is still on the list.  We've seen this in SCSI where
we use bus_find_device() to iterate through a list of devices.  In the
face of heavy hotplug activity, the last device returned by
bus_find_device() can be removed before the next call.  This leads to

Dec  3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 at include/linux/kref.h:47 klist_iter_init_node+0x3d/0x50()
Dec  3 13:22:02 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core
Dec  3 13:22:02 localhost kernel: CPU: 2 PID: 28073 Comm: cat Not tainted 4.4.0-rc1+ #2
Dec  3 13:22:02 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
Dec  3 13:22:02 localhost kernel: ffffffff81a20e77 ffff880613acfd18 ffffffff81321eef 0000000000000000
Dec  3 13:22:02 localhost kernel: ffff880613acfd50 ffffffff8107ca52 ffff88061176b198 0000000000000000
Dec  3 13:22:02 localhost kernel: ffffffff814542b0 ffff880610cfb100 ffff88061176b198 ffff880613acfd60
Dec  3 13:22:02 localhost kernel: Call Trace:
Dec  3 13:22:02 localhost kernel: [<ffffffff81321eef>] dump_stack+0x44/0x55
Dec  3 13:22:02 localhost kernel: [<ffffffff8107ca52>] warn_slowpath_common+0x82/0xc0
Dec  3 13:22:02 localhost kernel: [<ffffffff814542b0>] ? proc_scsi_show+0x20/0x20
Dec  3 13:22:02 localhost kernel: [<ffffffff8107cb4a>] warn_slowpath_null+0x1a/0x20
Dec  3 13:22:02 localhost kernel: [<ffffffff8167225d>] klist_iter_init_node+0x3d/0x50
Dec  3 13:22:02 localhost kernel: [<ffffffff81421d41>] bus_find_device+0x51/0xb0
Dec  3 13:22:02 localhost kernel: [<ffffffff814545ad>] scsi_seq_next+0x2d/0x40
[...]

And an eventual crash. It can actually occur in any hotplug system
which has a device finder and a starting device.

We can fix this globally by making sure the starting node for
klist_iter_init_node() is actually a member of the list before using it
(and by starting from the beginning if it isn't).

Reported-by: Ewan D. Milne <[email protected]>
Tested-by: Ewan D. Milne <[email protected]>
Cc: [email protected]
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
rib pushed a commit that referenced this pull request Feb 22, 2016
Commit 4b4b451 ("arm/arm64: KVM: Rework the arch timer to use
level-triggered semantics") brought the virtual architected timer
closer to the VGIC. There is one occasion were we don't properly
check for the VGIC actually having been initialized before, but
instead go on to check the active state of some IRQ number.
If userland hasn't instantiated a virtual GIC, we end up with a
kernel NULL pointer dereference:
=========
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ffffffc9745c5000
[00000000] *pgd=00000009f631e003, *pud=00000009f631e003, *pmd=0000000000000000
Internal error: Oops: 96000006 [#2] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 2144 Comm: kvm_simplest-ar Tainted: G      D 4.5.0-rc2+ #1300
Hardware name: ARM Juno development board (r1) (DT)
task: ffffffc976da8000 ti: ffffffc976e28000 task.ti: ffffffc976e28000
PC is at vgic_bitmap_get_irq_val+0x78/0x90
LR is at kvm_vgic_map_is_active+0xac/0xc8
pc : [<ffffffc0000b7e28>] lr : [<ffffffc0000b972c>] pstate: 20000145
....
=========

Fix this by bailing out early of kvm_timer_flush_hwstate() if we don't
have a VGIC at all.

Reported-by: Cosmin Gorgovan <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Signed-off-by: Andre Przywara <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Cc: <[email protected]> # 4.4.x
rib pushed a commit that referenced this pull request Feb 22, 2016
…l/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:
 "I've been sitting on some of these fixes for a while.

   - Corner case of returning to delay slot from interrupt
   - Changing default interrupt prioiry level
   - Kconfig'ize support for super pages
   - Other minor fixes"

* tag 'arc-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: mm: Introduce explicit super page size support
  ARCv2: intc: Allow interruption by lowest priority interrupt
  ARCv2: Check for LL-SC livelock only if LLSC is enabled
  ARC: shrink cpuinfo by not saving full timer BCR
  ARCv2: clocksource: Rename GRTC -> GFRC ...
  ARCv2: STAR 9000950267: Handle return from intr to Delay Slot #2
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Feb 29, 2016
The mutex lock at rc_register_device() was added by commit 08aeb7c
("[media] rc: add locking to fix register/show race").

It is meant to avoid race issues when trying to open a sysfs file while
the RC register didn't complete.

Adding a lock there causes troubles, as detected by the Kernel lock
debug instrumentation at the Kernel:

    ======================================================
    [ INFO: possible circular locking dependency detected ]
    4.5.0-rc3+ torvalds#46 Not tainted
    -------------------------------------------------------
    systemd-udevd/2681 is trying to acquire lock:
     (s_active#171){++++.+}, at: [<ffffffff8171a115>] kernfs_remove_by_name_ns+0x45/0xa0

    but task is already holding lock:
     (&dev->lock){+.+.+.}, at: [<ffffffffa0724def>] rc_register_device+0xb2f/0x1450 [rc_core]

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> rib#1 (&dev->lock){+.+.+.}:
           [<ffffffff8124817d>] lock_acquire+0x13d/0x320
           [<ffffffff822de966>] mutex_lock_nested+0xb6/0x860
           [<ffffffffa0721f2b>] show_protocols+0x3b/0x3f0 [rc_core]
           [<ffffffff81cdaba5>] dev_attr_show+0x45/0xc0
           [<ffffffff8171f1b3>] sysfs_kf_seq_show+0x203/0x3c0
           [<ffffffff8171a6a1>] kernfs_seq_show+0x121/0x1b0
           [<ffffffff81617c71>] seq_read+0x2f1/0x1160
           [<ffffffff8171c911>] kernfs_fop_read+0x321/0x460
           [<ffffffff815abc20>] __vfs_read+0xe0/0x3d0
           [<ffffffff815ae90e>] vfs_read+0xde/0x2d0
           [<ffffffff815b1d01>] SyS_read+0x111/0x230
           [<ffffffff822e8636>] entry_SYSCALL_64_fastpath+0x16/0x76

    -> #0 (s_active#171){++++.+}:
           [<ffffffff81244f24>] __lock_acquire+0x4304/0x5990
           [<ffffffff8124817d>] lock_acquire+0x13d/0x320
           [<ffffffff81717d3a>] __kernfs_remove+0x58a/0x810
           [<ffffffff8171a115>] kernfs_remove_by_name_ns+0x45/0xa0
           [<ffffffff81721592>] remove_files.isra.0+0x72/0x190
           [<ffffffff8172174b>] sysfs_remove_group+0x9b/0x150
           [<ffffffff81721854>] sysfs_remove_groups+0x54/0xa0
           [<ffffffff81cd97d0>] device_remove_attrs+0xb0/0x140
           [<ffffffff81cdb27c>] device_del+0x38c/0x6b0
           [<ffffffffa0724b8b>] rc_register_device+0x8cb/0x1450 [rc_core]
           [<ffffffffa1326a7b>] dvb_usb_remote_init+0x66b/0x14d0 [dvb_usb]
           [<ffffffffa1321c81>] dvb_usb_device_init+0xf21/0x1860 [dvb_usb]
           [<ffffffffa13517dc>] dib0700_probe+0x14c/0x410 [dvb_usb_dib0700]
           [<ffffffff81dbb1dd>] usb_probe_interface+0x45d/0x940
           [<ffffffff81ce7e7a>] driver_probe_device+0x21a/0xc30
           [<ffffffff81ce89b1>] __driver_attach+0x121/0x160
           [<ffffffff81ce21bf>] bus_for_each_dev+0x11f/0x1a0
           [<ffffffff81ce6cdd>] driver_attach+0x3d/0x50
           [<ffffffff81ce5df9>] bus_add_driver+0x4c9/0x770
           [<ffffffff81cea39c>] driver_register+0x18c/0x3b0
           [<ffffffff81db6e98>] usb_register_driver+0x1f8/0x440
           [<ffffffffa074001e>] dib0700_driver_init+0x1e/0x1000 [dvb_usb_dib0700]
           [<ffffffff810021b1>] do_one_initcall+0x141/0x300
           [<ffffffff8144d8eb>] do_init_module+0x1d0/0x5ad
           [<ffffffff812f27b6>] load_module+0x6666/0x9ba0
           [<ffffffff812f5fe8>] SyS_finit_module+0x108/0x130
           [<ffffffff822e8636>] entry_SYSCALL_64_fastpath+0x16/0x76

    other info that might help us debug this:

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&dev->lock);
                                   lock(s_active#171);
                                   lock(&dev->lock);
      lock(s_active#171);

     *** DEADLOCK ***

    3 locks held by systemd-udevd/2681:
     #0:  (&dev->mutex){......}, at: [<ffffffff81ce8933>] __driver_attach+0xa3/0x160
     rib#1:  (&dev->mutex){......}, at: [<ffffffff81ce8941>] __driver_attach+0xb1/0x160
     rib#2:  (&dev->lock){+.+.+.}, at: [<ffffffffa0724def>] rc_register_device+0xb2f/0x1450 [rc_core]

In this specific case, some error happened during device init,
causing IR to be disabled.

Let's fix it by adding a var that will tell when the device is
initialized. Any calls before that will return a -EINVAL.

That should prevent the race issues.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Feb 29, 2016
Ilya reported following lockdep splat:

kernel: =========================
kernel: [ BUG: held lock freed! ]
kernel: 4.5.0-rc1-ceph-00026-g5e0a311 rib#1 Not tainted
kernel: -------------------------
kernel: swapper/5/0 is freeing memory
ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
kernel: 4 locks held by swapper/5/0:
kernel: #0:  (rcu_read_lock){......}, at: [<ffffffff8169ef6b>]
netif_receive_skb_internal+0x4b/0x1f0
kernel: rib#1:  (rcu_read_lock){......}, at: [<ffffffff816e977f>]
ip_local_deliver_finish+0x3f/0x380
kernel: rib#2:  (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>]
sk_clone_lock+0x19b/0x440
kernel: rib#3:  (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0

To properly fix this issue, inet_csk_reqsk_queue_add() needs
to return to its callers if the child as been queued
into accept queue.

We also need to make sure listener is still there before
calling sk->sk_data_ready(), by holding a reference on it,
since the reference carried by the child can disappear as
soon as the child is put on accept queue.

Reported-by: Ilya Dryomov <[email protected]>
Fixes: ebb516a ("tcp/dccp: fix race at listener dismantle phase")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Feb 29, 2016
When slub_debug alloc_calls_show is enabled we will try to track
location and user of slab object on each online node, kmem_cache_node
structure and cpu_cache/cpu_slub shouldn't be freed till there is the
last reference to sysfs file.

This fixes the following panic:

   BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
   IP:  list_locations+0x169/0x4e0
   PGD 257304067 PUD 438456067 PMD 0
   Oops: 0000 [rib#1] SMP
   CPU: 3 PID: 973074 Comm: cat ve: 0 Not tainted 3.10.0-229.7.2.ovz.9.30-00007-japdoll-dirty rib#2 9.30
   Hardware name: DEPO Computers To Be Filled By O.E.M./H67DE3, BIOS L1.60c 07/14/2011
   task: ffff88042a5dc5b0 ti: ffff88037f8d8000 task.ti: ffff88037f8d8000
   RIP: list_locations+0x169/0x4e0
   Call Trace:
     alloc_calls_show+0x1d/0x30
     slab_attr_show+0x1b/0x30
     sysfs_read_file+0x9a/0x1a0
     vfs_read+0x9c/0x170
     SyS_read+0x58/0xb0
     system_call_fastpath+0x16/0x1b
   Code: 5e 07 12 00 b9 00 04 00 00 3d 00 04 00 00 0f 4f c1 3d 00 04 00 00 89 45 b0 0f 84 c3 00 00 00 48 63 45 b0 49 8b 9c c4 f8 00 00 00 <48> 8b 43 20 48 85 c0 74 b6 48 89 df e8 46 37 44 00 48 8b 53 10
   CR2: 0000000000000020

Separated __kmem_cache_release from __kmem_cache_shutdown which now
called on slab_kmem_cache_release (after the last reference to sysfs
file object has dropped).

Reintroduced locking in free_partial as sysfs file might access cache's
partial list after shutdowning - partial revert of the commit
69cb8e6 ("slub: free slabs without holding locks").  Zap
__remove_partial and use remove_partial (w/o underscores) as
free_partial now takes list_lock which s partial revert for commit
1e4dd94 ("slub: do not assert not having lock in removing freed
partial")

Signed-off-by: Dmitry Safonov <[email protected]>
Suggested-by: Vladimir Davydov <[email protected]>
Acked-by: Vladimir Davydov <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Mar 15, 2016
If we use USB ID pin as wakeup source, and there is a USB block
device on this USB OTG (ID) cable, the system will be deadlock
after system resume.

The root cause for this problem is: the workqueue ci_otg may try
to remove hcd before the driver resume has finished, and hcd will
disconnect the device on it, then, it will call device_release_driver,
and holds the device lock "dev->mutex", but it is never unlocked since
it waits workqueue writeback to run to flush the block information, but
the workqueue writeback is freezable, it is not thawed before driver
resume has finished.

When the driver (device: sd 0:0:0:0:) resume goes to dpm_complete, it
tries to get its device lock "dev->mutex", but it can't get it forever,
then the deadlock occurs. Below call stacks show the situation.

So, in order to fix this problem, we need to change workqueue ci_otg
as freezable, then the work item in this workqueue will be run after
driver's resume, this workqueue will not be blocked forever like above
case since the workqueue writeback has been thawed too.

Tested at: i.mx6qdl-sabresd and i.mx6sx-sdb.

[  555.178869] kworker/u2:13   D c07de74c     0   826      2 0x00000000
[  555.185310] Workqueue: ci_otg ci_otg_work
[  555.189353] Backtrace:
[  555.191849] [<c07de4fc>] (__schedule) from [<c07dec6c>] (schedule+0x48/0xa0)
[  555.198912]  r10:ee471ba0 r9:00000000 r8:00000000 r7:00000002 r6:ee470000 r5:ee471ba4
[  555.206867]  r4:ee470000
[  555.209453] [<c07dec24>] (schedule) from [<c07e2fc4>] (schedule_timeout+0x15c/0x1e0)
[  555.217212]  r4:7fffffff r3:edc2b000
[  555.220862] [<c07e2e68>] (schedule_timeout) from [<c07df6c8>] (wait_for_common+0x94/0x144)
[  555.229140]  r8:00000000 r7:00000002 r6:ee470000 r5:ee471ba4 r4:7fffffff
[  555.235980] [<c07df634>] (wait_for_common) from [<c07df790>] (wait_for_completion+0x18/0x1c)
[  555.244430]  r10:00000001 r9:c0b5563c r8:c0042e48 r7:ef086000 r6:eea4372c r5:ef131b00
[  555.252383]  r4:00000000
[  555.254970] [<c07df778>] (wait_for_completion) from [<c0043cb8>] (flush_work+0x19c/0x234)
[  555.263177] [<c0043b1c>] (flush_work) from [<c0043fac>] (flush_delayed_work+0x48/0x4c)
[  555.271106]  r8:ed5b5000 r7:c0b38a3c r6:eea439cc r5:eea4372c r4:eea4372c
[  555.277958] [<c0043f64>] (flush_delayed_work) from [<c00eae18>] (bdi_unregister+0x84/0xec)
[  555.286236]  r4:eea43520 r3:20000153
[  555.289885] [<c00ead94>] (bdi_unregister) from [<c02c2154>] (blk_cleanup_queue+0x180/0x29c)
[  555.298250]  r5:eea43808 r4:eea43400
[  555.301909] [<c02c1fd4>] (blk_cleanup_queue) from [<c0417914>] (__scsi_remove_device+0x48/0xb8)
[  555.310623]  r7:00000000 r6:20000153 r5:ededa950 r4:ededa800
[  555.316403] [<c04178cc>] (__scsi_remove_device) from [<c0415e90>] (scsi_forget_host+0x64/0x68)
[  555.325028]  r5:ededa800 r4:ed5b5000
[  555.328689] [<c0415e2c>] (scsi_forget_host) from [<c0409828>] (scsi_remove_host+0x78/0x104)
[  555.337054]  r5:ed5b5068 r4:ed5b5000
[  555.340709] [<c04097b0>] (scsi_remove_host) from [<c04cdfcc>] (usb_stor_disconnect+0x50/0xb4)
[  555.349247]  r6:ed5b56e4 r5:ed5b5818 r4:ed5b5690 r3:00000008
[  555.355025] [<c04cdf7c>] (usb_stor_disconnect) from [<c04b3bc8>] (usb_unbind_interface+0x78/0x25c)
[  555.363997]  r8:c13919b4 r7:edd3c000 r6:edd3c020 r5:ee551c68 r4:ee551c00 r3:c04cdf7c
[  555.371892] [<c04b3b50>] (usb_unbind_interface) from [<c03dc248>] (__device_release_driver+0x8c/0x118)
[  555.381213]  r10:00000001 r9:edd90c00 r8:c13919b4 r7:ee551c68 r6:c0b546e0 r5:c0b5563c
[  555.389167]  r4:edd3c020
[  555.391752] [<c03dc1bc>] (__device_release_driver) from [<c03dc2fc>] (device_release_driver+0x28/0x34)
[  555.401071]  r5:edd3c020 r4:edd3c054
[  555.404721] [<c03dc2d4>] (device_release_driver) from [<c03db304>] (bus_remove_device+0xe0/0x110)
[  555.413607]  r5:edd3c020 r4:ef17f04c
[  555.417253] [<c03db224>] (bus_remove_device) from [<c03d8128>] (device_del+0x114/0x21c)
[  555.425270]  r6:edd3c028 r5:edd3c020 r4:ee551c00 r3:00000000
[  555.431045] [<c03d8014>] (device_del) from [<c04b1560>] (usb_disable_device+0xa4/0x1e8)
[  555.439061]  r8:edd3c000 r7:eded8000 r6:00000000 r5:00000001 r4:ee551c00
[  555.445906] [<c04b14bc>] (usb_disable_device) from [<c04a8e54>] (usb_disconnect+0x74/0x224)
[  555.454271]  r9:edd90c00 r8:ee551000 r7:ee551c68 r6:ee551c9c r5:ee551c00 r4:00000001
[  555.462156] [<c04a8de0>] (usb_disconnect) from [<c04a8fb8>] (usb_disconnect+0x1d8/0x224)
[  555.470259]  r10:00000001 r9:edd90000 r8:ee471e2c r7:ee551468 r6:ee55149c r5:ee551400
[  555.478213]  r4:00000001
[  555.480797] [<c04a8de0>] (usb_disconnect) from [<c04ae5ec>] (usb_remove_hcd+0xa0/0x1ac)
[  555.488813]  r10:00000001 r9:ee471eb0 r8:00000000 r7:ef3d9500 r6:eded810c r5:eded80b0
[  555.496765]  r4:eded8000
[  555.499351] [<c04ae54c>] (usb_remove_hcd) from [<c04d4158>] (host_stop+0x28/0x64)
[  555.506847]  r6:eeb50010 r5:eded8000 r4:eeb51010
[  555.511563] [<c04d4130>] (host_stop) from [<c04d09b8>] (ci_otg_work+0xc4/0x124)
[  555.518885]  r6:00000001 r5:eeb50010 r4:eeb502a0 r3:c04d4130
[  555.524665] [<c04d08f4>] (ci_otg_work) from [<c00454f0>] (process_one_work+0x194/0x420)
[  555.532682]  r6:ef086000 r5:eeb502a0 r4:edc44480
[  555.537393] [<c004535c>] (process_one_work) from [<c00457b0>] (worker_thread+0x34/0x514)
[  555.545496]  r10:edc44480 r9:ef086000 r8:c0b1a100 r7:ef086034 r6:00000088 r5:edc44498
[  555.553450]  r4:ef086000
[  555.556032] [<c004577c>] (worker_thread) from [<c004bab4>] (kthread+0xdc/0xf8)
[  555.563268]  r10:00000000 r9:00000000 r8:00000000 r7:c004577c r6:edc44480 r5:eddc15c0
[  555.571221]  r4:00000000
[  555.573804] [<c004b9d8>] (kthread) from [<c000fef0>] (ret_from_fork+0x14/0x24)
[  555.581040]  r7:00000000 r6:00000000 r5:c004b9d8 r4:eddc15c0

[  553.429383] sh              D c07de74c     0   694    691 0x00000000
[  553.435801] Backtrace:
[  553.438295] [<c07de4fc>] (__schedule) from [<c07dec6c>] (schedule+0x48/0xa0)
[  553.445358]  r10:edd3c054 r9:edd3c078 r8:edddbd50 r7:edcbbc00 r6:c1377c34 r5:60000153
[  553.453313]  r4:eddda000
[  553.455896] [<c07dec24>] (schedule) from [<c07deff8>] (schedule_preempt_disabled+0x10/0x14)
[  553.464261]  r4:edd3c058 r3:0000000a
[  553.467910] [<c07defe8>] (schedule_preempt_disabled) from [<c07e0bbc>] (mutex_lock_nested+0x1a0/0x3e8)
[  553.477254] [<c07e0a1c>] (mutex_lock_nested) from [<c03e927c>] (dpm_complete+0xc0/0x1b0)
[  553.485358]  r10:00561408 r9:edd3c054 r8:c0b4863c r7:edddbd90 r6:c0b485d8 r5:edd3c020
[  553.493313]  r4:edd3c0d0
[  553.495896] [<c03e91bc>] (dpm_complete) from [<c03e9388>] (dpm_resume_end+0x1c/0x20)
[  553.503652]  r9:00000000 r8:c0b1a9d0 r7:c1334ec0 r6:c1334edc r5:00000003 r4:00000010
[  553.511544] [<c03e936c>] (dpm_resume_end) from [<c0079894>] (suspend_devices_and_enter+0x158/0x504)
[  553.520604]  r4:00000000 r3:c1334efc
[  553.524250] [<c007973c>] (suspend_devices_and_enter) from [<c0079e74>] (pm_suspend+0x234/0x2cc)
[  553.532961]  r10:00561408 r9:ed6b7300 r8:00000004 r7:c1334eec r6:00000000 r5:c1334ee8
[  553.540914]  r4:00000003
[  553.543493] [<c0079c40>] (pm_suspend) from [<c0078a6c>] (state_store+0x6c/0xc0)

[  555.703684] 7 locks held by kworker/u2:13/826:
[  555.708140]  #0:  ("%s""ci_otg"){++++.+}, at: [<c0045484>] process_one_work+0x128/0x420
[  555.716277]  rib#1:  ((&ci->work)){+.+.+.}, at: [<c0045484>] process_one_work+0x128/0x420
[  555.724317]  rib#2:  (usb_bus_list_lock){+.+.+.}, at: [<c04ae5e4>] usb_remove_hcd+0x98/0x1ac
[  555.732626]  rib#3:  (&dev->mutex){......}, at: [<c04a8e28>] usb_disconnect+0x48/0x224
[  555.740403]  rib#4:  (&dev->mutex){......}, at: [<c04a8e28>] usb_disconnect+0x48/0x224
[  555.748179]  rib#5:  (&dev->mutex){......}, at: [<c03dc2f4>] device_release_driver+0x20/0x34
[  555.756487]  rib#6:  (&shost->scan_mutex){+.+.+.}, at: [<c04097d0>] scsi_remove_host+0x20/0x104

Cc: <[email protected]> #v3.14+
Cc: Jun Li <[email protected]>
Signed-off-by: Peter Chen <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Mar 15, 2016
We need to update the skb->csum after pulling the skb, otherwise
an unnecessary checksum (re)computation can ocure for IGMP/MLD packets
in the bridge code. Additionally this fixes the following splats for
network devices / bridge ports with support for and enabled RX checksum
offloading:

[...]
[   43.986968] eth0: hw csum failure
[   43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 rib#2
[   43.996193] Hardware name: BCM2709
[   43.999647] [<800204e0>] (unwind_backtrace) from [<8001cf14>] (show_stack+0x10/0x14)
[   44.007432] [<8001cf14>] (show_stack) from [<801ab614>] (dump_stack+0x80/0x90)
[   44.014695] [<801ab614>] (dump_stack) from [<802e4548>] (__skb_checksum_complete+0x6c/0xac)
[   44.023090] [<802e4548>] (__skb_checksum_complete) from [<803a055c>] (ipv6_mc_validate_checksum+0x104/0x178)
[   44.032959] [<803a055c>] (ipv6_mc_validate_checksum) from [<802e111c>] (skb_checksum_trimmed+0x130/0x188)
[   44.042565] [<802e111c>] (skb_checksum_trimmed) from [<803a06e8>] (ipv6_mc_check_mld+0x118/0x338)
[   44.051501] [<803a06e8>] (ipv6_mc_check_mld) from [<803b2c98>] (br_multicast_rcv+0x5dc/0xd00)
[   44.060077] [<803b2c98>] (br_multicast_rcv) from [<803aa510>] (br_handle_frame_finish+0xac/0x51c)
[...]

Fixes: 9afd85c ("net: Export IGMP/MLD message validation code")
Reported-by: Álvaro Fernández Rojas <[email protected]>
Signed-off-by: Linus Lüssing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Mar 15, 2016
Commit cbce790 ("PCI: designware: Make driver arch-agnostic") changed
the host bridge sysdata pointer from the ARM pci_sys_data to the DesignWare
pcie_port structure, and changed pcie-designware.c to reflect that.  But it
did not change the corresponding code in pci-keystone-dw.c, so it caused
crashes on Keystone:

  Unable to handle kernel NULL pointer dereference at virtual address 00000030
  pgd = c0003000
  [00000030] *pgd=80000800004003, *pmd=00000000
  Internal error: Oops: 206 [rib#1] PREEMPT SMP ARM
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.2-00139-gb74f926 rib#2
  Hardware name: Keystone
  PC is at ks_dw_pcie_msi_irq_unmask+0x24/0x58

Change pci-keystone-dw.c to expect sysdata to be the struct pcie_port
pointer.

[bhelgaas: changelog]
Fixes: cbce790 ("PCI: designware: Make driver arch-agnostic")
Signed-off-by: Murali Karicheri <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
CC: [email protected]	# v4.4+
CC: Zhou Wang <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Apr 12, 2016
While retesting the SRP initiator I ran the command "rmmod mlx4_ib"
while I/O was in progress. That command triggers SCSI device removal
indirectly. Avoid that this action triggers the following deadlock:

=================================
[ INFO: inconsistent lock state ]
4.6.0-rc0-dbg+ rib#2 Tainted: G           O
---------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
multipathd/484 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&(&pg->lock)->rlock){+.?...}, at: [<ffffffffa04f50a2>] alua_bus_detach+0x52/0xa0 [scsi_dh_alua]
{IN-SOFTIRQ-W} state was registered at:
  [<ffffffff810a64a9>] __lock_acquire+0x7e9/0x1ad0
  [<ffffffff810a7fd0>] lock_acquire+0x60/0x80
  [<ffffffff8159910e>] _raw_spin_lock_irqsave+0x3e/0x60
  [<ffffffffa04f5131>] alua_rtpg_queue+0x41/0x1d0 [scsi_dh_alua]
  [<ffffffffa04f5531>] alua_check+0xe1/0x220 [scsi_dh_alua]
  [<ffffffffa04f5709>] alua_check_sense+0x99/0xb0 [scsi_dh_alua]
  [<ffffffff813f0d01>] scsi_check_sense+0x71/0x3f0
  [<ffffffff813f2f8b>] scsi_decide_disposition+0x18b/0x1d0
  [<ffffffff813f6e52>] scsi_softirq_done+0x52/0x140
  [<ffffffff812a26f2>] blk_done_softirq+0x52/0x90
  [<ffffffff8105bc1f>] __do_softirq+0x10f/0x230
  [<ffffffff8105bec8>] irq_exit+0xa8/0xb0
  [<ffffffff8101a675>] do_IRQ+0x65/0x110
  [<ffffffff8159a2c9>] ret_from_intr+0x0/0x19
  [<ffffffff811732f1>] kmem_cache_alloc+0x151/0x190
  [<ffffffff8118e534>] create_object+0x34/0x2d0
  [<ffffffff8158eaa6>] kmemleak_alloc_percpu+0x56/0xd0
  [<ffffffff8113ab0d>] pcpu_alloc+0x38d/0x660
  [<ffffffff8113aded>] __alloc_percpu_gfp+0xd/0x10
  [<ffffffff812e56a5>] __percpu_counter_init+0x55/0xb0
  [<ffffffff812b4989>] blkg_alloc+0x79/0x230
  [<ffffffff812b6756>] blkcg_init_queue+0x26/0x1d0
  [<ffffffff81297eed>] blk_alloc_queue_node+0x27d/0x2e0
  [<ffffffffa017766c>] dm_create+0x20c/0x570 [dm_mod]
  [<ffffffffa017e356>] dev_create+0x56/0x2c0 [dm_mod]
  [<ffffffffa017dcae>] ctl_ioctl+0x26e/0x520 [dm_mod]
  [<ffffffffa017df6e>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
  [<ffffffff811aa8ee>] do_vfs_ioctl+0x8e/0x660
  [<ffffffff811aaefc>] SyS_ioctl+0x3c/0x70
  [<ffffffff81599929>] entry_SYSCALL_64_fastpath+0x1c/0xac
irq event stamp: 4290931
hardirqs last  enabled at (4290931): [ 1662.892772]
[<ffffffff81599341>] _raw_spin_unlock_irqrestore+0x31/0x50
hardirqs last disabled at (4290930): [<ffffffff815990e7>] _raw_spin_lock_irqsave+0x17/0x60
softirqs last  enabled at (4290774): [<ffffffff8105bcdb>] __do_softirq+0x1cb/0x230
softirqs last disabled at (4289831): [<ffffffff8105bec8>] irq_exit+0xa8/0xb0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&pg->lock)->rlock);
  <Interrupt>
    lock(&(&pg->lock)->rlock);

 *** DEADLOCK ***

2 locks held by multipathd/484:
 #0:  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff811d1cc3>] __blkdev_put+0x33/0x360
 rib#1:  (sd_ref_mutex){+.+...}, at: [<ffffffff81400afc>] scsi_disk_put+0x1c/0x40

stack backtrace:
CPU: 6 PID: 484 Comm: multipathd Tainted: G           O    4.6.0-rc0-dbg+ rib#2
Call Trace:
 [<ffffffff812bd115>] dump_stack+0x67/0x92
 [<ffffffff810a5175>] print_usage_bug+0x215/0x240
 [<ffffffff810a56ea>] mark_lock+0x54a/0x610
 [<ffffffff810a6505>] __lock_acquire+0x845/0x1ad0
 [<ffffffff810a7fd0>] lock_acquire+0x60/0x80
 [<ffffffff81598f23>] _raw_spin_lock+0x33/0x50
 [<ffffffffa04f50a2>] alua_bus_detach+0x52/0xa0 [scsi_dh_alua]
 [<ffffffff813ff6f7>] scsi_dh_release_device+0x17/0x50
 [<ffffffff813fb8da>] scsi_device_dev_release_usercontext+0x2a/0x120
 [<ffffffff810701f0>] execute_in_process_context+0x80/0x90
 [<ffffffff813fb8a7>] scsi_device_dev_release+0x17/0x20
 [<ffffffff813c8cfd>] device_release+0x2d/0x90
 [<ffffffff812bfa8a>] kobject_release+0x7a/0x190
 [<ffffffff812bf946>] kobject_put+0x26/0x50
 [<ffffffff813c8ee2>] put_device+0x12/0x20
 [<ffffffff813edc86>] scsi_device_put+0x26/0x30
 [<ffffffff81400b0d>] scsi_disk_put+0x2d/0x40
 [<ffffffff81400b68>] sd_release+0x48/0xb0
 [<ffffffff811d1f2e>] __blkdev_put+0x29e/0x360
 [<ffffffff811d24b9>] blkdev_put+0x49/0x170
 [<ffffffff811d2600>] blkdev_close+0x20/0x30
 [<ffffffff81198f48>] __fput+0xe8/0x1f0
 [<ffffffff81199089>] ____fput+0x9/0x10
 [<ffffffff81075d9e>] task_work_run+0x6e/0xa0
 [<ffffffff81001119>] exit_to_usermode_loop+0xa9/0xb0
 [<ffffffff81001590>] syscall_return_slowpath+0xb0/0xc0
 [<ffffffff815999b7>] entry_SYSCALL_64_fastpath+0xaa/0xac

Fixes: cb0a168 (scsi_dh_alua: update 'access_state' field)
Cc: Hannes Reinecke <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Laurence Oberman <[email protected]>
Reviewed-by: Hannes Reinicke <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ewan Milne <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Apr 25, 2016
A fault in a user provided buffer may lead anywhere, and lockdep warns
that we have a potential deadlock between the mm->mmap_sem and the
kernfs file mutex:

[   82.811702] ======================================================
[   82.811705] [ INFO: possible circular locking dependency detected ]
[   82.811709] 4.5.0-rc4-gfxbench+ rib#1 Not tainted
[   82.811711] -------------------------------------------------------
[   82.811714] kms_setmode/5859 is trying to acquire lock:
[   82.811717]  (&dev->struct_mutex){+.+.+.}, at: [<ffffffff8150d9c1>] drm_gem_mmap+0x1a1/0x270
[   82.811731]
but task is already holding lock:
[   82.811734]  (&mm->mmap_sem){++++++}, at: [<ffffffff8117b364>] vm_mmap_pgoff+0x44/0xa0
[   82.811745]
which lock already depends on the new lock.

[   82.811749]
the existing dependency chain (in reverse order) is:
[   82.811752]
-> rib#3 (&mm->mmap_sem){++++++}:
[   82.811761]        [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[   82.811766]        [<ffffffff8118bc65>] __might_fault+0x75/0xa0
[   82.811771]        [<ffffffff8124da4a>] kernfs_fop_write+0x8a/0x180
[   82.811787]        [<ffffffff811d1023>] __vfs_write+0x23/0xe0
[   82.811792]        [<ffffffff811d1d74>] vfs_write+0xa4/0x190
[   82.811797]        [<ffffffff811d2c14>] SyS_write+0x44/0xb0
[   82.811801]        [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
[   82.811807]
-> rib#2 (s_active#6){++++.+}:
[   82.811814]        [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[   82.811819]        [<ffffffff8124c070>] __kernfs_remove+0x210/0x2f0
[   82.811823]        [<ffffffff8124d040>] kernfs_remove_by_name_ns+0x40/0xa0
[   82.811828]        [<ffffffff8124e9e0>] sysfs_remove_file_ns+0x10/0x20
[   82.811832]        [<ffffffff815318d4>] device_del+0x124/0x250
[   82.811837]        [<ffffffff81531a19>] device_unregister+0x19/0x60
[   82.811841]        [<ffffffff8153c051>] cpu_cache_sysfs_exit+0x51/0xb0
[   82.811846]        [<ffffffff8153c628>] cacheinfo_cpu_callback+0x38/0x70
[   82.811851]        [<ffffffff8109ae89>] notifier_call_chain+0x39/0xa0
[   82.811856]        [<ffffffff8109aef9>] __raw_notifier_call_chain+0x9/0x10
[   82.811860]        [<ffffffff810786de>] cpu_notify+0x1e/0x40
[   82.811865]        [<ffffffff81078779>] cpu_notify_nofail+0x9/0x20
[   82.811869]        [<ffffffff81078ac3>] _cpu_down+0x233/0x340
[   82.811874]        [<ffffffff81079019>] disable_nonboot_cpus+0xc9/0x350
[   82.811878]        [<ffffffff810d2e11>] suspend_devices_and_enter+0x5a1/0xb50
[   82.811883]        [<ffffffff810d3903>] pm_suspend+0x543/0x8d0
[   82.811888]        [<ffffffff810d1b77>] state_store+0x77/0xe0
[   82.811892]        [<ffffffff813fa68f>] kobj_attr_store+0xf/0x20
[   82.811897]        [<ffffffff8124e740>] sysfs_kf_write+0x40/0x50
[   82.811902]        [<ffffffff8124dafc>] kernfs_fop_write+0x13c/0x180
[   82.811906]        [<ffffffff811d1023>] __vfs_write+0x23/0xe0
[   82.811910]        [<ffffffff811d1d74>] vfs_write+0xa4/0x190
[   82.811914]        [<ffffffff811d2c14>] SyS_write+0x44/0xb0
[   82.811918]        [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
[   82.811923]
-> rib#1 (cpu_hotplug.lock){+.+.+.}:
[   82.811929]        [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[   82.811933]        [<ffffffff817b6f72>] mutex_lock_nested+0x62/0x3b0
[   82.811940]        [<ffffffff810784c1>] get_online_cpus+0x61/0x80
[   82.811944]        [<ffffffff811170eb>] stop_machine+0x1b/0xe0
[   82.811949]        [<ffffffffa0178edd>] gen8_ggtt_insert_entries__BKL+0x2d/0x30 [i915]
[   82.812009]        [<ffffffffa017d3a6>] ggtt_bind_vma+0x46/0x70 [i915]
[   82.812045]        [<ffffffffa017eb70>] i915_vma_bind+0x140/0x290 [i915]
[   82.812081]        [<ffffffffa01862b9>] i915_gem_object_do_pin+0x899/0xb00 [i915]
[   82.812117]        [<ffffffffa0186555>] i915_gem_object_pin+0x35/0x40 [i915]
[   82.812154]        [<ffffffffa019a23e>] intel_init_pipe_control+0xbe/0x210 [i915]
[   82.812192]        [<ffffffffa0197312>] intel_logical_rings_init+0xe2/0xde0 [i915]
[   82.812232]        [<ffffffffa0186fe3>] i915_gem_init+0xf3/0x130 [i915]
[   82.812278]        [<ffffffffa02097ed>] i915_driver_load+0xf2d/0x1770 [i915]
[   82.812318]        [<ffffffff81512474>] drm_dev_register+0xa4/0xb0
[   82.812323]        [<ffffffff8151467e>] drm_get_pci_dev+0xce/0x1e0
[   82.812328]        [<ffffffffa01472cf>] i915_pci_probe+0x2f/0x50 [i915]
[   82.812360]        [<ffffffff8143f907>] pci_device_probe+0x87/0xf0
[   82.812366]        [<ffffffff81535f89>] driver_probe_device+0x229/0x450
[   82.812371]        [<ffffffff81536233>] __driver_attach+0x83/0x90
[   82.812375]        [<ffffffff81533c61>] bus_for_each_dev+0x61/0xa0
[   82.812380]        [<ffffffff81535879>] driver_attach+0x19/0x20
[   82.812384]        [<ffffffff8153535f>] bus_add_driver+0x1ef/0x290
[   82.812388]        [<ffffffff81536e9b>] driver_register+0x5b/0xe0
[   82.812393]        [<ffffffff8143e83b>] __pci_register_driver+0x5b/0x60
[   82.812398]        [<ffffffff81514866>] drm_pci_init+0xd6/0x100
[   82.812402]        [<ffffffffa027c094>] 0xffffffffa027c094
[   82.812406]        [<ffffffff810003de>] do_one_initcall+0xae/0x1d0
[   82.812412]        [<ffffffff811595a0>] do_init_module+0x5b/0x1cb
[   82.812417]        [<ffffffff81106160>] load_module+0x1c20/0x2480
[   82.812422]        [<ffffffff81106bae>] SyS_finit_module+0x7e/0xa0
[   82.812428]        [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
[   82.812433]
-> #0 (&dev->struct_mutex){+.+.+.}:
[   82.812439]        [<ffffffff810cbe59>] __lock_acquire+0x1fc9/0x20f0
[   82.812443]        [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[   82.812456]        [<ffffffff8150d9e7>] drm_gem_mmap+0x1c7/0x270
[   82.812460]        [<ffffffff81196a14>] mmap_region+0x334/0x580
[   82.812466]        [<ffffffff81196fc4>] do_mmap+0x364/0x410
[   82.812470]        [<ffffffff8117b38d>] vm_mmap_pgoff+0x6d/0xa0
[   82.812474]        [<ffffffff811950f4>] SyS_mmap_pgoff+0x184/0x220
[   82.812479]        [<ffffffff8100a0fd>] SyS_mmap+0x1d/0x20
[   82.812484]        [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
[   82.812489]
other info that might help us debug this:

[   82.812493] Chain exists of:
  &dev->struct_mutex --> s_active#6 --> &mm->mmap_sem

[   82.812502]  Possible unsafe locking scenario:

[   82.812506]        CPU0                    CPU1
[   82.812508]        ----                    ----
[   82.812510]   lock(&mm->mmap_sem);
[   82.812514]                                lock(s_active#6);
[   82.812519]                                lock(&mm->mmap_sem);
[   82.812522]   lock(&dev->struct_mutex);
[   82.812526]
 *** DEADLOCK ***

[   82.812531] 1 lock held by kms_setmode/5859:
[   82.812533]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8117b364>] vm_mmap_pgoff+0x44/0xa0
[   82.812541]
stack backtrace:
[   82.812547] CPU: 0 PID: 5859 Comm: kms_setmode Not tainted 4.5.0-rc4-gfxbench+ rib#1
[   82.812550] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0040.2015.0814.1353 08/14/2015
[   82.812553]  0000000000000000 ffff880079407bf0 ffffffff813f8505 ffffffff825fb270
[   82.812560]  ffffffff825c4190 ffff880079407c30 ffffffff810c84ac ffff880079407c90
[   82.812566]  ffff8800797ed328 ffff8800797ecb00 0000000000000001 ffff8800797ed350
[   82.812573] Call Trace:
[   82.812578]  [<ffffffff813f8505>] dump_stack+0x67/0x92
[   82.812582]  [<ffffffff810c84ac>] print_circular_bug+0x1fc/0x310
[   82.812586]  [<ffffffff810cbe59>] __lock_acquire+0x1fc9/0x20f0
[   82.812590]  [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[   82.812594]  [<ffffffff8150d9c1>] ? drm_gem_mmap+0x1a1/0x270
[   82.812599]  [<ffffffff8150d9e7>] drm_gem_mmap+0x1c7/0x270
[   82.812603]  [<ffffffff8150d9c1>] ? drm_gem_mmap+0x1a1/0x270
[   82.812608]  [<ffffffff81196a14>] mmap_region+0x334/0x580
[   82.812612]  [<ffffffff81196fc4>] do_mmap+0x364/0x410
[   82.812616]  [<ffffffff8117b38d>] vm_mmap_pgoff+0x6d/0xa0
[   82.812629]  [<ffffffff811950f4>] SyS_mmap_pgoff+0x184/0x220
[   82.812633]  [<ffffffff8100a0fd>] SyS_mmap+0x1d/0x20
[   82.812637]  [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73

Highly unlikely though this scenario is, we can avoid the issue entirely
by moving the copy operation from out under the kernfs_get_active()
tracking by assigning the preallocated buffer its own mutex. The
temporary buffer allocation doesn't require mutex locking as it is
entirely local.

The locked section was extended by the addition of the preallocated buf
to speed up md user operations in

commit 2b75869
Author: NeilBrown <[email protected]>
Date:   Mon Oct 13 16:41:28 2014 +1100

    sysfs/kernfs: allow attributes to request write buffer be pre-allocated.

Reported-by: Ville Syrjälä <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Joonas Lahtinen <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: [email protected]
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Joonas Lahtinen <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Apr 26, 2016
Jon Maloy says:

====================
tipc: name distributor pernet queue

Commit rib#1 fixes a potential issue with deferred binding table
updates being pushed to the wrong namespace.

Commit rib#2 solves a problem with deferred binding table updates
remaining in the the defer queue after the issuing node has gone
down.
====================

Signed-off-by: David S. Miller <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Apr 26, 2016
Tracing a workload that uses transactions gave a seg fault as follows:

  perf record -e intel_pt// workload
  perf report
  Program received signal SIGSEGV, Segmentation fault.
  0x000000000054b58c in intel_pt_reset_last_branch_rb (ptq=0x1a36110)
  	at util/intel-pt.c:929
  929 ptq->last_branch_rb->nr = 0;
  (gdb) p ptq->last_branch_rb
  $1 = (struct branch_stack *) 0x0
  (gdb) up
  1148 intel_pt_reset_last_branch_rb(ptq);
  (gdb) l
  1143 if (ret)
  1144 pr_err("Intel Processor Trace: failed to deliver transaction event
  1145 ret);
  1146
  1147 if (pt->synth_opts.callchain)
  1148 intel_pt_reset_last_branch_rb(ptq);
  1149
  1150 return ret;
  1151 }
  1152
  (gdb) p pt->synth_opts.callchain
  $2 = true
  (gdb)
  (gdb) bt
   #0 0x000000000054b58c in intel_pt_reset_last_branch_rb (ptq=0x1a36110)
   rib#1 0x000000000054c1e0 in intel_pt_synth_transaction_sample (ptq=0x1a36110)
   rib#2 0x000000000054c5b2 in intel_pt_sample (ptq=0x1a36110)

Caused by checking the 'callchain' flag when it should have been the
'last_branch' flag.  Fix that.

Reported-by: Andi Kleen <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: [email protected] # v4.4+
Fixes: f14445e ("perf intel-pt: Support generating branch stack")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request May 5, 2016
When I was writing an atomic wrapper for rmfb, I ran into the
following backtrace from lockdep:

=============================================
[ INFO: possible recursive locking detected ]
4.5.0-patser+ #4696 Tainted: G     U
---------------------------------------------
kworker/2:2/2608 is trying to acquire lock:
 (crtc_ww_class_mutex){+.+.+.}, at: [<ffffffffc00c9ddc>] drm_modeset_lock+0x7c/0x120 [drm]

but task is already holding lock:
 (crtc_ww_class_mutex){+.+.+.}, at: [<ffffffffc00c98cd>] modeset_backoff+0x8d/0x220 [drm]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(crtc_ww_class_mutex);
  lock(crtc_ww_class_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

4 locks held by kworker/2:2/2608:
 #0:  ("events"){.+.+.+}, at: [<ffffffff810a5eea>] process_one_work+0x15a/0x6c0
 rib#1:  ((&arg.work)){+.+.+.}, at: [<ffffffff810a5eea>] process_one_work+0x15a/0x6c0
 rib#2:  (crtc_ww_class_acquire){+.+.+.}, at: [<ffffffffc004532a>] drm_atomic_helper_remove_fb+0x4a/0x1d0 [drm_kms_helper]
 rib#3:  (crtc_ww_class_mutex){+.+.+.}, at: [<ffffffffc00c98cd>] modeset_backoff+0x8d/0x220 [drm]

While lockdep probably catches this bug when it happens, it's better
to explicitly warn when state->acquire_ctx is not set.

Signed-off-by: Maarten Lankhorst <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/1462266751-29123-1-git-send-email-maarten.lankhorst@linux.intel.com
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Jul 18, 2016
We must not attempt to send WMI packets while holding the data-lock,
as it may deadlock:

BUG: sleeping function called from invalid context at drivers/net/wireless/ath/ath10k/wmi.c:1824
in_atomic(): 1, irqs_disabled(): 0, pid: 2878, name: wpa_supplicant

=============================================
[ INFO: possible recursive locking detected ]
4.4.6+ rib#21 Tainted: G        W  O
---------------------------------------------
wpa_supplicant/2878 is trying to acquire lock:
 (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa0721511>] ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]

but task is already holding lock:
 (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa070251b>] ath10k_peer_create+0x122/0x1ae [ath10k_core]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&ar->data_lock)->rlock);
  lock(&(&ar->data_lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

4 locks held by wpa_supplicant/2878:
 #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816493ca>] rtnl_lock+0x12/0x14
 rib#1:  (&ar->conf_mutex){+.+.+.}, at: [<ffffffffa0706932>] ath10k_add_interface+0x3b/0xbda [ath10k_core]
 rib#2:  (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa070251b>] ath10k_peer_create+0x122/0x1ae [ath10k_core]
 rib#3:  (rcu_read_lock){......}, at: [<ffffffffa062f304>] rcu_read_lock+0x0/0x66 [mac80211]

stack backtrace:
CPU: 3 PID: 2878 Comm: wpa_supplicant Tainted: G        W  O    4.4.6+ rib#21
Hardware name: To be filled by O.E.M. To be filled by O.E.M./ChiefRiver, BIOS 4.6.5 06/07/2013
 0000000000000000 ffff8801fcadf8f0 ffffffff8137086d ffffffff82681720
 ffffffff82681720 ffff8801fcadf9b0 ffffffff8112e3be ffff8801fcadf920
 0000000100000000 ffffffff82681720 ffffffffa0721500 ffff8801fcb8d348
Call Trace:
 [<ffffffff8137086d>] dump_stack+0x81/0xb6
 [<ffffffff8112e3be>] __lock_acquire+0xc5b/0xde7
 [<ffffffffa0721500>] ? ath10k_wmi_tx_beacons_iter+0x15/0x11a [ath10k_core]
 [<ffffffff8112d0d0>] ? mark_lock+0x24/0x201
 [<ffffffff8112e908>] lock_acquire+0x132/0x1cb
 [<ffffffff8112e908>] ? lock_acquire+0x132/0x1cb
 [<ffffffffa0721511>] ? ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
 [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
 [<ffffffff816f9e2b>] _raw_spin_lock_bh+0x31/0x40
 [<ffffffffa0721511>] ? ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
 [<ffffffffa0721511>] ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
 [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
 [<ffffffffa062eb18>] __iterate_interfaces+0x9d/0x13d [mac80211]
 [<ffffffffa062f609>] ieee80211_iterate_active_interfaces_atomic+0x32/0x3e [mac80211]
 [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
 [<ffffffffa071fa9f>] ath10k_wmi_tx_beacons_nowait.isra.13+0x14/0x16 [ath10k_core]
 [<ffffffffa0721676>] ath10k_wmi_cmd_send+0x71/0x242 [ath10k_core]
 [<ffffffffa07023f6>] ath10k_wmi_peer_delete+0x3f/0x42 [ath10k_core]
 [<ffffffffa0702557>] ath10k_peer_create+0x15e/0x1ae [ath10k_core]
 [<ffffffffa0707004>] ath10k_add_interface+0x70d/0xbda [ath10k_core]
 [<ffffffffa05fffcc>] drv_add_interface+0x123/0x1a5 [mac80211]
 [<ffffffffa061554b>] ieee80211_do_open+0x351/0x667 [mac80211]
 [<ffffffffa06158aa>] ieee80211_open+0x49/0x4c [mac80211]
 [<ffffffff8163ecf9>] __dev_open+0x88/0xde
 [<ffffffff8163ef6e>] __dev_change_flags+0xa4/0x13a
 [<ffffffff8163f023>] dev_change_flags+0x1f/0x54
 [<ffffffff816a5532>] devinet_ioctl+0x2b9/0x5c9
 [<ffffffff816514dd>] ? copy_to_user+0x32/0x38
 [<ffffffff816a6115>] inet_ioctl+0x81/0x9d
 [<ffffffff816a6115>] ? inet_ioctl+0x81/0x9d
 [<ffffffff81621cf8>] sock_do_ioctl+0x20/0x3d
 [<ffffffff816223c4>] sock_ioctl+0x222/0x22e
 [<ffffffff8121cf95>] do_vfs_ioctl+0x453/0x4d7
 [<ffffffff81625603>] ? __sys_recvmsg+0x4c/0x5b
 [<ffffffff81225af1>] ? __fget_light+0x48/0x6c
 [<ffffffff8121d06b>] SyS_ioctl+0x52/0x74
 [<ffffffff816fa736>] entry_SYSCALL_64_fastpath+0x16/0x7a

Signed-off-by: Ben Greear <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Jul 18, 2016
Nicolas Dichtel says:

====================
ovs: fix rtnl notifications on interface deletion

There was no rtnl notifications for interfaces (gre, vxlan, geneve) created
by ovs. This problem is fixed by adjusting the creation path.

v1 -> v2:
 - add patch rib#1 and rib#4
 - rework error handling in patch rib#2
====================

Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Jul 18, 2016
When run tipcTS&tipcTC test suite, the following complaint appears:

[   56.926168] ===============================
[   56.926169] [ INFO: suspicious RCU usage. ]
[   56.926171] 4.7.0-rc1+ torvalds#160 Not tainted
[   56.926173] -------------------------------
[   56.926174] net/tipc/bearer.c:408 suspicious rcu_dereference_protected() usage!
[   56.926175]
[   56.926175] other info that might help us debug this:
[   56.926175]
[   56.926177]
[   56.926177] rcu_scheduler_active = 1, debug_locks = 1
[   56.926179] 3 locks held by swapper/4/0:
[   56.926180]  #0:  (((&req->timer))){+.-...}, at: [<ffffffff810e79b5>] call_timer_fn+0x5/0x340
[   56.926203]  rib#1:  (&(&req->lock)->rlock){+.-...}, at: [<ffffffffa000c29b>] disc_timeout+0x1b/0xd0 [tipc]
[   56.926212]  rib#2:  (rcu_read_lock){......}, at: [<ffffffffa00055e0>] tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc]
[   56.926218]
[   56.926218] stack backtrace:
[   56.926221] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.7.0-rc1+ torvalds#160
[   56.926222] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   56.926224]  0000000000000000 ffff880016803d28 ffffffff813c4423 ffff8800154252c0
[   56.926227]  0000000000000001 ffff880016803d58 ffffffff810b7512 ffff8800124d8120
[   56.926230]  ffff880013f8a160 ffff8800132b5ccc ffff8800124d8120 ffff880016803d88
[   56.926234] Call Trace:
[   56.926235]  <IRQ>  [<ffffffff813c4423>] dump_stack+0x67/0x94
[   56.926250]  [<ffffffff810b7512>] lockdep_rcu_suspicious+0xe2/0x120
[   56.926256]  [<ffffffffa00051f1>] tipc_l2_send_msg+0x131/0x1c0 [tipc]
[   56.926261]  [<ffffffffa000567c>] tipc_bearer_xmit_skb+0x14c/0x2e0 [tipc]
[   56.926266]  [<ffffffffa00055e0>] ? tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc]
[   56.926273]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
[   56.926278]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
[   56.926283]  [<ffffffffa000c2d6>] disc_timeout+0x56/0xd0 [tipc]
[   56.926288]  [<ffffffff810e7a68>] call_timer_fn+0xb8/0x340
[   56.926291]  [<ffffffff810e79b5>] ? call_timer_fn+0x5/0x340
[   56.926296]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
[   56.926300]  [<ffffffff810e8f4a>] run_timer_softirq+0x23a/0x390
[   56.926306]  [<ffffffff810f89ff>] ? clockevents_program_event+0x7f/0x130
[   56.926316]  [<ffffffff819727c3>] __do_softirq+0xc3/0x4a2
[   56.926323]  [<ffffffff8106ba5a>] irq_exit+0x8a/0xb0
[   56.926327]  [<ffffffff81972456>] smp_apic_timer_interrupt+0x46/0x60
[   56.926331]  [<ffffffff81970a49>] apic_timer_interrupt+0x89/0x90
[   56.926333]  <EOI>  [<ffffffff81027fda>] ? default_idle+0x2a/0x1a0
[   56.926340]  [<ffffffff81027fd8>] ? default_idle+0x28/0x1a0
[   56.926342]  [<ffffffff810289cf>] arch_cpu_idle+0xf/0x20
[   56.926345]  [<ffffffff810adf0f>] default_idle_call+0x2f/0x50
[   56.926347]  [<ffffffff810ae145>] cpu_startup_entry+0x215/0x3e0
[   56.926353]  [<ffffffff81040ad9>] start_secondary+0xf9/0x100

The warning appears as rtnl_dereference() is wrongly used in
tipc_l2_send_msg() under RCU read lock protection. Instead the proper
usage should be that rcu_dereference_rtnl() is called here.

Fixes: 5b7066c ("tipc: stricter filtering of packets in bearer layer")
Acked-by: Jon Maloy <[email protected]>
Signed-off-by: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Jul 18, 2016
On s390, there are two different hardware PMUs for counting and
sampling.  Previously, both PMUs have shared the perf_hw_context
which is not correct and, recently, results in this warning:

    ------------[ cut here ]------------
    WARNING: CPU: 5 PID: 1 at kernel/events/core.c:8485 perf_pmu_register+0x420/0x428
    Modules linked in:
    CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc1+ rib#2
    task: 00000009c5240000 ti: 00000009c5234000 task.ti: 00000009c5234000
    Krnl PSW : 0704c00180000000 0000000000220c50 (perf_pmu_register+0x420/0x428)
               R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
    Krnl GPRS: ffffffffffffffff 0000000000b15ac6 0000000000000000 00000009cb440000
               000000000022087a 0000000000000000 0000000000b78fa0 0000000000000000
               0000000000a9aa90 0000000000000084 0000000000000005 000000000088a97a
               0000000000000004 0000000000749dd0 000000000022087a 00000009c5237cc0
    Krnl Code: 0000000000220c44: a7f4ff54            brc     15,220aec
               0000000000220c48: 92011000           mvi     0(%r1),1
              #0000000000220c4c: a7f40001           brc     15,220c4e
              >0000000000220c50: a7f4ff12           brc     15,220a74
               0000000000220c54: 0707               bcr     0,%r7
               0000000000220c56: 0707               bcr     0,%r7
               0000000000220c58: ebdff0800024       stmg    %r13,%r15,128(%r15)
               0000000000220c5e: a7f13fe0           tmll    %r15,16352
    Call Trace:
    ([<000000000022087a>] perf_pmu_register+0x4a/0x428)
    ([<0000000000b2c25c>] init_cpum_sampling_pmu+0x14c/0x1f8)
    ([<0000000000100248>] do_one_initcall+0x48/0x140)
    ([<0000000000b25d26>] kernel_init_freeable+0x1e6/0x2a0)
    ([<000000000072bda4>] kernel_init+0x24/0x138)
    ([<000000000073495e>] kernel_thread_starter+0x6/0xc)
    ([<0000000000734958>] kernel_thread_starter+0x0/0xc)
    Last Breaking-Event-Address:
     [<0000000000220c4c>] perf_pmu_register+0x41c/0x428
    ---[ end trace 0c6ef9f5b771ad97 ]---

Using the perf_sw_context is an option because the cpum_cf PMU does
not use interrupts.  To make this more clear, initialize the
capabilities in the PMU structure.

Signed-off-by: Hendrik Brueckner <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Acked-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Jul 18, 2016
While testing the deadline scheduler + cgroup setup I hit this
warning.

[  132.612935] ------------[ cut here ]------------
[  132.612951] WARNING: CPU: 5 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80
[  132.612952] Modules linked in: (a ton of modules...)
[  132.612981] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc2 rib#2
[  132.612981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
[  132.612982]  0000000000000086 45c8bb5effdd088b ffff88013fd43da0 ffffffff813d229e
[  132.612984]  0000000000000000 0000000000000000 ffff88013fd43de0 ffffffff810a652b
[  132.612985]  00000096811387b5 0000000000000200 ffff8800bab29d80 ffff880034c54c00
[  132.612986] Call Trace:
[  132.612987]  <IRQ>  [<ffffffff813d229e>] dump_stack+0x63/0x85
[  132.612994]  [<ffffffff810a652b>] __warn+0xcb/0xf0
[  132.612997]  [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170
[  132.612999]  [<ffffffff810a665d>] warn_slowpath_null+0x1d/0x20
[  132.613000]  [<ffffffff810aba5b>] __local_bh_enable_ip+0x6b/0x80
[  132.613008]  [<ffffffff817d6c8a>] _raw_write_unlock_bh+0x1a/0x20
[  132.613010]  [<ffffffff817d6c9e>] _raw_spin_unlock_bh+0xe/0x10
[  132.613015]  [<ffffffff811388ac>] put_css_set+0x5c/0x60
[  132.613016]  [<ffffffff8113dc7f>] cgroup_free+0x7f/0xa0
[  132.613017]  [<ffffffff810a3912>] __put_task_struct+0x42/0x140
[  132.613018]  [<ffffffff810e776a>] dl_task_timer+0xca/0x250
[  132.613027]  [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170
[  132.613030]  [<ffffffff8111371e>] __hrtimer_run_queues+0xee/0x270
[  132.613031]  [<ffffffff81113ec8>] hrtimer_interrupt+0xa8/0x190
[  132.613034]  [<ffffffff81051a58>] local_apic_timer_interrupt+0x38/0x60
[  132.613035]  [<ffffffff817d9b0d>] smp_apic_timer_interrupt+0x3d/0x50
[  132.613037]  [<ffffffff817d7c5c>] apic_timer_interrupt+0x8c/0xa0
[  132.613038]  <EOI>  [<ffffffff81063466>] ? native_safe_halt+0x6/0x10
[  132.613043]  [<ffffffff81037a4e>] default_idle+0x1e/0xd0
[  132.613044]  [<ffffffff810381cf>] arch_cpu_idle+0xf/0x20
[  132.613046]  [<ffffffff810e8fda>] default_idle_call+0x2a/0x40
[  132.613047]  [<ffffffff810e92d7>] cpu_startup_entry+0x2e7/0x340
[  132.613048]  [<ffffffff81050235>] start_secondary+0x155/0x190
[  132.613049] ---[ end trace f91934d162ce9977 ]---

The warn is the spin_(lock|unlock)_bh(&css_set_lock) in the interrupt
context. Converting the spin_lock_bh to spin_lock_irq(save) to avoid
this problem - and other problems of sharing a spinlock with an
interrupt.

Cc: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected] # 4.5+
Cc: [email protected]
Reviewed-by: Rik van Riel <[email protected]>
Reviewed-by: "Luis Claudio R. Goncalves" <[email protected]>
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Acked-by: Zefan Li <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Jul 18, 2016
mem_cgroup_migrate() uses local_irq_disable/enable() but can be called
with irq disabled from migrate_page_copy().  This ends up enabling irq
while holding a irq context lock triggering the following lockdep
warning.  Fix it by using irq_save/restore instead.

  =================================
  [ INFO: inconsistent lock state ]
  4.7.0-rc1+ #52 Tainted: G        W
  ---------------------------------
  inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
  kcompactd0/151 [HC0[0]:SC0[0]:HE1:SE1] takes:
   (&(&ctx->completion_lock)->rlock){+.?.-.}, at: [<000000000038fd96>] aio_migratepage+0x156/0x1e8
  {IN-SOFTIRQ-W} state was registered at:
     __lock_acquire+0x5b6/0x1930
     lock_acquire+0xee/0x270
     _raw_spin_lock_irqsave+0x66/0xb0
     aio_complete+0x98/0x328
     dio_complete+0xe4/0x1e0
     blk_update_request+0xd4/0x450
     scsi_end_request+0x48/0x1c8
     scsi_io_completion+0x272/0x698
     blk_done_softirq+0xca/0xe8
     __do_softirq+0xc8/0x518
     irq_exit+0xee/0x110
     do_IRQ+0x6a/0x88
     io_int_handler+0x11a/0x25c
     __mutex_unlock_slowpath+0x144/0x1d8
     __mutex_unlock_slowpath+0x140/0x1d8
     kernfs_iop_permission+0x64/0x80
     __inode_permission+0x9e/0xf0
     link_path_walk+0x6e/0x510
     path_lookupat+0xc4/0x1a8
     filename_lookup+0x9c/0x160
     user_path_at_empty+0x5c/0x70
     SyS_readlinkat+0x68/0x140
     system_call+0xd6/0x270
  irq event stamp: 971410
  hardirqs last  enabled at (971409):  migrate_page_move_mapping+0x3ea/0x588
  hardirqs last disabled at (971410):  _raw_spin_lock_irqsave+0x3c/0xb0
  softirqs last  enabled at (970526):  __do_softirq+0x460/0x518
  softirqs last disabled at (970519):  irq_exit+0xee/0x110

  other info that might help us debug this:
   Possible unsafe locking scenario:

	 CPU0
	 ----
    lock(&(&ctx->completion_lock)->rlock);
    <Interrupt>
      lock(&(&ctx->completion_lock)->rlock);

    *** DEADLOCK ***

  3 locks held by kcompactd0/151:
   #0:  (&(&mapping->private_lock)->rlock){+.+.-.}, at:  aio_migratepage+0x42/0x1e8
   rib#1:  (&ctx->ring_lock){+.+.+.}, at:  aio_migratepage+0x5a/0x1e8
   rib#2:  (&(&ctx->completion_lock)->rlock){+.?.-.}, at:  aio_migratepage+0x156/0x1e8

  stack backtrace:
  CPU: 20 PID: 151 Comm: kcompactd0 Tainted: G        W       4.7.0-rc1+ #52
  Call Trace:
    show_trace+0xea/0xf0
    show_stack+0x72/0xf0
    dump_stack+0x9a/0xd8
    print_usage_bug.part.27+0x2d4/0x2e8
    mark_lock+0x17e/0x758
    mark_held_locks+0xa2/0xd0
    trace_hardirqs_on_caller+0x140/0x1c0
    mem_cgroup_migrate+0x266/0x370
    aio_migratepage+0x16a/0x1e8
    move_to_new_page+0xb0/0x260
    migrate_pages+0x8f4/0x9f0
    compact_zone+0x4dc/0xdc8
    kcompactd_do_work+0x1aa/0x358
    kcompactd+0xba/0x2c8
    kthread+0x10a/0x110
    kernel_thread_starter+0x6/0xc
    kernel_thread_starter+0x0/0xc
  INFO: lockdep is turned off.

Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/g/[email protected]
Fixes: 74485cf ("mm: migrate: consolidate mem_cgroup_migrate() calls")
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Christian Borntraeger <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Vladimir Davydov <[email protected]>
Cc: <[email protected]>	[4.5+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Jul 18, 2016
An ENOMEM when creating a pair tty in tty_ldisc_setup causes a null
pointer dereference in devpts_kill_index because tty->link->driver_data
is NULL.  The oops was triggered with the pty stressor in stress-ng when
in a low memory condition.

tty_init_dev tries to clean up a tty_ldisc_setup ENOMEM error by calling
release_tty, however, this ultimately tries to clean up the NULL pair'd
tty in pty_unix98_remove, triggering the Oops.

Add check to pty_unix98_remove to only clean up fsi if it is not NULL.

Ooops:

[   23.020961] Oops: 0000 [rib#1] SMP
[   23.020976] Modules linked in: ppdev snd_hda_codec_generic snd_hda_intel snd_hda_codec parport_pc snd_hda_core snd_hwdep parport snd_pcm input_leds joydev snd_timer serio_raw snd soundcore i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel qxl aes_x86_64 ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect psmouse sysimgblt floppy fb_sys_fops drm pata_acpi jitterentropy_rng drbg ansi_cprng
[   23.020978] CPU: 0 PID: 1452 Comm: stress-ng-pty Not tainted 4.7.0-rc4+ rib#2
[   23.020978] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   23.020979] task: ffff88007ba30000 ti: ffff880078ea8000 task.ti: ffff880078ea8000
[   23.020981] RIP: 0010:[<ffffffff813f11ff>]  [<ffffffff813f11ff>] ida_remove+0x1f/0x120
[   23.020981] RSP: 0018:ffff880078eabb60  EFLAGS: 00010a03
[   23.020982] RAX: 4444444444444567 RBX: 0000000000000000 RCX: 000000000000001f
[   23.020982] RDX: 000000000000014c RSI: 000000000000026f RDI: 0000000000000000
[   23.020982] RBP: ffff880078eabb70 R08: 0000000000000004 R09: 0000000000000036
[   23.020983] R10: 000000000000026f R11: 0000000000000000 R12: 000000000000026f
[   23.020983] R13: 000000000000026f R14: ffff88007c944b40 R15: 000000000000026f
[   23.020984] FS:  00007f9a2f3cc700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[   23.020984] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.020985] CR2: 0000000000000010 CR3: 000000006c81b000 CR4: 00000000001406f0
[   23.020988] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   23.020988] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   23.020988] Stack:
[   23.020989]  0000000000000000 000000000000026f ffff880078eabb90 ffffffff812a5a99
[   23.020990]  0000000000000000 00000000fffffff4 ffff880078eabba8 ffffffff814f9cbe
[   23.020991]  ffff88007965c800 ffff880078eabbc8 ffffffff814eef43 fffffffffffffff4
[   23.020991] Call Trace:
[   23.021000]  [<ffffffff812a5a99>] devpts_kill_index+0x29/0x50
[   23.021002]  [<ffffffff814f9cbe>] pty_unix98_remove+0x2e/0x50
[   23.021006]  [<ffffffff814eef43>] release_tty+0xb3/0x1b0
[   23.021007]  [<ffffffff814f18d4>] tty_init_dev+0xd4/0x1c0
[   23.021011]  [<ffffffff814f9fae>] ptmx_open+0xae/0x190
[   23.021013]  [<ffffffff812254ef>] chrdev_open+0xbf/0x1b0
[   23.021015]  [<ffffffff8121d973>] do_dentry_open+0x203/0x310
[   23.021016]  [<ffffffff81225430>] ? cdev_put+0x30/0x30
[   23.021017]  [<ffffffff8121ee44>] vfs_open+0x54/0x80
[   23.021018]  [<ffffffff8122b8fc>] ? may_open+0x8c/0x100
[   23.021019]  [<ffffffff8122f26b>] path_openat+0x2eb/0x1440
[   23.021020]  [<ffffffff81230534>] ? putname+0x54/0x60
[   23.021022]  [<ffffffff814f6f97>] ? n_tty_ioctl_helper+0x27/0x100
[   23.021023]  [<ffffffff81231651>] do_filp_open+0x91/0x100
[   23.021024]  [<ffffffff81230596>] ? getname_flags+0x56/0x1f0
[   23.021026]  [<ffffffff8123fc66>] ? __alloc_fd+0x46/0x190
[   23.021027]  [<ffffffff8121f1e4>] do_sys_open+0x124/0x210
[   23.021028]  [<ffffffff8121f2ee>] SyS_open+0x1e/0x20
[   23.021035]  [<ffffffff81845576>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[   23.021044] Code: 63 28 45 31 e4 eb dd 0f 1f 44 00 00 55 4c 63 d6 48 ba 89 88 88 88 88 88 88 88 4c 89 d0 b9 1f 00 00 00 48 f7 e2 48 89 e5 41 54 53 <8b> 47 10 48 89 fb 8d 3c c5 00 00 00 00 48 c1 ea 09 b8 01 00 00
[   23.021045] RIP  [<ffffffff813f11ff>] ida_remove+0x1f/0x120
[   23.021045]  RSP <ffff880078eabb60>
[   23.021046] CR2: 0000000000000010

Signed-off-by: Colin Ian King <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Jul 18, 2016
Userspace can quite legitimately perform an exec() syscall with a
suspended transaction. exec() does not return to the old process, rather
it load a new one and starts that, the expectation therefore is that the
new process starts not in a transaction. Currently exec() is not treated
any differently to any other syscall which creates problems.

Firstly it could allow a new process to start with a suspended
transaction for a binary that no longer exists. This means that the
checkpointed state won't be valid and if the suspended transaction were
ever to be resumed and subsequently aborted (a possibility which is
exceedingly likely as exec()ing will likely doom the transaction) the
new process will jump to invalid state.

Secondly the incorrect attempt to keep the transactional state while
still zeroing state for the new process creates at least two TM Bad
Things. The first triggers on the rfid to return to userspace as
start_thread() has given the new process a 'clean' MSR but the suspend
will still be set in the hardware MSR. The second TM Bad Thing triggers
in __switch_to() as the processor is still transactionally suspended but
__switch_to() wants to zero the TM sprs for the new process.

This is an example of the outcome of calling exec() with a suspended
transaction. Note the first 700 is likely the first TM bad thing
decsribed earlier only the kernel can't report it as we've loaded
userspace registers. c000000000009980 is the rfid in
fast_exception_return()

  Bad kernel stack pointer 3fffcfa1a370 at c000000000009980
  Oops: Bad kernel stack pointer, sig: 6 [rib#1]
  CPU: 0 PID: 2006 Comm: tm-execed Not tainted
  NIP: c000000000009980 LR: 0000000000000000 CTR: 0000000000000000
  REGS: c00000003ffefd40 TRAP: 0700   Not tainted
  MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]>  CR: 00000000  XER: 00000000
  CFAR: c0000000000098b4 SOFTE: 0
  PACATMSCRATCH: b00000010000d033
  GPR00: 0000000000000000 00003fffcfa1a370 0000000000000000 0000000000000000
  GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 00003fff966611c0 0000000000000000 0000000000000000 0000000000000000
  NIP [c000000000009980] fast_exception_return+0xb0/0xb8
  LR [0000000000000000]           (null)
  Call Trace:
  Instruction dump:
  f84d0278 e9a100d8 7c7b03a6 e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070
  e8410080 e8610088 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed023b

  Kernel BUG at c000000000043e80 [verbose debug info unavailable]
  Unexpected TM Bad Thing exception at c000000000043e80 (msr 0x201033)
  Oops: Unrecoverable exception, sig: 6 [rib#2]
  CPU: 0 PID: 2006 Comm: tm-execed Tainted: G      D
  task: c0000000fbea6d80 ti: c00000003ffec000 task.ti: c0000000fb7ec000
  NIP: c000000000043e80 LR: c000000000015a24 CTR: 0000000000000000
  REGS: c00000003ffef7e0 TRAP: 0700   Tainted: G      D
  MSR: 8000000300201033 <SF,ME,IR,DR,RI,LE,TM[SE]>  CR: 28002828  XER: 00000000
  CFAR: c000000000015a20 SOFTE: 0
  PACATMSCRATCH: b00000010000d033
  GPR00: 0000000000000000 c00000003ffefa60 c000000000db5500 c0000000fbead000
  GPR04: 8000000300001033 2222222222222222 2222222222222222 00000000ff160000
  GPR08: 0000000000000000 800000010000d033 c0000000fb7e3ea0 c00000000fe00004
  GPR12: 0000000000002200 c00000000fe00000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 c0000000fbea7410 00000000ff160000
  GPR24: c0000000ffe1f600 c0000000fbea8700 c0000000fbea8700 c0000000fbead000
  GPR28: c000000000e20198 c0000000fbea6d80 c0000000fbeab680 c0000000fbea6d80
  NIP [c000000000043e80] tm_restore_sprs+0xc/0x1c
  LR [c000000000015a24] __switch_to+0x1f4/0x420
  Call Trace:
  Instruction dump:
  7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
  4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020

This fixes CVE-2016-5828.

Fixes: bc2a940 ("powerpc: Hook in new transactional memory code")
Cc: [email protected] # v3.9+
Signed-off-by: Cyril Bur <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Jul 18, 2016
The USB core contains a bug that can show up when a USB-3 host
controller is removed.  If the primary (USB-2) hcd structure is
released before the shared (USB-3) hcd, the core will try to do a
double-free of the common bandwidth_mutex.

The problem was described in graphical form by Chung-Geol Kim, who
first reported it:

=================================================
     At *remove USB(3.0) Storage
     sequence <1> --> <5> ((Problem Case))
=================================================
                                  VOLD
------------------------------------|------------
                                 (uevent)
                            ________|_________
                           |<1>               |
                           |dwc3_otg_sm_work  |
                           |usb_put_hcd       |
                           |peer_hcd(kref=2)|
                           |__________________|
                            ________|_________
                           |<2>               |
                           |New USB BUS rib#2    |
                           |                  |
                           |peer_hcd(kref=1)  |
                           |                  |
                         --(Link)-bandXX_mutex|
                         | |__________________|
                         |
    ___________________  |
   |<3>                | |
   |dwc3_otg_sm_work   | |
   |usb_put_hcd        | |
   |primary_hcd(kref=1)| |
   |___________________| |
    _________|_________  |
   |<4>                | |
   |New USB BUS rib#1     | |
   |hcd_release        | |
   |primary_hcd(kref=0)| |
   |                   | |
   |bandXX_mutex(free) |<-
   |___________________|
                               (( VOLD ))
                            ______|___________
                           |<5>               |
                           |      SCSI        |
                           |usb_put_hcd       |
                           |peer_hcd(kref=0)  |
                           |*hcd_release      |
                           |bandXX_mutex(free*)|<- double free
                           |__________________|

=================================================

This happens because hcd_release() frees the bandwidth_mutex whenever
it sees a primary hcd being released (which is not a very good idea
in any case), but in the course of releasing the primary hcd, it
changes the pointers in the shared hcd in such a way that the shared
hcd will appear to be primary when it gets released.

This patch fixes the problem by changing hcd_release() so that it
deallocates the bandwidth_mutex only when the _last_ hcd structure
referencing it is released.  The patch also removes an unnecessary
test, so that when an hcd is released, both the shared_hcd and
primary_hcd pointers in the hcd's peer will be cleared.

Signed-off-by: Alan Stern <[email protected]>
Reported-by: Chung-Geol Kim <[email protected]>
Tested-by: Chung-Geol Kim <[email protected]>
CC: <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Sep 2, 2016
The qp init function does a kzalloc() while holding the RCU
lock that encounters the following warning with a debug kernel
when a cat of the qp_stats is done:

[  231.723948] rcu_scheduler_active = 1, debug_locks = 0
[  231.731939] 3 locks held by cat/11355:
[  231.736492]  #0:  (debugfs_srcu){......}, at: [<ffffffff813001a5>] debugfs_use_file_start+0x5/0x90
[  231.746955]  rib#1:  (&p->lock){+.+.+.}, at: [<ffffffff81289a6c>] seq_read+0x4c/0x3c0
[  231.755873]  rib#2:  (rcu_read_lock){......}, at: [<ffffffffa0a0c535>] _qp_stats_seq_start+0x5/0xd0 [hfi1]
[  231.766862]

The init functions do an implicit next which requires the rcu read lock
before the kzalloc().

Fix for both drivers is to change the scope of the init function to only
do the allocation and the initialization of the just allocated iter.

The implict next is moved back into the respective start functions to fix
the issue.

Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
CC: <[email protected]> # 4.6.x-
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Sep 2, 2016
Commit 44f714d ("Btrfs: improve performance on fsync against new
inode after rename/unlink"), which landed in 4.8-rc2, introduced a
possibility for a deadlock due to double locking of an inode's log mutex
by the same task, which lockdep reports with:

[23045.433975] =============================================
[23045.434748] [ INFO: possible recursive locking detected ]
[23045.435426] 4.7.0-rc6-btrfs-next-34+ rib#1 Not tainted
[23045.436044] ---------------------------------------------
[23045.436044] xfs_io/3688 is trying to acquire lock:
[23045.436044]  (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]
               but task is already holding lock:
[23045.436044]  (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]
               other info that might help us debug this:
[23045.436044]  Possible unsafe locking scenario:

[23045.436044]        CPU0
[23045.436044]        ----
[23045.436044]   lock(&ei->log_mutex);
[23045.436044]   lock(&ei->log_mutex);
[23045.436044]
                *** DEADLOCK ***

[23045.436044]  May be due to missing lock nesting notation

[23045.436044] 3 locks held by xfs_io/3688:
[23045.436044]  #0:  (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffffa035f2ae>] btrfs_sync_file+0x14e/0x425 [btrfs]
[23045.436044]  rib#1:  (sb_internal#2){.+.+.+}, at: [<ffffffff8118446b>] __sb_start_write+0x5f/0xb0
[23045.436044]  rib#2:  (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]
               stack backtrace:
[23045.436044] CPU: 4 PID: 3688 Comm: xfs_io Not tainted 4.7.0-rc6-btrfs-next-34+ rib#1
[23045.436044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[23045.436044]  0000000000000000 ffff88022f5f7860 ffffffff8127074d ffffffff82a54b70
[23045.436044]  ffffffff82a54b70 ffff88022f5f7920 ffffffff81092897 ffff880228015d68
[23045.436044]  0000000000000000 ffffffff82a54b70 ffffffff829c3f00 ffff880228015d68
[23045.436044] Call Trace:
[23045.436044]  [<ffffffff8127074d>] dump_stack+0x67/0x90
[23045.436044]  [<ffffffff81092897>] __lock_acquire+0xcbb/0xe4e
[23045.436044]  [<ffffffff8109155f>] ? mark_lock+0x24/0x201
[23045.436044]  [<ffffffff8109179a>] ? mark_held_locks+0x5e/0x74
[23045.436044]  [<ffffffff81092de0>] lock_acquire+0x12f/0x1c3
[23045.436044]  [<ffffffff81092de0>] ? lock_acquire+0x12f/0x1c3
[23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffff814a51a4>] mutex_lock_nested+0x77/0x3a7
[23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffffa039705e>] ? btrfs_release_delayed_node+0xb/0xd [btrfs]
[23045.436044]  [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffff810a0ed1>] ? vprintk_emit+0x453/0x465
[23045.436044]  [<ffffffffa0385a61>] btrfs_log_inode+0x66e/0xc95 [btrfs]
[23045.436044]  [<ffffffffa03c084d>] log_new_dir_dentries+0x26c/0x359 [btrfs]
[23045.436044]  [<ffffffffa03865aa>] btrfs_log_inode_parent+0x4a6/0x628 [btrfs]
[23045.436044]  [<ffffffffa0387552>] btrfs_log_dentry_safe+0x5a/0x75 [btrfs]
[23045.436044]  [<ffffffffa035f464>] btrfs_sync_file+0x304/0x425 [btrfs]
[23045.436044]  [<ffffffff811acaf4>] vfs_fsync_range+0x8c/0x9e
[23045.436044]  [<ffffffff811acb22>] vfs_fsync+0x1c/0x1e
[23045.436044]  [<ffffffff811acc79>] do_fsync+0x31/0x4a
[23045.436044]  [<ffffffff811ace99>] SyS_fsync+0x10/0x14
[23045.436044]  [<ffffffff814a88e5>] entry_SYSCALL_64_fastpath+0x18/0xa8
[23045.436044]  [<ffffffff8108f039>] ? trace_hardirqs_off_caller+0x3f/0xaa

An example reproducer for this is:

   $ mkfs.btrfs -f /dev/sdb
   $ mount /dev/sdb /mnt
   $ mkdir /mnt/dir
   $ touch /mnt/dir/foo
   $ sync
   $ mv /mnt/dir/foo /mnt/dir/bar
   $ touch /mnt/dir/foo
   $ xfs_io -c "fsync" /mnt/dir/bar

This is because while logging the inode of file bar we end up logging its
parent directory (since its inode has an unlink_trans field matching the
current transaction id due to the rename operation), which in turn logs
the inodes for all its new dentries, so that the new inode for the new
file named foo gets logged which in turn triggered another logging attempt
for the inode we are fsync'ing, since that inode had an old name that
corresponds to the name of the new inode.

So fix this by ensuring that when logging the inode for a new dentry that
has a name matching an old name of some other inode, we don't log again
the original inode that we are fsync'ing.

Fixes: 44f714d ("Btrfs: improve performance on fsync against new inode after rename/unlink")
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Jul 9, 2019
…utex

Remove circular lock dependency by using atomic version of interfaces
iterate in watch_dog_work(), hence avoid taking local->iflist_mtx
(rtw_vif_watch_dog_iter() only update some data, it can be called from
atomic context). Fixes below LOCKDEP warning:

[ 1157.219415] ======================================================
[ 1157.225772] [ INFO: possible circular locking dependency detected ]
[ 1157.232150] 3.10.0-1043.el7.sgruszka1.x86_64.debug rib#1 Not tainted
[ 1157.238346] -------------------------------------------------------
[ 1157.244635] kworker/u4:2/14490 is trying to acquire lock:
[ 1157.250194]  (&rtwdev->mutex){+.+.+.}, at: [<ffffffffc098322b>] rtw_ops_config+0x2b/0x90 [rtw88]
[ 1157.259151]
but task is already holding lock:
[ 1157.265085]  (&local->iflist_mtx){+.+...}, at: [<ffffffffc0b8ab7a>] ieee80211_mgd_probe_ap.part.28+0xca/0x160 [mac80211]
[ 1157.276169]
which lock already depends on the new lock.

[ 1157.284488]
the existing dependency chain (in reverse order) is:
[ 1157.292101]
-> rib#2 (&local->iflist_mtx){+.+...}:
[ 1157.296919]        [<ffffffffbc741a29>] lock_acquire+0x99/0x1e0
[ 1157.302955]        [<ffffffffbce72793>] mutex_lock_nested+0x93/0x410
[ 1157.309416]        [<ffffffffc0b6038f>] ieee80211_iterate_interfaces+0x2f/0x60 [mac80211]
[ 1157.317730]        [<ffffffffc09811ab>] rtw_watch_dog_work+0xcb/0x130 [rtw88]
[ 1157.325003]        [<ffffffffbc6d77bc>] process_one_work+0x22c/0x720
[ 1157.331481]        [<ffffffffbc6d7dd6>] worker_thread+0x126/0x3b0
[ 1157.337589]        [<ffffffffbc6e107f>] kthread+0xef/0x100
[ 1157.343260]        [<ffffffffbce848b7>] ret_from_fork_nospec_end+0x0/0x39
[ 1157.350091]
-> rib#1 ((&(&rtwdev->watch_dog_work)->work)){+.+...}:
[ 1157.356314]        [<ffffffffbc741a29>] lock_acquire+0x99/0x1e0
[ 1157.362427]        [<ffffffffbc6d570b>] flush_work+0x5b/0x310
[ 1157.368287]        [<ffffffffbc6d740e>] __cancel_work_timer+0xae/0x170
[ 1157.374940]        [<ffffffffbc6d7583>] cancel_delayed_work_sync+0x13/0x20
[ 1157.381930]        [<ffffffffc0982b49>] rtw_core_stop+0x29/0x50 [rtw88]
[ 1157.388679]        [<ffffffffc098bee6>] rtw_enter_ips+0x16/0x20 [rtw88]
[ 1157.395428]        [<ffffffffc0983242>] rtw_ops_config+0x42/0x90 [rtw88]
[ 1157.402173]        [<ffffffffc0b13343>] ieee80211_hw_config+0xc3/0x680 [mac80211]
[ 1157.409854]        [<ffffffffc0b3925b>] ieee80211_do_open+0x69b/0x9c0 [mac80211]
[ 1157.417418]        [<ffffffffc0b395e9>] ieee80211_open+0x69/0x70 [mac80211]
[ 1157.424496]        [<ffffffffbcd03442>] __dev_open+0xe2/0x160
[ 1157.430356]        [<ffffffffbcd03773>] __dev_change_flags+0xa3/0x180
[ 1157.436922]        [<ffffffffbcd03879>] dev_change_flags+0x29/0x60
[ 1157.443224]        [<ffffffffbcda14c4>] devinet_ioctl+0x794/0x890
[ 1157.449331]        [<ffffffffbcda27b5>] inet_ioctl+0x75/0xa0
[ 1157.455087]        [<ffffffffbccd54eb>] sock_do_ioctl+0x2b/0x60
[ 1157.461178]        [<ffffffffbccd5753>] sock_ioctl+0x233/0x310
[ 1157.467109]        [<ffffffffbc8bd820>] do_vfs_ioctl+0x410/0x6c0
[ 1157.473233]        [<ffffffffbc8bdb71>] SyS_ioctl+0xa1/0xc0
[ 1157.478914]        [<ffffffffbce84a5e>] system_call_fastpath+0x25/0x2a
[ 1157.485569]
-> #0 (&rtwdev->mutex){+.+.+.}:
[ 1157.490022]        [<ffffffffbc7409d1>] __lock_acquire+0xec1/0x1630
[ 1157.496305]        [<ffffffffbc741a29>] lock_acquire+0x99/0x1e0
[ 1157.502413]        [<ffffffffbce72793>] mutex_lock_nested+0x93/0x410
[ 1157.508890]        [<ffffffffc098322b>] rtw_ops_config+0x2b/0x90 [rtw88]
[ 1157.515724]        [<ffffffffc0b13343>] ieee80211_hw_config+0xc3/0x680 [mac80211]
[ 1157.523370]        [<ffffffffc0b8a4ca>] ieee80211_recalc_ps.part.27+0x9a/0x180 [mac80211]
[ 1157.531685]        [<ffffffffc0b8abc5>] ieee80211_mgd_probe_ap.part.28+0x115/0x160 [mac80211]
[ 1157.540353]        [<ffffffffc0b8b40d>] ieee80211_beacon_connection_loss_work+0x4d/0x80 [mac80211]
[ 1157.549513]        [<ffffffffbc6d77bc>] process_one_work+0x22c/0x720
[ 1157.555886]        [<ffffffffbc6d7dd6>] worker_thread+0x126/0x3b0
[ 1157.562170]        [<ffffffffbc6e107f>] kthread+0xef/0x100
[ 1157.567765]        [<ffffffffbce848b7>] ret_from_fork_nospec_end+0x0/0x39
[ 1157.574579]
other info that might help us debug this:

[ 1157.582788] Chain exists of:
  &rtwdev->mutex --> (&(&rtwdev->watch_dog_work)->work) --> &local->iflist_mtx

[ 1157.593024]  Possible unsafe locking scenario:

[ 1157.599046]        CPU0                    CPU1
[ 1157.603653]        ----                    ----
[ 1157.608258]   lock(&local->iflist_mtx);
[ 1157.612180]                                lock((&(&rtwdev->watch_dog_work)->work));
[ 1157.620074]                                lock(&local->iflist_mtx);
[ 1157.626555]   lock(&rtwdev->mutex);
[ 1157.630124]
 *** DEADLOCK ***

[ 1157.636148] 4 locks held by kworker/u4:2/14490:
[ 1157.640755]  #0:  (%s#6){.+.+.+}, at: [<ffffffffbc6d774a>] process_one_work+0x1ba/0x720
[ 1157.648965]  rib#1:  ((&ifmgd->beacon_connection_loss_work)){+.+.+.}, at: [<ffffffffbc6d774a>] process_one_work+0x1ba/0x720
[ 1157.659950]  rib#2:  (&wdev->mtx){+.+.+.}, at: [<ffffffffc0b8aad5>] ieee80211_mgd_probe_ap.part.28+0x25/0x160 [mac80211]
[ 1157.670901]  rib#3:  (&local->iflist_mtx){+.+...}, at: [<ffffffffc0b8ab7a>] ieee80211_mgd_probe_ap.part.28+0xca/0x160 [mac80211]
[ 1157.682466]

Fixes: e303748 ("rtw88: new Realtek 802.11ac driver")
Signed-off-by: Stanislaw Gruszka <[email protected]>
Acked-by: Yan-Hsuan Chuang <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Jul 9, 2019
Booting with kernel parameter "rdt=cmt,mbmtotal,memlocal,l3cat,mba" and
executing "mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl" results in
a NULL pointer dereference on systems which do not have local MBM support
enabled..

BUG: kernel NULL pointer dereference, address: 0000000000000020
PGD 0 P4D 0
Oops: 0000 [rib#1] SMP PTI
CPU: 0 PID: 722 Comm: kworker/0:3 Not tainted 5.2.0-0.rc3.git0.1.el7_UNSUPPORTED.x86_64 rib#2
Workqueue: events mbm_handle_overflow
RIP: 0010:mbm_handle_overflow+0x150/0x2b0

Only enter the bandwith update loop if the system has local MBM enabled.

Fixes: de73f38 ("x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth")
Signed-off-by: Prarit Bhargava <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Reinette Chatre <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
djdeath pushed a commit to djdeath/linux that referenced this pull request Jul 9, 2019
Ido Schimmel says:

====================
mlxsw: Various fixes

This patchset contains various fixes for mlxsw.

Patch rib#1 fixes an hash polarization problem when a nexthop device is a
LAG device. This is caused by the fact that the same seed is used for
the LAG and ECMP hash functions.

Patch rib#2 fixes an issue in which the driver fails to refresh a nexthop
neighbour after it becomes dead. This prevents the nexthop from ever
being written to the adjacency table and used to forward traffic. Patch

Patch rib#4 fixes a wrong extraction of TOS value in flower offload code.
Patch rib#5 is a test case.

Patch rib#6 works around a buffer issue in Spectrum-2 by reducing the
default sizes of the shared buffer pools.

Patch rib#7 prevents prio-tagged packets from entering the switch when PVID
is removed from the bridge port.

Please consider patches rib#2, rib#4 and rib#6 for 5.1.y
====================

Signed-off-by: David S. Miller <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Jul 9, 2019
…nux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm fixes for 5.2, take rib#2

- SVE cleanup killing a warning with ancient GCC versions
- Don't report non-existent system registers to userspace
- Fix memory leak when freeing the vgic ITS
- Properly lower the interrupt on the emulated physical timer
djdeath pushed a commit to djdeath/linux that referenced this pull request Sep 17, 2019
syzbot reported:

    BUG: KMSAN: uninit-value in capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700
    CPU: 0 PID: 10025 Comm: syz-executor379 Not tainted 4.20.0-rc7+ rib#2
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
      __dump_stack lib/dump_stack.c:77 [inline]
      dump_stack+0x173/0x1d0 lib/dump_stack.c:113
      kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
      __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
      capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700
      do_loop_readv_writev fs/read_write.c:703 [inline]
      do_iter_write+0x83e/0xd80 fs/read_write.c:961
      vfs_writev fs/read_write.c:1004 [inline]
      do_writev+0x397/0x840 fs/read_write.c:1039
      __do_sys_writev fs/read_write.c:1112 [inline]
      __se_sys_writev+0x9b/0xb0 fs/read_write.c:1109
      __x64_sys_writev+0x4a/0x70 fs/read_write.c:1109
      do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
      entry_SYSCALL_64_after_hwframe+0x63/0xe7
    [...]

The problem is that capi_write() is reading past the end of the message.
Fix it by checking the message's length in the needed places.

Reported-and-tested-by: [email protected]
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Sep 17, 2019
…empts

The lock_extent_buffer_io() returns 1 to the caller to tell it everything
went fine and the callers needs to start writeback for the extent buffer
(submit a bio, etc), 0 to tell the caller everything went fine but it does
not need to start writeback for the extent buffer, and a negative value if
some error happened.

When it's about to return 1 it tries to lock all pages, and if a try lock
on a page fails, and we didn't flush any existing bio in our "epd", it
calls flush_write_bio(epd) and overwrites the return value of 1 to 0 or
an error. The page might have been locked elsewhere, not with the goal
of starting writeback of the extent buffer, and even by some code other
than btrfs, like page migration for example, so it does not mean the
writeback of the extent buffer was already started by some other task,
so returning a 0 tells the caller (btree_write_cache_pages()) to not
start writeback for the extent buffer. Note that epd might currently have
either no bio, so flush_write_bio() returns 0 (success) or it might have
a bio for another extent buffer with a lower index (logical address).

Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the
extent buffer and writeback is never started for the extent buffer,
future attempts to writeback the extent buffer will hang forever waiting
on that bit to be cleared, since it can only be cleared after writeback
completes. Such hang is reported with a trace like the following:

  [49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds.
  [49887.347059]       Not tainted 5.2.13-gentoo rib#2
  [49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [49887.347062] btrfs-transacti D    0  1752      2 0x80004000
  [49887.347064] Call Trace:
  [49887.347069]  ? __schedule+0x265/0x830
  [49887.347071]  ? bit_wait+0x50/0x50
  [49887.347072]  ? bit_wait+0x50/0x50
  [49887.347074]  schedule+0x24/0x90
  [49887.347075]  io_schedule+0x3c/0x60
  [49887.347077]  bit_wait_io+0x8/0x50
  [49887.347079]  __wait_on_bit+0x6c/0x80
  [49887.347081]  ? __lock_release.isra.29+0x155/0x2d0
  [49887.347083]  out_of_line_wait_on_bit+0x7b/0x80
  [49887.347084]  ? var_wake_function+0x20/0x20
  [49887.347087]  lock_extent_buffer_for_io+0x28c/0x390
  [49887.347089]  btree_write_cache_pages+0x18e/0x340
  [49887.347091]  do_writepages+0x29/0xb0
  [49887.347093]  ? kmem_cache_free+0x132/0x160
  [49887.347095]  ? convert_extent_bit+0x544/0x680
  [49887.347097]  filemap_fdatawrite_range+0x70/0x90
  [49887.347099]  btrfs_write_marked_extents+0x53/0x120
  [49887.347100]  btrfs_write_and_wait_transaction.isra.4+0x38/0xa0
  [49887.347102]  btrfs_commit_transaction+0x6bb/0x990
  [49887.347103]  ? start_transaction+0x33e/0x500
  [49887.347105]  transaction_kthread+0x139/0x15c

So fix this by not overwriting the return value (ret) with the result
from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK
bit in case flush_write_bio() returns an error, otherwise it will hang
any future attempts to writeback the extent buffer, and undo all work
done before (set back EXTENT_BUFFER_DIRTY, etc).

This is a regression introduced in the 5.2 kernel.

Fixes: 2e3c251 ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
Fixes: f434062 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
Reported-by: Zdenek Sojka <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%[email protected]/T/#u
Reported-by: Stefan Priebe - Profihost AG <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/[email protected]/T/#t
Reported-by: Drazen Kacar <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Oct 9, 2019
As we may signal a request and take the engine->active.lock within the
signaler, the engine submission paths have to use a nested annotation on
their requests -- but we guarantee that we can never submit on the same
engine as the signaling fence.

<4>[  723.763281] WARNING: possible circular locking dependency detected
<4>[  723.763285] 5.3.0-g80fa0e042cdb-drmtip_379+ rib#1 Tainted: G     U
<4>[  723.763288] ------------------------------------------------------
<4>[  723.763291] gem_exec_await/1388 is trying to acquire lock:
<4>[  723.763294] ffff93a7b53221d8 (&engine->active.lock){..-.}, at: execlists_submit_request+0x2b/0x1e0 [i915]
<4>[  723.763378]
                  but task is already holding lock:
<4>[  723.763381] ffff93a7c25f6d20 (&i915_request_get(rq)->submit/1){-.-.}, at: __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  723.763420]
                  which lock already depends on the new lock.

<4>[  723.763423]
                  the existing dependency chain (in reverse order) is:
<4>[  723.763427]
                  -> rib#2 (&i915_request_get(rq)->submit/1){-.-.}:
<4>[  723.763434]        _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[  723.763478]        __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  723.763513]        intel_engine_breadcrumbs_irq+0x3aa/0x5e0 [i915]
<4>[  723.763600]        cs_irq_handler+0x49/0x50 [i915]
<4>[  723.763659]        gen11_gt_irq_handler+0x17b/0x280 [i915]
<4>[  723.763690]        gen11_irq_handler+0x54/0xf0 [i915]
<4>[  723.763695]        __handle_irq_event_percpu+0x41/0x2d0
<4>[  723.763699]        handle_irq_event_percpu+0x2b/0x70
<4>[  723.763702]        handle_irq_event+0x2f/0x50
<4>[  723.763706]        handle_edge_irq+0xee/0x1a0
<4>[  723.763709]        do_IRQ+0x7e/0x160
<4>[  723.763712]        ret_from_intr+0x0/0x1d
<4>[  723.763717]        __slab_alloc.isra.28.constprop.33+0x4f/0x70
<4>[  723.763720]        kmem_cache_alloc+0x28d/0x2f0
<4>[  723.763724]        vm_area_dup+0x15/0x40
<4>[  723.763727]        dup_mm+0x2dd/0x550
<4>[  723.763730]        copy_process+0xf21/0x1ef0
<4>[  723.763734]        _do_fork+0x71/0x670
<4>[  723.763737]        __se_sys_clone+0x6e/0xa0
<4>[  723.763741]        do_syscall_64+0x4f/0x210
<4>[  723.763744]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  723.763747]
                  -> rib#1 (&(&rq->lock)->rlock#2){-.-.}:
<4>[  723.763752]        _raw_spin_lock+0x2a/0x40
<4>[  723.763789]        __unwind_incomplete_requests+0x3eb/0x450 [i915]
<4>[  723.763825]        __execlists_submission_tasklet+0x9ec/0x1d60 [i915]
<4>[  723.763864]        execlists_submission_tasklet+0x34/0x50 [i915]
<4>[  723.763874]        tasklet_action_common.isra.5+0x47/0xb0
<4>[  723.763878]        __do_softirq+0xd8/0x4ae
<4>[  723.763881]        irq_exit+0xa9/0xc0
<4>[  723.763883]        smp_apic_timer_interrupt+0xb7/0x280
<4>[  723.763887]        apic_timer_interrupt+0xf/0x20
<4>[  723.763892]        cpuidle_enter_state+0xae/0x450
<4>[  723.763895]        cpuidle_enter+0x24/0x40
<4>[  723.763899]        do_idle+0x1e7/0x250
<4>[  723.763902]        cpu_startup_entry+0x14/0x20
<4>[  723.763905]        start_secondary+0x15f/0x1b0
<4>[  723.763908]        secondary_startup_64+0xa4/0xb0
<4>[  723.763911]
                  -> #0 (&engine->active.lock){..-.}:
<4>[  723.763916]        __lock_acquire+0x15d8/0x1ea0
<4>[  723.763919]        lock_acquire+0xa6/0x1c0
<4>[  723.763922]        _raw_spin_lock_irqsave+0x33/0x50
<4>[  723.763956]        execlists_submit_request+0x2b/0x1e0 [i915]
<4>[  723.764002]        submit_notify+0xa8/0x13c [i915]
<4>[  723.764035]        __i915_sw_fence_complete+0x81/0x250 [i915]
<4>[  723.764054]        i915_sw_fence_wake+0x51/0x64 [i915]
<4>[  723.764054]        __i915_sw_fence_complete+0x1ee/0x250 [i915]
<4>[  723.764054]        dma_i915_sw_fence_wake_timer+0x14/0x20 [i915]
<4>[  723.764054]        dma_fence_signal_locked+0x9e/0x1c0
<4>[  723.764054]        dma_fence_signal+0x1f/0x40
<4>[  723.764054]        vgem_fence_signal_ioctl+0x67/0xc0 [vgem]
<4>[  723.764054]        drm_ioctl_kernel+0x83/0xf0
<4>[  723.764054]        drm_ioctl+0x2f3/0x3b0
<4>[  723.764054]        do_vfs_ioctl+0xa0/0x6f0
<4>[  723.764054]        ksys_ioctl+0x35/0x60
<4>[  723.764054]        __x64_sys_ioctl+0x11/0x20
<4>[  723.764054]        do_syscall_64+0x4f/0x210
<4>[  723.764054]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  723.764054]
                  other info that might help us debug this:

<4>[  723.764054] Chain exists of:
                    &engine->active.lock --> &(&rq->lock)->rlock#2 --> &i915_request_get(rq)->submit/1

<4>[  723.764054]  Possible unsafe locking scenario:

<4>[  723.764054]        CPU0                    CPU1
<4>[  723.764054]        ----                    ----
<4>[  723.764054]   lock(&i915_request_get(rq)->submit/1);
<4>[  723.764054]                                lock(&(&rq->lock)->rlock#2);
<4>[  723.764054]                                lock(&i915_request_get(rq)->submit/1);
<4>[  723.764054]   lock(&engine->active.lock);
<4>[  723.764054]
                   *** DEADLOCK ***

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111862
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Oct 17, 2019
This patch fixes the lock inversion complaint:

============================================
WARNING: possible recursive locking detected
5.3.0-rc7-dbg+ rib#1 Not tainted
--------------------------------------------
kworker/u16:6/171 is trying to acquire lock:
00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm]

but task is already holding lock:
00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&id_priv->handler_mutex);
  lock(&id_priv->handler_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/u16:6/171:
 #0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0
 rib#1: 000000001efd357b ((work_completion)(&work->work)rib#3){+.+.}, at: process_one_work+0x476/0xac0
 rib#2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

stack backtrace:
CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ rib#1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Call Trace:
 dump_stack+0x8a/0xd6
 __lock_acquire.cold+0xe1/0x24d
 lock_acquire+0x106/0x240
 __mutex_lock+0x12e/0xcb0
 mutex_lock_nested+0x1f/0x30
 rdma_destroy_id+0x78/0x4a0 [rdma_cm]
 iw_conn_req_handler+0x5c9/0x680 [rdma_cm]
 cm_work_handler+0xe62/0x1100 [iw_cm]
 process_one_work+0x56d/0xac0
 worker_thread+0x7a/0x5d0
 kthread+0x1bc/0x210
 ret_from_fork+0x24/0x30

This is not a bug as there are actually two lock classes here.

Link: https://lore.kernel.org/r/[email protected]
Fixes: de910bd ("RDMA/cma: Simplify locking needed for serialization of callbacks")
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Oct 17, 2019
A user reported a lockdep splat

 ======================================================
 WARNING: possible circular locking dependency detected
 5.2.11-gentoo rib#2 Not tainted
 ------------------------------------------------------
 kswapd0/711 is trying to acquire lock:
 000000007777a663 (sb_internal){.+.+}, at: start_transaction+0x3a8/0x500

but task is already holding lock:
 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> rib#1 (fs_reclaim){+.+.}:
 kmem_cache_alloc+0x1f/0x1c0
 btrfs_alloc_inode+0x1f/0x260
 alloc_inode+0x16/0xa0
 new_inode+0xe/0xb0
 btrfs_new_inode+0x70/0x610
 btrfs_symlink+0xd0/0x420
 vfs_symlink+0x9c/0x100
 do_symlinkat+0x66/0xe0
 do_syscall_64+0x55/0x1c0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (sb_internal){.+.+}:
 __sb_start_write+0xf6/0x150
 start_transaction+0x3a8/0x500
 btrfs_commit_inode_delayed_inode+0x59/0x110
 btrfs_evict_inode+0x19e/0x4c0
 evict+0xbc/0x1f0
 inode_lru_isolate+0x113/0x190
 __list_lru_walk_one.isra.4+0x5c/0x100
 list_lru_walk_one+0x32/0x50
 prune_icache_sb+0x36/0x80
 super_cache_scan+0x14a/0x1d0
 do_shrink_slab+0x131/0x320
 shrink_node+0xf7/0x380
 balance_pgdat+0x2d5/0x640
 kswapd+0x2ba/0x5e0
 kthread+0x147/0x160
 ret_from_fork+0x24/0x30

other info that might help us debug this:

 Possible unsafe locking scenario:

 CPU0 CPU1
 ---- ----
 lock(fs_reclaim);
 lock(sb_internal);
 lock(fs_reclaim);
 lock(sb_internal);
*** DEADLOCK ***

 3 locks held by kswapd0/711:
 #0: 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30
 rib#1: 000000004a5100f8 (shrinker_rwsem){++++}, at: shrink_node+0x9a/0x380
 rib#2: 00000000f956fa46 (&type->s_umount_key#30){++++}, at: super_cache_scan+0x35/0x1d0

stack backtrace:
 CPU: 7 PID: 711 Comm: kswapd0 Not tainted 5.2.11-gentoo rib#2
 Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.4.2 09/29/2017
 Call Trace:
 dump_stack+0x85/0xc7
 print_circular_bug.cold.40+0x1d9/0x235
 __lock_acquire+0x18b1/0x1f00
 lock_acquire+0xa6/0x170
 ? start_transaction+0x3a8/0x500
 __sb_start_write+0xf6/0x150
 ? start_transaction+0x3a8/0x500
 start_transaction+0x3a8/0x500
 btrfs_commit_inode_delayed_inode+0x59/0x110
 btrfs_evict_inode+0x19e/0x4c0
 ? var_wake_function+0x20/0x20
 evict+0xbc/0x1f0
 inode_lru_isolate+0x113/0x190
 ? discard_new_inode+0xc0/0xc0
 __list_lru_walk_one.isra.4+0x5c/0x100
 ? discard_new_inode+0xc0/0xc0
 list_lru_walk_one+0x32/0x50
 prune_icache_sb+0x36/0x80
 super_cache_scan+0x14a/0x1d0
 do_shrink_slab+0x131/0x320
 shrink_node+0xf7/0x380
 balance_pgdat+0x2d5/0x640
 kswapd+0x2ba/0x5e0
 ? __wake_up_common_lock+0x90/0x90
 kthread+0x147/0x160
 ? balance_pgdat+0x640/0x640
 ? __kthread_create_on_node+0x160/0x160
 ret_from_fork+0x24/0x30

This is because btrfs_new_inode() calls new_inode() under the
transaction.  We could probably move the new_inode() outside of this but
for now just wrap it in memalloc_nofs_save().

Reported-by: Zdenek Sojka <[email protected]>
Fixes: 712e36c ("btrfs: use GFP_KERNEL in btrfs_alloc_inode")
CC: [email protected] # 4.16+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Oct 17, 2019
Daniel Vetter uncovered a nasty cycle in using the mmu-notifiers to
invalidate userptr objects which also happen to be pulled into GGTT
mmaps. That is when we unbind the userptr object (on mmu invalidation),
we revoke all CPU mmaps, which may then recurse into mmu invalidation.

We looked for ways of breaking the cycle, but the revocation on
invalidation is required and cannot be avoided. The only solution we
could see was to not allow such GGTT bindings of userptr objects in the
first place. In practice, no one really wants to use a GGTT mmapping of
a CPU pointer...

Just before Daniel's explosive lockdep patches land in v5.4-rc1, we got
a genuine blip from CI:

<4>[  246.793958] ======================================================
<4>[  246.793972] WARNING: possible circular locking dependency detected
<4>[  246.793989] 5.3.0-gbd6c56f50d15-drmtip_372+ rib#1 Tainted: G     U
<4>[  246.794003] ------------------------------------------------------
<4>[  246.794017] kswapd0/145 is trying to acquire lock:
<4>[  246.794030] 000000003f565be6 (&dev->struct_mutex/1){+.+.}, at: userptr_mn_invalidate_range_start+0x18f/0x220 [i915]
<4>[  246.794250]
                  but task is already holding lock:
<4>[  246.794263] 000000001799cef9 (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe6/0x2a0
<4>[  246.794291]
                  which lock already depends on the new lock.

<4>[  246.794307]
                  the existing dependency chain (in reverse order) is:
<4>[  246.794322]
                  -> rib#3 (&anon_vma->rwsem){++++}:
<4>[  246.794344]        down_write+0x33/0x70
<4>[  246.794357]        __vma_adjust+0x3d9/0x7b0
<4>[  246.794370]        __split_vma+0x16a/0x180
<4>[  246.794385]        mprotect_fixup+0x2a5/0x320
<4>[  246.794399]        do_mprotect_pkey+0x208/0x2e0
<4>[  246.794413]        __x64_sys_mprotect+0x16/0x20
<4>[  246.794429]        do_syscall_64+0x55/0x1c0
<4>[  246.794443]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  246.794456]
                  -> rib#2 (&mapping->i_mmap_rwsem){++++}:
<4>[  246.794478]        down_write+0x33/0x70
<4>[  246.794493]        unmap_mapping_pages+0x48/0x130
<4>[  246.794519]        i915_vma_revoke_mmap+0x81/0x1b0 [i915]
<4>[  246.794519]        i915_vma_unbind+0x11d/0x4a0 [i915]
<4>[  246.794519]        i915_vma_destroy+0x31/0x300 [i915]
<4>[  246.794519]        __i915_gem_free_objects+0xb8/0x4b0 [i915]
<4>[  246.794519]        drm_file_free.part.0+0x1e6/0x290
<4>[  246.794519]        drm_release+0xa6/0xe0
<4>[  246.794519]        __fput+0xc2/0x250
<4>[  246.794519]        task_work_run+0x82/0xb0
<4>[  246.794519]        do_exit+0x35b/0xdb0
<4>[  246.794519]        do_group_exit+0x34/0xb0
<4>[  246.794519]        __x64_sys_exit_group+0xf/0x10
<4>[  246.794519]        do_syscall_64+0x55/0x1c0
<4>[  246.794519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  246.794519]
                  -> rib#1 (&vm->mutex){+.+.}:
<4>[  246.794519]        i915_gem_shrinker_taints_mutex+0x6d/0xe0 [i915]
<4>[  246.794519]        i915_address_space_init+0x9f/0x160 [i915]
<4>[  246.794519]        i915_ggtt_init_hw+0x55/0x170 [i915]
<4>[  246.794519]        i915_driver_probe+0xc9f/0x1620 [i915]
<4>[  246.794519]        i915_pci_probe+0x43/0x1b0 [i915]
<4>[  246.794519]        pci_device_probe+0x9e/0x120
<4>[  246.794519]        really_probe+0xea/0x3d0
<4>[  246.794519]        driver_probe_device+0x10b/0x120
<4>[  246.794519]        device_driver_attach+0x4a/0x50
<4>[  246.794519]        __driver_attach+0x97/0x130
<4>[  246.794519]        bus_for_each_dev+0x74/0xc0
<4>[  246.794519]        bus_add_driver+0x13f/0x210
<4>[  246.794519]        driver_register+0x56/0xe0
<4>[  246.794519]        do_one_initcall+0x58/0x300
<4>[  246.794519]        do_init_module+0x56/0x1f6
<4>[  246.794519]        load_module+0x25bd/0x2a40
<4>[  246.794519]        __se_sys_finit_module+0xd3/0xf0
<4>[  246.794519]        do_syscall_64+0x55/0x1c0
<4>[  246.794519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  246.794519]
                  -> #0 (&dev->struct_mutex/1){+.+.}:
<4>[  246.794519]        __lock_acquire+0x15d8/0x1e90
<4>[  246.794519]        lock_acquire+0xa6/0x1c0
<4>[  246.794519]        __mutex_lock+0x9d/0x9b0
<4>[  246.794519]        userptr_mn_invalidate_range_start+0x18f/0x220 [i915]
<4>[  246.794519]        __mmu_notifier_invalidate_range_start+0x85/0x110
<4>[  246.794519]        try_to_unmap_one+0x76b/0x860
<4>[  246.794519]        rmap_walk_anon+0x104/0x280
<4>[  246.794519]        try_to_unmap+0xc0/0xf0
<4>[  246.794519]        shrink_page_list+0x561/0xc10
<4>[  246.794519]        shrink_inactive_list+0x220/0x440
<4>[  246.794519]        shrink_node_memcg+0x36e/0x740
<4>[  246.794519]        shrink_node+0xcb/0x490
<4>[  246.794519]        balance_pgdat+0x241/0x580
<4>[  246.794519]        kswapd+0x16c/0x530
<4>[  246.794519]        kthread+0x119/0x130
<4>[  246.794519]        ret_from_fork+0x24/0x50
<4>[  246.794519]
                  other info that might help us debug this:

<4>[  246.794519] Chain exists of:
                    &dev->struct_mutex/1 --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem

<4>[  246.794519]  Possible unsafe locking scenario:

<4>[  246.794519]        CPU0                    CPU1
<4>[  246.794519]        ----                    ----
<4>[  246.794519]   lock(&anon_vma->rwsem);
<4>[  246.794519]                                lock(&mapping->i_mmap_rwsem);
<4>[  246.794519]                                lock(&anon_vma->rwsem);
<4>[  246.794519]   lock(&dev->struct_mutex/1);
<4>[  246.794519]
                   *** DEADLOCK ***

v2: Say no to mmap_ioctl

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111744
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111870
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit a431174)
Signed-off-by: Rodrigo Vivi <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Dec 12, 2019
As virtual_context_destroy() may be called from a request signal, it may
be called from inside an irq-off section, and so we need to do a full
save/restore of the irq state rather than blindly re-enable irqs upon
unlocking.

<4> [110.024262] WARNING: inconsistent lock state
<4> [110.024277] 5.4.0-rc8-CI-CI_DRM_7489+ rib#1 Tainted: G     U
<4> [110.024292] --------------------------------
<4> [110.024305] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
<4> [110.024323] kworker/0:0/5 [HC0[0]:SC0[0]:HE1:SE1] takes:
<4> [110.024338] ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915]
<4> [110.024592] {IN-HARDIRQ-W} state was registered at:
<4> [110.024612]   lock_acquire+0xa7/0x1c0
<4> [110.024627]   _raw_spin_lock_irqsave+0x33/0x50
<4> [110.024788]   intel_engine_breadcrumbs_irq+0x38c/0x600 [i915]
<4> [110.024808]   irq_work_run_list+0x49/0x70
<4> [110.024824]   irq_work_run+0x26/0x50
<4> [110.024839]   smp_irq_work_interrupt+0x44/0x1e0
<4> [110.024855]   irq_work_interrupt+0xf/0x20
<4> [110.024871]   __do_softirq+0xb7/0x47f
<4> [110.024885]   irq_exit+0xba/0xc0
<4> [110.024898]   do_IRQ+0x83/0x160
<4> [110.024910]   ret_from_intr+0x0/0x1d
<4> [110.024922] irq event stamp: 172864
<4> [110.024938] hardirqs last  enabled at (172863): [<ffffffff819ea214>] _raw_spin_unlock_irq+0x24/0x50
<4> [110.024963] hardirqs last disabled at (172864): [<ffffffff819e9fba>] _raw_spin_lock_irq+0xa/0x40
<4> [110.024988] softirqs last  enabled at (172812): [<ffffffff81c00385>] __do_softirq+0x385/0x47f
<4> [110.025012] softirqs last disabled at (172797): [<ffffffff810b829a>] irq_exit+0xba/0xc0
<4> [110.025031]
other info that might help us debug this:
<4> [110.025049]  Possible unsafe locking scenario:

<4> [110.025065]        CPU0
<4> [110.025075]        ----
<4> [110.025084]   lock(&(&rq->lock)->rlock);
<4> [110.025099]   <Interrupt>
<4> [110.025109]     lock(&(&rq->lock)->rlock);
<4> [110.025124]
 *** DEADLOCK ***

<4> [110.025144] 4 locks held by kworker/0:0/5:
<4> [110.025156]  #0: ffff88827588f528 ((wq_completion)events){+.+.}, at: process_one_work+0x1de/0x620
<4> [110.025187]  rib#1: ffffc9000006fe78 ((work_completion)(&engine->retire_work)){+.+.}, at: process_one_work+0x1de/0x620
<4> [110.025219]  rib#2: ffff88825605e270 (&kernel#2){+.+.}, at: engine_retire+0x57/0xe0 [i915]
<4> [110.025405]  rib#3: ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915]
<4> [110.025634]
stack backtrace:
<4> [110.025653] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G     U            5.4.0-rc8-CI-CI_DRM_7489+ rib#1
<4> [110.025675] Hardware name:  /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4> [110.025856] Workqueue: events engine_retire [i915]
<4> [110.025872] Call Trace:
<4> [110.025891]  dump_stack+0x71/0x9b
<4> [110.025907]  mark_lock+0x49a/0x500
<4> [110.025926]  ? print_shortest_lock_dependencies+0x200/0x200
<4> [110.025946]  mark_held_locks+0x49/0x70
<4> [110.025962]  ? _raw_spin_unlock_irq+0x24/0x50
<4> [110.025978]  lockdep_hardirqs_on+0xa2/0x1c0
<4> [110.025995]  _raw_spin_unlock_irq+0x24/0x50
<4> [110.026171]  virtual_context_destroy+0xc5/0x2e0 [i915]
<4> [110.026376]  __active_retire+0xb4/0x290 [i915]
<4> [110.026396]  dma_fence_signal_locked+0x9e/0x1b0
<4> [110.026613]  i915_request_retire+0x451/0x930 [i915]
<4> [110.026766]  retire_requests+0x4d/0x60 [i915]
<4> [110.026919]  engine_retire+0x63/0xe0 [i915]

Fixes: b1e3177 ("drm/i915: Coordinate i915_active with its own mutex")
Fixes: 6d06779 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 6f7ac82)
Signed-off-by: Joonas Lahtinen <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Mar 17, 2020
Luo bin says:

====================
hinic: BugFixes

the bug fixed in patch rib#2 has been present since the first commit.
the bugs fixed in patch rib#1 and patch rib#3 have been present since the
following commits:
patch rib#1: 352f58b ("net-next/hinic: Set Rxq irq to specific cpu for NUMA")
patch rib#3: 421e952 ("hinic: add rss support")
====================

Signed-off-by: David S. Miller <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Mar 17, 2020
rmnet_get_port() internally calls rcu_dereference_rtnl(),
which checks RTNL.
But rmnet_get_port() could be called by packet path.
The packet path is not protected by RTNL.
So, the suspicious RCU usage problem occurs.

Test commands:
    modprobe rmnet
    ip netns add nst
    ip link add veth0 type veth peer name veth1
    ip link set veth1 netns nst
    ip link add rmnet0 link veth0 type rmnet mux_id 1
    ip netns exec nst ip link add rmnet1 link veth1 type rmnet mux_id 1
    ip netns exec nst ip link set veth1 up
    ip netns exec nst ip link set rmnet1 up
    ip netns exec nst ip a a 192.168.100.2/24 dev rmnet1
    ip link set veth0 up
    ip link set rmnet0 up
    ip a a 192.168.100.1/24 dev rmnet0
    ping 192.168.100.2

Splat looks like:
[  146.630958][ T1174] WARNING: suspicious RCU usage
[  146.631735][ T1174] 5.6.0-rc1+ torvalds#447 Not tainted
[  146.632387][ T1174] -----------------------------
[  146.633151][ T1174] drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c:386 suspicious rcu_dereference_check() !
[  146.634742][ T1174]
[  146.634742][ T1174] other info that might help us debug this:
[  146.634742][ T1174]
[  146.645992][ T1174]
[  146.645992][ T1174] rcu_scheduler_active = 2, debug_locks = 1
[  146.646937][ T1174] 5 locks held by ping/1174:
[  146.647609][ T1174]  #0: ffff8880c31dea70 (sk_lock-AF_INET){+.+.}, at: raw_sendmsg+0xab8/0x2980
[  146.662463][ T1174]  rib#1: ffffffff93925660 (rcu_read_lock_bh){....}, at: ip_finish_output2+0x243/0x2150
[  146.671696][ T1174]  rib#2: ffffffff93925660 (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x213/0x2940
[  146.673064][ T1174]  rib#3: ffff8880c19ecd58 (&dev->qdisc_running_key#7){+...}, at: ip_finish_output2+0x714/0x2150
[  146.690358][ T1174]  rib#4: ffff8880c5796898 (&dev->qdisc_xmit_lock_key#3){+.-.}, at: sch_direct_xmit+0x1e2/0x1020
[  146.699875][ T1174]
[  146.699875][ T1174] stack backtrace:
[  146.701091][ T1174] CPU: 0 PID: 1174 Comm: ping Not tainted 5.6.0-rc1+ torvalds#447
[  146.705215][ T1174] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  146.706565][ T1174] Call Trace:
[  146.707102][ T1174]  dump_stack+0x96/0xdb
[  146.708007][ T1174]  rmnet_get_port.part.9+0x76/0x80 [rmnet]
[  146.709233][ T1174]  rmnet_egress_handler+0x107/0x420 [rmnet]
[  146.710492][ T1174]  ? sch_direct_xmit+0x1e2/0x1020
[  146.716193][ T1174]  rmnet_vnd_start_xmit+0x3d/0xa0 [rmnet]
[  146.717012][ T1174]  dev_hard_start_xmit+0x160/0x740
[  146.717854][ T1174]  sch_direct_xmit+0x265/0x1020
[  146.718577][ T1174]  ? register_lock_class+0x14d0/0x14d0
[  146.719429][ T1174]  ? dev_watchdog+0xac0/0xac0
[  146.723738][ T1174]  ? __dev_queue_xmit+0x15fd/0x2940
[  146.724469][ T1174]  ? lock_acquire+0x164/0x3b0
[  146.725172][ T1174]  __dev_queue_xmit+0x20c7/0x2940
[ ... ]

Fixes: ceed73a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Mar 17, 2020
sel_lock cannot nest in the console lock. Thanks to syzkaller, the
kernel states firmly:

> WARNING: possible circular locking dependency detected
> 5.6.0-rc3-syzkaller #0 Not tainted
> ------------------------------------------------------
> syz-executor.4/20336 is trying to acquire lock:
> ffff8880a2e952a0 (&tty->termios_rwsem){++++}, at: tty_unthrottle+0x22/0x100 drivers/tty/tty_ioctl.c:136
>
> but task is already holding lock:
> ffffffff89462e70 (sel_lock){+.+.}, at: paste_selection+0x118/0x470 drivers/tty/vt/selection.c:374
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> rib#2 (sel_lock){+.+.}:
>        mutex_lock_nested+0x1b/0x30 kernel/locking/mutex.c:1118
>        set_selection_kernel+0x3b8/0x18a0 drivers/tty/vt/selection.c:217
>        set_selection_user+0x63/0x80 drivers/tty/vt/selection.c:181
>        tioclinux+0x103/0x530 drivers/tty/vt/vt.c:3050
>        vt_ioctl+0x3f1/0x3a30 drivers/tty/vt/vt_ioctl.c:364

This is ioctl(TIOCL_SETSEL).
Locks held on the path: console_lock -> sel_lock

> -> rib#1 (console_lock){+.+.}:
>        console_lock+0x46/0x70 kernel/printk/printk.c:2289
>        con_flush_chars+0x50/0x650 drivers/tty/vt/vt.c:3223
>        n_tty_write+0xeae/0x1200 drivers/tty/n_tty.c:2350
>        do_tty_write drivers/tty/tty_io.c:962 [inline]
>        tty_write+0x5a1/0x950 drivers/tty/tty_io.c:1046

This is write().
Locks held on the path: termios_rwsem -> console_lock

> -> #0 (&tty->termios_rwsem){++++}:
>        down_write+0x57/0x140 kernel/locking/rwsem.c:1534
>        tty_unthrottle+0x22/0x100 drivers/tty/tty_ioctl.c:136
>        mkiss_receive_buf+0x12aa/0x1340 drivers/net/hamradio/mkiss.c:902
>        tty_ldisc_receive_buf+0x12f/0x170 drivers/tty/tty_buffer.c:465
>        paste_selection+0x346/0x470 drivers/tty/vt/selection.c:389
>        tioclinux+0x121/0x530 drivers/tty/vt/vt.c:3055
>        vt_ioctl+0x3f1/0x3a30 drivers/tty/vt/vt_ioctl.c:364

This is ioctl(TIOCL_PASTESEL).
Locks held on the path: sel_lock -> termios_rwsem

> other info that might help us debug this:
>
> Chain exists of:
>   &tty->termios_rwsem --> console_lock --> sel_lock

Clearly. From the above, we have:
 console_lock -> sel_lock
 sel_lock -> termios_rwsem
 termios_rwsem -> console_lock

Fix this by reversing the console_lock -> sel_lock dependency in
ioctl(TIOCL_SETSEL). First, lock sel_lock, then console_lock.

Signed-off-by: Jiri Slaby <[email protected]>
Reported-by: [email protected]
Fixes: 07e6124 ("vt: selection, close sel_buffer race")
Cc: stable <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Mar 17, 2020
journal_head::b_transaction and journal_head::b_next_transaction could
be accessed concurrently as noticed by KCSAN,

 LTP: starting fsync04
 /dev/zero: Can't open blockdev
 EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
 EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
 ==================================================================
 BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]

 write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
  __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
  __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
  jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
  (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
  kjournald2+0x13b/0x450 [jbd2]
  kthread+0x1cd/0x1f0
  ret_from_fork+0x27/0x50

 read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
  jbd2_write_access_granted+0x1b2/0x250 [jbd2]
  jbd2_write_access_granted at fs/jbd2/transaction.c:1155
  jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
  __ext4_journal_get_write_access+0x50/0x90 [ext4]
  ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
  ext4_mb_new_blocks+0x54f/0xca0 [ext4]
  ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
  ext4_map_blocks+0x3b4/0x950 [ext4]
  _ext4_get_block+0xfc/0x270 [ext4]
  ext4_get_block+0x3b/0x50 [ext4]
  __block_write_begin_int+0x22e/0xae0
  __block_write_begin+0x39/0x50
  ext4_write_begin+0x388/0xb50 [ext4]
  generic_perform_write+0x15d/0x290
  ext4_buffered_write_iter+0x11f/0x210 [ext4]
  ext4_file_write_iter+0xce/0x9e0 [ext4]
  new_sync_write+0x29c/0x3b0
  __vfs_write+0x92/0xa0
  vfs_write+0x103/0x260
  ksys_write+0x9d/0x130
  __x64_sys_write+0x4c/0x60
  do_syscall_64+0x91/0xb05
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 5 locks held by fsync04/25724:
  #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
  rib#1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
  rib#2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
  rib#3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
  rib#4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
 irq event stamp: 1407125
 hardirqs last  enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790
 hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790
 softirqs last  enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c
 softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ rib#7
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

The plain reads are outside of jh->b_state_lock critical section which result
in data races. Fix them by adding pairs of READ|WRITE_ONCE().

Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Qian Cai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Mar 17, 2020
Fix NULL pointer dereference in the error flow of ib_create_qp_user
when accessing to uninitialized list pointers - rdma_mrs and sig_mrs.
The following crash from syzkaller revealed it.

  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [rib#1] SMP KASAN PTI
  CPU: 1 PID: 23167 Comm: syz-executor.1 Not tainted 5.5.0-rc5 rib#2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:ib_mr_pool_destroy+0x81/0x1f0
  Code: 00 00 fc ff df 49 c1 ec 03 4d 01 fc e8 a8 ea 72 fe 41 80 3c 24 00
  0f 85 62 01 00 00 48 8b 13 48 89 d6 4c 8d 6a c8 48 c1 ee 03 <42> 80 3c
  3e 00 0f 85 34 01 00 00 48 8d 7a 08 4c 8b 02 48 89 fe 48
  RSP: 0018:ffffc9000951f8b0 EFLAGS: 00010046
  RAX: 0000000000040000 RBX: ffff88810f268038 RCX: ffffffff82c41628
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc9000951f850
  RBP: ffff88810f268020 R08: 0000000000000004 R09: fffff520012a3f0a
  R10: 0000000000000001 R11: fffff520012a3f0a R12: ffffed1021e4d007
  R13: ffffffffffffffc8 R14: 0000000000000246 R15: dffffc0000000000
  FS:  00007f54bc788700(0000) GS:ffff88811b100000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000116920002 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   rdma_rw_cleanup_mrs+0x15/0x30
   ib_destroy_qp_user+0x674/0x7d0
   ib_create_qp_user+0xb01/0x11c0
   create_qp+0x1517/0x2130
   ib_uverbs_create_qp+0x13e/0x190
   ib_uverbs_write+0xaa5/0xdf0
   __vfs_write+0x7c/0x100
   vfs_write+0x168/0x4a0
   ksys_write+0xc8/0x200
   do_syscall_64+0x9c/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x465b49
  Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89
  f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
  f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007f54bc787c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49
  RDX: 0000000000000040 RSI: 0000000020000540 RDI: 0000000000000003
  RBP: 00007f54bc787c70 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f54bc7886bc
  R13: 00000000004ca2ec R14: 000000000070ded0 R15: 0000000000000005

Fixes: a060b56 ("IB/core: generic RDMA READ/WRITE API")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maor Gottlieb <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Mar 17, 2020
<4>[  259.166718] WARNING: inconsistent lock state
<4>[  259.166725] 5.6.0-rc3-CI-CI_DRM_8038+ rib#1 Tainted: G     U
<4>[  259.166727] --------------------------------
<4>[  259.166731] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
<4>[  259.166741] rtcwake/4221 [HC0[0]:SC0[0]:HE1:SE1] takes:
<4>[  259.166745] ffffffff82635198 (rtc_lock){?...}, at: cmos_interrupt+0x18/0x100
<4>[  259.166768] {IN-HARDIRQ-W} state was registered at:
<4>[  259.166780]   lock_acquire+0xa7/0x1c0
<4>[  259.166790]   _raw_spin_lock+0x2a/0x40
<4>[  259.166799]   cmos_interrupt+0x18/0x100
<4>[  259.166808]   rtc_handler+0x75/0xc0
<4>[  259.166822]   acpi_ev_fixed_event_detect+0xf9/0x132
<4>[  259.166829]   acpi_ev_sci_xrupt_handler+0xb/0x28
<4>[  259.166838]   acpi_irq+0x13/0x30
<4>[  259.166849]   __handle_irq_event_percpu+0x41/0x2c0
<4>[  259.166859]   handle_irq_event_percpu+0x2b/0x70
<4>[  259.166868]   handle_irq_event+0x2f/0x50
<4>[  259.166875]   handle_fasteoi_irq+0x8e/0x150
<4>[  259.166883]   do_IRQ+0x7e/0x160
<4>[  259.166891]   ret_from_intr+0x0/0x35
<4>[  259.166898]   mwait_idle+0x7e/0x200
<4>[  259.166905]   do_idle+0x1bb/0x260
<4>[  259.166912]   cpu_startup_entry+0x14/0x20
<4>[  259.166921]   start_secondary+0x15f/0x1b0
<4>[  259.166929]   secondary_startup_64+0xa4/0xb0
<4>[  259.167264] irq event stamp: 41593
<4>[  259.167275] hardirqs last  enabled at (41593): [<ffffffff81a394e7>] _raw_spin_unlock_irqrestore+0x47/0x60
<4>[  259.167285] hardirqs last disabled at (41592): [<ffffffff81a3926d>] _raw_spin_lock_irqsave+0xd/0x50
<4>[  259.167296] softirqs last  enabled at (41568): [<ffffffff81e00385>] __do_softirq+0x385/0x47f
<4>[  259.167306] softirqs last disabled at (41561): [<ffffffff810babaa>] irq_exit+0xba/0xc0
<4>[  259.167309]
                  other info that might help us debug this:
<4>[  259.167312]  Possible unsafe locking scenario:

<4>[  259.167314]        CPU0
<4>[  259.167316]        ----
<4>[  259.167319]   lock(rtc_lock);
<4>[  259.167324]   <Interrupt>
<4>[  259.167326]     lock(rtc_lock);
<4>[  259.167332]
                   *** DEADLOCK ***

<4>[  259.167337] 6 locks held by rtcwake/4221:
<4>[  259.167665]  #0: ffff888175e89408 (sb_writers#5){.+.+}, at: vfs_write+0x1a4/0x1d0
<4>[  259.167687]  rib#1: ffff88816e112080 (&of->mutex){+.+.}, at: kernfs_fop_write+0xdd/0x1b0
<4>[  259.167706]  rib#2: ffff888179be85e0 (kn->count#236){.+.+}, at: kernfs_fop_write+0xe6/0x1b0
<4>[  259.167728]  rib#3: ffffffff82641e00 (system_transition_mutex){+.+.}, at: pm_suspend+0xb3/0x3b0
<4>[  259.167748]  rib#4: ffffffff826b3ea0 (acpi_scan_lock){+.+.}, at: acpi_suspend_begin+0x47/0x80
<4>[  259.167763]  rib#5: ffff888178f6b960 (&dev->mutex){....}, at: device_resume+0x92/0x1c0
<4>[  259.167778]
                  stack backtrace:
<4>[  259.167788] CPU: 1 PID: 4221 Comm: rtcwake Tainted: G     U            5.6.0-rc3-CI-CI_DRM_8038+ rib#1
<4>[  259.168106] Hardware name: Google Soraka/Soraka, BIOS MrChromebox-4.10 08/25/2019
<4>[  259.168110] Call Trace:
<4>[  259.168123]  dump_stack+0x71/0x9b
<4>[  259.168133]  mark_lock+0x49a/0x500
<4>[  259.168457]  ? print_shortest_lock_dependencies+0x200/0x200
<4>[  259.168469]  __lock_acquire+0x6d4/0x15d0
<4>[  259.168479]  ? __lock_acquire+0x460/0x15d0
<4>[  259.168490]  lock_acquire+0xa7/0x1c0
<4>[  259.168500]  ? cmos_interrupt+0x18/0x100
<4>[  259.168824]  _raw_spin_lock+0x2a/0x40
<4>[  259.168834]  ? cmos_interrupt+0x18/0x100
<4>[  259.168843]  cmos_interrupt+0x18/0x100
<4>[  259.169159]  cmos_resume+0x1fd/0x290
<4>[  259.169174]  ? __acpi_pm_set_device_wakeup+0x24/0x100
<4>[  259.169498]  pnp_bus_resume+0x5e/0x90
<4>[  259.169509]  ? pnp_bus_suspend+0x10/0x10
<4>[  259.169518]  dpm_run_callback+0x64/0x280
<4>[  259.169530]  device_resume+0xd4/0x1c0
<4>[  259.169540]  ? dpm_watchdog_set+0x60/0x60
<4>[  259.169860]  dpm_resume+0x106/0x410
<4>[  259.169870]  ? dpm_resume_early+0x38c/0x3e0
<4>[  259.169881]  dpm_resume_end+0x8/0x10
<4>[  259.170195]  suspend_devices_and_enter+0x16f/0xbe0
<4>[  259.170211]  ? rcu_read_lock_sched_held+0x4d/0x80
<4>[  259.170528]  pm_suspend+0x344/0x3b0
<4>[  259.170542]  state_store+0x78/0xe0
<4>[  259.170559]  kernfs_fop_write+0x112/0x1b0
<4>[  259.170579]  vfs_write+0xb9/0x1d0
<4>[  259.170896]  ksys_write+0x9f/0xe0
<4>[  259.170907]  do_syscall_64+0x4f/0x220
<4>[  259.170918]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  259.171229] RIP: 0033:0x7f9b4f3cb154
<4>[  259.171240] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 b1 07 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5
<4>[  259.171245] RSP: 002b:00007ffc057ce438 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4>[  259.171253] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f9b4f3cb154
<4>[  259.171257] RDX: 0000000000000004 RSI: 000055f4b3d185a0 RDI: 000000000000000a
<4>[  259.171572] RBP: 000055f4b3d185a0 R08: 000055f4b3d165e0 R09: 00007f9b4fab7740
<4>[  259.171576] R10: 000055f4b3d14010 R11: 0000000000000246 R12: 000055f4b3d16500
<4>[  259.171580] R13: 0000000000000004 R14: 00007f9b4f6a32a0 R15: 00007f9b4f6a2760

Fixes: c6d3a27 ("rtc: cmos: acknowledge ACPI driven wake alarms upon resume")
Fixes: 311ee9c ("rtc: cmos: allow using ACPI for RTC alarm instead of HPET")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: Alexandre Belloni <[email protected]>
Cc: Alessandro Zummo <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # v4.18+
Signed-off-by: Jani Nikula <[email protected]>
djdeath pushed a commit to djdeath/linux that referenced this pull request Mar 17, 2020
Sigh, this is mostly my fault for not giving commit cd82d82
("drm/dp_mst: Add branch bandwidth validation to MST atomic check")
enough scrutiny during review. The way we're checking bandwidth
limitations here is mostly wrong:

For starters, drm_dp_mst_atomic_check_bw_limit() determines the
pbn_limit of a branch by simply scanning each port on the current branch
device, then uses the last non-zero full_pbn value that it finds. It
then counts the sum of the PBN used on each branch device for that
level, and compares against the full_pbn value it found before.

This is wrong because ports can and will have different PBN limitations
on many hubs, especially since a number of DisplayPort hubs out there
will be clever and only use the smallest link rate required for each
downstream sink - potentially giving every port a different full_pbn
value depending on what link rate it's trained at. This means with our
current code, which max PBN value we end up with is not well defined.

Additionally, we also need to remember when checking bandwidth
limitations that the top-most device in any MST topology is a branch
device, not a port. This means that the first level of a topology
doesn't technically have a full_pbn value that needs to be checked.
Instead, we should assume that so long as our VCPI allocations fit we're
within the bandwidth limitations of the primary MSTB.

We do however, want to check full_pbn on every port including those of
the primary MSTB. However, it's important to keep in mind that this
value represents the minimum link rate /between a port's sink or mstb,
and the mstb itself/. A quick diagram to explain:

                                MSTB rib#1
                               /       \
                              /         \
                           Port rib#1    Port rib#2
       full_pbn for Port rib#1 → |          | ← full_pbn for Port rib#2
                           Sink rib#1    MSTB rib#2
                                         |
                                       etc...

Note that in the above diagram, the combined PBN from all VCPI
allocations on said hub should not exceed the full_pbn value of port rib#2,
and the display configuration on sink rib#1 should not exceed the full_pbn
value of port rib#1. However, port rib#1 and port rib#2 can otherwise consume as
much bandwidth as they want so long as their VCPI allocations still fit.

And finally - our current bandwidth checking code also makes the mistake
of not checking whether something is an end device or not before trying
to traverse down it.

So, let's fix it by rewriting our bandwidth checking helpers. We split
the function into one part for handling branches which simply adds up
the total PBN on each branch and returns it, and one for checking each
port to ensure we're not going over its PBN limit. Phew.

This should fix regressions seen, where we erroneously reject display
configurations due to thinking they're going over our bandwidth limits
when they're not.

Changes since v1:
* Took an even closer look at how PBN limitations are supposed to be
  handled, and did some experimenting with Sean Paul. Ended up rewriting
  these helpers again, but this time they should actually be correct!
Changes since v2:
* Small indenting fix
* Fix pbn_used check in drm_dp_mst_atomic_check_port_bw_limit()

Signed-off-by: Lyude Paul <[email protected]>
Fixes: cd82d82 ("drm/dp_mst: Add branch bandwidth validation to MST atomic check")
Cc: Sean Paul <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Mikita Lipski <[email protected]>
Tested-by: Hans de Goede <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
djdeath pushed a commit to djdeath/linux that referenced this pull request Mar 17, 2020
The vector management code assumes that managed interrupts cannot be
migrated away from an online CPU. free_moved_vector() has a WARN_ON_ONCE()
which triggers when a managed interrupt vector association on a online CPU
is cleared. The CPU offline code uses a different mechanism which cannot
trigger this.

This assumption is not longer correct because the new CPU isolation feature
which affects the placement of managed interrupts must be able to move a
managed interrupt away from an online CPU.

There are two reasons why this can happen:

  1) When the interrupt is activated the affinity mask which was
     established in irq_create_affinity_masks() is handed in to
     the vector allocation code. This mask contains all CPUs to which
     the interrupt can be made affine to, but this does not take the
     CPU isolation 'managed_irq' mask into account.

     When the interrupt is finally requested by the device driver then the
     affinity is checked again and the CPU isolation 'managed_irq' mask is
     taken into account, which moves the interrupt to a non-isolated CPU if
     possible.

  2) The interrupt can be affine to an isolated CPU because the
     non-isolated CPUs in the calculated affinity mask are not online.

     Once a non-isolated CPU which is in the mask comes online the
     interrupt is migrated to this non-isolated CPU

In both cases the regular online migration mechanism is used which triggers
the WARN_ON_ONCE() in free_moved_vector().

Case rib#1 could have been addressed by taking the isolation mask into
account, but that would require a massive code change in the activation
logic and the eventual migration event was accepted as a reasonable
tradeoff when the isolation feature was developed. But even if rib#1 would be
addressed, rib#2 would still trigger it.

Of course the warning in free_moved_vector() was overlooked at that time
and the above two cases which have been discussed during patch review have
obviously never been tested before the final submission.

So keep it simple and remove the warning.

[ tglx: Rewrote changelog and added a comment to free_moved_vector() ]

Fixes: 11ea68f ("genirq, sched/isolation: Isolate from handling managed interrupts")
Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ming Lei <[email protected]>                                                                                                                                                                       
Link: https://lkml.kernel.org/r/[email protected]
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Jun 17, 2021
On its own, this change looks a little strange and doesn't do too much
useful. To understand why we're doing this we need to look forward to
future patches where we're going to probe our panel using the new DP
AUX bus. See the patch ("drm/bridge: ti-sn65dsi86: Add support for the
DP AUX bus").

Let's think about the set of steps we'll want to happen when we have
the DP AUX bus:

1. We'll create the DP AUX bus.
2. We'll populate the devices on the DP AUX bus (AKA our panel).
3. For setting up the bridge-related functions of ti-sn65dsi86 we'll
   need to get a reference to the panel.

If we do rib#1 - rib#3 in a single probe call things _mostly_ will work, but
it won't be massively robust. Let's explore.

First let's think of the easy case of no -EPROBE_DEFER. In that case
in step rib#2 when we populate the devices on the DP AUX bus it will
actually try probing the panel right away. Since the panel probe
doesn't defer then in step rib#3 we'll get a reference to the panel and
we're golden.

Second, let's think of the case when the panel returns
-EPROBE_DEFER. In that case step rib#2 won't synchronously create the
panel (it'll just add the device to the defer list to do it
later). Step rib#3 will fail to get the panel and the bridge sub-device
will return -EPROBE_DEFER. We'll depopulate the DP AUX bus. Later
we'll try the whole sequence again. Presumably the panel will
eventually stop returning -EPROBE_DEFER and we'll go back to the first
case where things were golden. So this case is OK too even if it's a
bit ugly that we have to keep creating / deleting the AUX bus over and
over.

So where is the problem? As I said, it's mostly about robustness. I
don't believe that step rib#2 (creating the sub-devices) is really
guaranteed to be synchronous. This is evidenced by the fact that it's
allowed to "succeed" by just sticking the device on the deferred
list. If anything about the process changes in Linux as a whole and
step rib#2 just kicks off the probe of the DP AUX endpoints (our panel)
in the background then we'd be in trouble because we might never get
the panel in step rib#3.

Adding an extra sub-device means we just don't need to worry about
it. We'll create the sub-device for the DP AUX bus and it won't go
away until the whole ti-sn65dsi86 driver goes away. If the bridge
sub-device defers (maybe because it can't find the panel) that won't
depopulate the DP AUX bus and so we don't need to worry about it.

NOTE: there's a little bit of a trick here. Though the AUX channel can
run without the MIPI-to-eDP bits of the code, the MIPI-to-eDP bits
can't run without the AUX channel. We could come up a complicated
signaling scheme (have the MIPI-to-eDP bits return EPROBE_DEFER for a
while or wait on some sort of completion), but it seems simple enough
to just not even bother creating the bridge device until the AUX
channel probes. That's what we'll do.

Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: Lyude Paul <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20210611101711.v10.7.If89144992cb9d900f8c91a8d1817dbe00f543720@changeid
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Jun 17, 2021
We inadvertently create a dependency on mmap_sem with a whole chain.

This breaks any user who wants to take a lock and call rcu_barrier(),
while also taking that lock inside mmap_sem:

<4> [604.892532] ======================================================
<4> [604.892534] WARNING: possible circular locking dependency detected
<4> [604.892536] 5.6.0-rc7-CI-Patchwork_17096+ rib#1 Tainted: G     U
<4> [604.892537] ------------------------------------------------------
<4> [604.892538] kms_frontbuffer/2595 is trying to acquire lock:
<4> [604.892540] ffffffff8264a558 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
<4> [604.892547]
but task is already holding lock:
<4> [604.892547] ffff888484716050 (reservation_ww_class_mutex){+.+.}, at: i915_gem_object_pin_to_display_plane+0x89/0x270 [i915]
<4> [604.892592]
which lock already depends on the new lock.
<4> [604.892593]
the existing dependency chain (in reverse order) is:
<4> [604.892594]
-> rib#6 (reservation_ww_class_mutex){+.+.}:
<4> [604.892597]        __ww_mutex_lock.constprop.15+0xc3/0x1090
<4> [604.892598]        ww_mutex_lock+0x39/0x70
<4> [604.892600]        dma_resv_lockdep+0x10e/0x1f5
<4> [604.892602]        do_one_initcall+0x58/0x300
<4> [604.892604]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892605]        kernel_init+0x5/0x100
<4> [604.892606]        ret_from_fork+0x24/0x50
<4> [604.892607]
-> rib#5 (reservation_ww_class_acquire){+.+.}:
<4> [604.892609]        dma_resv_lockdep+0xec/0x1f5
<4> [604.892610]        do_one_initcall+0x58/0x300
<4> [604.892610]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892611]        kernel_init+0x5/0x100
<4> [604.892612]        ret_from_fork+0x24/0x50
<4> [604.892613]
-> rib#4 (&mm->mmap_sem#2){++++}:
<4> [604.892615]        __might_fault+0x63/0x90
<4> [604.892617]        _copy_to_user+0x1e/0x80
<4> [604.892619]        perf_read+0x200/0x2b0
<4> [604.892621]        vfs_read+0x96/0x160
<4> [604.892622]        ksys_read+0x9f/0xe0
<4> [604.892623]        do_syscall_64+0x4f/0x220
<4> [604.892624]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892625]
-> rib#3 (&cpuctx_mutex){+.+.}:
<4> [604.892626]        __mutex_lock+0x9a/0x9c0
<4> [604.892627]        perf_event_init_cpu+0xa4/0x140
<4> [604.892629]        perf_event_init+0x19d/0x1cd
<4> [604.892630]        start_kernel+0x362/0x4e4
<4> [604.892631]        secondary_startup_64+0xa4/0xb0
<4> [604.892631]
-> rib#2 (pmus_lock){+.+.}:
<4> [604.892633]        __mutex_lock+0x9a/0x9c0
<4> [604.892633]        perf_event_init_cpu+0x6b/0x140
<4> [604.892635]        cpuhp_invoke_callback+0x9b/0x9d0
<4> [604.892636]        _cpu_up+0xa2/0x140
<4> [604.892637]        do_cpu_up+0x61/0xa0
<4> [604.892639]        smp_init+0x57/0x96
<4> [604.892639]        kernel_init_freeable+0x87/0x1dc
<4> [604.892640]        kernel_init+0x5/0x100
<4> [604.892642]        ret_from_fork+0x24/0x50
<4> [604.892642]
-> rib#1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [604.892643]        cpus_read_lock+0x34/0xd0
<4> [604.892644]        rcu_barrier+0xaa/0x190
<4> [604.892645]        kernel_init+0x21/0x100
<4> [604.892647]        ret_from_fork+0x24/0x50
<4> [604.892647]
-> #0 (rcu_state.barrier_mutex){+.+.}:
<4> [604.892649]        __lock_acquire+0x1328/0x15d0
<4> [604.892650]        lock_acquire+0xa7/0x1c0
<4> [604.892651]        __mutex_lock+0x9a/0x9c0
<4> [604.892652]        rcu_barrier+0x23/0x190
<4> [604.892680]        i915_gem_object_unbind+0x29d/0x3f0 [i915]
<4> [604.892707]        i915_gem_object_pin_to_display_plane+0x141/0x270 [i915]
<4> [604.892737]        intel_pin_and_fence_fb_obj+0xec/0x1f0 [i915]
<4> [604.892767]        intel_plane_pin_fb+0x3f/0xd0 [i915]
<4> [604.892797]        intel_prepare_plane_fb+0x13b/0x5c0 [i915]
<4> [604.892798]        drm_atomic_helper_prepare_planes+0x85/0x110
<4> [604.892827]        intel_atomic_commit+0xda/0x390 [i915]
<4> [604.892828]        drm_atomic_helper_set_config+0x57/0xa0
<4> [604.892830]        drm_mode_setcrtc+0x1c4/0x720
<4> [604.892830]        drm_ioctl_kernel+0xb0/0xf0
<4> [604.892831]        drm_ioctl+0x2e1/0x390
<4> [604.892833]        ksys_ioctl+0x7b/0x90
<4> [604.892835]        __x64_sys_ioctl+0x11/0x20
<4> [604.892835]        do_syscall_64+0x4f/0x220
<4> [604.892836]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892837]

Changes since v1:
- Use (*values)[n++] in perf_read_one().
Changes since v2:
- Centrally allocate values.

Signed-off-by: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request Jun 17, 2021
As previously noted in commit 66e4f4a ("rtc: cmos: Use
spin_lock_irqsave() in cmos_interrupt()"):

<4>[  254.192378] WARNING: inconsistent lock state
<4>[  254.192384] 5.12.0-rc1-CI-CI_DRM_9834+ rib#1 Not tainted
<4>[  254.192396] --------------------------------
<4>[  254.192400] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
<4>[  254.192409] rtcwake/5309 [HC0[0]:SC0[0]:HE1:SE1] takes:
<4>[  254.192429] ffffffff8263c5f8 (rtc_lock){?...}-{2:2}, at: cmos_interrupt+0x18/0x100
<4>[  254.192481] {IN-HARDIRQ-W} state was registered at:
<4>[  254.192488]   lock_acquire+0xd1/0x3d0
<4>[  254.192504]   _raw_spin_lock+0x2a/0x40
<4>[  254.192519]   cmos_interrupt+0x18/0x100
<4>[  254.192536]   rtc_handler+0x1f/0xc0
<4>[  254.192553]   acpi_ev_fixed_event_detect+0x109/0x13c
<4>[  254.192574]   acpi_ev_sci_xrupt_handler+0xb/0x28
<4>[  254.192596]   acpi_irq+0x13/0x30
<4>[  254.192620]   __handle_irq_event_percpu+0x43/0x2c0
<4>[  254.192641]   handle_irq_event_percpu+0x2b/0x70
<4>[  254.192661]   handle_irq_event+0x2f/0x50
<4>[  254.192680]   handle_fasteoi_irq+0x9e/0x150
<4>[  254.192693]   __common_interrupt+0x76/0x140
<4>[  254.192715]   common_interrupt+0x96/0xc0
<4>[  254.192732]   asm_common_interrupt+0x1e/0x40
<4>[  254.192750]   _raw_spin_unlock_irqrestore+0x38/0x60
<4>[  254.192767]   resume_irqs+0xba/0xf0
<4>[  254.192786]   dpm_resume_noirq+0x245/0x3d0
<4>[  254.192811]   suspend_devices_and_enter+0x230/0xaa0
<4>[  254.192835]   pm_suspend.cold.8+0x301/0x34a
<4>[  254.192859]   state_store+0x7b/0xe0
<4>[  254.192879]   kernfs_fop_write_iter+0x11d/0x1c0
<4>[  254.192899]   new_sync_write+0x11d/0x1b0
<4>[  254.192916]   vfs_write+0x265/0x390
<4>[  254.192933]   ksys_write+0x5a/0xd0
<4>[  254.192949]   do_syscall_64+0x33/0x80
<4>[  254.192965]   entry_SYSCALL_64_after_hwframe+0x44/0xae
<4>[  254.192986] irq event stamp: 43775
<4>[  254.192994] hardirqs last  enabled at (43775): [<ffffffff81c00c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20
<4>[  254.193023] hardirqs last disabled at (43774): [<ffffffff81aa691a>] sysvec_apic_timer_interrupt+0xa/0xb0
<4>[  254.193049] softirqs last  enabled at (42548): [<ffffffff81e00342>] __do_softirq+0x342/0x48e
<4>[  254.193074] softirqs last disabled at (42543): [<ffffffff810b45fd>] irq_exit_rcu+0xad/0xd0
<4>[  254.193101]
                  other info that might help us debug this:
<4>[  254.193107]  Possible unsafe locking scenario:

<4>[  254.193112]        CPU0
<4>[  254.193117]        ----
<4>[  254.193121]   lock(rtc_lock);
<4>[  254.193137]   <Interrupt>
<4>[  254.193142]     lock(rtc_lock);
<4>[  254.193156]
                   *** DEADLOCK ***

<4>[  254.193161] 6 locks held by rtcwake/5309:
<4>[  254.193174]  #0: ffff888104861430 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x5a/0xd0
<4>[  254.193232]  rib#1: ffff88810f823288 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xe7/0x1c0
<4>[  254.193282]  rib#2: ffff888100cef3c0 (kn->active#285
<7>[  254.192706] i915 0000:00:02.0: [drm:intel_modeset_setup_hw_state [i915]] [CRTC:51:pipe A] hw state readout: disabled
<4>[  254.193307] ){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xf0/0x1c0
<4>[  254.193333]  rib#3: ffffffff82649fa8 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend.cold.8+0xce/0x34a
<4>[  254.193387]  rib#4: ffffffff827a2108 (acpi_scan_lock){+.+.}-{3:3}, at: acpi_suspend_begin+0x47/0x70
<4>[  254.193433]  rib#5: ffff8881019ea178 (&dev->mutex){....}-{3:3}, at: device_resume+0x68/0x1e0
<4>[  254.193485]
                  stack backtrace:
<4>[  254.193492] CPU: 1 PID: 5309 Comm: rtcwake Not tainted 5.12.0-rc1-CI-CI_DRM_9834+ rib#1
<4>[  254.193514] Hardware name: Google Soraka/Soraka, BIOS MrChromebox-4.10 08/25/2019
<4>[  254.193524] Call Trace:
<4>[  254.193536]  dump_stack+0x7f/0xad
<4>[  254.193567]  mark_lock.part.47+0x8ca/0xce0
<4>[  254.193604]  __lock_acquire+0x39b/0x2590
<4>[  254.193626]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
<4>[  254.193660]  lock_acquire+0xd1/0x3d0
<4>[  254.193677]  ? cmos_interrupt+0x18/0x100
<4>[  254.193716]  _raw_spin_lock+0x2a/0x40
<4>[  254.193735]  ? cmos_interrupt+0x18/0x100
<4>[  254.193758]  cmos_interrupt+0x18/0x100
<4>[  254.193785]  cmos_resume+0x2ac/0x2d0
<4>[  254.193813]  ? acpi_pm_set_device_wakeup+0x1f/0x110
<4>[  254.193842]  ? pnp_bus_suspend+0x10/0x10
<4>[  254.193864]  pnp_bus_resume+0x5e/0x90
<4>[  254.193885]  dpm_run_callback+0x5f/0x240
<4>[  254.193914]  device_resume+0xb2/0x1e0
<4>[  254.193942]  ? pm_dev_err+0x25/0x25
<4>[  254.193974]  dpm_resume+0xea/0x3f0
<4>[  254.194005]  dpm_resume_end+0x8/0x10
<4>[  254.194030]  suspend_devices_and_enter+0x29b/0xaa0
<4>[  254.194066]  pm_suspend.cold.8+0x301/0x34a
<4>[  254.194094]  state_store+0x7b/0xe0
<4>[  254.194124]  kernfs_fop_write_iter+0x11d/0x1c0
<4>[  254.194151]  new_sync_write+0x11d/0x1b0
<4>[  254.194183]  vfs_write+0x265/0x390
<4>[  254.194207]  ksys_write+0x5a/0xd0
<4>[  254.194232]  do_syscall_64+0x33/0x80
<4>[  254.194251]  entry_SYSCALL_64_after_hwframe+0x44/0xae
<4>[  254.194274] RIP: 0033:0x7f07d79691e7
<4>[  254.194293] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
<4>[  254.194312] RSP: 002b:00007ffd9cc2c768 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4>[  254.194337] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f07d79691e7
<4>[  254.194352] RDX: 0000000000000004 RSI: 0000556ebfc63590 RDI: 000000000000000b
<4>[  254.194366] RBP: 0000556ebfc63590 R08: 0000000000000000 R09: 0000000000000004
<4>[  254.194379] R10: 0000556ebf0ec2a6 R11: 0000000000000246 R12: 0000000000000004

which breaks S3-resume on fi-kbl-soraka presumably as that's slow enough
to trigger the alarm during the suspend.

Fixes: 6950d04 ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ")
References: 66e4f4a ("rtc: cmos: Use spin_lock_irqsave() in cmos_interrupt()"):
Signed-off-by: Chris Wilson <[email protected]>
Cc: Xiaofei Tan <[email protected]>
Cc: Alexandre Belloni <[email protected]>
Cc: Alessandro Zummo <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request May 24, 2022
Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>

======================================================
 WARNING: possible circular locking dependency detected
 5.15.35-lockdep rib#6 Tainted: G        W
 ------------------------------------------------------
 frecon/429 is trying to acquire lock:
 ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

 but task is already holding lock:
 ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> rib#3 (&kms->commit_lock[i]){+.+.}-{3:3}:
        __mutex_lock_common+0x174/0x1a64
        mutex_lock_nested+0x98/0xac
        lock_crtcs+0xb4/0x124
        msm_atomic_commit_tail+0x330/0x748
        commit_tail+0x19c/0x278
        drm_atomic_helper_commit+0x1dc/0x1f0
        drm_atomic_commit+0xc0/0xd8
        drm_atomic_helper_set_config+0xb4/0x134
        drm_mode_setcrtc+0x688/0x1248
        drm_ioctl_kernel+0x1e4/0x338
        drm_ioctl+0x3a4/0x684
        __arm64_sys_ioctl+0x118/0x154
        invoke_syscall+0x78/0x224
        el0_svc_common+0x178/0x200
        do_el0_svc+0x94/0x13c
        el0_svc+0x5c/0xec
        el0t_64_sync_handler+0x78/0x108
        el0t_64_sync+0x1a4/0x1a8

 -> rib#2 (crtc_ww_class_mutex){+.+.}-{3:3}:
        __mutex_lock_common+0x174/0x1a64
        ww_mutex_lock+0xb8/0x278
        modeset_lock+0x304/0x4ac
        drm_modeset_lock+0x4c/0x7c
        drmm_mode_config_init+0x4a8/0xc50
        msm_drm_init+0x274/0xac0
        msm_drm_bind+0x20/0x2c
        try_to_bring_up_master+0x3dc/0x470
        __component_add+0x18c/0x3c0
        component_add+0x1c/0x28
        dp_display_probe+0x954/0xa98
        platform_probe+0x124/0x15c
        really_probe+0x1b0/0x5f8
        __driver_probe_device+0x174/0x20c
        driver_probe_device+0x70/0x134
        __device_attach_driver+0x130/0x1d0
        bus_for_each_drv+0xfc/0x14c
        __device_attach+0x1bc/0x2bc
        device_initial_probe+0x1c/0x28
        bus_probe_device+0x94/0x178
        deferred_probe_work_func+0x1a4/0x1f0
        process_one_work+0x5d4/0x9dc
        worker_thread+0x898/0xccc
        kthread+0x2d4/0x3d4
        ret_from_fork+0x10/0x20

 -> rib#1 (crtc_ww_class_acquire){+.+.}-{0:0}:
        ww_acquire_init+0x1c4/0x2c8
        drm_modeset_acquire_init+0x44/0xc8
        drm_helper_probe_single_connector_modes+0xb0/0x12dc
        drm_mode_getconnector+0x5dc/0xfe8
        drm_ioctl_kernel+0x1e4/0x338
        drm_ioctl+0x3a4/0x684
        __arm64_sys_ioctl+0x118/0x154
        invoke_syscall+0x78/0x224
        el0_svc_common+0x178/0x200
        do_el0_svc+0x94/0x13c
        el0_svc+0x5c/0xec
        el0t_64_sync_handler+0x78/0x108
        el0t_64_sync+0x1a4/0x1a8

 -> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
        __lock_acquire+0x2650/0x672c
        lock_acquire+0x1b4/0x4ac
        __mutex_lock_common+0x174/0x1a64
        mutex_lock_nested+0x98/0xac
        dp_panel_add_fail_safe_mode+0x4c/0xa0
        dp_hpd_plug_handle+0x1f0/0x280
        dp_bridge_enable+0x94/0x2b8
        drm_atomic_bridge_chain_enable+0x11c/0x168
        drm_atomic_helper_commit_modeset_enables+0x500/0x740
        msm_atomic_commit_tail+0x3e4/0x748
        commit_tail+0x19c/0x278
        drm_atomic_helper_commit+0x1dc/0x1f0
        drm_atomic_commit+0xc0/0xd8
        drm_atomic_helper_set_config+0xb4/0x134
        drm_mode_setcrtc+0x688/0x1248
        drm_ioctl_kernel+0x1e4/0x338
        drm_ioctl+0x3a4/0x684
        __arm64_sys_ioctl+0x118/0x154
        invoke_syscall+0x78/0x224
        el0_svc_common+0x178/0x200
        do_el0_svc+0x94/0x13c
        el0_svc+0x5c/0xec
        el0t_64_sync_handler+0x78/0x108
        el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
--  [email protected]

Changes in v6:
--  fix Fixes commit ID

Fixes: 8b2c181 ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <[email protected]>
Signed-off-by: Kuogee Hsieh <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Clark <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request May 24, 2022
[  168.544078] ======================================================
[  168.550309] WARNING: possible circular locking dependency detected
[  168.556523] 5.16.0-kfd-fkuehlin torvalds#148 Tainted: G            E
[  168.562558] ------------------------------------------------------
[  168.568764] kfdtest/3479 is trying to acquire lock:
[  168.573672] ffffffffc0927a70 (&topology_lock){++++}-{3:3}, at:
		kfd_topology_device_by_id+0x16/0x60 [amdgpu] [  168.583663]
                but task is already holding lock:
[  168.589529] ffff97d303dee668 (&mm->mmap_lock#2){++++}-{3:3}, at:
		vm_mmap_pgoff+0xa9/0x180 [  168.597755]
                which lock already depends on the new lock.

[  168.605970]
                the existing dependency chain (in reverse order) is:
[  168.613487]
                -> rib#3 (&mm->mmap_lock#2){++++}-{3:3}:
[  168.619700]        lock_acquire+0xca/0x2e0
[  168.623814]        down_read+0x3e/0x140
[  168.627676]        do_user_addr_fault+0x40d/0x690
[  168.632399]        exc_page_fault+0x6f/0x270
[  168.636692]        asm_exc_page_fault+0x1e/0x30
[  168.641249]        filldir64+0xc8/0x1e0
[  168.645115]        call_filldir+0x7c/0x110
[  168.649238]        ext4_readdir+0x58e/0x940
[  168.653442]        iterate_dir+0x16a/0x1b0
[  168.657558]        __x64_sys_getdents64+0x83/0x140
[  168.662375]        do_syscall_64+0x35/0x80
[  168.666492]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[  168.672095]
                -> rib#2 (&type->i_mutex_dir_key#6){++++}-{3:3}:
[  168.679008]        lock_acquire+0xca/0x2e0
[  168.683122]        down_read+0x3e/0x140
[  168.686982]        path_openat+0x5b2/0xa50
[  168.691095]        do_file_open_root+0xfc/0x190
[  168.695652]        file_open_root+0xd8/0x1b0
[  168.702010]        kernel_read_file_from_path_initns+0xc4/0x140
[  168.709542]        _request_firmware+0x2e9/0x5e0
[  168.715741]        request_firmware+0x32/0x50
[  168.721667]        amdgpu_cgs_get_firmware_info+0x370/0xdd0 [amdgpu]
[  168.730060]        smu7_upload_smu_firmware_image+0x53/0x190 [amdgpu]
[  168.738414]        fiji_start_smu+0xcf/0x4e0 [amdgpu]
[  168.745539]        pp_dpm_load_fw+0x21/0x30 [amdgpu]
[  168.752503]        amdgpu_pm_load_smu_firmware+0x4b/0x80 [amdgpu]
[  168.760698]        amdgpu_device_fw_loading+0xb8/0x140 [amdgpu]
[  168.768412]        amdgpu_device_init.cold+0xdf6/0x1716 [amdgpu]
[  168.776285]        amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
[  168.784034]        amdgpu_pci_probe+0x19b/0x3a0 [amdgpu]
[  168.791161]        local_pci_probe+0x40/0x80
[  168.797027]        work_for_cpu_fn+0x10/0x20
[  168.802839]        process_one_work+0x273/0x5b0
[  168.808903]        worker_thread+0x20f/0x3d0
[  168.814700]        kthread+0x176/0x1a0
[  168.819968]        ret_from_fork+0x1f/0x30
[  168.825563]
                -> rib#1 (&adev->pm.mutex){+.+.}-{3:3}:
[  168.834721]        lock_acquire+0xca/0x2e0
[  168.840364]        __mutex_lock+0xa2/0x930
[  168.846020]        amdgpu_dpm_get_mclk+0x37/0x60 [amdgpu]
[  168.853257]        amdgpu_amdkfd_get_local_mem_info+0xba/0xe0 [amdgpu]
[  168.861547]        kfd_create_vcrat_image_gpu+0x1b1/0xbb0 [amdgpu]
[  168.869478]        kfd_create_crat_image_virtual+0x447/0x510 [amdgpu]
[  168.877884]        kfd_topology_add_device+0x5c8/0x6f0 [amdgpu]
[  168.885556]        kgd2kfd_device_init.cold+0x385/0x4c5 [amdgpu]
[  168.893347]        amdgpu_amdkfd_device_init+0x138/0x180 [amdgpu]
[  168.901177]        amdgpu_device_init.cold+0x141b/0x1716 [amdgpu]
[  168.909025]        amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
[  168.916458]        amdgpu_pci_probe+0x19b/0x3a0 [amdgpu]
[  168.923442]        local_pci_probe+0x40/0x80
[  168.929249]        work_for_cpu_fn+0x10/0x20
[  168.935008]        process_one_work+0x273/0x5b0
[  168.940944]        worker_thread+0x20f/0x3d0
[  168.946623]        kthread+0x176/0x1a0
[  168.951765]        ret_from_fork+0x1f/0x30
[  168.957277]
                -> #0 (&topology_lock){++++}-{3:3}:
[  168.965993]        check_prev_add+0x8f/0xbf0
[  168.971613]        __lock_acquire+0x1299/0x1ca0
[  168.977485]        lock_acquire+0xca/0x2e0
[  168.982877]        down_read+0x3e/0x140
[  168.987975]        kfd_topology_device_by_id+0x16/0x60 [amdgpu]
[  168.995583]        kfd_device_by_id+0xa/0x20 [amdgpu]
[  169.002180]        kfd_mmap+0x95/0x200 [amdgpu]
[  169.008293]        mmap_region+0x337/0x5a0
[  169.013679]        do_mmap+0x3aa/0x540
[  169.018678]        vm_mmap_pgoff+0xdc/0x180
[  169.024095]        ksys_mmap_pgoff+0x186/0x1f0
[  169.029734]        do_syscall_64+0x35/0x80
[  169.035005]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[  169.041754]
                other info that might help us debug this:

[  169.053276] Chain exists of:
                  &topology_lock --> &type->i_mutex_dir_key#6 --> &mm->mmap_lock#2

[  169.068389]  Possible unsafe locking scenario:

[  169.076661]        CPU0                    CPU1
[  169.082383]        ----                    ----
[  169.088087]   lock(&mm->mmap_lock#2);
[  169.092922]                                lock(&type->i_mutex_dir_key#6);
[  169.100975]                                lock(&mm->mmap_lock#2);
[  169.108320]   lock(&topology_lock);
[  169.112957]
                 *** DEADLOCK ***

This commit fixes the deadlock warning by ensuring pm.mutex is not
held while holding the topology lock. For this, kfd_local_mem_info
is moved into the KFD dev struct and filled during device init.
This cached value can then be used instead of querying the value
again and again.

Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request May 24, 2022
As reported by Alan, the CFI (Call Frame Information) in the VDSO time
routines is incorrect since commit ce7d805 ("powerpc/vdso: Prepare
for switching VDSO to generic C implementation.").

DWARF has a concept called the CFA (Canonical Frame Address), which on
powerpc is calculated as an offset from the stack pointer (r1). That
means when the stack pointer is changed there must be a corresponding
CFI directive to update the calculation of the CFA.

The current code is missing those directives for the changes to r1,
which prevents gdb from being able to generate a backtrace from inside
VDSO functions, eg:

  Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb) bt
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  rib#1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  rib#2  0x00007fffffffd960 in ?? ()
  rib#3  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  Backtrace stopped: frame did not save the PC

Alan helpfully describes some rules for correctly maintaining the CFI information:

  1) Every adjustment to the current frame address reg (ie. r1) must be
     described, and exactly at the instruction where r1 changes. Why?
     Because stack unwinding might want to access previous frames.

  2) If a function changes LR or any non-volatile register, the save
     location for those regs must be given. The CFI can be at any
     instruction after the saves up to the point that the reg is
     changed.
     (Exception: LR save should be described before a bl. not after)

  3) If asychronous unwind info is needed then restores of LR and
     non-volatile regs must also be described. The CFI can be at any
     instruction after the reg is restored up to the point where the
     save location is (potentially) trashed.

Fix the inability to backtrace by adding CFI directives describing the
changes to r1, ie. satisfying rule 1.

Also change the information for LR to point to the copy saved on the
stack, not the value in r0 that will be overwritten by the function
call.

Finally, add CFI directives describing the save/restore of r2.

With the fix gdb can correctly back trace and navigate up and down the stack:

  Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb) bt
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  rib#1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  rib#2  0x0000000100015b60 in gettime ()
  rib#3  0x000000010000c8bc in print_long_format ()
  rib#4  0x000000010000d180 in print_current_files ()
  rib#5  0x00000001000054ac in main ()
  (gdb) up
  rib#1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  (gdb)
  rib#2  0x0000000100015b60 in gettime ()
  (gdb)
  rib#3  0x000000010000c8bc in print_long_format ()
  (gdb)
  rib#4  0x000000010000d180 in print_current_files ()
  (gdb)
  rib#5  0x00000001000054ac in main ()
  (gdb)
  Initial frame selected; you cannot go up.
  (gdb) down
  rib#4  0x000000010000d180 in print_current_files ()
  (gdb)
  rib#3  0x000000010000c8bc in print_long_format ()
  (gdb)
  rib#2  0x0000000100015b60 in gettime ()
  (gdb)
  rib#1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  (gdb)
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb)

Fixes: ce7d805 ("powerpc/vdso: Prepare for switching VDSO to generic C implementation.")
Cc: [email protected] # v5.11+
Reported-by: Alan Modra <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
matt-auld pushed a commit to matt-auld/linux that referenced this pull request May 24, 2022
…g/drm/drm-intel into drm-next

drm/i915 feature pull rib#2 for v5.19:

Features and functionality:
- Add first set of DG2 PCI IDs for "motherboard down" designs (Matt Roper)
- Add initial RPL-P PCI IDs as ADL-P subplatform (Matt Atwood)

Refactoring and cleanups:
- Power well refactoring and cleanup (Imre)
- GVT-g refactor and mdev API cleanup (Christoph, Jason, Zhi)
- DPLL refactoring and cleanup (Ville)
- VBT panel specific data parsing cleanup (Ville)
- Use drm_mode_init() for on-stack modes (Ville)

Fixes:
- Fix PSR state pipe A/B confusion by clearing more state on disable (José)
- Fix FIFO underruns caused by not taking DRAM channel into account (Vinod)
- Fix FBC flicker on display 11+ by enabling a workaround (José)
- Fix VBT seamless DRRS min refresh rate check (Ville)
- Fix panel type assumption on bogus VBT data (Ville)
- Fix panel data parsing for VBT that misses panel data pointers block (Ville)
- Fix spurious AUX timeout/hotplug handling on LTTPR links (Imre)

Merges:
- Backmerge drm-next (Jani)
- GVT changes (Jani)

Signed-off-by: Dave Airlie <[email protected]>
From: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
matt-auld pushed a commit to matt-auld/linux that referenced this pull request May 24, 2022
Now when Intel Elkhart Lake uses again common bit timing and there are
no other users for custom bit timing, we can bring back the changes
done by the commit 0ddd83f ("can: m_can: remove support for
custom bit timing").

This effectively reverts commit ea768b2 ("Revert "can: m_can:
remove support for custom bit timing"") while taking into account
commit ea22ba4 ("can: m_can: make custom bittiming fields const")
and commit 7d4a101 ("can: dev: add sanity check in
can_set_static_ctrlmode()").

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Jarkko Nikula <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request May 24, 2022
Do not allow to write timestamps on RX rings if PF is being configured.
When PF is being configured RX rings can be freed or rebuilt. If at the
same time timestamps are updated, the kernel will crash by dereferencing
null RX ring pointer.

PID: 1449   TASK: ff187d28ed658040  CPU: 34  COMMAND: "ice-ptp-0000:51"
 #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be
 rib#1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d
 rib#2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd
 rib#3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54
 rib#4 [ff1966a94a713d08] no_context at ffffffff9d06bda4
 rib#5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c
 rib#6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4
 rib#7 [ff1966a94a713de0] page_fault at ffffffff9da0107e
    [exception RIP: ice_ptp_update_cached_phctime+91]
    RIP: ffffffffc076db8b  RSP: ff1966a94a713e98  RFLAGS: 00010246
    RAX: 16e3db9c6b7ccae4  RBX: ff187d269dd3c180  RCX: ff187d269cd4d018
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ff187d269cfcc644   R8: ff187d339b9641b0   R9: 0000000000000000
    R10: 0000000000000002  R11: 0000000000000000  R12: ff187d269cfcc648
    R13: ffffffff9f128784  R14: ffffffff9d101b70  R15: ff187d269cfcc640
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 rib#8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice]
 rib#9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b
 rib#10 [ff1966a94a713f10] kthread at ffffffff9d101b4d
 rib#11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f

Fixes: 77a7811 ("ice: enable receive hardware timestamping")
Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Reviewed-by: Michal Schmidt <[email protected]>
Tested-by: Dave Cain <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request May 24, 2022
Currently, software objects of flow steering are created and destroyed
during reload flow. In case a device is unloaded, the following error
is printed during grace period:

 mlx5_core 0000:00:0b.0: mlx5_fw_fatal_reporter_err_work:690:(pid 95):
    Driver is in error state. Unloading

As a solution to fix use-after-free bugs, where we try to access
these objects, when reading the value of flow_steering_mode devlink
param[1], let's split flow steering creation and destruction into two
routines:
    * init and cleanup: memory, cache, and pools allocation/free.
    * create and destroy: namespaces initialization and cleanup.

While at it, re-order the cleanup function to mirror the init function.

[1]
Kasan trace:

[  385.119849 ] BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x3b/0xa0
[  385.119849 ] Read of size 4 at addr ffff888104b79308 by task bash/291
[  385.119849 ]
[  385.119849 ] CPU: 1 PID: 291 Comm: bash Not tainted 5.17.0-rc1+ rib#2
[  385.119849 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
[  385.119849 ] Call Trace:
[  385.119849 ]  <TASK>
[  385.119849 ]  dump_stack_lvl+0x6e/0x91
[  385.119849 ]  print_address_description.constprop.0+0x1f/0x160
[  385.119849 ]  ? mlx5_devlink_fs_mode_get+0x3b/0xa0
[  385.119849 ]  ? mlx5_devlink_fs_mode_get+0x3b/0xa0
[  385.119849 ]  kasan_report.cold+0x83/0xdf
[  385.119849 ]  ? devlink_param_notify+0x20/0x190
[  385.119849 ]  ? mlx5_devlink_fs_mode_get+0x3b/0xa0
[  385.119849 ]  mlx5_devlink_fs_mode_get+0x3b/0xa0
[  385.119849 ]  devlink_nl_param_fill+0x18a/0xa50
[  385.119849 ]  ? _raw_spin_lock_irqsave+0x8d/0xe0
[  385.119849 ]  ? devlink_flash_update_timeout_notify+0xf0/0xf0
[  385.119849 ]  ? __wake_up_common+0x4b/0x1e0
[  385.119849 ]  ? preempt_count_sub+0x14/0xc0
[  385.119849 ]  ? _raw_spin_unlock_irqrestore+0x28/0x40
[  385.119849 ]  ? __wake_up_common_lock+0xe3/0x140
[  385.119849 ]  ? __wake_up_common+0x1e0/0x1e0
[  385.119849 ]  ? __sanitizer_cov_trace_const_cmp8+0x27/0x80
[  385.119849 ]  ? __rcu_read_unlock+0x48/0x70
[  385.119849 ]  ? kasan_unpoison+0x23/0x50
[  385.119849 ]  ? __kasan_slab_alloc+0x2c/0x80
[  385.119849 ]  ? memset+0x20/0x40
[  385.119849 ]  ? __sanitizer_cov_trace_const_cmp4+0x25/0x80
[  385.119849 ]  devlink_param_notify+0xce/0x190
[  385.119849 ]  devlink_unregister+0x92/0x2b0
[  385.119849 ]  remove_one+0x41/0x140
[  385.119849 ]  pci_device_remove+0x68/0x140
[  385.119849 ]  ? pcibios_free_irq+0x10/0x10
[  385.119849 ]  __device_release_driver+0x294/0x3f0
[  385.119849 ]  device_driver_detach+0x82/0x130
[  385.119849 ]  unbind_store+0x193/0x1b0
[  385.119849 ]  ? subsys_interface_unregister+0x270/0x270
[  385.119849 ]  drv_attr_store+0x4e/0x70
[  385.119849 ]  ? drv_attr_show+0x60/0x60
[  385.119849 ]  sysfs_kf_write+0xa7/0xc0
[  385.119849 ]  kernfs_fop_write_iter+0x23a/0x2f0
[  385.119849 ]  ? sysfs_kf_bin_read+0x160/0x160
[  385.119849 ]  new_sync_write+0x311/0x430
[  385.119849 ]  ? new_sync_read+0x480/0x480
[  385.119849 ]  ? _raw_spin_lock+0x87/0xe0
[  385.119849 ]  ? __sanitizer_cov_trace_cmp4+0x25/0x80
[  385.119849 ]  ? security_file_permission+0x94/0xa0
[  385.119849 ]  vfs_write+0x4c7/0x590
[  385.119849 ]  ksys_write+0xf6/0x1e0
[  385.119849 ]  ? __x64_sys_read+0x50/0x50
[  385.119849 ]  ? fpregs_assert_state_consistent+0x99/0xa0
[  385.119849 ]  do_syscall_64+0x3d/0x90
[  385.119849 ]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  385.119849 ] RIP: 0033:0x7fc36ef38504
[  385.119849 ] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f
80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[  385.119849 ] RSP: 002b:00007ffde0ff3d08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  385.119849 ] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc36ef38504
[  385.119849 ] RDX: 000000000000000c RSI: 00007fc370521040 RDI: 0000000000000001
[  385.119849 ] RBP: 00007fc370521040 R08: 00007fc36f00b8c0 R09: 00007fc36ee4b740
[  385.119849 ] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc36f00a760
[  385.119849 ] R13: 000000000000000c R14: 00007fc36f005760 R15: 000000000000000c
[  385.119849 ]  </TASK>
[  385.119849 ]
[  385.119849 ] Allocated by task 65:
[  385.119849 ]  kasan_save_stack+0x1e/0x40
[  385.119849 ]  __kasan_kmalloc+0x81/0xa0
[  385.119849 ]  mlx5_init_fs+0x11b/0x1160
[  385.119849 ]  mlx5_load+0x13c/0x220
[  385.119849 ]  mlx5_load_one+0xda/0x160
[  385.119849 ]  mlx5_recover_device+0xb8/0x100
[  385.119849 ]  mlx5_health_try_recover+0x2f9/0x3a1
[  385.119849 ]  devlink_health_reporter_recover+0x75/0x100
[  385.119849 ]  devlink_health_report+0x26c/0x4b0
[  385.275909 ]  mlx5_fw_fatal_reporter_err_work+0x11e/0x1b0
[  385.275909 ]  process_one_work+0x520/0x970
[  385.275909 ]  worker_thread+0x378/0x950
[  385.275909 ]  kthread+0x1bb/0x200
[  385.275909 ]  ret_from_fork+0x1f/0x30
[  385.275909 ]
[  385.275909 ] Freed by task 65:
[  385.275909 ]  kasan_save_stack+0x1e/0x40
[  385.275909 ]  kasan_set_track+0x21/0x30
[  385.275909 ]  kasan_set_free_info+0x20/0x30
[  385.275909 ]  __kasan_slab_free+0xfc/0x140
[  385.275909 ]  kfree+0xa5/0x3b0
[  385.275909 ]  mlx5_unload+0x2e/0xb0
[  385.275909 ]  mlx5_unload_one+0x86/0xb0
[  385.275909 ]  mlx5_fw_fatal_reporter_err_work.cold+0xca/0xcf
[  385.275909 ]  process_one_work+0x520/0x970
[  385.275909 ]  worker_thread+0x378/0x950
[  385.275909 ]  kthread+0x1bb/0x200
[  385.275909 ]  ret_from_fork+0x1f/0x30
[  385.275909 ]
[  385.275909 ] The buggy address belongs to the object at ffff888104b79300
[  385.275909 ]  which belongs to the cache kmalloc-128 of size 128
[  385.275909 ] The buggy address is located 8 bytes inside of
[  385.275909 ]  128-byte region [ffff888104b79300, ffff888104b79380)
[  385.275909 ] The buggy address belongs to the page:
[  385.275909 ] page:00000000de44dd39 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x104b78
[  385.275909 ] head:00000000de44dd39 order:1 compound_mapcount:0
[  385.275909 ] flags: 0x8000000000010200(slab|head|zone=2)
[  385.275909 ] raw: 8000000000010200 0000000000000000 dead000000000122 ffff8881000428c0
[  385.275909 ] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[  385.275909 ] page dumped because: kasan: bad access detected
[  385.275909 ]
[  385.275909 ] Memory state around the buggy address:
[  385.275909 ]  ffff888104b79200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc
[  385.275909 ]  ffff888104b79280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  385.275909 ] >ffff888104b79300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  385.275909 ]                       ^
[  385.275909 ]  ffff888104b79380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  385.275909 ]  ffff888104b79400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  385.275909 ]]

Fixes: e890acd ("net/mlx5: Add devlink flow_steering_mode parameter")
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
matt-auld pushed a commit to matt-auld/linux that referenced this pull request May 24, 2022
The splat below can be seen when running kvm-unit-test:

     =============================
     WARNING: suspicious RCU usage
     5.18.0-rc7 rib#5 Tainted: G          IOE
     -----------------------------
     /home/kernel/linux/arch/x86/kvm/../../../virt/kvm/eventfd.c:80 RCU-list traversed in non-reader section!!

     other info that might help us debug this:

     rcu_scheduler_active = 2, debug_locks = 1
     4 locks held by qemu-system-x86/35124:
      #0: ffff9725391d80b8 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x710 [kvm]
      rib#1: ffffbd25cfb2a0b8 (&kvm->srcu){....}-{0:0}, at: vcpu_enter_guest+0xdeb/0x1900 [kvm]
      rib#2: ffffbd25cfb2b920 (&kvm->irq_srcu){....}-{0:0}, at: kvm_hv_notify_acked_sint+0x79/0x1e0 [kvm]
      rib#3: ffffbd25cfb2b920 (&kvm->irq_srcu){....}-{0:0}, at: irqfd_resampler_ack+0x5/0x110 [kvm]

     stack backtrace:
     CPU: 2 PID: 35124 Comm: qemu-system-x86 Tainted: G          IOE     5.18.0-rc7 rib#5
     Call Trace:
      <TASK>
      dump_stack_lvl+0x6c/0x9b
      irqfd_resampler_ack+0xfd/0x110 [kvm]
      kvm_notify_acked_gsi+0x32/0x90 [kvm]
      kvm_hv_notify_acked_sint+0xc5/0x1e0 [kvm]
      kvm_hv_set_msr_common+0xec1/0x1160 [kvm]
      kvm_set_msr_common+0x7c3/0xf60 [kvm]
      vmx_set_msr+0x394/0x1240 [kvm_intel]
      kvm_set_msr_ignored_check+0x86/0x200 [kvm]
      kvm_emulate_wrmsr+0x4f/0x1f0 [kvm]
      vmx_handle_exit+0x6fb/0x7e0 [kvm_intel]
      vcpu_enter_guest+0xe5a/0x1900 [kvm]
      kvm_arch_vcpu_ioctl_run+0x16e/0xac0 [kvm]
      kvm_vcpu_ioctl+0x279/0x710 [kvm]
      __x64_sys_ioctl+0x83/0xb0
      do_syscall_64+0x3b/0x90
      entry_SYSCALL_64_after_hwframe+0x44/0xae

resampler-list is protected by irq_srcu (see kvm_irqfd_assign), so fix
the false positive by using list_for_each_entry_srcu().

Signed-off-by: Wanpeng Li <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants