consider interaction between i915 perf an cmd parser #9

rib · 2016-02-04T12:10:02Z

There's a slightly awkward backwards compatibility issue involving Mesa...

Mesa currently has some code to try and allow configuring the OA unit which involves always testing whether it's possible to write to oacontrol via an LOAD_REGISTER_IMM command which can interfere with the i915 perf driver managing oacontrol.

Trying to remove oacontrol from the whitelist in the cmd parser can result in some older versions of Mesa crashing. (todo: dig up which versions this affected)

If we land my GL_INTEL_performance_query driver then my series removes the can_write_oacontrol() test in intel_extensions.c

Fixes segmentation fault using, for instance: (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0 x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 (gdb) bt #0 0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 #1 0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:433 #2 0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:498 #3 0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:936 #4 0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391 #5 0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361 #6 0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401 #7 0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253 #8 0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364 #9 0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664 #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539 #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264 #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390 #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451 #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495 #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618 (gdb) Intel PT attempts to find the sched:sched_switch tracepoint but that seg faults if tracefs is not readable, because the error reporting structure is null, as errors are not reported when automatically adding tracepoints. Fix by checking before using. Committer note: This doesn't take place in a kernel that supports perf_event_attr.context_switch, that is the default way that will be used for tracking context switches, only in older kernels, like 4.2, in a machine with Intel PT (e.g. Broadwell) for non-priviledged users. Further info from a similar patch by Wang: The error is in tracepoint_error: it assumes the 'e' parameter is valid. However, there are many situation a parse_event() can be called without parse_events_error. See result of $ grep 'parse_events(.*NULL)' ./tools/perf/ -r' Signed-off-by: Adrian Hunter <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Tong Zhang <[email protected]> Cc: Wang Nan <[email protected]> Cc: [email protected] # v4.4+ Fixes: 1965817 ("perf tools: Enhance parsing events tracepoint error output") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: #8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . #9 [] tcp_rcv_established at ffffffff81580b64 #10 [] tcp_v4_do_rcv at ffffffff8158b54a #11 [] tcp_v4_rcv at ffffffff8158cd02 #12 [] ip_local_deliver_finish at ffffffff815668f4 #13 [] ip_local_deliver at ffffffff81566bd9 #14 [] ip_rcv_finish at ffffffff8156656d #15 [] ip_rcv at ffffffff81566f06 #16 [] __netif_receive_skb_core at ffffffff8152b3a2 #17 [] __netif_receive_skb at ffffffff8152b608 #18 [] netif_receive_skb at ffffffff8152b690 #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] #21 [] net_rx_action at ffffffff8152bac2 #22 [] __do_softirq at ffffffff81084b4f #23 [] call_softirq at ffffffff8164845c #24 [] do_softirq at ffffffff81016fc5 #25 [] irq_exit at ffffffff81084ee5 torvalds#26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <[email protected]> Cc: Hannes Sowa <[email protected]> Signed-off-by: Jon Maxwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Dmitry reported a lockdep splat [1] (false positive) that we can fix by releasing the spinlock before calling icmp_send() from ip_expire() This is a false positive because sending an ICMP message can not possibly re-enter the IP frag engine. [1] [ INFO: possible circular locking dependency detected ] 4.10.0+ torvalds#29 Not tainted ------------------------------------------------------- modprobe/12392 is trying to acquire lock: (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] spin_lock include/linux/spinlock.h:299 [inline] (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] __netif_tx_lock include/linux/netdevice.h:3486 [inline] (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180 but task is already holding lock: (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock include/linux/spinlock.h:299 [inline] (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> rib#1 (&(&q->lock)->rlock){+.-...}: validate_chain kernel/locking/lockdep.c:2267 [inline] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:299 [inline] ip_defrag+0x3a2/0x4130 net/ipv4/ip_fragment.c:669 ip_check_defrag+0x4e3/0x8b0 net/ipv4/ip_fragment.c:713 packet_rcv_fanout+0x282/0x800 net/packet/af_packet.c:1459 deliver_skb net/core/dev.c:1834 [inline] dev_queue_xmit_nit+0x294/0xa90 net/core/dev.c:1890 xmit_one net/core/dev.c:2903 [inline] dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2923 sch_direct_xmit+0x31f/0x6d0 net/sched/sch_generic.c:182 __dev_xmit_skb net/core/dev.c:3092 [inline] __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358 dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 neigh_resolve_output+0x6b9/0xb10 net/core/neighbour.c:1308 neigh_output include/net/neighbour.h:478 [inline] ip_finish_output2+0x8b8/0x15a0 net/ipv4/ip_output.c:228 ip_do_fragment+0x1d93/0x2720 net/ipv4/ip_output.c:672 ip_fragment.constprop.54+0x145/0x200 net/ipv4/ip_output.c:545 ip_finish_output+0x82d/0xe10 net/ipv4/ip_output.c:314 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404 dst_output include/net/dst.h:486 [inline] ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512 raw_sendmsg+0x26de/0x3a00 net/ipv4/raw.c:655 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 ___sys_sendmsg+0x4a3/0x9f0 net/socket.c:1985 __sys_sendmmsg+0x25c/0x750 net/socket.c:2075 SYSC_sendmmsg net/socket.c:2106 [inline] SyS_sendmmsg+0x35/0x60 net/socket.c:2101 do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281 return_from_SYSCALL_64+0x0/0x7a -> #0 (_xmit_ETHER#2){+.-...}: check_prev_add kernel/locking/lockdep.c:1830 [inline] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940 validate_chain kernel/locking/lockdep.c:2267 [inline] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:299 [inline] __netif_tx_lock include/linux/netdevice.h:3486 [inline] sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180 __dev_xmit_skb net/core/dev.c:3092 [inline] __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358 dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 neigh_hh_output include/net/neighbour.h:468 [inline] neigh_output include/net/neighbour.h:476 [inline] ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228 ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404 dst_output include/net/dst.h:486 [inline] ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512 icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394 icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754 ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239 call_timer_fn+0x241/0x820 kernel/time/timer.c:1268 expire_timers kernel/time/timer.c:1307 [inline] __run_timers+0x960/0xcf0 kernel/time/timer.c:1601 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:657 [inline] smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707 __read_once_size include/linux/compiler.h:254 [inline] atomic_read arch/x86/include/asm/atomic.h:26 [inline] rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline] __rcu_is_watching kernel/rcu/tree.c:1133 [inline] rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147 rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293 radix_tree_deref_slot include/linux/radix-tree.h:238 [inline] filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335 do_fault_around mm/memory.c:3231 [inline] do_read_fault mm/memory.c:3265 [inline] do_fault+0xbd5/0x2080 mm/memory.c:3370 handle_pte_fault mm/memory.c:3600 [inline] __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714 handle_mm_fault+0x1e2/0x480 mm/memory.c:3751 __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397 do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460 page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&q->lock)->rlock); lock(_xmit_ETHER#2); lock(&(&q->lock)->rlock); lock(_xmit_ETHER#2); *** DEADLOCK *** 10 locks held by modprobe/12392: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff81329758>] __do_page_fault+0x2b8/0xb60 arch/x86/mm/fault.c:1336 rib#1: (rcu_read_lock){......}, at: [<ffffffff8188cab6>] filemap_map_pages+0x1e6/0x1570 mm/filemap.c:2324 rib#2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>] spin_lock include/linux/spinlock.h:299 [inline] rib#2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>] pte_alloc_one_map mm/memory.c:2944 [inline] rib#2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>] alloc_set_pte+0x13b8/0x1b90 mm/memory.c:3072 rib#3: (((&q->timer))){+.-...}, at: [<ffffffff81627e72>] lockdep_copy_map include/linux/lockdep.h:175 [inline] rib#3: (((&q->timer))){+.-...}, at: [<ffffffff81627e72>] call_timer_fn+0x1c2/0x820 kernel/time/timer.c:1258 rib#4: (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock include/linux/spinlock.h:299 [inline] rib#4: (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201 rib#5: (rcu_read_lock){......}, at: [<ffffffff8389a633>] ip_expire+0x1b3/0x6c0 net/ipv4/ip_fragment.c:216 rib#6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] spin_trylock include/linux/spinlock.h:309 [inline] rib#6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] icmp_xmit_lock net/ipv4/icmp.c:219 [inline] rib#6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] icmp_send+0x803/0x1c80 net/ipv4/icmp.c:681 rib#7: (rcu_read_lock_bh){......}, at: [<ffffffff838ab9a1>] ip_finish_output2+0x2c1/0x15a0 net/ipv4/ip_output.c:198 rib#8: (rcu_read_lock_bh){......}, at: [<ffffffff836d1dee>] __dev_queue_xmit+0x23e/0x1e60 net/core/dev.c:3324 rib#9: (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at: [<ffffffff836d3a27>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 stack backtrace: CPU: 0 PID: 12392 Comm: modprobe Not tainted 4.10.0+ torvalds#29 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:52 print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204 check_prev_add kernel/locking/lockdep.c:1830 [inline] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940 validate_chain kernel/locking/lockdep.c:2267 [inline] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:299 [inline] __netif_tx_lock include/linux/netdevice.h:3486 [inline] sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180 __dev_xmit_skb net/core/dev.c:3092 [inline] __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358 dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 neigh_hh_output include/net/neighbour.h:468 [inline] neigh_output include/net/neighbour.h:476 [inline] ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228 ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404 dst_output include/net/dst.h:486 [inline] ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512 icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394 icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754 ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239 call_timer_fn+0x241/0x820 kernel/time/timer.c:1268 expire_timers kernel/time/timer.c:1307 [inline] __run_timers+0x960/0xcf0 kernel/time/timer.c:1601 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:657 [inline] smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707 RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline] RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline] RIP: 0010:rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline] RIP: 0010:__rcu_is_watching kernel/rcu/tree.c:1133 [inline] RIP: 0010:rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147 RSP: 0000:ffff8801c391f120 EFLAGS: 00000a03 ORIG_RAX: ffffffffffffff10 RAX: dffffc0000000000 RBX: ffff8801c391f148 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000055edd4374000 RDI: ffff8801dbe1ae0c RBP: ffff8801c391f1a0 R08: 0000000000000002 R09: 0000000000000000 R10: dffffc0000000000 R11: 0000000000000002 R12: 1ffff10038723e25 R13: ffff8801dbe1ae00 R14: ffff8801c391f680 R15: dffffc0000000000 </IRQ> rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293 radix_tree_deref_slot include/linux/radix-tree.h:238 [inline] filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335 do_fault_around mm/memory.c:3231 [inline] do_read_fault mm/memory.c:3265 [inline] do_fault+0xbd5/0x2080 mm/memory.c:3370 handle_pte_fault mm/memory.c:3600 [inline] __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714 handle_mm_fault+0x1e2/0x480 mm/memory.c:3751 __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397 do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460 page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011 RIP: 0033:0x7f83172f2786 RSP: 002b:00007fffe859ae80 EFLAGS: 00010293 RAX: 000055edd4373040 RBX: 00007f83175111c8 RCX: 000055edd4373238 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f8317510970 RBP: 00007fffe859afd0 R08: 0000000000000009 R09: 0000000000000000 R10: 0000000000000064 R11: 0000000000000000 R12: 000055edd4373040 R13: 0000000000000000 R14: 00007fffe859afe8 R15: 0000000000000000 Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: David S. Miller <[email protected]>

commit 4dfce57 upstream. There have been several reports over the years of NULL pointer dereferences in xfs_trans_log_inode during xfs_fsr processes, when the process is doing an fput and tearing down extents on the temporary inode, something like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 PID: 29439 TASK: ffff880550584fa0 CPU: 6 COMMAND: "xfs_fsr" [exception RIP: xfs_trans_log_inode+0x10] rib#9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs] rib#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs] rib#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs] rib#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs] rib#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs] rib#14 [ffff8800a57bbe00] evict at ffffffff811e1b67 rib#15 [ffff8800a57bbe28] iput at ffffffff811e23a5 rib#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8 rib#17 [ffff8800a57bbe88] dput at ffffffff811dd06c rib#18 [ffff8800a57bbea8] __fput at ffffffff811c823b rib#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e rib#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27 rib#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c rib#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d As it turns out, this is because the i_itemp pointer, along with the d_ops pointer, has been overwritten with zeros when we tear down the extents during truncate. When the in-core inode fork on the temporary inode used by xfs_fsr was originally set up during the extent swap, we mistakenly looked at di_nextents to determine whether all extents fit inline, but this misses extents generated by speculative preallocation; we should be using if_bytes instead. This mistake corrupts the in-memory inode, and code in xfs_iext_remove_inline eventually gets bad inputs, causing it to memmove and memset incorrect ranges; this became apparent because the two values in ifp->if_u2.if_inline_ext[1] contained what should have been in d_ops and i_itemp; they were memmoved due to incorrect array indexing and then the original locations were zeroed with memset, again due to an array overrun. Fix this by properly using i_df.if_bytes to determine the number of extents, not di_nextents. Thanks to dchinner for looking at this with me and spotting the root cause. Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 45caeaa ] As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: rib#8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . rib#9 [] tcp_rcv_established at ffffffff81580b64 rib#10 [] tcp_v4_do_rcv at ffffffff8158b54a rib#11 [] tcp_v4_rcv at ffffffff8158cd02 rib#12 [] ip_local_deliver_finish at ffffffff815668f4 rib#13 [] ip_local_deliver at ffffffff81566bd9 rib#14 [] ip_rcv_finish at ffffffff8156656d rib#15 [] ip_rcv at ffffffff81566f06 rib#16 [] __netif_receive_skb_core at ffffffff8152b3a2 rib#17 [] __netif_receive_skb at ffffffff8152b608 rib#18 [] netif_receive_skb at ffffffff8152b690 rib#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] rib#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] rib#21 [] net_rx_action at ffffffff8152bac2 rib#22 [] __do_softirq at ffffffff81084b4f rib#23 [] call_softirq at ffffffff8164845c rib#24 [] do_softirq at ffffffff81016fc5 rib#25 [] irq_exit at ffffffff81084ee5 torvalds#26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <[email protected]> Cc: Hannes Sowa <[email protected]> Signed-off-by: Jon Maxwell <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

mipsxx_pmu_handle_shared_irq() calls irq_work_run() while holding the pmuint_rwlock for read. irq_work_run() can, via perf_pending_event(), call try_to_wake_up() which can try to take rq->lock. However, perf can also call perf_pmu_enable() (and thus take the pmuint_rwlock for write) while holding the rq->lock, from finish_task_switch() via perf_event_context_sched_in(). This leads to an ABBA deadlock: PID: 3855 TASK: 8f7ce288 CPU: 2 COMMAND: "process" #0 [89c39ac8] __delay at 803b5be4 rib#1 [89c39ac8] do_raw_spin_lock at 8008fdcc rib#2 [89c39af8] try_to_wake_up at 8006e47c rib#3 [89c39b38] pollwake at 8018eab0 rib#4 [89c39b68] __wake_up_common at 800879f4 rib#5 [89c39b98] __wake_up at 800880e4 rib#6 [89c39bc8] perf_event_wakeup at 8012109c rib#7 [89c39be8] perf_pending_event at 80121184 rib#8 [89c39c08] irq_work_run_list at 801151f0 rib#9 [89c39c38] irq_work_run at 80115274 rib#10 [89c39c50] mipsxx_pmu_handle_shared_irq at 8002cc7c PID: 1481 TASK: 8eaac6a8 CPU: 3 COMMAND: "process" #0 [8de7f900] do_raw_write_lock at 800900e0 rib#1 [8de7f918] perf_event_context_sched_in at 80122310 rib#2 [8de7f938] __perf_event_task_sched_in at 80122608 rib#3 [8de7f958] finish_task_switch at 8006b8a4 rib#4 [8de7f998] __schedule at 805e4dc4 rib#5 [8de7f9f8] schedule at 805e5558 rib#6 [8de7fa10] schedule_hrtimeout_range_clock at 805e9984 rib#7 [8de7fa70] poll_schedule_timeout at 8018e8f8 rib#8 [8de7fa88] do_select at 8018f338 rib#9 [8de7fd88] core_sys_select at 8018f5cc rib#10 [8de7fee0] sys_select at 8018f854 rib#11 [8de7ff28] syscall_common at 80028fc8 The lock seems to be there to protect the hardware counters so there is no need to hold it across irq_work_run(). Signed-off-by: Rabin Vincent <[email protected]> Signed-off-by: Ralf Baechle <[email protected]>

[ Upstream commit 45caeaa ] As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: rib#8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . rib#9 [] tcp_rcv_established at ffffffff81580b64 rib#10 [] tcp_v4_do_rcv at ffffffff8158b54a rib#11 [] tcp_v4_rcv at ffffffff8158cd02 rib#12 [] ip_local_deliver_finish at ffffffff815668f4 rib#13 [] ip_local_deliver at ffffffff81566bd9 rib#14 [] ip_rcv_finish at ffffffff8156656d rib#15 [] ip_rcv at ffffffff81566f06 rib#16 [] __netif_receive_skb_core at ffffffff8152b3a2 rib#17 [] __netif_receive_skb at ffffffff8152b608 rib#18 [] netif_receive_skb at ffffffff8152b690 rib#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] rib#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] rib#21 [] net_rx_action at ffffffff8152bac2 rib#22 [] __do_softirq at ffffffff81084b4f rib#23 [] call_softirq at ffffffff8164845c rib#24 [] do_softirq at ffffffff81016fc5 rib#25 [] irq_exit at ffffffff81084ee5 torvalds#26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <[email protected]> Cc: Hannes Sowa <[email protected]> Signed-off-by: Jon Maxwell <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Andrey reported a lockdep warning on non-initialized spinlock: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 1 PID: 4099 Comm: a.out Not tainted 4.12.0-rc6+ rib#9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 dump_stack+0x292/0x395 lib/dump_stack.c:52 register_lock_class+0x717/0x1aa0 kernel/locking/lockdep.c:755 ? 0xffffffffa0000000 __lock_acquire+0x269/0x3690 kernel/locking/lockdep.c:3255 lock_acquire+0x22d/0x560 kernel/locking/lockdep.c:3855 __raw_spin_lock_bh ./include/linux/spinlock_api_smp.h:135 _raw_spin_lock_bh+0x36/0x50 kernel/locking/spinlock.c:175 spin_lock_bh ./include/linux/spinlock.h:304 ip_mc_clear_src+0x27/0x1e0 net/ipv4/igmp.c:2076 igmpv3_clear_delrec+0xee/0x4f0 net/ipv4/igmp.c:1194 ip_mc_destroy_dev+0x4e/0x190 net/ipv4/igmp.c:1736 We miss a spin_lock_init() in igmpv3_add_delrec(), probably because previously we never use it on this code path. Since we already unlink it from the global mc_tomb list, it is probably safe not to acquire this spinlock here. It does not harm to have it although, to avoid conditional locking. Fixes: c38b7d3 ("igmp: acquire pmc lock for ip_mc_clear_src()") Reported-by: Andrey Konovalov <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Thomas reported that 'perf buildid-list' gets a SEGFAULT due to NULL pointer deref when he ran it on a data with namespace events. It was because the buildid_id__mark_dso_hit_ops lacks the namespace event handler and perf_too__fill_default() didn't set it. Program received signal SIGSEGV, Segmentation fault. 0x0000000000000000 in ?? () Missing separate debuginfos, use: dnf debuginfo-install audit-libs-2.7.7-1.fc25.s390x bzip2-libs-1.0.6-21.fc25.s390x elfutils-libelf-0.169-1.fc25.s390x +elfutils-libs-0.169-1.fc25.s390x libcap-ng-0.7.8-1.fc25.s390x numactl-libs-2.0.11-2.ibm.fc25.s390x openssl-libs-1.1.0e-1.1.ibm.fc25.s390x perl-libs-5.24.1-386.fc25.s390x +python-libs-2.7.13-2.fc25.s390x slang-2.3.0-7.fc25.s390x xz-libs-5.2.3-2.fc25.s390x zlib-1.2.8-10.fc25.s390x (gdb) where #0 0x0000000000000000 in ?? () rib#1 0x00000000010fad6a in machines__deliver_event (machines=<optimized out>, machines@entry=0x2c6fd18, evlist=<optimized out>, event=event@entry=0x3fffdf00470, sample=0x3ffffffe880, sample@entry=0x3ffffffe888, tool=tool@entry=0x1312968 <build_id.mark_dso_hit_ops>, file_offset=1136) at util/session.c:1287 rib#2 0x00000000010fbf4e in perf_session__deliver_event (file_offset=1136, tool=0x1312968 <build_id.mark_dso_hit_ops>, sample=0x3ffffffe888, event=0x3fffdf00470, session=0x2c6fc30) at util/session.c:1340 rib#3 perf_session__process_event (session=0x2c6fc30, session@entry=0x0, event=event@entry=0x3fffdf00470, file_offset=file_offset@entry=1136) at util/session.c:1522 rib#4 0x00000000010fddde in __perf_session__process_events (file_size=11880, data_size=<optimized out>, data_offset=<optimized out>, session=0x0) at util/session.c:1899 rib#5 perf_session__process_events (session=0x0, session@entry=0x2c6fc30) at util/session.c:1953 rib#6 0x000000000103b2ac in perf_session__list_build_ids (with_hits=<optimized out>, force=<optimized out>) at builtin-buildid-list.c:83 rib#7 cmd_buildid_list (argc=<optimized out>, argv=<optimized out>) at builtin-buildid-list.c:115 rib#8 0x00000000010a026c in run_builtin (p=0x1311f78 <commands+24>, argc=argc@entry=2, argv=argv@entry=0x3fffffff3c0) at perf.c:296 rib#9 0x000000000102bc00 in handle_internal_command (argv=<optimized out>, argc=2) at perf.c:348 rib#10 run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:392 rib#11 main (argc=<optimized out>, argv=0x3fffffff3c0) at perf.c:536 (gdb) Fix it by adding a stub event handler for namespace event. Committer testing: Further clarifying, plain using 'perf buildid-list' will not end up in a SEGFAULT when processing a perf.data file with namespace info: # perf record -a --namespaces sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.024 MB perf.data (1058 samples) ] # perf buildid-list | wc -l 38 # perf buildid-list | head -5 e2a171c7b905826fc8494f0711ba76ab6abbd604 /lib/modules/4.14.0-rc3+/build/vmlinux 874840a02d8f8a31cedd605d0b8653145472ced3 /lib/modules/4.14.0-rc3+/kernel/arch/x86/kvm/kvm-intel.ko ea7223776730cd8a22f320040aae4d54312984bc /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/i915/i915.ko 5961535e6732a8edb7f22b3f148bb2fa2e0be4b9 /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/drm.ko f045f54aa78cf1931cc893f78b6cbc52c72a8cb1 /usr/lib64/libc-2.25.so # It is only when one asks for checking what of those entries actually had samples, i.e. when we use either -H or --with-hits, that we will process all the PERF_RECORD_ events, and since tools/perf/builtin-buildid-list.c neither explicitely set a perf_tool.namespaces() callback nor the default stub was set that we end up, when processing a PERF_RECORD_NAMESPACE record, causing a SEGFAULT: # perf buildid-list -H Segmentation fault (core dumped) ^C # Reported-and-Tested-by: Thomas-Mich Richter <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Hendrik Brueckner <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas-Mich Richter <[email protected]> Fixes: f3b3614 ("perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

The dummy-hcd driver calls the gadget driver's disconnect callback under the wrong conditions. It should invoke the callback when Vbus power is turned off, but instead it does so when the D+ pullup is turned off. This can cause a deadlock in the composite core when a gadget driver is unregistered: [ 88.361471] ============================================ [ 88.362014] WARNING: possible recursive locking detected [ 88.362580] 4.14.0-rc2+ rib#9 Not tainted [ 88.363010] -------------------------------------------- [ 88.363561] v4l_id/526 is trying to acquire lock: [ 88.364062] (&(&cdev->lock)->rlock){....}, at: [<ffffffffa0547e03>] composite_disconnect+0x43/0x100 [libcomposite] [ 88.365051] [ 88.365051] but task is already holding lock: [ 88.365826] (&(&cdev->lock)->rlock){....}, at: [<ffffffffa0547b09>] usb_function_deactivate+0x29/0x80 [libcomposite] [ 88.366858] [ 88.366858] other info that might help us debug this: [ 88.368301] Possible unsafe locking scenario: [ 88.368301] [ 88.369304] CPU0 [ 88.369701] ---- [ 88.370101] lock(&(&cdev->lock)->rlock); [ 88.370623] lock(&(&cdev->lock)->rlock); [ 88.371145] [ 88.371145] *** DEADLOCK *** [ 88.371145] [ 88.372211] May be due to missing lock nesting notation [ 88.372211] [ 88.373191] 2 locks held by v4l_id/526: [ 88.373715] #0: (&(&cdev->lock)->rlock){....}, at: [<ffffffffa0547b09>] usb_function_deactivate+0x29/0x80 [libcomposite] [ 88.374814] rib#1: (&(&dum_hcd->dum->lock)->rlock){....}, at: [<ffffffffa05bd48d>] dummy_pullup+0x7d/0xf0 [dummy_hcd] [ 88.376289] [ 88.376289] stack backtrace: [ 88.377726] CPU: 0 PID: 526 Comm: v4l_id Not tainted 4.14.0-rc2+ rib#9 [ 88.378557] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 88.379504] Call Trace: [ 88.380019] dump_stack+0x86/0xc7 [ 88.380605] __lock_acquire+0x841/0x1120 [ 88.381252] lock_acquire+0xd5/0x1c0 [ 88.381865] ? composite_disconnect+0x43/0x100 [libcomposite] [ 88.382668] _raw_spin_lock_irqsave+0x40/0x54 [ 88.383357] ? composite_disconnect+0x43/0x100 [libcomposite] [ 88.384290] composite_disconnect+0x43/0x100 [libcomposite] [ 88.385490] set_link_state+0x2d4/0x3c0 [dummy_hcd] [ 88.386436] dummy_pullup+0xa7/0xf0 [dummy_hcd] [ 88.387195] usb_gadget_disconnect+0xd8/0x160 [udc_core] [ 88.387990] usb_gadget_deactivate+0xd3/0x160 [udc_core] [ 88.388793] usb_function_deactivate+0x64/0x80 [libcomposite] [ 88.389628] uvc_function_disconnect+0x1e/0x40 [usb_f_uvc] This patch changes the code to test the port-power status bit rather than the port-connect status bit when deciding whether to isue the callback. Signed-off-by: Alan Stern <[email protected]> Reported-by: David Tulloh <[email protected]> CC: <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>

The locking order of vlan_rwsem (LOCK A) and then rtnl (LOCK B), contradicts other flows such as ipoib_open possibly causing a deadlock. To prevent this deadlock heavy flush is called with RTNL locked and only then tries to acquire vlan_rwsem. This deadlock is possible only when there are child interfaces. [ 140.941758] ====================================================== [ 140.946276] WARNING: possible circular locking dependency detected [ 140.950950] 4.15.0-rc1+ rib#9 Tainted: G O [ 140.954797] ------------------------------------------------------ [ 140.959424] kworker/u32:1/146 is trying to acquire lock: [ 140.963450] (rtnl_mutex){+.+.}, at: [<ffffffffc083516a>] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib] [ 140.970006] but task is already holding lock: [ 140.975141] (&priv->vlan_rwsem){++++}, at: [<ffffffffc0834ee1>] __ipoib_ib_dev_flush+0x51/0x4e0 [ib_ipoib] [ 140.982105] which lock already depends on the new lock. [ 140.990023] the existing dependency chain (in reverse order) is: [ 140.998650] -> rib#1 (&priv->vlan_rwsem){++++}: [ 141.005276] down_read+0x4d/0xb0 [ 141.009560] ipoib_open+0xad/0x120 [ib_ipoib] [ 141.014400] __dev_open+0xcb/0x140 [ 141.017919] __dev_change_flags+0x1a4/0x1e0 [ 141.022133] dev_change_flags+0x23/0x60 [ 141.025695] devinet_ioctl+0x704/0x7d0 [ 141.029156] sock_do_ioctl+0x20/0x50 [ 141.032526] sock_ioctl+0x221/0x300 [ 141.036079] do_vfs_ioctl+0xa6/0x6d0 [ 141.039656] SyS_ioctl+0x74/0x80 [ 141.042811] entry_SYSCALL_64_fastpath+0x1f/0x96 [ 141.046891] -> #0 (rtnl_mutex){+.+.}: [ 141.051701] lock_acquire+0xd4/0x220 [ 141.055212] __mutex_lock+0x88/0x970 [ 141.058631] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib] [ 141.063160] __ipoib_ib_dev_flush+0x71/0x4e0 [ib_ipoib] [ 141.067648] process_one_work+0x1f5/0x610 [ 141.071429] worker_thread+0x4a/0x3f0 [ 141.074890] kthread+0x141/0x180 [ 141.078085] ret_from_fork+0x24/0x30 [ 141.081559] other info that might help us debug this: [ 141.088967] Possible unsafe locking scenario: [ 141.094280] CPU0 CPU1 [ 141.097953] ---- ---- [ 141.101640] lock(&priv->vlan_rwsem); [ 141.104771] lock(rtnl_mutex); [ 141.109207] lock(&priv->vlan_rwsem); [ 141.114032] lock(rtnl_mutex); [ 141.116800] *** DEADLOCK *** Fixes: b4b678b ("IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop") Signed-off-by: Alex Vesker <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>

It was reported by Sergey Senozhatsky that if THP (Transparent Huge Page) and frontswap (via zswap) are both enabled, when memory goes low so that swap is triggered, segfault and memory corruption will occur in random user space applications as follow, kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000] #0 0x00007fc08889ae0d _int_malloc (libc.so.6) rib#1 0x00007fc08889c2f3 malloc (libc.so.6) rib#2 0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt) rib#3 0x0000560e6005e75c n/a (urxvt) rib#4 0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt) rib#5 0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt) rib#6 0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt) rib#7 0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt) rib#8 0x0000560e6005cb55 ev_run (urxvt) rib#9 0x0000560e6003b9b9 main (urxvt) rib#10 0x00007fc08883af4a __libc_start_main (libc.so.6) rib#11 0x0000560e6003f9da _start (urxvt) After bisection, it was found the first bad commit is bd4c82c ("mm, THP, swap: delay splitting THP after swapped out"). The root cause is as follows: When the pages are written to swap device during swapping out in swap_writepage(), zswap (fontswap) is tried to compress the pages to improve performance. But zswap (frontswap) will treat THP as a normal page, so only the head page is saved. After swapping in, tail pages will not be restored to their original contents, causing memory corruption in the applications. This is fixed by refusing to save page in the frontswap store functions if the page is a THP. So that the THP will be swapped out to swap device. Another choice is to split THP if frontswap is enabled. But it is found that the frontswap enabling isn't flexible. For example, if CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if zswap itself isn't enabled. Frontswap has multiple backends, to make it easy for one backend to enable THP support, the THP checking is put in backend frontswap store functions instead of the general interfaces. Link: http://lkml.kernel.org/r/[email protected] Fixes: bd4c82c ("mm, THP, swap: delay splitting THP after swapped out") Signed-off-by: "Huang, Ying" <[email protected]> Reported-by: Sergey Senozhatsky <[email protected]> Tested-by: Sergey Senozhatsky <[email protected]> Suggested-by: Minchan Kim <[email protected]> [put THP checking in backend] Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Juergen Gross <[email protected]> Cc: <[email protected]> [4.14] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

This patch fixes NULL pointer crash due to active timer running for abort IOCB. From crash dump analysis it was discoverd that get_next_timer_interrupt() encountered a corrupted entry on the timer list. rib#9 [ffff95e1f6f0fd40] page_fault at ffffffff914fe8f8 [exception RIP: get_next_timer_interrupt+440] RIP: ffffffff90ea3088 RSP: ffff95e1f6f0fdf0 RFLAGS: 00010013 RAX: ffff95e1f6451028 RBX: 000218e2389e5f40 RCX: 00000001232ad600 RDX: 0000000000000001 RSI: ffff95e1f6f0fdf0 RDI: 0000000001232ad6 RBP: ffff95e1f6f0fe40 R8: ffff95e1f6451188 R9: 0000000000000001 R10: 0000000000000016 R11: 0000000000000016 R12: 00000001232ad5f6 R13: ffff95e1f6450000 R14: ffff95e1f6f0fdf8 R15: ffff95e1f6f0fe10 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Looking at the assembly of get_next_timer_interrupt(), address came from %r8 (ffff95e1f6451188) which is pointing to list_head with single entry at ffff95e5ff621178. 0xffffffff90ea307a <get_next_timer_interrupt+426>: mov (%r8),%rdx 0xffffffff90ea307d <get_next_timer_interrupt+429>: cmp %r8,%rdx 0xffffffff90ea3080 <get_next_timer_interrupt+432>: je 0xffffffff90ea30a7 <get_next_timer_interrupt+471> 0xffffffff90ea3082 <get_next_timer_interrupt+434>: nopw 0x0(%rax,%rax,1) 0xffffffff90ea3088 <get_next_timer_interrupt+440>: testb $0x1,0x18(%rdx) crash> rd ffff95e1f6451188 10 ffff95e1f6451188: ffff95e5ff621178 ffff95e5ff621178 x.b.....x.b..... ffff95e1f6451198: ffff95e1f6451198 ffff95e1f6451198 ..E.......E..... ffff95e1f64511a8: ffff95e1f64511a8 ffff95e1f64511a8 ..E.......E..... ffff95e1f64511b8: ffff95e77cf509a0 ffff95e77cf509a0 ...|.......|.... ffff95e1f64511c8: ffff95e1f64511c8 ffff95e1f64511c8 ..E.......E..... crash> rd ffff95e5ff621178 10 ffff95e5ff621178: 0000000000000001 ffff95e15936aa00 ..........6Y.... ffff95e5ff621188: 0000000000000000 00000000ffffffff ................ ffff95e5ff621198: 00000000000000a0 0000000000000010 ................ ffff95e5ff6211a8: ffff95e5ff621198 000000000000000c ..b............. ffff95e5ff6211b8: 00000f5800000000 ffff95e751f8d720 ....X... ..Q.... ffff95e5ff621178 belongs to freed mempool object at ffff95e5ff621080. CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE ffff95dc7fd74d00 mnt_cache 384 19785 24948 594 16k SLAB MEMORY NODE TOTAL ALLOCATED FREE ffffdc5dabfd8800 ffff95e5ff620000 1 42 29 13 FREE / [ALLOCATED] ffff95e5ff621080 (cpu 6 cache) Examining the contents of that memory reveals a pointer to a constant string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd(). crash> rd ffffffffc059277c 20 ffffffffc059277c: 6e490074726f6261 0074707572726574 abort.Interrupt. ffffffffc059278c: 00676e696c6c6f50 6920726576697244 Polling.Driver i ffffffffc059279c: 646f6d207325206e 6974736554000a65 n %s mode..Testi ffffffffc05927ac: 636976656420676e 786c252074612065 ng device at %lx ffffffffc05927bc: 6b63656843000a2e 646f727020676e69 ...Checking prod ffffffffc05927cc: 6f20444920746375 0a2e706968632066 uct ID of chip.. ffffffffc05927dc: 5120646e756f4600 204130303232414c .Found QLA2200A ffffffffc05927ec: 43000a2e70696843 20676e696b636568 Chip...Checking ffffffffc05927fc: 65786f626c69616d 6c636e69000a2e73 mailboxes...incl ffffffffc059280c: 756e696c2f656475 616d2d616d642f78 ude/linux/dma-ma crash> struct -ox srb_iocb struct srb_iocb { union { struct {...} logio; struct {...} els_logo; struct {...} tmf; struct {...} fxiocb; struct {...} abt; struct ct_arg ctarg; struct {...} mbx; struct {...} nack; [0x0 ] } u; [0xb8] struct timer_list timer; [0x108] void (*timeout)(void *); } SIZE: 0x110 crash> ! bc ibase=16 obase=10 B8+40 F8 The object is a srb_t, and at offset 0xf8 within that structure (i.e. ffff95e5ff621080 + f8 -> ffff95e5ff621178) is a struct timer_list. Cc: <[email protected]> rib#4.4+ Fixes: 4440e46 ("[SCSI] qla2xxx: Add IOCB Abort command asynchronous handling.") Signed-off-by: Himanshu Madhani <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

Not only is the context suspect to disappearing, but so is it's timeline. Under a lockless inspection of the requests for debugging from intel_engine_dump(), the context may already have been freed and we have to check before chasing the dangling pointer. [28033.681755] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel snd_pcm mei_me mei i915 r8169 mii prime_numbers i2c_hid [28033.681796] CPU: 3 PID: 3058 Comm: gem_exec_schedu Tainted: G U 4.16.0-rc5+ rib#9 [28033.681804] Hardware name: Acer Aspire E5-575G/Ironman_SK , BIOS V1.12 08/02/2016 [28033.681834] RIP: 0010:print_request+0x2b/0xb0 [i915] [28033.681840] RSP: 0018:ffffc90004afbc18 EFLAGS: 00010202 [28033.681847] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8801921b5a40 RCX: 0000000000000006 [28033.681854] RDX: ffffc90004afbc60 RSI: ffff8801921b5a40 RDI: 0000000000000004 [28033.681861] RBP: ffffc90004afbd80 R08: 0000000000000000 R09: 0000000000000001 [28033.681868] R10: ffffc90004afbbd0 R11: ffffc90004afbc73 R12: ffffc90004afbc60 [28033.681875] R13: ffffc90004afbd80 R14: ffff8801d40ec670 R15: ffff8801921b5a40 [28033.681883] FS: 00007fbba5f6c8c0(0000) GS:ffff8801e8400000(0000) knlGS:0000000000000000 [28033.681891] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [28033.681897] CR2: 00007fbba5f8f000 CR3: 00000001b2efa002 CR4: 00000000003606e0 [28033.681904] Call Trace: [28033.681932] intel_engine_print_registers+0x6a7/0x930 [i915] [28033.681962] intel_engine_dump+0x30d/0x740 [i915] [28033.681971] ? seq_printf+0x3a/0x50 [28033.681995] i915_engine_info+0xb8/0xe0 [i915] [28033.682003] ? drm_get_color_range_name+0x20/0x20 [28033.682010] seq_read+0xe1/0x440 [28033.682018] full_proxy_read+0x51/0x80 [28033.682025] __vfs_read+0x21/0x130 [28033.682031] ? do_sys_open+0x134/0x220 [28033.682037] ? kmem_cache_free+0x177/0x2b0 [28033.682043] vfs_read+0xa1/0x150 [28033.682049] SyS_read+0x40/0xa0 [28033.682055] do_syscall_64+0x6b/0x1b0 [28033.682063] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [28033.682069] RIP: 0033:0x7fbba4655d11 [28033.682074] RSP: 002b:00007ffd8c49da58 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [28033.682082] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbba4655d11 [28033.682089] RDX: 000000000000003f RSI: 00005647bfbfc260 RDI: 0000000000000006 [28033.682096] RBP: 000000000000003f R08: 00000000ffffffff R09: 0000000000000000 [28033.682104] R10: 0000000000000000 R11: 0000000000000246 R12: 00005647bfbfc260 [28033.682111] R13: 0000000000000006 R14: 0000000000000000 R15: 00005647bfbfc260 [28033.682119] Code: 41 55 41 54 49 89 d4 55 53 48 89 fd 48 8b 86 c8 00 00 00 48 8b 3d d6 1e 14 e2 48 89 f3 48 2b be a8 02 00 00 48 8b 80 b0 00 00 00 <4c> 8b 68 18 e8 bc 80 02 e1 8b 8b 70 02 00 00 8b b3 28 02 00 00 [28033.682206] RIP: print_request+0x2b/0xb0 [i915] RSP: ffffc90004afbc18 Reported-by: Mika Kuoppala <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

platform_domain_notifier contains a variable sized array, which the pm_clk_notify() notifier treats as a NULL terminated array: for (con_id = clknb->con_ids; *con_id; con_id++) pm_clk_add(dev, *con_id); Omitting the initialiser for con_ids means that the array is zero sized, and there is no NULL terminator. This leads to pm_clk_notify() overrunning into what ever structure follows, which may not be NULL. This leads to an oops: Unable to handle kernel NULL pointer dereference at virtual address 0000008c pgd = c0003000 [0000008c] *pgd=80000800004003c, *pmd=00000000c Internal error: Oops: 206 [rib#1] PREEMPT SMP ARM Modules linked in:c CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0+ rib#9 Hardware name: Keystone PC is at strlen+0x0/0x34 LR is at kstrdup+0x18/0x54 pc : [<c0623340>] lr : [<c0111d6c>] psr: 20000013 sp : eec73dc0 ip : eed780c0 fp : 00000001 r10: 00000000 r9 : 00000000 r8 : eed71e10 r7 : 0000008c r6 : 0000008c r5 : 014000c0 r4 : c03a6ff4 r3 : c09445d0 r2 : 00000000 r1 : 014000c0 r0 : 0000008c Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 30c5387d Table: 00003000 DAC: fffffffd Process swapper/0 (pid: 1, stack limit = 0xeec72210) Stack: (0xeec73dc0 to 0xeec74000) ... [<c0623340>] (strlen) from [<c0111d6c>] (kstrdup+0x18/0x54) [<c0111d6c>] (kstrdup) from [<c03a6ff4>] (__pm_clk_add+0x58/0x120) [<c03a6ff4>] (__pm_clk_add) from [<c03a731c>] (pm_clk_notify+0x64/0xa8) [<c03a731c>] (pm_clk_notify) from [<c004614c>] (notifier_call_chain+0x44/0x84) [<c004614c>] (notifier_call_chain) from [<c0046320>] (__blocking_notifier_call_chain+0x48/0x60) [<c0046320>] (__blocking_notifier_call_chain) from [<c0046350>] (blocking_notifier_call_chain+0x18/0x20) [<c0046350>] (blocking_notifier_call_chain) from [<c0390234>] (device_add+0x36c/0x534) [<c0390234>] (device_add) from [<c047fc00>] (of_platform_device_create_pdata+0x70/0xa4) [<c047fc00>] (of_platform_device_create_pdata) from [<c047fea0>] (of_platform_bus_create+0xf0/0x1ec) [<c047fea0>] (of_platform_bus_create) from [<c047fff8>] (of_platform_populate+0x5c/0xac) [<c047fff8>] (of_platform_populate) from [<c08b1f04>] (of_platform_default_populate_init+0x8c/0xa8) [<c08b1f04>] (of_platform_default_populate_init) from [<c000a78c>] (do_one_initcall+0x3c/0x164) [<c000a78c>] (do_one_initcall) from [<c087bd9c>] (kernel_init_freeable+0x10c/0x1d0) [<c087bd9c>] (kernel_init_freeable) from [<c0628db0>] (kernel_init+0x8/0xf0) [<c0628db0>] (kernel_init) from [<c00090d8>] (ret_from_fork+0x14/0x3c) Exception stack(0xeec73fb0 to 0xeec73ff8) 3fa0: 00000000 00000000 00000000 00000000 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 Code: e3520000 1afffff7 e12fff1e c0801730 (e5d02000) ---[ end trace cafa8f148e262e80 ]--- Fix this by adding the necessary initialiser. Fixes: fc20ffe ("ARM: keystone: add PM domain support for clock management") Signed-off-by: Russell King <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Signed-off-by: Olof Johansson <[email protected]>

For psr block rib#9, the vbt description has moved to options [0-3] for TP1,TP2,TP3 Wakeup time from decimal value without any change to vbt structure. Since spec does not mention from which VBT version this change was added to vbt.bsf file, we cannot depend on bdb->version check to change for all the platforms. There is RCR inplace for GOP team to provide the version number to make generic change. Since Kabylake with bdb version 209 is having this change, limiting this change to gen9_bc and version 209+ to unblock google. Tested on skl(bdb version 203,without options) and kabylake(bdb version 209,212) having new options. bspec 20131 v2: (Jani and Rodrigo) move the 165 version check to intel_bios.c v3: Jani Move the abstraction to intel_bios. v4: Jani Rename tp*_wakeup_time to have "us" suffix. For values outside range[0-3],default to max 2500us. Old decimal value was wake up time in multiples of 100us. v5: Jani and Rodrigo Handle option 2 in default condition. Print oustide range value. For negetive values default to 2500us. v6: Jani Handle default first and then fall through for case 2. v7: Rodrigo Apply this change for IS_GEN9_BC and vbt version > 209 v8: Puthik Add new function vbt_psr_to_us. v9: Jani Change to v7 version as it's more readable. DK add comment /*fall through*/ after case2. Cc: Rodrigo Vivi <[email protected]> Cc: Puthikorn Voravootivat <[email protected]> Cc: Dhinakaran Pandiyan <[email protected]> Cc: Jani Nikula <[email protected]> Cc: José Roberto de Souza <[email protected]> Signed-off-by: Maulik V Vaghela <[email protected]> Signed-off-by: Vathsala Nagaraju <[email protected]> Reviewed-by: Jani Nikula <[email protected]> Signed-off-by: Jani Nikula <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Defer probe of qman portals after qman probing. This fixes the crash below, seen on NXP LS1043A SoCs: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004 Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 [0000000000000004] user address but active_mm is swapper Internal error: Oops: 96000004 [rib#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc1-next-20180622-00200-g986f5c179185 rib#9 Hardware name: LS1043A RDB Board (DT) pstate: 80000005 (Nzcv daif -PAN -UAO) pc : qman_set_sdest+0x74/0xa0 lr : qman_portal_probe+0x22c/0x470 sp : ffff00000803bbc0 x29: ffff00000803bbc0 x28: 0000000000000000 x27: ffff0000090c1b88 x26: ffff00000927cb68 x25: ffff00000927c000 x24: ffff00000927cb60 x23: 0000000000000000 x22: 0000000000000000 x21: ffff0000090e9000 x20: ffff800073b5c810 x19: ffff800027401298 x18: ffffffffffffffff x17: 0000000000000001 x16: 0000000000000000 x15: ffff0000090e96c8 x14: ffff80002740138a x13: ffff0000090f2000 x12: 0000000000000030 x11: ffff000008f25000 x10: 0000000000000000 x9 : ffff80007bdfd2c0 x8 : 0000000000004000 x7 : ffff80007393cc18 x6 : 0040000000000001 x5 : 0000000000000000 x4 : ffffffffffffffff x3 : 0000000000000004 x2 : ffff00000927c900 x1 : 0000000000000000 x0 : 0000000000000004 Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____)) Call trace: qman_set_sdest+0x74/0xa0 platform_drv_probe+0x50/0xa8 driver_probe_device+0x214/0x2f8 __driver_attach+0xd8/0xe0 bus_for_each_dev+0x68/0xc8 driver_attach+0x20/0x28 bus_add_driver+0x108/0x228 driver_register+0x60/0x110 __platform_driver_register+0x40/0x48 qman_portal_driver_init+0x20/0x84 do_one_initcall+0x58/0x168 kernel_init_freeable+0x184/0x22c kernel_init+0x10/0x108 ret_from_fork+0x10/0x18 Code: f9400443 11001000 927e4800 8b000063 (b9400063) ---[ end trace 4f6d50489ecfb930 ]--- Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Signed-off-by: Laurentiu Tudor <[email protected]> Signed-off-by: Li Yang <[email protected]>

Seeing as we use this registermap in the context of our IRQ handlers, we need to be using spinlocks for reading/writing registers so that we can still read them from IRQ handlers without having to grab any mutexes and accidentally sleep. We don't currently do this, as pointed out by lockdep: [ 18.403770] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908 [ 18.406744] in_atomic(): 1, irqs_disabled(): 128, pid: 68, name: kworker/u17:0 [ 18.413864] INFO: lockdep is turned off. [ 18.417675] irq event stamp: 12 [ 18.420778] hardirqs last enabled at (11): [<ffff000008a4f57c>] _raw_spin_unlock_irq+0x2c/0x60 [ 18.429510] hardirqs last disabled at (12): [<ffff000008a48914>] __schedule+0xc4/0xa60 [ 18.437345] softirqs last enabled at (0): [<ffff0000080b55e0>] copy_process.isra.4.part.5+0x4d8/0x1c50 [ 18.446684] softirqs last disabled at (0): [<0000000000000000>] (null) [ 18.453979] CPU: 0 PID: 68 Comm: kworker/u17:0 Tainted: G W O 4.20.0-rc3Lyude-Test+ rib#9 [ 18.469839] Hardware name: amlogic khadas-vim2/khadas-vim2, BIOS 2018.07-rc2-armbian 09/11/2018 [ 18.480037] Workqueue: hci0 hci_power_on [bluetooth] [ 18.487138] Call trace: [ 18.494192] dump_backtrace+0x0/0x1b8 [ 18.501280] show_stack+0x14/0x20 [ 18.508361] dump_stack+0xbc/0xf4 [ 18.515427] ___might_sleep+0x140/0x1d8 [ 18.522515] __might_sleep+0x50/0x88 [ 18.529582] __mutex_lock+0x60/0x870 [ 18.536621] mutex_lock_nested+0x1c/0x28 [ 18.543660] regmap_lock_mutex+0x10/0x18 [ 18.550696] regmap_read+0x38/0x70 [ 18.557727] dw_hdmi_hardirq+0x58/0x138 [dw_hdmi] [ 18.564804] __handle_irq_event_percpu+0xac/0x410 [ 18.571891] handle_irq_event_percpu+0x34/0x88 [ 18.578982] handle_irq_event+0x48/0x78 [ 18.586051] handle_fasteoi_irq+0xac/0x160 [ 18.593061] generic_handle_irq+0x24/0x38 [ 18.599989] __handle_domain_irq+0x60/0xb8 [ 18.606857] gic_handle_irq+0x50/0xa0 [ 18.613659] el1_irq+0xb4/0x130 [ 18.620394] debug_lockdep_rcu_enabled+0x2c/0x30 [ 18.627111] schedule+0x38/0xa0 [ 18.633781] schedule_timeout+0x3a8/0x510 [ 18.640389] wait_for_common+0x15c/0x180 [ 18.646905] wait_for_completion+0x14/0x20 [ 18.653319] mmc_wait_for_req_done+0x28/0x168 [ 18.659693] mmc_wait_for_req+0xa8/0xe8 [ 18.665978] mmc_wait_for_cmd+0x64/0x98 [ 18.672180] mmc_io_rw_direct_host+0x94/0x130 [ 18.678385] mmc_io_rw_direct+0x10/0x18 [ 18.684516] sdio_enable_func+0xe8/0x1d0 [ 18.690627] btsdio_open+0x24/0xc0 [btsdio] [ 18.696821] hci_dev_do_open+0x64/0x598 [bluetooth] [ 18.703025] hci_power_on+0x50/0x270 [bluetooth] [ 18.709163] process_one_work+0x2a0/0x6e0 [ 18.715252] worker_thread+0x40/0x448 [ 18.721310] kthread+0x12c/0x130 [ 18.727326] ret_from_fork+0x10/0x1c [ 18.735555] ------------[ cut here ]------------ [ 18.741430] do not call blocking ops when !TASK_RUNNING; state=2 set at [<000000006265ec59>] wait_for_common+0x140/0x180 [ 18.752417] WARNING: CPU: 0 PID: 68 at kernel/sched/core.c:6096 __might_sleep+0x7c/0x88 [ 18.760553] Modules linked in: dm_mirror dm_region_hash dm_log dm_mod btsdio bluetooth snd_soc_hdmi_codec dw_hdmi_i2s_audio ecdh_generic brcmfmac brcmutil cfg80211 rfkill ir_nec_decoder meson_dw_hdmi(O) dw_hdmi rc_geekbox meson_rng meson_ir ao_cec rng_core rc_core cec leds_pwm efivars nfsd ip_tables x_tables crc32_generic f2fs uas meson_gxbb_wdt pwm_meson efivarfs ipv6 [ 18.799469] CPU: 0 PID: 68 Comm: kworker/u17:0 Tainted: G W O 4.20.0-rc3Lyude-Test+ rib#9 [ 18.808858] Hardware name: amlogic khadas-vim2/khadas-vim2, BIOS 2018.07-rc2-armbian 09/11/2018 [ 18.818045] Workqueue: hci0 hci_power_on [bluetooth] [ 18.824088] pstate: 80000085 (Nzcv daIf -PAN -UAO) [ 18.829891] pc : __might_sleep+0x7c/0x88 [ 18.835722] lr : __might_sleep+0x7c/0x88 [ 18.841256] sp : ffff000008003cb0 [ 18.846751] x29: ffff000008003cb0 x28: 0000000000000000 [ 18.852269] x27: ffff00000938e000 x26: ffff800010283000 [ 18.857726] x25: ffff800010353280 x24: ffff00000868ef50 [ 18.863166] x23: 0000000000000000 x22: 0000000000000000 [ 18.868551] x21: 0000000000000000 x20: 000000000000038c [ 18.873850] x19: ffff000008cd08c0 x18: 0000000000000010 [ 18.879081] x17: ffff000008a68cb0 x16: 0000000000000000 [ 18.884197] x15: 0000000000aaaaaa x14: 0e200e200e200e20 [ 18.889239] x13: 0000000000000001 x12: 00000000ffffffff [ 18.894261] x11: ffff000008adfa48 x10: 0000000000000001 [ 18.899517] x9 : ffff0000092a0158 x8 : 0000000000000000 [ 18.904674] x7 : ffff00000812136c x6 : 0000000000000000 [ 18.909895] x5 : 0000000000000000 x4 : 0000000000000001 [ 18.915080] x3 : 0000000000000007 x2 : 0000000000000007 [ 18.920269] x1 : 99ab8e9ebb6c8500 x0 : 0000000000000000 [ 18.925443] Call trace: [ 18.929904] __might_sleep+0x7c/0x88 [ 18.934311] __mutex_lock+0x60/0x870 [ 18.938687] mutex_lock_nested+0x1c/0x28 [ 18.943076] regmap_lock_mutex+0x10/0x18 [ 18.947453] regmap_read+0x38/0x70 [ 18.951842] dw_hdmi_hardirq+0x58/0x138 [dw_hdmi] [ 18.956269] __handle_irq_event_percpu+0xac/0x410 [ 18.960712] handle_irq_event_percpu+0x34/0x88 [ 18.965176] handle_irq_event+0x48/0x78 [ 18.969612] handle_fasteoi_irq+0xac/0x160 [ 18.974058] generic_handle_irq+0x24/0x38 [ 18.978501] __handle_domain_irq+0x60/0xb8 [ 18.982938] gic_handle_irq+0x50/0xa0 [ 18.987351] el1_irq+0xb4/0x130 [ 18.991734] debug_lockdep_rcu_enabled+0x2c/0x30 [ 18.996180] schedule+0x38/0xa0 [ 19.000609] schedule_timeout+0x3a8/0x510 [ 19.005064] wait_for_common+0x15c/0x180 [ 19.009513] wait_for_completion+0x14/0x20 [ 19.013951] mmc_wait_for_req_done+0x28/0x168 [ 19.018402] mmc_wait_for_req+0xa8/0xe8 [ 19.022809] mmc_wait_for_cmd+0x64/0x98 [ 19.027177] mmc_io_rw_direct_host+0x94/0x130 [ 19.031563] mmc_io_rw_direct+0x10/0x18 [ 19.035922] sdio_enable_func+0xe8/0x1d0 [ 19.040294] btsdio_open+0x24/0xc0 [btsdio] [ 19.044742] hci_dev_do_open+0x64/0x598 [bluetooth] [ 19.049228] hci_power_on+0x50/0x270 [bluetooth] [ 19.053687] process_one_work+0x2a0/0x6e0 [ 19.058143] worker_thread+0x40/0x448 [ 19.062608] kthread+0x12c/0x130 [ 19.067064] ret_from_fork+0x10/0x1c [ 19.071513] irq event stamp: 12 [ 19.075937] hardirqs last enabled at (11): [<ffff000008a4f57c>] _raw_spin_unlock_irq+0x2c/0x60 [ 19.083560] hardirqs last disabled at (12): [<ffff000008a48914>] __schedule+0xc4/0xa60 [ 19.091401] softirqs last enabled at (0): [<ffff0000080b55e0>] copy_process.isra.4.part.5+0x4d8/0x1c50 [ 19.100801] softirqs last disabled at (0): [<0000000000000000>] (null) [ 19.108135] ---[ end trace 38c4920787b88c75 ]--- So, fix this by enabling the fast_io option in our regmap config so that regmap uses spinlocks for locking instead of mutexes. Signed-off-by: Lyude Paul <[email protected]> Fixes: 3f68be7 ("drm/meson: Add support for HDMI encoder and DW-HDMI bridge + PHY") Cc: Daniel Vetter <[email protected]> Cc: Neil Armstrong <[email protected]> Cc: Carlo Caione <[email protected]> Cc: Kevin Hilman <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: <[email protected]> # v4.12+ Acked-by: Neil Armstrong <[email protected]> Signed-off-by: Neil Armstrong <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Sean Paul <[email protected]>

It was observed that a process blocked indefintely in __fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP to be cleared via fscache_wait_for_deferred_lookup(). At this time, ->backing_objects was empty, which would normaly prevent __fscache_read_or_alloc_page() from getting to the point of waiting. This implies that ->backing_objects was cleared *after* __fscache_read_or_alloc_page was was entered. When an object is "killed" and then "dropped", FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is ->backing_objects cleared. This leaves a window where something else can set FSCACHE_COOKIE_LOOKING_UP and __fscache_read_or_alloc_page() can start waiting, before ->backing_objects is cleared There is some uncertainty in this analysis, but it seems to be fit the observations. Adding the wake in this patch will be handled correctly by __fscache_read_or_alloc_page(), as it checks if ->backing_objects is empty again, after waiting. Customer which reported the hang, also report that the hang cannot be reproduced with this fix. The backtrace for the blocked process looked like: PID: 29360 TASK: ffff881ff2ac0f80 CPU: 3 COMMAND: "zsh" #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1 rib#1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed rib#2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8 rib#3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e rib#4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache] rib#5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache] rib#6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs] rib#7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs] rib#8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73 rib#9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs] rib#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756 rib#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa rib#12 [ffff881ff43eff18] sys_read at ffffffff811fda62 rib#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e Signed-off-by: NeilBrown <[email protected]> Signed-off-by: David Howells <[email protected]>

Function graph tracing recurses into itself when stackleak is enabled, causing the ftrace graph selftest to run for up to 90 seconds and trigger the softlockup watchdog. Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200 200 mcount_get_lr_addr x0 // pointer to function's saved lr (gdb) bt \#0 ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200 \rib#1 0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153 \rib#2 0xffffff8008555484 in stackleak_track_stack () at ../kernel/stackleak.c:106 \rib#3 0xffffff8008421ff8 in ftrace_ops_test (ops=0xffffff8009eaa840 <graph_ops>, ip=18446743524091297036, regs=<optimized out>) at ../kernel/trace/ftrace.c:1507 \rib#4 0xffffff8008428770 in __ftrace_ops_list_func (regs=<optimized out>, ignored=<optimized out>, parent_ip=<optimized out>, ip=<optimized out>) at ../kernel/trace/ftrace.c:6286 \rib#5 ftrace_ops_no_ops (ip=18446743524091297036, parent_ip=18446743524091242824) at ../kernel/trace/ftrace.c:6321 \rib#6 0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153 \rib#7 0xffffff800832fd10 in irq_find_mapping (domain=0xffffffc03fc4bc80, hwirq=27) at ../kernel/irq/irqdomain.c:876 \rib#8 0xffffff800832294c in __handle_domain_irq (domain=0xffffffc03fc4bc80, hwirq=27, lookup=true, regs=0xffffff800814b840) at ../kernel/irq/irqdesc.c:650 \rib#9 0xffffff80081d52b4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:205 Rework so we mark stackleak_track_stack as notrace Co-developed-by: Arnd Bergmann <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Anders Roxell <[email protected]> Acked-by: Steven Rostedt (VMware) <[email protected]> Signed-off-by: Kees Cook <[email protected]>

The *_frag_reasm() functions are susceptible to miscalculating the byte count of packet fragments in case the truesize of a head buffer changes. The truesize member may be changed by the call to skb_unclone(), leaving the fragment memory limit counter unbalanced even if all fragments are processed. This miscalculation goes unnoticed as long as the network namespace which holds the counter is not destroyed. Should an attempt be made to destroy a network namespace that holds an unbalanced fragment memory limit counter the cleanup of the namespace never finishes. The thread handling the cleanup gets stuck in inet_frags_exit_net() waiting for the percpu counter to reach zero. The thread is usually in running state with a stacktrace similar to: PID: 1073 TASK: ffff880626711440 CPU: 1 COMMAND: "kworker/u48:4" rib#5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480 rib#6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b rib#7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c rib#8 [ffff880621563db0] ops_exit_list at ffffffff814f5856 rib#9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0 rib#10 [ffff880621563e38] process_one_work at ffffffff81096f14 It is not possible to create new network namespaces, and processes that call unshare() end up being stuck in uninterruptible sleep state waiting to acquire the net_mutex. The bug was observed in the IPv6 netfilter code by Per Sundstrom. I thank him for his analysis of the problem. The parts of this patch that apply to IPv4 and IPv6 fragment reassembly are preemptive measures. Signed-off-by: Jiri Wiesner <[email protected]> Reported-by: Per Sundstrom <[email protected]> Acked-by: Peter Oskolkov <[email protected]> Signed-off-by: David S. Miller <[email protected]>

When option CONFIG_KASAN is enabled toghether with ftrace, function ftrace_graph_caller() gets in to a recursion, via functions kasan_check_read() and kasan_check_write(). Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179 179 mcount_get_pc x0 // function's pc (gdb) bt #0 ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179 rib#1 0xffffff90101406c8 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:151 rib#2 0xffffff90106fd084 in kasan_check_write (p=0xffffffc06c170878, size=4) at ../mm/kasan/common.c:105 rib#3 0xffffff90104a2464 in atomic_add_return (v=<optimized out>, i=<optimized out>) at ./include/generated/atomic-instrumented.h:71 rib#4 atomic_inc_return (v=<optimized out>) at ./include/generated/atomic-fallback.h:284 rib#5 trace_graph_entry (trace=0xffffffc03f5ff380) at ../kernel/trace/trace_functions_graph.c:441 rib#6 0xffffff9010481774 in trace_graph_entry_watchdog (trace=<optimized out>) at ../kernel/trace/trace_selftest.c:741 rib#7 0xffffff90104a185c in function_graph_enter (ret=<optimized out>, func=<optimized out>, frame_pointer=18446743799894897728, retp=<optimized out>) at ../kernel/trace/trace_functions_graph.c:196 rib#8 0xffffff9010140628 in prepare_ftrace_return (self_addr=18446743592948977792, parent=0xffffffc03f5ff418, frame_pointer=18446743799894897728) at ../arch/arm64/kernel/ftrace.c:231 rib#9 0xffffff90101406f4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:182 Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb) Rework so that the kasan implementation isn't traced. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Anders Roxell <[email protected]> Acked-by: Dmitry Vyukov <[email protected]> Tested-by: Dmitry Vyukov <[email protected]> Acked-by: Steven Rostedt (VMware) <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

…_map Detected via gcc's ASan: Direct leak of 2048 byte(s) in 64 object(s) allocated from: 6 #0 0x7f606512e370 in __interceptor_realloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee370) 7 rib#1 0x556b0f1d7ddd in thread_map__realloc util/thread_map.c:43 8 rib#2 0x556b0f1d84c7 in thread_map__new_by_tid util/thread_map.c:85 9 rib#3 0x556b0f0e045e in is_event_supported util/parse-events.c:2250 10 rib#4 0x556b0f0e1aa1 in print_hwcache_events util/parse-events.c:2382 11 rib#5 0x556b0f0e3231 in print_events util/parse-events.c:2514 12 rib#6 0x556b0ee0a66e in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58 13 rib#7 0x556b0f01e0ae in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 14 rib#8 0x556b0f01e859 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 15 rib#9 0x556b0f01edc8 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 16 rib#10 0x556b0f01f71f in main /home/changbin/work/linux/tools/perf/perf.c:520 17 rib#11 0x7f6062ccf09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Fixes: 8989605 ("perf tools: Do not put a variable sized type not at the end of a struct") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

Detected with gcc's ASan: Direct leak of 66 byte(s) in 5 object(s) allocated from: #0 0x7ff3b1f32070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070) rib#1 0x560c8761034d in collect_config util/config.c:597 rib#2 0x560c8760d9cb in get_value util/config.c:169 rib#3 0x560c8760dfd7 in perf_parse_file util/config.c:285 rib#4 0x560c8760e0d2 in perf_config_from_file util/config.c:476 rib#5 0x560c876108fd in perf_config_set__init util/config.c:661 rib#6 0x560c87610c72 in perf_config_set__new util/config.c:709 rib#7 0x560c87610d2f in perf_config__init util/config.c:718 rib#8 0x560c87610e5d in perf_config util/config.c:730 rib#9 0x560c875ddea0 in main /home/changbin/work/linux/tools/perf/perf.c:442 rib#10 0x7ff3afb8609a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Taeung Song <[email protected]> Fixes: 20105ca ("perf config: Introduce perf_config_set class") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

Detected with gcc's ASan: Direct leak of 4356 byte(s) in 120 object(s) allocated from: #0 0x7ff1a2b5a070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070) rib#1 0x55719aef4814 in build_id_cache__origname util/build-id.c:215 rib#2 0x55719af649b6 in print_sdt_events util/parse-events.c:2339 rib#3 0x55719af66272 in print_events util/parse-events.c:2542 rib#4 0x55719ad1ecaa in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58 rib#5 0x55719aec745d in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 rib#6 0x55719aec7d1a in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 rib#7 0x55719aec8184 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 rib#8 0x55719aeca41a in main /home/changbin/work/linux/tools/perf/perf.c:520 rib#9 0x7ff1a07ae09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Fixes: 40218da ("perf list: Show SDT and pre-cached events") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

…r-free issue The evlist should be destroyed before the perf session. Detected with gcc's ASan: ================================================================= ==27350==ERROR: AddressSanitizer: heap-use-after-free on address 0x62b000002e38 at pc 0x5611da276999 bp 0x7ffce8f1d1a0 sp 0x7ffce8f1d190 WRITE of size 8 at 0x62b000002e38 thread T0 #0 0x5611da276998 in __list_del /home/work/linux/tools/include/linux/list.h:89 rib#1 0x5611da276d4a in __list_del_entry /home/work/linux/tools/include/linux/list.h:102 rib#2 0x5611da276e77 in list_del_init /home/work/linux/tools/include/linux/list.h:145 rib#3 0x5611da2781cd in thread__put util/thread.c:130 rib#4 0x5611da2cc0a8 in __thread__zput util/thread.h:68 rib#5 0x5611da2d2dcb in hist_entry__delete util/hist.c:1148 rib#6 0x5611da2cdf91 in hists__delete_entry util/hist.c:337 rib#7 0x5611da2ce19e in hists__delete_entries util/hist.c:365 rib#8 0x5611da2db2ab in hists__delete_all_entries util/hist.c:2639 rib#9 0x5611da2db325 in hists_evsel__exit util/hist.c:2651 rib#10 0x5611da1c5352 in perf_evsel__exit util/evsel.c:1304 rib#11 0x5611da1c5390 in perf_evsel__delete util/evsel.c:1309 rib#12 0x5611da1b35f0 in perf_evlist__purge util/evlist.c:124 rib#13 0x5611da1b38e2 in perf_evlist__delete util/evlist.c:148 rib#14 0x5611da069781 in cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1645 rib#15 0x5611da17d038 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 rib#16 0x5611da17d577 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 rib#17 0x5611da17d97b in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 rib#18 0x5611da17e0e9 in main /home/changbin/work/linux/tools/perf/perf.c:520 rib#19 0x7fdcc970f09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) rib#20 0x5611d9ff35c9 in _start (/home/work/linux/tools/perf/perf+0x3e95c9) 0x62b000002e38 is located 11320 bytes inside of 27448-byte region [0x62b000000200,0x62b000006d38) freed by thread T0 here: #0 0x7fdccb04ab70 in free (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedb70) rib#1 0x5611da260df4 in perf_session__delete util/session.c:201 rib#2 0x5611da063de5 in __cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1300 rib#3 0x5611da06973c in cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1642 rib#4 0x5611da17d038 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 rib#5 0x5611da17d577 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 rib#6 0x5611da17d97b in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 rib#7 0x5611da17e0e9 in main /home/changbin/work/linux/tools/perf/perf.c:520 rib#8 0x7fdcc970f09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) previously allocated by thread T0 here: #0 0x7fdccb04b138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) rib#1 0x5611da26010c in zalloc util/util.h:23 rib#2 0x5611da260824 in perf_session__new util/session.c:118 rib#3 0x5611da0633a6 in __cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1192 rib#4 0x5611da06973c in cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1642 rib#5 0x5611da17d038 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 rib#6 0x5611da17d577 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 rib#7 0x5611da17d97b in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 rib#8 0x5611da17e0e9 in main /home/changbin/work/linux/tools/perf/perf.c:520 rib#9 0x7fdcc970f09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) SUMMARY: AddressSanitizer: heap-use-after-free /home/work/linux/tools/include/linux/list.h:89 in __list_del Shadow bytes around the buggy address: 0x0c567fff8570: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff8580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff8590: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff85a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff85b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd =>0x0c567fff85c0: fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd fd 0x0c567fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff85f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff8610: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==27350==ABORTING Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

Using gcc's ASan, Changbin reports: ================================================================= ==7494==ERROR: LeakSanitizer: detected memory leaks Direct leak of 48 byte(s) in 1 object(s) allocated from: #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) rib#1 0x5625e5330a5e in zalloc util/util.h:23 rib#2 0x5625e5330a9b in perf_counts__new util/counts.c:10 rib#3 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47 rib#4 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505 rib#5 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347 rib#6 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47 rib#7 0x5625e51528e6 in run_test tests/builtin-test.c:358 rib#8 0x5625e5152baf in test_and_print tests/builtin-test.c:388 rib#9 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 rib#10 0x5625e515572f in cmd_test tests/builtin-test.c:722 rib#11 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 rib#12 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 rib#13 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 rib#14 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 rib#15 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Indirect leak of 72 byte(s) in 1 object(s) allocated from: #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) rib#1 0x5625e532560d in zalloc util/util.h:23 rib#2 0x5625e532566b in xyarray__new util/xyarray.c:10 rib#3 0x5625e5330aba in perf_counts__new util/counts.c:15 rib#4 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47 rib#5 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505 rib#6 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347 rib#7 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47 rib#8 0x5625e51528e6 in run_test tests/builtin-test.c:358 rib#9 0x5625e5152baf in test_and_print tests/builtin-test.c:388 rib#10 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 rib#11 0x5625e515572f in cmd_test tests/builtin-test.c:722 rib#12 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 rib#13 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 rib#14 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 rib#15 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 rib#16 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) His patch took care of evsel->prev_raw_counts, but the above backtraces are about evsel->counts, so fix that instead. Reported-by: Changbin Du <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

…_event_on_all_cpus test ================================================================= ==7497==ERROR: LeakSanitizer: detected memory leaks Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f0333a88f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30) rib#1 0x5625e5326213 in cpu_map__trim_new util/cpumap.c:45 rib#2 0x5625e5326703 in cpu_map__read util/cpumap.c:103 rib#3 0x5625e53267ef in cpu_map__read_all_cpu_map util/cpumap.c:120 rib#4 0x5625e5326915 in cpu_map__new util/cpumap.c:135 rib#5 0x5625e517b355 in test__openat_syscall_event_on_all_cpus tests/openat-syscall-all-cpus.c:36 rib#6 0x5625e51528e6 in run_test tests/builtin-test.c:358 rib#7 0x5625e5152baf in test_and_print tests/builtin-test.c:388 rib#8 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 rib#9 0x5625e515572f in cmd_test tests/builtin-test.c:722 rib#10 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 rib#11 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 rib#12 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 rib#13 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 rib#14 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Fixes: f30a79b ("perf tools: Add reference counting for cpu_map object") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

================================================================= ==7506==ERROR: LeakSanitizer: detected memory leaks Direct leak of 13 byte(s) in 3 object(s) allocated from: #0 0x7f03339d6070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070) rib#1 0x5625e53aaef0 in expr__find_other util/expr.y:221 rib#2 0x5625e51bcd3f in test__expr tests/expr.c:52 rib#3 0x5625e51528e6 in run_test tests/builtin-test.c:358 rib#4 0x5625e5152baf in test_and_print tests/builtin-test.c:388 rib#5 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 rib#6 0x5625e515572f in cmd_test tests/builtin-test.c:722 rib#7 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 rib#8 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 rib#9 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 rib#10 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 rib#11 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Fixes: 0751673 ("perf tools: Add a simple expression parser for JSON") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

================================================================= ==20875==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1160 byte(s) in 1 object(s) allocated from: #0 0x7f1b6fc84138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) rib#1 0x55bd50005599 in zalloc util/util.h:23 rib#2 0x55bd500068f5 in perf_evsel__newtp_idx util/evsel.c:327 rib#3 0x55bd4ff810fc in perf_evsel__newtp /home/work/linux/tools/perf/util/evsel.h:216 rib#4 0x55bd4ff81608 in test__perf_evsel__tp_sched_test tests/evsel-tp-sched.c:69 rib#5 0x55bd4ff528e6 in run_test tests/builtin-test.c:358 rib#6 0x55bd4ff52baf in test_and_print tests/builtin-test.c:388 rib#7 0x55bd4ff543fe in __cmd_test tests/builtin-test.c:583 rib#8 0x55bd4ff5572f in cmd_test tests/builtin-test.c:722 rib#9 0x55bd4ffc4087 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 rib#10 0x55bd4ffc45c6 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 rib#11 0x55bd4ffc49ca in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 rib#12 0x55bd4ffc5138 in main /home/changbin/work/linux/tools/perf/perf.c:520 rib#13 0x7f1b6e34809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Indirect leak of 19 byte(s) in 1 object(s) allocated from: #0 0x7f1b6fc83f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30) rib#1 0x7f1b6e3ac30f in vasprintf (/lib/x86_64-linux-gnu/libc.so.6+0x8830f) Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Fixes: 6a6cd11 ("perf test: Add test for the sched tracepoint format fields") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

By calling maps__insert() we assume to get 2 references on the map, which we relese within maps__remove call. However if there's already same map name, we currently don't bump the reference and can crash, like: Program received signal SIGABRT, Aborted. 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6 rib#1 0x00007ffff75d0895 in abort () from /lib64/libc.so.6 rib#2 0x00007ffff75d0769 in __assert_fail_base.cold () from /lib64/libc.so.6 rib#3 0x00007ffff75de596 in __assert_fail () from /lib64/libc.so.6 rib#4 0x00000000004fc006 in refcount_sub_and_test (i=1, r=0x1224e88) at tools/include/linux/refcount.h:131 rib#5 refcount_dec_and_test (r=0x1224e88) at tools/include/linux/refcount.h:148 rib#6 map__put (map=0x1224df0) at util/map.c:299 rib#7 0x00000000004fdb95 in __maps__remove (map=0x1224df0, maps=0xb17d80) at util/map.c:953 rib#8 maps__remove (maps=0xb17d80, map=0x1224df0) at util/map.c:959 rib#9 0x00000000004f7d8a in map_groups__remove (map=<optimized out>, mg=<optimized out>) at util/map_groups.h:65 rib#10 machine__process_ksymbol_unregister (sample=<optimized out>, event=0x7ffff7279670, machine=<optimized out>) at util/machine.c:728 rib#11 machine__process_ksymbol (machine=<optimized out>, event=0x7ffff7279670, sample=<optimized out>) at util/machine.c:741 rib#12 0x00000000004fffbb in perf_session__deliver_event (session=0xb11390, event=0x7ffff7279670, tool=0x7fffffffc7b0, file_offset=13936) at util/session.c:1362 rib#13 0x00000000005039bb in do_flush (show_progress=false, oe=0xb17e80) at util/ordered-events.c:243 rib#14 __ordered_events__flush (oe=0xb17e80, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:322 rib#15 0x00000000005005e4 in perf_session__process_user_event (session=session@entry=0xb11390, event=event@entry=0x7ffff72a4af8, ... Add the map to the list and getting the reference event if we find the map with same name. Signed-off-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Eric Saint-Etienne <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Fixes: 1e62856 ("perf symbols: Fix slowness due to -ffunction-section") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

… allocation After memory allocation failure vc_allocate() doesn't clean up data which has been initialized in visual_init(). In case of fbcon this leads to divide-by-0 in fbcon_init() on next open of the same tty. memory allocation in vc_allocate() may fail here: 1097: vc->vc_screenbuf = kzalloc(vc->vc_screenbuf_size, GFP_KERNEL); on next open() fbcon_init() skips vc_font.data initialization: 1088: if (!p->fontdata) { division by zero in fbcon_init() happens here: 1149: new_cols /= vc->vc_font.width; Additional check is needed in fbcon_deinit() to prevent usage of uninitialized vc_screenbuf: 1251: if (vc->vc_hi_font_mask && vc->vc_screenbuf) 1252: set_vc_hi_font(vc, false); Crash: rib#6 [ffffc90001eafa60] divide_error at ffffffff81a00be4 [exception RIP: fbcon_init+463] RIP: ffffffff814b860f RSP: ffffc90001eafb18 RFLAGS: 00010246 ... rib#7 [ffffc90001eafb60] visual_init at ffffffff8154c36e rib#8 [ffffc90001eafb80] vc_allocate at ffffffff8154f53c rib#9 [ffffc90001eafbc8] con_install at ffffffff8154f624 ... Signed-off-by: Grzegorz Halat <[email protected]> Reviewed-by: Oleksandr Natalenko <[email protected]> Acked-by: Bartlomiej Zolnierkiewicz <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Make sure to return a proper negative error code from copy_process() when anon_inode_getfile() fails with CLONE_PIDFD. Otherwise _do_fork() will not detect an error and get_task_pid() will operator on a nonsensical pointer: R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc2c R13: 00007ffc15fbb0ff R14: 00007ff07e47e9c0 R15: 0000000000000000 kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [rib#1] PREEMPT SMP KASAN CPU: 1 PID: 7990 Comm: syz-executor290 Not tainted 5.2.0-rc6+ rib#9 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline] RIP: 0010:get_task_pid+0xe1/0x210 kernel/pid.c:372 Code: 89 ff e8 62 27 5f 00 49 8b 07 44 89 f1 4c 8d bc c8 90 01 00 00 eb 0c e8 0d fe 25 00 49 81 c7 38 05 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 18 00 74 08 4c 89 ff e8 31 27 5f 00 4d 8b 37 e8 f9 47 12 00 RSP: 0018:ffff88808a4a7d78 EFLAGS: 00010203 RAX: 00000000000000a7 RBX: dffffc0000000000 RCX: ffff888088180600 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88808a4a7d90 R08: ffffffff814fb3a8 R09: ffffed1015d66bf8 R10: ffffed1015d66bf8 R11: 1ffff11015d66bf7 R12: 0000000000041ffc R13: 1ffff11011494fbc R14: 0000000000000000 R15: 000000000000053d FS: 00007ff07e47e700(0000) GS:ffff8880aeb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004b5100 CR3: 0000000094df2000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: _do_fork+0x1b9/0x5f0 kernel/fork.c:2360 __do_sys_clone kernel/fork.c:2454 [inline] __se_sys_clone kernel/fork.c:2448 [inline] __x64_sys_clone+0xc1/0xd0 kernel/fork.c:2448 do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe Link: https://lore.kernel.org/lkml/[email protected] Reported-and-tested-by: [email protected] Fixes: 6fd2fe4 ("copy_process(): don't use ksys_close() on cleanups") Cc: Jann Horn <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Christian Brauner <[email protected]>

This fixes the following error caused by a race condition between phydev->adjust_link() and a MDIO transaction in the phy interrupt handler. The issue was reproduced with the ethernet FEC driver and a micrel KSZ9031 phy. [ 146.195696] fec 2188000.ethernet eth0: MDIO read timeout [ 146.201779] ------------[ cut here ]------------ [ 146.206671] WARNING: CPU: 0 PID: 571 at drivers/net/phy/phy.c:942 phy_error+0x24/0x6c [ 146.214744] Modules linked in: bnep imx_vdoa imx_sdma evbug [ 146.220640] CPU: 0 PID: 571 Comm: irq/128-2188000 Not tainted 5.18.0-rc3-00080-gd569e86915b7 rib#9 [ 146.229563] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 146.236257] unwind_backtrace from show_stack+0x10/0x14 [ 146.241640] show_stack from dump_stack_lvl+0x58/0x70 [ 146.246841] dump_stack_lvl from __warn+0xb4/0x24c [ 146.251772] __warn from warn_slowpath_fmt+0x5c/0xd4 [ 146.256873] warn_slowpath_fmt from phy_error+0x24/0x6c [ 146.262249] phy_error from kszphy_handle_interrupt+0x40/0x48 [ 146.268159] kszphy_handle_interrupt from irq_thread_fn+0x1c/0x78 [ 146.274417] irq_thread_fn from irq_thread+0xf0/0x1dc [ 146.279605] irq_thread from kthread+0xe4/0x104 [ 146.284267] kthread from ret_from_fork+0x14/0x28 [ 146.289164] Exception stack(0xe6fa1fb0 to 0xe6fa1ff8) [ 146.294448] 1fa0: 00000000 00000000 00000000 00000000 [ 146.302842] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 146.311281] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 146.318262] irq event stamp: 12325 [ 146.321780] hardirqs last enabled at (12333): [<c01984c4>] __up_console_sem+0x50/0x60 [ 146.330013] hardirqs last disabled at (12342): [<c01984b0>] __up_console_sem+0x3c/0x60 [ 146.338259] softirqs last enabled at (12324): [<c01017f0>] __do_softirq+0x2c0/0x624 [ 146.346311] softirqs last disabled at (12319): [<c01300ac>] __irq_exit_rcu+0x138/0x178 [ 146.354447] ---[ end trace 0000000000000000 ]--- With the FEC driver phydev->adjust_link() calls fec_enet_adjust_link() calls fec_stop()/fec_restart() and both these function reset and temporary disable the FEC disrupting any MII transaction that could be happening at the same time. fec_enet_adjust_link() and phy_read() can be running at the same time when we have one additional interrupt before the phy_state_machine() is able to terminate. Thread 1 (phylib WQ) | Thread 2 (phy interrupt) | | phy_interrupt() <-- PHY IRQ | handle_interrupt() | phy_read() | phy_trigger_machine() | --> schedule phylib WQ | | phy_state_machine() | phy_check_link_status() | phy_link_change() | phydev->adjust_link() | fec_enet_adjust_link() | --> FEC reset | phy_interrupt() <-- PHY IRQ | phy_read() | Fix this by acquiring the phydev lock in phy_interrupt(). Link: https://lore.kernel.org/all/[email protected]/ Fixes: c974bdb ("net: phy: Use threaded IRQ, to allow IRQ from sleeping devices") cc: <[email protected]> Signed-off-by: Francesco Dolcini <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Do not allow to write timestamps on RX rings if PF is being configured. When PF is being configured RX rings can be freed or rebuilt. If at the same time timestamps are updated, the kernel will crash by dereferencing null RX ring pointer. PID: 1449 TASK: ff187d28ed658040 CPU: 34 COMMAND: "ice-ptp-0000:51" #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be rib#1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d rib#2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd rib#3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54 rib#4 [ff1966a94a713d08] no_context at ffffffff9d06bda4 rib#5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c rib#6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4 rib#7 [ff1966a94a713de0] page_fault at ffffffff9da0107e [exception RIP: ice_ptp_update_cached_phctime+91] RIP: ffffffffc076db8b RSP: ff1966a94a713e98 RFLAGS: 00010246 RAX: 16e3db9c6b7ccae4 RBX: ff187d269dd3c180 RCX: ff187d269cd4d018 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ff187d269cfcc644 R8: ff187d339b9641b0 R9: 0000000000000000 R10: 0000000000000002 R11: 0000000000000000 R12: ff187d269cfcc648 R13: ffffffff9f128784 R14: ffffffff9d101b70 R15: ff187d269cfcc640 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 rib#8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice] rib#9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b rib#10 [ff1966a94a713f10] kthread at ffffffff9d101b4d rib#11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f Fixes: 77a7811 ("ice: enable receive hardware timestamping") Signed-off-by: Arkadiusz Kubalewski <[email protected]> Reviewed-by: Michal Schmidt <[email protected]> Tested-by: Dave Cain <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

consider interaction between i915 perf an cmd parser #9

consider interaction between i915 perf an cmd parser #9

rib commented Feb 4, 2016

consider interaction between i915 perf an cmd parser #9

consider interaction between i915 perf an cmd parser #9

Comments

rib commented Feb 4, 2016