syzkaller: possible deadlock in `__dev_queue_xmit` #451

cpaasch · 2023-10-24T02:40:59Z

Probably similar to #447

HEAD: cd8bdf563d46

syzkaller-id: e4b86bbe048203bc8163952a662f5ccc68a5ade1

Trace:

============================================
WARNING: possible recursive locking detected
6.6.0-rc5-gcd8bdf563d46 #60 Not tainted
--------------------------------------------
syz-executor.4/25943 is trying to acquire lock:
ffff8880427fd908 (&sch->q.lock){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff8880427fd908 (&sch->q.lock){+.-.}-{2:2}, at: __dev_xmit_skb net/core/dev.c:3807 [inline]
ffff8880427fd908 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0xaf9/0x3790 net/core/dev.c:4315

but task is already holding lock:
ffff8880427fd908 (&sch->q.lock){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff8880427fd908 (&sch->q.lock){+.-.}-{2:2}, at: __dev_xmit_skb net/core/dev.c:3807 [inline]
ffff8880427fd908 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0xaf9/0x3790 net/core/dev.c:4315

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&sch->q.lock);
  lock(&sch->q.lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

7 locks held by syz-executor.4/25943:
 #0: ffff88803cb30130 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1720 [inline]
 #0: ffff88803cb30130 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x71/0x1510 net/mptcp/protocol.c:1786
 #1: ffff88805855d6f0 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1720 [inline]
 #1: ffff88805855d6f0 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0xcc/0x4a0 net/mptcp/protocol.c:1731
 #2: ffffffff8500fbc0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:303 [inline]
 #2: ffffffff8500fbc0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:749 [inline]
 #2: ffffffff8500fbc0 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5d/0x1620 net/ipv4/ip_output.c:468
 #3: ffffffff8500fbc0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:303 [inline]
 #3: ffffffff8500fbc0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:749 [inline]
 #3: ffffffff8500fbc0 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x3d9/0x1040 net/ipv4/ip_output.c:226
 #4: ffffffff8500fc20 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline]
 #4: ffffffff8500fc20 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:801 [inline]
 #4: ffffffff8500fc20 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x248/0x3790 net/core/dev.c:4274
 #5: ffff8880427fd908 (&sch->q.lock){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
 #5: ffff8880427fd908 (&sch->q.lock){+.-.}-{2:2}, at: __dev_xmit_skb net/core/dev.c:3807 [inline]
 #5: ffff8880427fd908 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0xaf9/0x3790 net/core/dev.c:4315
 #6: ffffffff8500fc20 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline]
 #6: ffffffff8500fc20 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:801 [inline]
 #6: ffffffff8500fc20 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x248/0x3790 net/core/dev.c:4274

stack backtrace:
CPU: 0 PID: 25943 Comm: syz-executor.4 Not tainted 6.6.0-rc5-gcd8bdf563d46 #60
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xdd/0x130 lib/dump_stack.c:106
 __lock_acquire+0x5b99/0x7990 kernel/locking/lockdep.c:3062
 lock_acquire+0x14d/0x3e0 kernel/locking/lockdep.c:5753
 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
 _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
 spin_lock include/linux/spinlock.h:351 [inline]
 __dev_xmit_skb net/core/dev.c:3807 [inline]
 __dev_queue_xmit+0xaf9/0x3790 net/core/dev.c:4315
 sch_frag_xmit_hook+0x216/0x1b40 net/sched/sch_frag.c:148
 tcf_mirred_act+0xa8d/0x1290 net/sched/act_mirred.c:219
 tcf_action_exec+0x356/0x930 include/net/tc_wrapper.h:58
 basic_classify+0x1ad/0x2c0 include/net/pkt_cls.h:344
 tcf_classify+0x787/0x1130 include/net/tc_wrapper.h:185
 fq_codel_enqueue+0x160/0x1400 net/sched/sch_fq_codel.c:94
 dev_qdisc_enqueue+0x4d/0x210 net/core/dev.c:3743
 __dev_queue_xmit+0xcf8/0x3790 net/core/dev.c:3832
 NF_HOOK+0x33a/0x3d0 include/linux/netfilter.h:304
 arp_solicit+0xad8/0xcb0 net/ipv4/arp.c:392
 neigh_probe net/core/neighbour.c:1066 [inline]
 __neigh_event_send+0xe26/0x1420 net/core/neighbour.c:1233
 neigh_resolve_output+0x1b4/0x720 include/net/neighbour.h:466
 ip_finish_output2+0xc06/0x1040 include/net/neighbour.h:542
 __ip_queue_xmit+0xea2/0x1620 net/ipv4/ip_output.c:533
 __tcp_transmit_skb+0x2042/0x31c0 net/ipv4/tcp_output.c:1408
 tcp_connect+0x30d0/0x4ed0 net/ipv4/tcp_output.c:1426
 tcp_v4_connect+0x1032/0x19f0 net/ipv4/tcp_ipv4.c:323
 mptcp_connect+0x3fc/0xa90 net/mptcp/protocol.c:3706
 __inet_stream_connect+0x1ee/0xcd0 net/ipv4/af_inet.c:675
 tcp_sendmsg_fastopen+0x39d/0x5d0 net/ipv4/tcp.c:1023
 mptcp_sendmsg_fastopen+0x124/0x4a0 net/mptcp/protocol.c:1734
 mptcp_sendmsg+0x12f5/0x1510 net/mptcp/protocol.c:1792
 __sock_sendmsg+0x15e/0x230 net/socket.c:730
 ____sys_sendmsg+0x49f/0x710 net/socket.c:2558
 ___sys_sendmsg+0x1c4/0x230 net/socket.c:2612
 __sys_sendmmsg+0x1f0/0x420 net/socket.c:2698
 __do_sys_sendmmsg net/socket.c:2727 [inline]
 __se_sys_sendmmsg net/socket.c:2724 [inline]
 __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2724
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7f90c6d336a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007f90c6060cd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00000000006bbf80 RCX: 00007f90c6d336a9
RDX: 0000000000000001 RSI: 0000000020000300 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000020000001 R11: 0000000000000246 R12: 00000000006bbf8c
R13: fffffffffffffea8 R14: 00000000006bbf80 R15: 000000000001fe40
 </TASK>
netlink: 16 bytes leftover after parsing attributes in process `syz-executor.6'.
netlink: 8 bytes leftover after parsing attributes in process `syz-executor.5'.

No reproducer.

This is running with KASAN enabled.

Kconfig:
Kconfig_k5_lockdep.txt


pabeni · 2023-10-24T06:26:50Z

This does not look mptcp related.

It looks more like a TC loop due to mirred usage. @dcaratti: I thought such issues had already been addressed by commit ca22da2, but it looks like that commit only addresses ingress; this case is egress -> egress.

dcaratti · 2023-10-24T08:15:40Z

> This does not look mptcp related.
>
> It's more like a TC loop due to mirred usage. @dcaratti: I thought such issues have been addressed already with commit ca22da2, but it looks like the latter only address ingress, this is for egress -> egress.

@pabeni @matttbe this is a long-standing issue with qdiscs: the qdisc lock has no lockdep annotation. So, if you set up a qdisc on a device (eth0) and then do mirred egress to another device (eth1) that also has a qdisc, lockdep will complain, because enqueue on eth1 takes the eth1 qdisc root lock while the qdisc root lock for eth0 is already held. I wrote a shell script with a topology that reproduces this; I will share it here.
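
One rough way to tell the two situations apart from a report is to compare the address printed for the lock being acquired with the address of the lock already held: different addresses mean two distinct `sch->q.lock` instances that share one lockdep class, while identical addresses point at the same lock being taken again. The sketch below does that comparison on a splat read from stdin. `classify_splat` is a hypothetical helper, not an existing tool, and it assumes the report layout shown in this issue (with or without dmesg timestamps).

```bash
#!/bin/bash
# classify_splat (hypothetical helper): read a lockdep "possible recursive
# locking" report on stdin and compare the address of the lock being
# acquired with the address of the lock already held.
classify_splat() {
	awk '
	/is trying to acquire lock:/        { want = "acq";  next }
	/but task is already holding lock:/ { want = "held"; next }
	want != "" {
		line = $0
		# drop an optional dmesg timestamp such as "[ 1471.540428] "
		sub(/^\[ *[0-9]+\.[0-9]+\] */, "", line)
		n = split(line, f, " ")
		if (n > 0 && f[1] ~ /^[0-9a-f]+$/)
			addr[want] = f[1]
		want = ""
	}
	END {
		if (addr["acq"] == "" || addr["held"] == "") {
			print "unknown"
		} else if (addr["acq"] == addr["held"]) {
			print "same-instance"
		} else {
			print "different-instances"
		}
	}'
}

# Demo with the two addresses from the report at the top of this issue.
classify_splat <<'EOF'
syz-executor.4/25943 is trying to acquire lock:
ffff8880427fd908 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0xaf9/0x3790 net/core/dev.c:4315

but task is already holding lock:
ffff8880427fd908 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0xaf9/0x3790 net/core/dev.c:4315
EOF
```

On a live system something like `dmesg | classify_splat` should work; if the log holds several reports, the last one wins. The result is only a hint: matching addresses suggest a genuine recursion on one qdisc, differing addresses suggest the missing-annotation case described above.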

dcaratti · 2023-10-24T08:38:22Z

#!/bin/bash -x

do_cleanup() {
	local d 

	ip netns exec uno pkill netserver
	for d in vetha_0 vetha_1 vethb_0 vethb_1 br_a br_b; do
		ip link del dev  $d
	done 2>/dev/null
	for d in uno due; do
		ip netns del $d
	done 2>/dev/null
	:
	
}

do_setup() {
	local n j

	ip netns add uno
	ip netns add due
	ip link add name vetha_0 type veth peer name eth0 netns uno
	ip link add name vetha_1 type veth peer name eth1 netns uno
	ip link add name vethb_0 type veth peer name eth0 netns due
	ip link add name vethb_1 type veth peer name eth1 netns due
	for n in uno due; do
		ip -n $n link set dev eth0 up
		ip -n $n link set dev eth1 up
	done
	for n in a b ; do
		ip link add name br_$n type $DRV
		for j in 0 1; do
			ip link set dev veth${n}_${j} master br_${n}
		done
		ip link set dev br_$n up
# br_b has the noop qdisc in the original setup by xiumei
# setting prio on br_b just changes the 2nd false contender
# of qdisc root lock
		tc qdisc add dev br_$n root handle $n: prio
	done

	for n in vetha_0 vetha_1 vethb_0 vethb_1; do
		ip link set dev $n up
		tc qdisc add dev $n root tbf rate 100000000 limit 10000 burst 20000 
	done

	tc qdisc add dev br_a ingress

	tc filter add dev br_a parent ffff: protocol all matchall action mirred egress mirror dev br_b
	tc filter add dev br_a parent a: protocol all matchall action mirred egress mirror dev br_b
	ip a a dev br_b 198.51.100.2/24
	ip a a dev br_a 192.0.2.2/24
	ip -n uno a a dev eth0 192.0.2.1/24
}


do_test() {
	ip netns exec uno netserver
	netperf -H 192.0.2.1 -l10
	ip netns exec uno pkill netserver
}

# $2 selects the master device type; default to bridge when it is omitted
DRV=${2:-bridge}
case $DRV in
bridge | bond) ;;
*) printf 'use bond or bridge\n' ; exit 1 ;;
esac

case ${1:-noopt} in
cleanup|setup|test) arg=$1; shift; do_${arg} "$@" ;;
*) printf 'usage: %s <cleanup|setup|test> <bond|bridge>\n' "$0" ; exit 1 ;;
esac
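
A possible way to drive the script and check the result, assuming it was saved as `mirred-lockdep.sh` (a hypothetical name) and that the kernel was built with lockdep enabled. The `count_splats` helper is also an assumption of this sketch: it only counts the lockdep header line seen earlier in this issue.

```bash
#!/bin/bash
# count_splats (hypothetical helper): count lockdep recursive-locking
# reports in kernel log text read from stdin.
count_splats() {
	# grep -c exits non-zero when it counts 0 matches; keep the status clean
	grep -c 'possible recursive locking detected' || true
}

# Intended use, as root, on a test machine (script name is hypothetical):
#   ./mirred-lockdep.sh setup bridge
#   ./mirred-lockdep.sh test bridge
#   dmesg | count_splats
#   ./mirred-lockdep.sh cleanup bridge
```

Lockdep turns itself off after its first report, so a count of 1 per boot is the most one should expect here.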

cpaasch · 2023-10-27T03:02:05Z

Wouldn't it be possible to have a struct lock_class_key per qdisc instance? That way lockdep should be silenced, no?

pabeni · 2023-10-27T06:59:04Z

> Wouldn't it be possible to have a struct lock_class_key per qdisc instance? That way lockdep should be silenced, no?

That was basically Davide's proposal on IRC. But it will not be enough: this splat also happens when act_mirred re-injects the packet on the same egress device.

dcaratti · 2023-10-27T07:04:05Z

> > Wouldn't it be possible to have a struct lock_class_key per qdisc instance? That way lockdep should be silenced, no?
>
> That was basically Davide's proposal on IRC. But it will not be enough: this splat also happens when act_mirred re-injects the packet on the same egress device.

hi Paolo,

maybe we do want to keep the lockdep splat in this case - I think that here mirred would trigger a real deadlock. Just thinking out loud :)
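
To spot that configuration before traffic hits it, one could look for mirred actions whose target is the device the filter is attached to. A minimal sketch, assuming `tc filter show` prints mirred actions as `mirred (Egress Redirect to device NAME)`; that output format and the `has_self_redirect` name are assumptions of this example and should be checked against the installed iproute2.

```bash
#!/bin/bash
# has_self_redirect DEV (hypothetical helper): read a filter dump for DEV
# on stdin and succeed if a mirred egress action targets DEV itself.
has_self_redirect() {
	local dev=$1 dump
	dump=$(cat)
	case $dump in
	*"mirred (Egress Redirect to device ${dev})"*) return 0 ;;
	*"mirred (Egress Mirror to device ${dev})"*) return 0 ;;
	esac
	return 1
}

# Intended use (needs the real device and qdisc handle):
#   tc filter show dev vetha parent 8009: | has_self_redirect vetha \
#       && echo "vetha redirects egress traffic to itself"
```

This only covers the direct A -> A case; a loop through several devices (A -> B -> A) would need the dumps of every device involved.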

dcaratti · 2023-10-27T10:10:29Z

> Wouldn't it be possible to have a struct lock_class_key per qdisc instance? That way lockdep should be silenced, no?

POC code:

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f232512505f89..995c1102085af 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -126,6 +126,7 @@ struct Qdisc {
 
        struct rcu_head         rcu;
        netdevice_tracker       dev_tracker;
+       struct lock_class_key   root_lock_key;
        /* private data */
        long privdata[] ____cacheline_aligned;
 };
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 5d7e23f4cc0ee..135deef309c36 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -942,7 +942,9 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
        __skb_queue_head_init(&sch->gso_skb);
        __skb_queue_head_init(&sch->skb_bad_txq);
        gnet_stats_basic_sync_init(&sch->bstats);
+       lockdep_register_key(&sch->root_lock_key);
        spin_lock_init(&sch->q.lock);
+       lockdep_set_class(&sch->q.lock, &sch->root_lock_key);
 
        if (ops->static_flags & TCQ_F_CPUSTATS) {
                sch->cpu_bstats =
@@ -1061,7 +1063,7 @@ static void __qdisc_destroy(struct Qdisc *qdisc)
 
        if (ops->destroy)
                ops->destroy(qdisc);
-
+       lockdep_unregister_key(&qdisc->root_lock_key);
        module_put(ops->owner);
        netdev_put(qdisc_dev(qdisc), &qdisc->dev_tracker);

dcaratti · 2023-10-27T10:38:16Z

> > > Wouldn't it be possible to have a struct lock_class_key per qdisc instance? That way lockdep should be silenced, no?
> >
> > That was basically Davide's proposal on IRC. But it will not be enough: this splat also happens when act_mirred re-injects the packet on the same egress device.
>
> hi Paolo,
>
> maybe we do want to keep the lockdep splat in this case - I think that here mirred would trigger a real deadlock. Just thinking out loud :)

indeed, if I do a mirred egress redirect to the same device, lockdep detects a real lockup:

# ip netns add prova                      
# ip link add name vetha type veth peer name eth0 netns prova                                             
# ip link set dev vetha up                                                                                
# ip -n prova link set dev eth0 up
# ip a a dev vetha 192.0.2.2/24
# ip -n prova a a dev eth0 192.0.2.2/24
# tc qdisc add dev vetha root htb default a
# tc qdisc show dev vetha 
qdisc htb 8009: root refcnt 5 r2q 10 default 0xa direct_packets_stat 1 direct_qlen 1000
# tc filter add dev vetha parent 8009: matchall classid a action mirred egress redirect dev vetha
# ping 192.0.2.1 
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
Message from syslogd@dhcp-158-243 at Oct 27 06:31:51 ...
 kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [ping:1315]

meanwhile, dmesg says:

[ 1471.538116] ============================================
[ 1471.538639] WARNING: possible recursive locking detected
[ 1471.539143] 6.5.0+ #575 Not tainted
[ 1471.539479] --------------------------------------------
[ 1471.539984] ping/1315 is trying to acquire lock:
[ 1471.540428] ffff888170cfc110 (&sch->root_lock_key#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x15fc/0x3140
[ 1471.541350]
               but task is already holding lock:
[ 1471.541895] ffff888170cfc110 (&sch->root_lock_key#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x15fc/0x3140
[ 1471.542749]
               other info that might help us debug this:
[ 1471.543356]  Possible unsafe locking scenario:

[ 1471.543911]        CPU0
[ 1471.544159]        ----
[ 1471.544402]   lock(&sch->root_lock_key#2);
[ 1471.544791]   lock(&sch->root_lock_key#2);
[ 1471.545186]
                *** DEADLOCK ***

[ 1756.347214] watchdog: BUG: soft lockup - CPU#1 stuck for 268s! [ping:1315]
[ 1756.352781] Modules linked in: sch_htb act_mirred cls_matchall sch_ingress sch_prio bridge stp llc sch_tbf veth rfkill intel_rapl_msr iTCO_wdt iTCO_vendor_support intel_rapl_common pcspkr joydev i2c_i801 virtio_balloon i2c_smbus lpc_ich ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel sha512_ssse3 virtio_net serio_raw libata net_failover virtio_blk failover virtio_console sunrpc dm_mirror dm_region_hash dm_log dm_mod
[ 1756.356360] irq event stamp: 15462
[ 1756.356659] hardirqs last  enabled at (15462): [<ffffffffab9995f4>] _raw_spin_unlock_irqrestore+0x34/0x50
[ 1756.357525] hardirqs last disabled at (15461): [<ffffffffab99934e>] _raw_spin_lock_irqsave+0x6e/0x90
[ 1756.358299] softirqs last  enabled at (15444): [<ffffffffab15fe80>] ___neigh_create+0xc80/0x22b0
[ 1756.359072] softirqs last disabled at (15446): [<ffffffffab15cf7f>] __neigh_event_send+0x2f/0x14f0
[ 1756.359861] CPU: 1 PID: 1315 Comm: ping Tainted: G             L     6.5.0+ #575
[ 1756.360491] Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
[ 1756.361218] RIP: 0010:__pv_queued_spin_lock_slowpath+0x65f/0xfd0
[ 1756.361763] Code: f5 41 c6 46 01 01 bb 00 80 00 00 48 c1 e9 03 41 83 e5 07 bd 01 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 24 01 eb 0b f3 90 <83> eb 01 0f 84 e2 05 00 00 41 0f b6 04 24 44 38 e8 7f 08 84 c0 0f
[ 1756.363329] RSP: 0018:ffff888174f2eb08 EFLAGS: 00000206
[ 1756.363788] RAX: 0000000000000003 RBX: 0000000000001a08 RCX: 1ffff1102e19f81f
[ 1756.364392] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffa95464d5
[ 1756.365002] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[ 1756.365609] R10: fffffbfff5a8224c R11: ffffffffad411267 R12: ffffed102e19f81f
[ 1756.366219] R13: 0000000000000000 R14: ffff888170cfc0f8 R15: ffff888136c05d80
[ 1756.366826] FS:  00007f39c96b1480(0000) GS:ffff888136a00000(0000) knlGS:0000000000000000
[ 1756.367502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1756.368000] CR2: 00007ffedadde000 CR3: 000000017559a000 CR4: 0000000000350ee0
[ 1756.368603] Call Trace:
[ 1756.368827]  <IRQ>
[ 1756.369011]  ? watchdog_timer_fn+0x2b0/0x360
[ 1756.369403]  ? __hrtimer_run_queues+0x540/0xa60
[ 1756.369809]  ? __pfx___hrtimer_run_queues+0x10/0x10
[ 1756.370229]  ? ktime_get_update_offsets_now+0xd3/0x2b0
[ 1756.370671]  ? hrtimer_interrupt+0x2cb/0x780
[ 1756.371048]  ? __sysvec_apic_timer_interrupt+0x145/0x4b0
[ 1756.371510]  ? sysvec_apic_timer_interrupt+0x6a/0x90
[ 1756.371952]  </IRQ>
[ 1756.372140]  <TASK>
[ 1756.372328]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 1756.372794]  ? kvm_wait+0xb5/0x100
[ 1756.373101]  ? __pv_queued_spin_lock_slowpath+0x65f/0xfd0
[ 1756.373565]  ? __pv_queued_spin_lock_slowpath+0xcee/0xfd0
[ 1756.374031]  ? __pfx___pv_queued_spin_lock_slowpath+0x10/0x10
[ 1756.374520]  ? lock_acquire+0x1f7/0x540
[ 1756.374867]  do_raw_spin_lock+0x1e5/0x2a0
[ 1756.375217]  ? __pfx_do_raw_spin_lock+0x10/0x10
[ 1756.375606]  ? __pfx_do_raw_spin_trylock+0x10/0x10
[ 1756.376021]  ? rcu_read_lock_bh_held+0x63/0xc0
[ 1756.376415]  _raw_spin_lock+0x65/0x80
[ 1756.376735]  __dev_queue_xmit+0x15fc/0x3140
[ 1756.377107]  ? kernel_text_address+0xcf/0xe0
[ 1756.377486]  ? __kernel_text_address+0x12/0x40
[ 1756.377874]  ? unwind_get_return_address+0x67/0xb0
[ 1756.378289]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 1756.378744]  ? arch_stack_walk+0xa5/0x100
[ 1756.379098]  ? __pfx___dev_queue_xmit+0x10/0x10
[ 1756.379493]  ? __pfx_stack_trace_save+0x10/0x10
[ 1756.379890]  ? kasan_set_track+0x25/0x30
[ 1756.380249]  ? slab_post_alloc_hook+0xb4/0x440
[ 1756.380641]  ? trace_kmem_cache_alloc+0x2b/0xb0
[ 1756.381041]  ? kmem_cache_alloc+0x174/0x270
[ 1756.381405]  ? skb_clone+0x107/0x320
[ 1756.381719]  ? __copy_skb_header+0xb3/0x420
[ 1756.382082]  ? __skb_clone+0x583/0x780
[ 1756.382409]  tcf_mirred_act+0x82e/0x1260 [act_mirred]
[ 1756.382857]  ? trc_check_slow_task+0x230/0x260
[ 1756.383241]  ? rcu_read_lock_bh_held+0x63/0xc0
[ 1756.383630]  tcf_action_exec+0x161/0x450
[ 1756.383986]  tcf_classify+0x3f8/0xf60
[ 1756.384318]  ? rcu_read_lock_any_held.part.15+0x7/0x20
[ 1756.384756]  htb_enqueue+0x66d/0x12f0 [sch_htb]
[ 1756.385158]  ? lock_acquire+0x1ca/0x540
[ 1756.385491]  ? find_held_lock+0x3a/0x1d0
[ 1756.385849]  ? __pfx_htb_enqueue+0x10/0x10 [sch_htb]
[ 1756.386283]  dev_qdisc_enqueue+0x46/0x220
[ 1756.386633]  __dev_queue_xmit+0x16b8/0x3140
[ 1756.386994]  ? trace_kmem_cache_alloc+0x2b/0xb0
[ 1756.387387]  ? kmem_cache_alloc_node+0x1ab/0x280
[ 1756.387787]  ? __pfx___dev_queue_xmit+0x10/0x10
[ 1756.388179]  ? __build_skb_around+0x23d/0x330
[ 1756.388560]  ? __pfx___alloc_skb+0x10/0x10
[ 1756.388916]  ? __pfx_lock_release+0x10/0x10
[ 1756.389281]  ? eth_header+0xd6/0x190
[ 1756.389601]  ? arp_create+0x6bd/0x8f0
[ 1756.389936]  ? local_clock_noinstr+0xf/0xb0
[ 1756.390303]  arp_xmit+0xd5/0x310
[ 1756.390592]  ? __pfx_arp_xmit+0x10/0x10
[ 1756.390931]  arp_solicit+0x441/0x1090
[ 1756.391253]  ? slab_post_alloc_hook+0xb4/0x440
[ 1756.391640]  ? __pfx_arp_solicit+0x10/0x10
[ 1756.392001]  ? trace_kmem_cache_alloc+0x2b/0xb0
[ 1756.392443]  ? __copy_skb_header+0x34c/0x420
[ 1756.392822]  neigh_probe+0xaf/0xf0
[ 1756.393120]  __neigh_event_send+0xb52/0x14f0
[ 1756.393490]  ? trace_hardirqs_off+0x191/0x1b0
[ 1756.393882]  ? __local_bh_enable_ip+0xa9/0x110
[ 1756.394273]  ? ___neigh_create+0xc80/0x22b0
[ 1756.394634]  neigh_resolve_output+0x48e/0x780
[ 1756.395019]  ip_finish_output2+0x6ea/0x1e90
[ 1756.395390]  ? local_clock_noinstr+0xf/0xb0
[ 1756.395750]  ? __pfx_ip_finish_output2+0x10/0x10
[ 1756.396161]  ? kvm_sched_clock_read+0x11/0x20
[ 1756.396539]  ? lock_release+0x659/0xcd0
[ 1756.396876]  __ip_finish_output+0x895/0x1380
[ 1756.397247]  ? __pfx___ip_finish_output+0x10/0x10
[ 1756.397649]  ? nf_hook_slow+0xaa/0x180
[ 1756.397987]  ip_output+0x1d7/0x510
[ 1756.398293]  ? __pfx_ip_output+0x10/0x10
[ 1756.398648]  ? __ip_make_skb+0xeb7/0x24e0
[ 1756.398998]  ? __pfx_ip_finish_output+0x10/0x10

Xiumei and Christoph reported the following lockdep splat, it complains of the qdisc root being taken twice:

 ============================================
 WARNING: possible recursive locking detected
 6.7.0-rc3+ #598 Not tainted
 --------------------------------------------
 swapper/2/0 is trying to acquire lock:
 ffff888177190110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70

 but task is already holding lock:
 ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&sch->q.lock);
   lock(&sch->q.lock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 5 locks held by swapper/2/0:
  #0: ffff888135a09d98 ((&in_dev->mr_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x11a/0x510
  #1: ffffffffaaee5260 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x2c0/0x1ed0
  #2: ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70
  #3: ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70
  #4: ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70

 stack backtrace:
 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.7.0-rc3+ #598
 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
 Call Trace:
  <IRQ>
  dump_stack_lvl+0x4a/0x80
  __lock_acquire+0xfdd/0x3150
  lock_acquire+0x1ca/0x540
  _raw_spin_lock+0x34/0x80
  __dev_queue_xmit+0x1560/0x2e70
  tcf_mirred_act+0x82e/0x1260 [act_mirred]
  tcf_action_exec+0x161/0x480
  tcf_classify+0x689/0x1170
  prio_enqueue+0x316/0x660 [sch_prio]
  dev_qdisc_enqueue+0x46/0x220
  __dev_queue_xmit+0x1615/0x2e70
  ip_finish_output2+0x1218/0x1ed0
  __ip_finish_output+0x8b3/0x1350
  ip_output+0x163/0x4e0
  igmp_ifc_timer_expire+0x44b/0x930
  call_timer_fn+0x1a2/0x510
  run_timer_softirq+0x54d/0x11a0
  __do_softirq+0x1b3/0x88f
  irq_exit_rcu+0x18f/0x1e0
  sysvec_apic_timer_interrupt+0x6f/0x90
  </IRQ>

This happens when TC does a mirred egress redirect from the root qdisc of device A to the root qdisc of device B. As long as these two locks aren't protecting the same qdisc, they can be acquired in chain: add a per-qdisc lockdep class to silence false warnings.

CC: Xiumei Mu <[email protected]>
Reported-by: Christoph Paasch <[email protected]>
Closes: multipath-tcp/mptcp_net-next#451
Signed-off-by: Davide Caratti <[email protected]>


matttbe · 2024-02-21T17:04:08Z

The suggested fix has been rejected on the ML. Not sure how to fix that with the ideas that have been shared.

→ In the end, this is not related to MPTCP; it is an issue with TC, so we can close the ticket.


[ Upstream commit af0cb3f ]

Xiumei and Christoph reported the following lockdep splat, complaining of the qdisc root lock being taken twice:

 ============================================
 WARNING: possible recursive locking detected
 6.7.0-rc3+ #598 Not tainted
 --------------------------------------------
 swapper/2/0 is trying to acquire lock:
 ffff888177190110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70

 but task is already holding lock:
 ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&sch->q.lock);
   lock(&sch->q.lock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 5 locks held by swapper/2/0:
  #0: ffff888135a09d98 ((&in_dev->mr_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x11a/0x510
  #1: ffffffffaaee5260 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x2c0/0x1ed0
  #2: ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70
  #3: ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70
  #4: ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70

 stack backtrace:
 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.7.0-rc3+ #598
 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
 Call Trace:
  <IRQ>
  dump_stack_lvl+0x4a/0x80
  __lock_acquire+0xfdd/0x3150
  lock_acquire+0x1ca/0x540
  _raw_spin_lock+0x34/0x80
  __dev_queue_xmit+0x1560/0x2e70
  tcf_mirred_act+0x82e/0x1260 [act_mirred]
  tcf_action_exec+0x161/0x480
  tcf_classify+0x689/0x1170
  prio_enqueue+0x316/0x660 [sch_prio]
  dev_qdisc_enqueue+0x46/0x220
  __dev_queue_xmit+0x1615/0x2e70
  ip_finish_output2+0x1218/0x1ed0
  __ip_finish_output+0x8b3/0x1350
  ip_output+0x163/0x4e0
  igmp_ifc_timer_expire+0x44b/0x930
  call_timer_fn+0x1a2/0x510
  run_timer_softirq+0x54d/0x11a0
  __do_softirq+0x1b3/0x88f
  irq_exit_rcu+0x18f/0x1e0
  sysvec_apic_timer_interrupt+0x6f/0x90
  </IRQ>

This happens when TC does a mirred egress redirect from the root qdisc of device A to the root qdisc of device B. As long as these two locks aren't protecting the same qdisc, they can be acquired in chain: add a per-qdisc lockdep key to silence false warnings.

This dynamic key should safely replace the static key we have in sch_htb: it was added to allow enqueueing to the device "direct qdisc" while still holding the qdisc root lock.

v2: don't use static keys anymore in HTB direct qdiscs (thanks Eric Dumazet)

CC: Maxim Mikityanskiy <[email protected]>
CC: Xiumei Mu <[email protected]>
Reported-by: Christoph Paasch <[email protected]>
Closes: multipath-tcp/mptcp_net-next#451
Signed-off-by: Davide Caratti <[email protected]>
Link: https://lore.kernel.org/r/7dc06d6158f72053cf877a82e2a7a5bd23692faa.1713448007.git.dcaratti@redhat.com
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit af0cb3f ]

Xiumei and Christoph reported the following lockdep splat, complaining of
the qdisc root lock being taken twice:

 ============================================
 WARNING: possible recursive locking detected
 6.7.0-rc3+ #598 Not tainted
 --------------------------------------------
 swapper/2/0 is trying to acquire lock:
 ffff888177190110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70

 but task is already holding lock:
 ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&sch->q.lock);
   lock(&sch->q.lock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 5 locks held by swapper/2/0:
  #0: ffff888135a09d98 ((&in_dev->mr_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x11a/0x510
  #1: ffffffffaaee5260 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x2c0/0x1ed0
  #2: ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70
  #3: ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70
  #4: ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70

 stack backtrace:
 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.7.0-rc3+ #598
 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
 Call Trace:
  <IRQ>
  dump_stack_lvl+0x4a/0x80
  __lock_acquire+0xfdd/0x3150
  lock_acquire+0x1ca/0x540
  _raw_spin_lock+0x34/0x80
  __dev_queue_xmit+0x1560/0x2e70
  tcf_mirred_act+0x82e/0x1260 [act_mirred]
  tcf_action_exec+0x161/0x480
  tcf_classify+0x689/0x1170
  prio_enqueue+0x316/0x660 [sch_prio]
  dev_qdisc_enqueue+0x46/0x220
  __dev_queue_xmit+0x1615/0x2e70
  ip_finish_output2+0x1218/0x1ed0
  __ip_finish_output+0x8b3/0x1350
  ip_output+0x163/0x4e0
  igmp_ifc_timer_expire+0x44b/0x930
  call_timer_fn+0x1a2/0x510
  run_timer_softirq+0x54d/0x11a0
  __do_softirq+0x1b3/0x88f
  irq_exit_rcu+0x18f/0x1e0
  sysvec_apic_timer_interrupt+0x6f/0x90
  </IRQ>

This happens when TC does a mirred egress redirect from the root qdisc of
device A to the root qdisc of device B. As long as these two locks aren't
protecting the same qdisc, they can be acquired in chain: add a per-qdisc
lockdep key to silence false warnings.
This dynamic key should safely replace the static key we have in sch_htb:
it was added to allow enqueueing to the device "direct qdisc" while still
holding the qdisc root lock.

v2: don't use static keys anymore in HTB direct qdiscs (thanks Eric Dumazet)

CC: Maxim Mikityanskiy <[email protected]>
CC: Xiumei Mu <[email protected]>
Reported-by: Christoph Paasch <[email protected]>
Closes: multipath-tcp/mptcp_net-next#451
Signed-off-by: Davide Caratti <[email protected]>
Link: https://lore.kernel.org/r/7dc06d6158f72053cf877a82e2a7a5bd23692faa.1713448007.git.dcaratti@redhat.com
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
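The reasoning in the commit message (two different root locks that share one lock class look like recursion to lockdep, while a per-qdisc key tells them apart) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it does not use the kernel's lockdep API, and every name in it (`class_key`, `toy_lock`, `toy_qdisc`, `toy_task`, `redirect_a_to_b`, and the two init helpers) is hypothetical. The only behaviour modelled is that a warning is counted when a task acquires a lock whose class key matches one it already holds.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a lock class key: a class is identified by the
 * address of its key object (hypothetical, not the kernel type). */
struct class_key { char unused; };

struct toy_lock {
    const struct class_key *key;  /* class this lock belongs to */
};

/* Toy qdisc that embeds its own key, mirroring the "per-qdisc key" idea. */
struct toy_qdisc {
    struct toy_lock root_lock;
    struct class_key own_key;
};

#define MAX_HELD 8

/* Per-task list of held locks plus a counter of recursion warnings. */
struct toy_task {
    const struct toy_lock *held[MAX_HELD];
    size_t depth;
    unsigned warnings;
};

/* Acquire: count a warning if a lock of the same class is already held. */
static void toy_acquire(struct toy_task *t, const struct toy_lock *l)
{
    assert(t->depth < MAX_HELD);
    for (size_t i = 0; i < t->depth; i++) {
        if (t->held[i]->key == l->key) {
            t->warnings++;
            break;
        }
    }
    t->held[t->depth++] = l;
}

/* Release the most recently acquired lock. */
static void toy_release(struct toy_task *t)
{
    assert(t->depth > 0);
    t->depth--;
}

/* One key shared by every qdisc root lock: the situation before the fix. */
static struct class_key shared_key;

static void qdisc_init_shared(struct toy_qdisc *q)
{
    q->root_lock.key = &shared_key;
}

/* Each qdisc gets its own key: the situation after the fix. */
static void qdisc_init_per_instance(struct toy_qdisc *q)
{
    q->root_lock.key = &q->own_key;
}

/* Models an enqueue on device A that redirects into device B while A's
 * root lock is still held. Returns the number of warnings raised. */
static unsigned redirect_a_to_b(struct toy_qdisc *a, struct toy_qdisc *b)
{
    struct toy_task t = { .depth = 0, .warnings = 0 };

    toy_acquire(&t, &a->root_lock);
    toy_acquire(&t, &b->root_lock);
    toy_release(&t);
    toy_release(&t);
    return t.warnings;
}
```

With a shared key, redirecting from A to B raises one warning even though the locks are distinct; with per-instance keys it raises none. Redirecting a qdisc to itself still warns in both setups, which is the case that would be a real deadlock.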

[ Upstream commit af0cb3fa3f9ed258d14abab0152e28a0f9593084 ]

Same commit message as the first backport above, with one additional trailer:

Signed-off-by: August <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2075154

[ Upstream commit af0cb3f ]

Same commit message as the first backport above, with additional trailers:

Signed-off-by: Manuel Diewald <[email protected]>
Signed-off-by: Stefan Bader <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2075154

[ Upstream commit af0cb3f ]

Same commit message as the first backport above, with additional trailers:

Signed-off-by: Manuel Diewald <[email protected]>
Signed-off-by: Stefan Bader <[email protected]>
Signed-off-by: Roxana Nicolescu <[email protected]>

cpaasch added bug syzkaller labels Oct 24, 2023

matttbe assigned dcaratti Oct 31, 2023

matttbe closed this as completed Feb 21, 2024

cpaasch commented Oct 24, 2023

pabeni commented Oct 24, 2023

dcaratti commented Oct 24, 2023 (edited)

dcaratti commented Oct 24, 2023

cpaasch commented Oct 27, 2023

pabeni commented Oct 27, 2023

dcaratti commented Oct 27, 2023 (edited)

dcaratti commented Oct 27, 2023 (edited by matttbe)

dcaratti commented Oct 27, 2023 (edited)

matttbe commented Feb 21, 2024
