
BUG: KASAN: use-after-free in inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46) #256

Closed
matttbe opened this issue Jan 26, 2022 · 4 comments

@matttbe
Member

matttbe commented Jan 26, 2022

Both my CI (Tessares) and the public CI (Cirrus) have reported this issue after the last sync with net-next:

(...)
ok 1 selftests: net/mptcp: mptcp_connect.sh
# selftests: net/mptcp: pm_netlink.sh
# defaults addr list                                 [ OK ]
# defaults limits                                    [ OK ]
# simple add/get addr                                [ OK ]
# dump addrs                                         [ OK ]
# simple del addr                                    [ OK ]
# dump addrs after del                               [ OK ]
# duplicate addr                                     [ OK ]
# id addr increment                                  [ OK ]
# hard addr limit                                    [ OK ]
# above hard addr limit                              [ OK ]
[  230.847337] ==================================================================
[  230.847852] BUG: KASAN: use-after-free in inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46) 
[  230.848282] Read of size 8 at addr ffff8880066999c0 by task swapper/1/0
[  230.848781] 
[  230.848913] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.16.0-g05854a699d27 #2
[  230.849374] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[  230.849960] Call Trace:
[  230.850156]  <IRQ>
[  230.850326] dump_stack_lvl (lib/dump_stack.c:107) 
[  230.850590] print_address_description.constprop.0 (mm/kasan/report.c:256) 
[  230.850993] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46) 
[  230.851300] kasan_report.cold (mm/kasan/report.c:443) 
[  230.851586] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46) 
[  230.851864] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:148) 
[  230.852138] inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46) 
[  230.852406] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:148) 
[  230.852673] call_timer_fn (kernel/time/timer.c:1421) 
[  230.852983] ? add_timer (kernel/time/timer.c:1398) 
[  230.853238] ? lock_downgrade (kernel/locking/lockdep.c:5647) 
[  230.853525] ? mark_held_locks (kernel/locking/lockdep.c:4194) 
[  230.853807] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:438) 
[  230.854217] run_timer_softirq (kernel/time/timer.c:1467) 
[  230.854588] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:148) 
[  230.854953] ? hrtimer_interrupt (kernel/time/hrtimer.c:1824) 
[  230.855371] ? call_timer_fn (kernel/time/timer.c:1744) 
[  230.855746] ? rcu_read_lock_sched_held (./include/linux/lockdep.h:283) 
[  230.856167] ? rcu_read_lock_bh_held (kernel/rcu/update.c:120) 
[  230.856578] __do_softirq (kernel/softirq.c:558) 
[  230.856947] irq_exit_rcu (kernel/softirq.c:432) 
[  230.857279] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1097 (discriminator 14)) 
[  230.857702]  </IRQ>
[  230.857906]  <TASK>
[  230.858115] asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:638) 
[  230.858580] RIP: 0010:default_idle (arch/x86/kernel/process.c:734) 
[ 230.858961] Code: e2 48 89 ef 31 f6 5d 41 5c e9 fc 09 2c ff cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 eb 07 0f 00 2d 32 c3 4e 00 fb f4 <c3> 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 41
All code
========
   0:	e2 48                	loop   0x4a
   2:	89 ef                	mov    %ebp,%edi
   4:	31 f6                	xor    %esi,%esi
   6:	5d                   	pop    %rbp
   7:	41 5c                	pop    %r12
   9:	e9 fc 09 2c ff       	jmpq   0xffffffffff2c0a0a
   e:	cc                   	int3   
   f:	cc                   	int3   
  10:	cc                   	int3   
  11:	cc                   	int3   
  12:	cc                   	int3   
  13:	cc                   	int3   
  14:	cc                   	int3   
  15:	cc                   	int3   
  16:	cc                   	int3   
  17:	cc                   	int3   
  18:	cc                   	int3   
  19:	cc                   	int3   
  1a:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  1f:	eb 07                	jmp    0x28
  21:	0f 00 2d 32 c3 4e 00 	verw   0x4ec332(%rip)        # 0x4ec35a
  28:	fb                   	sti    
  29:	f4                   	hlt    
  2a:*	c3                   	retq   		<-- trapping instruction
  2b:	66 66 2e 0f 1f 84 00 	data16 nopw %cs:0x0(%rax,%rax,1)
  32:	00 00 00 00 
  36:	0f 1f 40 00          	nopl   0x0(%rax)
  3a:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  3f:	41                   	rex.B

Code starting with the faulting instruction
===========================================
   0:	c3                   	retq   
   1:	66 66 2e 0f 1f 84 00 	data16 nopw %cs:0x0(%rax,%rax,1)
   8:	00 00 00 00 
   c:	0f 1f 40 00          	nopl   0x0(%rax)
  10:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  15:	41                   	rex.B
[  230.860500] RSP: 0018:ffff8880013bfde8 EFLAGS: 00000202
[  230.861004] RAX: ffffffff94b6b160 RBX: 0000000000000001 RCX: ffffffff94b5a961
[  230.861549] RDX: 0000000000000000 RSI: ffffffff950a5e00 RDI: ffffffff95210160
[  230.861990] RBP: 0000000000000001 R08: 0000000000000001 R09: ffffed100da69163
[  230.862442] R10: ffff88806d348b13 R11: ffffed100da69162 R12: 0000000000000001
[  230.862900] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88800139b600
[  230.863382] ? __cpuidle_text_start (arch/x86/kernel/process.c:732) 
[  230.863674] ? rcu_eqs_enter.constprop.0 (kernel/rcu/tree.c:633) 
[  230.864013] default_idle_call (./arch/x86/include/asm/irqflags.h:40) 
[  230.864283] do_idle (kernel/sched/idle.c:195) 
[  230.864513] ? arch_cpu_idle_exit+0x40/0x40
[  230.864798] cpu_startup_entry (kernel/sched/idle.c:402 (discriminator 1)) 
[  230.865057] start_secondary (arch/x86/kernel/smpboot.c:224) 
[  230.865380] ? set_cpu_sibling_map (arch/x86/kernel/smpboot.c:224) 
[  230.865696] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:300) 
[  230.866042]  </TASK>
[  230.866216] 
[  230.866342] Allocated by task 351:
[  230.866571] kasan_save_stack (mm/kasan/common.c:38) 
[  230.866836] __kasan_slab_alloc (mm/kasan/common.c:46) 
[  230.867109] kmem_cache_alloc (./include/linux/kasan.h:260) 
[  230.867427] copy_net_ns (./include/linux/slab.h:705) 
[  230.867721] create_new_namespaces.isra.0 (kernel/nsproxy.c:110) 
[  230.868046] unshare_nsproxy_namespaces (kernel/nsproxy.c:226 (discriminator 4)) 
[  230.868362] ksys_unshare (kernel/fork.c:3048) 
[  230.868616] __x64_sys_unshare (kernel/fork.c:3117) 
[  230.868888] do_syscall_64 (arch/x86/entry/common.c:50) 
[  230.869148] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[  230.869514] 
[  230.869643] The buggy address belongs to the object at ffff888006699540
[  230.869643]  which belongs to the cache net_namespace of size 5184
[  230.870505] The buggy address is located 1152 bytes inside of
[  230.870505]  5184-byte region [ffff888006699540, ffff88800669a980)
[  230.871273] The buggy address belongs to the page:
[  230.871620] page:000000005599ae95 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888006699540 pfn:0x6698
[  230.872320] head:000000005599ae95 order:3 compound_mapcount:0 compound_pincount:0
[  230.872819] flags: 0x100000000010200(slab|head|node=0|zone=1)
[  230.873233] raw: 0100000000010200 0000000000000000 dead000000000122 ffff88800126a3c0
[  230.873877] raw: ffff888006699540 0000000080060004 00000001ffffffff 0000000000000000
[  230.874548] page dumped because: kasan: bad access detected
[  230.875010] 
[  230.875167] Memory state around the buggy address:
[  230.875577]  ffff888006699880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  230.876182]  ffff888006699900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  230.876792] >ffff888006699980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  230.877424]                                            ^
[  230.877875]  ffff888006699a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  230.878482]  ffff888006699a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  230.879150] ==================================================================
[  230.879749] Disabling lock debugging due to kernel taint
# id limit                                           [ OK ]
# flush addrs                                        [ OK ]
(...)
Call Trace found (additional kconfig: '-e KASAN -e KASAN_OUTLINE -d TEST_KASAN -e PROVE_LOCKING -e DEBUG_LOCKDEP -e PREEMPT -e DEBUG_PREEMPT -e DEBUG_SLAVE -e DEBUG_PAGEALLOC -e DEBUG_MUTEXES -e DEBUG_SPINLOCK -e DEBUG_ATOMIC_SLEEP -e PROVE_RCU -e DEBUG_OBJECTS_RCU_HEAD')

and:

+ ./mptcp_connect.sh
+ /tmp/cirrus-ci-build/tools/testing/selftests/kselftest/prefix.pl
+ tee /tmp/cirrus-ci-build/selftest_mptcp_connect.tap.tmp
[   68.405714] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[   68.966706] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
[   69.974514] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth2: link becomes ready
[   69.981607] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
[   71.455465] IPv6: ADDRCONF(NETDEV_CHANGE): ns4eth3: link becomes ready
[   71.463022] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
# INFO: set ns3-61f0e43e-xtAz6c dev ns3eth2: ethtool -K tso off gso off
# INFO: set ns4-61f0e43e-xtAz6c dev ns4eth3: ethtool -K tso off gso off gro off
# Created /tmp/tmp.u3JXFLfngY (size 4261916	/tmp/tmp.u3JXFLfngY) containing data sent by client
# Created /tmp/tmp.qLgo7GZBiQ (size 4814876	/tmp/tmp.qLgo7GZBiQ) containing data sent by server
# New MPTCP socket can be blocked via sysctl		[ OK ]
# INFO: validating network environment with pings
[   87.348676] netem: version 1.3
# INFO: Using loss of 0.48% delay 15 ms on ns3eth4
# ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP	(duration   558ms) [ OK ]
# ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP  	(duration   537ms) [ OK ]
# ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP	(duration   497ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP	(duration   616ms) [  104.102763] ==================================================================
[  104.108937] BUG: KASAN: use-after-free in inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46) 
[  104.114462] Read of size 8 at addr ffff8880049784c0 by task mptcp_connect.s/1181
[  104.120557] 
[  104.122066] CPU: 3 PID: 1181 Comm: mptcp_connect.s Not tainted 5.16.0-g05854a699d27 #1
[  104.128275] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[  104.135427] Call Trace:
[  104.137683]  <IRQ>
[  104.139559] dump_stack_lvl (lib/dump_stack.c:107) 
[  104.142796] print_address_description.constprop.0 (mm/kasan/report.c:256) 
[  104.147519] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46) 
[  104.150787] kasan_report.cold (mm/kasan/report.c:443) 
[  104.154163] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46) 
[  104.157499] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:148) 
[  104.160795] inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46) 
[  104.164656] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:148) 
[  104.168789] call_timer_fn (arch/x86/include/asm/jump_label.h:27) 
[  104.172903] ? add_timer (kernel/time/timer.c:1398) 
[  104.176637] ? lock_downgrade (kernel/locking/lockdep.c:5647) 
[  104.180362] ? mark_held_locks (kernel/locking/lockdep.c:4194) 
[  104.183691] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:438) 
[  104.187923] run_timer_softirq (kernel/time/timer.c:1467) 
[  104.191324] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:148) 
[  104.195426] ? hrtimer_interrupt (kernel/time/hrtimer.c:1824) 
[  104.199139] ? call_timer_fn (kernel/time/timer.c:1744) 
[  104.202229] ? pvclock_clocksource_read (arch/x86/include/asm/pvclock.h:35 (discriminator 1)) 
[  104.206643] ? rcu_read_lock_sched_held (include/linux/lockdep.h:283) 
[  104.210792] ? rcu_read_lock_bh_held (kernel/rcu/update.c:120) 
[  104.214732] __do_softirq (arch/x86/include/asm/jump_label.h:27) 
[  104.218188] irq_exit_rcu (kernel/softirq.c:432) 
[  104.221329] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1097 (discriminator 14)) 
[  104.225478]  </IRQ>
[  104.227526]  <TASK>
[  104.229599] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:638) 
[  104.234047] RIP: 0010:call_rcu (kernel/rcu/tree.c:3107) 
[ 104.238278] Code: 85 18 01 00 00 49 39 c4 0f 8f 52 04 00 00 e8 a9 46 0d 00 9c 58 f6 c4 02 0f 85 fa 03 00 00 48 83 3c 24 00 74 01 fb 48 83 c4 20 <5b> 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 05 58 3b 80 02 48 89 44 24
All code
========
   0:	85 18                	test   %ebx,(%rax)
   2:	01 00                	add    %eax,(%rax)
   4:	00 49 39             	add    %cl,0x39(%rcx)
   7:	c4                   	(bad)  
   8:	0f 8f 52 04 00 00    	jg     0x460
   e:	e8 a9 46 0d 00       	callq  0xd46bc
  13:	9c                   	pushfq 
  14:	58                   	pop    %rax
  15:	f6 c4 02             	test   $0x2,%ah
  18:	0f 85 fa 03 00 00    	jne    0x418
  1e:	48 83 3c 24 00       	cmpq   $0x0,(%rsp)
  23:	74 01                	je     0x26
  25:	fb                   	sti    
  26:	48 83 c4 20          	add    $0x20,%rsp
  2a:*	5b                   	pop    %rbx		<-- trapping instruction
  2b:	5d                   	pop    %rbp
  2c:	41 5c                	pop    %r12
  2e:	41 5d                	pop    %r13
  30:	41 5e                	pop    %r14
  32:	41 5f                	pop    %r15
  34:	c3                   	retq   
  35:	48 8b 05 58 3b 80 02 	mov    0x2803b58(%rip),%rax        # 0x2803b94
  3c:	48                   	rex.W
  3d:	89                   	.byte 0x89
  3e:	44                   	rex.R
  3f:	24                   	.byte 0x24

Code starting with the faulting instruction
===========================================
   0:	5b                   	pop    %rbx
   1:	5d                   	pop    %rbp
   2:	41 5c                	pop    %r12
   4:	41 5d                	pop    %r13
   6:	41 5e                	pop    %r14
   8:	41 5f                	pop    %r15
   a:	c3                   	retq   
   b:	48 8b 05 58 3b 80 02 	mov    0x2803b58(%rip),%rax        # 0x2803b6a
  12:	48                   	rex.W
  13:	89                   	.byte 0x89
  14:	44                   	rex.R
  15:	24                   	.byte 0x24
[  104.256457] RSP: 0018:ffff888008a6fa18 EFLAGS: 00000286
[  104.261598] RAX: 0000000000000002 RBX: ffff888003f5dc08 RCX: dffffc0000000000
[  104.268748] RDX: 0000000000000000 RSI: ffffffff8c2ae740 RDI: ffffffff8c422920
[  104.276073] RBP: ffff88806d3c89c0 R08: 0000000000000001 R09: 0000000000000001
[  104.282558] R10: ffffffff8d7bfbd7 R11: fffffbfff1af7f7a R12: 000000000000000c
[  104.289655] R13: ffff88806d3c8aa8 R14: ffff88806d3c8ad8 R15: ffff88806d3c8a60
[  104.296939] __dentry_kill (fs/dcache.c:585) 
[  104.300853] ? dput (fs/dcache.c:872) 
[  104.304349] dput (fs/dcache.c:709) 
[  104.307580] ? simple_attr_release (fs/libfs.c:1226) 
[  104.312019] walk_component (fs/namei.c:558) 
[  104.316014] ? handle_dots.part.0 (fs/namei.c:1952) 
[  104.320456] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:438) 
[  104.325683] ? set_root (include/linux/seqlock.h:105) 
[  104.329353] ? generic_permission (fs/namei.c:342) 
[  104.333858] link_path_walk.part.0 (fs/namei.c:2294) 
[  104.338496] ? walk_component (fs/namei.c:2215) 
[  104.342746] ? kmem_cache_alloc (include/linux/kasan.h:260) 
[  104.347015] ? register_lock_class (kernel/locking/lockdep.c:4885) 
[  104.351188] path_lookupat.isra.0 (fs/namei.c:2448) 
[  104.354888] filename_lookup (fs/namei.c:2478) 
[  104.358997] ? may_linkat (fs/namei.c:2472) 
[  104.362706] ? simple_attr_release (fs/libfs.c:1226) 
[  104.367260] ? strncpy_from_user (arch/x86/include/asm/word-at-a-time.h:62) 
[  104.371863] user_path_at_empty (fs/namei.c:2803) 
[  104.376030] do_faccessat (fs/open.c:423) 
[  104.379803] ? stream_open (fs/open.c:397) 
[  104.383625] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:438) 
[  104.388926] ? syscall_enter_from_user_mode (arch/x86/include/asm/irqflags.h:45) 
[  104.394001] do_syscall_64 (arch/x86/entry/common.c:50) 
[  104.397826] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[  104.403021] RIP: 0033:0x7f5f44f252ab
[ 104.406802] Code: 77 05 c3 0f 1f 40 00 48 8b 15 e1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa b8 15 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 b1 9b 0d 00 f7 d8
All code
========
   0:	77 05                	ja     0x7
   2:	c3                   	retq   
   3:	0f 1f 40 00          	nopl   0x0(%rax)
   7:	48 8b 15 e1 9b 0d 00 	mov    0xd9be1(%rip),%rdx        # 0xd9bef
   e:	f7 d8                	neg    %eax
  10:	64 89 02             	mov    %eax,%fs:(%rdx)
  13:	48 c7 c0 ff ff ff ff 	mov    $0xffffffffffffffff,%rax
  1a:	c3                   	retq   
  1b:	0f 1f 40 00          	nopl   0x0(%rax)
  1f:	f3 0f 1e fa          	endbr64 
  23:	b8 15 00 00 00       	mov    $0x15,%eax
  28:	0f 05                	syscall 
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
  30:	77 05                	ja     0x37
  32:	c3                   	retq   
  33:	0f 1f 40 00          	nopl   0x0(%rax)
  37:	48 8b 15 b1 9b 0d 00 	mov    0xd9bb1(%rip),%rdx        # 0xd9bef
  3e:	f7 d8                	neg    %eax

Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 05                	ja     0xd
   8:	c3                   	retq   
   9:	0f 1f 40 00          	nopl   0x0(%rax)
   d:	48 8b 15 b1 9b 0d 00 	mov    0xd9bb1(%rip),%rdx        # 0xd9bc5
  14:	f7 d8                	neg    %eax
[  104.425161] RSP: 002b:00007fff72fdcd28 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
[  104.432694] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f5f44f252ab
[  104.438534] RDX: 00007fff72fdcd30 RSI: 0000000000000001 RDI: 00007f5f46258830
[  104.444337] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[  104.450143] R10: 00007f5f461e8010 R11: 0000000000000246 R12: 0000000000000000
[  104.455949] R13: 00007f5f46258830 R14: 0000000000000000 R15: 0000000000000001
[  104.462157]  </TASK>
[  104.464183] 
[  104.465738] Allocated by task 187:
[  104.468802] kasan_save_stack (mm/kasan/common.c:38) 
[  104.472133] __kasan_slab_alloc (mm/kasan/common.c:46) 
[  104.475586] kmem_cache_alloc (include/linux/kasan.h:260) 
[  104.479090] copy_net_ns (include/linux/slab.h:705) 
[  104.482466] create_new_namespaces.isra.0 (kernel/nsproxy.c:110) 
[  104.486914] unshare_nsproxy_namespaces (kernel/nsproxy.c:226 (discriminator 4)) 
[  104.491036] ksys_unshare (kernel/fork.c:3048) 
[  104.494232] __x64_sys_unshare (kernel/fork.c:3117) 
[  104.497533] do_syscall_64 (arch/x86/entry/common.c:50) 
==========================================
Call Trace found
(...)
Extra kconfig: -e KASAN -e KASAN_OUTLINE -d TEST_KASAN -e PROVE_LOCKING -e DEBUG_LOCKDEP -e PREEMPT -e DEBUG_PREEMPT -e DEBUG_SLAVE -e DEBUG_PAGEALLOC -e DEBUG_MUTEXES -e DEBUG_SPINLOCK -e DEBUG_ATOMIC_SLEEP -e PROVE_RCU -e DEBUG_OBJECTS_RCU_HEAD -e NET_NS_REFCNT_TRACKER -d KFENCE

It doesn't seem to be directly related to MPTCP.
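The debug-kernel options listed above are written as `-e` (enable) and `-d` (disable) flags, the same convention used by the kernel's `scripts/config` helper. To rebuild a similar kernel by hand, such a list can be turned into `.config` fragment lines. This is a minimal sketch, not the CI's own tooling: the function name `to_config_fragment` is hypothetical, and it assumes every flag is `-e` or `-d` followed by a symbol name without the `CONFIG_` prefix.

```python
import shlex


def to_config_fragment(flags: str) -> list[str]:
    """Translate "-e FOO -d BAR" pairs into .config syntax.

    Hypothetical helper: assumes each flag is -e (set to y) or -d (not set),
    followed by a Kconfig symbol without the CONFIG_ prefix.
    """
    tokens = shlex.split(flags)
    if len(tokens) % 2:
        raise ValueError("each flag needs a symbol name")
    lines = []
    for flag, name in zip(tokens[::2], tokens[1::2]):
        if flag == "-e":
            lines.append(f"CONFIG_{name}=y")
        elif flag == "-d":
            lines.append(f"# CONFIG_{name} is not set")
        else:
            raise ValueError(f"unsupported flag: {flag}")
    return lines


# A subset of the "Extra kconfig" list above.
extra = "-e KASAN -e KASAN_OUTLINE -d TEST_KASAN -e NET_NS_REFCNT_TRACKER -d KFENCE"
print("\n".join(to_config_fragment(extra)))
```

The resulting fragment could then be merged into an existing configuration, e.g. with `scripts/kconfig/merge_config.sh`.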

@matttbe matttbe added the bug label Jan 26, 2022
@matttbe
Member Author

matttbe commented Jan 26, 2022

Reproduced on top of net-next by running the mptcp_connect.sh selftest with a "debug" kernel built with these extra kconfig options:

-e KASAN -e KASAN_OUTLINE -d TEST_KASAN -e PROVE_LOCKING -e DEBUG_LOCKDEP -e PREEMPT -e DEBUG_PREEMPT -e DEBUG_SLAVE -e DEBUG_PAGEALLOC -e DEBUG_MUTEXES -e DEBUG_SPINLOCK -e DEBUG_ATOMIC_SLEEP -e PROVE_RCU -e DEBUG_OBJECTS_RCU_HEAD -e NET_NS_REFCNT_TRACKER -d KFENCE
[  288.530995] ==================================================================
[  288.531772] BUG: KASAN: use-after-free in inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46)
[  288.531772] Read of size 8 at addr ffff888001ad04c0 by task swapper/0/0
[  288.531772]
[  288.531772] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-11547-gab14f1802cfb #732
[  288.531772] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[  288.531772] Call Trace:
[  288.531772]  <IRQ>
[  288.531772] dump_stack_lvl (lib/dump_stack.c:107)
[  288.531772] print_address_description.constprop.0 (mm/kasan/report.c:256)
[  288.531772] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46)
[  288.531772] kasan_report.cold (mm/kasan/report.c:443)
[  288.531772] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46)
[  288.531772] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:148)
[  288.531772] inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46)
[  288.531772] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:148)
[  288.531772] call_timer_fn (arch/x86/include/asm/jump_label.h:27)
[  288.531772] ? add_timer (kernel/time/timer.c:1398)
[  288.531772] ? lock_downgrade (kernel/locking/lockdep.c:5647)
[  288.531772] ? mark_held_locks (kernel/locking/lockdep.c:4194)
[  288.531772] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:438)
[  288.531772] run_timer_softirq (kernel/time/timer.c:1467)
[  288.531772] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:148)
[  288.531772] ? hrtimer_interrupt (kernel/time/hrtimer.c:1824)
[  288.531772] ? call_timer_fn (kernel/time/timer.c:1744)
[  288.531772] ? pvclock_clocksource_read (arch/x86/include/asm/atomic64_64.h:184)
[  288.531772] ? rcu_read_lock_sched_held (include/linux/lockdep.h:283)
[  288.531772] ? rcu_read_lock_bh_held (kernel/rcu/update.c:120)
[  288.531772] __do_softirq (arch/x86/include/asm/jump_label.h:27)
[  288.531772] irq_exit_rcu (kernel/softirq.c:432)
[  288.531772] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1097 (discriminator 14))
[  288.531772]  </IRQ>
[  288.531772]  <TASK>
[  288.531772] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:638)
[  288.531772] RIP: 0010:default_idle (arch/x86/kernel/process.c:734)
[ 288.531772] Code: e2 48 89 ef 31 f6 5d 41 5c e9 1c a3 2d ff cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 eb 07 0f 00 2d 62 94 51 00 fb f4 <c3> 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 41
All code
========
   0:   e2 48                   loop   0x4a
   2:   89 ef                   mov    %ebp,%edi
   4:   31 f6                   xor    %esi,%esi
   6:   5d                      pop    %rbp
   7:   41 5c                   pop    %r12
   9:   e9 1c a3 2d ff          jmpq   0xffffffffff2da32a
   e:   cc                      int3
   f:   cc                      int3
  10:   cc                      int3
  11:   cc                      int3
  12:   cc                      int3
  13:   cc                      int3
  14:   cc                      int3
  15:   cc                      int3
  16:   cc                      int3
  17:   cc                      int3
  18:   cc                      int3
  19:   cc                      int3
  1a:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  1f:   eb 07                   jmp    0x28
  21:   0f 00 2d 62 94 51 00    verw   0x519462(%rip)        # 0x51948a
  28:   fb                      sti
  29:   f4                      hlt
  2a:*  c3                      retq            <-- trapping instruction
  2b:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
  32:   00 00 00 00
  36:   0f 1f 40 00             nopl   0x0(%rax)
  3a:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  3f:   41                      rex.B

Code starting with the faulting instruction
===========================================
   0:   c3                      retq
   1:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
   8:   00 00 00 00
   c:   0f 1f 40 00             nopl   0x0(%rax)
  10:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  15:   41                      rex.B
[  288.531772] RSP: 0018:ffffffffa2407e40 EFLAGS: 00000202
[  288.531772] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffa1333881
[  288.531772] RDX: 0000000000000000 RSI: ffffffffa18ae700 RDI: ffffffffa1a228e0
[  288.531772] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed100da49163
[  288.531772] R10: ffff88806d248b13 R11: ffffed100da49162 R12: 0000000000000000
[  288.531772] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa2446600
[  288.531772] ? rcu_eqs_enter.constprop.0 (kernel/rcu/tree.c:633)
[  288.531772] default_idle_call (arch/x86/include/asm/irqflags.h:40)
[  288.531772] do_idle (kernel/sched/idle.c:195)
[  288.531772] ? arch_cpu_idle_exit+0x40/0x40
[  288.531772] ? do_idle (kernel/sched/idle.c:262)
[  288.531772] cpu_startup_entry (kernel/sched/idle.c:402 (discriminator 1))
[  288.531772] start_kernel (init/main.c:1137)
[  288.531772] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:300)
[  288.531772]  </TASK>
[  288.531772]
[  288.531772] Allocated by task 166:
[  288.531772] kasan_save_stack (mm/kasan/common.c:38)
[  288.531772] __kasan_slab_alloc (mm/kasan/common.c:46)
[  288.531772] kmem_cache_alloc (include/linux/kasan.h:260)
[  288.531772] copy_net_ns (include/linux/slab.h:705)
[  288.531772] create_new_namespaces.isra.0 (kernel/nsproxy.c:110)
[  288.531772] unshare_nsproxy_namespaces (kernel/nsproxy.c:226 (discriminator 4))
[  288.531772] ksys_unshare (kernel/fork.c:3048)
[  288.531772] __x64_sys_unshare (kernel/fork.c:3117)
[  288.531772] do_syscall_64 (arch/x86/entry/common.c:50)
[  288.531772] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
[  288.531772]
[  288.531772] The buggy address belongs to the object at ffff888001ad0000
[  288.531772]  which belongs to the cache net_namespace of size 5248
[  288.531772] The buggy address is located 1216 bytes inside of
[  288.531772]  5248-byte region [ffff888001ad0000, ffff888001ad1480)
[  288.531772] The buggy address belongs to the page:
[  288.531772] page:00000000cb3dfc90 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1ad0
[  288.531772] head:00000000cb3dfc90 order:3 compound_mapcount:0 compound_pincount:0
[  288.531772] flags: 0x100000000010200(slab|head|node=0|zone=1)
[  288.531772] raw: 0100000000010200 0000000000000000 dead000000000122 ffff88800126a280
[  288.531772] raw: 0000000000000000 0000000080050005 00000001ffffffff 0000000000000000
[  288.531772] page dumped because: kasan: bad access detected
[  288.531772]
[  288.531772] Memory state around the buggy address:
[  288.531772]  ffff888001ad0380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  288.531772]  ffff888001ad0400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  288.531772] >ffff888001ad0480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  288.531772]                                            ^
[  288.531772]  ffff888001ad0500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  288.531772]  ffff888001ad0580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  288.531772] ==================================================================
[  288.531772] Disabling lock debugging due to kernel taint
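The object and shadow-memory lines of this report can be checked by hand. This is a minimal sketch, assuming the default x86_64 generic-KASAN layout (one shadow byte per 8 bytes of memory, `shadow = (addr >> 3) + 0xdffffc0000000000`); the helper names are hypothetical. It confirms that the read at `ffff888001ad04c0` is 1216 bytes into the freed 5248-byte `net_namespace` object, and that it falls in the row marked `>` of the memory dump, where every shadow byte is `fb`: the value generic KASAN uses for a freed slab object (`KASAN_KMALLOC_FREE` in kernels of this era).

```python
# Assumed x86_64 generic-KASAN parameters.
KASAN_SHADOW_SCALE_SHIFT = 3              # 1 shadow byte covers 8 bytes
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000  # default CONFIG_KASAN_SHADOW_OFFSET
MASK64 = (1 << 64) - 1


def locate(bad_addr: int, obj_start: int, obj_size: int) -> tuple[int, int]:
    """Return (offset, region_end), as in "located N bytes inside of"."""
    region_end = obj_start + obj_size
    if not obj_start <= bad_addr < region_end:
        raise ValueError("address is outside the object")
    return bad_addr - obj_start, region_end


def shadow_addr(addr: int) -> int:
    """Address of the shadow byte describing addr (assumed mapping)."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def dump_row(addr: int) -> tuple[int, int]:
    """Return (row start, column) of addr in the 'Memory state' dump.

    Each row prints 16 shadow bytes, i.e. 16 * 8 = 128 bytes of memory,
    which is why consecutive rows are 0x80 apart.
    """
    granule = 1 << KASAN_SHADOW_SCALE_SHIFT
    row_span = 16 * granule
    row = addr & ~(row_span - 1)
    return row, (addr - row) // granule


# Values copied from the report above.
bad = 0xFFFF888001AD04C0
print(locate(bad, 0xFFFF888001AD0000, 5248))
print(hex(shadow_addr(bad)))
row, col = dump_row(bad)
print(hex(row), col)
```

The same arithmetic applied to the first report (`ffff8880066999c0` in an object at `ffff888006699540`) gives 1152 bytes into a 5184-byte object. The object size differs between the two builds (the later one also enables NET_NS_REFCNT_TRACKER), so mapping an offset to a `struct net` member requires the layout of the exact build, e.g. from `pahole`.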

@matttbe
Member Author

matttbe commented Jan 26, 2022

As suggested by @pabeni, here is the issue reproduced without using MPTCP:

==========================================
[   32.034891] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth2: link becomes ready
[   32.038442] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
[   33.075950] IPv6: ADDRCONF(NETDEV_CHANGE): ns4eth3: link becomes ready
[   33.079567] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
# INFO: set ns3-61f16e84-valN7l dev ns3eth2: ethtool -K tso off
# INFO: set ns4-61f16e84-valN7l dev ns4eth3: ethtool -K tso off gso off gro off
# Created /tmp/tmp.S4i4lMmKCx (size 367644      /tmp/tmp.S4i4lMmKCx) containing data sent by client
# Created /tmp/tmp.58YVFaiWxK (size 215068      /tmp/tmp.58YVFaiWxK) containing data sent by server
# New MPTCP socket can be blocked via sysctl            [ OK ]
# INFO: validating network environment with pings
[   44.432828] netem: version 1.3
# INFO: Using loss of 0.29% delay 3 ms on ns3eth4
# ns1 TCP   -> ns1 (10.0.1.1:10000      ) TCP   (duration   304ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:10001) TCP   (duration   320ms) [ OK ]
# ns1 TCP   -> ns2 (10.0.1.2:10002      ) TCP   (duration   289ms) [ OK ]
# ns1 TCP   -> ns2 (dead:beef:1::2:10003) TCP   (duration   304ms) [ OK ]
# ns1 TCP   -> ns2 (10.0.2.1:10004      ) TCP   (duration   321ms) [ OK ]
# ns1 TCP   -> ns2 (dead:beef:2::1:10005) TCP   (duration   325ms) [ OK ]
# ns1 TCP   -> ns3 (10.0.2.2:10006      ) TCP   (duration   315ms) [ OK ]
# ns1 TCP   -> ns3 (dead:beef:2::2:10007) TCP   (duration   350ms) [ OK ]
# ns1 TCP   -> ns3 (10.0.3.2:10008      ) TCP   (duration   332ms) [ OK ]
# ns1 TCP   -> ns3 (dead:beef:3::2:10009) TCP   (duration   324ms) [ OK ]
# ns1 TCP   -> ns4 (10.0.3.1:10010      ) TCP   (duration   298ms) [ OK ]
# ns1 TCP   -> ns4 (dead:beef:3::1:10011) TCP   (duration   319ms) [ OK ]
# ns2 TCP   -> ns1 (10.0.1.1:10012      ) TCP   (duration   306ms) [ OK ]
# ns2 TCP   -> ns1 (dead:beef:1::1:10013) TCP   (duration   312ms) [ OK ]
# ns2 TCP   -> ns3 (10.0.2.2:10014      ) TCP   (duration   305ms) [ OK ]
# ns2 TCP   -> ns3 (dead:beef:2::2:10015) TCP   (duration   334ms) [ OK ]
# ns2 TCP   -> ns3 (10.0.3.2:10016      ) TCP   (duration   339ms) [ OK ]
# ns2 TCP   -> ns3 (dead:beef:3::2:10017) TCP   (duration   324ms) [ OK ]
# ns2 TCP   -> ns4 (10.0.3.1:10018      ) TCP   (duration   348ms) [ OK ]
# ns2 TCP   -> ns4 (dead:beef:3::1:10019) TCP   (duration   307ms) [ OK ]
# ns3 TCP   -> ns1 (10.0.1.1:10020      ) TCP   (duration   326ms) [ OK ]
# ns3 TCP   -> ns1 (dead:beef:1::1:10021) TCP   (duration   336ms) [ OK ]
# ns3 TCP   -> ns2 (10.0.1.2:10022      ) TCP   (duration   346ms) [ OK ]
# ns3 TCP   -> ns2 (dead:beef:1::2:10023) TCP   (duration   318ms) [ OK ]
# ns3 TCP   -> ns2 (10.0.2.1:10024      ) TCP   (duration   309ms) [ OK ]
# ns3 TCP   -> ns2 (dead:beef:2::1:10025) TCP   (duration   351ms) [ OK ]
# ns3 TCP   -> ns4 (10.0.3.1:10026      ) TCP   (duration   328ms) [ OK ]
# ns3 TCP   -> ns4 (dead:beef:3::1:10027) TCP   (duration   316ms) [ OK ]
# ns4 TCP   -> ns1 (10.0.1.1:10028      ) TCP   (duration   324ms) [ OK ]
# ns4 TCP   -> ns1 (dead:beef:1::1:10029) TCP   (duration   325ms) [ OK ]
# ns4 TCP   -> ns2 (10.0.1.2:10030      ) TCP   (duration   360ms) [ OK ]
# ns4 TCP   -> ns2 (dead:beef:1::2:10031) TCP   (duration   366ms) [ OK ]
# ns4 TCP   -> ns2 (10.0.2.1:10032      ) TCP   (duration   333ms) [ OK ]
# ns4 TCP   -> ns2 (dead:beef:2::1:10033) TCP   (duration   325ms) [ OK ]
# ns4 TCP   -> ns3 (10.0.2.2:10034      ) TCP   (duration   298ms) [ OK ]
# ns4 TCP   -> ns3 (dead:beef:2::2:10035) TCP   (duration   317ms) [ OK ]
# ns4 TCP   -> ns3 (10.0.3.2:10036      ) TCP   (duration   331ms) [ OK ]
# ns4 TCP   -> ns3 (dead:beef:3::2:10037) TCP   (duration   295ms) [ OK ]
# INFO: with peek mode: saveWithPeek
# ns1 TCP   -> ns1 (10.0.1.1:10038      ) TCP   (duration   290ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:10039) TCP   (duration   290ms) [ OK ]
# INFO: with peek mode: saveAfterPeek
# ns1 TCP   -> ns1 (10.0.1.1:10040      ) TCP   (duration   305ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:10041) TCP   (duration   292ms) [ OK ]
# INFO: test tproxy ipv4
# PASS: tproxy ipv4
# INFO: test tproxy ipv6
# PASS: tproxy ipv6
# INFO: disconnect
# ns1 TCP   -> ns1 (10.0.1.1:20000      ) TCP   (duration   320ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:20001) TCP   (duration   306ms) [ OK ]
# Time: 165 seconds
++ rc=0
++ grep -q '[C]all Trace:' .virtme/results/ab14f1802cfb/debug/output.log
[  194.307343] ==================================================================
[  194.308159] BUG: KASAN: use-after-free in inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46)
[  194.308159] Read of size 8 at addr ffff888001ac2fc0 by task grep/2874
[  194.308159]
[  194.308159] CPU: 0 PID: 2874 Comm: grep Not tainted 5.16.0-11547-gab14f1802cfb-dirty #734
[  194.308159] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[  194.308159] Call Trace:
[  194.308159]  <IRQ>
[  194.308159] dump_stack_lvl (lib/dump_stack.c:107)
[  194.308159] print_address_description.constprop.0 (mm/kasan/report.c:256)
[  194.308159] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46)
[  194.308159] kasan_report.cold (mm/kasan/report.c:443)
[  194.308159] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46)
[  194.308159] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:148)
[  194.308159] inet_twsk_kill (net/ipv4/inet_timewait_sock.c:46)
[  194.308159] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:148)
[  194.308159] call_timer_fn (arch/x86/include/asm/jump_label.h:27)
[  194.308159] ? add_timer (kernel/time/timer.c:1398)
[  194.308159] ? lock_downgrade (kernel/locking/lockdep.c:5647)
[  194.308159] ? mark_held_locks (kernel/locking/lockdep.c:4194)
[  194.308159] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:438)
[  194.308159] run_timer_softirq (kernel/time/timer.c:1467)
[  194.308159] ? inet_twsk_kill (net/ipv4/inet_timewait_sock.c:148)
[  194.308159] ? hrtimer_interrupt (kernel/time/hrtimer.c:1824)
[  194.308159] ? call_timer_fn (kernel/time/timer.c:1744)
[  194.308159] ? pvclock_clocksource_read (arch/x86/include/asm/atomic64_64.h:184)
[  194.308159] ? rcu_read_lock_sched_held (include/linux/lockdep.h:283)
[  194.308159] ? rcu_read_lock_bh_held (kernel/rcu/update.c:120)
[  194.308159] __do_softirq (arch/x86/include/asm/jump_label.h:27)
[  194.308159] irq_exit_rcu (kernel/softirq.c:432)
[  194.308159] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1097 (discriminator 14))
[  194.308159]  </IRQ>
[  194.308159]  <TASK>
[  194.308159] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:638)
[  194.308159] RIP: 0010:_raw_spin_unlock_irqrestore (include/linux/spinlock_api_smp.h:152)
[ 194.308159] Code: 74 24 10 e8 25 2b b0 fe 48 89 ef e8 3d 70 b0 fe 81 e3 00 02 00 00 75 25 9c 58 f6 c4 02 75 2d 48 85 db 74 01 fb bf 01 00 00 00 <e8> fe e9 aa fe 65 8b 05 57 21 ae 62 85 c0 74 0a 5b 5d c3 e8 cb b0
All code
========
   0:   74 24                   je     0x26
   2:   10 e8                   adc    %ch,%al
   4:   25 2b b0 fe 48          and    $0x48feb02b,%eax
   9:   89 ef                   mov    %ebp,%edi
   b:   e8 3d 70 b0 fe          callq  0xfffffffffeb0704d
  10:   81 e3 00 02 00 00       and    $0x200,%ebx
  16:   75 25                   jne    0x3d
  18:   9c                      pushfq
  19:   58                      pop    %rax
  1a:   f6 c4 02                test   $0x2,%ah
  1d:   75 2d                   jne    0x4c
  1f:   48 85 db                test   %rbx,%rbx
  22:   74 01                   je     0x25
  24:   fb                      sti
  25:   bf 01 00 00 00          mov    $0x1,%edi
  2a:*  e8 fe e9 aa fe          callq  0xfffffffffeaaea2d               <-- trapping instruction
  2f:   65 8b 05 57 21 ae 62    mov    %gs:0x62ae2157(%rip),%eax        # 0x62ae218d
  36:   85 c0                   test   %eax,%eax
  38:   74 0a                   je     0x44
  3a:   5b                      pop    %rbx
  3b:   5d                      pop    %rbp
  3c:   c3                      retq
  3d:   e8                      .byte 0xe8
  3e:   cb                      lret
  3f:   b0                      .byte 0xb0

Code starting with the faulting instruction
===========================================
   0:   e8 fe e9 aa fe          callq  0xfffffffffeaaea03
   5:   65 8b 05 57 21 ae 62    mov    %gs:0x62ae2157(%rip),%eax        # 0x62ae2163
   c:   85 c0                   test   %eax,%eax
   e:   74 0a                   je     0x1a
  10:   5b                      pop    %rbx
  11:   5d                      pop    %rbp
  12:   c3                      retq
  13:   e8                      .byte 0xe8
  14:   cb                      lret
  15:   b0                      .byte 0xb0
[  194.308159] RSP: 0018:ffff888004ccf9a0 EFLAGS: 00000206
[  194.308159] RAX: 0000000000000006 RBX: 0000000000000200 RCX: dffffc0000000000
[  194.308159] RDX: 0000000000000000 RSI: ffffffff9daae720 RDI: 0000000000000001
[  194.308159] RBP: ffff88807ffdf750 R08: 0000000000000001 R09: 0000000000000001
[  194.308159] R10: ffff88807ffdf753 R11: ffffed100fffbeea R12: ffff888004ccfc38
[  194.308159] R13: ffffea000147ed40 R14: 000000000000000e R15: ffff88807ffdf700
[  194.308159] release_pages (mm/swap.c:980)
[  194.308159] ? pagevec_move_tail_fn (mm/swap.c:903)
[  194.308159] __pagevec_release (include/linux/pagevec.h:57)
[  194.308159] truncate_inode_pages_range (mm/truncate.c:385)
[  194.308159] ? truncate_inode_partial_folio (mm/truncate.c:343)
[  194.308159] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:438)
[  194.308159] ? _raw_spin_unlock_irq (arch/x86/include/asm/irqflags.h:45)
[  194.308159] ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4357 (discriminator 3))
[  194.308159] v9fs_evict_inode (fs/9p/vfs_inode.c:389)
[  194.308159] evict (fs/inode.c:644)
[  194.308159] __dentry_kill (fs/dcache.c:585)
[  194.308159] ? dput (fs/dcache.c:872)
[  194.308159] dput (fs/dcache.c:709)
[  194.308159] __fput (fs/file_table.c:294)
[  194.308159] task_work_run (kernel/task_work.c:166 (discriminator 1))
[  194.308159] do_exit (kernel/exit.c:807)
[  194.308159] ? is_current_pgrp_orphaned (kernel/exit.c:734)
[  194.308159] ? rcu_read_lock_sched_held (include/linux/lockdep.h:283)
[  194.308159] ? rcu_read_lock_bh_held (kernel/rcu/update.c:120)
[  194.308159] do_group_exit (kernel/exit.c:916)
[  194.308159] __x64_sys_exit_group (kernel/exit.c:946)
[  194.308159] do_syscall_64 (arch/x86/entry/common.c:50)
[  194.308159] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
[  194.308159] RIP: 0033:0x7f0c56c642c6
[ 194.308159] Code: Unable to access opcode bytes at RIP 0x7f0c56c6429c.
objdump: '/tmp/tmp.qqA7QMQwMg.o': No such file

Code starting with the faulting instruction
===========================================
[  194.308159] RSP: 002b:00007fff873608f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[  194.308159] RAX: ffffffffffffffda RBX: 00007f0c56d6b610 RCX: 00007f0c56c642c6
[  194.308159] RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
[  194.308159] RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffff80
[  194.308159] R10: 0000000000000006 R11: 0000000000000246 R12: 00007f0c56d6b610
[  194.308159] R13: 0000000000000002 R14: 00007f0c56d6efc8 R15: 0000000000000000
[  194.308159]  </TASK>
[  194.308159]
[  194.308159] Allocated by task 172:
[  194.308159] kasan_save_stack (mm/kasan/common.c:38)
[  194.308159] __kasan_slab_alloc (mm/kasan/common.c:46)
[  194.308159] kmem_cache_alloc (include/linux/kasan.h:260)
[  194.308159] copy_net_ns (include/linux/slab.h:705)
[  194.308159] create_new_namespaces.isra.0 (kernel/nsproxy.c:110)
[  194.308159] unshare_nsproxy_namespaces (kernel/nsproxy.c:226 (discriminator 4))
[  194.308159] ksys_unshare (kernel/fork.c:3048)
[  194.308159] __x64_sys_unshare (kernel/fork.c:3117)
[  194.308159] do_syscall_64 (arch/x86/entry/common.c:50)
[  194.308159] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
[  194.499261]
==========================================

This is on top of the current net-next (ab14f18 ("net: Adjust sk_gso_max_size once when set")), running ./mptcp_connect.sh -tt with this patch applied:

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f60f01b14fac..c4efe854484c 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2985,6 +2985,11 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
 	}
 
 out:
+	if (newsk->sk_kern_sock && !kern) {
+		newsk->sk_net_refcnt = 1;
+		get_net_track(sock_net(newsk), &newsk->ns_tracker, GFP_KERNEL);
+		sock_inuse_add(sock_net(newsk), 1);
+	}
 	newsk->sk_kern_sock = kern;
 	return newsk;
 }
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index cb5809b89081..152d39591682 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -609,8 +609,8 @@ run_tests_lo()
 		local_addr="0.0.0.0"
 	fi
 
-	do_transfer ${listener_ns} ${connector_ns} MPTCP MPTCP \
-		    ${connect_addr} ${local_addr} "${extra_args}"
+	#do_transfer ${listener_ns} ${connector_ns} MPTCP MPTCP \
+	#	    ${connect_addr} ${local_addr} "${extra_args}"
 	lret=$?
 	if [ $lret -ne 0 ]; then
 		ret=$lret
@@ -624,16 +624,16 @@ run_tests_lo()
 		fi
 	fi
 
-	do_transfer ${listener_ns} ${connector_ns} MPTCP TCP \
-		    ${connect_addr} ${local_addr} "${extra_args}"
+	#do_transfer ${listener_ns} ${connector_ns} MPTCP TCP \
+	#	    ${connect_addr} ${local_addr} "${extra_args}"
 	lret=$?
 	if [ $lret -ne 0 ]; then
 		ret=$lret
 		return 1
 	fi
 
-	do_transfer ${listener_ns} ${connector_ns} TCP MPTCP \
-		    ${connect_addr} ${local_addr} "${extra_args}"
+	#do_transfer ${listener_ns} ${connector_ns} TCP MPTCP \
+	#	    ${connect_addr} ${local_addr} "${extra_args}"
 	lret=$?
 	if [ $lret -ne 0 ]; then
 		ret=$lret
@@ -716,8 +716,8 @@ EOF
 
 	TEST_COUNT=10000
 	local extra_args="-o TRANSPARENT"
-	do_transfer ${listener_ns} ${connector_ns} MPTCP MPTCP \
-		    ${connect_addr} ${local_addr} "${extra_args}"
+	#do_transfer ${listener_ns} ${connector_ns} MPTCP MPTCP \
+	#	    ${connect_addr} ${local_addr} "${extra_args}"
 	lret=$?
 
 	ip netns exec "$listener_ns" nft flush ruleset
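
The "Allocated by" part of the KASAN report above points at copy_net_ns(), so the object read after being freed appears to be a network namespace. The first hunk of the patch makes a kernel-created socket that accept() hands to user space take a reference on its namespace. Below is a minimal userspace sketch of why such a reference matters. It assumes a plain counter stands in for the netns refcount and "alive = false" stands in for the actual free; model_net, model_sock and the helper functions are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types: NOT the kernel's struct net / struct sock. */
struct model_net {
	int refcnt;  /* references keeping the namespace alive */
	bool alive;  /* false once the last reference is gone ("freed") */
};

struct model_sock {
	struct model_net *net;
	bool kern_sock;   /* kernel-created socket: does not pin its netns */
	bool net_refcnt;  /* true if this socket holds a netns reference */
};

static void model_net_get(struct model_net *net)
{
	assert(net->alive);  /* taking a reference on a dead netns is a bug */
	net->refcnt++;
}

static void model_net_put(struct model_net *net)
{
	assert(net->refcnt > 0);  /* catch refcount underflow */
	if (--net->refcnt == 0)
		net->alive = false;  /* stands in for freeing the namespace */
}

/* Same idea as the first hunk of the patch: a kernel socket handed to
 * user space (kern == false) starts holding a namespace reference. */
static void model_accept(struct model_sock *sk, bool kern)
{
	if (sk->kern_sock && !kern) {
		sk->net_refcnt = true;
		model_net_get(sk->net);
	}
	sk->kern_sock = kern;
}

/* Dropping the socket releases the reference only if it took one. */
static void model_sock_put(struct model_sock *sk)
{
	if (sk->net_refcnt)
		model_net_put(sk->net);
}

/* What a timer firing late (after the namespace owner is gone) would see. */
static bool model_timer_sees_live_net(const struct model_sock *sk)
{
	return sk->net->alive;
}
```

In the first scenario of the test, the accepted socket pins the namespace, so a late timer still sees it alive; in the second, the socket never took a reference, and the timer would look at a namespace that is already gone, which is roughly the pattern reported here from inet_twsk_kill() in timer softirq context.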

@mjmartineau
Member

@matttbe
Member Author

matttbe commented Jan 27, 2022

Eric posted his fix:

https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

This patch is now in the export branch, and the bug seems to be gone!
(Also reported and tested by @pabeni.)

matttbe closed this as completed Jan 27, 2022
jenkins-tessares pushed a commit that referenced this issue Jun 11, 2023
In case of error when adding a new rule that refers to an anonymous set,
deactivate expressions via NFT_TRANS_PREPARE state, not NFT_TRANS_RELEASE.
Thus, the lookup expression marks anonymous sets as inactive in the next
generation to ensure it is not reachable in this transaction anymore and
decrement the set refcount as introduced by c1592a8 ("netfilter:
nf_tables: deactivate anonymous set from preparation phase"). The abort
step takes care of undoing the anonymous set.

This is also consistent with rule deletion, where NFT_TRANS_PREPARE is
used. Note that this error path is exercised in the preparation step of
the commit protocol. This patch replaces nf_tables_rule_release() by the
deactivate and destroy calls, this time with NFT_TRANS_PREPARE.

Due to this incorrect error handling, it is possible to access a
dangling pointer to the anonymous set that remains in the transaction
list.

[1009.379054] BUG: KASAN: use-after-free in nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379106] Read of size 8 at addr ffff88816c4c8020 by task nft-rule-add/137110
[1009.379116] CPU: 7 PID: 137110 Comm: nft-rule-add Not tainted 6.4.0-rc4+ #256
[1009.379128] Call Trace:
[1009.379132]  <TASK>
[1009.379135]  dump_stack_lvl+0x33/0x50
[1009.379146]  ? nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379191]  print_address_description.constprop.0+0x27/0x300
[1009.379201]  kasan_report+0x107/0x120
[1009.379210]  ? nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379255]  nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379302]  nft_lookup_init+0xa5/0x270 [nf_tables]
[1009.379350]  nf_tables_newrule+0x698/0xe50 [nf_tables]
[1009.379397]  ? nf_tables_rule_release+0xe0/0xe0 [nf_tables]
[1009.379441]  ? kasan_unpoison+0x23/0x50
[1009.379450]  nfnetlink_rcv_batch+0x97c/0xd90 [nfnetlink]
[1009.379470]  ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink]
[1009.379485]  ? __alloc_skb+0xb8/0x1e0
[1009.379493]  ? __alloc_skb+0xb8/0x1e0
[1009.379502]  ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
[1009.379509]  ? unwind_get_return_address+0x2a/0x40
[1009.379517]  ? write_profile+0xc0/0xc0
[1009.379524]  ? avc_lookup+0x8f/0xc0
[1009.379532]  ? __rcu_read_unlock+0x43/0x60

Fixes: 958bee1 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
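
The difference between the two error paths described in this commit message can be sketched with a deliberately simplified userspace model. With the release-style cleanup, the anonymous set is freed while a transaction entry still points to it; the prepare-style cleanup only makes it inactive and leaves the freeing to the abort step. model_set, model_trans and the helpers are hypothetical stand-ins, not nf_tables structures, and "freed = true" stands in for the actual free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an anonymous set: NOT the nf_tables structures. */
struct model_set {
	int use;           /* references held by rules */
	bool active_next;  /* visible in the next generation */
	bool freed;        /* stands in for the set having been freed */
};

/* A transaction-list entry that still refers to the set. */
struct model_trans {
	struct model_set *set;
};

enum model_phase { MODEL_TRANS_PREPARE, MODEL_TRANS_RELEASE };

/* Error path when adding a rule that refers to the anonymous set. */
static void model_newrule_error(struct model_set *set, enum model_phase phase)
{
	assert(set->use > 0);
	set->use--;
	if (phase == MODEL_TRANS_RELEASE)
		set->freed = true;         /* old: destroyed right away */
	else
		set->active_next = false;  /* new: only unreachable now */
}

/* A later lookup in the same batch walks the transaction list.
 * Returns false if it would dereference freed memory. */
static bool model_lookup_is_safe(const struct model_trans *trans)
{
	return !trans->set->freed;
}

/* Abort undoes the transaction and is the only place the set is freed. */
static void model_abort(struct model_trans *trans)
{
	assert(!trans->set->freed);  /* freeing twice would be a bug */
	trans->set->freed = true;
	trans->set = NULL;
}
```

In this model, the prepare-style path keeps the transaction entry valid until abort runs, while the release-style path leaves it dangling, which corresponds to the use-after-free in nft_set_lookup_global() shown above.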
jenkins-tessares pushed a commit that referenced this issue Aug 11, 2023
Add a detachment test case with miniq present to assert that with and
without the miniq we get the same error.

  # ./test_progs -t tc_opts
  #244     tc_opts_after:OK
  #245     tc_opts_append:OK
  #246     tc_opts_basic:OK
  #247     tc_opts_before:OK
  #248     tc_opts_chain_classic:OK
  #249     tc_opts_delete_empty:OK
  #250     tc_opts_demixed:OK
  #251     tc_opts_detach:OK
  #252     tc_opts_detach_after:OK
  #253     tc_opts_detach_before:OK
  #254     tc_opts_dev_cleanup:OK
  #255     tc_opts_invalid:OK
  #256     tc_opts_mixed:OK
  #257     tc_opts_prepend:OK
  #258     tc_opts_replace:OK
  #259     tc_opts_revision:OK
  Summary: 16/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
matttbe pushed a commit that referenced this issue Aug 17, 2023
Add several new tcx test cases to improve test coverage. This also includes
a few new tests with ingress instead of clsact qdisc, to cover the fix from
commit dc644b5 ("tcx: Fix splat in ingress_destroy upon tcx_entry_free").

  # ./test_progs -t tc
  [...]
  #234     tc_links_after:OK
  #235     tc_links_append:OK
  #236     tc_links_basic:OK
  #237     tc_links_before:OK
  #238     tc_links_chain_classic:OK
  #239     tc_links_chain_mixed:OK
  #240     tc_links_dev_cleanup:OK
  #241     tc_links_dev_mixed:OK
  #242     tc_links_ingress:OK
  #243     tc_links_invalid:OK
  #244     tc_links_prepend:OK
  #245     tc_links_replace:OK
  #246     tc_links_revision:OK
  #247     tc_opts_after:OK
  #248     tc_opts_append:OK
  #249     tc_opts_basic:OK
  #250     tc_opts_before:OK
  #251     tc_opts_chain_classic:OK
  #252     tc_opts_chain_mixed:OK
  #253     tc_opts_delete_empty:OK
  #254     tc_opts_demixed:OK
  #255     tc_opts_detach:OK
  #256     tc_opts_detach_after:OK
  #257     tc_opts_detach_before:OK
  #258     tc_opts_dev_cleanup:OK
  #259     tc_opts_invalid:OK
  #260     tc_opts_mixed:OK
  #261     tc_opts_prepend:OK
  #262     tc_opts_replace:OK
  #263     tc_opts_revision:OK
  [...]
  Summary: 44/38 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/r/8699efc284b75ccdc51ddf7062fa2370330dc6c0.1692029283.git.daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Oct 6, 2023
Add various tests to check maximum number of supported programs
being attached:

  # ./vmtest.sh -- ./test_progs -t tc_opts
  [...]
  ./test_progs -t tc_opts
  [    1.185325] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.186826] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  [    1.270123] tsc: Refined TSC clocksource calibration: 3407.988 MHz
  [    1.272428] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc932722, max_idle_ns: 440795381586 ns
  [    1.276408] clocksource: Switched to clocksource tsc
  #252     tc_opts_after:OK
  #253     tc_opts_append:OK
  #254     tc_opts_basic:OK
  #255     tc_opts_before:OK
  #256     tc_opts_chain_classic:OK
  #257     tc_opts_chain_mixed:OK
  #258     tc_opts_delete_empty:OK
  #259     tc_opts_demixed:OK
  #260     tc_opts_detach:OK
  #261     tc_opts_detach_after:OK
  #262     tc_opts_detach_before:OK
  #263     tc_opts_dev_cleanup:OK
  #264     tc_opts_invalid:OK
  #265     tc_opts_max:OK              <--- (new test)
  #266     tc_opts_mixed:OK
  #267     tc_opts_prepend:OK
  #268     tc_opts_replace:OK
  #269     tc_opts_revision:OK
  Summary: 18/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
jenkins-tessares pushed a commit that referenced this issue Oct 13, 2023
Add a new test case which performs double query of the bpf_mprog through
libbpf API, but also via raw bpf(2) syscall. This is testing to gather
first the count and then in a subsequent probe the full information with
the program array without clearing passed structs in between.

  # ./vmtest.sh -- ./test_progs -t tc_opts
  [...]
  ./test_progs -t tc_opts
  [    1.398818] tsc: Refined TSC clocksource calibration: 3407.999 MHz
  [    1.400263] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fd336761, max_idle_ns: 440795243819 ns
  [    1.402734] clocksource: Switched to clocksource tsc
  [    1.426639] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.428112] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #252     tc_opts_after:OK
  #253     tc_opts_append:OK
  #254     tc_opts_basic:OK
  #255     tc_opts_before:OK
  #256     tc_opts_chain_classic:OK
  #257     tc_opts_chain_mixed:OK
  #258     tc_opts_delete_empty:OK
  #259     tc_opts_demixed:OK
  #260     tc_opts_detach:OK
  #261     tc_opts_detach_after:OK
  #262     tc_opts_detach_before:OK
  #263     tc_opts_dev_cleanup:OK
  #264     tc_opts_invalid:OK
  #265     tc_opts_max:OK
  #266     tc_opts_mixed:OK
  #267     tc_opts_prepend:OK
  #268     tc_opts_query:OK            <--- (new test)
  #269     tc_opts_replace:OK
  #270     tc_opts_revision:OK
  Summary: 19/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Oct 13, 2023
Add a new test case to query on an empty bpf_mprog and pass the revision
directly into expected_revision for attachment to assert that this does
succeed.

  ./test_progs -t tc_opts
  [    1.406778] tsc: Refined TSC clocksource calibration: 3407.990 MHz
  [    1.408863] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcaf6eb0, max_idle_ns: 440795321766 ns
  [    1.412419] clocksource: Switched to clocksource tsc
  [    1.428671] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.430260] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #252     tc_opts_after:OK
  #253     tc_opts_append:OK
  #254     tc_opts_basic:OK
  #255     tc_opts_before:OK
  #256     tc_opts_chain_classic:OK
  #257     tc_opts_chain_mixed:OK
  #258     tc_opts_delete_empty:OK
  #259     tc_opts_demixed:OK
  #260     tc_opts_detach:OK
  #261     tc_opts_detach_after:OK
  #262     tc_opts_detach_before:OK
  #263     tc_opts_dev_cleanup:OK
  #264     tc_opts_invalid:OK
  #265     tc_opts_max:OK
  #266     tc_opts_mixed:OK
  #267     tc_opts_prepend:OK
  #268     tc_opts_query:OK
  #269     tc_opts_query_attach:OK     <--- (new test)
  #270     tc_opts_replace:OK
  #271     tc_opts_revision:OK
  Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
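
The query-then-attach pattern this test asserts is a form of optimistic concurrency control: read the current revision, then attach only if nothing has changed in between. The sketch below is a minimal model under stated assumptions. It assumes a revision counter that starts at 1 for an empty list and increments on every successful change, and that a non-zero expected revision that does not match makes the attach fail with -ESTALE. model_mprog and model_attach() are hypothetical and are not the bpf_mprog or libbpf API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of a program list guarded by a revision counter. */
struct model_mprog {
	uint32_t count;
	uint64_t revision;  /* bumped on every successful change */
};

static void model_mprog_init(struct model_mprog *m)
{
	m->count = 0;
	m->revision = 1;  /* assumption: an empty list starts at revision 1 */
}

static uint64_t model_query_revision(const struct model_mprog *m)
{
	return m->revision;
}

/* expected_revision == 0 means "don't care"; otherwise it must match. */
static int model_attach(struct model_mprog *m, uint64_t expected_revision)
{
	if (expected_revision && expected_revision != m->revision)
		return -ESTALE;  /* someone changed the list since the query */
	m->count++;
	m->revision++;
	return 0;
}
```

Passing the revision obtained from a query on the empty list succeeds; reusing that same (now outdated) revision for a second attach is rejected.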
matttbe pushed a commit that referenced this issue Oct 27, 2023
Add several new test cases which assert corner cases on the mprog query
mechanism, for example, around passing in a too small or a larger array
than the current count.

  ./test_progs -t tc_opts
  #252     tc_opts_after:OK
  #253     tc_opts_append:OK
  #254     tc_opts_basic:OK
  #255     tc_opts_before:OK
  #256     tc_opts_chain_classic:OK
  #257     tc_opts_chain_mixed:OK
  #258     tc_opts_delete_empty:OK
  #259     tc_opts_demixed:OK
  #260     tc_opts_detach:OK
  #261     tc_opts_detach_after:OK
  #262     tc_opts_detach_before:OK
  #263     tc_opts_dev_cleanup:OK
  #264     tc_opts_invalid:OK
  #265     tc_opts_max:OK
  #266     tc_opts_mixed:OK
  #267     tc_opts_prepend:OK
  #268     tc_opts_query:OK
  #269     tc_opts_query_attach:OK
  #270     tc_opts_replace:OK
  #271     tc_opts_revision:OK
  Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Reviewed-by: Alan Maguire <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
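
The query semantics exercised by this test and by the double-query one above (a count-only probe, then a fill that reuses the same struct, plus arrays that are too small or larger than the current count) can be modeled with an in/out count field. This is a sketch under assumptions: the input count is taken as the array capacity, the real total is always written back, at most min(capacity, total) IDs are copied, and a too-small array yields -ENOSPC. model_query() and its structs are hypothetical and are not the bpf_mprog or libbpf API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: NOT the kernel's bpf_mprog query code. */
struct model_mprog {
	const uint32_t *prog_ids;
	uint32_t count;
};

/* count is in/out: array capacity on input, total programs on output. */
struct model_query {
	uint32_t *prog_ids;  /* NULL: count-only probe */
	uint32_t count;
};

static int model_query(const struct model_mprog *m, struct model_query *q)
{
	uint32_t cap = q->count;
	uint32_t n;

	q->count = m->count;  /* always report the real total */
	if (!q->prog_ids)
		return 0;     /* count-only probe */
	n = cap < m->count ? cap : m->count;  /* never write past the array */
	for (uint32_t i = 0; i < n; i++)
		q->prog_ids[i] = m->prog_ids[i];
	return cap < m->count ? -ENOSPC : 0;  /* assumed error for a short array */
}
```

Because the count written back by the probe becomes the capacity of the next call, the same struct can be reused for the fill without clearing it in between; a larger array is simply left untouched beyond the current count.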