./mptcp_connect.sh -m mmap test blocks: regression #124

matttbe · 2020-12-10T15:57:30Z

With the latest export branch, I hit an issue similar to #67:

00:32:12.889 + ./mptcp_connect.sh -m mmap
00:32:13.203 [  297.723793] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
00:32:13.324 [  297.845668] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
00:32:13.449 [  297.970239] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
00:32:13.501 # INFO: set ns3-5fd1f6c4-HuMyRo dev ns3eth2: ethtool -K  gso off
00:32:13.517 # INFO: set ns4-5fd1f6c4-HuMyRo dev ns4eth3: ethtool -K  gro off
00:32:13.588 # Created /tmp/tmp.KQFpLf0eh7 (size 6059036	/tmp/tmp.KQFpLf0eh7) containing data sent by client
00:32:13.608 # Created /tmp/tmp.j4b69lb8mi (size 90140	/tmp/tmp.j4b69lb8mi) containing data sent by server
00:32:13.703 # New MPTCP socket can be blocked via sysctl		[ OK ]
00:32:13.738 # setsockopt(..., TCP_ULP, "mptcp", ...) blocked	[ OK ]
00:32:13.750 # INFO: validating network environment with pings
00:32:16.736 # INFO: Using loss of 0.96% delay 38 ms reorder 91% 96% with delay 9ms on ns3eth4
00:32:16.749 # INFO: extra options:  -m mmap
00:32:17.044 # ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP	(duration    75ms) [ OK ]
00:32:17.288 # ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP  	(duration    29ms) [ OK ]
00:32:17.543 # ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP	(duration    28ms) [ OK ]
00:32:17.837 # ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP	(duration    68ms) [ OK ]
00:32:18.070 # ns1 MPTCP -> ns1 (dead:beef:1::1:10004) TCP  	(duration    29ms) [ OK ]
00:32:18.310 # ns1 TCP   -> ns1 (dead:beef:1::1:10005) MPTCP	(duration    28ms) [ OK ]
00:32:18.565 # ns1 MPTCP -> ns2 (10.0.1.2:10006      ) MPTCP	(duration    34ms) [ OK ]
00:32:18.817 # ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP	(duration    29ms) [ OK ]
00:47:17.818 Timeout: sending Ctrl+C
00:47:17.819 # ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP	^C/usr/lib/klibc/bin/poweroff

I still need to instrument my CI to output more info in case of timeout...

I can reproduce it with my CI, but it does not always block on the same test, e.g.:

00:07:17.032 + ./mptcp_connect.sh -m mmap
00:07:17.342 [  231.333351] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
00:07:17.472 [  231.463572] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
00:07:17.598 [  231.589181] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
00:07:17.656 # INFO: set ns3-5fd20e18-8txmkT dev ns3eth2: ethtool -K  gro off
00:07:17.673 # INFO: set ns4-5fd20e18-8txmkT dev ns4eth3: ethtool -K  gro off
00:07:17.741 # Created /tmp/tmp.76tE338eOt (size 6571036	/tmp/tmp.76tE338eOt) containing data sent by client
00:07:17.807 # Created /tmp/tmp.zO7wzqhHqp (size 6808604	/tmp/tmp.zO7wzqhHqp) containing data sent by server
00:07:17.884 # New MPTCP socket can be blocked via sysctl		[ OK ]
00:07:17.918 # setsockopt(..., TCP_ULP, "mptcp", ...) blocked	[ OK ]
00:07:17.929 # INFO: validating network environment with pings
00:07:19.950 # INFO: Using loss of 0.67% delay 12 ms reorder 95% 92% with delay 3ms on ns3eth4
00:07:19.962 # INFO: extra options:  -m mmap
00:23:26.963 Timeout: sending Ctrl+C
00:23:26.964 # ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP	^C/usr/lib/klibc/bin/poweroff
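
Since the hang does not always land on the same test, it can help to pull the stuck test out of each CI log automatically. Here is a minimal sketch, assuming the log format shown in the two runs above (a `Timeout: sending Ctrl+C` line immediately followed by the interrupted test line). `find_blocked_test` is a hypothetical helper, not something that exists in the selftests or in the CI scripts.

```python
import re

# Assumed line shape, taken from the logs above:
#   "<time> # ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP\t^C..."
TEST_RE = re.compile(
    r"#\s+(ns\d+)\s+(MPTCP|TCP)\s+->\s+(ns\d+)\s+\(([^)]*)\)\s+(MPTCP|TCP)"
)


def find_blocked_test(log_lines):
    """Return a short description of the test interrupted by the timeout, or None."""
    lines = list(log_lines)
    for i, line in enumerate(lines):
        if "Timeout: sending Ctrl+C" not in line:
            continue
        # In the logs above, the interrupted test is printed on the next line.
        if i + 1 < len(lines):
            match = TEST_RE.search(lines[i + 1])
            if match:
                src, src_proto, dst, addr, dst_proto = match.groups()
                # Strip the column padding inside the parentheses.
                return f"{src} {src_proto} -> {dst} ({addr.strip()}) {dst_proto}"
    return None
```

Running this over several failed jobs would show whether the hang clusters on particular namespace or protocol combinations.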

It looks like it has been introduced by:

bbaa110: mptcp: push pending frames when subflow has free space


matttbe · 2020-12-11T09:56:55Z

Can be closed thanks to a fix from Paolo (thx!):

02754ad: "squashed" in mptcp: push pending frames when subflow has free space

When SUSPEND_DISCONNECTING bit is set that means Disconnect is pending but the code was evaluating if the list is empty before calling hci_conn_del which does the actual cleanup and remove the connection from the list thus the bit is never cleared causing the suspend procedure to always timeout when there are connections to be disconnected:

Suspend/Resume - Success 5 (Pairing - Legacy) - waiting done
  Set the system into Suspend via force_suspend
= mgmt-tester: Suspend/Resume - Success 5 (Pairing -.. 17:03:13.200458
= mgmt-tester: Set the system into Suspend via force_suspend 17:03:13.205812
< HCI Command: Write Scan E.. (0x03|0x001a) plen 1 #122 [hci0] 17:03:13.213561
        Scan enable: No Scans (0x00)
> HCI Event: Command Complete (0x0e) plen 4 #123 [hci0] 17:03:13.214710
      Write Scan Enable (0x03|0x001a) ncmd 1
        Status: Success (0x00)
< HCI Command: Disconnect (0x01|0x0006) plen 3 #124 [hci0] 17:03:13.215830
        Handle: 42
        Reason: Remote Device Terminated due to Power Off (0x15)
> HCI Event: Command Status (0x0f) plen 4 #125 [hci0] 17:03:13.216602
      Disconnect (0x01|0x0006) ncmd 1
        Status: Success (0x00)
> HCI Event: Disconnect Complete (0x05) plen 4 #126 [hci0] 17:03:13.217342
        Status: Success (0x00)
        Handle: 42
        Reason: Remote Device Terminated due to Power Off (0x15)
@ MGMT Event: Device Disconn.. (0x000c) plen 8 {0x0002} [hci0] 17:03:13.217688
        BR/EDR Address: 00:AA:01:01:00:00 (Intel Corporation)
        Reason: Connection terminated by local host for suspend (0x05)
@ MGMT Event: Device Disconn.. (0x000c) plen 8 {0x0001} [hci0] 17:03:13.217688
        BR/EDR Address: 00:AA:01:01:00:00 (Intel Corporation)
        Reason: Connection terminated by local host for suspend (0x05)
Suspend/Resume - Success 5 (Pairing - Legacy) - test timed out
= mgmt-tester: Suspend/Resume - Success 5 (Pairing -.. 17:03:13.939317
Suspend/Resume - Success 5 (Pairing - Legacy) - teardown
= mgmt-tester: Suspend/Resume - Success 5 (Pairing -.. 17:03:13.947267
[ 13.284291] Bluetooth: hci0: Timed out waiting for suspend events
[ 13.287324] Bluetooth: hci0: Suspend timeout bit: 6

Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
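
The ordering problem described in this commit message can be modelled with a toy example. This is only a sketch: the `Controller` class and both handler functions are hypothetical stand-ins, not the kernel code, and the "fixed" variant simply assumes the check is moved after the removal, which the message implies but does not show.

```python
SUSPEND_DISCONNECTING = "SUSPEND_DISCONNECTING"


class Controller:
    """Toy stand-in: a connection list plus a set of pending suspend tasks."""

    def __init__(self, handles):
        self.connections = list(handles)
        self.suspend_tasks = {SUSPEND_DISCONNECTING}

    def conn_del(self, handle):
        # Plays the role of hci_conn_del(): the only place an entry is removed.
        self.connections.remove(handle)

    def clear_if_idle(self):
        # The pending bit is cleared only once no connection is left.
        if not self.connections:
            self.suspend_tasks.discard(SUSPEND_DISCONNECTING)


def disconnect_complete_buggy(ctrl, handle):
    # The list still contains `handle` here, so the bit is never cleared.
    ctrl.clear_if_idle()
    ctrl.conn_del(handle)


def disconnect_complete_fixed(ctrl, handle):
    # Remove first, then evaluate whether the list is empty.
    ctrl.conn_del(handle)
    ctrl.clear_if_idle()
```

With a single connection (handle 42, as in the trace), the buggy order leaves the bit set after the disconnect completes, which matches the reported suspend timeout.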


[BUG]
David reported an ASSERT() get triggered during fio load on 8 devices with data/raid6 and metadata/raid1c3:

  fio --rw=randrw --randrepeat=1 --size=3000m \
      --bsrange=512b-64k --bs_unaligned \
      --ioengine=libaio --fsync=1024 \
      --name=job0 --name=job1 \

The ASSERT() is from rbio_add_bio() of raid56.c:

  ASSERT(orig_logical >= full_stripe_start &&
         orig_logical + orig_len <= full_stripe_start +
         rbio->nr_data * BTRFS_STRIPE_LEN);

Which is checking if the target rbio is crossing the full stripe boundary.

  [100.789] assertion failed: orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN, in fs/btrfs/raid56.c:1622
  [100.795] ------------[ cut here ]------------
  [100.796] kernel BUG at fs/btrfs/raid56.c:1622!
  [100.797] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
  [100.798] CPU: 1 PID: 100 Comm: kworker/u8:4 Not tainted 6.4.0-rc6-default+ #124
  [100.799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
  [100.802] Workqueue: writeback wb_workfn (flush-btrfs-1)
  [100.803] RIP: 0010:rbio_add_bio+0x204/0x210 [btrfs]
  [100.806] RSP: 0018:ffff888104a8f300 EFLAGS: 00010246
  [100.808] RAX: 00000000000000a1 RBX: ffff8881075907e0 RCX: ffffed1020951e01
  [100.809] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001
  [100.811] RBP: 0000000141d20000 R08: 0000000000000001 R09: ffff888104a8f04f
  [100.813] R10: ffffed1020951e09 R11: 0000000000000003 R12: ffff88810e87f400
  [100.815] R13: 0000000041d20000 R14: 0000000144529000 R15: ffff888101524000
  [100.817] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
  [100.821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [100.822] CR2: 000055d54e44c270 CR3: 000000010a9a1006 CR4: 00000000003706a0
  [100.824] Call Trace:
  [100.825] <TASK>
  [100.825] ? die+0x32/0x80
  [100.826] ? do_trap+0x12d/0x160
  [100.827] ? rbio_add_bio+0x204/0x210 [btrfs]
  [100.827] ? rbio_add_bio+0x204/0x210 [btrfs]
  [100.829] ? do_error_trap+0x90/0x130
  [100.830] ? rbio_add_bio+0x204/0x210 [btrfs]
  [100.831] ? handle_invalid_op+0x2c/0x30
  [100.833] ? rbio_add_bio+0x204/0x210 [btrfs]
  [100.835] ? exc_invalid_op+0x29/0x40
  [100.836] ? asm_exc_invalid_op+0x16/0x20
  [100.837] ? rbio_add_bio+0x204/0x210 [btrfs]
  [100.837] raid56_parity_write+0x64/0x270 [btrfs]
  [100.838] btrfs_submit_chunk+0x26e/0x800 [btrfs]
  [100.840] ? btrfs_bio_init+0x80/0x80 [btrfs]
  [100.841] ? release_pages+0x503/0x6d0
  [100.842] ? folio_unlock+0x2f/0x60
  [100.844] ? __folio_put+0x60/0x60
  [100.845] ? btrfs_do_readpage+0xae0/0xae0 [btrfs]
  [100.847] btrfs_submit_bio+0x21/0x60 [btrfs]
  [100.847] submit_one_bio+0x6a/0xb0 [btrfs]
  [100.849] extent_write_cache_pages+0x395/0x680 [btrfs]
  [100.850] ? __extent_writepage+0x520/0x520 [btrfs]
  [100.851] ? mark_usage+0x190/0x190
  [100.852] extent_writepages+0xdb/0x130 [btrfs]
  [100.853] ? extent_write_locked_range+0x480/0x480 [btrfs]
  [100.854] ? mark_usage+0x190/0x190
  [100.854] ? attach_extent_buffer_page+0x220/0x220 [btrfs]
  [100.855] ? reacquire_held_locks+0x178/0x280
  [100.856] ? writeback_sb_inodes+0x245/0x7f0
  [100.857] do_writepages+0x102/0x2e0
  [100.858] ? page_writeback_cpu_online+0x10/0x10
  [100.859] ? __lock_release.isra.0+0x14a/0x4d0
  [100.860] ? reacquire_held_locks+0x280/0x280
  [100.861] ? __lock_acquired+0x1e9/0x3d0
  [100.862] ? do_raw_spin_lock+0x1b0/0x1b0
  [100.863] __writeback_single_inode+0x94/0x450
  [100.864] writeback_sb_inodes+0x372/0x7f0
  [100.864] ? lock_sync+0xd0/0xd0
  [100.865] ? do_raw_spin_unlock+0x93/0xf0
  [100.866] ? sync_inode_metadata+0xc0/0xc0
  [100.867] ? rwsem_optimistic_spin+0x340/0x340
  [100.868] __writeback_inodes_wb+0x70/0x130
  [100.869] wb_writeback+0x2d1/0x530
  [100.869] ? __writeback_inodes_wb+0x130/0x130
  [100.870] ? lockdep_hardirqs_on_prepare.part.0+0xf1/0x1c0
  [100.870] wb_do_writeback+0x3eb/0x480
  [100.871] ? wb_writeback+0x530/0x530
  [100.871] ? mark_lock_irq+0xcd0/0xcd0
  [100.872] wb_workfn+0xe0/0x3f0

[CAUSE]
Commit a97699d ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce u64 division.

Function btrfs_max_io_len() is to get the length to the stripe boundary.

It calculates the full stripe start offset (inside the chunk) by the following code:

  *full_stripe_start =
          rounddown(*stripe_nr, nr_data_stripes(map)) <<
          BTRFS_STRIPE_LEN_SHIFT;

The calculation itself is fine, but the value returned by rounddown() is dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which returned int).

Thus the result is also u32, then we do the left shift, which can overflow u32.

If such overflow happens, @full_stripe_start will be a value way smaller than @offset, causing later "full_stripe_len - (offset - *full_stripe_start)" to underflow, thus make later length calculation to have no stripe boundary limit, resulting a write bio to exceed stripe boundary.

There are some other locations like this, with a u32 @stripe_nr got left shift, which can lead to a similar overflow.

[FIX]
Fix all @stripe_nr with left shift with a type cast to u64 before the left shift.

Those involved @stripe_nr or similar variables are recording the stripe number inside the chunk, which is small enough to be contained by u32, but their offset inside the chunk can not fit into u32.

Thus for those specific left shifts, a type cast to u64 is necessary so this patch does not touch them and the code will be cleaned up in the future to keep the fix minimal.

Reported-by: David Sterba <[email protected]>
Fixes: a97699d ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
Tested-by: David Sterba <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
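
The u32 overflow described in [CAUSE] can be reproduced with plain integer arithmetic. The sketch below emulates 32-bit truncation with a mask, since Python integers do not overflow. It assumes a shift of 16 (64 KiB stripes), and the stripe number and data stripe count are made-up illustration values; the function names are hypothetical and are not the btrfs helpers.

```python
U32_MASK = 0xFFFFFFFF
BTRFS_STRIPE_LEN_SHIFT = 16  # assumed here: 64 KiB stripe length


def rounddown(x, y):
    """Round x down to the nearest multiple of y."""
    return x - (x % y)


def full_stripe_start_u32(stripe_nr, nr_data_stripes):
    # Emulates the shift being evaluated in 32 bits: high bits are lost.
    shifted = rounddown(stripe_nr, nr_data_stripes) << BTRFS_STRIPE_LEN_SHIFT
    return shifted & U32_MASK


def full_stripe_start_u64(stripe_nr, nr_data_stripes):
    # Emulates casting to u64 before the shift: no truncation.
    return rounddown(stripe_nr, nr_data_stripes) << BTRFS_STRIPE_LEN_SHIFT


# Illustration values only: stripe 65540 inside the chunk, 6 data stripes.
print(hex(full_stripe_start_u32(65540, 6)))  # 0x20000
print(hex(full_stripe_start_u64(65540, 6)))  # 0x100020000
```

Once the stripe number reaches 65536, the 32-bit result wraps and lands far below the real offset, which is the precondition for the underflow described above.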

Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs") for s390x, add support for 64-bit pointers to kfuncs for LoongArch. Since the infrastructure is already implemented in BPF core, the only thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  #119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  #120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  #121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  #122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  #123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  #124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  #125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  #126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  #127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  #128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  #129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  #130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  #486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from __bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1 [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
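
The "does NOT fit in s32" claim can be checked directly from the two addresses quoted in the commit message. This is a small arithmetic sketch, not BPF core code; `signed64_delta` and `fits_in_s32` are hypothetical helper names.

```python
S32_MIN = -(1 << 31)
S32_MAX = (1 << 31) - 1

# Addresses quoted in the commit message.
KFUNC_ADDR = 0xFFFF800002009440      # bpf_kfunc_call_test_fail1 [bpf_testmod]
BPF_CALL_BASE = 0x9000000002E128D8   # __bpf_call_base


def signed64_delta(target, base):
    """Difference target - base, interpreted as a signed 64-bit value."""
    delta = (target - base) % (1 << 64)
    return delta - (1 << 64) if delta >= (1 << 63) else delta


def fits_in_s32(value):
    """True if value is representable as a signed 32-bit integer."""
    return S32_MIN <= value <= S32_MAX


offset = signed64_delta(KFUNC_ADDR, BPF_CALL_BASE)
print(hex(offset), fits_in_s32(offset))
```

The difference is on the order of 2^62, far outside the signed 32-bit range, so a 32-bit relative offset cannot encode this call target.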

matttbe added the bug label Dec 10, 2020

matttbe mentioned this issue Dec 10, 2020

./mptcp_connect.sh -m mmap test blocks #67

Closed

matttbe closed this as completed Dec 11, 2020
