-
-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
can't create backup path #237
Comments
Hi @vandit86 , Thank you for this bug report! Do you think you could re-do this experiment with a 5.15 kernel (or ideally with our |
(For Github tracking: Possibly linked to #191 ) |
It looks there are a few problems leading to the above behavior:
The solution should be removing the 'signal' endpoint from the server, so that the client can use the 'subflow' endpoint. Note that to allow the client 'subflow' endpoints to work correctly you additionally need some routing setup. Specifically the server address you connect to must be reachable from both the client's links. Assuming the client connects to 13.0.0.2, something alike the following should work (on top of the existing routing config): [ on client n1]
[ on router n3] |
Thank you for the quick answers! I will try switch to newer kernel and test mptcp.. I'm not so experienced working with kernel, thus it might take some time before I give you feedback. |
Hi @pabeni.
Here is like Add Address looks on first path ( i.e. server send ADD_ADDR request to client, and no backup flag is set)
Sorry, I don't understand your suggestion. If I remove "signal" flag from server endpoint, how the client will know to create new subflow, ? i.e., no ADD_ADDR will be sent, thus no subflow will be created.. Am I right ??
Client connect to server 11.0.0.2 <-->13.0.0.2 (initial connection). Server sends ADD_ADDR to client, and client always create additional subflow (through 15.0.0.2 <-->14.0.0.2), even if backup flag is set by ip mptcp , on server and on client.. |
No. The client creates additional subflows are created if:
Note the above is a simplification of the actual rules used, as mptcp limits are also taken in account, etc... The relevant part is that client side 'subflow' endpoints are sufficient for additional subflows creation. The client will try to create new subflow using the endpoint address as source and the known server address as destination. Please try the configuration specified in the previous comment and let us know the result.
That happens because the ADD_ADDR option does not carry the 'backup' flag. The client connects to the address specified by the ADD_ADDR option with default (non backup) flag. Any additional client endpoint is irrelevant for such subflow creation. |
Thank you for explanation, @pabeni !
I configure routing of topology as you suggests (also removing "endpoints" from server and adding one endpoint on client with n1:
Server (n4) :
ip mptcp monitor on n4 (with iperf3):
|
That is indeed a bug, fixed with the upstream commit 0460ce2 present in Linux v5.15-rc1and newer. Please try the most up2date kernel, thanks! |
So as a conclusion, if I'm not mistaken, it looks like no modification is needed in the kernel code but probably best to improve the ip-mptcp man page or other doc related to that. |
Thank you, @matttbe @pabeni . |
Indeed subflows created in response to incoming ADD_ADDR will always be non backup. We could improve the scenario with an additional PM netlink setting, that will allow configuring the backup status for such subflows. I'm not sure about the better way to expose this to user-space. A dummy, default (0.0.0.0) endpoint that will match only incoming ADD_ADDR ?!? Also I'm wondering if that should be instead handled at the protocol (RFC) level, expanding the ADD_ADDR option definition with a backup flag. |
I don't think we need to modify the RFC: both the client and sender can add the backup in the MPJ. So when the ADD_ADDR is received, the client can add the backup flag if the endpoint it is using has this flag. Same on the server side, no? Eventually, there could be an option to create new subflow using the initial IP with the backup flag. But here too, we could create an endpoint with the same IP and set the backup flag. WDYT? |
@pabeni: I agree with @matttbe, at the protocol level both peers have the ability to set the backup bit during the MP_JOIN handshake so there's no need to change the advertisement. When sending MP_JOIN in response to an ADD_ADDR, the backup bit and choice of local endpoint seem to be closely coupled. For example, if the new subflow and existing subflows are on the same interface for the MP_JOIN sender, that sender doesn't really need to use the backup bit to influence the peer's packet scheduler. If all new subflows are established by the in-kernel PM without considering the interface it's on, it seems like we don't gain much by having a default 0.0.0.0 endpoint backup setting. It seems like the missing link is using the in-kernel's PM existing endpoint information for outgoing subflow establishment to influence the source address for outgoing MP_JOINs rather than depending entirely on route lookup (as you noted on IRC). |
Regarding my proposal to use existing endpoint information, I think it falls within the scope of the
|
ok, regarding the same topology showed above:
Please, can you show me how to do this with ip mptcp configuration tool .. Or using mptcpd PM is the only way to do this from user space?? |
If I'm not mistaken, it is currently not possible to do that with 'ip'. |
This has been done by @pabeni https://lore.kernel.org/all/247b8dfb7254d4a1fb435b5efa756cea989b62cb.1637922870.git.pabeni@redhat.com/ |
This is arm64 version of commit fec56f5 ("bpf: Introduce BPF trampoline"). A bpf trampoline converts native calling convention to bpf calling convention and is used to implement various bpf features, such as fentry, fexit, fmod_ret and struct_ops. This patch does essentially the same thing that bpf trampoline does on x86. Tested on Raspberry Pi 4B and qemu: #18 /1 bpf_tcp_ca/dctcp:OK #18 /2 bpf_tcp_ca/cubic:OK #18 /3 bpf_tcp_ca/invalid_license:OK #18 /4 bpf_tcp_ca/dctcp_fallback:OK #18 /5 bpf_tcp_ca/rel_setsockopt:OK #18 bpf_tcp_ca:OK #51 /1 dummy_st_ops/dummy_st_ops_attach:OK #51 /2 dummy_st_ops/dummy_init_ret_value:OK #51 /3 dummy_st_ops/dummy_init_ptr_arg:OK #51 /4 dummy_st_ops/dummy_multiple_args:OK #51 dummy_st_ops:OK #57 /1 fexit_bpf2bpf/target_no_callees:OK #57 /2 fexit_bpf2bpf/target_yes_callees:OK #57 /3 fexit_bpf2bpf/func_replace:OK #57 /4 fexit_bpf2bpf/func_replace_verify:OK #57 /5 fexit_bpf2bpf/func_sockmap_update:OK #57 /6 fexit_bpf2bpf/func_replace_return_code:OK #57 /7 fexit_bpf2bpf/func_map_prog_compatibility:OK #57 /8 fexit_bpf2bpf/func_replace_multi:OK #57 /9 fexit_bpf2bpf/fmod_ret_freplace:OK #57 fexit_bpf2bpf:OK #237 xdp_bpf2bpf:OK Signed-off-by: Xu Kuohai <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Jean-Philippe Brucker <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: KP Singh <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
Currently there is no synchronisation between finalize_pkvm() and kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if kvm_arm_init() fails resulting in the following warning on all the CPUs and eventually a HYP panic: | kvm [1]: IPA Size Limit: 48 bits | kvm [1]: Failed to init hyp memory protection | kvm [1]: error initializing Hyp mode: -22 | | <snip> | | WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50 | Modules linked in: | CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237 | Hardware name: FVP Base RevC (DT) | pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--) | pc : _kvm_host_prot_finalize+0x30/0x50 | lr : __flush_smp_call_function_queue+0xd8/0x230 | | Call trace: | _kvm_host_prot_finalize+0x3c/0x50 | on_each_cpu_cond_mask+0x3c/0x6c | pkvm_drop_host_privileges+0x4c/0x78 | finalize_pkvm+0x3c/0x5c | do_one_initcall+0xcc/0x240 | do_initcall_level+0x8c/0xac | do_initcalls+0x54/0x94 | do_basic_setup+0x1c/0x28 | kernel_init_freeable+0x100/0x16c | kernel_init+0x20/0x1a0 | ret_from_fork+0x10/0x20 | Failed to finalize Hyp protection: -22 | dtb=fvp-base-revc.dtb | kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540! | kvm [95]: nVHE call trace: | kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8 | kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac | kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160 | kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4 | kvm [95]: ---[ end nVHE call trace ]--- | kvm [95]: Hyp Offset: 0xfffe8db00ffa0000 | Kernel panic - not syncing: HYP panic: | PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800 | FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000 | VCPU:0000000000000000 | CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237 | Hardware name: FVP Base RevC (DT) | Workqueue: rpciod rpc_async_schedule | Call trace: | dump_backtrace+0xec/0x108 | show_stack+0x18/0x2c | dump_stack_lvl+0x50/0x68 | dump_stack+0x18/0x24 | panic+0x138/0x33c | nvhe_hyp_panic_handler+0x100/0x184 | new_slab+0x23c/0x54c | ___slab_alloc+0x3e4/0x770 | kmem_cache_alloc_node+0x1f0/0x278 | __alloc_skb+0xdc/0x294 | tcp_stream_alloc_skb+0x2c/0xf0 | tcp_sendmsg_locked+0x3d0/0xda4 | tcp_sendmsg+0x38/0x5c | inet_sendmsg+0x44/0x60 | sock_sendmsg+0x1c/0x34 | xprt_sock_sendmsg+0xdc/0x274 | xs_tcp_send_request+0x1ac/0x28c | xprt_transmit+0xcc/0x300 | call_transmit+0x78/0x90 | __rpc_execute+0x114/0x3d8 | rpc_async_schedule+0x28/0x48 | process_one_work+0x1d8/0x314 | worker_thread+0x248/0x474 | kthread+0xfc/0x184 | ret_from_fork+0x10/0x20 | SMP: stopping secondary CPUs | Kernel Offset: 0x57c5cb460000 from 0xffff800080000000 | PHYS_OFFSET: 0x80000000 | CPU features: 0x00000000,1035b7a3,ccfe773f | Memory Limit: none | ---[ end Kernel panic - not syncing: HYP panic: | PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800 | FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000 | VCPU:0000000000000000 ]--- Fix it by checking for the successfull initialisation of kvm_arm_init() in finalize_pkvm() before proceeding any futher. Fixes: 87727ba ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege") Cc: Will Deacon <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Oliver Upton <[email protected]> Cc: James Morse <[email protected]> Cc: Suzuki K Poulose <[email protected]> Cc: Zenghui Yu <[email protected]> Signed-off-by: Sudeep Holla <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
Add several new tcx test cases to improve test coverage. This also includes a few new tests with ingress instead of clsact qdisc, to cover the fix from commit dc644b5 ("tcx: Fix splat in ingress_destroy upon tcx_entry_free"). # ./test_progs -t tc [...] #234 tc_links_after:OK #235 tc_links_append:OK #236 tc_links_basic:OK #237 tc_links_before:OK #238 tc_links_chain_classic:OK #239 tc_links_chain_mixed:OK #240 tc_links_dev_cleanup:OK #241 tc_links_dev_mixed:OK #242 tc_links_ingress:OK #243 tc_links_invalid:OK #244 tc_links_prepend:OK #245 tc_links_replace:OK #246 tc_links_revision:OK #247 tc_opts_after:OK #248 tc_opts_append:OK #249 tc_opts_basic:OK #250 tc_opts_before:OK #251 tc_opts_chain_classic:OK #252 tc_opts_chain_mixed:OK #253 tc_opts_delete_empty:OK #254 tc_opts_demixed:OK #255 tc_opts_detach:OK #256 tc_opts_detach_after:OK #257 tc_opts_detach_before:OK #258 tc_opts_dev_cleanup:OK #259 tc_opts_invalid:OK #260 tc_opts_mixed:OK #261 tc_opts_prepend:OK #262 tc_opts_replace:OK #263 tc_opts_revision:OK [...] Summary: 44/38 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/r/8699efc284b75ccdc51ddf7062fa2370330dc6c0.1692029283.git.daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <[email protected]>
…prog The result is as follows: $ tools/testing/selftests/bpf/test_progs --name=task_under_cgroup #237 task_under_cgroup:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Without the previous patch, there will be RCU warnings in dmesg when CONFIG_PROVE_RCU is enabled. While with the previous patch, there will be no warnings. Signed-off-by: Yafang Shao <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
Hello,
I try to create a backup path but with no success, i.e. mptcp always use two available paths.
Will appreciate any suggestion..
n1 :
n4 (server) :
n1:
n4 (server)
Also, the backup flag is not set on tcpdump output.
The similar issue was reported in here https://githubmemory.com/repo/multipath-tcp/mptcp_net-next/issues/191, but I have not found any solution.
The text was updated successfully, but these errors were encountered: