Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multicast don't work in AArch64 when using version 5.X or older kernel because of an eBPF problem. #33408

Closed
2 of 3 tasks
yushoyamaguchi opened this issue Jun 26, 2024 · 9 comments
Labels
area/multicast kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

Comments

@yushoyamaguchi
Copy link
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When running cilium in this environment, cilium-agent cannot get out from CrashLoop and emit below error log.

Cilium Version

1.16.0-rc

Kernel Version

Linux cecinode91 5.15.0-1055-raspi

Kubernetes Version

1.27.1 (Maybe)

Regression

No response

Sysdump

No response

Relevant log output

Verifier error: program tail_mcast_ep_delivery: load program: invalid argument: tail_calls are not allowed in non-JITed programs with bpf-to-bpf calls (242 line(s) omitted)
Verifier log: load program: invalid argument:

Anything else?

I find the error message from the kernel in Linux source code,
https://elixir.bootlin.com/linux/v5.19.17/source/kernel/bpf/verifier.c#L6309

The cause of this error seems that allow_tail_call_in_subprogs() returns false.
error message : tail_calls are not allowed in non-JITed programs with bpf-to-bpf calls
v5.19.17 : https://elixir.bootlin.com/linux/v5.19.17/source/kernel/bpf/verifier.c
v6.0-rc1 : https://elixir.bootlin.com/linux/v5.19.17/source/kernel/bpf/verifier.c

Then, I find that older version than 5.19.17 kernel always return false when using AArch64.
(The cause was which bpf_jit_supports_subprog_tailcalls() returns true or false. )
v5.19.17 : https://elixir.bootlin.com/linux/v5.19.17/C/ident/bpf_jit_supports_subprog_tailcalls , https://elixir.bootlin.com/linux/v5.19.17/source/kernel/bpf/core.c#L2720

v6.0-rc1 : https://elixir.bootlin.com/linux/v6.0-rc1/C/ident/bpf_jit_supports_subprog_tailcalls , https://elixir.bootlin.com/linux/v6.0-rc1/source/arch/arm64/net/bpf_jit_comp.c#L1637

Therefore, It seems that when using AArch64, 6.0 or newer kernel is required to run cilium multicast in kubernetes node.

This restriction is not written in any document in cilium.
Therefore I want to write this restriction on the multicast document.

Can I create the Pull Request?

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct
@yushoyamaguchi yushoyamaguchi added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels Jun 26, 2024
@yushoyamaguchi
Copy link
Contributor Author

The commit is this.
torvalds/linux@d4609a5

@harsimran-pabla
Copy link
Contributor

Thanks, @yushoyamaguchi, for opening this issue. There are few things to check

The minimum kernel version required for multicast is 5.15 ( This is probably for x86 ). Looks like for arm64 it is 6.0+ kernel version. You can add these in the documentation.

Aside from the documentation, we may need to add a feature probe helper in the cilium/ebpf library and use it when initializing multicast. On failure to probe ( subprog -> tailcall support ), multicast cell init should fail, and an appropriate message can be logged.

@youngnick youngnick added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/multicast and removed needs/triage This issue requires triaging to establish severity and next steps. labels Jun 28, 2024
@fujitatomoya
Copy link
Contributor

Adding documentation will definitely be one of the enhancement we can take for user, i think we should update https://docs.cilium.io/en/stable/operations/system_requirements/#required-kernel-versions-for-advanced-features once version dependencies are clear. Actually this brings me an another question that all these kernel versions are meant to be for x86? that case, probably we could add note for that as well.

cilium/ebpf library and use it when initializing multicast. On failure to probe ( subprog -> tailcall support ), multicast cell init should fail, and an appropriate message can be logged.

@harsimran-pabla

if i am not mistaken. this means once we update the ConfigMap to enable multicast, some ciliumnodes would fail to initialize the cilium-agent with multicast capability, cilium-agent pods are going to in crash loop but providing error information from multicast cell can let the user know that multicast feature cannot be enabled with this ciliumnode. is my understanding correct?

so the expectation here from cilium is that user needs to configure the all ciliumnodes can be enabled with multicast if user wants to use the multicast feature? i am not sure about this graceful behavior, is this common behavior for cilium features?
another behavior would be fallback behavior that cilium-agent can be started with warning message but the feature cannot be used on that ciliumnodes.

besides if the fallback behavior is the case, this also needs to be considered with cilium/cilium-cli#2620. in case of some ciliumnodes cannot mange the multicast, it should be shown for user that as well.

thanks,

@yushoyamaguchi
Copy link
Contributor Author

yushoyamaguchi commented Jun 29, 2024

@harsimran-pabla @fujitatomoya
Thank you for your comments.

I found the detail cause of this bug.

When using eBPF with version 5 and older versions of the AArch64 kernel, it is prohibited to use tail calls within sub-program. To avoid this, in Cilium's eBPF programs, most of functions are declared as inline functions.

However, the function __mcast_ep_delivery() is not declared as an inline function.
Ref :

static long __mcast_ep_delivery(__maybe_unused void *sub_map,

The reason is that the call to __mcast_ep_delivery() is not a regular function call but is invoked as a callback for for_each_map_elem().
Since callback functions cannot be inlined, this approach is used.
Ref :

for_each_map_elem(sub_map, __mcast_ep_delivery, &cb_ctx, 0);

This is the only instance where for_each_map_elem() is used, excluding test code.

As @fujitatomoya said, I want to discuss about how to handle this bug.

cc @YutaroHayakawa I'm sorry for adding in cc,. Thank you so much for telling me the cause.

@yushoyamaguchi
Copy link
Contributor Author

yushoyamaguchi commented Jul 3, 2024

Should we allow this and check kernel compatibility on the agent side by adding functionality in cilium/ebpf ?
Otherwise, should it be fixed as an eBPF program issue?

yushoyamaguchi added a commit to yushoyamaguchi/cilium that referenced this issue Jul 3, 2024
The kernel version that can enable multicast is different between AMD64 and AArch64 due to the difference in the timing of when tail-call from eBPF sub-programs is enabled.
I wrote about it in the document.

ref : The commit which allow for tailcalls in BPF subprogram in each architecture are below
AMD64:
torvalds/linux@e411901c0b775
This commit is reflected to version 5.10 or newer kernel

AArch64:
torvalds/linux@d4609a5
This commit is reflected to 6.0 or newer kernel

This PR is a little part of the solution of cilium#33408

Signed-off-by: Yusho Yamaguchi <[email protected]>
github-merge-queue bot pushed a commit that referenced this issue Jul 3, 2024
The kernel version that can enable multicast is different between AMD64 and AArch64 due to the difference in the timing of when tail-call from eBPF sub-programs is enabled.
I wrote about it in the document.

ref : The commit which allow for tailcalls in BPF subprogram in each architecture are below
AMD64:
torvalds/linux@e411901c0b775
This commit is reflected to version 5.10 or newer kernel

AArch64:
torvalds/linux@d4609a5
This commit is reflected to 6.0 or newer kernel

This PR is a little part of the solution of #33408

Signed-off-by: Yusho Yamaguchi <[email protected]>
@YutaroHayakawa
Copy link
Member

I guess the solution here is

  1. Do feature detection in user space (whether mixed tailcall & subprog is supported or not)
  2. Define macro based on that, HAS_MIXED_TAILCALL_SUBPROG for example
  3. Use __always_inline for __mcast_ep_delivery when HAS_MIXED_TAILCALL_SUBPROG
  4. Use either bounded loop or unrolling-based loop instead of for_each_map_elem when HAS_MIXED_TAILCALL_SUBPROG

CC @ldelossa

@yushoyamaguchi
Copy link
Contributor Author

yushoyamaguchi commented Jul 4, 2024

Thank you for a great suggestion.

Do you have a plan that some community members work on this implementation?
I also can work on it after implementing connectivity test for multicast, but it will take a long time and some support.

One more important thing :
Because I think fixing this as eBPF problem will take a long time or may be given up, I added descriptions of kernel version limitation to multicast Documentation and sent PR. Then it was merged.
#33567
However, if you can fix this as you mentioned quickly, I probably shouldn't have added these descriptions.

I am very sorry for increasing procedures.
When you fix this eBPF problem, please remove these descriptions from multicast Documentation.
(I will do the best to notice and send PR to delete this.)

Thank you.

jibi pushed a commit that referenced this issue Jul 8, 2024
[ oss commit e64311d ]

The kernel version that can enable multicast is different between AMD64 and AArch64 due to the difference in the timing of when tail-call from eBPF sub-programs is enabled.
I wrote about it in the document.

ref : The commit which allow for tailcalls in BPF subprogram in each architecture are below
AMD64:
torvalds/linux@e411901c0b775
This commit is reflected to version 5.10 or newer kernel

AArch64:
torvalds/linux@d4609a5
This commit is reflected to 6.0 or newer kernel

This PR is a little part of the solution of #33408

Signed-off-by: Yusho Yamaguchi <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
jibi pushed a commit that referenced this issue Jul 9, 2024
[ upstream commit e64311d ]

The kernel version that can enable multicast is different between AMD64 and AArch64 due to the difference in the timing of when tail-call from eBPF sub-programs is enabled.
I wrote about it in the document.

ref : The commit which allow for tailcalls in BPF subprogram in each architecture are below
AMD64:
torvalds/linux@e411901c0b775
This commit is reflected to version 5.10 or newer kernel

AArch64:
torvalds/linux@d4609a5
This commit is reflected to 6.0 or newer kernel

This PR is a little part of the solution of #33408

Signed-off-by: Yusho Yamaguchi <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
github-merge-queue bot pushed a commit that referenced this issue Jul 10, 2024
[ upstream commit e64311d ]

The kernel version that can enable multicast is different between AMD64 and AArch64 due to the difference in the timing of when tail-call from eBPF sub-programs is enabled.
I wrote about it in the document.

ref : The commit which allow for tailcalls in BPF subprogram in each architecture are below
AMD64:
torvalds/linux@e411901c0b775
This commit is reflected to version 5.10 or newer kernel

AArch64:
torvalds/linux@d4609a5
This commit is reflected to 6.0 or newer kernel

This PR is a little part of the solution of #33408

Signed-off-by: Yusho Yamaguchi <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
Copy link

github-actions bot commented Sep 4, 2024

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Sep 4, 2024
Copy link

This issue has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/multicast kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.
Projects
None yet
Development

No branches or pull requests

5 participants