Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/cleanup: Fix cleanup of generic XDP programs #25117

Merged
merged 1 commit into from
Apr 26, 2023

Conversation

pchaigno
Copy link
Member

@pchaigno pchaigno commented Apr 25, 2023

The cilium cleanup command wasn't actually removing XDP programs attached to the generic hook point. This was causing issues when an XDP test in ginkgo was immediately followed by the upgrade test. The upgrade test would run cilium cleanup and remove everything except for the XDP programs. That would isolate the Cilium-managed nodes from the outside network (both SSH and intra-cluster connectivity).

Obviously, this works when we change the Cilium configuration to disable XDP. There we have some logic to remove any leftover XDP programs on agent startup. Comparing that logic with the cilium cleanup logic, we can see that the difference is that the agent removes XDP programs explicitly for each hook point (generic and driver). Let's do the same.

This fix was tested manually on a setup with Cilium's generic XDP programs installed.

Fixes: #19735.
Fixes: #24687.

The 'cilium cleanup' command wasn't actually removing XDP programs
attached to the generic hook point. This was causing issues when an XDP
test in ginkgo was immediatly followed by the upgrade test. The upgrade
test would run 'cilium cleanup' and remove everything except for the
XDP programs. That would isolate the Cilium-managed nodes from the
outside network (both SSH and intra-cluster connectivity).

Obviously, this works when we change the Cilium configuration to disable
XDP. There we have some logic to remove any leftover XDP programs on
agent startup. Comparing that logic with the 'cilium cleanup' logic, we
can see that the difference is that the agent removes XDP programs
explicitely for each hook point (generic and driver). Let's do the same.

This fix was tested manually on a setup with Cilium's generic XDP
programs installed.

Fixes: 6ed1fe5 ("cilium: Remove attached bpf_xdp upon "cilium cleanup"")
Signed-off-by: Paul Chaignon <[email protected]>
@pchaigno pchaigno added kind/bug This is a bug in the Cilium logic. sig/loader Impacts the loading of BPF programs into the kernel. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. affects/v1.11 This issue affects v1.11 branch affects/v1.12 This issue affects v1.12 branch affects/v1.10 This issue affects v1.10 branch needs-backport/1.13 affects/v1.13 This issue affects v1.13 branch labels Apr 25, 2023
@pchaigno pchaigno requested a review from a team as a code owner April 25, 2023 15:59
@pchaigno pchaigno requested a review from squeed April 25, 2023 15:59
@pchaigno
Copy link
Member Author

pchaigno commented Apr 25, 2023

/test

Job 'Cilium-PR-K8s-1.27-kernel-net-next' failed:

Click to show.

Test Name

K8sDatapathConfig Host firewall With VXLAN and endpoint routes

Failure Output

FAIL: Error deleting resource /home/jenkins/workspace/Cilium-PR-K8s-1.27-kernel-net-next/src/github.com/cilium/cilium/test/k8s/manifests/host-policies.yaml: Cannot retrieve "cilium-fjdmp"'s policy revision: cannot get policy revision: ""

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.27-kernel-net-next/132/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.27-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

@squeed
Copy link
Contributor

squeed commented Apr 26, 2023

For my own edification, the code in the xdp loader that cleans these up is here.

@pchaigno
Copy link
Member Author

k8s-1.27-kernel-net-next failed with known flakes #24699 and #24697. Other tests are passing and Casey reviewed. Merging to unblock the fix for k8s-upstream.

@pchaigno pchaigno merged commit 0691784 into cilium:main Apr 26, 2023
@pchaigno pchaigno deleted the fix-cilium-cleanup-generic-xdp branch April 26, 2023 11:37
@sayboras sayboras mentioned this pull request Apr 28, 2023
3 tasks
@julianwiedmann julianwiedmann added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 labels May 9, 2023
@pchaigno pchaigno removed affects/v1.11 This issue affects v1.11 branch affects/v1.12 This issue affects v1.12 branch labels May 11, 2023
@pchaigno pchaigno removed the affects/v1.10 This issue affects v1.10 branch label May 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.13 This issue affects v1.13 branch backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. sig/loader Impacts the loading of BPF programs into the kernel.
Projects
No open projects
Status: Released
Development

Successfully merging this pull request may close these issues.

CI: Suite-k8s-1.26.K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master
4 participants