Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 'node-init' in GKE's 'cos' images. #19017

Merged
merged 1 commit into from
Mar 4, 2022

Conversation

bmcustodio
Copy link
Contributor

@bmcustodio bmcustodio commented Mar 3, 2022

Turns out that in GKE's cos images the containerd binary is still present even though /etc/containerd/config.toml is not. Hence, the kubelet wrapper would still be installed for these images according to the current check, even though it's not necessary. What's worse, starting the kubelet would fail because the sed command targeting the aforementioned file would fail.

This PR changes the check to rely on the presence of the --container-runtime-endpoint flag in the kubelet, which is probably a more reliable way of detecting *_containerd flavours and only applying the fix in these cases.

Fixes #19015.

Fix 'node-init' in GKE's 'cos' images.

@bmcustodio bmcustodio requested a review from a team March 3, 2022 10:54
@bmcustodio bmcustodio requested a review from a team as a code owner March 3, 2022 10:54
@bmcustodio bmcustodio requested a review from nathanjsweet March 3, 2022 10:54
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 3, 2022
@bmcustodio bmcustodio requested review from kaworu and aanm March 3, 2022 10:54
@bmcustodio
Copy link
Contributor Author

bmcustodio commented Mar 3, 2022

node-init log from a GKE cluster running a cos image:

(...)
Device "cbr0" does not exist.
Changing kubelet configuration to --network-plugin=cni --cni-bin-dir=/home/kubernetes/bin
Restarting the kubelet...
Node initialization complete
!!! startup-script succeeded!

node-init log from a GKE cluster running a cos_containerd image:

(...)
Device "cbr0" does not exist.
GKE *_containerd flavor detected...
Installing the kubelet wrapper...
(...)
# Become the real kubelet, and pass it some additionally required flags (and
# place these last so they have precedence).
exec /home/kubernetes/bin/the-kubelet "${@}" --network-plugin=cni --cni-bin-dir=/home/kubernetes/bin
Restarting the kubelet...
Node initialization complete
!!! startup-script succeeded!

@bmcustodio bmcustodio added release-note/bug This PR fixes an issue in a previous release of Cilium. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Mar 3, 2022
@bmcustodio
Copy link
Contributor Author

bmcustodio commented Mar 3, 2022

/test

Job 'Cilium-PR-K8s-1.21-kernel-5.4' failed:

Click to show.

Test Name

K8sDatapathConfig AutoDirectNodeRoutes Check connectivity with sockops and direct routing

Failure Output

FAIL: Error creating resource /home/jenkins/workspace/Cilium-PR-K8s-1.21-kernel-5.4/src/github.com/cilium/cilium/test/k8s/manifests/l3-policy-demo.yaml: Cannot retrieve cilium pod cilium-cljrk policy revision: cannot get revision from json output 'Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), clean-cilium-state (init)

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.21-kernel-5.4 so I can create one.

@bmcustodio bmcustodio force-pushed the pr/bruno/fix-gke-cos-node-init branch from 3e2f9f9 to a664df3 Compare March 3, 2022 14:15
@maintainer-s-little-helper
Copy link

Commit 0bcb736535ff497260806efc36f11578bab2a3b3 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 3, 2022
Turns out that in GKE's 'cos' images the 'containerd' binary is still
present even though '/etc/containerd/config.toml' is not. Hence, the
kubelet wrapper would still be installed for these images according to
the current check, even though it's not necessary. What's worse,
starting the kubelet would fail because the 'sed' command targeting the
aforementioned file would fail.

This PR changes the check to rely on the presence of the
'--container-runtime-endpoint' flag in the kubelet, which is probably a
more reliable way of detecting '*_containerd' flavours and only applying
the fix in these cases.

Fixes cilium#19015.

Signed-off-by: Bruno M. Custódio <[email protected]>
Co-authored-by: Alexandre Perrin <[email protected]>
@bmcustodio bmcustodio force-pushed the pr/bruno/fix-gke-cos-node-init branch from 0bcb736 to b5f3479 Compare March 3, 2022 17:15
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 3, 2022
@bmcustodio
Copy link
Contributor Author

/test

@aanm aanm merged commit ea9fd6f into cilium:master Mar 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to deploy Cilium 1.11.2 in GKE with COS
7 participants