Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent unmanaged pods in GKE's containerd flavors (cilium/cilium#18486) #710

Merged
merged 2 commits into from
Feb 16, 2022

Conversation

bmcustodio
Copy link
Contributor

Porting cilium/cilium#18486 to the CLI.

@bmcustodio bmcustodio requested review from a team as code owners February 10, 2022 09:11
@bmcustodio bmcustodio requested a review from ldelossa February 10, 2022 09:11
@bmcustodio bmcustodio temporarily deployed to ci February 10, 2022 09:11 Inactive
@bmcustodio bmcustodio temporarily deployed to ci February 10, 2022 10:07 Inactive
@bmcustodio bmcustodio temporarily deployed to ci February 10, 2022 12:02 Inactive
Copy link
Contributor

@ldelossa ldelossa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmcustodio

After the CNI is installed, we expect any taints to be removed from nodes correct? I do not see any updates where we update the removal of the taint to now remove "NoExecute" taint.

Am I missing something?

@bmcustodio
Copy link
Contributor Author

I do not see any updates where we update the removal of the taint to now remove "NoExecute" taint.

@ldelossa I am not sure if I got the question right, but Cilium does remove the taint regardless of its effect.

@bmcustodio
Copy link
Contributor Author

It's also probably pointing out that this is just a port of a piece of code that has been reviewed upstream.

@ldelossa
Copy link
Contributor

@bmcustodio gotchya, I was assuming the taint being removed was hard coded, and we needed to update it.

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the second commit msg referencing by saying?

[ upstream commit b049574 ]

@bmcustodio
Copy link
Contributor Author

@christarazi whenever I (back-)port something, I usually always include a reference to the original commit. In this case, it's just a reference to the original c/c commit, but it has no other meaning or isn't used by any tooling that I am aware of.

@christarazi
Copy link
Member

@bmcustodio That makes sense to me, but it's not clear where that commit SHA is coming from, so I'd suggest either linking it as a GH link or mentioning it's a cilium/cilium commit.

bmcustodio and others added 2 commits February 16, 2022 08:14
The changes that we have been doing to /etc/defaults/kubelet are reset
on node reboots, as is apparently the whole /etc directory --- which
also means that /etc/cni/net.d/05-cilium.conf is removed.

This would not be a problem if the assumption we made that the node taint we
recommend placing on the nodes would come back upon reboots held true, but in
practice it doesn't.

Besides this, it seems that containerd will re-instante its CNI
configuration file, and it will do so way before Cilium has had the
chance to re-run on the node and re-create its CNI configuration,
causing pods to be assigned IPs by the default CNI rather than by Cilium
in the meantime.

This commit attempts at preventing that from happening by observing that
/home/kubernetes/bin/kubelet (i.e. the actual kubelet binary) is kept between
reboots and executed concurrently with containerd by systemd. We leverage on
this empirical observation to replace this file kubelet with a wrapper script
that, under the required conditions, disables containerd, patches its
configuration, removes undesired CNI configuration files, re-enables
containerd and becomes the kubelet.

[ cilium/cilium commit 36585e4 ]

Signed-off-by: Bruno Miguel Custódio <[email protected]>
Co-authored-by: Alexandre Perrin <[email protected]>
Co-authored-by: Chris Tarazi <[email protected]>
To prevent situations in which the GKE node is forcibly stopped and
re-created from causing unmanaged pods, and building on the observation
that the node comes back with the same name and pods are already
scheduled there, we change the recommended taint effect from NoSchedule
to NoExecute, to cause any previously scheduled pods to be evicted,
preventing them from getting IPs assigned by the default CNI. This
should not impact other environments due to the nature of 'NoExecute',
so we recommend it everywhere.

[ cilium/cilium commit b049574 ]

Signed-off-by: Bruno Miguel Custódio <[email protected]>
Co-authored-by: Tam Mach <[email protected]>
@bmcustodio bmcustodio temporarily deployed to ci February 16, 2022 08:15 Inactive
@bmcustodio
Copy link
Contributor Author

@christarazi sure, makes sense! I've rebased the PR and replaced the "upstream" mentions with "cilium/cilium" ones.

@tklauser tklauser merged commit e703548 into master Feb 16, 2022
@tklauser tklauser deleted the pr/bruno/port-18486 branch February 16, 2022 10:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants