Skip to content
This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Run kubelet outside container #2584

Merged
merged 10 commits into from
Apr 5, 2018
Merged

Conversation

CecileRobertMichon
Copy link
Contributor

@CecileRobertMichon CecileRobertMichon commented Apr 3, 2018

What this PR does / why we need it: In current (v0.14.6) acs-engine implementation, kubelet starts in a Docker container.
There are a lot of regressions of containerized kubelet reported in Kubernetes v1.10 (kubernetes/kubernetes#61456) . Since there are no full e2e tests for this, it may also happen in future releases (thanks @feiskyer for reporting). This PR moves kubelet to run outside the container, which is expected to be more stable.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

If applicable:

  • documentation
  • unit tests
  • tested backward compatibility (ie. deploy with previous version, upgrade with this branch)

Release note:

@ghost ghost added the in progress label Apr 3, 2018
ExecStartPre=/bin/mv /tmp/kubectldir/hyperkube /opt/kubectl
ExecStart=/bin/chmod a+x /opt/kubectl
ExecStart=/bin/chmod a+x /opt/kubelet && /bin/chmod a+x /opt/kubectl
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: adapt kubelet start for coreos filepath

ExecStartPre=/bin/mv /tmp/kubectldir/hyperkube /usr/local/bin/kubectl
ExecStart=/bin/chmod a+x /usr/local/bin/kubectl
ExecStart=/bin/chmod a+x /usr/local/bin/kubelet && /bin/chmod a+x /usr/local/bin/kubectl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can shorten this to /bin/chmod a+x /usr/local/bin/kubelet /usr/local/bin/kubectl (the build of chmod on ubuntu hosts accepts a list of file targets to apply the mode changes against). Verified:

azureuser@k8s-master-18440544-0:/tmp$ ls -la test*
-rw-rw-r-- 1 azureuser azureuser 0 Apr  3 23:08 test1
-rw-rw-r-- 1 azureuser azureuser 0 Apr  3 23:08 test2
azureuser@k8s-master-18440544-0:/tmp$ /bin/chmod a+x test1 test2
azureuser@k8s-master-18440544-0:/tmp$ ls -la test*
-rwxrwxr-x 1 azureuser azureuser 0 Apr  3 23:08 test1
-rwxrwxr-x 1 azureuser azureuser 0 Apr  3 23:08 test2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good catch, thanks

ExecStartPre=/bin/mv /tmp/kubectldir/hyperkube /opt/kubectl
ExecStart=/bin/chmod a+x /opt/kubectl
ExecStart=/bin/chmod a+x /opt/kubelet && /bin/chmod a+x /opt/kubectl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below. Didn't verify on a coreos host, but I'm confident this is a standard pattern.

ExecStartPre=/bin/mv /tmp/kubectldir/hyperkube /opt/kubectl
ExecStart=/bin/chmod a+x /opt/kubectl
ExecStart=/bin/chmod a+x /opt/kubelet && /bin/chmod a+x /opt/kubectl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto above about chmod command

ExecStartPre=/bin/mv /tmp/kubectldir/hyperkube /usr/local/bin/kubectl
ExecStart=/bin/chmod a+x /usr/local/bin/kubectl
ExecStart=/bin/chmod a+x /usr/local/bin/kubelet && /bin/chmod a+x /usr/local/bin/kubectl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Member

@jackfrancis jackfrancis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the systemd extract definition, I think we want a ConditionPathExists=!/usr/local/bin/kubelet as well. (and the CoreOS flavor also)

Copy link
Member

@jackfrancis jackfrancis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change the directory name kubectldir (under /tmp on the host OS and /opt inside the container) to something more general, like hyperkubedir or something.

{{else}}
ExecStartPre=/bin/cp /tmp/kubectldir/hyperkube /usr/local/bin/kubelet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this in one line to make it a little clearer that we are copying the same file to two different places. One example:

for dest in /usr/local/bin/kubelet /usr/local/bin/kubectl ; do cp /tmp/kubectldir/hyperkube $dest; done

And then just add a cleanup line after, like:

rm -Rf /tmp/kubectldir

(but replace kubectldir with the new, general name in both instances :) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really find that

ExecStartPre=for dest in /usr/local/bin/kubelet /usr/local/bin/kubectl ; do /bin/cp /tmp/hyperkube/hyperkube $dest; done
ExecStartPre=rm -Rf /tmp/kubectldir

is much more readable than

ExecStartPre=/bin/cp /tmp/hyperkubedir/hyperkube /usr/local/bin/kubelet
ExecStartPre=/bin/mv /tmp/hyperkubedir/hyperkube /usr/local/bin/kubectl

How about:

ExecStartPre=/bin/mv /tmp/hyperkubedir/hyperkube /usr/local/bin/kubelet
ExecStartPre=/bin/cp /usr/local/bin/kubelet /usr/local/bin/kubectl

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean I see where you're coming from in terms of logic, I just find that it makes the syntax more complex to have a for loop for only two files. Please feel free to disagree with me :)

@CecileRobertMichon
Copy link
Contributor Author

@jackfrancis re: In the systemd extract definition, I think we want a ConditionPathExists=!/usr/local/bin/kubelet as well. (and the CoreOS flavor also) : I thought about it when I was writing the code but I didn't add it because we want the condition to be: run if either file is missing. A ConditionPathExists will prevent the extract service from running if any of the two files is present (&&, not ||) since both conditions will need to be satisfied. But what happens if the first file gets copied but the second one doesn't (unlikely, but software is unpredictable)? we still want to run the extraction again (read retry) in that case, so I left the condition as !/usr/local/bin/kubectl since /usr/local/bin/kubectl gets created last. Thoughts?

ExecStartPre=/usr/bin/docker pull {{WrapAsVariable "kubernetesHyperkubeSpec"}}
ExecStartPre=/usr/bin/docker run --rm -v /tmp/kubectldir:/opt/kubectldir {{WrapAsVariable "kubernetesHyperkubeSpec"}} /bin/bash -c "cp /hyperkube /opt/kubectldir/"
ExecStartPre=/usr/bin/docker run --rm -v /tmp/hyperkubedir:/opt/hyperkubedir {{WrapAsVariable "kubernetesHyperkubeSpec"}} /bin/bash -c "cp /hyperkube /opt/hyperkubedir/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the extraction, dependencies should also be installed. e.g. on ubuntu, kubelet depends on iptables (>= 1.4.21), kubernetes-cni (= 0.6.0), iproute2, socat, util-linux, mount, ebtables, ethtool, init-system-helpers (>= 1.18~)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @feiskyer. Just checked and all of these are already being installed except for socat. I will add a step to install them if they are missing so we are safe. I can't find a package kubernetes-cni however, do you have more info on that one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the cni plugin, I think we are already installing it at

retrycmd_get_tarball 60 1 $CONTAINERNETWORKING_CNI_TGZ_TMP ${CNI_PLUGINS_URL}

@@ -1,45 +0,0 @@
[Unit]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the 1.5 version since k8s 1.5 was deprecated in #2394 and it's not in use anymore


. /opt/azure/containers/provision_source.sh
apt_get_update
retrycmd_if_failure 20 10 120 apt-get install -y iptables iproute2 socat util-linux mount ebtables ethtool init-system-helpers
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure dependencies are installed on agents

@@ -177,6 +179,9 @@ MASTER_ARTIFACTS_CONFIG_PLACEHOLDER
content: |
#!/bin/bash
set -e
. /opt/azure/containers/provision_source.sh
apt_get_update
retrycmd_if_failure 20 10 120 apt-get install -y iptables iproute2 socat util-linux mount ebtables ethtool init-system-helpers
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure dependencies are installed on masters


source /opt/azure/containers/provision_source.sh
apt_get_update
retrycmd_if_failure 20 10 120 apt-get install -y iptables iproute2 socat util-linux mount ebtables ethtool init-system-helpers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we confident that all of these packages will reliably install in under 2 mins? (speaking to the 120 second timeout)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes most of them should already be there so it's very quick. In my tests it took under 10 seconds. I can increase the timeout if we want to be sure.

retrycmd_get_tarball() { retries=$1; wait=$2; tarball=$3; url=$4; for i in $(seq 1 $retries); do tar -tzf $tarball; [ $? -eq 0 ] && break || retrycmd_if_failure_no_stats $retries 1 10 curl -fsSL $url -o $tarball; sleep $wait; done; }
ensure_etcd_ready() { for i in $(seq 1 1800); do if [ -e /opt/azure/containers/certs.ready ]; then break; fi; sleep 1; done }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2596 :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah I just moved it out because I moved out apt_get_update so I thought might as well but we should generalize it in another PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I increased it to 5 minutes... Should be plenty enough

jackfrancis
jackfrancis previously approved these changes Apr 4, 2018
Copy link
Member

@jackfrancis jackfrancis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm if we feel good about the 2 minute timeout for apt-get install for kubelet deps

Copy link
Member

@jackfrancis jackfrancis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jackfrancis jackfrancis merged commit ed0944f into Azure:master Apr 5, 2018
@ghost ghost removed the in progress label Apr 5, 2018
@CecileRobertMichon CecileRobertMichon mentioned this pull request Apr 6, 2018
3 tasks
@CecileRobertMichon CecileRobertMichon deleted the kubelet-out branch June 6, 2018 19:14
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants