This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

wait longer for cloud-init docker install #2641

Merged
merged 1 commit into from
Apr 9, 2018

Conversation

jackfrancis
Member

@jackfrancis jackfrancis commented Apr 9, 2018

What this PR does / why we need it: We've seen intermittent long install times for docker dependencies; this permits more time so that cluster provisioning succeeds.

In the process of doing the above, I also rationalized file checks.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

If applicable:

  • documentation
  • unit tests
  • tested backward compatibility (ie. deploy with previous version, upgrade with this branch)

Release note:

wait longer for cloud-init docker install

@ghost ghost assigned jackfrancis Apr 9, 2018
@ghost ghost added the in progress label Apr 9, 2018
@@ -181,8 +181,6 @@ runcmd:
# the first arg is the number of retries, the second arg is the wait duration between two retries and the rest of the args are the cmd to run
- set -x
- . /opt/azure/containers/provision_source.sh
- retrycmd_if_failure 120 1 5 nc -zuw1 $(grep nameserver /etc/resolv.conf | cut -d \ -f 2) 53
Member Author

Simplifying cloud-init by removing these pre-network checks, which don't guarantee network availability during provisioning anyway.
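
For readers following the `retrycmd_if_failure 120 1 5 nc ...` line: going by the helper's definition in provision_source.sh (visible in the last hunk of this PR), the three leading arguments appear to be the retry count, the wait between attempts in seconds, and a per-attempt limit handed to `timeout`. A small sketch of that behaviour follows. The function body is copied from the diff and only reflowed; the flaky script and the temp files are made up for the demo.

```shell
# Body copied from provision_source.sh in this PR, reflowed onto several lines.
retrycmd_if_failure() {
    retries=$1; wait=$2; timeout=$3
    shift && shift && shift
    for i in $(seq 1 $retries); do
        timeout $timeout ${@}
        [ $? -eq 0 ] && break || sleep $wait
    done
    echo Executed \"$@\" $i times
}

# Hypothetical stand-in for a command that fails twice and then succeeds.
# It counts its own invocations in the file passed as its first argument.
counter=$(mktemp)
flaky=$(mktemp)
echo 0 > "$counter"
cat > "$flaky" <<'EOF'
n=$(( $(cat "$1") + 1 ))
echo "$n" > "$1"
[ "$n" -ge 3 ]
EOF

# Up to 5 attempts, 0 s between attempts, 2 s timeout per attempt.
out=$(retrycmd_if_failure 5 0 2 sh "$flaky" "$counter")
echo "$out"
```

Because the function always finishes with `echo`, its exit status does not tell the caller whether the command ever succeeded, so a failed check of this kind would not stop provisioning by itself.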

@@ -200,11 +198,9 @@ runcmd:
- "echo \"Package: docker-engine\nPin: version {{WrapAsVariable "dockerEngineVersion"}}\nPin-Priority: 550\n\" > /etc/apt/preferences.d/docker.pref"
- apt_get_update
- retrycmd_if_failure 20 10 120 apt-get install -y ebtables docker-engine
- touch /opt/azure/containers/dockerinstall.complete
Member Author

Let's not mark the docker install as complete until all config has been applied (see the line below).
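
The ordering concern can be sketched roughly as below: anything that waits on the marker file should be able to assume the configuration is already in place. This is an illustration only, not the merged runcmd; the scratch directory, the variable names, and the `install_and_configure` function are hypothetical stand-ins for the real paths under /opt/azure/containers and /etc/systemd/system/docker.service.d.

```shell
# Hypothetical scratch directory standing in for the real system paths.
workdir=$(mktemp -d)
marker="$workdir/dockerinstall.complete"
dropin="$workdir/exec_start.conf"

install_and_configure() {
    # 1. The package install would run here (apt-get install in the real runcmd).
    # 2. Apply configuration that depends on the install.
    echo "ExecStartPost=/sbin/iptables -P FORWARD ACCEPT" >> "$dropin"
    # 3. Only now signal completion to whoever is waiting on the marker.
    touch "$marker"
}

install_and_configure

# A waiter that sees the marker can rely on the drop-in being present.
if [ -e "$marker" ]; then
    grep -q "FORWARD ACCEPT" "$dropin" && echo "config present when marker exists"
fi
```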

- echo "ExecStartPost=/sbin/iptables -P FORWARD ACCEPT" >> /etc/systemd/system/docker.service.d/exec_start.conf
- systemctl daemon-reload
Member Author

Let CSE determine whether or not to restart the docker systemd service.

- echo `date`,`hostname`, postdockerinstall>>/opt/m
- retrycmd_if_failure 100 1 10 systemctl daemon-reload && systemctl restart docker
Member Author

Ditto the above: leave the docker restart to CSE.

@@ -23,25 +23,13 @@ fi
ensureRunCommandCompleted()
{
echo "waiting for runcmd to finish"
for i in {1..900}; do
Member Author

Use the new generic wait_for_file function.

}

ensureDockerInstallCompleted()
{
echo "waiting for docker install to finish"
for i in {1..900}; do
Member Author

Ditto: use wait_for_file here as well.

@@ -170,21 +158,7 @@ function ensureFilepath() {
if $REBOOTREQUIRED; then
return
fi
found=1
Member Author

ditto

@@ -362,13 +336,7 @@ function ensureK8s() {
k8sHealthy=1
nodesActive=1
nodesReady=1
for i in {1..600}; do
Member Author

ditto

@@ -372,9 +372,7 @@ runcmd:
# the first arg is the number of retries, the second arg is the wait duration between two retries and the rest of the args are the cmd to run
- set -x
- . /opt/azure/containers/provision_source.sh
- retrycmd_if_failure 120 1 5 nc -zuw1 $(grep nameserver /etc/resolv.conf | cut -d \ -f 2) 53
Member Author

Remove the "check for network availability before proceeding" steps because they aren't meaningful.

- retrycmd_if_failure 120 1 5 nc -zuw1 $(grep nameserver /etc/resolv.conf | cut -d \ -f 2) 53
- retrycmd_if_failure 120 1 5 nc -zw1 aptdocker.azureedge.net 443
- ensure_etcd_ready
- wait_for_file 1800 1 /opt/azure/containers/certs.ready
Member Author

Replace ensure_etcd_ready with the new generic wait_for_file function.

@@ -394,10 +392,8 @@ runcmd:
- "echo \"Package: docker-engine\nPin: version {{WrapAsVariable "dockerEngineVersion"}}\nPin-Priority: 550\n\" > /etc/apt/preferences.d/docker.pref"
- apt_get_update
- retrycmd_if_failure 20 10 120 apt-get install -y ebtables docker-engine
- touch /opt/azure/containers/dockerinstall.complete
Member Author

Move the docker install completion marker so that it is written only after all config has been applied.

- echo "ExecStartPost=/sbin/iptables -P FORWARD ACCEPT" >> /etc/systemd/system/docker.service.d/exec_start.conf
- systemctl daemon-reload
Member Author

Defer the systemd start to CSE.

@@ -3,5 +3,5 @@
retrycmd_if_failure() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; echo Executed \"$@\" $i times; }
retrycmd_if_failure_no_stats() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; }
retrycmd_get_tarball() { retries=$1; wait=$2; tarball=$3; url=$4; for i in $(seq 1 $retries); do tar -tzf $tarball; [ $? -eq 0 ] && break || retrycmd_if_failure_no_stats $retries 1 10 curl -fsSL $url -o $tarball; sleep $wait; done; }
ensure_etcd_ready() { for i in $(seq 1 1800); do if [ -e /opt/azure/containers/certs.ready ]; then break; fi; sleep 1; done }
Member Author

Replace this with a generic function, wait_for_file.
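
The merged body of wait_for_file is not shown on this page, so the following is only a guess at its shape, generalizing ensure_etcd_ready above. The argument order (retries, wait, file path) is inferred from the call `wait_for_file 1800 1 /opt/azure/containers/certs.ready` in the cloud-init hunk; the variable names, the return codes, and the temp-file demo are assumptions.

```shell
# Sketch only: poll for a file, giving up after a fixed number of checks.
#   $1 = number of checks, $2 = seconds to sleep between checks, $3 = path
wait_for_file() {
    retries=$1; interval=$2; filepath=$3
    for i in $(seq 1 "$retries"); do
        if [ -e "$filepath" ]; then
            return 0    # file is there, stop waiting
        fi
        sleep "$interval"
    done
    return 1            # assumption: report failure if the file never appears
}

# Demo with a temp directory instead of /opt/azure/containers.
tmpdir=$(mktemp -d)
( sleep 1; touch "$tmpdir/certs.ready" ) &   # file shows up about a second later

wait_for_file 10 1 "$tmpdir/certs.ready" && echo "file appeared"
wait_for_file 2 0 "$tmpdir/never.created" || echo "gave up"
```

Returning non-zero on timeout is a design choice of this sketch. ensure_etcd_ready as shown simply falls out of its loop, so a caller cannot tell a timeout from success.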

Contributor

@CecileRobertMichon CecileRobertMichon left a comment

lgtm

2 participants