wait longer for cloud-init docker install #2641
@@ -181,8 +181,6 @@ runcmd:
# the first arg is the number of retries, the second arg is the wait duration between two retries and the rest of the args are the cmd to run
- set -x
- . /opt/azure/containers/provision_source.sh
- retrycmd_if_failure 120 1 5 nc -zuw1 $(grep nameserver /etc/resolv.conf | cut -d \ -f 2) 53
simplifying cloud-init by removing these pre-network checks, which don't guarantee network availability during provisioning anyway
@@ -200,11 +198,9 @@ runcmd:
- "echo \"Package: docker-engine\nPin: version {{WrapAsVariable "dockerEngineVersion"}}\nPin-Priority: 550\n\" > /etc/apt/preferences.d/docker.pref"
- apt_get_update
- retrycmd_if_failure 20 10 120 apt-get install -y ebtables docker-engine
- touch /opt/azure/containers/dockerinstall.complete
let's not mark docker install as complete until all config has been applied (see line below)
- echo "ExecStartPost=/sbin/iptables -P FORWARD ACCEPT" >> /etc/systemd/system/docker.service.d/exec_start.conf
- systemctl daemon-reload
let cse determine whether or not to restart docker systemd
- echo `date`,`hostname`, postdockerinstall>>/opt/m
- retrycmd_if_failure 100 1 10 systemctl daemon-reload && systemctl restart docker
ditto above
@@ -23,25 +23,13 @@ fi
ensureRunCommandCompleted()
{
echo "waiting for runcmd to finish"
for i in {1..900}; do
use the new wait_for_file generic
}

ensureDockerInstallCompleted()
{
echo "waiting for docker install to finish"
for i in {1..900}; do
ditto
@@ -170,21 +158,7 @@ function ensureFilepath() {
if $REBOOTREQUIRED; then
return
fi
found=1
ditto
@@ -362,13 +336,7 @@ function ensureK8s() {
k8sHealthy=1
nodesActive=1
nodesReady=1
for i in {1..600}; do
ditto
@@ -372,9 +372,7 @@ runcmd:
# the first arg is the number of retries, the second arg is the wait duration between two retries and the rest of the args are the cmd to run
- set -x
- . /opt/azure/containers/provision_source.sh
- retrycmd_if_failure 120 1 5 nc -zuw1 $(grep nameserver /etc/resolv.conf | cut -d \ -f 2) 53
remove "check for network availability before proceeding" steps because they aren't meaningful
- retrycmd_if_failure 120 1 5 nc -zuw1 $(grep nameserver /etc/resolv.conf | cut -d \ -f 2) 53
- retrycmd_if_failure 120 1 5 nc -zw1 aptdocker.azureedge.net 443
- ensure_etcd_ready
- wait_for_file 1800 1 /opt/azure/containers/certs.ready
replace ensure_etcd_ready with new wait_for_file generic
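For readers following along, here is a minimal sketch of what a generic like wait_for_file could look like. The PR's actual implementation is not shown in this thread, so everything below is an assumption: the argument order (retries, wait in seconds, file path) is inferred from the call site `wait_for_file 1800 1 /opt/azure/containers/certs.ready`, and the non-zero return on timeout is a design choice of this sketch rather than something the diff confirms.

```shell
# Hypothetical sketch, not the PR's implementation.
# Args (inferred from the call site): retries, wait seconds, file path.
wait_for_file() {
    local retries=$1 wait=$2 filepath=$3
    local i
    for i in $(seq 1 "$retries"); do
        # Succeed as soon as the file exists.
        if [ -e "$filepath" ]; then
            return 0
        fi
        sleep "$wait"
    done
    # Assumption of this sketch: signal a timeout with a non-zero status
    # so callers can decide whether to abort provisioning.
    return 1
}
```

With this shape, `wait_for_file 1800 1 /opt/azure/containers/certs.ready` polls once per second for up to 1800 attempts, which matches the loop bounds of the ensure_etcd_ready one-liner it replaces.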
@@ -394,10 +392,8 @@ runcmd:
- "echo \"Package: docker-engine\nPin: version {{WrapAsVariable "dockerEngineVersion"}}\nPin-Priority: 550\n\" > /etc/apt/preferences.d/docker.pref"
- apt_get_update
- retrycmd_if_failure 20 10 120 apt-get install -y ebtables docker-engine
- touch /opt/azure/containers/dockerinstall.complete
defer the docker install completion mark until all config has been applied
- echo "ExecStartPost=/sbin/iptables -P FORWARD ACCEPT" >> /etc/systemd/system/docker.service.d/exec_start.conf
- systemctl daemon-reload
defer systemd start to cse
@@ -3,5 +3,5 @@
retrycmd_if_failure() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; echo Executed \"$@\" $i times; }
retrycmd_if_failure_no_stats() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; }
retrycmd_get_tarball() { retries=$1; wait=$2; tarball=$3; url=$4; for i in $(seq 1 $retries); do tar -tzf $tarball; [ $? -eq 0 ] && break || retrycmd_if_failure_no_stats $retries 1 10 curl -fsSL $url -o $tarball; sleep $wait; done; }
ensure_etcd_ready() { for i in $(seq 1 1800); do if [ -e /opt/azure/containers/certs.ready ]; then break; fi; sleep 1; done }
replace this with a generic function wait_for_file
lgtm
What this PR does / why we need it: We've seen intermittent long install times for docker dependencies; this PR permits more time so that cluster provisioning succeeds.
In the process of doing the above, I also rationalized file checks.
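To put a number on "more time": going by the retrycmd_if_failure definition in the diff (retries, wait, timeout, then the command), each failed attempt costs at most the timeout plus the wait, so the worst-case wall-clock time of a call is roughly retries × (timeout + wait). The helper name max_wait_seconds below is hypothetical and exists only to illustrate that arithmetic.

```shell
# Hypothetical helper: upper bound, in seconds, on how long a
# retrycmd_if_failure call can block, given its first three arguments.
max_wait_seconds() {
    local retries=$1 wait=$2 timeout=$3
    echo $(( retries * (timeout + wait) ))
}

# Budget for: retrycmd_if_failure 20 10 120 apt-get install -y ebtables docker-engine
max_wait_seconds 20 10 120   # prints 2600
```

So the docker-engine install step can take up to about 43 minutes before it gives up, which is worth keeping in mind when sizing any loop that waits on dockerinstall.complete.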
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #
Special notes for your reviewer:
If applicable:
Release note: