-
Notifications
You must be signed in to change notification settings - Fork 558
add timeout to retry func, deal with curl + tar #2518
Conversation
retrycmd_if_failure() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; echo Executed \"$@\" $i times; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified retrycmd_if_failure()
func definition to expect a 3rd argument $timeout
, which we pass to timeout
. Essentially we are now wrapping all our executable calls inside timeout, to prevent edge-case long-running execution from subverting the retry enforcement.
retrycmd_if_failure() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; echo Executed \"$@\" $i times; } | ||
retrycmd_if_failure_no_stats() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto for retrycmd_if_failure_no_stats()
retrycmd_if_failure() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; echo Executed \"$@\" $i times; } | ||
retrycmd_if_failure_no_stats() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; } | ||
retrycmd_get_tarball() { retries=$1; wait=$2; tarball=$3; url=$4; for i in $(seq 1 $retries); do tar -tzf $tarball; [ $? -eq 0 ] && break || retrycmd_if_failure_no_stats $retries 1 10 curl -fsSL $url -o $tarball; sleep $wait; done; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a new retrycmd_get_tarball()
func that downloads a gzip'd tarball to the local filesystem. This wrapper enforces that the downloaded artifact is actually able to be untar'd/gunzip'd.
@@ -184,25 +184,25 @@ runcmd: | |||
- set -x | |||
- . /opt/azure/containers/provision_source.sh | |||
- apt_get_update() { for i in $(seq 1 100); do apt-get update 2>&1 | grep -x "[WE]:.*"; [ $? -ne 0 ] && break || sleep 1; done; echo Executed apt-get update $i times; } | |||
- retrycmd_if_failure 120 1 nc -zuw1 $(grep nameserver /etc/resolv.conf | cut -d \ -f 2) 53 | |||
- retrycmd_if_failure 120 1 nc -zw1 aptdocker.azureedge.net 443 | |||
- retrycmd_if_failure 120 1 5 nc -zuw1 $(grep nameserver /etc/resolv.conf | cut -d \ -f 2) 53 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all prior instances of retrycmd_if_failure
usage have been adapted to include the new 3rd arg for timeout
- echo `date`,`hostname`, aptinstall>>/opt/m | ||
- systemctl enable rpcbind | ||
- systemctl enable rpc-statd | ||
- systemctl start rpcbind | ||
- systemctl start rpc-statd | ||
- echo `date`,`hostname`, predockerinstall>>/opt/m | ||
- retrycmd_if_failure_no_stats 180 1 curl -fsSL https://aptdocker.azureedge.net/gpg > /tmp/aptdocker.gpg | ||
- retrycmd_if_failure_no_stats 180 1 5 curl -fsSL https://aptdocker.azureedge.net/gpg > /tmp/aptdocker.gpg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all prior instances of retrycmd_if_failure_no_stats
usage have been adapted to include the new 3rd arg for timeout
@@ -319,7 +319,7 @@ MASTER_ARTIFACTS_CONFIG_PLACEHOLDER | |||
source /opt/azure/containers/provision_source.sh | |||
ETCD_VER=v{{WrapAsVariable "etcdVersion"}} | |||
DOWNLOAD_URL={{WrapAsVariable "etcdDownloadURLBase"}} | |||
retrycmd_if_failure 5 5 curl --retry 5 --retry-delay 10 --retry-max-time 30 -L ${DOWNLOAD_URL}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz | |||
retrycmd_get_tarball 60 1 /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz ${DOWNLOAD_URL}/etcd-${ETCD_VER}-linux-amd64.tar.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
converted to new retrycmd_get_tarball
usage to ensure that downloaded artifact is valid
AZURE_CNI_TGZ_TMP=/tmp/azure_cni.tgz | ||
retrycmd_if_failure_no_stats 180 1 curl -fsSL ${VNET_CNI_PLUGINS_URL} > $AZURE_CNI_TGZ_TMP | ||
retrycmd_get_tarball 60 1 $AZURE_CNI_TGZ_TMP ${VNET_CNI_PLUGINS_URL} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto retrycmd_get_tarball
conversion
chown -R root:root $CNI_CONFIG_DIR | ||
chmod 755 $CNI_CONFIG_DIR | ||
|
||
# Download Azure VNET CNI plugins. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleaning up comments while I pass through regions of code to minimize CSE payload
( | ||
curl --retry 5 --retry-delay 10 --retry-max-time 30 -sSL "https://storage.googleapis.com/golang/${GO_VERSION}.linux-amd64.tar.gz" | sudo tar -v -C /usr/local -xz | ||
) | ||
retrycmd_if_failure_no_stats 180 1 5 curl -fsSL https://golang.org/VERSION?m=text > /tmp/gover.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
converted all of this to retrycmd
* enforcement
@@ -45,7 +45,7 @@ write_certs_to_disk_with_retry() { | |||
} | |||
|
|||
# block until all etcd is ready | |||
retrycmd_if_failure etcdctl cluster-health | |||
retrycmd_if_failure 100 5 10 etcdctl cluster-health |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed this bug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
What this PR does / why we need it: adds timeout enforcement to all retry commands during provisioning; also enables retry if we download a bad tarball for CNI artifacts
If applicable:
Release note: