Skip to content
This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

add timeout to retry func, deal with curl + tar #2518

Merged
merged 5 commits into from
Mar 23, 2018

Conversation

jackfrancis
Copy link
Member

@jackfrancis jackfrancis commented Mar 22, 2018

What this PR does / why we need it: adds timeout enforcement to all retry commands during provisioning; also enables retry if we download a bad tarball for CNI artifacts

If applicable:

  • documentation
  • unit tests
  • tested backward compatibility (ie. deploy with previous version, upgrade with this branch)

Release note:

add timeout to provision retries

@ghost ghost assigned jackfrancis Mar 22, 2018
@ghost ghost added the in progress label Mar 22, 2018
retrycmd_if_failure() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; echo Executed \"$@\" $i times; }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified retrycmd_if_failure() func definition to expect a 3rd argument $timeout, which we pass to timeout. Essentially we are now wrapping all our executable calls inside timeout, to prevent edge-case long-running execution from subverting the retry enforcement.

retrycmd_if_failure() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; echo Executed \"$@\" $i times; }
retrycmd_if_failure_no_stats() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto for retrycmd_if_failure_no_stats()

retrycmd_if_failure() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; echo Executed \"$@\" $i times; }
retrycmd_if_failure_no_stats() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; }
retrycmd_get_tarball() { retries=$1; wait=$2; tarball=$3; url=$4; for i in $(seq 1 $retries); do tar -tzf $tarball; [ $? -eq 0 ] && break || retrycmd_if_failure_no_stats $retries 1 10 curl -fsSL $url -o $tarball; sleep $wait; done; }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a new retrycmd_get_tarball() func that downloads a gzip'd tarball to the local filesystem. This wrapper enforces that the downloaded artifact is actually able to be untar'd/gunzip'd.

@@ -184,25 +184,25 @@ runcmd:
- set -x
- . /opt/azure/containers/provision_source.sh
- apt_get_update() { for i in $(seq 1 100); do apt-get update 2>&1 | grep -x "[WE]:.*"; [ $? -ne 0 ] && break || sleep 1; done; echo Executed apt-get update $i times; }
- retrycmd_if_failure 120 1 nc -zuw1 $(grep nameserver /etc/resolv.conf | cut -d \ -f 2) 53
- retrycmd_if_failure 120 1 nc -zw1 aptdocker.azureedge.net 443
- retrycmd_if_failure 120 1 5 nc -zuw1 $(grep nameserver /etc/resolv.conf | cut -d \ -f 2) 53
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all prior instances of retrycmd_if_failure usage have been adapted to include the new 3rd arg for timeout

- echo `date`,`hostname`, aptinstall>>/opt/m
- systemctl enable rpcbind
- systemctl enable rpc-statd
- systemctl start rpcbind
- systemctl start rpc-statd
- echo `date`,`hostname`, predockerinstall>>/opt/m
- retrycmd_if_failure_no_stats 180 1 curl -fsSL https://aptdocker.azureedge.net/gpg > /tmp/aptdocker.gpg
- retrycmd_if_failure_no_stats 180 1 5 curl -fsSL https://aptdocker.azureedge.net/gpg > /tmp/aptdocker.gpg
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all prior instances of retrycmd_if_failure_no_stats usage have been adapted to include the new 3rd arg for timeout

@@ -319,7 +319,7 @@ MASTER_ARTIFACTS_CONFIG_PLACEHOLDER
source /opt/azure/containers/provision_source.sh
ETCD_VER=v{{WrapAsVariable "etcdVersion"}}
DOWNLOAD_URL={{WrapAsVariable "etcdDownloadURLBase"}}
retrycmd_if_failure 5 5 curl --retry 5 --retry-delay 10 --retry-max-time 30 -L ${DOWNLOAD_URL}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
retrycmd_get_tarball 60 1 /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz ${DOWNLOAD_URL}/etcd-${ETCD_VER}-linux-amd64.tar.gz
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

converted to new retrycmd_get_tarball usage to ensure that downloaded artifact is valid

AZURE_CNI_TGZ_TMP=/tmp/azure_cni.tgz
retrycmd_if_failure_no_stats 180 1 curl -fsSL ${VNET_CNI_PLUGINS_URL} > $AZURE_CNI_TGZ_TMP
retrycmd_get_tarball 60 1 $AZURE_CNI_TGZ_TMP ${VNET_CNI_PLUGINS_URL}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto retrycmd_get_tarball conversion

chown -R root:root $CNI_CONFIG_DIR
chmod 755 $CNI_CONFIG_DIR

# Download Azure VNET CNI plugins.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaning up comments while I pass through regions of code to minimize CSE payload

(
curl --retry 5 --retry-delay 10 --retry-max-time 30 -sSL "https://storage.googleapis.com/golang/${GO_VERSION}.linux-amd64.tar.gz" | sudo tar -v -C /usr/local -xz
)
retrycmd_if_failure_no_stats 180 1 5 curl -fsSL https://golang.org/VERSION?m=text > /tmp/gover.txt
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

converted all of this to retrycmd* enforcement

@@ -45,7 +45,7 @@ write_certs_to_disk_with_retry() {
}

# block until all etcd is ready
retrycmd_if_failure etcdctl cluster-health
retrycmd_if_failure 100 5 10 etcdctl cluster-health
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed this bug

Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jackfrancis jackfrancis merged commit 08ac0e2 into Azure:master Mar 23, 2018
@ghost ghost removed the in progress label Mar 23, 2018
@jackfrancis jackfrancis deleted the retry-curl branch March 23, 2018 20:34
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants