This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

add timeout to retry func, deal with curl + tar #2518

Merged: 5 commits, Mar 23, 2018

10 changes: 5 additions & 5 deletions parts/k8s/kubernetesagentcustomdata.yml
@@ -184,25 +184,25 @@ runcmd:
- set -x
- . /opt/azure/containers/provision_source.sh
- apt_get_update() { for i in $(seq 1 100); do apt-get update 2>&1 | grep -x "[WE]:.*"; [ $? -ne 0 ] && break || sleep 1; done; echo Executed apt-get update $i times; }
- retrycmd_if_failure 120 1 nc -zuw1 $(grep nameserver /etc/resolv.conf | cut -d \ -f 2) 53
- retrycmd_if_failure 120 1 nc -zw1 aptdocker.azureedge.net 443
- retrycmd_if_failure 120 1 5 nc -zuw1 $(grep nameserver /etc/resolv.conf | cut -d \ -f 2) 53
Member Author
All prior usages of retrycmd_if_failure have been adapted to include the new third argument, the timeout.

- retrycmd_if_failure 120 1 5 nc -zw1 aptdocker.azureedge.net 443
- apt-mark hold walinuxagent{{GetKubernetesAgentPreprovisionYaml .}}
- echo `date`,`hostname`, preaptupdate>>/opt/m
- apt_get_update
- echo `date`,`hostname`, postaptupdate>>/opt/m
- retrycmd_if_failure 5 10 apt-get install -y apt-transport-https ca-certificates nfs-common
- retrycmd_if_failure 5 10 120 apt-get install -y apt-transport-https ca-certificates nfs-common
- echo `date`,`hostname`, aptinstall>>/opt/m
- systemctl enable rpcbind
- systemctl enable rpc-statd
- systemctl start rpcbind
- systemctl start rpc-statd
- echo `date`,`hostname`, predockerinstall>>/opt/m
- retrycmd_if_failure_no_stats 180 1 curl -fsSL https://aptdocker.azureedge.net/gpg > /tmp/aptdocker.gpg
- retrycmd_if_failure_no_stats 180 1 5 curl -fsSL https://aptdocker.azureedge.net/gpg > /tmp/aptdocker.gpg
Member Author
All prior usages of retrycmd_if_failure_no_stats have been adapted to include the new third argument, the timeout.

- cat /tmp/aptdocker.gpg | apt-key add -
- echo "deb {{WrapAsVariable "dockerEngineDownloadRepo"}} ubuntu-xenial main" | sudo tee /etc/apt/sources.list.d/docker.list
- "echo \"Package: docker-engine\nPin: version {{WrapAsVariable "dockerEngineVersion"}}\nPin-Priority: 550\n\" > /etc/apt/preferences.d/docker.pref"
- apt_get_update
- retrycmd_if_failure 20 10 apt-get install -y ebtables docker-engine
- retrycmd_if_failure 20 10 120 apt-get install -y ebtables docker-engine
- touch /opt/azure/containers/dockerinstall.complete
- echo "ExecStartPost=/sbin/iptables -P FORWARD ACCEPT" >> /etc/systemd/system/docker.service.d/exec_start.conf
- systemctl daemon-reload
2 changes: 1 addition & 1 deletion parts/k8s/kubernetesjumpboxcustomdata.yml
@@ -17,6 +17,6 @@ write_files:

runcmd:
- . /opt/azure/containers/provision_source.sh
- retrycmd_if_failure 10 5 curl -LO https://storage.googleapis.com/kubernetes-release/release/v{{.OrchestratorProfile.OrchestratorVersion}}/bin/linux/amd64/kubectl
- retrycmd_if_failure 10 5 10 curl -LO https://storage.googleapis.com/kubernetes-release/release/v{{.OrchestratorProfile.OrchestratorVersion}}/bin/linux/amd64/kubectl
- chmod +x ./kubectl
- sudo mv ./kubectl /usr/local/bin/kubectl
16 changes: 8 additions & 8 deletions parts/k8s/kubernetesmastercustomdata.yml
@@ -319,7 +319,7 @@ MASTER_ARTIFACTS_CONFIG_PLACEHOLDER
source /opt/azure/containers/provision_source.sh
ETCD_VER=v{{WrapAsVariable "etcdVersion"}}
DOWNLOAD_URL={{WrapAsVariable "etcdDownloadURLBase"}}
retrycmd_if_failure 5 5 curl --retry 5 --retry-delay 10 --retry-max-time 30 -L ${DOWNLOAD_URL}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
retrycmd_get_tarball 60 1 /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz ${DOWNLOAD_URL}/etcd-${ETCD_VER}-linux-amd64.tar.gz
Member Author
Converted to the new retrycmd_get_tarball helper to ensure that the downloaded artifact is valid.

tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /usr/bin/ --strip-components=1
systemctl daemon-reload
systemctl enable etcd.service
@@ -343,7 +343,7 @@ MASTER_ARTIFACTS_CONFIG_PLACEHOLDER
sudo /bin/sed -i s/Restart=on-failure/Restart=always/g /lib/systemd/system/etcd-member.service
systemctl daemon-reload
systemctl restart etcd-member
retrycmd_if_failure 5 5 curl --retry 5 --retry-delay 10 --retry-max-time 30 --max-time 60 http://127.0.0.1:2379/v2/machines
retrycmd_if_failure 5 5 10 curl --retry 5 --retry-delay 10 --retry-max-time 10 --max-time 60 http://127.0.0.1:2379/v2/machines
mkdir -p /etc/kubernetes/manifests
usermod -aG docker {{WrapAsVariable "username"}}

@@ -371,8 +371,8 @@ runcmd:
- . /opt/azure/containers/provision_source.sh
- ensure_etcd_ready() { for i in $(seq 1 1800); do if [ -e /opt/azure/containers/certs.ready ]; then break; fi; sleep 1; done }
- apt_get_update() { for i in $(seq 1 100); do apt-get update 2>&1 | grep -x "[WE]:.*"; [ $? -ne 0 ] && break || sleep 1; done; echo Executed apt-get update $i times; }
- retrycmd_if_failure 120 1 nc -zuw1 $(grep nameserver /etc/resolv.conf | cut -d \ -f 2) 53
- retrycmd_if_failure 120 1 nc -zw1 aptdocker.azureedge.net 443
- retrycmd_if_failure 120 1 5 nc -zuw1 $(grep nameserver /etc/resolv.conf | cut -d \ -f 2) 53
- retrycmd_if_failure 120 1 5 nc -zw1 aptdocker.azureedge.net 443
- ensure_etcd_ready
- /opt/azure/containers/setup-etcd.sh > /opt/azure/containers/setup-etcd.log 2>&1
- apt-mark hold walinuxagent {{GetKubernetesMasterPreprovisionYaml}}
@@ -384,15 +384,15 @@ runcmd:
- systemctl restart etcd
- MEMBER="$(sudo etcdctl member list | grep -E {{WrapAsVerbatim "variables('masterVMNames')[copyIndex(variables('masterOffset'))]"}} | cut -d{{WrapAsVariable "singleQuote"}}:{{WrapAsVariable "singleQuote"}} -f 1)"
- sudo etcdctl member update ${MEMBER} {{WrapAsVerbatim "variables('masterEtcdPeerURLs')[copyIndex(variables('masterOffset'))]"}}
- retrycmd_if_failure 5 5 curl --cacert /etc/kubernetes/certs/ca.crt --cert /etc/kubernetes/certs/etcdclient.crt --key /etc/kubernetes/certs/etcdclient.key --retry 5 --retry-delay 10 --retry-max-time 30 --max-time 60 "{{WrapAsVerbatim "variables('masterEtcdClientURLs')[copyIndex(variables('masterOffset'))]"}}"/v2/machines
- retrycmd_if_failure 5 5 10 curl --cacert /etc/kubernetes/certs/ca.crt --cert /etc/kubernetes/certs/etcdclient.crt --key /etc/kubernetes/certs/etcdclient.key --retry 5 --retry-delay 10 --retry-max-time 10 --max-time 60 "{{WrapAsVerbatim "variables('masterEtcdClientURLs')[copyIndex(variables('masterOffset'))]"}}"/v2/machines
- apt_get_update
- retrycmd_if_failure 5 10 apt-get install -y apt-transport-https ca-certificates
- retrycmd_if_failure_no_stats 180 1 curl -fsSL https://aptdocker.azureedge.net/gpg > /tmp/aptdocker.gpg
- retrycmd_if_failure 5 10 120 apt-get install -y apt-transport-https ca-certificates
- retrycmd_if_failure_no_stats 180 1 5 curl -fsSL https://aptdocker.azureedge.net/gpg > /tmp/aptdocker.gpg
- cat /tmp/aptdocker.gpg | apt-key add -
- echo "deb {{WrapAsVariable "dockerEngineDownloadRepo"}} ubuntu-xenial main" | sudo tee /etc/apt/sources.list.d/docker.list
- "echo \"Package: docker-engine\nPin: version {{WrapAsVariable "dockerEngineVersion"}}\nPin-Priority: 550\n\" > /etc/apt/preferences.d/docker.pref"
- apt_get_update
- retrycmd_if_failure 20 10 apt-get install -y ebtables docker-engine
- retrycmd_if_failure 20 10 120 apt-get install -y ebtables docker-engine
- touch /opt/azure/containers/dockerinstall.complete
- echo "ExecStartPost=/sbin/iptables -P FORWARD ACCEPT" >> /etc/systemd/system/docker.service.d/exec_start.conf
- systemctl daemon-reload
37 changes: 7 additions & 30 deletions parts/k8s/kubernetesmastercustomscript.sh
@@ -206,33 +206,21 @@ function setDockerOpts () {
function configAzureNetworkPolicy() {
CNI_CONFIG_DIR=/etc/cni/net.d
mkdir -p $CNI_CONFIG_DIR

chown -R root:root $CNI_CONFIG_DIR
chmod 755 $CNI_CONFIG_DIR

# Download Azure VNET CNI plugins.
Member Author
Cleaning up comments as I pass through these regions of code, to minimize the CSE payload.

CNI_BIN_DIR=/opt/cni/bin
mkdir -p $CNI_BIN_DIR

# Mirror from https://github.com/Azure/azure-container-networking/releases/tag/$AZURE_PLUGIN_VER/azure-vnet-cni-linux-amd64-$AZURE_PLUGIN_VER.tgz
AZURE_CNI_TGZ_TMP=/tmp/azure_cni.tgz
retrycmd_if_failure_no_stats 180 1 curl -fsSL ${VNET_CNI_PLUGINS_URL} > $AZURE_CNI_TGZ_TMP
retrycmd_get_tarball 60 1 $AZURE_CNI_TGZ_TMP ${VNET_CNI_PLUGINS_URL}
Member Author
ditto retrycmd_get_tarball conversion

tar -xzf $AZURE_CNI_TGZ_TMP -C $CNI_BIN_DIR
# Mirror from https://github.com/containernetworking/cni/releases/download/$CNI_RELEASE_VER/cni-amd64-$CNI_RELEASE_VERSION.tgz
CONTAINERNETWORKING_CNI_TGZ_TMP=/tmp/containernetworking_cni.tgz
retrycmd_if_failure_no_stats 180 1 curl -fsSL ${CNI_PLUGINS_URL} > $CONTAINERNETWORKING_CNI_TGZ_TMP
retrycmd_get_tarball 60 1 $CONTAINERNETWORKING_CNI_TGZ_TMP ${CNI_PLUGINS_URL}
tar -xzf $CONTAINERNETWORKING_CNI_TGZ_TMP -C $CNI_BIN_DIR ./loopback ./portmap
chown -R root:root $CNI_BIN_DIR
chmod -R 755 $CNI_BIN_DIR

# Copy config file
mv $CNI_BIN_DIR/10-azure.conflist $CNI_CONFIG_DIR/
chmod 600 $CNI_CONFIG_DIR/10-azure.conflist

# Dump ebtables rules.
/sbin/ebtables -t nat --list

# Enable CNI.
configCNINetworkPolicy
}

@@ -292,29 +280,18 @@ function installClearContainersRuntime() {
function installGo() {
export GO_SRC=/usr/local/go
export GOPATH="${HOME}/.go"

# Remove any old version of Go
if [[ -d "$GO_SRC" ]]; then
rm -rf "$GO_SRC"
fi

# Remove any old GOPATH
if [[ -d "$GOPATH" ]]; then
rm -rf "$GOPATH"
fi

# Get the latest Go version
GO_VERSION=$(curl --retry 5 --retry-delay 10 --retry-max-time 30 -sSL "https://golang.org/VERSION?m=text")

echo "Installing Go version $GO_VERSION..."

# subshell
(
curl --retry 5 --retry-delay 10 --retry-max-time 30 -sSL "https://storage.googleapis.com/golang/${GO_VERSION}.linux-amd64.tar.gz" | sudo tar -v -C /usr/local -xz
)
retrycmd_if_failure_no_stats 180 1 5 curl -fsSL https://golang.org/VERSION?m=text > /tmp/gover.txt
Member Author
Converted all of this to go through the retrycmd* helpers, so the downloads are retried and time-limited.

GO_VERSION=$(cat /tmp/gover.txt)
retrycmd_get_tarball 60 1 /tmp/golang.tgz https://storage.googleapis.com/golang/${GO_VERSION}.linux-amd64.tar.gz
tar -v -C /usr/local -xzf /tmp/golang.tgz

# Set GOPATH and update PATH
echo "Setting GOPATH and updating PATH"
export PATH="${GO_SRC}/bin:${PATH}:${GOPATH}/bin"
}

@@ -482,7 +459,7 @@ function ensureDocker() {
}

function ensureKubelet() {
retrycmd_if_failure 100 1 docker pull $HYPERKUBE_URL
retrycmd_if_failure 100 1 60 docker pull $HYPERKUBE_URL
systemctlEnableAndCheck kubelet
# only start if a reboot is not required
if ! $REBOOTREQUIRED; then
2 changes: 1 addition & 1 deletion parts/k8s/kubernetesmastergenerateproxycertscript.sh
@@ -45,7 +45,7 @@ write_certs_to_disk_with_retry() {
}

# block until all etcd is ready
retrycmd_if_failure etcdctl cluster-health
retrycmd_if_failure 100 5 10 etcdctl cluster-health
Member Author
Fixed this bug: the call was missing its retries and wait arguments (it now also passes the new timeout).
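
A minimal sketch of why the old call never actually blocked, assuming the pre-change two-argument helper (its logic is reproduced here under the hypothetical name `old_retry`). With no numeric arguments, the command name is consumed as the retry count, `seq` rejects it, and the loop body runs zero times:

```sh
#!/bin/sh
# Same logic as the pre-change helper, under a hypothetical name for illustration.
old_retry() {
    retries=$1; wait=$2; shift && shift
    for i in $(seq 1 $retries); do
        "$@"
        [ $? -eq 0 ] && break || sleep $wait
    done
}

# "echo" is taken as $retries and "hello" as $wait, so no command is left to run.
# seq fails on the non-numeric count, and the loop executes zero times.
out=$(old_retry echo hello 2>/dev/null)
echo "output: '${out}'"
```

The same thing happened with `retrycmd_if_failure etcdctl cluster-health`: `etcdctl` became the retry count, so the health check was never executed.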

# Make etcd keys, adding a leading whitespace because etcd won't accept a val that begins with a '-' (hyphen)!
if etcdctl mk $ETCD_REQUESTHEADER_CLIENT_CA " $(cat ${PROXY_CRT})"; then
etcdctl mk $ETCD_PROXY_KEY " $(cat ${PROXY_CLIENT_KEY})"
5 changes: 3 additions & 2 deletions parts/k8s/kubernetesprovisionsource.sh
@@ -1,4 +1,5 @@
#!/bin/sh

retrycmd_if_failure() { retries=$1; wait=$2; shift && shift; for i in $(seq 1 $retries); do ${@}; [ $? -eq 0 ] && break || sleep $wait; done; echo Executed \"$@\" $i times; }
retrycmd_if_failure_no_stats() { retries=$1; wait=$2; shift && shift; for i in $(seq 1 $retries); do ${@}; [ $? -eq 0 ] && break || sleep $wait; done; }
retrycmd_if_failure() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; echo Executed \"$@\" $i times; }
Member Author
Modified the retrycmd_if_failure() definition to expect a third argument, $timeout, which we pass to timeout. Essentially, every command we execute is now wrapped in timeout, so that an edge-case long-running execution cannot subvert the retry enforcement.
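
For readers who find the one-liner dense, here is a multi-line sketch of the same retry-with-timeout idea. The name `retry_with_timeout` is hypothetical and not part of this PR. It also differs from the PR's helper in one way: it returns non-zero once the retries are exhausted. It assumes a `timeout` command that accepts `timeout SECONDS CMD...` (as GNU coreutils does):

```sh
#!/bin/sh
# Illustrative multi-line form of the retry-with-timeout idea.
# retry_with_timeout is a hypothetical name, not a function from the PR.
retry_with_timeout() {
    retries=$1; wait=$2; limit=$3
    shift 3
    for i in $(seq 1 "$retries"); do
        # timeout stops the command after $limit seconds
        # (GNU timeout then exits with status 124).
        timeout "$limit" "$@" && return 0
        sleep "$wait"
    done
    # Unlike the one-liner in the PR, report failure once retries are exhausted.
    return 1
}

# A command that would hang for 30s is cut off after 1s on each of 2 attempts.
if retry_with_timeout 2 0 1 sleep 30; then
    echo "succeeded"
else
    echo "gave up after $i attempts"
fi
```

Without the `timeout` wrapper, a single hung attempt would stall the loop indefinitely, no matter how few retries were requested.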

retrycmd_if_failure_no_stats() { retries=$1; wait=$2; timeout=$3; shift && shift && shift; for i in $(seq 1 $retries); do timeout $timeout ${@}; [ $? -eq 0 ] && break || sleep $wait; done; }
Member Author
ditto for retrycmd_if_failure_no_stats()

retrycmd_get_tarball() { retries=$1; wait=$2; tarball=$3; url=$4; for i in $(seq 1 $retries); do tar -tzf $tarball; [ $? -eq 0 ] && break || retrycmd_if_failure_no_stats $retries 1 10 curl -fsSL $url -o $tarball; sleep $wait; done; }
Member Author
Created a new retrycmd_get_tarball() function that downloads a gzipped tarball to the local filesystem. This wrapper verifies with tar -tzf that the downloaded artifact can actually be gunzipped and read as a tarball, and downloads it again if not.
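
The validate-then-download loop can be sketched in a more readable form as follows. The helper names `is_valid_tarball` and `fetch_valid_tarball` are hypothetical. Unlike the PR's helper, which nests `retrycmd_if_failure_no_stats` inside the loop, this sketch makes a single time-limited `curl` attempt per iteration. The usage at the end builds a local tarball so that it runs without network access, and the `example.invalid` URL is a placeholder that is never fetched:

```sh
#!/bin/sh
# Hypothetical helper names; a sketch of the validate-then-download loop.

# Succeeds only if tar can list the file as a gzip-compressed tarball.
is_valid_tarball() {
    tar -tzf "$1" >/dev/null 2>&1
}

fetch_valid_tarball() {
    retries=$1; wait=$2; tarball=$3; url=$4
    for i in $(seq 1 "$retries"); do
        # Check first, so an already-valid file is never downloaded again.
        is_valid_tarball "$tarball" && return 0
        # Cap each download attempt at 10 seconds, as the PR's helper does.
        timeout 10 curl -fsSL "$url" -o "$tarball"
        sleep "$wait"
    done
    # One last check, in case the final attempt produced a good file.
    is_valid_tarball "$tarball"
}

# Usage without network access: a valid local tarball short-circuits the loop.
tmp=$(mktemp -d)
echo data > "$tmp/file"
tar -czf "$tmp/ok.tgz" -C "$tmp" file
fetch_valid_tarball 3 0 "$tmp/ok.tgz" "https://example.invalid/ok.tgz" && echo "tarball ok"
```

Checking the archive with `tar -tzf` catches the case where `curl` exits successfully but the file on disk is truncated or is an error page rather than a tarball.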