Integration test for cni-repair-controller (#316)
* Integration test for cni-repair-controller

The `integration-cni-plugin.yml` workflow (formerly known as `cni-plugin-integration.yml`) has been expanded to run the new recipe `cni-repair-controller-integration`, which performs the following steps:

- Rebuilds the `linkerd-cni-repair-controller` crate and `cni-plugin`
- Creates a new cluster at version `v1.27.6-k3s1` (version required for Calico to work)
- Triggers a new `./cni-repair-controller/integration/run.sh` script which:
  - Installs Calico
  - Installs the latest linkerd-edge CLI
  - Installs `linkerd-cni` and waits for it to become ready
  - Installs the linkerd control plane in CNI mode
  - Installs a `pause` DaemonSet

The `linkerd-cni` instance has been configured with an extra initContainer that delays its start by 15s. Since we wait for it to become ready, this doesn't affect the initial install. But when a new node is then added to the cluster, this delay lets the new `pause` DaemonSet replica start before the full CNI config is ready, so we can observe it failing to come up (its first init container exits with code 95). Once the new `linkerd-cni` replica becomes ready, we observe the failed `pause` replica being replaced by a new, healthy one.
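Observing the replacement comes down to polling: retry a short `kubectl wait` on the `pause` pods until it succeeds, giving up after a fixed number of attempts. A minimal sketch of that pattern as a reusable Bash helper follows; `retry` is a hypothetical name that this commit does not add (it inlines the loop in `run.sh` instead).

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of this commit): run "$@" up to $1 times and
# return 0 on the first success. If every attempt fails, return the exit
# status of the last attempt (or 1 if $1 is 0).
retry() {
  local attempts=$1 status=1 i
  shift
  for ((i = 1; i <= attempts; i++)); do
    # Inside an `&&` list a failing command does not trigger `set -e`,
    # so a failed attempt falls through to the next iteration.
    "$@" && return 0
    status=$?
  done
  return "$status"
}

# Example (assumes a cluster running the pause DaemonSet from this commit):
#   retry 5 kubectl wait --for=condition=ready --timeout=10s -l app=pause-app po
```

Called as a plain statement under `set -e`, a final failure aborts the script; `run.sh` gets the same effect by following its loop with one more `kubectl wait`.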
alpeb authored Jan 25, 2024
1 parent 188794c commit fb9c51e
Showing 6 changed files with 132 additions and 3 deletions.
10 changes: 10 additions & 0 deletions .github/workflows/cni-plugin-integration.yml
@@ -8,6 +8,8 @@ on:
       - cni-plugin/integration/flannel/Dockerfile-tester
       - cni-plugin/integration/run.sh
       - cni-plugin/**
+      - cni-repair-controller/**
+      - justfile*
 
 jobs:
   cni-flannel-test:
@@ -46,3 +48,11 @@ jobs:
       - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
       - name: Run CNI ordering tests
         run: just cni-plugin-test-ordering
+  repair-controller:
+    timeout-minutes: 15
+    runs-on: ubuntu-latest
+    steps:
+      - uses: linkerd/dev/actions/setup-tools@v42
+      - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac
+      - name: Run repair-controller tests
+        run: just cni-repair-controller-integration
11 changes: 11 additions & 0 deletions cni-plugin/integration/calico-k3s-images.json
@@ -0,0 +1,11 @@
{
  "name": "docker.io/rancher/k3s",
  "channels": {
    "stable": "v1.27.6-k3s1",
    "latest": "v1.27.6-k3s1",
    "v1.27": "v1.27.6-k3s1"
  },
  "digests": {
    "v1.27.6-k3s1": "sha256:9486bbb9ca9b81c098ecd07f1c45441e143dab12577e22cf062586edcfd9d952"
  }
}
7 changes: 7 additions & 0 deletions cni-repair-controller/integration/linkerd-cni-config.yml
@@ -0,0 +1,7 @@
# This config adds an extra initContainer that makes linkerd-cni delay its
# start by 15s, giving the pause DaemonSet time to start before the full CNI
# config is ready, so that it enters a failure mode
extraInitContainers:
- name: sleep
  image: alpine:3.19.0
  command: ["/bin/sh", "-c", "sleep 15"]
19 changes: 19 additions & 0 deletions cni-repair-controller/integration/pause-ds.yml
@@ -0,0 +1,19 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: pause
spec:
  selector:
    matchLabels:
      app: pause-app
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
      labels:
        app: pause-app
    spec:
      priorityClassName: system-node-critical
      containers:
      - name: pause-container
        image: k8s.gcr.io/pause
70 changes: 70 additions & 0 deletions cni-repair-controller/integration/run.sh
@@ -0,0 +1,70 @@
#!/usr/bin/env bash

set -euo pipefail

# shellcheck disable=SC2086
function step() {
  repeat=$(seq 1 ${#1})
  printf "%0.s#" $repeat
  printf "#####\n# %s...\n" "$1"
  printf "%0.s#" $repeat
  printf "#####\n"
}

if [[ ! "$1" =~ (.*):(.*) ]]; then
  echo 'Usage: run.sh name:tag'
  exit 1
fi
cni_plugin_image=${BASH_REMATCH[1]}
cni_image_version=${BASH_REMATCH[2]}

cd "${BASH_SOURCE[0]%/*}"

step 'Installing Calico'
kubectl apply -f https://k3d.io/v5.1.0/usage/advanced/calico.yaml
kubectl --namespace=kube-system wait --for=condition=available --timeout=120s \
  deploy/calico-kube-controllers

step 'Installing latest linkerd edge'
scurl https://run.linkerd.io/install-edge | sh
export PATH=$PATH:$HOME/.linkerd2/bin
linkerd install --crds | kubectl apply -f -
# The linkerd-cni-config.yml config adds an extra initContainer that makes
# linkerd-cni delay its start by 15s, giving the pause DaemonSet time to
# start before the full CNI config is ready, so that it enters a failure
# mode
linkerd install-cni \
  --use-wait-flag \
  --cni-image "$cni_plugin_image" \
  --cni-image-version "$cni_image_version" \
  --set repairController.enabled=true \
  -f linkerd-cni-config.yml \
  | kubectl apply -f -
linkerd check --pre --linkerd-cni-enabled
linkerd install --linkerd-cni-enabled | kubectl apply -f -
linkerd check

step 'Installing pause DaemonSet'
kubectl apply -f pause-ds.yml
kubectl wait --for=condition=ready --timeout=120s -l app=pause-app po

step 'Adding a node'
cluster=$(just-k3d --evaluate K3D_CLUSTER_NAME)
image=$(just --evaluate cni-plugin-image)
k3d node create node2 --cluster "$cluster"
k3d image import --cluster "$cluster" "$image"

step 'Checking new DS replica fails with code 95'
sleep 10
kubectl wait \
  --for=jsonpath='{.status.initContainerStatuses[0].lastState.terminated.exitCode}'=95 \
  --field-selector=spec.nodeName=k3d-node2-0 \
  pod

step 'Checking new DS replica gets replaced'
for _ in {1..5}; do
  if kubectl wait --for=condition=ready --timeout=10s -l app=pause-app po; then
    break
  fi
done
kubectl wait --for=condition=ready --timeout=10s -l app=pause-app po;
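Two idioms in `run.sh` are worth unpacking. In `[[ ! "$1" =~ (.*):(.*) ]]`, POSIX extended regular expressions make each subexpression, from left to right, match the longest possible string, so the first group extends to the last colon and an image reference with a registry port keeps the port in the name. In `step`, the `%0.s` conversion prints its argument with a precision of zero (that is, nothing), and `printf` reuses the format for every argument, so each word from `seq` yields one `#`. The sketch below isolates both; `split_image_ref`, `repeat_hash` and `banner` are hypothetical names, and the image references are illustrative. It also reads `${1:-}` rather than `$1`: under `set -u`, running the script with no argument otherwise aborts with an unbound-variable error before the usage message prints.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Split "name:tag" on its LAST colon and print name and tag on separate lines.
# POSIX ERE gives the leftmost subexpression the longest possible match, so
# "localhost:5000/cni-plugin:dev" splits into "localhost:5000/cni-plugin"
# and "dev". `${1:-}` keeps `set -u` from aborting when no argument is given.
split_image_ref() {
  if [[ ! "${1:-}" =~ (.*):(.*) ]]; then
    echo 'Usage: split_image_ref name:tag' >&2
    return 1
  fi
  printf '%s\n%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
}

# Print one "#" per argument: "%.0s" prints the argument with zero precision
# (nothing), and printf repeats the format until all arguments are consumed.
# With no arguments the format still runs once, printing a single "#".
repeat_hash() {
  printf '%.0s#' "$@"
}

# Mirror run.sh's `step`: for a non-empty message, the "#" rule is exactly
# as wide as the "# <msg>..." line between the two rules.
banner() {
  local msg=$1 rule
  # shellcheck disable=SC2046 # word splitting of seq's output is intended
  rule="$(repeat_hash $(seq 1 "${#msg}"))#####"
  printf '%s\n# %s...\n%s\n' "$rule" "$msg" "$rule"
}
```

Printing the two parts on separate lines keeps the sketch easy to test; `run.sh` instead assigns `BASH_REMATCH[1]` and `BASH_REMATCH[2]` directly to `cni_plugin_image` and `cni_image_version`.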
18 changes: 15 additions & 3 deletions justfile
@@ -17,7 +17,7 @@ lint: sh-lint md-lint rs-clippy action-lint action-dev-check

 go-lint *flags: (proxy-init-lint flags) (cni-plugin-lint flags)

-test: rs-test proxy-init-test-unit proxy-init-test-integration
+test: rs-test proxy-init-test-unit proxy-init-test-integration cni-repair-controller-integration

 # Check whether the Go code is formatted.
 go-fmt-check:
@@ -82,6 +82,15 @@ cni-repair-controller *args:
     TARGETCRATE=linkerd-cni-repair-controller \
         {{ just_executable() }} --justfile=justfile-rust {{ args }}

+# The K3S_IMAGES_JSON file used instructs the creation of a cluster on version
+# v1.27.6-k3s1, because after that Calico won't work.
+# See https://github.com/k3d-io/k3d/issues/1375
+cni-repair-controller-integration $K3S_IMAGES_JSON='./cni-plugin/integration/calico-k3s-images.json': (cni-repair-controller "package") build-cni-plugin-image
+    @{{ just_executable() }} K3D_CREATE_FLAGS='{{ _K3D_CREATE_FLAGS_NO_CNI }}' _k3d-cni-create
+    @just-k3d use
+    @just-k3d import {{ cni-plugin-image }}
+    ./cni-repair-controller/integration/run.sh {{ cni-plugin-image }}
+
 ##
 ## cni-plugin
 ##
@@ -178,8 +187,11 @@ _cni-plugin-test-integration:
 # Run cni-plugin integration tests using calico, in a dedicated k3d environment
 # NOTE: we have to rely on a different set of dependencies here; specifically
 # `k3d-create` instead of `_k3d-ready`, since without a CNI DNS pods won't
-# start
-cni-plugin-test-integration-calico:
+# start.
+# The K3S_IMAGES_JSON file used instructs the creation of a cluster on version
+# v1.27.6-k3s1, because after that Calico won't work.
+# See https://github.com/k3d-io/k3d/issues/1375
+cni-plugin-test-integration-calico $K3S_IMAGES_JSON='./cni-plugin/integration/calico-k3s-images.json':
     @{{ just_executable() }} \
         CNI_TEST_SCENARIO='calico' \
         K3D_CLUSTER_NAME='l5d-calico-test' \