reinitialize-pods controller
Fixes linkerd/linkerd2#11073

This fixes the issue of injected pods that cannot acquire proper network
config because `linkerd-cni` and/or the cluster's network CNI haven't
fully started. They are left in a permanent crash loop and once CNI is
ready, they need to be restarted externally, which is what this
controller does.

The `linkerd-reinitialize-pods` controller watches events on pods
scheduled on the current node. When an injected pod is in a terminated
state and its `linkerd-network-validator` container exited with code 95,
the controller evicts the pod so it can restart with a proper network
config.
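
The selection rule described above can be sketched as a pure predicate. This is only an illustration, not the controller's actual code: `PodView`, `ContainerState`, and `should_evict` are hypothetical names, and the pod is reduced to the few fields the description mentions (node, injection status, and the validator container's exit code).

```rust
/// Exit code that, per the description above, marks a failed network validation.
const UNSUCCESSFUL_EXIT_CODE: i32 = 95;
/// Name of the validator container mentioned in the description.
const VALIDATOR_CONTAINER: &str = "linkerd-network-validator";

/// Hypothetical, simplified view of a container's last known state.
struct ContainerState {
    name: String,
    /// `Some(code)` if the container has terminated, `None` otherwise.
    terminated_exit_code: Option<i32>,
}

/// Hypothetical, simplified view of a pod as seen in a watch event.
struct PodView {
    node_name: String,
    injected: bool,
    containers: Vec<ContainerState>,
}

/// Returns true only when all three conditions hold: the pod runs on the
/// current node, it has been injected, and its validator container
/// terminated with the expected exit code.
fn should_evict(pod: &PodView, current_node: &str) -> bool {
    pod.node_name == current_node
        && pod.injected
        && pod.containers.iter().any(|c| {
            c.name == VALIDATOR_CONTAINER
                && c.terminated_exit_code == Some(UNSUCCESSFUL_EXIT_CODE)
        })
}
```

Keeping the decision in a side-effect-free function like this would let the eviction call itself stay separate and make the rule easy to unit test.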

The controller is to be deployed as an additional container in the
`linkerd-cni` DaemonSet (addressed in linkerd/linkerd2#xxx).

## TO-DOs

- Figure out why `/metrics` is returning a 404 (should show process metrics)
- Integration test
alpeb committed Dec 5, 2023
1 parent 39b796d commit a752eac
Showing 11 changed files with 2,143 additions and 144 deletions.
63 changes: 63 additions & 0 deletions .github/workflows/release-reinitialize-pods.yml
@@ -0,0 +1,63 @@
name: reinitialize-pods release

on:
  pull_request:
    paths:
      - .github/workflows/release-reinitialize-pods.yml
  push:
    tags: ["reinitialize-pods/v*"]

permissions:
  contents: read

jobs:
  meta:
    timeout-minutes: 15
    runs-on: ubuntu-latest
    container: ghcr.io/linkerd/dev:v42-rust
    steps:
      - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac
      - uses: ./.github/actions/version-mode
        id: meta
        with:
          package: reinitialize-pods
          check: true
    outputs:
      repo: ${{ steps.meta.outputs.repo }}
      mode: ${{ steps.meta.outputs.mode }}
      version: ${{ steps.meta.outputs.version }}

  package:
    needs: meta
    strategy:
      matrix:
        arch: [amd64, arm64, arm]
    timeout-minutes: 10
    runs-on: ubuntu-latest
    container: ghcr.io/linkerd/dev:v42-rust-musl
    steps:
      - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac
      - run: just reinitialize-pods arch=${{ matrix.arch }} profile=release version=${{ needs.meta.outputs.version }} package
      - uses: actions/upload-artifact@v3
        with:
          name: ${{ matrix.arch }}-artifacts
          path: target/package/

  publish:
    needs: [meta, package]
    timeout-minutes: 5
    permissions:
      contents: write
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@9bc31d5ccc31df68ecc42ccf4149144866c47d8a
        with:
          path: ${{ runner.temp }}/artifacts
      - run: find "$RUNNER_TEMP"/artifacts -type f -ls
      - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac
      - if: needs.meta.outputs.mode == 'release'
        uses: softprops/action-gh-release@de2c0eb89ae2a093876385947365aca7b0e5f844
        with:
          name: validator ${{ needs.meta.outputs.version }}
          files: ${{ runner.temp }}/artifacts/**/*
