forked from kubevirt/kubevirt
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[3e52bb0 kind, vgpu: Bump vgpu kind to k8s-1.25](kubevirt/kubevirtci#979)
[7e486e5 k3d: Introduce k3d SR-IOV provider](kubevirt/kubevirtci#972)
[42c3f70 Fix some typos](kubevirt/kubevirtci#971)
[e37ca14 Remove the centos8 based k8s-1.26 provider](kubevirt/kubevirtci#969)
[46a9824 Run bazelisk run //robots/cmd/kubevirtci-bumper:kubevirtci-bumper -- -ensure-last-three-minor-of v1 --k8s-provider-dir /home/prow/go/src/github.com/kubevirt/project-infra/../kubevirtci/cluster-provision/k8s](kubevirt/kubevirtci#974)

```release-note
NONE
```

Signed-off-by: kubevirt-bot <[email protected]>
1 parent 35792a2, commit be36db4
Showing 37 changed files with 5,392 additions and 24 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
75e9735d504b498b7e0add2a7f84551fb4066ddc | ||
15094979188934a7d0a3541e85d4353e313940ef |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
filters: | ||
".*": | ||
reviewers: | ||
- qinqon | ||
- oshoval | ||
- phoracek | ||
- ormergi | ||
approvers: | ||
- qinqon | ||
- phoracek |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
# K8s 1.25.x with SR-IOV in a K3d cluster | ||
|
||
Provides a pre-deployed containerized k8s cluster with version 1.25.x that runs | ||
using [K3d](https://github.com/k3d-io/k3d). | ||
The cluster is completely ephemeral and is recreated on every cluster restart. The KubeVirt containers are built on the | ||
local machine and are then pushed to a registry which is exposed at | ||
`127.0.0.1:5000`. | ||
|
||
This version requires SR-IOV enabled NICs (SR-IOV Physical Functions) on the current host, and will move | ||
the physical interfaces into the `K3d` cluster's agent node(s) (an agent node is a worker node in k3d terminology) | ||
so that they can be used through multus and the SR-IOV | ||
components. | ||
|
||
This provider also deploys [multus](https://github.com/k8snetworkplumbingwg/multus-cni) | ||
, [sriov-cni](https://github.com/k8snetworkplumbingwg/sriov-cni) | ||
and [sriov-device-plugin](https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin). | ||
|
||
## Bringing the cluster up | ||
|
||
```bash | ||
export KUBEVIRT_PROVIDER=k3d-1.25-sriov | ||
export KUBECONFIG=$(realpath _ci-configs/k3d-1.25-sriov/.kubeconfig) | ||
make cluster-up | ||
``` | ||
``` | ||
$ kubectl get nodes | ||
NAME STATUS ROLES AGE VERSION | ||
k3d-sriov-server-0 Ready control-plane,master 67m v1.25.6+k3s1 | ||
k3d-sriov-agent-0 Ready worker 67m v1.25.6+k3s1 | ||
k3d-sriov-agent-1 Ready worker 67m v1.25.6+k3s1 | ||
$ kubectl get pods -n kube-system -l app=multus | ||
NAME READY STATUS RESTARTS AGE | ||
kube-multus-ds-z9hvs 1/1 Running 0 66m | ||
kube-multus-ds-7shgv 1/1 Running 0 66m | ||
kube-multus-ds-l49xj 1/1 Running 0 66m | ||
$ kubectl get pods -n sriov -l app=sriov-cni | ||
NAME READY STATUS RESTARTS AGE | ||
kube-sriov-cni-ds-amd64-4pndd 1/1 Running 0 66m | ||
kube-sriov-cni-ds-amd64-68nhh 1/1 Running 0 65m | ||
$ kubectl get pods -n sriov -l app=sriovdp | ||
NAME READY STATUS RESTARTS AGE | ||
kube-sriov-device-plugin-amd64-qk66v 1/1 Running 0 66m | ||
kube-sriov-device-plugin-amd64-d5r5b 1/1 Running 0 65m | ||
``` | ||
|
||
### Connecting to a node | ||
```bash | ||
export KUBEVIRT_PROVIDER=k3d-1.25-sriov | ||
./cluster-up/ssh.sh <node_name> /bin/sh | ||
``` | ||
|
||
## Bringing the cluster down | ||
|
||
```bash | ||
export KUBEVIRT_PROVIDER=k3d-1.25-sriov | ||
make cluster-down | ||
``` | ||
|
||
This destroys the whole cluster, and gracefully moves the SR-IOV nics to the root network namespace. | ||
|
||
Note: killing the containers / cluster without first gracefully moving the NICs to the root network namespace | ||
might leave the NICs unreachable for a few minutes. | ||
`find /sys/class/net/*/device/sriov_numvfs` can be used to see when the nics are reachable again. | ||
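The check mentioned in the note can be automated. The following is a minimal sketch, not part of the provider: the function names `count_sriov_pfs` and `wait_for_sriov_pfs` are hypothetical, and it assumes that an interface counts as reachable again once its `device/sriov_numvfs` file is visible under `/sys/class/net`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the provider).

# count_sriov_pfs [SYSFS_ROOT]
# Prints how many interfaces expose device/sriov_numvfs under SYSFS_ROOT.
count_sriov_pfs() {
    local root="${1:-/sys}"
    local count=0 f
    for f in "$root"/class/net/*/device/sriov_numvfs; do
        # An unmatched glob stays literal, so check that the file exists.
        [ -e "$f" ] && count=$((count + 1))
    done
    echo "$count"
}

# wait_for_sriov_pfs EXPECTED [TIMEOUT_SECONDS] [SYSFS_ROOT]
# Polls once per second. Returns 0 once at least EXPECTED PFs are visible,
# 1 if the timeout expires first.
wait_for_sriov_pfs() {
    local expected="$1" timeout="${2:-300}" root="${3:-/sys}"
    local elapsed=0
    while [ "$(count_sriov_pfs "$root")" -lt "$expected" ]; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For example, `wait_for_sriov_pfs 2 600` would block for up to ten minutes until two SR-IOV capable interfaces are visible on the host again. The optional sysfs root argument exists only so the helpers can be exercised against a fake directory tree.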
|
||
## Using podman | ||
Podman v4 is required. | ||
|
||
Run: | ||
```bash | ||
systemctl enable --now podman.socket | ||
ln -s /run/podman/podman.sock /var/run/docker.sock | ||
``` | ||
The rest is as usual. | ||
For more info see https://k3d.io/v5.4.1/usage/advanced/podman. | ||
|
||
## Updating the provider | ||
|
||
### Bumping K3D | ||
Update `K3D_TAG` (see `cluster-up/cluster/k3d/common.sh` for more info) | ||
|
||
### Bumping CNI | ||
Update `CNI_VERSION` (see `cluster-up/cluster/k3d/common.sh` for more info) | ||
|
||
### Bumping Multus | ||
Download the newer manifest `https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/deployments/multus-daemonset-crio.yml`, | ||
replace the file `cluster-up/cluster/$KUBEVIRT_PROVIDER/sriov-components/manifests/multus/multus.yaml` with it, | ||
and update the kustomization file `cluster-up/cluster/$KUBEVIRT_PROVIDER/sriov-components/manifests/multus/kustomization.yaml` | ||
as needed. | ||
|
||
### Bumping calico | ||
1. Fetch new calico yaml (https://docs.tigera.io/calico/3.25/getting-started/kubernetes/k3s/quickstart) | ||
Enable `allow_ip_forwarding` (See https://k3d.io/v5.4.7/usage/advanced/calico) | ||
Or use the one that is suggested here https://k3d.io/v5.4.7/usage/advanced/calico whenever it is updated. | ||
2. Prefix the images in the yaml with `quay.io/` unless they have it already. | ||
3. Update `cluster-up/cluster/k3d/manifests/calico.yaml` (see `CALICO` at `cluster-up/cluster/k3d/common.sh` for more info) | ||
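Step 2 above can be scripted. The sketch below is one possible reading of that step, not the maintainers' procedure: it assumes that image references appear on lines of the form `image: <ref>`, unquoted, and that an explicit `docker.io/` registry should be replaced by `quay.io/` rather than kept. The function name `prefix_quay_images` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical filter: reads a manifest on stdin, writes it to stdout with
# every "image:" value pointing at quay.io.
# Assumptions: unquoted values, and a leading "docker.io/" is dropped.
prefix_quay_images() {
    # Lines whose image already starts with quay.io/ are left untouched.
    sed -E '/^[[:space:]]*(- )?image:[[:space:]]*quay\.io\//!s#^([[:space:]]*(- )?image:[[:space:]]*)(docker\.io/)?#\1quay.io/#'
}
```

A possible invocation is `prefix_quay_images < calico.yaml > calico-quay.yaml`, where both file names are placeholders. Review the resulting diff before copying the file into place, since quoted image values and other registries are not handled.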
|
||
Note: Make sure to follow the latest versions on the links above. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
# How to troubleshoot a failing k3d job | ||
|
||
If logging and output artifacts are not enough, there is a way to connect to a running CI pod and troubleshoot directly from there. | ||
|
||
## Pre-requisites | ||
|
||
- A working (enabled) account on the [CI cluster](shift.ovirt.org), specifically enabled to the `kubevirt-prow-jobs` project. | ||
- The [mkpj tool](https://github.com/kubernetes/test-infra/tree/master/prow/cmd/mkpj) installed | ||
|
||
## Launching a custom job | ||
|
||
Through the `mkpj` tool, it's possible to craft a custom Prow Job that can be executed on the CI cluster. | ||
|
||
Just `go get` it by running `go get k8s.io/test-infra/prow/cmd/mkpj` | ||
|
||
Then run the following command from a checkout of the [project-infra repo](https://github.com/kubevirt/project-infra): | ||
|
||
```bash | ||
mkpj --pull-number $KUBEVIRT_PR_NUMBER -job pull-kubevirt-e2e-k3d-1.25-sriov -job-config-path github/ci/prow/files/jobs/kubevirt/kubevirt-presubmits.yaml --config-path github/ci/prow/files/config.yaml > debugkind.yaml | ||
``` | ||
|
||
You will end up having a ProwJob manifest in the `debugkind.yaml` file. | ||
|
||
It's strongly recommended to replace the job's name, as it will be easier to find and debug the related pod, by replacing `metadata.name` with something more recognizable. | ||
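The rename can be done in any editor. As a scripted alternative, here is a minimal sketch that assumes the generated manifest is block-style YAML with two-space indentation and a top-level `metadata:` key. The function name `set_prowjob_name` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set_prowjob_name MANIFEST NEW_NAME
# Prints MANIFEST with the top-level metadata.name replaced by NEW_NAME.
# Assumes block-style YAML indented with two spaces.
set_prowjob_name() {
    awk -v new="$2" '
        /^metadata:/    { in_meta = 1; print; next }  # entering metadata
        /^[^[:space:]]/ { in_meta = 0 }               # any other top-level key
        in_meta && !done && /^  name:/ {              # direct child only
            print "  name: " new
            done = 1
            next
        }
        { print }
    ' "$1"
}
```

For example, `set_prowjob_name debugkind.yaml debugprowjobpod > debugkind-renamed.yaml` writes a renamed copy, which can then be applied instead of the original. The output file name is a placeholder. Nested keys such as `metadata.labels.name` and keys under `spec` are left untouched because only a `name:` indented by exactly two spaces inside `metadata` is rewritten.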
|
||
The `$KUBEVIRT_PR_NUMBER` can be an actual PR on the [kubevirt repo](https://github.com/kubevirt/kubevirt). | ||
|
||
In case we just want to debug the cluster provided by the CI, it's recommended to override the entry point, either in the test PR we are instrumenting (a good sample can be found [here](https://github.com/kubevirt/kubevirt/pull/3022)), or by overriding the entry point directly in the prow job's manifest. | ||
|
||
Remember that we want the cluster to be long-lived, so a long sleep must be provided as part of the entry point. | ||
|
||
Make sure you switch to the `kubevirt-prow-jobs` project, and apply the manifest: | ||
|
||
```bash | ||
kubectl apply -f debugkind.yaml | ||
``` | ||
|
||
You will end up with a ProwJob object, and a pod with the same name you gave to the ProwJob. | ||
|
||
Once the pod is up & running, connect to it via bash: | ||
|
||
```bash | ||
kubectl exec -it debugprowjobpod -- bash | ||
``` | ||
|
||
### Logistics | ||
|
||
Once you are in the pod, you'll be able to troubleshoot what's happening in the environment where CI runs its tests. | ||
|
||
Run the following to bring up a [k3d](https://github.com/k3d-io/k3d) cluster with SR-IOV installed. | ||
|
||
```bash | ||
KUBEVIRT_PROVIDER=k3d-1.25-sriov make cluster-up | ||
``` | ||
|
||
Use `k3d kubeconfig print sriov` to extract the kubeconfig file. | ||
The `kubectl` binary is already on board and in `$PATH`. | ||
See `README.md` for more info. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
#!/bin/bash | ||
|
||
[ $(id -u) -ne 0 ] && echo "FATAL: this script requires sudo privileges" >&2 && exit 1 | ||
|
||
set -xe | ||
|
||
PF_COUNT_PER_NODE=1 | ||
|
||
SCRIPT_PATH=$(dirname "$(realpath "$0")") | ||
|
||
source ${SCRIPT_PATH}/sriov-node/node.sh | ||
source ${SCRIPT_PATH}/sriov-components/sriov_components.sh | ||
|
||
CONFIGURE_VFS_SCRIPT_PATH="$SCRIPT_PATH/sriov-node/configure_vfs.sh" | ||
|
||
SRIOV_COMPONENTS_NAMESPACE="sriov" | ||
SRIOV_NODE_LABEL_KEY="sriov_capable" | ||
SRIOV_NODE_LABEL_VALUE="true" | ||
SRIOV_NODE_LABEL="$SRIOV_NODE_LABEL_KEY=$SRIOV_NODE_LABEL_VALUE" | ||
SRIOVDP_RESOURCE_PREFIX="kubevirt.io" | ||
SRIOVDP_RESOURCE_NAME="sriov_net" | ||
VFS_DRIVER="vfio-pci" | ||
VFS_DRIVER_KMODULE="vfio_pci" | ||
VFS_COUNT="6" | ||
|
||
function validate_nodes_sriov_allocatable_resource() { | ||
local -r resource_name="$SRIOVDP_RESOURCE_PREFIX/$SRIOVDP_RESOURCE_NAME" | ||
local -r sriov_nodes=$(_kubectl get nodes -l $SRIOV_NODE_LABEL -o custom-columns=:.metadata.name --no-headers) | ||
|
||
local num_vfs | ||
for sriov_node in $sriov_nodes; do | ||
num_vfs=$(node::total_vfs_count "$sriov_node") | ||
sriov_components::wait_allocatable_resource "$sriov_node" "$resource_name" "$num_vfs" | ||
done | ||
} | ||
|
||
worker_nodes=($(_kubectl get nodes -l node-role.kubernetes.io/worker -o custom-columns=:.metadata.name --no-headers)) | ||
worker_nodes_count=${#worker_nodes[@]} | ||
[ "$worker_nodes_count" -eq 0 ] && echo "FATAL: no worker nodes found" >&2 && exit 1 | ||
|
||
pfs_names=($(node::discover_host_pfs)) | ||
pf_count="${#pfs_names[@]}" | ||
[ "$pf_count" -eq 0 ] && echo "FATAL: Could not find available sriov PF's" >&2 && exit 1 | ||
|
||
total_pf_required=$((worker_nodes_count*PF_COUNT_PER_NODE)) | ||
[ "$pf_count" -lt "$total_pf_required" ] && \ | ||
echo "FATAL: there are not enough PF's on the host, try to reduce PF_COUNT_PER_NODE | ||
Worker nodes count: $worker_nodes_count | ||
PF per node count: $PF_COUNT_PER_NODE | ||
Total PF count required: $total_pf_required" >&2 && exit 1 | ||
|
||
## Move SR-IOV Physical Functions to worker nodes | ||
PFS_IN_USE="" | ||
node::configure_sriov_pfs "${worker_nodes[*]}" "${pfs_names[*]}" "$PF_COUNT_PER_NODE" "PFS_IN_USE" | ||
|
||
## Create VFs and configure their drivers on each SR-IOV node | ||
node::configure_sriov_vfs "${worker_nodes[*]}" "$VFS_DRIVER" "$VFS_DRIVER_KMODULE" "$VFS_COUNT" | ||
|
||
## Deploy Multus and SRIOV components | ||
sriov_components::deploy_multus | ||
sriov_components::deploy \ | ||
"$PFS_IN_USE" \ | ||
"$VFS_DRIVER" \ | ||
"$SRIOVDP_RESOURCE_PREFIX" "$SRIOVDP_RESOURCE_NAME" \ | ||
"$SRIOV_NODE_LABEL_KEY" "$SRIOV_NODE_LABEL_VALUE" | ||
|
||
# Verify that each sriov capable node has sriov VFs allocatable resource | ||
validate_nodes_sriov_allocatable_resource | ||
sriov_components::wait_pods_ready | ||
|
||
_kubectl get nodes | ||
_kubectl get pods -n $SRIOV_COMPONENTS_NAMESPACE |
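The PF sizing check in the script above is a single multiplication followed by a comparison. The sketch below restates it as two standalone functions so the arithmetic can be tried without a cluster. The function names `required_pfs` and `have_enough_pfs` are hypothetical and do not exist in the provider.

```bash
#!/usr/bin/env bash
# Hypothetical restatement of the PF sizing check from the script above.

# required_pfs WORKER_COUNT PF_COUNT_PER_NODE
# Prints the number of host PFs the cluster needs.
required_pfs() {
    echo $(( $1 * $2 ))
}

# have_enough_pfs AVAILABLE_PFS WORKER_COUNT PF_COUNT_PER_NODE
# Succeeds when the host exposes at least as many PFs as required.
have_enough_pfs() {
    [ "$1" -ge "$(required_pfs "$2" "$3")" ]
}
```

With the two agent nodes shown in the README output and `PF_COUNT_PER_NODE=1`, `required_pfs 2 1` prints `2`, so a host exposing a single PF fails the check and the script exits with its "not enough PF's" message.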
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
#!/usr/bin/env bash | ||
|
||
set -e | ||
|
||
export CLUSTER_NAME="sriov" | ||
export HOST_PORT=5000 | ||
|
||
DEPLOY_SRIOV=${DEPLOY_SRIOV:-true} | ||
|
||
function print_available_nics() { | ||
echo 'STEP: Available NICs' | ||
# print hardware info for easier debugging based on logs | ||
${CRI_BIN} run --rm --cap-add=SYS_RAWIO quay.io/phoracek/lspci@sha256:0f3cacf7098202ef284308c64e3fc0ba441871a846022bb87d65ff130c79adb1 sh -c "lspci | egrep -i 'network|ethernet'" | ||
echo | ||
} | ||
|
||
function print_agents_sriov_status() { | ||
nodes=$(_get_agent_nodes) | ||
echo "STEP: Print agents SR-IOV status" | ||
for node in $nodes; do | ||
echo "Node: $node" | ||
echo "VFs:" | ||
${CRI_BIN} exec $node /bin/sh -c "ls -l /sys/class/net/*/device/virtfn*" | ||
echo "PFs PCI Addresses:" | ||
${CRI_BIN} exec $node /bin/sh -c "grep PCI_SLOT_NAME /sys/class/net/*/device/uevent" | ||
done | ||
echo | ||
} | ||
|
||
function deploy_sriov() { | ||
print_available_nics | ||
${KUBEVIRTCI_PATH}/cluster/$KUBEVIRT_PROVIDER/config_sriov_cluster.sh | ||
print_agents_sriov_status | ||
} | ||
|
||
function up() { | ||
k3d_up | ||
[ $DEPLOY_SRIOV == true ] && deploy_sriov | ||
|
||
version=$(_kubectl get node k3d-$CLUSTER_NAME-server-0 -o=custom-columns=VERSION:.status.nodeInfo.kubeletVersion --no-headers) | ||
echo "$KUBEVIRT_PROVIDER cluster '$CLUSTER_NAME' is ready ($version)" | ||
} | ||
|
||
source ${KUBEVIRTCI_PATH}/cluster/k3d/common.sh |
27 changes: 27 additions & 0 deletions
cluster-up/cluster/k3d-1.25-sriov/sriov-components/manifests/kustomization.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
apiVersion: kustomize.config.k8s.io/v1beta1 | ||
kind: Kustomization | ||
namespace: sriov | ||
resources: | ||
- sriov-ns.yaml | ||
- sriov-cni-daemonset.yaml | ||
- sriovdp-daemonset.yaml | ||
- sriovdp-config.yaml | ||
patchesJson6902: | ||
- target: | ||
group: apps | ||
version: v1 | ||
kind: DaemonSet | ||
name: kube-sriov-cni-ds-amd64 | ||
path: patch-node-selector.yaml | ||
- target: | ||
group: apps | ||
version: v1 | ||
kind: DaemonSet | ||
name: kube-sriov-device-plugin-amd64 | ||
path: patch-node-selector.yaml | ||
- target: | ||
group: apps | ||
version: v1 | ||
kind: DaemonSet | ||
name: kube-sriov-device-plugin-amd64 | ||
path: patch-sriovdp-resource-prefix.yaml |
14 changes: 14 additions & 0 deletions
cluster-up/cluster/k3d-1.25-sriov/sriov-components/manifests/multus/kustomization.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
apiVersion: kustomize.config.k8s.io/v1beta1 | ||
kind: Kustomization | ||
resources: | ||
- multus.yaml | ||
images: | ||
- name: ghcr.io/k8snetworkplumbingwg/multus-cni | ||
newTag: v3.8 | ||
patchesJson6902: | ||
- path: patch-args.yaml | ||
target: | ||
group: apps | ||
version: v1 | ||
kind: DaemonSet | ||
name: kube-multus-ds |