Commit
Since kubernetes-sigs/kind#2999 blocks us from updating to new k8s versions using kind, use k3d instead of kind.

Signed-off-by: Or Shoval <[email protected]>
Showing 24 changed files with 5,338 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
filters: | ||
".*": | ||
reviewers: | ||
- qinqon | ||
- oshoval | ||
- phoracek | ||
- ormergi | ||
approvers: | ||
- qinqon | ||
- phoracek |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
# K8s 1.25.x with SR-IOV in a K3d cluster | ||
|
||
Provides a pre-deployed containerized k8s cluster with version 1.25.x that runs | ||
using [K3d](https://github.com/k3d-io/k3d). | ||
The cluster is completely ephemeral and is recreated on every cluster restart. The KubeVirt containers are built on the | ||
local machine and are then pushed to a registry which is exposed at | ||
`127.0.0.1:5000`. | ||
|
||
This provider requires SR-IOV enabled nics (SR-IOV Physical Functions) on the host, and moves the | ||
physical interfaces into the `K3d` cluster's agent node(s) (an agent node is a worker node in k3d terminology) | ||
so that they can be used through multus and the SR-IOV | ||
components. | ||
|
||
This provider also deploys [multus](https://github.com/k8snetworkplumbingwg/multus-cni) | ||
, [sriov-cni](https://github.com/k8snetworkplumbingwg/sriov-cni) | ||
and [sriov-device-plugin](https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin). | ||
|
||
## Bringing the cluster up | ||
|
||
```bash | ||
export KUBEVIRT_PROVIDER=k3d-1.25-sriov | ||
export KUBECONFIG=$(realpath _ci-configs/k3d-1.25-sriov/.kubeconfig) | ||
make cluster-up | ||
``` | ||
``` | ||
$ kubectl get nodes | ||
NAME STATUS ROLES AGE VERSION | ||
k3d-sriov-server-0 Ready control-plane,master 67m v1.25.6+k3s1 | ||
k3d-sriov-agent-0 Ready worker 67m v1.25.6+k3s1 | ||
k3d-sriov-agent-1 Ready worker 67m v1.25.6+k3s1 | ||
$ kubectl get pods -n kube-system -l app=multus | ||
NAME READY STATUS RESTARTS AGE | ||
kube-multus-ds-z9hvs 1/1 Running 0 66m | ||
kube-multus-ds-7shgv 1/1 Running 0 66m | ||
kube-multus-ds-l49xj 1/1 Running 0 66m | ||
$ kubectl get pods -n sriov -l app=sriov-cni | ||
NAME READY STATUS RESTARTS AGE | ||
kube-sriov-cni-ds-amd64-4pndd 1/1 Running 0 66m | ||
kube-sriov-cni-ds-amd64-68nhh 1/1 Running 0 65m | ||
$ kubectl get pods -n sriov -l app=sriovdp | ||
NAME READY STATUS RESTARTS AGE | ||
kube-sriov-device-plugin-amd64-qk66v 1/1 Running 0 66m | ||
kube-sriov-device-plugin-amd64-d5r5b 1/1 Running 0 65m | ||
``` | ||
|
||
### Connecting to a node | ||
```bash | ||
export KUBEVIRT_PROVIDER=k3d-1.25-sriov | ||
./cluster-up/ssh.sh <node_name> /bin/sh | ||
``` | ||
|
||
## Bringing the cluster down | ||
|
||
```bash | ||
export KUBEVIRT_PROVIDER=k3d-1.25-sriov | ||
make cluster-down | ||
``` | ||
|
||
This destroys the whole cluster, and gracefully moves the SR-IOV nics to the root network namespace. | ||
|
||
Note: killing the containers / cluster without first gracefully moving the nics back to the root ns | ||
might leave the nics unreachable for a few minutes. | ||
`find /sys/class/net/*/device/sriov_numvfs` can be used to see when the nics are reachable again. | ||
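
The `sriov_numvfs` check can be wrapped in a polling loop. This is a minimal sketch and not part of the provider: `count_visible_pfs` and `wait_for_pfs` are hypothetical helper names, and the 5 second interval and 300 second default timeout are assumptions.

```bash
# Count the interfaces that expose sriov_numvfs, i.e. the SR-IOV PFs
# currently visible in this network namespace.
# The sysfs directory is a parameter only so the helper can be tried on a fake tree.
count_visible_pfs() {
    local net_dir="${1:-/sys/class/net}"
    local count=0 f
    for f in "$net_dir"/*/device/sriov_numvfs; do
        # An unmatched glob stays literal, so check that the path really exists.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Poll until at least EXPECTED PFs are visible, or give up after TIMEOUT seconds.
wait_for_pfs() {
    local expected="$1" timeout="${2:-300}" net_dir="${3:-/sys/class/net}"
    local waited=0
    until [ "$(count_visible_pfs "$net_dir")" -ge "$expected" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
}
```

For example, `wait_for_pfs 2` would block until two PFs are visible again in the root namespace, and return non-zero if that does not happen within the timeout.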
|
||
### Bumping calico | ||
Fetch the new calico yaml and: | ||
1. Enable `allow_ip_forwarding` (See https://k3d.io/v5.0.1/usage/advanced/calico) | ||
2. Prefix the images in the yaml with `quay.io/` | ||
|
||
Note: the initial k3d provider used the yaml that appears at the link above, with step 2 applied | ||
on top of it. |
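
Step 2 of the calico bump can be scripted. The sketch below is an assumption about how the manifest looks, not the procedure used for this commit: it expects image references of the form `calico/<name>:<tag>` or `docker.io/calico/<name>:<tag>`, and `prefix_calico_images` is a hypothetical helper name.

```bash
# Rewrite calico image references read from stdin so they are pulled from quay.io.
# Lines that already point at quay.io do not match the pattern and are left alone.
prefix_calico_images() {
    sed -E 's#(image: *)(docker\.io/)?(calico/)#\1quay.io/\3#'
}
```

Usage would look like `prefix_calico_images < calico.yaml > calico-quay.yaml`, followed by a diff of the two files to confirm that only `image:` lines changed.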
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
# How to troubleshoot a failing k3d job | ||
|
||
If logging and output artifacts are not enough, there is a way to connect to a running CI pod and troubleshoot directly from there. | ||
|
||
## Pre-requisites | ||
|
||
- A working (enabled) account on the [CI cluster](shift.ovirt.org), specifically enabled to the `kubevirt-prow-jobs` project. | ||
- The [mkpj tool](https://github.com/kubernetes/test-infra/tree/master/prow/cmd/mkpj) installed | ||
|
||
## Launching a custom job | ||
|
||
Through the `mkpj` tool, it's possible to craft a custom Prow Job that can be executed on the CI cluster. | ||
|
||
Install it by running `go get k8s.io/test-infra/prow/cmd/mkpj`. | ||
|
||
Then run the following command from a checkout of the [project-infra repo](https://github.com/kubevirt/project-infra): | ||
|
||
```bash | ||
mkpj --pull-number $KUBEVIRT_PR_NUMBER -job pull-kubevirt-e2e-k3d-1.25-sriov -job-config-path github/ci/prow/files/jobs/kubevirt/kubevirt-presubmits.yaml --config-path github/ci/prow/files/config.yaml > debugkind.yaml | ||
``` | ||
|
||
You will end up having a ProwJob manifest in the `debugkind.yaml` file. | ||
|
||
It's strongly recommended to replace `metadata.name` with something more recognizable, as that makes the corresponding pod easier to find and debug. | ||
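
One way to do the rename without opening an editor is sketched below. `rename_prowjob` is a hypothetical helper, and it assumes the manifest is a single YAML document in which `name` is indented by two spaces directly under `metadata`.

```bash
# Print MANIFEST with metadata.name replaced by NEW_NAME.
rename_prowjob() {
    local manifest="$1" new_name="$2"
    awk -v name="$new_name" '
        # A non-indented line starts a new top-level key; remember if it is metadata.
        /^[^ ]/ { in_meta = ($0 ~ /^metadata:/) }
        # Only the two-space "name:" key inside metadata is rewritten.
        in_meta && /^  name:/ { print "  name: " name; next }
        { print }
    ' "$manifest"
}
```

For example, `rename_prowjob debugkind.yaml debugprowjobpod > renamed.yaml` writes the renamed manifest to a new file, leaving the original untouched for comparison.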
|
||
The `$KUBEVIRT_PR_NUMBER` can be an actual PR on the [kubevirt repo](https://github.com/kubevirt/kubevirt). | ||
|
||
In case we just want to debug the cluster provided by the CI, it's recommended to override the entry point, either in the test PR we are instrumenting (a good sample can be found [here](https://github.com/kubevirt/kubevirt/pull/3022)), or by overriding the entry point directly in the prow job's manifest. | ||
|
||
Remember that we want the cluster to be long-lived, so a long sleep must be part of the entry point. | ||
|
||
Make sure you switch to the `kubevirt-prow-jobs` project, and apply the manifest: | ||
|
||
```bash | ||
kubectl apply -f debugkind.yaml | ||
``` | ||
|
||
You will end up with a ProwJob object, and a pod with the same name you gave to the ProwJob. | ||
|
||
Once the pod is up & running, connect to it via bash: | ||
|
||
```bash | ||
kubectl exec -it debugprowjobpod -- bash | ||
``` | ||
|
||
### Logistics | ||
|
||
Once you are in the pod, you'll be able to troubleshoot what's happening in the environment where CI runs its tests. | ||
|
||
Run the following to bring up a [k3d](https://github.com/k3d-io/k3d) cluster with SR-IOV installed. | ||
|
||
```bash | ||
KUBEVIRT_PROVIDER=k3d-1.25-sriov make cluster-up | ||
``` | ||
|
||
Use `k3d kubeconfig print sriov` to extract the kubeconfig file. | ||
The `kubectl` binary is already on board and in `$PATH`. | ||
See `README.md` for more info. |
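
The kubeconfig extraction mentioned above can be combined with exporting `KUBECONFIG`. This is a minimal sketch: `use_k3d_kubeconfig` is a hypothetical helper and the default output path under `/tmp` is an assumption.

```bash
# Write the kubeconfig of a k3d cluster to a file and point kubectl at it.
use_k3d_kubeconfig() {
    local cluster="${1:-sriov}"
    local out="${2:-/tmp/k3d-${cluster}.kubeconfig}"
    # Bail out without touching KUBECONFIG if k3d cannot print the config.
    k3d kubeconfig print "$cluster" > "$out" || return 1
    export KUBECONFIG="$out"
}
```

After `use_k3d_kubeconfig sriov`, a plain `kubectl get nodes` in the same shell should talk to the debug cluster.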
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
#!/bin/bash | ||
|
||
[ $(id -u) -ne 0 ] && echo "FATAL: this script requires sudo privileges" >&2 && exit 1 | ||
|
||
set -xe | ||
|
||
PF_COUNT_PER_NODE=${PF_COUNT_PER_NODE:-1} | ||
[ "$PF_COUNT_PER_NODE" -le 0 ] && echo "FATAL: PF_COUNT_PER_NODE must be a positive integer" >&2 && exit 1 | ||
[ "$PF_COUNT_PER_NODE" -ne 1 ] && echo "FATAL: only 1 PF per node is supported for now" >&2 && exit 1 | ||
|
||
SCRIPT_PATH=$(dirname "$(realpath "$0")") | ||
|
||
source "${SCRIPT_PATH}/sriov-node/node.sh" | ||
source "${SCRIPT_PATH}/sriov-components/sriov_components.sh" | ||
|
||
CONFIGURE_VFS_SCRIPT_PATH="$SCRIPT_PATH/sriov-node/configure_vfs.sh" | ||
|
||
SRIOV_COMPONENTS_NAMESPACE="sriov" | ||
SRIOV_NODE_LABEL_KEY="sriov_capable" | ||
SRIOV_NODE_LABEL_VALUE="true" | ||
SRIOV_NODE_LABEL="$SRIOV_NODE_LABEL_KEY=$SRIOV_NODE_LABEL_VALUE" | ||
SRIOVDP_RESOURCE_PREFIX="kubevirt.io" | ||
SRIOVDP_RESOURCE_NAME="sriov_net" | ||
VFS_DRIVER="vfio-pci" | ||
VFS_DRIVER_KMODULE="vfio_pci" | ||
VFS_COUNT="6" | ||
|
||
function validate_nodes_sriov_allocatable_resource() { | ||
local -r resource_name="$SRIOVDP_RESOURCE_PREFIX/$SRIOVDP_RESOURCE_NAME" | ||
local -r sriov_nodes=$(_kubectl get nodes -l $SRIOV_NODE_LABEL -o custom-columns=:.metadata.name --no-headers) | ||
|
||
local num_vfs | ||
for sriov_node in $sriov_nodes; do | ||
num_vfs=$(node::total_vfs_count "$sriov_node") | ||
sriov_components::wait_allocatable_resource "$sriov_node" "$resource_name" "$num_vfs" | ||
done | ||
} | ||
|
||
worker_nodes=($(_kubectl get nodes -l node-role.kubernetes.io/worker -o custom-columns=:.metadata.name --no-headers)) | ||
worker_nodes_count=${#worker_nodes[@]} | ||
[ "$worker_nodes_count" -eq 0 ] && echo "FATAL: no worker nodes found" >&2 && exit 1 | ||
|
||
pfs_names=($(node::discover_host_pfs)) | ||
pf_count="${#pfs_names[@]}" | ||
[ "$pf_count" -eq 0 ] && echo "FATAL: Could not find available sriov PF's" >&2 && exit 1 | ||
|
||
total_pf_required=$((worker_nodes_count*PF_COUNT_PER_NODE)) | ||
[ "$pf_count" -lt "$total_pf_required" ] && \ | ||
echo "FATAL: there are not enough PF's on the host, try to reduce PF_COUNT_PER_NODE | ||
Worker nodes count: $worker_nodes_count | ||
PF per node count: $PF_COUNT_PER_NODE | ||
Total PF count required: $total_pf_required" >&2 && exit 1 | ||
|
||
## Move SR-IOV Physical Functions to worker nodes | ||
PFS_IN_USE="" | ||
node::configure_sriov_pfs "${worker_nodes[*]}" "${pfs_names[*]}" "$PF_COUNT_PER_NODE" "PFS_IN_USE" | ||
|
||
## Create VFs and configure their drivers on each SR-IOV node | ||
node::configure_sriov_vfs "${worker_nodes[*]}" "$VFS_DRIVER" "$VFS_DRIVER_KMODULE" "$VFS_COUNT" | ||
|
||
## Deploy Multus and SRIOV components | ||
sriov_components::deploy_multus | ||
sriov_components::deploy \ | ||
"$PFS_IN_USE" \ | ||
"$VFS_DRIVER" \ | ||
"$SRIOVDP_RESOURCE_PREFIX" "$SRIOVDP_RESOURCE_NAME" \ | ||
"$SRIOV_NODE_LABEL_KEY" "$SRIOV_NODE_LABEL_VALUE" | ||
|
||
# Verify that each sriov capable node has sriov VFs allocatable resource | ||
validate_nodes_sriov_allocatable_resource | ||
sriov_components::wait_pods_ready | ||
|
||
_kubectl get nodes | ||
_kubectl get pods -n $SRIOV_COMPONENTS_NAMESPACE |
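
For a manual spot check of the resource that `validate_nodes_sriov_allocatable_resource` waits for, a node's allocatable count can be read directly. This is a sketch under assumptions: `sriov_allocatable` is a hypothetical helper, it calls plain `kubectl` instead of the `_kubectl` wrapper, and the resource name is hard-coded from `SRIOVDP_RESOURCE_PREFIX` and `SRIOVDP_RESOURCE_NAME` above. How `sriov_components::wait_allocatable_resource` does its own check is not shown in this diff.

```bash
# Print the allocatable kubevirt.io/sriov_net count reported by one node.
sriov_allocatable() {
    local node="$1"
    # The dot in the resource name is escaped so jsonpath treats it as part of the key.
    kubectl get node "$node" \
        -o jsonpath='{.status.allocatable.kubevirt\.io/sriov_net}'
}
```

For example, `sriov_allocatable k3d-sriov-agent-0` prints the number of VFs the device plugin advertises on that agent node. An empty result would suggest the device plugin has not registered the resource yet.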
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
{ | ||
"Description": "DEFAULT", | ||
"UUID": "", | ||
"Version": "v0.56.9", | ||
"ResultsDir": "/tmp/sonobuoy/results", | ||
"Resources": null, | ||
"Filters": { | ||
"Namespaces": ".*", | ||
"LabelSelector": "" | ||
}, | ||
"Limits": { | ||
"PodLogs": { | ||
"Namespaces": "kube-system", | ||
"SonobuoyNamespace": true, | ||
"FieldSelectors": [], | ||
"LabelSelector": "", | ||
"Previous": false, | ||
"SinceSeconds": null, | ||
"SinceTime": null, | ||
"Timestamps": false, | ||
"TailLines": null, | ||
"LimitBytes": null | ||
} | ||
}, | ||
"QPS": 30, | ||
"Burst": 50, | ||
"Server": { | ||
"bindaddress": "0.0.0.0", | ||
"bindport": 8080, | ||
"advertiseaddress": "", | ||
"timeoutseconds": 21600 | ||
}, | ||
"Plugins": null, | ||
"PluginSearchPath": [ | ||
"./plugins.d", | ||
"/etc/sonobuoy/plugins.d", | ||
"~/sonobuoy/plugins.d" | ||
], | ||
"Namespace": "sonobuoy", | ||
"WorkerImage": "sonobuoy/sonobuoy:v0.56.9", | ||
"ImagePullPolicy": "IfNotPresent", | ||
"ImagePullSecrets": "", | ||
"AggregatorPermissions": "clusterAdmin", | ||
"ServiceAccountName": "sonobuoy-serviceaccount", | ||
"ProgressUpdatesPort": "8099", | ||
"SecurityContextMode": "nonroot" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
#!/usr/bin/env bash | ||
|
||
set -e | ||
|
||
export CLUSTER_NAME="sriov" | ||
export HOST_PORT=5000 | ||
|
||
function print_sriov_data() { | ||
nodes=$(_get_agent_nodes) | ||
echo "STEP: Print SR-IOV data" | ||
for node in $nodes; do | ||
echo "Node: $node" | ||
echo "VFs:" | ||
${CRI_BIN} exec $node /bin/sh -c "ls -l /sys/class/net/*/device/virtfn*" | ||
echo "PFs PCI Addresses:" | ||
${CRI_BIN} exec $node /bin/sh -c "grep PCI_SLOT_NAME /sys/class/net/*/device/uevent" | ||
done | ||
echo | ||
} | ||
|
||
function print_sriov_info() { | ||
echo 'STEP: Available NICs' | ||
# print hardware info for easier debugging based on logs | ||
${CRI_BIN} run --rm --cap-add=SYS_RAWIO quay.io/phoracek/lspci@sha256:0f3cacf7098202ef284308c64e3fc0ba441871a846022bb87d65ff130c79adb1 sh -c "lspci | grep -E -i 'network|ethernet'" | ||
echo | ||
} | ||
|
||
function up() { | ||
print_sriov_info | ||
k3d_up | ||
|
||
${KUBEVIRTCI_PATH}/cluster/$KUBEVIRT_PROVIDER/config_sriov_cluster.sh | ||
|
||
print_sriov_data | ||
version=$(_kubectl get node k3d-sriov-server-0 -o=custom-columns=VERSION:.status.nodeInfo.kubeletVersion --no-headers) | ||
echo "$KUBEVIRT_PROVIDER cluster '$CLUSTER_NAME' is ready ($version)" | ||
} | ||
|
||
source ${KUBEVIRTCI_PATH}/cluster/k3d/common.sh |
27 changes: 27 additions & 0 deletions
cluster-up/cluster/k3d-1.25-sriov/sriov-components/manifests/kustomization.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
apiVersion: kustomize.config.k8s.io/v1beta1 | ||
kind: Kustomization | ||
namespace: sriov | ||
resources: | ||
- sriov-ns.yaml | ||
- sriov-cni-daemonset.yaml | ||
- sriovdp-daemonset.yaml | ||
- sriovdp-config.yaml | ||
patchesJson6902: | ||
- target: | ||
group: apps | ||
version: v1 | ||
kind: DaemonSet | ||
name: kube-sriov-cni-ds-amd64 | ||
path: patch-node-selector.yaml | ||
- target: | ||
group: apps | ||
version: v1 | ||
kind: DaemonSet | ||
name: kube-sriov-device-plugin-amd64 | ||
path: patch-node-selector.yaml | ||
- target: | ||
group: apps | ||
version: v1 | ||
kind: DaemonSet | ||
name: kube-sriov-device-plugin-amd64 | ||
path: patch-sriovdp-resource-prefix.yaml |
14 changes: 14 additions & 0 deletions
cluster-up/cluster/k3d-1.25-sriov/sriov-components/manifests/multus/kustomization.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
apiVersion: kustomize.config.k8s.io/v1beta1 | ||
kind: Kustomization | ||
resources: | ||
- multus.yaml | ||
images: | ||
- name: ghcr.io/k8snetworkplumbingwg/multus-cni | ||
newTag: v3.8 | ||
patchesJson6902: | ||
- path: patch-args.yaml | ||
target: | ||
group: apps | ||
version: v1 | ||
kind: DaemonSet | ||
name: kube-multus-ds |