Commit

cosmetics
Signed-off-by: Or Shoval <[email protected]>
oshoval committed Feb 28, 2023
1 parent bdeaf67 commit 0738126
Showing 9 changed files with 138 additions and 207 deletions.
103 changes: 30 additions & 73 deletions cluster-up/cluster/k3d-1.25-sriov/README.md
@@ -1,101 +1,58 @@
# K8S 1.23.13 with SR-IOV in a Kind cluster
# K8s 1.25.x with SR-IOV in a K3d cluster

Provides a pre-deployed containerized k8s cluster with version 1.23.13 that runs
using [KinD](https://github.com/kubernetes-sigs/kind)
Provides a pre-deployed containerized k8s cluster with version 1.25.x that runs
using [K3d](https://github.com/k3d-io/k3d)
The cluster is completely ephemeral and is recreated on every cluster restart. The KubeVirt containers are built on the
local machine and are then pushed to a registry which is exposed at
`localhost:5000`.
`127.0.0.1:5000`.
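
If you want to sanity-check that the registry is reachable from the host, a minimal probe (assuming `curl` is installed and the registry exposes the standard v2 API) is:

```bash
# Lists the repositories currently stored in the local registry (may be empty right after cluster-up)
curl -s http://127.0.0.1:5000/v2/_catalog
```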

This version also expects to have SR-IOV enabled nics (SR-IOV Physical Function) on the current host, and will move
physical interfaces into the `KinD`'s cluster worker node(s) so that they can be used through multus and SR-IOV
This version requires SR-IOV enabled NICs (SR-IOV Physical Functions) on the current host, and will move
physical interfaces into the `K3d` cluster's agent node(s) so that they can be used through multus and SR-IOV
components.

This providers also deploys [multus](https://github.com/k8snetworkplumbingwg/multus-cni)
This provider also deploys [multus](https://github.com/k8snetworkplumbingwg/multus-cni)
, [sriov-cni](https://github.com/k8snetworkplumbingwg/sriov-cni)
and [sriov-device-plugin](https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin).

## Bringing the cluster up

```bash
export KUBEVIRT_PROVIDER=kind-1.23-sriov
export KUBEVIRT_NUM_NODES=3
export KUBEVIRT_PROVIDER=k3d-1.25-sriov
export KUBECONFIG=$(realpath _ci-configs/k3d-1.25-sriov/.kubeconfig)
make cluster-up

$ cluster-up/kubectl.sh get nodes
NAME STATUS ROLES AGE VERSION
sriov-control-plane Ready control-plane,master 20h v1.23.13
sriov-worker Ready worker 20h v1.23.13
sriov-worker2 Ready worker 20h v1.23.13
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k3d-sriov-server-0 Ready control-plane,master 67m v1.25.6+k3s1
k3d-sriov-agent-0 Ready worker 67m v1.25.6+k3s1
k3d-sriov-agent-1 Ready worker 67m v1.25.6+k3s1

$ cluster-up/kubectl.sh get pods -n kube-system -l app=multus
NAME READY STATUS RESTARTS AGE
kube-multus-ds-amd64-d45n4 1/1 Running 0 20h
kube-multus-ds-amd64-g26xh 1/1 Running 0 20h
kube-multus-ds-amd64-mfh7c 1/1 Running 0 20h
$ kubectl get pods -n kube-system -l app=multus
NAME READY STATUS RESTARTS AGE
kube-multus-ds-z9hvs 1/1 Running 0 66m
kube-multus-ds-7shgv 1/1 Running 0 66m
kube-multus-ds-l49xj 1/1 Running 0 66m

$ cluster-up/kubectl.sh get pods -n sriov -l app=sriov-cni
$ kubectl get pods -n sriov -l app=sriov-cni
NAME READY STATUS RESTARTS AGE
kube-sriov-cni-ds-amd64-fv5cr 1/1 Running 0 20h
kube-sriov-cni-ds-amd64-q95q9 1/1 Running 0 20h
kube-sriov-cni-ds-amd64-4pndd 1/1 Running 0 66m
kube-sriov-cni-ds-amd64-68nhh 1/1 Running 0 65m

$ cluster-up/kubectl.sh get pods -n sriov -l app=sriovdp
$ kubectl get pods -n sriov -l app=sriovdp
NAME READY STATUS RESTARTS AGE
kube-sriov-device-plugin-amd64-h7h84 1/1 Running 0 20h
kube-sriov-device-plugin-amd64-xrr5z 1/1 Running 0 20h
kube-sriov-device-plugin-amd64-qk66v 1/1 Running 0 66m
kube-sriov-device-plugin-amd64-d5r5b 1/1 Running 0 65m
```
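
After the pods above are running, the SR-IOV device plugin should advertise VFs as an allocatable resource on the agent nodes. A quick way to inspect that (the exact resource name depends on the device-plugin configuration, so don't assume a specific one):

```bash
# Dump everything the node reports as allocatable; the SR-IOV resource advertised
# by sriov-device-plugin should appear in this map once the VFs are configured.
kubectl get node k3d-sriov-agent-0 -o jsonpath='{.status.allocatable}' | tr ',' '\n'
```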

## Bringing the cluster down

```bash
export KUBEVIRT_PROVIDER=kind-1.23-sriov
export KUBEVIRT_PROVIDER=k3d-1.25-sriov
make cluster-down
```

This destroys the whole cluster, and moves the SR-IOV nics to the root network namespace.
This destroys the whole cluster and gracefully moves the SR-IOV NICs back to the root network namespace.
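
If you want to confirm the PFs are back in the root network namespace after `cluster-down`, a simple check is:

```bash
# Each PF that has returned to the root namespace exposes a sriov_numvfs entry again
find /sys/class/net/*/device/sriov_numvfs
```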

## Setting a custom kind version

In order to use a custom kind image / kind version, export `KIND_NODE_IMAGE`, `KIND_VERSION`, `KUBECTL_PATH` before
running cluster-up. For example in order to use kind 0.9.0 (which is based on k8s-1.19.1) use:

```bash
export KIND_NODE_IMAGE="kindest/node:v1.19.1@sha256:98cf5288864662e37115e362b23e4369c8c4a408f99cbc06e58ac30ddc721600"
export KIND_VERSION="0.9.0"
export KUBECTL_PATH="/usr/bin/kubectl"
```

This allows users to test or use custom images / different kind versions before making them official.
See https://github.com/kubernetes-sigs/kind/releases for details about node images according to the kind version.

## Running multi SR-IOV clusters locally

The Kubevirtci SR-IOV provider supports running two clusters side by side, with a few known limitations.

General considerations:

- An SR-IOV PF must be available for each cluster. There are two ways to achieve that:

1. Assign just one PF to each worker node of each cluster by using `export PF_COUNT_PER_NODE=1` (this is the default
value).
2. Alternatively, `export PF_BLACKLIST=<PF names>` with the PFs that should not be allocated to the current cluster,
keeping in mind that at least one (or two in case of migration) must stay unlisted so it can be allocated to the
current cluster. Note: another reason to blacklist a PF is when it has a defect or should be kept for other
operations (for example sniffing).

- Clusters should be created one by another and not in parallel (to avoid races over SR-IOV PF's).
- The cluster names must be different. This can be achieved by setting `export CLUSTER_NAME=sriov2` on the 2nd cluster.
The default `CLUSTER_NAME` is `sriov`. The 2nd cluster's registry is exposed at `localhost:5001` automatically once
`CLUSTER_NAME` is set to a non-default value.
- Each cluster should be created in its own git clone folder, e.g.:
`/root/project/kubevirtci1`
`/root/project/kubevirtci2`
To switch between them, change directory to the relevant folder and set the env variables `KUBECONFIG`
and `KUBEVIRT_PROVIDER`.
- In case only one PF exists, for example when running on Prow, which assigns only one PF per job in its own DinD,
Kubevirtci is agnostic and nothing needs to be done, since all the conditions above are met.
- The upper limit on the number of clusters that can run at the same time equals the number of PFs divided by the
number of PFs per cluster; therefore, if there is only one PF, only one cluster can be created. Locally, the actual
limit currently supported is two clusters.
- In order to use `make cluster-down`, please make sure the right `CLUSTER_NAME` is exported.
Note: killing the containers / cluster without first gracefully moving the NICs to the root network namespace
might leave the NICs unreachable for a few minutes.
`find /sys/class/net/*/device/sriov_numvfs` can be used to see when the NICs are reachable again.
20 changes: 9 additions & 11 deletions cluster-up/cluster/k3d-1.25-sriov/TROUBLESHOOTING.md
@@ -1,4 +1,4 @@
# How to troubleshoot a failing kind job
# How to troubleshoot a failing k3d job

If logging and output artifacts are not enough, there is a way to connect to a running CI pod and troubleshoot directly from there.

@@ -16,14 +16,14 @@ Just `go get` it by running `go get k8s.io/test-infra/prow/cmd/mkpj`
Then run the following command from a checkout of the [project-infra repo](https://github.com/kubevirt/project-infra):

```bash
mkpj --pull-number $KUBEVIRTPRNUMBER -job pull-kubevirt-e2e-kind-k8s-sriov-1.17.0 -job-config-path github/ci/prow/files/jobs/kubevirt/kubevirt-presubmits.yaml --config-path github/ci/prow/files/config.yaml > debugkind.yaml
mkpj --pull-number $KUBEVIRT_PR_NUMBER -job pull-kubevirt-e2e-k3d-1.25-sriov -job-config-path github/ci/prow/files/jobs/kubevirt/kubevirt-presubmits.yaml --config-path github/ci/prow/files/config.yaml > debugkind.yaml
```

You will end up having a ProwJob manifest in the `debugkind.yaml` file.

It's strongly recommended to replace the job's name by setting `metadata.name` to something more recognizable, as it makes it easier to find and debug the related pod.
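
For example, a hypothetical one-liner to rename the generated job, assuming `yq` (v4) is available; the name itself is arbitrary:

```bash
# Renames the ProwJob so its pod is easy to spot in the kubevirt-prow-jobs project
yq -i '.metadata.name = "debug-my-sriov-job"' debugkind.yaml
```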

The $KUBEVIRTPRNUMBER can be an actual PR on the [kubevirt repo](https://github.com/kubevirt/kubevirt).
The `$KUBEVIRT_PR_NUMBER` can be an actual PR on the [kubevirt repo](https://github.com/kubevirt/kubevirt).

In case we just want to debug the cluster provided by the CI, it's recommended to override the entry point, either in the test PR we are instrumenting (a good sample can be found [here](https://github.com/kubevirt/kubevirt/pull/3022)), or by overriding the entry point directly in the prow job's manifest.

@@ -32,29 +32,27 @@ Remember that we want the cluster to be long-lived, so a long sleep must be provided
Make sure you switch to the `kubevirt-prow-jobs` project, and apply the manifest:

```bash
kubectl apply -f debugkind.yaml
kubectl apply -f debugkind.yaml
```

You will end up with a ProwJob object, and a pod with the same name you gave to the ProwJob.

Once the pod is up & running, connect to it via bash:

```bash
kubectl exec -it debugprowjobpod bash
kubectl exec -it debugprowjobpod bash
```

### Logistics

Once you are in the pod, you'll be able to troubleshoot what's happening in the environment where CI runs its tests.

Run the follow to bring up a [kind](https://github.com/kubernetes-sigs/kind) cluster with a single node setup and the SR-IOV operator already setup to go (if it wasn't already done by the job itself).
Run the following to bring up a [k3d](https://github.com/k3d-io/k3d) cluster with SR-IOV installed.

```bash
KUBEVIRT_PROVIDER=kind-k8s-sriov-1.17.0 make cluster-up
KUBEVIRT_PROVIDER=k3d-1.25-sriov make cluster-up
```

The kubeconfig file will be available under `/root/.kube/kind-config-sriov`.

Use `k3d kubeconfig print sriov` to extract the kubeconfig file.
The `kubectl` binary is already on board and in `$PATH`.

The container acting as node is the one named `sriov-control-plane`. You can even see what's in there by running `docker exec -it sriov-control-plane bash`.
See `README.md` for more info.
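
For example, a minimal sketch of pointing `kubectl` at the cluster from inside the pod (the target path is arbitrary):

```bash
# Export the cluster's kubeconfig and use it for subsequent kubectl calls
k3d kubeconfig print sriov > /tmp/sriov-kubeconfig
export KUBECONFIG=/tmp/sriov-kubeconfig
kubectl get nodes
```
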
2 changes: 1 addition & 1 deletion cluster-up/cluster/k3d-1.25-sriov/config_sriov_cluster.sh
@@ -6,6 +6,7 @@ set -xe

PF_COUNT_PER_NODE=${PF_COUNT_PER_NODE:-1}
[ $PF_COUNT_PER_NODE -le 0 ] && echo "FATAL: PF_COUNT_PER_NODE must be a positive integer" >&2 && exit 1
[ $PF_COUNT_PER_NODE != 1 ] && echo "FATAL: only 1 PF per node is supported for now" >&2 && exit 1

SCRIPT_PATH=$(dirname "$(realpath "$0")")

@@ -67,7 +68,6 @@ sriov_components::deploy \

# Verify that each sriov capable node has sriov VFs allocatable resource
validate_nodes_sriov_allocatable_resource

sriov_components::wait_pods_ready

_kubectl get nodes
57 changes: 14 additions & 43 deletions cluster-up/cluster/k3d-1.25-sriov/provider.sh
@@ -2,67 +2,38 @@

set -e

DEFAULT_CLUSTER_NAME="sriov"
DEFAULT_HOST_PORT=5000
ALTERNATE_HOST_PORT=5001
export CLUSTER_NAME=${CLUSTER_NAME:-$DEFAULT_CLUSTER_NAME}

if [ $CLUSTER_NAME == $DEFAULT_CLUSTER_NAME ]; then
export HOST_PORT=$DEFAULT_HOST_PORT
else
export HOST_PORT=$ALTERNATE_HOST_PORT
fi
export CLUSTER_NAME="sriov"
export HOST_PORT=5000

function print_sriov_data() {
nodes=$(_kubectl get nodes -o=custom-columns=:.metadata.name --no-headers)
nodes=$(_get_agent_nodes)
echo "STEP: Print SR-IOV data"
for node in $nodes; do
if [[ ! "$node" =~ .*"server".* ]]; then
echo "Node: $node"
echo "VFs:"
${CRI_BIN} exec $node /bin/sh -c "ls -l /sys/class/net/*/device/virtfn*"
echo "PFs PCI Addresses:"
${CRI_BIN} exec $node /bin/sh -c "grep PCI_SLOT_NAME /sys/class/net/*/device/uevent"
fi
echo "Node: $node"
echo "VFs:"
${CRI_BIN} exec $node /bin/sh -c "ls -l /sys/class/net/*/device/virtfn*"
echo "PFs PCI Addresses:"
${CRI_BIN} exec $node /bin/sh -c "grep PCI_SLOT_NAME /sys/class/net/*/device/uevent"
done
}

# ADAPT
function configure_registry_proxy() {
[ "$CI" != "true" ] && return

echo "Configuring cluster nodes to work with CI mirror-proxy..."

local -r ci_proxy_hostname="docker-mirror-proxy.kubevirt-prow.svc"
local -r kind_binary_path="${KUBEVIRTCI_CONFIG_PATH}/$KUBEVIRT_PROVIDER/.kind"
local -r configure_registry_proxy_script="${KUBEVIRTCI_PATH}/cluster/kind/configure-registry-proxy.sh"

KIND_BIN="$kind_binary_path" PROXY_HOSTNAME="$ci_proxy_hostname" $configure_registry_proxy_script
echo
}

function print_sriov_info() {
echo 'STEP: Available NICs'
# print hardware info for easier debugging based on logs
echo 'Available NICs'
${CRI_BIN} run --rm --cap-add=SYS_RAWIO quay.io/phoracek/lspci@sha256:0f3cacf7098202ef284308c64e3fc0ba441871a846022bb87d65ff130c79adb1 sh -c "lspci | egrep -i 'network|ethernet'"
echo ""
echo
}

function up() {
print_sriov_info

k3d_up

# REMOVE
# echo BYE
# exit 0

# TODO add vfio mount
# add machine-id per server
# add workers maybe

${KUBEVIRTCI_PATH}/cluster/$KUBEVIRT_PROVIDER/config_sriov_cluster.sh

print_sriov_data
echo "$KUBEVIRT_PROVIDER cluster '$CLUSTER_NAME' is ready"
version=$(_kubectl get node k3d-sriov-server-0 -o=custom-columns=VERSION:.status.nodeInfo.kubeletVersion --no-headers)
echo "$KUBEVIRT_PROVIDER cluster '$CLUSTER_NAME' is ready ($version)"
}

source ${KUBEVIRTCI_PATH}/cluster/k3d/common.sh
Original file line number Diff line number Diff line change
@@ -18,7 +18,6 @@ PATCH_NODE_SELECTOR_TEMPLATE="${MANIFESTS_DIR}/patch-node-selector.yaml.in"
PATCH_NODE_SELECTOR="${CUSTOM_MANIFESTS}/patch-node-selector.yaml"

KUBECONFIG="${KUBEVIRTCI_CONFIG_PATH}/$KUBEVIRT_PROVIDER/.kubeconfig"
#KUBECTL="${KUBEVIRTCI_CONFIG_PATH}/$KUBEVIRT_PROVIDER/.kubectl --kubeconfig=${KUBECONFIG}"

function _kubectl() {
export KUBECONFIG=${KUBEVIRTCI_CONFIG_PATH}/$KUBEVIRT_PROVIDER/.kubeconfig
38 changes: 14 additions & 24 deletions cluster-up/cluster/k3d-1.25-sriov/sriov-node/configure_vfs.sh
@@ -54,13 +54,6 @@ function validate_run_with_sudo() {
return 0
}

function validate_sysfs_mount_as_rw() {
local sysfs_permissions=$(grep -Po 'sysfs.*\K(ro|rw)' /proc/mounts)
[ "$sysfs_permissions" != rw ] && echo "FATAL: sysfs is read-only, try to remount as RW" >&2 && return 1

return 0
}

function ensure_driver_is_loaded() {
local driver_name=$1
local module_name=$2
@@ -81,23 +74,20 @@ VFS_COUNT=${VFS_COUNT:-6}
[ $((VFS_COUNT)) -lt 1 ] && echo "INFO: VFS_COUNT is lower then 1, nothing to do..." && exit 0

validate_run_with_sudo
#validate_sysfs_mount_as_rw
ensure_driver_is_loaded $DRIVER $DRIVER_KMODULE

# If more than one PF per node is required, change this code to support multi PFs
sriov_pf=$(find /sys/class/net/*/device/sriov_numvfs)
#[ "${#sriov_pfs[@]}" -eq 0 ] && echo "FATAL: Could not find available sriov PFs" >&2 && exit 1

#for pf_name in $sriov_pfs; do
pf_device=$(dirname "$sriov_pf")

echo "Create VF's"
create_vfs "$pf_device" "$VFS_COUNT"

echo "Configuring VF's drivers"
# /sys/class/net/<pf name>/device/virtfn*
vfs_sys_devices=$(readlink -e $pf_device/virtfn*)
for vf in $vfs_sys_devices; do
configure_vf_driver "$vf" $DRIVER
ls -l "$vf/driver"
done
#done
pf_device=$(dirname "$sriov_pf")

echo "Create VF's"
create_vfs "$pf_device" "$VFS_COUNT"

echo "Configuring VF's drivers"
# /sys/class/net/<pf name>/device/virtfn*
vfs_sys_devices=$(readlink -e $pf_device/virtfn*)
for vf in $vfs_sys_devices; do
configure_vf_driver "$vf" $DRIVER
ls -l "$vf/driver"
done

2 changes: 1 addition & 1 deletion cluster-up/cluster/k3d-1.25-sriov/sriov-node/node.sh
@@ -55,8 +55,8 @@ function node::configure_sriov_pfs() {
pfs_array_offset=$((pfs_array_offset + pf_count_per_node))
pfs_in_use+=( $pf_name )

# KIND mounts sysfs as read-only by default, remount as R/W"
node_exec="${CRI_BIN} exec $node"
# /sys is already rw on k3d but nice to have anyhow
$node_exec mount -o remount,rw /sys

ls_node_dev_vfio="${node_exec} ls -la -Z /dev/vfio"