Merge pull request #39 from Mmduh-483/switchdev-k8s
Support changing Eswitch mode of SR-IOV NICs for kubernetes deployment
pliurh authored Mar 2, 2021
2 parents 80d51bf + 5067405 commit e213aa2
Showing 21 changed files with 911 additions and 57 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -63,7 +63,7 @@ _build-%:
_plugin-%: vet
@hack/build-plugins.sh $*

-plugins: _plugin-intel _plugin-mellanox _plugin-generic _plugin-virtual _plugin-mco
+plugins: _plugin-intel _plugin-mellanox _plugin-generic _plugin-virtual _plugin-mco _plugin-k8s

clean:
@rm -rf $(TARGET_DIR)
4 changes: 4 additions & 0 deletions bindata/manifests/daemon/daemonset.yaml
@@ -50,6 +50,10 @@ spec:
volumeMounts:
- name: host
mountPath: /host
+lifecycle:
+preStop:
+exec:
+command: ["/bindata/scripts/clean-k8s-services.sh"]
volumes:
- name: host
hostPath:
36 changes: 36 additions & 0 deletions bindata/scripts/clean-k8s-services.sh
@@ -0,0 +1,36 @@
#!/bin/bash

if [ "$CLUSTER_TYPE" == "openshift" ]; then
    echo "openshift cluster"
    exit
fi

chroot_path="/host"

function clean_services() {
    # Remove switchdev service files
    rm -f "$chroot_path/etc/systemd/system/switchdev-configuration.service"
    rm -f "$chroot_path/usr/local/bin/configure-switchdev.sh"
    rm -f "$chroot_path/etc/switchdev.conf"
    rm -f "$chroot_path/etc/udev/switchdev-vf-link-name.sh"

    # Clean NetworkManager and ovs-vswitchd services
    network_manager_service="$chroot_path/usr/lib/systemd/system/NetworkManager.service"
    ovs_service="$chroot_path/usr/lib/systemd/system/ovs-vswitchd.service"

    if [ -f "$network_manager_service" ]; then
        sed -i.bak '/switchdev-configuration.service/d' "$network_manager_service"
    fi

    if [ -f "$ovs_service" ]; then
        sed -i.bak '/hw-offload/d' "$ovs_service"
    fi
}

clean_services
# Reload host services
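# Quote the command: with `bash -c systemctl daemon-reload`, bash would run a bare
# `systemctl` and treat "daemon-reload" as $0, so the reload would never happen.
chroot "$chroot_path" /bin/bash -c "systemctl daemon-reload" >/dev/null 2>&1 || true

# Restart system services
chroot "$chroot_path" /bin/bash -c "systemctl restart NetworkManager.service" >/dev/null 2>&1 || true
chroot "$chroot_path" /bin/bash -c "systemctl restart ovs-vswitchd.service" >/dev/null 2>&1 || true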
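The two `sed -i.bak '/pattern/d'` calls delete every line of a unit file that matches the pattern, keeping the original next to it with a `.bak` suffix. To make that effect easy to check, the Go sketch below reproduces only the line-deletion step on a file's contents; `dropMatchingLines` is a hypothetical helper, not part of the operator, and it assumes Go RE2 syntax behaves like sed's basic regular expressions for the simple patterns used in this script.

```go
package main

import (
	"regexp"
	"strings"
)

// dropMatchingLines mimics `sed '/pattern/d'`: every line that matches
// pattern is removed and the remaining lines are returned unchanged,
// including their trailing newlines.
func dropMatchingLines(content, pattern string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	var kept strings.Builder
	// SplitAfter keeps the "\n" on each line so the output layout is preserved.
	for _, line := range strings.SplitAfter(content, "\n") {
		if re.MatchString(line) {
			continue // same as sed's `d` command for this line
		}
		kept.WriteString(line)
	}
	return kept.String(), nil
}
```

Applying it with the pattern `hw-offload` to an ovs-vswitchd unit would drop whichever line enables offload and leave the rest of the unit intact, which is what the preStop hook relies on when the daemon pod is removed.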
4 changes: 2 additions & 2 deletions controllers/sriovoperatorconfig_controller.go
@@ -417,11 +417,11 @@ func (r *SriovOperatorConfigReconciler) syncOffloadMachineConfig(dc *sriovnetwor
data.Data["HwOffloadNodeLabel"] = HwOffloadNodeLabel
mcName := "00-" + HwOffloadNodeLabel
mcpName := HwOffloadNodeLabel
-mc, err := render.GenerateMachineConfig("bindata/manifests/machine-config", mcName, HwOffloadNodeLabel, dc.Spec.EnableOvsOffload, &data)
+mc, err := render.GenerateMachineConfig("bindata/manifests/switchdev-config", mcName, HwOffloadNodeLabel, dc.Spec.EnableOvsOffload, &data)
if err != nil {
return err
}
-mcpRaw, err := render.RenderTemplate("bindata/manifests/machine-config/machineconfigpool.yaml", &data)
+mcpRaw, err := render.RenderTemplate("bindata/manifests/switchdev-config/machineconfigpool.yaml", &data)
if err != nil {
return err
}
135 changes: 135 additions & 0 deletions doc/ovs-hw-offload.md
@@ -0,0 +1,135 @@
# OVS Hardware Offload

The software-based OVS datapath is CPU intensive, which degrades system performance
and prevents full use of the available bandwidth. OVS 2.8 and later support
a feature called OVS Hardware Offload, which significantly improves performance.
It offloads the OVS data plane to the NIC while leaving the OVS control plane
unmodified. It relies on SR-IOV together with a VF representor net-device on the host.
The VF representor plays the same role as a TAP device in a para-virtual (PV) setup:
a packet sent through the VF representor on the host arrives at the VF, and a packet
sent through the VF is received by its representor.
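Each VF representor is an ordinary net-device on the host, so it can be matched to the VF it stands for. With mlx5 in switchdev mode, the representor's `/sys/class/net/<dev>/phys_port_name` attribute typically reads like `pf0vf3`; older kernels may report a different format. The Go sketch below assumes that `pf<N>vf<M>` naming and maps a representor back to its PF and VF indices; `parseRepPortName` and `repPortName` are hypothetical helpers, not operator APIs.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strconv"
	"strings"
)

// repPortNameRe matches VF representor port names of the assumed form
// "pf<PF index>vf<VF index>", e.g. "pf0vf3".
var repPortNameRe = regexp.MustCompile(`^pf(\d+)vf(\d+)$`)

// parseRepPortName extracts the PF and VF indices from a representor's
// phys_port_name value. Any other format (for example the uplink
// representor, typically "p0") is reported as an error.
func parseRepPortName(name string) (pf, vf int, err error) {
	m := repPortNameRe.FindStringSubmatch(name)
	if m == nil {
		return 0, 0, fmt.Errorf("not a VF representor port name: %q", name)
	}
	// Both groups are digits, so Atoi can only fail on overflow.
	if pf, err = strconv.Atoi(m[1]); err != nil {
		return 0, 0, err
	}
	if vf, err = strconv.Atoi(m[2]); err != nil {
		return 0, 0, err
	}
	return pf, vf, nil
}

// repPortName reads phys_port_name for a host net-device from sysfs.
func repPortName(netdev string) (string, error) {
	b, err := os.ReadFile(filepath.Join("/sys/class/net", netdev, "phys_port_name"))
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}
```

For example, `parseRepPortName("pf0vf3")` yields PF 0 and VF 3, while the uplink's name is rejected, so only VF representors are reported.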

## Supported Ethernet controllers

The following controllers are known to work:

- Mellanox ConnectX-5 and above

## Instructions for Mellanox ConnectX-5

### Prerequisites

- Open vSwitch installed
- NetworkManager installed

### Deploy SriovNetworkNodePolicy

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ovs-hw-offload
  namespace: sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    deviceID: "1017"
    rootDevices:
    - 0000:02:00.0
    - 0000:02:00.1
    vendor: "15b3"
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 8
  priority: 10
  resourceName: cx5_sriov_switchdev
  isRdma: true
  eSwitchMode: switchdev
  linkType: eth
```

### Create a NetworkAttachmentDefinition custom resource with the OVS CNI config

```yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovs-net
  annotations:
    k8s.v1.cni.cncf.io/resourceName: openshift.io/cx5_sriov_switchdev
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "ovs",
      "bridge": "br-sriov0",
      "vlan": 10
    }'
```
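`spec.config` is a JSON document embedded in a YAML string, so a stray comma or quote only surfaces when the CNI plugin parses it at pod creation. One way to avoid hand-editing it is to generate the string. The Go sketch below does this for the fields used above; the `ovsNetConf` type and `renderOvsConfig` helper are hypothetical and cover only these four fields, not the OVS CNI plugin's full configuration schema.

```go
package main

import "encoding/json"

// ovsNetConf mirrors only the OVS CNI fields used in the example above.
// It is an illustrative type, not the plugin's own definition.
type ovsNetConf struct {
	CNIVersion string `json:"cniVersion"`
	Type       string `json:"type"`
	Bridge     string `json:"bridge"`
	// VLAN is omitted when zero so that no VLAN tag is configured.
	VLAN int `json:"vlan,omitempty"`
}

// renderOvsConfig returns a compact JSON string suitable for the
// NetworkAttachmentDefinition's spec.config field.
func renderOvsConfig(bridge string, vlan int) (string, error) {
	b, err := json.Marshal(ovsNetConf{
		CNIVersion: "0.3.1",
		Type:       "ovs",
		Bridge:     bridge,
		VLAN:       vlan,
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

`renderOvsConfig("br-sriov0", 10)` returns `{"cniVersion":"0.3.1","type":"ovs","bridge":"br-sriov0","vlan":10}`, the same configuration as above in compact form.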

### Deploy a pod with OVS hardware offload

Create a pod spec that requests a VF:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ovs-offload-pod1
  annotations:
    k8s.v1.cni.cncf.io/networks: ovs-net
spec:
  containers:
  - name: ovs-offload
    image: networkstatic/iperf3
    resources:
      requests:
        openshift.io/cx5_sriov_switchdev: '1'
      limits:
        openshift.io/cx5_sriov_switchdev: '1'
    command:
    - sh
    - -c
    - |
      ls -l /dev/infiniband /sys/class/net
      sleep 1000000
```

## Verify that hardware offload is working

Run an iperf3 server on pod 1:

```bash
kubectl exec -it ovs-offload-pod1 -- iperf3 -s
```

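Run an iperf3 client on pod 2 (a second pod created from the same spec under the name `ovs-offload-pod2`), pointing it at pod 1's address on `ovs-net` (192.168.1.17 in this example):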

```bash
kubectl exec -it ovs-offload-pod2 -- iperf3 -c 192.168.1.17 -t 100
```

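Capture traffic on the VF representor port on the host and verify that only the TCP connection setup appears. Once the flow is offloaded to the NIC, the bulk iperf3 traffic no longer passes through the representor: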

```text
tcpdump -i enp3s0f0_3 tcp
listening on enp3s0f0_3, link-type EN10MB (Ethernet), capture size 262144 bytes
22:24:44.969516 IP 192.168.1.16.43558 > 192.168.1.17.targus-getdata1: Flags [S], seq 89800743, win 64860, options [mss 1410,sackOK,TS val 491087056 ecr 0,nop,wscale 7], length 0
22:24:44.969773 IP 192.168.1.17.targus-getdata1 > 192.168.1.16.43558: Flags [S.], seq 1312764151, ack 89800744, win 64308, options [mss 1410,sackOK,TS val 4095895608 ecr 491087056,nop,wscale 7], length 0
22:24:45.085558 IP 192.168.1.16.43558 > 192.168.1.17.targus-getdata1: Flags [.], ack 1, win 507, options [nop,nop,TS val 491087222 ecr 4095895608], length 0
22:24:45.085592 IP 192.168.1.16.43558 > 192.168.1.17.targus-getdata1: Flags [P.], seq 1:38, ack 1, win 507, options [nop,nop,TS val 491087222 ecr 4095895608], length 37
22:24:45.086311 IP 192.168.1.16.43560 > 192.168.1.17.targus-getdata1: Flags [S], seq 3802331506, win 64860, options [mss 1410,sackOK,TS val 491087279 ecr 0,nop,wscale 7], length 0
22:24:45.086462 IP 192.168.1.17.targus-getdata1 > 192.168.1.16.43560: Flags [S.], seq 441940709, ack 3802331507, win 64308, options [mss 1410,sackOK,TS val 4095895725 ecr 491087279,nop,wscale 7], length 0
22:24:45.086624 IP 192.168.1.16.43560 > 192.168.1.17.targus-getdata1: Flags [.], ack 1, win 507, options [nop,nop,TS val 491087279 ecr 4095895725], length 0
22:24:45.086654 IP 192.168.1.16.43560 > 192.168.1.17.targus-getdata1: Flags [P.], seq 1:38, ack 1, win 507, options [nop,nop,TS val 491087279 ecr 4095895725], length 37
22:24:45.086715 IP 192.168.1.17.targus-getdata1 > 192.168.1.16.43560: Flags [.], ack 38, win 503, options [nop,nop,TS val 4095895725 ecr 491087279], length 0
```

Check that the datapath rules are offloaded:

```text
ovs-appctl dpctl/dump-flows --names type=offloaded
recirc_id(0),in_port(eth0),eth(src=16:fd:c6:0b:60:52),eth_type(0x0800),ipv4(src=192.168.1.17,frag=no), packets:2235857, bytes:147599302, used:0.550s, actions:ct(zone=65520),recirc(0x18)
ct_state(+est+trk),ct_mark(0),recirc_id(0x18),in_port(eth0),eth(dst=42:66:d7:45:0d:7e),eth_type(0x0800),ipv4(dst=192.168.1.0/255.255.255.0,frag=no), packets:2235857, bytes:147599302, used:0.550s, actions:eth1
recirc_id(0),in_port(eth1),eth(src=42:66:d7:45:0d:7e),eth_type(0x0800),ipv4(src=192.168.1.16,frag=no), packets:133410141, bytes:195255745684, used:0.550s, actions:ct(zone=65520),recirc(0x16)
ct_state(+est+trk),ct_mark(0),recirc_id(0x16),in_port(eth1),eth(dst=16:fd:c6:0b:60:52),eth_type(0x0800),ipv4(dst=192.168.1.0/255.255.255.0,frag=no), packets:133410138, bytes:195255745483, used:0.550s, actions:eth0
```
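The same check can be scripted: if the `packets:` counters of the offloaded flows keep growing while tcpdump on the representor stays quiet, the traffic is being switched in hardware. The Go sketch below extracts the counters and actions from output in the `packets:..., bytes:..., used:..., actions:...` layout shown above; `offloadedFlow` and `parseDumpFlows` are hypothetical names, and the regular expression is an assumption about that layout rather than a documented format.

```go
package main

import (
	"bufio"
	"regexp"
	"strconv"
	"strings"
)

// offloadedFlow holds the fields of one dpctl/dump-flows line that matter
// when checking that traffic is hitting hardware rules.
type offloadedFlow struct {
	Packets uint64
	Bytes   uint64
	Actions string
}

// statsRe captures the packet and byte counters and the trailing action
// list of a dump-flows line.
var statsRe = regexp.MustCompile(`packets:(\d+), bytes:(\d+),.*actions:(\S+)$`)

// parseDumpFlows parses dump-flows output, skipping lines it does not recognize.
func parseDumpFlows(out string) []offloadedFlow {
	var flows []offloadedFlow
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		m := statsRe.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m == nil {
			continue // not a flow line (e.g. a header or an empty line)
		}
		pkts, err1 := strconv.ParseUint(m[1], 10, 64)
		nbytes, err2 := strconv.ParseUint(m[2], 10, 64)
		if err1 != nil || err2 != nil {
			continue
		}
		flows = append(flows, offloadedFlow{Packets: pkts, Bytes: nbytes, Actions: m[3]})
	}
	return flows
}
```

Sampling the output twice a few seconds apart and comparing the `Packets` values per flow is one simple way to confirm the offloaded rules are carrying the iperf3 stream.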
2 changes: 2 additions & 0 deletions go.mod
@@ -8,6 +8,7 @@ require (
github.com/Masterminds/sprig v2.22.0+incompatible
github.com/blang/semver v3.5.0+incompatible
github.com/cenkalti/backoff v2.2.1+incompatible
+github.com/coreos/go-systemd/v22 v22.0.0
github.com/fsnotify/fsnotify v1.4.9
github.com/go-logr/logr v0.2.1
github.com/go-logr/zapr v0.2.0 // indirect
@@ -29,6 +30,7 @@ require (
golang.org/x/time v0.0.0-20191024005414-555d28b269f0
google.golang.org/genproto v0.0.0-20200610104632-a5b850bcf112 // indirect
google.golang.org/protobuf v1.25.0 // indirect
+gopkg.in/yaml.v2 v2.3.0
k8s.io/api v0.19.0
k8s.io/apimachinery v0.19.0
k8s.io/client-go v0.19.0
2 changes: 2 additions & 0 deletions pkg/daemon/daemon.go
@@ -653,6 +653,8 @@ func (dn *Daemon) loadVendorPlugins(ns *sriovnetworkv1.SriovNetworkNodeState) er
pl = registerPlugins(ns)
if utils.ClusterType == utils.ClusterTypeOpenshift {
pl = append(pl, McoPlugin)
+} else {
+pl = append(pl, K8sPlugin)
}
pl = append(pl, GenericPlugin)
}
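With this branch, a node always gets exactly one cluster-specific plugin, `mco_plugin` on OpenShift and the new `k8s_plugin` everywhere else, followed by `generic_plugin`. A simplified, self-contained restatement of that ordering follows. `selectPlugins` is hypothetical, and the literal `"openshift"` is an assumed stand-in for `utils.ClusterTypeOpenshift` (it matches the `CLUSTER_TYPE` check in `clean-k8s-services.sh`).

```go
package main

// Plugin names as declared in pkg/daemon/plugin.go.
const (
	GenericPlugin = "generic_plugin"
	McoPlugin     = "mco_plugin"
	K8sPlugin     = "k8s_plugin"
)

// clusterTypeOpenshift is an assumed stand-in for utils.ClusterTypeOpenshift.
const clusterTypeOpenshift = "openshift"

// selectPlugins restates the branch shown above: the vendor plugins come
// first, then exactly one of the MCO or k8s plugins depending on the
// cluster type, and the generic plugin last.
func selectPlugins(vendor []string, clusterType string) []string {
	// Copy so that appending never modifies the caller's slice.
	pl := append([]string(nil), vendor...)
	if clusterType == clusterTypeOpenshift {
		pl = append(pl, McoPlugin)
	} else {
		pl = append(pl, K8sPlugin)
	}
	return append(pl, GenericPlugin)
}
```

Keeping `generic_plugin` last mirrors the original code, where it is appended after the cluster-specific plugin in both branches.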
1 change: 1 addition & 0 deletions pkg/daemon/plugin.go
@@ -31,6 +31,7 @@ const (
GenericPlugin = "generic_plugin"
VirtualPlugin = "virtual_plugin"
McoPlugin = "mco_plugin"
+K8sPlugin = "k8s_plugin"
)

// loadPlugin loads a single plugin from a file path