Commit 81d3b31 (parent 37ddcd4): design doc for the externally-manage-pf support

Signed-off-by: Sebastian Sch <[email protected]>
SchSeba committed Jul 18, 2023

Adds `doc/design/externally-manage-pf.md` (174 additions, 0 deletions).

---
title: Externally Manage PF
authors:
- SchSeba
reviewers:
- zeeke
- adrianchiris
creation-date: 12-07-2023
last-updated: 12-07-2023
---

# Externally Manage PF

## Summary

Allow the SR-IOV operator to configure and allocate a subset of the virtual functions (VFs) of a physical function (PF)
that the operator did not configure itself.

## Motivation

This feature lets the operator configure only a subset of the virtual functions of a PF, so that a third-party
component such as nmstate, kubernetes-nmstate, or NetworkManager can handle the creation and usage of the virtual
functions on the system.

Before this change, the SR-IOV operator is the only component that may create and configure VFs, which prevents the
user from using some of the VFs for host networking.

### Use Cases

* As a user, I want to use a virtual function for the SDN network
* As a user, I want to create the virtual functions via nmstate
* As a user, I want pods to use virtual functions from a pre-configured PF

### Goals

* Allow the SR-IOV operator to handle the configuration and pod allocation of a subset of virtual functions
* Allow the user to allocate the number of virtual functions they want for the system, and the subset they want for pods
* Do not reset `numVfs` for PFs that the operator didn't configure

### Non-Goals

* Supporting switchdev mode (may change in the future if there is a request)

## Proposal

Create a flow in the SR-IOV operator where the user can request a configuration for a subset of virtual functions.

The operator will first validate that the requested PF already has the requested number of virtual functions allocated.
Then the operator will configure the subset of virtual functions with the requested driver and update the device plugin
ConfigMap with the information needed to create the relevant pools.

### Workflow Description

The user will allocate the virtual functions on the system with any third-party tool, such as nmstate, kubernetes-nmstate,
systemd scripts, etc.

Then the user will be able to create a policy telling the operator that the PF is externally managed by the user.

Policy Example:
```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-nic-1
  namespace: sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames: ["ens3f0#5-9"]
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numVfs: 10
  priority: 99
  resourceName: sriov_nic_1
  externallyManaged: true
```
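The `#5-9` suffix in `pfNames` is what narrows the policy to a subset of the PF's VFs. The Go sketch below shows one way such an entry could be parsed and checked against `numVfs`. It assumes the range is inclusive (VFs 5 through 9 of the 10 allocated on `ens3f0`) and that the last index must be below `numVfs`; `parsePfName` and `vfSelector` are hypothetical names, not the operator's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// vfSelector is a hypothetical representation of a pfNames entry such as "ens3f0#5-9".
type vfSelector struct {
	PfName string
	Start  int // first VF index, inclusive
	End    int // last VF index, inclusive
}

// parsePfName splits "pf#start-end" into its parts and checks that the range
// fits inside the numVfs allocated on the PF. An entry without '#' is assumed
// to select every VF (0 to numVfs-1).
func parsePfName(entry string, numVfs int) (vfSelector, error) {
	name, rng, hasRange := strings.Cut(entry, "#")
	if !hasRange {
		return vfSelector{PfName: name, Start: 0, End: numVfs - 1}, nil
	}
	startStr, endStr, ok := strings.Cut(rng, "-")
	if !ok {
		return vfSelector{}, fmt.Errorf("invalid VF range %q", rng)
	}
	start, err := strconv.Atoi(startStr)
	if err != nil {
		return vfSelector{}, fmt.Errorf("invalid range start %q: %w", startStr, err)
	}
	end, err := strconv.Atoi(endStr)
	if err != nil {
		return vfSelector{}, fmt.Errorf("invalid range end %q: %w", endStr, err)
	}
	if start < 0 || start > end || end >= numVfs {
		return vfSelector{}, fmt.Errorf("VF range %d-%d does not fit in numVfs=%d", start, end, numVfs)
	}
	return vfSelector{PfName: name, Start: start, End: end}, nil
}

func main() {
	sel, err := parsePfName("ens3f0#5-9", 10)
	fmt.Println(sel, err) // {ens3f0 5 9} <nil>
}
```

With this reading, VFs 0 through 4 of `ens3f0` stay with the external tool, and only VFs 5 through 9 are handed to the operator.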
#### Validation
The SR-IOV operator's validating webhook will check that the requested `numVfs` equals the number of VFs the user
allocated on the PF; if not, it will reject the policy creation.

The SR-IOV operator's validating webhook will also check that the requested MTU equals the MTU currently set on the PF;
if not, it will reject the policy creation.
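A minimal sketch of the two webhook checks, assuming the webhook can see what is currently configured on the PF (modeled here by a hypothetical `pfStatus` type) and that an unset MTU (`0`) is not compared. It also rejects a PF with no VFs allocated, matching the first case in the test plan. Type and function names are illustrative.

```go
package main

import "fmt"

// policySpec holds the subset of SriovNetworkNodePolicySpec fields this check needs.
type policySpec struct {
	NumVfs            int
	Mtu               int
	ExternallyManaged bool
}

// pfStatus is a hypothetical view of what the external tool configured on the PF.
type pfStatus struct {
	Name   string
	NumVfs int // VFs already allocated by the external tool
	Mtu    int
}

// validateExternallyManaged rejects a policy whose numVfs or MTU does not match
// the PF as configured by the external tool. An unset MTU (0) is assumed to
// mean "don't check".
func validateExternallyManaged(p policySpec, pf pfStatus) error {
	if !p.ExternallyManaged {
		return nil // the regular flow applies; nothing to compare
	}
	if pf.NumVfs == 0 {
		return fmt.Errorf("PF %s has no virtual functions allocated", pf.Name)
	}
	if p.NumVfs != pf.NumVfs {
		return fmt.Errorf("PF %s: requested numVfs %d, but %d are allocated", pf.Name, p.NumVfs, pf.NumVfs)
	}
	if p.Mtu != 0 && p.Mtu != pf.Mtu {
		return fmt.Errorf("PF %s: requested MTU %d, but the PF MTU is %d", pf.Name, p.Mtu, pf.Mtu)
	}
	return nil
}

func main() {
	pf := pfStatus{Name: "ens3f0", NumVfs: 10, Mtu: 1500}
	ok := policySpec{NumVfs: 10, ExternallyManaged: true}
	bad := policySpec{NumVfs: 8, ExternallyManaged: true}
	fmt.Println(validateExternallyManaged(ok, pf))  // <nil>
	fmt.Println(validateExternallyManaged(bad, pf)) // PF ens3f0: requested numVfs 8, but 10 are allocated
}
```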


*Note:* The same validation will be done in the SR-IOV config-daemon container, to cover cases where the user doesn't
want to deploy the webhook. If the verification fails in the policy apply stage, the
`sriovNetworkNodeState.status.SyncStatus` field will report a `Failed` status and the error description will be
exposed in `sriovNetworkNodeState.status.LastSyncError`.
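The daemon-side fallback can be sketched as follows: the validation error, if any, is copied into the two status fields named above. `nodeStateStatus`, `checkAllocated`, and `reportSync` are hypothetical, and the `Succeeded` value on the success path is an assumption; the design only specifies `Failed`.

```go
package main

import "fmt"

// nodeStateStatus mirrors only the two status fields named in the design.
type nodeStateStatus struct {
	SyncStatus    string
	LastSyncError string
}

// checkAllocated is a hypothetical daemon-side repeat of the webhook's numVfs check.
func checkAllocated(requested, allocated int) error {
	if requested != allocated {
		return fmt.Errorf("requested numVfs %d, but %d are allocated on the PF", requested, allocated)
	}
	return nil
}

// reportSync records the outcome of applying a policy on the node state status.
func reportSync(st *nodeStateStatus, err error) {
	if err != nil {
		st.SyncStatus = "Failed"
		st.LastSyncError = err.Error()
		return
	}
	st.SyncStatus = "Succeeded" // assumed success value
	st.LastSyncError = ""
}

func main() {
	var st nodeStateStatus
	reportSync(&st, checkAllocated(10, 4))
	fmt.Printf("%s: %s\n", st.SyncStatus, st.LastSyncError)
	// Failed: requested numVfs 10, but 4 are allocated on the PF
}
```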


#### Configuration

The SR-IOV config daemon will reconcile on the SriovNetworkNodeState update and follow the regular virtual function
configuration flow, *SKIPPING* only the virtual function allocation (the creation of the VFs on the PF).
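A sketch of the apply flow with the allocation step made conditional. The host operations are hidden behind a hypothetical `hostOps` interface so the logic can run against a fake `recorder`; the daemon's real per-VF steps (driver binding, MTU, and so on) are reduced here to a single `ConfigureVf` call.

```go
package main

import "fmt"

// hostOps abstracts the host-side steps; the method names are hypothetical.
type hostOps interface {
	SetNumVfs(pf string, n int) error
	ConfigureVf(pf string, vf int, deviceType string) error
}

// iface is a trimmed-down stand-in for the node state's Interface entry.
type iface struct {
	Name              string
	NumVfs            int
	ExternallyManaged bool
}

// configurePF follows the regular flow but skips VF allocation (SetNumVfs)
// when the PF is externally managed; VFs first..last are then configured.
func configurePF(ops hostOps, in iface, deviceType string, first, last int) error {
	if !in.ExternallyManaged {
		if err := ops.SetNumVfs(in.Name, in.NumVfs); err != nil {
			return err
		}
	}
	for vf := first; vf <= last; vf++ {
		if err := ops.ConfigureVf(in.Name, vf, deviceType); err != nil {
			return err
		}
	}
	return nil
}

// recorder is a fake hostOps that only records the calls it receives.
type recorder struct{ calls []string }

func (r *recorder) SetNumVfs(pf string, n int) error {
	r.calls = append(r.calls, fmt.Sprintf("SetNumVfs %s %d", pf, n))
	return nil
}

func (r *recorder) ConfigureVf(pf string, vf int, deviceType string) error {
	r.calls = append(r.calls, fmt.Sprintf("ConfigureVf %s %d %s", pf, vf, deviceType))
	return nil
}

func main() {
	r := &recorder{}
	in := iface{Name: "ens3f0", NumVfs: 10, ExternallyManaged: true}
	if err := configurePF(r, in, "netdevice", 5, 9); err != nil {
		panic(err)
	}
	for _, c := range r.calls {
		fmt.Println(c) // ConfigureVf lines only; no SetNumVfs call
	}
}
```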

The SR-IOV operator will then update the SR-IOV Network Device Plugin configuration with the pool information, so that
only the selected subset of VFs is advertised to pods.
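For illustration, the pool for the example policy could be rendered as JSON like this. The shape is a simplified, assumed subset modeled on the device plugin's `resourceList` configuration (real entries carry more selector fields); the plugin's own documentation defines the exact schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// selectors, resource and resourceList are a simplified, hypothetical subset
// of the device plugin's resource configuration.
type selectors struct {
	PfNames []string `json:"pfNames"`
}

type resource struct {
	ResourceName string    `json:"resourceName"`
	Selectors    selectors `json:"selectors"`
}

type resourceList struct {
	ResourceList []resource `json:"resourceList"`
}

// buildConfig renders one pool whose VFs are limited by the pfNames selector.
func buildConfig(resourceName string, pfNames []string) ([]byte, error) {
	cfg := resourceList{ResourceList: []resource{{
		ResourceName: resourceName,
		Selectors:    selectors{PfNames: pfNames},
	}}}
	return json.Marshal(cfg)
}

func main() {
	out, err := buildConfig("sriov_nic_1", []string{"ens3f0#5-9"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// {"resourceList":[{"resourceName":"sriov_nic_1","selectors":{"pfNames":["ens3f0#5-9"]}}]}
}
```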

Another change in the operator's behavior: when a policy with `externallyManaged: true` is deleted, the SR-IOV operator
will *NOT* reset the `numVfs` of the PF.
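The deletion rule can be sketched as a filter over the interfaces whose policies were removed: externally managed PFs are skipped, all others are reset. `iface` and `pfsToReset` are hypothetical names.

```go
package main

import "fmt"

// iface is a trimmed-down stand-in for the node state's Interface entry.
type iface struct {
	Name              string
	NumVfs            int
	ExternallyManaged bool
}

// pfsToReset returns the PFs whose numVfs should be reset after the policies
// that configured them are removed; externally managed PFs are skipped.
func pfsToReset(removed []iface) []string {
	var names []string
	for _, in := range removed {
		if in.ExternallyManaged {
			continue // the VFs belong to the external tool; leave numVfs untouched
		}
		names = append(names, in.Name)
	}
	return names
}

func main() {
	removed := []iface{
		{Name: "ens3f0", NumVfs: 10, ExternallyManaged: true},
		{Name: "ens3f1", NumVfs: 4},
	}
	fmt.Println(pfsToReset(removed)) // [ens3f1]
}
```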

### API Extensions

For SriovNetworkNodePolicy

```golang
// SriovNetworkNodePolicySpec defines the desired state of SriovNetworkNodePolicy
type SriovNetworkNodePolicySpec struct {
	// SRIOV Network device plugin endpoint resource name
	ResourceName string `json:"resourceName"`
	// NodeSelector selects the nodes to be configured
	NodeSelector map[string]string `json:"nodeSelector"`
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:validation:Maximum=99
	// Priority of the policy, higher priority policies can override lower ones.
	Priority int `json:"priority,omitempty"`
	// +kubebuilder:validation:Minimum=1
	// MTU of VF
	Mtu int `json:"mtu,omitempty"`
	// +kubebuilder:validation:Minimum=0
	// Number of VFs for each PF
	NumVfs int `json:"numVfs"`
	// NicSelector selects the NICs to be configured
	NicSelector SriovNetworkNicSelector `json:"nicSelector"`
	// +kubebuilder:validation:Enum=netdevice;vfio-pci
	// The driver type for configured VFs. Allowed value "netdevice", "vfio-pci". Defaults to netdevice.
	DeviceType string `json:"deviceType,omitempty"`
	// RDMA mode. Defaults to false.
	IsRdma bool `json:"isRdma,omitempty"`
	// mount vhost-net device. Defaults to false.
	NeedVhostNet bool `json:"needVhostNet,omitempty"`
	// +kubebuilder:validation:Enum=eth;ETH;ib;IB
	// NIC Link Type. Allowed value "eth", "ETH", "ib", and "IB".
	LinkType string `json:"linkType,omitempty"`
	// +kubebuilder:validation:Enum=legacy;switchdev
	// NIC Device Mode. Allowed value "legacy","switchdev".
	EswitchMode string `json:"eSwitchMode,omitempty"`
	// +kubebuilder:validation:Enum=virtio
	// VDPA device type. Allowed value "virtio"
	VdpaType string `json:"vdpaType,omitempty"`
	// Exclude device's NUMA node when advertising this resource by SRIOV network device plugin. Default to false.
	ExcludeTopology bool `json:"excludeTopology,omitempty"`
	// New field added by this proposal:
	// Don't create the virtual functions; only bind them to the requested driver
	// and allocate them to the device plugin. Defaults to false.
	ExternallyManaged bool `json:"externallyManaged,omitempty"`
}
```

For SriovNetworkNodeState

```golang
type Interface struct {
	PciAddress        string    `json:"pciAddress"`
	NumVfs            int       `json:"numVfs,omitempty"`
	Mtu               int       `json:"mtu,omitempty"`
	Name              string    `json:"name,omitempty"`
	LinkType          string    `json:"linkType,omitempty"`
	EswitchMode       string    `json:"eSwitchMode,omitempty"`
	VfGroups          []VfGroup `json:"vfGroups,omitempty"`
	ExternallyManaged bool      `json:"externallyManaged,omitempty"` // new field added by this proposal
}
```

### Implementation Details/Notes/Constraints

### Upgrade & Downgrade considerations

The feature supports both upgrade and downgrade, as it only introduces a new optional API field (`externallyManaged`)
that defaults to `false`.
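Why an optional field keeps upgrades and downgrades safe can be checked at the serialization level with Go's `encoding/json`: an object stored before the upgrade decodes with `ExternallyManaged` set to `false`, and, because of `omitempty`, an object with the default value encodes exactly as it did before. The trimmed `Interface` type and the PCI address are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Interface keeps only a few node state fields plus the new one.
type Interface struct {
	PciAddress        string `json:"pciAddress"`
	NumVfs            int    `json:"numVfs,omitempty"`
	ExternallyManaged bool   `json:"externallyManaged,omitempty"`
}

// roundTrip decodes a stored object and re-encodes it.
func roundTrip(data []byte) (Interface, []byte, error) {
	var in Interface
	if err := json.Unmarshal(data, &in); err != nil {
		return Interface{}, nil, err
	}
	out, err := json.Marshal(in)
	return in, out, err
}

func main() {
	// An object written before the field existed has no externallyManaged key.
	in, out, err := roundTrip([]byte(`{"pciAddress":"0000:3b:00.0","numVfs":4}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(in.ExternallyManaged) // false
	fmt.Println(string(out))          // {"pciAddress":"0000:3b:00.0","numVfs":4}
}
```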

### Test Plan

* Should not allow creating a policy if no VFs are allocated on the PF
* Should create a policy if the number of requested VFs equals the number allocated on the PF
* Should create a policy if the number of requested VFs equals the number allocated, and not delete the VFs when the
  policy is removed
* Should reset the virtual functions when the policy is removed if `externallyManaged` is false
