From 81d3b319ced5fdfc83ebd0f98caa130e53c8b51d Mon Sep 17 00:00:00 2001
From: Sebastian Sch
Date: Wed, 12 Jul 2023 18:09:01 +0300
Subject: [PATCH] design doc for the externally-manage-pf support

Signed-off-by: Sebastian Sch
---
 doc/design/externally-manage-pf.md | 174 +++++++++++++++++++++++++++++
 1 file changed, 174 insertions(+)
 create mode 100644 doc/design/externally-manage-pf.md

diff --git a/doc/design/externally-manage-pf.md b/doc/design/externally-manage-pf.md
new file mode 100644
index 0000000000..10a78ea614
--- /dev/null
+++ b/doc/design/externally-manage-pf.md
@@ -0,0 +1,174 @@
+---
+title: Externally Manage PF
+authors:
+  - SchSeba
+reviewers:
+  - zeeke
+  - adrianchiris
+creation-date: 12-07-2023
+last-updated: 12-07-2023
+---
+
+# Externally Manage PF
+
+## Summary
+
+Allow the SR-IOV operator to configure and allocate a subset of virtual functions from a physical function that
+the operator did not configure.
+
+## Motivation
+
+The feature is needed to allow the operator to configure only a subset of virtual functions.
+This allows a third-party component such as nmstate, kubernetes-nmstate, or NetworkManager to handle the creation
+and the usage of the virtual functions on the system.
+
+Before this change, the SR-IOV operator is the only component that may use or configure VFs, which prevents the user
+from using some of the VFs for host networking.
+
+### Use Cases
+
+* As a user I want to use a virtual function for the SDN network
+* As a user I want to create the virtual functions via nmstate
+* As a user I want pods to use virtual functions from a pre-configured PF
+
+### Goals
+
+* Allow the SR-IOV operator to handle the configuration and pod allocation of a subset of virtual functions.
+* Allow the user to allocate the number of virtual functions they want for the system and the subset they want for pods
+* Do not reset the number of VFs for PFs that the operator didn't configure
+
+### Non-Goals
+
+* Supporting switchdev mode (may change in the future if there is a request)
+
+## Proposal
+
+Create a flow in the SR-IOV operator where the user can request a configuration for a subset of virtual functions.
+
+The operator will first validate that the requested PF has the requested number of virtual functions allocated.
+Then the operator will configure the subset of virtual functions with the requested driver and will update the device plugin
+configmap with the expected information to create the relevant pools.
+
+### Workflow Description
+
+The user will allocate the virtual functions on the system with any third-party tool such as nmstate, kubernetes-nmstate,
+systemd scripts, etc.
+
+Then the user will be able to create a policy telling the operator that the PF is externally managed by the user.
+
+Policy Example:
+```yaml
+apiVersion: sriovnetwork.openshift.io/v1
+kind: SriovNetworkNodePolicy
+metadata:
+  name: sriov-nic-1
+  namespace: sriov-network-operator
+spec:
+  deviceType: netdevice
+  nicSelector:
+    pfNames: ["ens3f0#5-9"]
+  nodeSelector:
+    node-role.kubernetes.io/worker: ""
+  numVfs: 10
+  priority: 99
+  resourceName: sriov_nic_1
+  externallyManaged: true
+```
+
+#### Validation
+The SR-IOV operator validation webhook will check that the requested `numVfs` equals the number of VFs the user allocated;
+if not, it will reject the policy creation.
+
+The SR-IOV operator validation webhook will also check that the requested MTU equals the MTU that exists on the PF;
+if not, it will reject the policy creation.
+
+
+*Note:* The same validation will be done in the SR-IOV config-daemon container to cover cases where the user doesn't want to deploy
+the webhook. If the verification fails in the policy apply stage,
+the `sriovNetworkNodeState.status.SyncStatus` field will report a `Failed` status and the error description will
+be exposed in `sriovNetworkNodeState.status.LastSyncError`.
+
+
+#### Configuration
+
+The SR-IOV operator config daemon will reconcile on the SriovNetworkNodeState update and will follow the regular
+flow of virtual functions, *SKIPPING* only the virtual function allocation.
+
+The SR-IOV operator will update the SR-IOV Network Device Plugin with the pool information.
+
+Another change in the operator behavior: when a policy that had `externallyManaged: true` is deleted, the SR-IOV operator
+will *NOT* reset the `numVfs`.
+
+### API Extensions
+
+For SriovNetworkNodePolicy
+
+```golang
+// SriovNetworkNodePolicySpec defines the desired state of SriovNetworkNodePolicy
+type SriovNetworkNodePolicySpec struct {
+    // SRIOV Network device plugin endpoint resource name
+    ResourceName string `json:"resourceName"`
+    // NodeSelector selects the nodes to be configured
+    NodeSelector map[string]string `json:"nodeSelector"`
+    // +kubebuilder:validation:Minimum=0
+    // +kubebuilder:validation:Maximum=99
+    // Priority of the policy, higher priority policies can override lower ones.
+    Priority int `json:"priority,omitempty"`
+    // +kubebuilder:validation:Minimum=1
+    // MTU of VF
+    Mtu int `json:"mtu,omitempty"`
+    // +kubebuilder:validation:Minimum=0
+    // Number of VFs for each PF
+    NumVfs int `json:"numVfs"`
+    // NicSelector selects the NICs to be configured
+    NicSelector SriovNetworkNicSelector `json:"nicSelector"`
+    // +kubebuilder:validation:Enum=netdevice;vfio-pci
+    // The driver type for configured VFs. Allowed values "netdevice", "vfio-pci". Defaults to netdevice.
+    DeviceType string `json:"deviceType,omitempty"`
+    // RDMA mode. Defaults to false.
+    IsRdma bool `json:"isRdma,omitempty"`
+    // Mount the vhost-net device. Defaults to false.
+    NeedVhostNet bool `json:"needVhostNet,omitempty"`
+    // +kubebuilder:validation:Enum=eth;ETH;ib;IB
+    // NIC Link Type. Allowed values "eth", "ETH", "ib", and "IB".
+    LinkType string `json:"linkType,omitempty"`
+    // +kubebuilder:validation:Enum=legacy;switchdev
+    // NIC Device Mode. Allowed values "legacy", "switchdev".
+    EswitchMode string `json:"eSwitchMode,omitempty"`
+    // +kubebuilder:validation:Enum=virtio
+    // VDPA device type. Allowed value "virtio"
+    VdpaType string `json:"vdpaType,omitempty"`
+    // Exclude device's NUMA node when advertising this resource by SRIOV network device plugin. Defaults to false.
+    ExcludeTopology bool `json:"excludeTopology,omitempty"`
++   // Do not create the virtual functions; only bind them to the driver and allocate them to the device plugin. Defaults to false.
++   ExternallyManaged bool `json:"externallyManaged,omitempty"`
+}
+```
+
+For SriovNetworkNodeState
+
+```golang
+type Interface struct {
+    PciAddress  string    `json:"pciAddress"`
+    NumVfs      int       `json:"numVfs,omitempty"`
+    Mtu         int       `json:"mtu,omitempty"`
+    Name        string    `json:"name,omitempty"`
+    LinkType    string    `json:"linkType,omitempty"`
+    EswitchMode string    `json:"eSwitchMode,omitempty"`
+    VfGroups    []VfGroup `json:"vfGroups,omitempty"`
++   ExternallyManaged bool `json:"externallyManaged,omitempty"`
+}
+```
+
+### Implementation Details/Notes/Constraints
+
+### Upgrade & Downgrade considerations
+
+The feature supports both upgrade and downgrade because it only introduces a new optional field in the API.
+
+### Test Plan
+
+* Should not allow creating a policy if there are no VFs configured
+* Should create a policy if the number of requested VFs equals the number already allocated
+* Should create a policy if the number of requested VFs equals the number already allocated, and not delete the VFs when the policy is removed
+* Should reset the virtual functions if `externallyManaged` is false
\ No newline at end of file