Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'IB VF GUID Configuration' design doc #653

Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
149 changes: 149 additions & 0 deletions doc/design/ib-vf-configuration.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,149 @@
---
title: IB VF GUID Configuration
authors:
- almaslennikov
reviewers:
- ykulazhenkov
creation-date: 11-03-2024
last-updated: 13-03-2024
---

# IB VF GUID Configuration

## Summary
Allow SR-IOV Network Operator to use a static configuration file from the host filesystem to assign GUIDs to IB VFs

## Motivation
We have customers using the SR-IOV operator to create IB VFs, and they need a way to automate GUID assignment,
so that IB VFs are automatically bound to the required PKeys and no additional manual configuration is needed.
We would like SR-IOV Network Operator to configure VFs with the set of assigned guids based on provided configuration.
In this use case, the GUID configuration is static, GUIDs are generated and assigned to pkeys by the user (e.g network administrator).
Now the GUIDs are assigned by the sriov-network-config-daemon randomly.

### Use Cases

### Goals

* SR-IOV Network Operator will configure GUIDs for Infiniband VFs based on a json file on the host.

### Non-Goals

* Dynamic GUID allocation is out of scope of this proposal


### Assumptions

* Per node IB GUID configuration is static and created in advance
* Static config file is available at the same path on every host
* The contents of the config file are expected not to change throughout the host's lifecycle

## Proposal

Mount a configuration file as a part of `/host` hostPath in the config daemon and read it to retrieve the GUID configuration.

### Workflow Description

In the [`pkg/host/internal/sriov/sriov.go:configSriovVFDevices`](https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/82a6d6fdce71bd88a0d9368fb1750488e9a8e4e2/pkg/host/internal/sriov/sriov.go#L458) read from the static IB config file
and assign the GUIDs according to the configuration. Each PF is described by either PCI address
or the PF GUID and has either a list or a range of VF GUIDs.

1. A script creating the config file is deployed to the host and creates a static GUID config file
2. SR-IOV network operator reads the file and assign GUIDs when IB VFs are created
3. Users employ [PKey selector](https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/pull/517) in sriov-network-device-plugin to create PKey-specific resource pools
3.1. Note: When a VF gets assigned with GUID, it communicates with subnet manager and receives the PKey associated with that GUID. This PKey is stored in the sysfs and device-plugin reads it to populate pools.
4. Users create an IB network CR
5. Users create pods referencing the IB network and request resources from the specific pools.

### Implementation Details/Notes/Constraints

There can be fewer VFs created than GUIDs. To persist the dynamic nature of the SR-IOV Network operator,
it’s proposed not to return an error in this case but assign as many GUIDs as possible.
To ensure that nothing breaks when users add/remove VFs, the GUID distribution order should always be the same for each individual host.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how this will work with the parallel nic configuration that was implemented in the sriov operator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that if we synchronize the access to the GUID pool, we shouldn't face any issues

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add this one to the test or validation section so we remember to test this :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we have changed the impl so that the pool object is immutable. No additional synchronization measures are needed. Tested that on a baremetal cluster


Parallel NIC configuration should be supported by making the GUID immutable. VF id should be used to determine the correct GUID for each VF.

If there are fewer GUIDs than VFs, should fail with an error.

If the contents of the file are invalid, should fail with an error.

If config file doesn't exist in the predefined location, proceed as usual and assign random GUIDs to VFs.

### Config file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will be the know location of this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to use /var/opt/infiniband_guids. It should be writable across different cloud platforms. Noted that in the doc

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have folder that is in used by the sriov operator maybe we can use that one and not create another one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's store the file in /etc/sriov-operator/infiniband/guids then


Config file can be stored at `/etc/sriov-operator/infiniband/guids`. This location is writable across most cloud platforms.

VF to GUID assignment, based on this file, is ordered. VF0 takes the GUID, VF1 takes the GUID1 etc.

Example of the config file:

```json
[
{
"pci_address": "<pci_address_1>",
"guids": [
"02:00:00:00:00:00:00:00",
"02:00:00:00:00:00:00:01"
]
},
{
"pf_guid": "<pf_guid_2>",
"guidsRange": {
"start": "02:00:00:00:00:aa:00:02",
"end": "02:00:00:00:00:aa:00:0a"
}
}
]
```

```go
type ibPfGUIDJSONConfig struct {
PciAddress string `json:"pciAddress,omitempty"`
PfGUID string `json:"pfGuid,omitempty"`
GUIDs []string `json:"guids,omitempty"`
GUIDsRange *GUIDRange `json:"guidsRange,omitempty"`
}

type GUIDRange struct {
Start string `json:"start,omitempty"`
End string `json:"end,omitempty"`
}
```

Requirements for the config file:

* `pci_adress` and `pf_guid` cannot be set at the same time for a single device - should return an error
* if the list contains multiple entries for the same device, the first one shall be taken
* `rangeStart` and `rangeEnd` are both included in the range
* `guids` list and range cannot be both set at the same time for a single device - should return an error

### Test Plan

* Unit tests will be implemented for new logic.

## Alternative solution

The alternative solution is also based on the GUID configuration file being deployed on the host.
The difference here is that GUID assignment is done on the cni level when a VF is allocated to a pod.
ib-sriov-cni manages a host-local per-PF pool of allocated/free GUIDs and dynamically allocates the next free GUID to an allocated VF.

### Workflow:

1. A script is deployed to the host and creates a static GUID config file. This step is out of scope of the operator and can also be carried out manually. The script would usually need to be custom and based on the specific GUID provisioning system in place.
2. SR-IOV network operator creates IB VFs with random GUIDs (as done now)
3. ib-sriov-cni is deployed, reads the config file and creates a cached GUID pool
4. Users create a single resource pool / per PF resource pools for IB VFs
5. Users create an IB network CR (and provide a PKey here)
6. Users create a pod requesting an IB VF
7. ib-sriov-cni allocates a next free GUID from the pool depending on the PKey and assigns it to the allocated VF

## Comparison between the two alternatives

The SR-IOV Network Operator approach:
* Easier to implement and less error-prone
* Manages the whole lifecycle of the VF (GUID is assigned at creation and never changes throughout the lifecycle)
* Operator has better visibility into the amount of configured VFs

The IB-SRIOV-CNI approach:
* Offers more flexibility (Only when a VF is requested for an IB network will it be assigned a GUID)
* Easier to maintain complex use cases
* 2 PFs on the node evenly split between 2 PKeys. The CNI approach will require 2 per-PF resource pools and 4 network attachments. The operator approach will require 4 resource pools and 4 network attachments, one for each PKey-PF pair.
Loading