feat: Add NVIDIA Spectrum-X Operator deployment #1364

Draft: wants to merge 1 commit into master
9 changes: 9 additions & 0 deletions api/v1alpha1/nicclusterpolicy_types.go
@@ -288,6 +288,12 @@ type DOCATelemetryServiceSpec struct {
	Config *DOCATelemetryServiceConfig `json:"config"`
}

// SpectrumXOperatorSpec describes configuration options for the NVIDIA Spectrum-X Operator
type SpectrumXOperatorSpec struct {
	// Image information for the NVIDIA Spectrum-X Operator
	ImageSpec `json:""`
}

// NicClusterPolicySpec defines the desired state of NicClusterPolicy
type NicClusterPolicySpec struct {
	// OFEDDriver is a specialized driver for NVIDIA NICs which can replace the inbox driver that comes with an OS.
@@ -325,6 +331,9 @@ type NicClusterPolicySpec struct {
	// DOCATelemetryService exposes telemetry from NVIDIA networking components to prometheus.
	// See: https://docs.nvidia.com/doca/sdk/nvidia+doca+telemetry+service+guide/index.html
	DOCATelemetryService *DOCATelemetryServiceSpec `json:"docaTelemetryService,omitempty"`
	// SpectrumXOperator configures the deployment of the NVIDIA Spectrum-X Operator.
	// See: https://github.com/Mellanox/spectrum-x-operator/
	SpectrumXOperator *SpectrumXOperatorSpec `json:"spectrumXOperator,omitempty"`
	// NodeAffinity rules to inject to the DaemonSets objects that are managed by the operator
	NodeAffinity *v1.NodeAffinity `json:"nodeAffinity,omitempty"`
	// Tolerations to inject to the DaemonSets objects that are managed by the operator
21 changes: 21 additions & 0 deletions api/v1alpha1/zz_generated.deepcopy.go

Generated file; diff not rendered by default.
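The generated diff is not shown, so the following is only a sketch of the pattern controller-gen typically emits for a struct that embeds another struct and is referenced through an optional pointer. The types are simplified, hypothetical mirrors; the point is that the copy gets its own SpectrumXOperatorSpec allocation and its own ImagePullSecrets slice.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the API types.
type ImageSpec struct {
	Image            string
	Repository       string
	Version          string
	ImagePullSecrets []string
}

type SpectrumXOperatorSpec struct {
	ImageSpec
}

type NicClusterPolicySpec struct {
	SpectrumXOperator *SpectrumXOperatorSpec
}

// DeepCopyInto copies in into out, allocating a fresh slice for ImagePullSecrets.
func (in *ImageSpec) DeepCopyInto(out *ImageSpec) {
	*out = *in
	if in.ImagePullSecrets != nil {
		in, out := &in.ImagePullSecrets, &out.ImagePullSecrets
		*out = make([]string, len(*in))
		copy(*out, *in)
	}
}

// DeepCopyInto for the new spec delegates to the embedded ImageSpec.
func (in *SpectrumXOperatorSpec) DeepCopyInto(out *SpectrumXOperatorSpec) {
	*out = *in
	in.ImageSpec.DeepCopyInto(&out.ImageSpec)
}

// DeepCopy returns a newly allocated copy, or nil for a nil receiver.
func (in *SpectrumXOperatorSpec) DeepCopy() *SpectrumXOperatorSpec {
	if in == nil {
		return nil
	}
	out := new(SpectrumXOperatorSpec)
	in.DeepCopyInto(out)
	return out
}

// DeepCopyInto shows only the branch for the new optional pointer field.
func (in *NicClusterPolicySpec) DeepCopyInto(out *NicClusterPolicySpec) {
	*out = *in
	if in.SpectrumXOperator != nil {
		in, out := &in.SpectrumXOperator, &out.SpectrumXOperator
		*out = new(SpectrumXOperatorSpec)
		(*in).DeepCopyInto(*out)
	}
}

func main() {
	orig := NicClusterPolicySpec{SpectrumXOperator: &SpectrumXOperatorSpec{
		ImageSpec: ImageSpec{Image: "spectrum-x-operator", ImagePullSecrets: []string{"regcred"}},
	}}
	var cp NicClusterPolicySpec
	orig.DeepCopyInto(&cp)
	// Mutating the copy must not leak back into the original.
	cp.SpectrumXOperator.ImagePullSecrets[0] = "other"
	fmt.Println(orig.SpectrumXOperator.ImagePullSecrets[0]) // regcred
}
```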

68 changes: 68 additions & 0 deletions config/crd/bases/mellanox.com_nicclusterpolicies.yaml
@@ -1243,6 +1243,74 @@ spec:
- version
type: object
type: object
spectrumXOperator:
  description: |-
    SpectrumXOperator configures the deployment of the NVIDIA Spectrum-X Operator.
    See: https://github.com/Mellanox/spectrum-x-operator/
  properties:
    containerResources:
      description: ResourceRequirements describes the compute resource
        requirements
      items:
        description: ResourceRequirements describes the compute resource
          requirements.
        properties:
          limits:
            additionalProperties:
              anyOf:
              - type: integer
              - type: string
              pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
              x-kubernetes-int-or-string: true
            description: |-
              Limits describes the maximum amount of compute resources allowed.
              More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
            type: object
          name:
            description: Name of the container the requirements are
              set for
            type: string
          requests:
            additionalProperties:
              anyOf:
              - type: integer
              - type: string
              pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
              x-kubernetes-int-or-string: true
            description: |-
              Requests describes the minimum amount of compute resources required.
              If Requests is omitted for a container, it defaults to Limits if that is explicitly specified,
              otherwise to an implementation-defined value. Requests cannot exceed Limits.
              More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
            type: object
        required:
        - name
        type: object
      type: array
    image:
      description: Name of the image
      pattern: '[a-zA-Z0-9\-]+'
      type: string
    imagePullSecrets:
      default: []
      description: |-
        ImagePullSecrets is an optional list of references to secrets in the same
        namespace to use for pulling the image
      items:
        type: string
      type: array
    repository:
      description: Address of the registry that stores the image
      pattern: '[a-zA-Z0-9\.\-\/]+'
      type: string
    version:
      description: Version of the image to use
      type: string
  required:
  - image
  - repository
  - version
  type: object
sriovDevicePlugin:
  description: |-
    SriovDevicePlugin manages SRIOV through the Kubernetes device plugin framework.
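The image and repository patterns above carry no ^ or $ anchors. JSON Schema does not implicitly anchor pattern, so as far as schema semantics go, a value passes as long as some substring matches. The sketch below checks the two patterns, copied verbatim from the CRD, with Go's regexp package; the anchored variant is a hypothetical alternative for comparison, not something this PR defines.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the image and repository properties in the CRD above.
var (
	imagePattern      = regexp.MustCompile(`[a-zA-Z0-9\-]+`)
	repositoryPattern = regexp.MustCompile(`[a-zA-Z0-9\.\-\/]+`)
	// Hypothetical anchored variant: the whole value must consist of allowed characters.
	anchoredImagePattern = regexp.MustCompile(`^[a-zA-Z0-9\-]+$`)
)

func main() {
	for _, v := range []string{"spectrum-x-operator", "spectrum x operator!", "???"} {
		fmt.Printf("%q unanchored=%t anchored=%t\n", v, imagePattern.MatchString(v), anchoredImagePattern.MatchString(v))
	}
	fmt.Println(repositoryPattern.MatchString("ghcr.io/mellanox"))
}
```

A value such as "spectrum x operator!" is accepted by the unanchored pattern because "spectrum" alone matches, while the anchored form rejects it.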
2 changes: 2 additions & 0 deletions hack/release.go
@@ -82,6 +82,7 @@ type Release struct {
	NicConfigurationOperator     *ReleaseImageSpec
	NicConfigurationConfigDaemon *ReleaseImageSpec
	MaintenanceOperator          *ReleaseImageSpec
	SpectrumXOperator            *ReleaseImageSpec
}

// DocaDriverMatrix represent the expected DOCA-Driver OS/arch combinations
@@ -146,6 +147,7 @@ func readEnvironmentVariables(release *Release) {
	initWithEnvVariale("NIC_CONFIGURATION_OPERATOR", release.NicConfigurationOperator)
	initWithEnvVariale("NIC_CONFIGURATION_CONFIG_DAEMON", release.NicConfigurationConfigDaemon)
	initWithEnvVariale("MAINTENANCE_OPERATOR", release.MaintenanceOperator)
	initWithEnvVariale("SPECTRUM_X_OPERATOR", release.SpectrumXOperator)
}

func main() {
4 changes: 4 additions & 0 deletions hack/release.yaml
@@ -90,3 +90,7 @@ maintenanceOperator:
  image: maintenance-operator
  repository: ghcr.io/mellanox
  version: v0.2.0
spectrumXOperator:
  image: spectrum-x-operator
  repository: ghcr.io/mellanox
  version: latest
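The manifest below builds the container image with imagePath .CrSpec.Repository .CrSpec.Image .CrSpec.Version. The helper's implementation is not in this PR, so this is a hypothetical re-implementation, assuming the usual repository/image:tag form plus an assumed digest form when the version starts with sha256:. Fed the release.yaml values, it yields the reference the Deployment would pull.

```go
package main

import (
	"fmt"
	"strings"
)

// imagePath is a hypothetical re-implementation of the manifest template helper.
// Assumption: a version starting with "sha256:" is treated as a digest.
func imagePath(repository, image, version string) string {
	if strings.HasPrefix(version, "sha256:") {
		return repository + "/" + image + "@" + version
	}
	return repository + "/" + image + ":" + version
}

func main() {
	// Values from the spectrumXOperator entry in hack/release.yaml.
	fmt.Println(imagePath("ghcr.io/mellanox", "spectrum-x-operator", "latest"))
}
```

This prints ghcr.io/mellanox/spectrum-x-operator:latest. Since latest is a mutable tag, the same reference can resolve to different images over time.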
79 changes: 79 additions & 0 deletions manifests/state-spectrum-x-operator/0010-spectrumx-operator.yaml
@@ -0,0 +1,79 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spectrum-x-operator
  namespace: {{ .RuntimeSpec.Namespace }}
  labels:
    control-plane: spectrum-x-operator
    app.kubernetes.io/name: spectrum-x-operator
spec:
  selector:
    matchLabels:
      control-plane: spectrum-x-operator
  replicas: 1
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/default-container: manager
      labels:
        control-plane: spectrum-x-operator
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      securityContext:
        runAsNonRoot: true
        # TODO(user): For common cases that do not require escalating privileges
        # it is recommended to ensure that all your Pods/Containers are restrictive.
        # More info: https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted
        # Please uncomment the following code if your project does NOT have to work on old Kubernetes
        # versions < 1.19 or on vendor versions which do NOT support this field by default (e.g. OpenShift < 4.11).
        # seccompProfile:
        #   type: RuntimeDefault
      containers:
      - command:
        - /manager
        args:
        - --leader-elect
        - --health-probe-bind-address=:8081
        - --cm-namespace={{ .RuntimeSpec.Namespace }}
        - --cidrpools-namespace={{ .RuntimeSpec.Namespace }}
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: {{ imagePath .CrSpec.Repository .CrSpec.Image .CrSpec.Version }}
        name: manager
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - "ALL"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8081
          initialDelaySeconds: 15
          periodSeconds: 20
        readinessProbe:
          httpGet:
            path: /readyz
            port: 8081
          initialDelaySeconds: 5
          periodSeconds: 10
        # TODO(user): Configure the resources accordingly based on the project requirements.
        # More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
        resources:
          limits:
            cpu: 500m
            memory: 128Mi
          requests:
            cpu: 10m
            memory: 64Mi
      serviceAccountName: spectrum-x-operator
      terminationGracePeriodSeconds: 10
@@ -0,0 +1,5 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spectrum-x-operator
  namespace: {{ .RuntimeSpec.Namespace }}