[PERFSCALE-3052] Add new core RDS workload (#125)
* Add new core RDS workload

Signed-off-by: Jose Castillo Lema <[email protected]>

* Move annotations from deploy to pod

Signed-off-by: Jose Castillo Lema <[email protected]>

* Move annotations from deploy to pod (2)

Signed-off-by: Jose Castillo Lema <[email protected]>

---------

Signed-off-by: Jose Castillo Lema <[email protected]>
Co-authored-by: Raúl Sevilla <[email protected]>
Co-authored-by: vishnuchalla <[email protected]>
3 people authored Nov 7, 2024
1 parent bf01001 commit b2e4001
Showing 19 changed files with 740 additions and 2 deletions.
66 changes: 64 additions & 2 deletions README.md
@@ -259,10 +259,72 @@ Pre-requisites:
- 12 services (15 ports each) with 8 pod endpoints, 12 services (15 ports each) with 6 pod endpoints, 12 services (15 ports each) with 5 pod endpoints
- 29 services (15 ports each) with 4 pod endpoints, 29 services (15 ports each) with 6 pod endpoints

## Core RDS workloads

The telco core reference design specification (RDS) describes OpenShift Container Platform clusters running on commodity hardware that can support large-scale telco applications, including control plane and some centralized data plane functions. It captures the recommended, tested, and supported configurations needed to get reliable and repeatable performance for clusters running the telco core profile.

Pre-requisites:
- A **PerformanceProfile** with isolated and reserved cores, 1G hugepages, and `topologyPolicy=single-numa-node`. Hugepages should be allocated on the first NUMA node (the one used by the DPDK deployments):
```yaml
hugepages:
  defaultHugepagesSize: 1G
  pages:
    - count: 160
      node: 0
      size: 1G
    - count: 6
      node: 1
      size: 1G
```
- **MetalLB operator** limiting speaker pods to specific nodes (approx. 10% of the workers, i.e. 12 nodes in the case of 120 iterations, carrying the corresponding ***worker-metallb*** label):
```yaml
apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
spec:
  nodeSelector:
    node-role.kubernetes.io/worker-metallb: ""
  speakerTolerations:
    - key: "Example"
      operator: "Exists"
      effect: "NoExecute"
```
- **SRIOV operator** with its corresponding *SriovNetworkNodePolicy* (see the sketch after this list)
- Some nodes (e.g. 25% of them) carrying the ***worker-dpdk*** label to host the DPDK pods, e.g.:
```
$ kubectl label node worker1 node-role.kubernetes.io/worker-dpdk=
```
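
The SR-IOV policy itself is not included in this excerpt; the following is a minimal sketch of what such a *SriovNetworkNodePolicy* could look like. The policy name, resource name, VF count, and PF name are placeholders that depend on the cluster hardware; only the ***worker-dpdk*** node selector and the `vfio-pci` device type (needed for DPDK) follow from the prerequisites above:
```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-dpdk-policy                      # placeholder name
  namespace: openshift-sriov-network-operator
spec:
  resourceName: sriovdpdk                      # referenced by the SriovNetwork objects
  nodeSelector:
    node-role.kubernetes.io/worker-dpdk: ""    # the DPDK nodes labeled above
  numVfs: 16                                   # assumption: depends on the NIC
  nicSelector:
    pfNames: ["ens1f0"]                        # assumption: hardware-specific PF name
  deviceType: vfio-pci                         # DPDK workloads use the vfio-pci driver
```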

Object count:
| Iterations / nodes / namespaces | 1 | 120 |
| --------------------------------- | ---- | ----------------------------------- |
| configmaps | 30 | 3600 |
| deployments_best_effort | 25 | 3000 |
| deployments_dpdk | 2 | 240 (assuming 24 worker-dpdk nodes) |
| endpoints (210 x service) | 4200 | 504000 |
| endpoints lb (90 x service) | 90 | 10800 |
| networkPolicy | 3 | 360 |
| namespaces | 1 | 120 |
| pods_best_effort (2 x deployment) | 50 | 6000 |
| pods_dpdk (1 x deployment) | 2 | 240 (assuming 24 worker-dpdk nodes) |
| route | 2 | 240 |
| services | 20 | 2400 |
| services (lb) | 1 | 120 |
| secrets | 42 | 5040 |


Input parameters specific to the workload:
| Parameter | Description | Default value |
| ------------------- | ------------------------------------------------------------------------------------------------ | ------------- |
| dpdk-cores | Number of cores assigned for each DPDK pod (should fill all the isolated cores of one NUMA node) | 2 |
| performance-profile | Name of the performance profile implemented on the cluster | default |
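
As a rough usage sketch, assuming the workload is exposed as an `rds-core` subcommand of `kube-burner-ocp` and that the common `--iterations` flag applies here as in the other workloads (all flag values below are illustrative only):
```
$ kube-burner-ocp rds-core --iterations=120 --dpdk-cores=16 --performance-profile=default
```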

## Workers Scale
As a day-2 operation, we can use this option to scale our cluster's worker nodes to a desired count and capture their bootup times.

!!! Note

    This is only supported for OpenShift clusters hosted on AWS at the moment.

8 changes: 8 additions & 0 deletions cmd/config/rds-core/bgpadvertisement.yml
@@ -0,0 +1,8 @@
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgpadvertisement-basic
  namespace: metallb-system
spec:
  ipAddressPools:
    - adress-pool
10 changes: 10 additions & 0 deletions cmd/config/rds-core/bgppeer.yml
@@ -0,0 +1,10 @@
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  namespace: metallb-system
  name: bpg-peer
spec:
  peerAddress: 10.0.0.1
  peerASN: 64501
  myASN: 64500
  routerID: 10.10.10.10
7 changes: 7 additions & 0 deletions cmd/config/rds-core/configmap.yml
@@ -0,0 +1,7 @@
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{.JobName}}-{{.Replica}}
data:
  key1: "{{randAlphaNum 2048}}"
121 changes: 121 additions & 0 deletions cmd/config/rds-core/deployment-client.yml
@@ -0,0 +1,121 @@
kind: Deployment
apiVersion: apps/v1
metadata:
  name: client-{{.Replica}}
spec:
  replicas: {{.podReplicas}}
  selector:
    matchLabels:
      name: client-{{.Replica}}
  template:
    metadata:
      labels:
        name: client-{{.Replica}}
        app: client
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: client
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/worker
                    operator: Exists
                  - key: node-role.kubernetes.io/infra
                    operator: DoesNotExist
                  - key: node-role.kubernetes.io/workload
                    operator: DoesNotExist
      containers:
        - name: client-app
          image: quay.io/cloud-bulldozer/curl:latest
          command: ["sleep", "inf"]
          resources:
            requests:
              memory: "10Mi"
              cpu: "10m"
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: false
          readinessProbe:
            exec:
              command:
                - "/bin/sh"
                - "-c"
                - "curl --fail -sS ${SERVICE_ENDPOINT} -o /dev/null && curl --fail -sSk ${ROUTE_ENDPOINT} -o /dev/null"
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          volumeMounts:
            - name: secret-1
              mountPath: /secret1
            - name: secret-2
              mountPath: /secret2
            - name: secret-3
              mountPath: /secret3
            - name: secret-4
              mountPath: /secret4
            - name: configmap-1
              mountPath: /configmap1
            - name: configmap-2
              mountPath: /configmap2
            - name: configmap-3
              mountPath: /configmap3
            - name: configmap-4
              mountPath: /configmap4
            - name: podinfo
              mountPath: /etc/podlabels
          env:
            - name: ENVVAR1
              value: "{{randAlphaNum 250}}"
            - name: ENVVAR2
              value: "{{randAlphaNum 250}}"
            - name: ENVVAR3
              value: "{{randAlphaNum 250}}"
            - name: ENVVAR4
              value: "{{randAlphaNum 250}}"
            - name: ROUTE_ENDPOINT
              value: "https://rds-{{randInt 1 2}}-rds-{{.Iteration}}.{{ .ingressDomain }}/256.html"
            - name: SERVICE_ENDPOINT
              value: "http://rds-{{randInt 1 22}}/256.html"
      volumes:
        - name: secret-1
          secret:
            secretName: {{.JobName}}-1
        - name: secret-2
          secret:
            secretName: {{.JobName}}-2
        - name: secret-3
          secret:
            secretName: {{.JobName}}-3
        - name: secret-4
          secret:
            secretName: {{.JobName}}-4
        - name: configmap-1
          configMap:
            name: {{.JobName}}-1
        - name: configmap-2
          configMap:
            name: {{.JobName}}-2
        - name: configmap-3
          configMap:
            name: {{.JobName}}-3
        - name: configmap-4
          configMap:
            name: {{.JobName}}-4
        - name: podinfo
          downwardAPI:
            items:
              - path: "labels"
                fieldRef:
                  fieldPath: metadata.labels
      restartPolicy: Always
  strategy:
    type: RollingUpdate

75 changes: 75 additions & 0 deletions cmd/config/rds-core/deployment-dpdk.yml
@@ -0,0 +1,75 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dpdk-{{.Replica}}
  labels:
    group: load
    svc: dpdk-{{.Replica}}
spec:
  replicas: {{.podReplicas}}
  selector:
    matchLabels:
      name: dpdk-{{.Replica}}
  template:
    metadata:
      labels:
        group: load
        name: dpdk-{{.Replica}}
      annotations:
        irq-load-balancing.crio.io: "disable"
        cpu-load-balancing.crio.io: "disable"
        cpu-quota.crio.io: "disable"
        k8s.v1.cni.cncf.io/networks: '[
          { "name": "sriov-net-{{ .Iteration }}-1" },
          { "name": "sriov-net-{{ .Iteration }}-2" }
        ]'
    spec:
      runtimeClassName: performance-{{.perf_profile}}
      containers:
        - name: dpdk
          image: ghcr.io/abraham2512/fedora-stress-ng:master
          imagePullPolicy: Always
          # Requests and limits must be identical for the pod to be assigned the Guaranteed QoS class
          resources:
            requests:
              cpu: {{.dpdk_cores}}
              memory: 1024M
              hugepages-1Gi: 16Gi
            limits:
              cpu: {{.dpdk_cores}}
              memory: 1024M
              hugepages-1Gi: 16Gi
          env:
            - name: stress_cpu
              value: "4"
            - name: stress_vm
              value: "1"
            - name: stress_vm-bytes
              value: "512M"
          volumeMounts:
            - mountPath: /hugepages
              name: hugepage
      dnsPolicy: Default
      terminationGracePeriodSeconds: 1
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/worker-dpdk
                    operator: Exists
      # Add not-ready/unreachable tolerations for 15 minutes so that node
      # failure doesn't trigger pod deletion.
      tolerations:
        - key: "node.kubernetes.io/not-ready"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 900
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 900
      volumes:
        - name: hugepage
          emptyDir:
            medium: HugePages
