From 6ed6bdf5674de802d34da437f1bcc48848ea9012 Mon Sep 17 00:00:00 2001
From: Pawel Palucki
Date: Tue, 21 May 2024 15:03:32 +0200
Subject: [PATCH] Refactoring:
 - explicit values file for privileged direct method,
 - hide (into docs directory) "unprivileged" direct method (and fixes),
 - remove unnecessary mounts (mcfg, /dev/cpu, /dev/mem for privileged access),
 - add instructions to collection methods,
 - fixes (extra builder) for build local development image,
 - silent mode
 - move collection methods to the top

---
 deployment/pcm/README.md | 57 +++++++++++-----
 .../docs/direct-unprivileged-deployment.md | 4 +-
 .../values-device-injector.yaml | 0
 .../values-direct-unprivileged.yaml} | 12 +++-
 .../values-smarter-devices-cpu-mem.yaml | 0
 deployment/pcm/templates/_helpers.tpl | 1 +
 deployment/pcm/templates/daemonset.yaml | 66 +++++++++++--------
 deployment/pcm/values-direct-privileged.yaml | 16 +++++
 deployment/pcm/values-vm.yaml | 1 +
 deployment/pcm/values.yaml | 17 +++--
 10 files changed, 124 insertions(+), 50 deletions(-)
 rename deployment/pcm/{ => docs/direct-unprivileged-examples}/values-device-injector.yaml (100%)
 rename deployment/pcm/{values-direct.yaml => docs/direct-unprivileged-examples/values-direct-unprivileged.yaml} (52%)
 rename deployment/pcm/{ => docs/direct-unprivileged-examples}/values-smarter-devices-cpu-mem.yaml (100%)
 create mode 100644 deployment/pcm/values-direct-privileged.yaml

diff --git a/deployment/pcm/README.md b/deployment/pcm/README.md
index 52fef803..7f845f9d 100644
--- a/deployment/pcm/README.md
+++ b/deployment/pcm/README.md
@@ -4,12 +4,21 @@ Helm chart instructions

 ### Features:

-- Configurable as non-privileged container (value: `privileged=false` / default) and privileged container,
-- Support for bare-metal and VM host configurations (files: [values-metal.yaml](values-metal.yaml), [values-vm.yaml](values-metal.yaml)),
+- Configurable as non-privileged container (value: `privileged=false`, default) and privileged container,
+- Support for
bare-metal and VM host configurations (files: [values-metal.yaml](values-metal.yaml), [values-vm.yaml](values-vm.yaml)),
 - Ability to deploy multiple releases alongside configured differently to handle different kinds of machines (bare-metal, VM) at the [same time](#heterogeneous-mixed-vmmetal-instances-cluster),
-- Controllable set of metrics and method of collection (RDT, uncore), support direct (msr) and indirect (Linux abstractions perf/resctrl) counter accesses (file: [values-indirect.yaml](values-indirect.yaml)).
 - Linux Watchdog handling (controlled with `PCM_KEEP_NMI_WATCHDOG`, `PCM_NO_AWS_WORKAROUND`, `nmiWatchdogMount` values).
 - Deploy to own namespace with "helm install ... **-n pcm --create-namespace**"
+- Silent mode (value: `silent=false`, default)
+
+Metrics collection methods available in this chart, with the interfaces they use and the access they require:
+
+| Method | Used interfaces | Default | Notes | Instructions |
+|-------------------------|----------------------| ------- | ------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
+| unprivileged "indirect" | perf, resctrl | v | recommended, missing metrics: energy metrics (TODO link to issues/PR or node_exporter/rapl_collector) | `helm install pcm .` |
+| privileged "indirect" | perf, resctrl | | not recommended, insecure (no advantages over unprivileged), missing metrics: energy metrics | `helm install pcm . --set privileged=true` |
+| privileged "direct" | msr | | not recommended, insecure and requires msr module preloaded on host | `helm install .
pcm -f values-direct-privileged.yaml` |
+| unprivileged "direct" | msr | | not recommended, requires msr module and access to /dev/cpu and /dev/mem (non-trivial, e.g. requires 3rd-party plugins) | [detailed documentation](docs/direct-unprivileged-deployment.md) |

 For more information about direct/indirect collection methods please see [here](#metric-collection-methods-capabilites-vs-requirements)

@@ -47,7 +56,7 @@ helm install ... --set nfd=true --set podMonitor=true
 ### Requirements

 - Full set of metrics (uncore/UPI, RDT, energy) requires bare-metal or .metal cloud instance.
-- /sys/fs/resctrl has to be mounted on host OS (for default indirect deployment method),
+- /sys/fs/resctrl has to be mounted on host OS (for default indirect deployment method)
 - pod is allowed to be run with privileged capabilities (SYS_ADMIN, SYS_RAWIO) on given namespace in other words: Pod Security Standards allow to run on privileged level,

 ```
@@ -78,12 +87,14 @@ More information here: https://kubernetes.io/docs/tutorials/security/ns-level-ps
 #### 1) (Optionally) mount resctrl filesystem (for RDT metrics) to unload "msr" kernel module for validation

 ```
+echo 0 > /proc/sys/kernel/perf_event_paranoid
 mount -t resctrl resctrl /sys/fs/resctrl
 ```

-For validation to verify that all metrics are available without msr, unload "msr" module from kernel:
+To validate that all metrics are available without msr, unload the "msr" module from the kernel and restore the default perf_event_paranoid value:
 ```
 rmmod msr
+echo 2 > /proc/sys/kernel/perf_event_paranoid
 ```

 #### 2) Create kind based Kubernetes cluster
@@ -123,11 +134,24 @@ bash kind-with-registry.sh
 Check that resctrl is available inside kind node:
 ```
 docker exec kind-control-plane ls /sys/fs/resctrl/info
+# expected output:
+# L3_MON
+# MB
+# ...
+```
+
+
+and, optionally, that the local registry is running (used for locally built pcm images, more details [below](#development-with-local-images-and-testing))
+```
+docker ps | grep kind-registry
+# expected output:
+# e57529be23ea registry:2 "/entrypoint.sh /etc…" 3 weeks ago Up 3 weeks 127.0.0.1:5001->5000/tcp kind-registry
 ```

 Export kind kubeconfig as default for further kubectl commands:
 ```
 kind export kubeconfig
+kubectl get pods -A
 ```

 #### 3) (Optionally) Deploy Node Feature Discovery (nfd)
@@ -200,9 +224,9 @@ promtool query instant http://127.0.0.1:8001/api/v1/namespaces/default/services/

 ### Deploy alternative options

-#### Direct as privileged container
+#### Direct (msr access) as privileged container
 ```
-helm install pcm . -f values-direct.yaml --set privileged=true
+helm install pcm . -f values-direct-privileged.yaml
 ```

 #### Homogeneous bare metal instances cluster (full set of metrics)
@@ -243,14 +267,21 @@ wget https://kind.sigs.k8s.io/examples/kind-with-registry.sh
 bash kind-with-registry.sh
 ```

-2) Build docker image and upload to local registry
+2) Build docker image and upload to local registry (from project root directory)
 ```
 docker build . -t localhost:5001/pcm-local
 docker push localhost:5001/pcm-local

-# or with single line
+# optionally create buildx based builder
+mkdir -p ~/.docker/cli-plugins
+curl -sL https://github.com/docker/buildx/releases/download/v0.14.0/buildx-v0.14.0.linux-amd64 -o ~/.docker/cli-plugins/docker-buildx
+chmod +x ~/.docker/cli-plugins/docker-buildx
+docker buildx create --driver docker-container --name mydocker --use --bootstrap
+
+# or with single line (from deployment/pcm/ directory)
 # Build local image for tests/development + fix /pcm/resctrl mounting (assuming project was configured with cmake previously):
-(cd ../.. ; (cd build ; make -j pcm pcm-sensor-server) ; docker build .
-t localhost:5001/pcm-local && docker push localhost:5001/pcm-local; docker run -ti --rm --name pcmtest --entrypoint bash localhost:5001/pcm-local -c "pcm 2>&1 | head -5" )
+# Warning: this uses a patched Dockerfile (TODO: to be removed; needed because the "build" directory conflicts with the existing root "build" directory, and for caching)
+(cd ../.. ; (cd build ; make -j pcm pcm-sensor-server) ; docker build . -t localhost:5001/pcm-local && docker push localhost:5001/pcm-local)
 ```

 3) When deploying to kind cluster pcm use values to switch to local pcm-local image
@@ -274,12 +305,8 @@ kubectl exec -ti ds/pcm -- bash
 kubectl logs ds/pcm
 ```

-#### Metric collection methods (capabilities vs requirements)
+### Metric collection methods (capabilities vs requirements)

-| Method | Used interfaces | default | Notes |
-|---------------|------------------------------------------------------------| -------- | ------------------------------------------------------------------------------------- |
-| indirect | perf, resctrl | v | missing energy metrics, |
-| direct | msr | | requires msr module and access to /dev/cpu (non trivial) or privileged access |


 | Metrics | Available on Hardware | Available through interface | Available through method |
diff --git a/deployment/pcm/docs/direct-unprivileged-deployment.md b/deployment/pcm/docs/direct-unprivileged-deployment.md
index ab84dc91..fd760a17 100644
--- a/deployment/pcm/docs/direct-unprivileged-deployment.md
+++ b/deployment/pcm/docs/direct-unprivileged-deployment.md
@@ -28,7 +28,7 @@ helm install smarter-device-plugin --create-namespace --namespace smarter-device
 kubectl get node kind-control-plane -o json | jq .status.capacity

 # Install pcm helm chart in unprivileged mode with extraResources for cpu and memory devices.
-helm install pcm . --set privileged=false -f values-direct.yaml -f values-smarter-devices-cpu-mem.yaml
+helm install pcm .
-f docs/direct-unprivileged-examples/values-direct-unprivileged.yaml -f docs/direct-unprivileged-examples/values-smarter-devices-cpu-mem.yaml
 ```

 ##### b) Device injection using NRI plugin device-injection
@@ -63,5 +63,5 @@ docker exec kind-control-plane systemctl restart containerd
 docker exec kind-control-plane systemd-run -u device-injector /device-injector -idx 10 -verbose
 docker exec kind-control-plane systemctl status device-injector

-helm install pcm-device-injector . --set privileged=false --set hostPort= --set debugSleep=true -f values-opcm-local-image.yaml -f values-device-injector.yaml
+helm install pcm . -f docs/direct-unprivileged-examples/values-direct-unprivileged.yaml -f docs/direct-unprivileged-examples/values-device-injector.yaml
 ```
diff --git a/deployment/pcm/values-device-injector.yaml b/deployment/pcm/docs/direct-unprivileged-examples/values-device-injector.yaml
similarity index 100%
rename from deployment/pcm/values-device-injector.yaml
rename to deployment/pcm/docs/direct-unprivileged-examples/values-device-injector.yaml
diff --git a/deployment/pcm/values-direct.yaml b/deployment/pcm/docs/direct-unprivileged-examples/values-direct-unprivileged.yaml
similarity index 52%
rename from deployment/pcm/values-direct.yaml
rename to deployment/pcm/docs/direct-unprivileged-examples/values-direct-unprivileged.yaml
index 04aa5494..c4153651 100644
--- a/deployment/pcm/values-direct.yaml
+++ b/deployment/pcm/docs/direct-unprivileged-examples/values-direct-unprivileged.yaml
@@ -1,9 +1,19 @@
+# Warning: this file is to be used for direct unprivileged access, which requires a 3rd-party plugin
+# e.g.
device-injector NRI or smarter-devices-cpu-mem
+privileged: false
+
+# Switch to using MSR
 PCM_NO_MSR: 0 # use MSR
 PCM_NO_PERF: 1 # do not use Linux perf
 PCM_USE_UNCORE_PERF: 0 # also use MSR for uncore
 PCM_NO_RDT: 0 # Collect RDT data
 PCM_USE_RESCTRL: 0 # using MSR (no resctrl)
-resctrlHostMount: false # with MSR resctrl mount is not needed
+
+# RDT metrics will be used by direct msr programming
+resctrlHostMount: false
 resctrlInsideMount: false
+
+# sys and pci mounts are required for uncore PMU devices discovery
 sysMount: true # /pcm/sys is required
 pciMount: true # /pcm/proc/bus/pci is required
+
diff --git a/deployment/pcm/values-smarter-devices-cpu-mem.yaml b/deployment/pcm/docs/direct-unprivileged-examples/values-smarter-devices-cpu-mem.yaml
similarity index 100%
rename from deployment/pcm/values-smarter-devices-cpu-mem.yaml
rename to deployment/pcm/docs/direct-unprivileged-examples/values-smarter-devices-cpu-mem.yaml
diff --git a/deployment/pcm/templates/_helpers.tpl b/deployment/pcm/templates/_helpers.tpl
index 0c05b3d5..446325cc 100644
--- a/deployment/pcm/templates/_helpers.tpl
+++ b/deployment/pcm/templates/_helpers.tpl
@@ -62,6 +62,7 @@ securityContext:
 add:
 - SYS_ADMIN
 - SYS_RAWIO
+ #- PERFMON
 {{- end }}
 {{- end }}

diff --git a/deployment/pcm/templates/daemonset.yaml b/deployment/pcm/templates/daemonset.yaml
index e9f194d9..3ed2b56e 100644
--- a/deployment/pcm/templates/daemonset.yaml
+++ b/deployment/pcm/templates/daemonset.yaml
@@ -54,6 +54,14 @@ spec:
 image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
 imagePullPolicy: {{ .Values.image.pullPolicy }}
 {{- include "pcm.securityContext" .
| nindent 8 }}
+ {{- if .Values.silent }}
+ command:
+ - "/usr/local/bin/pcm-sensor-server"
+ - "-p"
+ - "9738"
+ - "-r"
+ - "-silent"
+ {{- end -}}
 {{- if .Values.debugSleep }}
 command:
 - /usr/bin/sleep
@@ -63,7 +71,7 @@ spec:
 command:
 - /bin/bash
 - -c
- - "/usr/local/bin/pcm 2 -r -nc -nsys"
+ - "/usr/local/bin/pcm 2 -r -nc -nsys{{ if .Values.silent }} -silent{{ end }}"
 {{- end -}}
 {{- if .Values.resctrlInternalMount }}
 # Ugly hack to mount resctrl inside only for baremetal when we want use resctrl abstraction and is not mounted on HOST: TBC conflicts with
@@ -116,14 +124,14 @@ spec:
 protocol: TCP
 {{- end }}
 volumeMounts:
- {{- if .Values.privileged }}
- - mountPath: /pcm/dev/cpu
- name: dev-cpu
- readOnly: false
- - mountPath: /pcm/dev/mem
- name: dev-mem
- readOnly: false
- {{- end }}
+ # {{- if .Values.privileged }}
+ # - mountPath: /pcm/dev/cpu
+ # name: dev-cpu
+ # readOnly: false
+ # - mountPath: /pcm/dev/mem
+ # name: dev-mem
+ # readOnly: false
+ # {{- end }}
 {{- if .Values.pciMount }}
 - mountPath: /pcm/proc/bus/pci
 name: proc-pci
@@ -136,26 +144,27 @@ spec:
 {{- if .Values.nmiWatchdogMount }}
 - mountPath: /pcm/proc/sys/kernel/nmi_watchdog
 name: nmi-watchdog
- readOnly: true # RW?
+ readOnly: true # RW?
# TODO
 {{- end }}
 {{- if .Values.resctrlHostMount }}
 - mountPath: /sys/fs/resctrl
 name: sysfs-resctrl
 {{- end }}
- {{- if .Values.mcfgMount }}
- - mountPath: /pcm/sys/firmware/acpi/tables/MCFG
- name: sys-acpi
- readOnly: true
- {{- end }}
+ # TODO: to be removed, already handled by /sysMount
+ # {{- if .Values.mcfgMount }}
+ # - mountPath: /pcm/sys/firmware/acpi/tables/MCFG
+ # name: sys-acpi
+ # readOnly: true
+ # {{- end }}
 volumes:
- {{- if .Values.privileged }}
- - name: dev-cpu
- hostPath:
- path: /dev/cpu
- - name: dev-mem
- hostPath:
- path: /dev/mem
- {{- end}}
+ # {{- if .Values.privileged }}
+ # - name: dev-cpu
+ # hostPath:
+ # path: /dev/cpu
+ # - name: dev-mem
+ # hostPath:
+ # path: /dev/mem
+ # {{- end}}
 {{- if .Values.sysMount }}
 - name: sysfs
 hostPath:
@@ -171,11 +180,12 @@ spec:
 hostPath:
 path: /proc/sys/kernel/nmi_watchdog
 {{- end }}
- {{- if .Values.mcfgMount }}
- - name: sys-acpi
- hostPath:
- path: /sys/firmware/acpi/tables/MCFG
- {{- end }}
+ # TODO: to be removed, already handled by /sysMount
+ # {{- if .Values.mcfgMount }}
+ # - name: sys-acpi
+ # hostPath:
+ # path: /sys/firmware/acpi/tables/MCFG
+ # {{- end }}
 {{- if .Values.resctrlHostMount }}
 - name: sysfs-resctrl
 hostPath:
diff --git a/deployment/pcm/values-direct-privileged.yaml b/deployment/pcm/values-direct-privileged.yaml
new file mode 100644
index 00000000..0e587836
--- /dev/null
+++ b/deployment/pcm/values-direct-privileged.yaml
@@ -0,0 +1,16 @@
+#### Tuning for "direct" privileged access
+privileged: true
+
+# Switch PCM to use msr access always
+PCM_NO_MSR: 0 # use MSR
+PCM_NO_PERF: 1 # do not use Linux perf
+PCM_USE_UNCORE_PERF: 0 # also use MSR for uncore
+PCM_NO_RDT: 0 # Enable RDT metrics ...
+PCM_USE_RESCTRL: 0 # but using MSR (no resctrl filesystem)
+
+# with a privileged container, additional mounts aren't required
+resctrlHostMount: false # with MSR resctrl mount is not needed
+resctrlInsideMount: false
+sysMount: false
+pciMount: false
+mcfgMount: false
diff --git a/deployment/pcm/values-vm.yaml b/deployment/pcm/values-vm.yaml
index 0c6d4139..58b7c7ce 100644
--- a/deployment/pcm/values-vm.yaml
+++ b/deployment/pcm/values-vm.yaml
@@ -1,5 +1,6 @@
 #### ================ Tunning for VM ================
 nmiWatchdogMount: true
+
 # Disable RDT because is not avaiable for VM instances
 PCM_NO_RDT: 1
 resctrlHostMount: false
diff --git a/deployment/pcm/values.yaml b/deployment/pcm/values.yaml
index 4baf3a0d..917cf243 100644
--- a/deployment/pcm/values.yaml
+++ b/deployment/pcm/values.yaml
@@ -18,6 +18,10 @@ imagePullSecrets: {}
 # Configures SecurityContext to not privileged (by default) so SYS_ADMIN/SYS_RAWIO capabilietes are required for running pod
 privileged: false

+# Run pcm in silent mode (additional -silent argument to pcm-sensor-server binary)
+# Removes some debug output (like warnings about the inability to open some /sys... /proc... files)
+silent: false
+
 ### -------------- Required OS affinity -------
 # Should only running on linux
 nodeSelector:
@@ -29,10 +33,15 @@ probes: false

 ### ================ Metrics configuration ======================
 ### -------------- Metrics: Uncore ------------
-# required for uncore metrics, only in baremetal, not available for VM
-mcfgMount: false
-sysMount: false
-pciMount: false
+# Mounts section
+# NOTE: only required for direct mode
+# required for uncore metrics discovery and working only in baremetal, not available for VM
+sysMount: false # mounts host /sys into container /pcm/sys/
+pciMount: false # mounts host /proc/bus/pci into container /pcm/proc/bus/pci/
+
+# NOTE: this may only be required for direct unprivileged mode
+# TODO: to be removed (already covered by sysMount)
+#mcfgMount: false # mounts host /sys/firmware/acpi/tables/MCFG -> /pcm/sys/firmware/acpi/tables/MCFG

 ### linux Perf (indirect) vs msr(direct)
 # Lets try "indirect" as default