address jcfunk comments: interval and extra labels for PodMonitor + refactor readme
ppalucki committed May 8, 2024
1 parent f2747c7 commit 7f2c707
Showing 4 changed files with 91 additions and 67 deletions.
77 changes: 11 additions & 66 deletions deployment/pcm/README.md
@@ -148,6 +148,9 @@ helm install prometheus prometheus-community/kube-prometheus-stack --set prometh
kubectl get sts prometheus-prometheus-kube-prometheus-prometheus
```

Note: `podMonitorSelectorNilUsesHelmValues` is set to `false` so that the Prometheus operator picks up the PCM PodMonitor even when it is deployed without extra `podMonitorLabels`. Otherwise, PCM needs to be deployed with a matching label, for example:
`helm install pcm . --set podMonitor=true --set podMonitorLabels.release=prometheus` (assuming the Prometheus operator was deployed as release "prometheus")
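
When the Prometheus operator selects PodMonitors by more than one label, the `--set` flags can be assembled from `key=value` pairs. The following is a minimal sketch, not part of the chart: `pcm_podmonitor_flags` is a hypothetical helper name, and the script only prints the resulting command so it can be reviewed before running it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the PCM chart): turn key=value pairs
# into the --set flags that populate the chart's podMonitorLabels value.
pcm_podmonitor_flags() {
  flags="--set podMonitor=true"
  for pair in "$@"; do
    flags="$flags --set podMonitorLabels.$pair"
  done
  printf '%s\n' "$flags"
}

# Print the command instead of executing it, so it can be checked first.
echo "helm install pcm . $(pcm_podmonitor_flags release=prometheus)"
```

With `release=prometheus` as the only pair, the printed command matches the one given in the note above.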

#### 5) Deploy PCM helm chart

```
@@ -217,72 +220,14 @@ helm install pcm-metal . -f values-metal.yaml

#### Direct method as non-privileged container (not recommended)

**TODO**: TO BE MOVED TO EXTERNAL FILE/SECTION

**Note** PCM requires access to /dev/cpu device in read writer mode (MSR access) but it is no possible currently to mount devices in Kubernetes pods/containers in vanila Kubernetes. Please read this isses for more information https://github.com/kubernetes/kubernetes/issues/5607.

##### a) Device injection using 3rd party device-plugin


TO run PCM with as non privileged pod, we can third party devices plugins e.g.:

- https://github.com/smarter-project/smarter-device-manager
- https://github.com/squat/generic-device-plugin
- https://github.com/everpeace/k8s-host-device-plugin

**Warning** This plugins were NOT audited for security concerns, **use it at your own risk**.

Below is example how to pass /dev/cpu and /dev/mem using smarter-device-manager in kind based Kubernetes test cluster.

```
# Label node to deploy device plugin on that node
kubectl label node kind-control-plane smarter-device-manager=enabled
# Install "smarter-device-manager" device plugin with only /dev/cpu and /dev/mem devices enabled:
git clone https://github.com/smarter-project/smarter-device-manager
helm install smarter-device-plugin --create-namespace --namespace smarter-device-plugin smarter-device-manager/charts/smarter-device-manager --set 'config[0].devicematch=^cpu$' --set 'config[0].nummaxdevices=1' --set 'config[1].devicematch=^mem$' --set 'config[1].nummaxdevices=1'
# Check that cpu and mem devices are available - should return "1"
kubectl get node kind-control-plane -o json | jq .status.capacity
**Note** PCM requires read-write access to the /dev/cpu devices (MSR access), but vanilla Kubernetes currently cannot mount devices into unprivileged pods/containers. See https://github.com/kubernetes/kubernetes/issues/5607 for more about this limitation.

# Install pcm helm chart in unprivileged mode with extraResources for cpu and memory devices.
helm install pcm . --set privileged=false -f values-direct.yaml -f values-smarter-devices-cpu-mem.yaml
```
To expose necessary devices to pcm-sensor-server, one can use:

##### b) Device injection using NRI plugin device-injection
a) Kubernetes device plugin (using Kubernetes [CDI](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/) interface),
b) containerd plugin (using the [NRI](https://github.com/containerd/nri/) interface).

**TODO**: **Warning** This is work in progress, because it is needed to manually specific all /dev/cpu/XX/msr devices, which is unpractical in production (TO BE MOVED TO EXTERNAL FILE).

```
git clone https://github.com/containerd/nri/
(cd nri/plugins/device-injector/ && go build )
docker cp kind-control-plane:/etc/containerd/config.toml config.toml
cat >>config.toml <<EOF
[plugins."io.containerd.nri.v1.nri"]
# Disable NRI support in containerd.
disable = false
# Allow connections from externally launched NRI plugins.
disable_connections = false
# plugin_config_path is the directory to search for plugin-specific configuration.
plugin_config_path = "/etc/nri/conf.d"
# plugin_path is the directory to search for plugins to launch on startup.
plugin_path = "/opt/nri/plugins"
# plugin_registration_timeout is the timeout for a plugin to register after connection.
plugin_registration_timeout = "5s"
# plugin_requst_timeout is the timeout for a plugin to handle an event/request.
plugin_request_timeout = "2s"
# socket_path is the path of the NRI socket to create for plugins to connect to.
socket_path = "/var/run/nri/nri.sock"
EOF
docker cp config.toml kind-control-plane:/etc/containerd/config.toml
docker exec kind-control-plane systemctl restart containerd
docker exec kind-control-plane systemd-run -u device-injector /device-injector -idx 10 -verbose
docker exec kind-control-plane systemctl status device-injector
helm install pcm-device-injector . --set privileged=false --set hostPort= --set debugSleep=true -f values-opcm-local-image.yaml -f values-device-injector.yaml
```
Examples can be found [here](docs/direct-unprivileged-deployment.md).
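
Before trying either approach, it can help to confirm whether an MSR device node is visible and usable from inside the container. The following is a minimal sketch under stated assumptions: `check_msr_access` is a hypothetical helper name, the default path assumes the Linux msr driver layout (`/dev/cpu/<N>/msr`), and the check covers file permissions only, so opening the device may still fail if the kernel applies further restrictions.

```shell
#!/bin/sh
# Hypothetical helper: report whether a device node can be used read-write.
# This only tests existence and file permissions for the current user.
check_msr_access() {
  dev="${1:-/dev/cpu/0/msr}"
  if [ ! -e "$dev" ]; then
    echo "missing"
  elif [ -r "$dev" ] && [ -w "$dev" ]; then
    echo "read-write"
  else
    echo "no-access"
  fi
}

# MSR_DEVICE is an assumed override variable, not something PCM reads.
check_msr_access "${MSR_DEVICE:-/dev/cpu/0/msr}"
```

The function can be pasted into a shell opened with `kubectl exec -ti ds/pcm -- bash`; a result of `missing` suggests the device was not injected into the container at all.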

#### Development (with local images) and testing

@@ -313,17 +258,17 @@ helm upgrade --install pcm . --set debugPcm=true
helm upgrade --install pcm . --set debugSleep=true
```

**TODO:** consiert debug options to be removed before release for security reasons
**TODO:** consider debug options to be removed before release for security reasons

5) Check logs or intercat with container directly:
5) Check logs or interact with container directly:
```
# exec into pcm container
kubectl exec -ti ds/pcm -- bash
# or check logs
kubectl logs ds/pcm
```

#### Metric collection methods (capabilites vs requirements)
#### Metric collection methods (capabilities vs requirements)

| Method | Used interfaces | default | Notes |
|---------------|------------------------------------------------------------| -------- | ------------------------------------------------------------------------------------- |
67 changes: 67 additions & 0 deletions deployment/pcm/docs/direct-unprivileged-deployment.md
@@ -0,0 +1,67 @@
--------------------------------------------------------------------------------
Examples of deploying with direct MSR access as non-privileged container
--------------------------------------------------------------------------------

#### Direct method as non-privileged container (not recommended)

##### a) Device injection using 3rd party device-plugin

To run PCM as a non-privileged pod, we can use third-party device plugins, e.g.:

- https://github.com/smarter-project/smarter-device-manager
- https://github.com/squat/generic-device-plugin
- https://github.com/everpeace/k8s-host-device-plugin

**Warning** These plugins have NOT been audited for security concerns, **use them at your own risk**.

Below is an example of how to pass /dev/cpu and /dev/mem using smarter-device-manager in a kind-based Kubernetes test cluster.

```
# Label node to deploy device plugin on that node
kubectl label node kind-control-plane smarter-device-manager=enabled
# Install "smarter-device-manager" device plugin with only /dev/cpu and /dev/mem devices enabled:
git clone https://github.com/smarter-project/smarter-device-manager
helm install smarter-device-plugin --create-namespace --namespace smarter-device-plugin smarter-device-manager/charts/smarter-device-manager --set 'config[0].devicematch=^cpu$' --set 'config[0].nummaxdevices=1' --set 'config[1].devicematch=^mem$' --set 'config[1].nummaxdevices=1'
# Check that cpu and mem devices are available - should return "1"
kubectl get node kind-control-plane -o json | jq .status.capacity
# Install pcm helm chart in unprivileged mode with extraResources for cpu and memory devices.
helm install pcm . --set privileged=false -f values-direct.yaml -f values-smarter-devices-cpu-mem.yaml
```

##### b) Device injection using NRI plugin device-injection

**TODO**: **Warning** This is a work in progress, because all /dev/cpu/XX/msr devices currently need to be specified manually, which is impractical in production (TO BE MOVED TO EXTERNAL FILE).

```
git clone https://github.com/containerd/nri/
(cd nri/plugins/device-injector/ && go build )
docker cp kind-control-plane:/etc/containerd/config.toml config.toml
cat >>config.toml <<EOF
[plugins."io.containerd.nri.v1.nri"]
# Disable NRI support in containerd.
disable = false
# Allow connections from externally launched NRI plugins.
disable_connections = false
# plugin_config_path is the directory to search for plugin-specific configuration.
plugin_config_path = "/etc/nri/conf.d"
# plugin_path is the directory to search for plugins to launch on startup.
plugin_path = "/opt/nri/plugins"
# plugin_registration_timeout is the timeout for a plugin to register after connection.
plugin_registration_timeout = "5s"
# plugin_request_timeout is the timeout for a plugin to handle an event/request.
plugin_request_timeout = "2s"
# socket_path is the path of the NRI socket to create for plugins to connect to.
socket_path = "/var/run/nri/nri.sock"
EOF
docker cp config.toml kind-control-plane:/etc/containerd/config.toml
docker exec kind-control-plane systemctl restart containerd
docker exec kind-control-plane systemd-run -u device-injector /device-injector -idx 10 -verbose
docker exec kind-control-plane systemctl status device-injector
helm install pcm-device-injector . --set privileged=false --set hostPort= --set debugSleep=true -f values-opcm-local-image.yaml -f values-device-injector.yaml
```
5 changes: 4 additions & 1 deletion deployment/pcm/templates/podmonitor.yaml
@@ -8,6 +8,9 @@ metadata:
{{- include "pcm.labels" . | nindent 4 }}
app.kubernetes.io/component: metrics
jobLabel: pcm
{{- with .Values.podMonitorLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
attachMetadata:
node: true
@@ -24,7 +27,7 @@ spec:
honorTimestamps: true
path: /metrics
port: pcm-metrics
interval: 1s
interval: {{ .Values.podMonitorInterval | quote }}
relabelings:
- sourceLabels:
- __meta_kubernetes_pod_node_name
9 changes: 9 additions & 0 deletions deployment/pcm/values.yaml
@@ -84,6 +84,15 @@ extraResources: {}
hostPort: 9738
# Deploy Prometheus Operator PodMonitor (requires hostPort to be not empty)
podMonitor: false
# Extra PodMonitor labels that the Prometheus operator can use to select this PodMonitor,
# e.g. the default "kube-prometheus-stack" helm chart requires an additional release: "{name of chart release}" label on the PodMonitor for it to be considered.
# Example of how to check which extra labels need to be added to the PodMonitor:
# 1) kubectl get prometheus -o jsonpath='{.items[].spec.podMonitorSelector.matchLabels}' # e.g. release: prometheus
# 2) helm install pcm . --set podMonitor=true --set podMonitorLabels.release=prometheus
podMonitorLabels: {}
# Default interval for Prometheus scrape configuration
podMonitorInterval: 30s


### -------------- NRI balloons policy plugin -------------
# PCM deployment to be integrated with the NRI balloons resource policy