Add documentation on resizing prometheus' disk #4886

Merged · 1 commit · Sep 25, 2024
1 change: 1 addition & 0 deletions docs/index.md
@@ -35,6 +35,7 @@ sre-guide/support/index.md
sre-guide/manage-k8s/index.md
sre-guide/node-scale-up/index.md
sre-guide/common-problems-solutions.md
sre-guide/prometheus-disk-resize.md
```

(hub-deployment-guide)=
103 changes: 103 additions & 0 deletions docs/sre-guide/prometheus-disk-resize.md
@@ -0,0 +1,103 @@
# Resizing Prometheus' disk

As we store more and more metrics data, we may need to resize the disk that Prometheus collects this data onto.

On GCP clusters, the storage classes permit auto-expansion by default.
Therefore, simply defining a new persistent volume size in the support chart values and redeploying should suffice.
However, this may not be the case on other cloud providers.
The steps below walk you through resizing the disk manually.
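
One way to check up front whether a cluster's storage classes already allow expansion is a `custom-columns` query along these lines:

```bash
# List each storage class and whether it allows volume expansion
kubectl get storageclass \
  -o custom-columns='NAME:.metadata.name,EXPANSION:.allowVolumeExpansion'
```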

## Resizing the disk

```bash
# Set the KUBE_EDITOR env var to point to a text editor you're comfortable with
export KUBE_EDITOR="/usr/bin/nano"

# Set the name of the cluster to work against
export CLUSTER_NAME=...

# Authenticate against the cluster
deployer use-cluster-credentials $CLUSTER_NAME
```

1. Set the desired size of the Prometheus server persistent volume in the relevant `support.values.yaml` file.

   ```yaml
   prometheus:
     server:
       persistentVolume:
         size: <desired-size>
   ```

1. Check the reclaim policy on the persistent volume.

   ```bash
   # List all the PVs. They are not namespaced.
   kubectl get pv
   ```
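
   If you only want the reclaim policy of a single PV, a `jsonpath` query along these lines should also work (the PV name is a placeholder):

   ```bash
   # Print just the reclaim policy of one PV
   kubectl get pv <pv-name> -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
   ```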

1. Edit the persistent volume's reclaim policy to be `Retain` if it is not already.
   This will prevent us from losing the data Prometheus has already collected.

   ```bash
   kubectl edit pv <pv-name>
   ```
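
   Alternatively, a non-interactive patch along these lines should set the same field without opening an editor:

   ```bash
   # Set the reclaim policy to Retain so the underlying disk survives PVC deletion
   kubectl patch pv <pv-name> -p '{"spec": {"persistentVolumeReclaimPolicy": "Retain"}}'
   ```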

1. Check the value of `ALLOWVOLUMEEXPANSION` of the default storage class, identified by `(default)` next to its name.

   ```bash
   kubectl get storageclass
   ```

1. Set `ALLOWVOLUMEEXPANSION` to `true` if it is not.
   This will allow the persistent volumes to be dynamically resized.

   ```bash
   kubectl patch storageclass <storage-class-name> --patch '{"allowVolumeExpansion": true}'
   ```

   ```{note}
   At this point, you could try to redeploy the support chart and see if it succeeds.
   If it doesn't, continue with the steps below.
   ```

1. Delete the persistent volume claim for the Prometheus server.
   Persistent volume claims cannot be patched, so we will need to recreate it.

   ```bash
   # List all PVCs in the support namespace
   kubectl -n support get pvc

   # Delete the Prometheus server PVC
   kubectl -n support delete pvc support-prometheus-server
   ```
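
   Because the reclaim policy was set to `Retain`, the PV (and the disk behind it) should survive this deletion and move to a `Released` state, which you can confirm with:

   ```bash
   # The PV should still exist, now showing STATUS "Released"
   kubectl get pv <pv-name>
   ```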

1. In another terminal with the `CLUSTER_NAME` variable set, redeploy the support chart.
   It should fail with the PVC in a `Pending` state.

   ```bash
   deployer deploy-support $CLUSTER_NAME
   ```
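
   The new PVC stays `Pending` because the retained PV still references the old, now-deleted claim's UID under `spec.claimRef`, so Kubernetes will not bind the two. You can see this with:

   ```bash
   # The PVC should be listed with STATUS "Pending"
   kubectl -n support get pvc support-prometheus-server
   ```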

1. Edit the persistent volume to have the same UID and resource version as the newly created PVC under `spec.claimRef`.

   ```bash
   # Get the UID and resource version of the PVC
   kubectl -n support get pvc support-prometheus-server -o yaml

   # Edit the PV to reference these values under `spec.claimRef`
   kubectl edit pv <pv-name>
   ```
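
   If you prefer to avoid an interactive editor, a sketch along these lines should achieve the same edit (untested; the PV name is a placeholder):

   ```bash
   # Read the new PVC's UID and resource version
   PVC_UID=$(kubectl -n support get pvc support-prometheus-server -o jsonpath='{.metadata.uid}')
   PVC_RV=$(kubectl -n support get pvc support-prometheus-server -o jsonpath='{.metadata.resourceVersion}')

   # Point the PV's claimRef at the new PVC
   kubectl patch pv <pv-name> --type merge --patch \
     "{\"spec\": {\"claimRef\": {\"uid\": \"$PVC_UID\", \"resourceVersion\": \"$PVC_RV\"}}}"
   ```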

1. Delete the Prometheus server pod and check that it comes back up.

   ```bash
   kubectl -n support delete pod support-prometheus-server-<hash>
   kubectl -n support get pods --watch
   ```

1. Redeploy the support chart again; this time it should succeed.

   ```bash
   deployer deploy-support $CLUSTER_NAME
   ```
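
Once the deploy succeeds, it may be worth confirming from inside the pod that the volume now has the expected capacity. A rough sketch, assuming the chart-default `/data` mount path and `prometheus-server` container name (both may differ in your setup):

```bash
# Check the size of the mounted Prometheus data volume
kubectl -n support exec deploy/support-prometheus-server -c prometheus-server -- df -h /data
```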