Make Prometheus retention period configurable #2953

Closed
alexandre-allard opened this issue Nov 30, 2020 · 0 comments · Fixed by #2968
Labels: complexity:easy (something that requires less than a day to fix), topic:monitoring (everything related to monitoring of services in a running cluster)

alexandre-allard commented Nov 30, 2020

Component: monitoring, csc, salt

Why this is needed:
To allow users to change the Prometheus retention period (10 days by default).

What should be done:
Make it configurable through the CSC ConfigMap metalk8s-prometheus-config.
We should also update the documentation to explain how to customize the retention period.
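
When documenting how to customize the value, it may help to show how a candidate retention string can be sanity-checked. The sketch below assumes a retention value is a single integer followed by one Prometheus duration unit (ms, s, m, h, d, w, y), like the 10d default. The helper name `parse_retention` is hypothetical and not part of MetalK8s or Prometheus.

```python
import re
from datetime import timedelta

# Assumption: single-unit durations only, e.g. "10d" or "36h".
# Prometheus treats a week as 7 days and a year as 365 days.
_UNIT_SECONDS = {
    "ms": 0.001,
    "s": 1,
    "m": 60,
    "h": 3600,
    "d": 86400,
    "w": 7 * 86400,
    "y": 365 * 86400,
}
_RETENTION_RE = re.compile(r"^([0-9]+)(ms|s|m|h|d|w|y)$")


def parse_retention(value: str) -> timedelta:
    """Convert a retention string such as '10d' into a timedelta."""
    match = _RETENTION_RE.match(value)
    if match is None:
        raise ValueError(f"invalid retention value: {value!r}")
    amount, unit = match.groups()
    return timedelta(seconds=int(amount) * _UNIT_SECONDS[unit])


if __name__ == "__main__":
    print(parse_retention("10d"))
```

A value that does not match the assumed pattern raises `ValueError` instead of being silently written to the ConfigMap.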

Implementation proposal (strongly recommended):
Add a default in salt/metalk8s/addons/prometheus-operator/config/prometheus.yaml:

apiVersion: addons.metalk8s.scality.com
kind: PrometheusConfig
spec:
  config:
    retention: 10d

Then render salt/metalk8s/addons/prometheus-operator/deployed/chart.sls using this entry:

       release: prometheus-operator
   portName: web
   replicas: {% endraw -%}{{ prometheus.spec.deployment.replicas }}{%- raw %}
-  retention: 10d
+  retention: {% endraw -%}{{ prometheus.spec.config.retention }}{%- raw %}
   routePrefix: /
   ruleNamespaceSelector: {}
   ruleSelector:

The option will then be automatically configurable through CSC.
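
The defaulting behaviour described above can be pictured as a recursive merge of the user's ConfigMap content over the defaults file. The following is a minimal sketch of that idea, not MetalK8s' actual CSC code: `merge_config`, `DEFAULTS` and the `30d` override are illustrative assumptions.

```python
import copy

# Defaults mirroring the proposed prometheus.yaml entry.
DEFAULTS = {
    "apiVersion": "addons.metalk8s.scality.com",
    "kind": "PrometheusConfig",
    "spec": {"config": {"retention": "10d"}},
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a copy of defaults with overrides applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, and new keys replace the default outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical user override, as it could appear in the ConfigMap.
user_config = {"spec": {"config": {"retention": "30d"}}}
prometheus = merge_config(DEFAULTS, user_config)

if __name__ == "__main__":
    # The template expression prometheus.spec.config.retention would
    # resolve to this value.
    print(prometheus["spec"]["config"]["retention"])
```

With an empty override the default of 10d is kept, which matches the intent that users only set the fields they want to change.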

Bonus: we could also expose retentionSize (disabled by default), which, as the name suggests, is a retention limit based on size. Both options can be set at the same time. If we do so, we need to warn the user in the documentation that Prometheus does not take the WAL size into account, so the required size is greater than retentionSize (add at least 10% to be safe).
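
The 10% safety margin can be turned into a quick sizing calculation. This is a rough sketch under the assumption that a flat percentage on top of retentionSize is enough to cover the WAL; the function name `required_volume_bytes` and its parameters are hypothetical.

```python
GIB = 1024 ** 3


def required_volume_bytes(retention_size_bytes: int, margin_percent: int = 10) -> int:
    """Return retentionSize plus a safety margin, rounded up to a whole byte."""
    if retention_size_bytes <= 0:
        raise ValueError("retention_size_bytes must be positive")
    if margin_percent < 0:
        raise ValueError("margin_percent must not be negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (retention_size_bytes * (100 + margin_percent) + 99) // 100


if __name__ == "__main__":
    # A 10 GiB retentionSize needs a volume of at least 11 GiB.
    print(required_volume_bytes(10 * GIB) / GIB)
```

The margin is a parameter because 10% is only the lower bound suggested here; a cluster with a large WAL may need more.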

Test plan:

alexandre-allard added the complexity:easy and topic:monitoring labels Nov 30, 2020
alexandre-allard self-assigned this Dec 2, 2020
alexandre-allard added a commit that referenced this issue Dec 9, 2020
We set these with special variables that the chart
renderer script transforms into Jinja templates,
which allows customizing these fields at runtime
through CSC mechanisms.

Refs: #2953
alexandre-allard added a commit that referenced this issue Dec 9, 2020
Since we changed retention time and size in the
charts, we need to regenerate the rendered chart
to apply the changes.

```
./charts/render.py prometheus-operator \
  charts/kube-prometheus-stack.yaml \
  charts/kube-prometheus-stack/ \
  --namespace metalk8s-monitoring \
  --service-config grafana \
  metalk8s-grafana-config \
  metalk8s/addons/prometheus-operator/config/grafana.yaml \
  metalk8s-monitoring \
  --service-config prometheus \
  metalk8s-prometheus-config \
  metalk8s/addons/prometheus-operator/config/prometheus.yaml \
  metalk8s-monitoring \
  --service-config alertmanager \
  metalk8s-alertmanager-config \
  metalk8s/addons/prometheus-operator/config/alertmanager.yaml \
  metalk8s-monitoring \
  --service-config dex \
  metalk8s-dex-config \
  metalk8s/addons/dex/config/dex.yaml.j2 metalk8s-auth \
  --drop-prometheus-rules charts/drop-prometheus-rules.yaml \
  > salt/metalk8s/addons/prometheus-operator/deployed/chart.sls
```

Refs: #2953
alexandre-allard added a commit that referenced this issue Dec 9, 2020
Explain how to change retention time and activate
retention size using CSC ConfigMap.

Refs: #2953