charts,salt,build: Bump kube-prometheus-stack version to 16.9.1 #3422
Conversation
Hello teddyandrieux,

My role is to assist you with the merge of this pull request. Status report is not available.

Waiting for approval

The following approvals are needed before I can proceed with the merge:

- Peer approvals must include at least 1 approval from the following list:
Force-pushed from 8fc73b9 to 4c8a063
LGTM, you just need to update the customizable Prometheus rules as follows:
```diff
diff --git a/salt/metalk8s/addons/prometheus-operator/config/prometheus.yaml b/salt/metalk8s/addons/prometheus-operator/config/prometheus.yaml
index 7dcd6dca5..f356c4ede 100644
--- a/salt/metalk8s/addons/prometheus-operator/config/prometheus.yaml
+++ b/salt/metalk8s/addons/prometheus-operator/config/prometheus.yaml
@@ -39,10 +39,10 @@ spec:
       available: 3
     node_network_receive_errors:
       warning:
-        errors: 10 # Number of receive errors for the last 2m
+        errors: 0.01 # Rate of receive errors for the last 2m
     node_network_transmit_errors:
       warning:
-        errors: 10 # Number of transmit errors for the last 2m
+        errors: 0.01 # Rate of transmit errors for the last 2m
     node_high_number_conntrack_entries_used:
       warning:
         threshold: 0.75
diff --git a/salt/metalk8s/addons/prometheus-operator/deployed/prometheus-rules.sls b/salt/metalk8s/addons/prometheus-operator/deployed/prometheus-rules.sls
index 6549e8336..a5576dd77 100644
--- a/salt/metalk8s/addons/prometheus-operator/deployed/prometheus-rules.sls
+++ b/salt/metalk8s/addons/prometheus-operator/deployed/prometheus-rules.sls
@@ -168,7 +168,8 @@ spec:
             {{ printf "%.0f" $value }} receive errors in the last two minutes.'
           runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-nodenetworkreceiveerrs
           summary: Network interface is reporting many receive errors.
-        expr: increase(node_network_receive_errs_total[2m]) > {% endraw %}{{ rules.node_exporter.node_network_receive_errors.warning.errors }}{% raw %}
+        expr: increase(node_network_receive_errs_total[2m]) / rate(node_network_receive_packets_total[2m])
+          > {% endraw %}{{ rules.node_exporter.node_network_receive_errors.warning.errors }}{% raw %}
         for: 1h
         labels:
           severity: warning
@@ -178,7 +179,8 @@ spec:
            {{ printf "%.0f" $value }} transmit errors in the last two minutes.'
          runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-nodenetworktransmiterrs
          summary: Network interface is reporting many transmit errors.
-        expr: increase(node_network_transmit_errs_total[2m]) > {% endraw %}{{ rules.node_exporter.node_network_transmit_errors.warning.errors }}{% raw %}
+        expr: increase(node_network_transmit_errs_total[2m]) / rate(node_network_transmit_packets_total[2m])
+          > {% endraw %}{{ rules.node_exporter.node_network_transmit_errors.warning.errors }}{% raw %}
         for: 1h
         labels:
           severity: warning
@@ -217,7 +219,10 @@ spec:
            is configured on this host.
          runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-nodeclocknotsynchronising
          summary: Clock not synchronising.
-        expr: min_over_time(node_timex_sync_status[5m]) == {% endraw %}{{ rules.node_exporter.node_clock_not_synchronising.warning.threshold }}{% raw %}
+        expr: |-
+          min_over_time(node_timex_sync_status[5m]) == {% endraw %}{{ rules.node_exporter.node_clock_not_synchronising.warning.threshold }}{% raw %}
+          and
+          node_timex_maxerror_seconds >= 16
         for: 10m
         labels:
           severity: warning
@@ -247,7 +252,7 @@ spec:
            Array '{{ $labels.device }}' needs attention and possibly a disk swap.
          runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-noderaiddiskfailure
          summary: Failed device in RAID array
-        expr: node_md_disks{state="fail"} >= {% endraw %}{{ rules.node_exporter.node_raid_disk_failure.warning.threshold }}{% raw %}
+        expr: node_md_disks{state="failed"} >= {% endraw %}{{ rules.node_exporter.node_raid_disk_failure.warning.threshold }}{% raw %}
         labels:
           severity: warning
{%- endraw %}
```
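For reference, the new rate-based expression can be evaluated by hand. A minimal Python sketch with made-up counter deltas (the numbers are illustrative, not from a real node) showing how the new `0.01` default behaves:

```python
# Hypothetical counter deltas over a 2-minute evaluation window.
WINDOW_SECONDS = 120

def receive_error_ratio(errs_delta: float, packets_delta: float) -> float:
    """Mirrors the new alert expression:
    increase(node_network_receive_errs_total[2m])
      / rate(node_network_receive_packets_total[2m])
    """
    increase_errs = errs_delta                     # increase() over the window
    rate_packets = packets_delta / WINDOW_SECONDS  # rate() in packets/second
    return increase_errs / rate_packets

# 6 errors against ~1000 packets/s stays below the 0.01 default threshold.
quiet = receive_error_ratio(errs_delta=6, packets_delta=120_000)

# 600 errors against the same traffic crosses it.
noisy = receive_error_ratio(errs_delta=600, packets_delta=120_000)

print(quiet, noisy)
```

Note that under the old expression a fixed count of 10 errors fired regardless of traffic volume, whereas the new one scales with the packet rate.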
I'm not 100% sure we want to change `NodeNetworkTransmitErrs` and `NodeNetworkReceiveErrs`, as it breaks compatibility with what we had before. If a user has customized `node_network_transmit_errors.warning.errors` or `node_network_receive_errors.warning.errors`, it will not work as expected.

The thing is, we can't really magically convert the old value, so either we keep it as is, or we make a "breaking" change (anyway, I'm not sure anyone is using this).

Otherwise, we could also rename `errors` to something else like `error_rate`, so at least the old customized value is not taken into account and we fall back on the default behavior.
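The rename idea works because the rendered rules would only look up the new key, silently ignoring a stale count-based customization. A minimal Python sketch of that fallback (the dict structure and key names are assumed from the diff above, not the actual rendering code):

```python
# Default from the updated prometheus.yaml (rate, not a count).
DEFAULT_ERROR_RATE = 0.01

# A user's old customization, still using the count-based "errors" key.
customized = {
    "node_network_receive_errors": {
        "warning": {"errors": 10},
    },
}

warning = customized["node_network_receive_errors"]["warning"]

# After renaming the key to "error_rate", the stale value under "errors"
# is never read; the new default applies instead.
threshold = warning.get("error_rate", DEFAULT_ERROR_RATE)

print(threshold)
```

Without the rename, the old value `10` would be interpreted as a rate of 1000%, and the alert would effectively never fire.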
Force-pushed from 4c8a063 to d8dad8b
Update kube-prometheus-stack chart to 16.9.1:

```
rm -rf charts/kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm fetch -d charts --untar prometheus-community/kube-prometheus-stack
```

Re-render the chart to a Salt state using:

```
./charts/render.py prometheus-operator \
  charts/kube-prometheus-stack.yaml \
  charts/kube-prometheus-stack/ \
  --namespace metalk8s-monitoring \
  --service-config grafana \
    metalk8s-grafana-config \
    metalk8s/addons/prometheus-operator/config/grafana.yaml \
    metalk8s-monitoring \
  --service-config prometheus \
    metalk8s-prometheus-config \
    metalk8s/addons/prometheus-operator/config/prometheus.yaml \
    metalk8s-monitoring \
  --service-config alertmanager \
    metalk8s-alertmanager-config \
    metalk8s/addons/prometheus-operator/config/alertmanager.yaml \
    metalk8s-monitoring \
  --service-config dex \
    metalk8s-dex-config \
    metalk8s/addons/dex/config/dex.yaml.j2 metalk8s-auth \
  --drop-prometheus-rules charts/drop-prometheus-rules.yaml \
  > salt/metalk8s/addons/prometheus-operator/deployed/chart.sls
```

Update vendored rules. Update the alert rules extract:

```
./tools/rule_extractor/rule_extractor.py \
  -i <control-plane-ip> -p 8443 -t rules
```
Force-pushed from d8dad8b to 83809dd
/approve
In the queue

The changeset has received all authorizations and has been added to the queue. The changeset will be merged in:

The following branches will NOT be impacted:

There is no action required on your side. You will be notified here once the merge is complete.

IMPORTANT: Please do not attempt to modify this pull request. If you need this pull request to be removed from the queue, please contact an administrator.

The following options are set: approve
I have successfully merged the changeset of this pull request.

The following branches have NOT changed:

Please check the status of the associated issue: None.

Goodbye teddyandrieux.
Bump kube-prometheus-stack charts version to 16.9.1
The following images have also been bumped accordingly: