Upgrade kube-prometheus-stack to 67.5.0 #2381

Open: anders-elastisys wants to merge 5 commits into main from anders-elastisys/upgrade-kube-prometheus-stack-prometheus-v3

Contributor

@anders-elastisys anders-elastisys commented Dec 30, 2024

Warning

This is a public repository, ensure not to disclose:

  • personal data beyond what is necessary for interacting with this pull request, nor
  • business confidential information, such as customer names.

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

Optional: Mark one or more of the following that are applicable:

Important

Breaking changes should be marked kind/admin-change or kind/dev-change depending on type
Critical security fixes should be marked with kind/security

  • kind/admin-change
  • kind/dev-change
  • kind/security
  • [kind/adr](set-me)

Application Developer notice

Prometheus has been upgraded to version 3.0. This includes changes to the Prometheus UI.

What does this PR do / why do we need this PR?

I noticed that kube-prometheus-stack was falling behind a bit, so this PR upgrades the Helm chart to v67.5.0, which also upgrades Prometheus to v3.
I checked the v3 migration guide and did not see any of the breaking flags or configurations in use in our default Welkin config, but please verify whether any of them are used in specific environments.

The upgrade also fixes some ARP metrics in the node-exporter, along with a log issue they caused (see the linked issue).

Alertmanager in the Management cluster is not upgraded. Instead, its image is pinned to the previous version, v0.26.0, because v0.27.0 deprecates the v1 API endpoint, which Thanos still uses.
Once we upgrade Thanos to v0.35 or higher, the v2 endpoint will be the default (see related upstream issue) and we can remove the image override.
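
For reviewers who want to picture the image override, here is a minimal sketch rather than the actual change in this PR. It assumes the upstream chart's `alertmanager.alertmanagerSpec.image.tag` value is the key being set; the way Welkin's own config exposes it may differ, and the output file name is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical output path, for illustration only.
override_file="${TMPDIR:-/tmp}/alertmanager-image-override.yaml"

# Assumed key path: alertmanager.alertmanagerSpec.image.tag, as laid out in
# the upstream kube-prometheus-stack values.yaml. Only the tag is overridden,
# so the registry and repository keep the chart defaults.
cat > "${override_file}" <<'EOF'
alertmanager:
  alertmanagerSpec:
    image:
      tag: v0.26.0
EOF
```

Dropping this block again after the Thanos upgrade would let Alertmanager follow the chart's default image.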

Information to reviewers

Checklist

  • Proper commit message prefix on all commits
  • Change checks:
    • The change is transparent
    • The change is disruptive
    • The change requires no migration steps
    • The change requires migration steps
    • The change updates CRDs
    • The change updates the config and the schema
  • Documentation checks:
  • Metrics checks:
    • The metrics are still exposed and present in Grafana after the change
    • The metrics names didn't change (Grafana dashboards and Prometheus alerts required no updates)
    • The metrics names did change (Grafana dashboards and Prometheus alerts required an update)
  • Logs checks:
    • The logs do not show any errors after the change
  • PodSecurityPolicy checks:
    • Any changed Pod is covered by Kubernetes Pod Security Standards
    • Any changed Pod is covered by Gatekeeper Pod Security Policies
    • The change does not cause any Pods to be blocked by Pod Security Standards or Policies
  • NetworkPolicy checks:
    • Any changed Pod is covered by Network Policies
    • The change does not cause any dropped packets in the NetworkPolicy Dashboard
  • Audit checks:
    • The change does not cause any unnecessary Kubernetes audit events
    • The change requires changes to Kubernetes audit policy
  • Falco checks:
    • The change does not cause any alerts to be generated by Falco
  • Bug checks:
    • The bug fix is covered by regression tests

@anders-elastisys anders-elastisys force-pushed the anders-elastisys/upgrade-kube-prometheus-stack-prometheus-v3 branch from ba80518 to e6fa82f Compare December 30, 2024 14:31
@OlleLarsson
Contributor

Did you account for the change that they mention here? We seem to set these to false in the wc kps config

current_version=$(helm_do "${cluster}" get metadata -n monitoring kube-prometheus-stack -ojson | jq '.version' | tr -d '"')

log_info " - Checking if kube-prometheus-stack needs to be upgraded"
if [[ ! "${current_version}" < "$(echo -e "${new_version}\n${current_version}" | sort -V | tail -n1)" ]]; then
Contributor

Question: Does this comparison work with versions?

Wouldn't this just need to be

Suggested change
if [[ ! "${current_version}" < "$(echo -e "${new_version}\n${current_version}" | sort -V | tail -n1)" ]]; then
if [[ "${current_version}" != "${new_version}" ]]; then
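
For context on the question above: inside `[[ ]]`, the `<` operator compares strings lexicographically, so `9.0.0` does not sort before `67.5.0` even though it is the older version. A plain `!=` check, on the other hand, is also true when the installed version is newer than the target. One possible middle ground is sketched below; `version_lt` is a hypothetical helper, not something in the repository, and it relies on `sort -V` just as the original line does.

```shell
#!/usr/bin/env bash

# version_lt A B: succeeds (exit 0) only when A is strictly older than B.
# Uses `sort -V` (version sort) so that 9.0.0 sorts before 67.5.0.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Illustrative values only; in the script these come from helm and the chart.
current_version="9.0.0"
new_version="67.5.0"

if version_lt "${current_version}" "${new_version}"; then
  echo "upgrade needed: ${current_version} -> ${new_version}"
else
  echo "no upgrade needed"
fi
```

With the sample values this takes the "upgrade needed" branch, whereas `[[ "9.0.0" < "67.5.0" ]]` evaluates to false.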

Contributor

@Elias-elastisys Elias-elastisys left a comment

Do we need to care about the PromQL change to the dot token?

Just looking quickly I found usage in both alerts and dashboards, I'm sure they are everywhere.

"expr": "min((time()-kube_job_status_completion_time{job_name=~\"harbor-backup-cronjob-.*\", cluster=~\"$cluster\"})/3600)",

Have you checked if all dashboards behave the same?
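
To get a rough inventory of affected expressions, something like the following could be run over the dashboards and alert rules. In Prometheus 3, `.` in a PromQL regex also matches newline characters, so only matchers on label values that can contain newlines should behave differently. The function name is hypothetical and the pattern is a heuristic: it only catches double-quoted patterns (including JSON-escaped `\"`), and it also flags escaped literal dots.

```shell
#!/usr/bin/env bash

# find_regex_dot_matchers DIR: print file:line:content for every line where a
# =~ or !~ matcher has a '.' somewhere inside its double-quoted pattern.
# The optional backslash covers JSON dashboards, where quotes appear as \".
find_regex_dot_matchers() {
  grep -rnE '[=!]~[[:space:]]*\\?"[^"]*\.' "$1"
}

# Usage (the path is illustrative):
#   find_regex_dot_matchers path/to/dashboards
```

Note that `grep` exits non-zero when nothing matches, which matters if this is called from a script running with `set -e`.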

Development

Successfully merging this pull request may close these issues.

Upgrade Kube-prometheus-stack-60.0.0
4 participants