Add kubernetes.pod.status_reason and kubernetes.pod.status.ready_time fields in Kubernetes state_pod metricset #39316

Merged

MichaelKatsoulis
Contributor

@MichaelKatsoulis MichaelKatsoulis commented Apr 30, 2024

Proposed commit message

  • WHAT: Enhance kubernetes state_pod metricset with kubernetes.pod.status_reason and kubernetes.pod.status.ready_time fields.
  • WHY: Useful new metrics that indicate the reason a pod might not be in a desired state and the time it took for a pod to become ready.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Create a local k8s cluster: kind create cluster
  2. Install the stack: elastic-package-0.98.2 stack up -d -v --version=8.14.0-SNAPSHOT
  3. Edit dev-tools/kubernetes/Tiltfile to run in mode="run" and run:
       cd dev-tools/kubernetes
       tilt up
  4. Create a pod with a low resources.limits.memory to cause OOMKilled
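
For the last step above (a pod with a low memory limit), a minimal sketch of such a pod follows. It is not part of this PR. The pod name `oom-demo`, the `polinux/stress` image, the 50Mi limit, and the 250M allocation are all assumptions chosen for illustration; any container that allocates more memory than its limit should behave similarly.

```python
import json


def oom_pod_manifest(name="oom-demo", memory_limit="50Mi", alloc="250M"):
    """Build a Pod manifest whose container tries to allocate more memory than its limit.

    All default values are illustrative assumptions, not taken from the PR.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Do not restart, so the terminated container state stays visible.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "stress",
                    "image": "polinux/stress",
                    "command": ["stress"],
                    # Allocate `alloc` of memory, well above the limit below.
                    "args": ["--vm", "1", "--vm-bytes", alloc, "--vm-hang", "1"],
                    "resources": {"limits": {"memory": memory_limit}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(oom_pod_manifest(), indent=2))
```

Assuming the script is saved as `oom_pod.py` (a hypothetical file name), the output can be piped to `kubectl apply -f -` against the kind cluster.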

Screenshots

[Screenshot: ready time]

Note

I did not find any way to make kube_pod_status_reason report a value other than zero. The reasons that can produce a value of 1 are Evicted, NodeAffinity, NodeLost, Shutdown, and UnexpectedAdmissionError.
I tried to reproduce such a situation, but in every case the pod's failure reason was either Error or Unschedulable.
Those two reasons are not collected by kube-state-metrics.

The problem, though, lies in Kubernetes rather than in kube-state-metrics: when a pod is evicted, Kubernetes should set the status reason to Evicted, but that does not happen.
Still, I recommend introducing the new fields, since the status reason may take one of the expected values in cases that I cannot reproduce in a local, non-production cluster.
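
To illustrate the 0/1 behavior described in this note, here is a minimal sketch of how per-reason samples could be reduced to a single reason per pod. It is not the metricset's implementation; the `status_reasons` helper and the `(labels, value)` sample shape are assumptions made for the example, with label names following the kube-state-metrics metric.

```python
from typing import Dict, Iterable, Optional, Tuple

# One sample of kube_pod_status_reason: (labels, value), where value is 0 or 1.
Sample = Tuple[Dict[str, str], float]


def status_reasons(samples: Iterable[Sample]) -> Dict[Tuple[str, str], Optional[str]]:
    """Map (namespace, pod) to the reason whose sample equals 1, or None if all are 0."""
    reasons: Dict[Tuple[str, str], Optional[str]] = {}
    for labels, value in samples:
        key = (labels["namespace"], labels["pod"])
        # Remember the pod even when every reason is 0.
        reasons.setdefault(key, None)
        if value == 1:
            reasons[key] = labels["reason"]
    return reasons


if __name__ == "__main__":
    all_reasons = ["Evicted", "NodeAffinity", "NodeLost", "Shutdown", "UnexpectedAdmissionError"]
    # A pod as observed in this PR's local testing: every reason reports 0.
    samples = [({"namespace": "default", "pod": "oom-demo", "reason": r}, 0.0) for r in all_reasons]
    print(status_reasons(samples))
```

With the all-zero samples seen in local testing, the pod maps to `None`, which matches the observation that no reason was ever reported.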

@MichaelKatsoulis requested a review from a team as a code owner April 30, 2024 12:55
@botelastic bot added the needs_team label (indicates that the issue/PR needs a Team:* label) Apr 30, 2024
@MichaelKatsoulis marked this pull request as draft April 30, 2024 12:55
@MichaelKatsoulis removed the request for review from gsantoro April 30, 2024 12:55
Contributor

mergify bot commented Apr 30, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @MichaelKatsoulis? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Collaborator

elasticmachine commented Apr 30, 2024

💚 Build Succeeded

Build stats

  • Start Time: 2024-05-08T08:14:46.589+0000

  • Duration: 100 min 48 sec

Test stats 🧪

Test Results
Failed 0
Passed 4618
Skipped 904
Total 5522

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@MichaelKatsoulis marked this pull request as ready for review May 1, 2024 10:18
@MichaelKatsoulis requested a review from gizas May 1, 2024 10:18
@MichaelKatsoulis added the Team:Cloudnative-Monitoring label (label for the Cloud Native Monitoring team) May 1, 2024
@botelastic bot removed the needs_team label (indicates that the issue/PR needs a Team:* label) May 1, 2024
*`kubernetes.pod.status.ready_time`*::
+
--
Time in unix timestamp for a pod to achieve readiness
Contributor

Maybe this would be better:

Suggested change
- Time in unix timestamp for a pod to achieve readiness
+ Time in unix timestamp when a pod achieved readiness

Or use the same wording as in ksm: Readiness achieved time in unix timestamp for a pod
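
The wording matters because the field holds a point in time, not a duration. A minimal sketch of the distinction follows, assuming the value is a unix timestamp in seconds; both helper names are hypothetical, and the pod creation timestamp used to derive a duration is an assumed extra input (for example, from a creation-time metric), not something this PR provides.

```python
from datetime import datetime, timezone


def ready_time_to_datetime(ready_time: float) -> datetime:
    """Interpret a ready_time value (unix seconds) as a UTC datetime."""
    return datetime.fromtimestamp(ready_time, tz=timezone.utc)


def seconds_to_ready(created: float, ready_time: float) -> float:
    """Derive how long the pod took to become ready from two unix timestamps."""
    return ready_time - created


if __name__ == "__main__":
    # Illustrative values only.
    print(ready_time_to_datetime(0).isoformat())
    print(seconds_to_ready(100.0, 142.5))
```

A duration such as "time it took to become ready" therefore needs a second timestamp to subtract from, which is why the description should say when readiness was achieved.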

@MichaelKatsoulis MichaelKatsoulis merged commit d1089d1 into elastic:main May 8, 2024
37 checks passed
Labels
Team:Cloudnative-Monitoring (label for the Cloud Native Monitoring team)
5 participants