feat: add pod acknowledged time metric #1803

Merged
24 commits merged into kubernetes-sigs:main on Nov 15, 2024

Conversation

njtran
Contributor

@njtran njtran commented Nov 8, 2024

Fixes #N/A

Description
Tracks, in cluster state, how far Karpenter has progressed towards scheduling each pod by recording three timestamps:

  • Pod Ack: when Karpenter first sees the pending pod in the cache (first write wins)
  • Pod Scheduling Attempted: when Karpenter first pops the pod off the scheduling queue and tries to schedule it (first write wins)
  • Pod Scheduling Success: when Karpenter first determines that the pod could schedule to a node, whether that node is theoretical or not (first write wins)

These timestamps are used to emit metrics that report:

  • Pod Scheduling Decision Latency = the time it takes Karpenter to first decide whether a pod is schedulable or not
  • Pod Provisioning Binding Latency = how long the pod took to bind after Karpenter first thought it could schedule the pod
  • Pod Provisioning Startup Latency = how long the pod took to reach the running state after Karpenter first thought it could schedule the pod

This also renames a metric, which is safe because the metric has not been released yet:

  • karpenter_pods_current_unbound_duration_seconds -> karpenter_pods_unbound_duration_seconds

How was this change tested?
make presubmit

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 8, 2024
@coveralls

coveralls commented Nov 8, 2024

Pull Request Test Coverage Report for Build 11850563098

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 132 of 133 (99.25%) changed or added relevant lines in 4 files are covered.
  • 33 unchanged lines in 6 files lost coverage.
  • Overall coverage increased (+0.2%) to 81.05%

Changes Missing Coverage

| File | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/controllers/controllers.go | 0 | 1 | 0.0% |

Files with Coverage Reduction

| File | New Missed Lines | % |
|---|---|---|
| pkg/utils/nodeclaim/nodeclaim.go | 2 | 37.07% |
| pkg/controllers/controllers.go | 3 | 0.0% |
| pkg/controllers/nodeclaim/lifecycle/launch.go | 4 | 93.1% |
| pkg/controllers/provisioning/scheduling/scheduler.go | 4 | 95.18% |
| pkg/controllers/nodeclaim/lifecycle/initialization.go | 8 | 86.11% |
| pkg/controllers/nodeclaim/lifecycle/controller.go | 12 | 71.07% |

Totals

  • Change from base Build 11749223840: 0.2%
  • Covered Lines: 8601
  • Relevant Lines: 10612

💛 - Coveralls

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 8, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 9, 2024
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 14, 2024
Member

@jonathan-innis jonathan-innis left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 15, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jonathan-innis, njtran

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jonathan-innis,njtran]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit a8ad2ee into kubernetes-sigs:main Nov 15, 2024
12 checks passed
Labels
  • approved - Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
  • lgtm - "Looks good to me", indicates that a PR is ready to be merged.
  • size/XL - Denotes a PR that changes 500-999 lines, ignoring generated files.

5 participants