Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add podsLister to KPA and report pod state metrics #6476

Merged
merged 1 commit into from
Jan 21, 2020

Conversation

nimakaviani
Copy link
Contributor

/lint

following the conversation in the WG, podsLister is going to be useful when patching pods both for the graceful scaledown and QP death proposals.

This PR adds the podsLister to KPA and uses it to emit metrics on the state of pods for a service revision. It also updates the scaling debugging dashboard in grafana to plot these metrics.

/assign @vagababov

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Jan 9, 2020
@knative-prow-robot knative-prow-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 9, 2020
Copy link
Contributor

@knative-prow-robot knative-prow-robot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nimakaviani: 4 warnings.

In response to this:

/lint

following the conversation in the WG, podsLister is going to be useful when patching pods both for the graceful scaledown and QP death proposals.

This PR adds the podsLister to KPA and uses it to emit metrics on the state of pods for a service revision. It also updates the scaling debugging dashboard in grafana to plot these metrics.

/assign @vagababov

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

pkg/autoscaler/stats_reporter.go Show resolved Hide resolved
pkg/autoscaler/stats_reporter.go Show resolved Hide resolved
pkg/autoscaler/stats_reporter.go Show resolved Hide resolved
pkg/reconciler/testing/v1alpha1/listers.go Show resolved Hide resolved
pkg/reconciler/autoscaling/kpa/kpa.go Outdated Show resolved Hide resolved
pkg/resources/endpoints_test.go Show resolved Hide resolved
Copy link
Contributor

@vagababov vagababov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/assign @yanweiguo
For metrics.

@vagababov vagababov changed the title add podsLister to kpa and report pod state metrics Add podsLister to KPA and report pod state metrics Jan 9, 2020
pkg/reconciler/autoscaling/kpa/kpa.go Outdated Show resolved Hide resolved
pkg/resources/endpoints_test.go Outdated Show resolved Hide resolved
pkg/resources/pods.go Outdated Show resolved Hide resolved
pkg/resources/pods.go Outdated Show resolved Hide resolved
pkg/resources/pods.go Outdated Show resolved Hide resolved
&view.View{
Description: "Number of pods that are not ready currently",
Measure: notReadyPodCountM,
Aggregation: view.LastValue(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a known issue for LastValue metric: #4470

I'd @ssmall to put some opinions here.

pkg/reconciler/autoscaling/kpa/kpa.go Outdated Show resolved Hide resolved
pkg/autoscaler/stats_reporter.go Show resolved Hide resolved
pkg/reconciler/autoscaling/kpa/kpa.go Outdated Show resolved Hide resolved
pkg/reconciler/autoscaling/kpa/kpa.go Show resolved Hide resolved
@nimakaviani
Copy link
Contributor Author

made all changes except for this

What if we do this crazy idea where we do ReportPodCounts(desired, requested, actual, notReady, terminating, pending int64) ? Or at least just 2 methods one for "desired/requested" and one for "Actual/not ready/etc" ?

pkg/reconciler/autoscaling/kpa/kpa.go Outdated Show resolved Hide resolved
pkg/reconciler/autoscaling/kpa/resources/pod_count.go Outdated Show resolved Hide resolved
pkg/resources/endpoints.go Outdated Show resolved Hide resolved
pkg/resources/pods.go Outdated Show resolved Hide resolved
// Terminating state
func (pc *scopedPodCounter) PendingTerminatingCount() (int, int, error) {
pods, err := pc.podsLister.Pods(pc.namespace).List(labels.SelectorFromSet(labels.Set{
serving.RevisionLabelKey: pc.serviceName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be revisionKey then, rather than serviceName.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we call is serviceName when we get it from SKS in KPA. same thing for endpoints as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this definitely should not be serviceName but rather revisionName. This is fair game for the EndpointsCounter, as the serviceName == endpointsName (this is always the case AFAIK).

In this case however, we're specifically querying based on the revision and thus should make sure we pass the revisionName in here.

pkg/resources/pods.go Outdated Show resolved Hide resolved
pkg/resources/pods.go Outdated Show resolved Hide resolved
@knative-test-reporter-robot

The following jobs failed:

Test name Triggers Retries
pull-knative-serving-unit-tests 0/3

Failed non-flaky tests preventing automatic retry of pull-knative-serving-unit-tests:

pkg/autoscaler.TestMultiScalerScaling

Copy link
Contributor

@vagababov vagababov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
A few nits
/assign @markusthoemmes
I've spent quite a lot looking so I might miss something, please take a look and approve.

pkg/resources/pods_test.go Outdated Show resolved Hide resolved
pkg/resources/pods_test.go Outdated Show resolved Hide resolved
@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 16, 2020
@knative-prow-robot knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label Jan 16, 2020
@vagababov
Copy link
Contributor

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 16, 2020
Copy link
Contributor

@markusthoemmes markusthoemmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, I left a few suggestions.

The biggest grief on my end is passing the ServiceName where we actually want to pass the name of the revision we're dealing with. If we really wanted to play nicely, we'd need to resolve the selector via the ScaleTargetRef. I'd be okay though to not do that but instead resolve the proper revisionName at least.

pkg/reconciler/autoscaling/kpa/kpa.go Outdated Show resolved Hide resolved
pkg/reconciler/autoscaling/kpa/kpa.go Outdated Show resolved Hide resolved
// Terminating state
func (pc *scopedPodCounter) PendingTerminatingCount() (int, int, error) {
pods, err := pc.podsLister.Pods(pc.namespace).List(labels.SelectorFromSet(labels.Set{
serving.RevisionLabelKey: pc.serviceName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this definitely should not be serviceName but rather revisionName. This is fair game for the EndpointsCounter, as the serviceName == endpointsName (this is always the case AFAIK).

In this case however, we're specifically querying based on the revision and thus should make sure we pass the revisionName in here.

pkg/resources/pods.go Outdated Show resolved Hide resolved
pkg/resources/pods_test.go Outdated Show resolved Hide resolved
pkg/resources/pods_test.go Show resolved Hide resolved
@knative-prow-robot knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label Jan 21, 2020
Copy link
Contributor

@markusthoemmes markusthoemmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last nit, the error seems to build "the wrong way". LGTM otherwise. Let's merge this after the release though.

pkg/reconciler/autoscaling/kpa/kpa.go Outdated Show resolved Hide resolved
Copy link
Contributor

@vagababov vagababov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly OK, modulo my and Markus's questions

if terminating != test.wantTerminating {
t.Errorf("TerminatingCount() = %d, want: %d", terminating, test.wantTerminating)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

want running check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually we dont expose any functions to get the count of the ready pods. so maybe lets wait until we need them.

Copy link
Contributor

@vagababov vagababov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 21, 2020
@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nimakaviani, vagababov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 21, 2020
@knative-metrics-robot
Copy link

The following is the coverage report on the affected files.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/autoscaler/stats_reporter.go 96.4% 96.8% 0.3
pkg/reconciler/autoscaling/kpa/kpa.go 92.8% 91.9% -0.9
pkg/resources/pods.go Do not exist 92.3%

@nimakaviani
Copy link
Contributor Author

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/API API objects and controllers area/autoscale area/monitoring cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants