Add podsLister to KPA and report pod state metrics #6476
@nimakaviani: 4 warnings.
In response to this:
/lint
Following the conversation in the WG, the podsLister is going to be useful when patching pods for both the graceful scale-down and the QP death proposals.
This PR adds the podsLister to the KPA and uses it to emit metrics on the state of pods for a service revision. It also updates the scaling debugging dashboard in Grafana to plot these metrics.
/assign @vagababov
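For readers unfamiliar with listers: a client-go `PodLister` answers `Pods(namespace).List(selector)` from an informer-backed in-memory cache instead of calling the API server, which is what makes it cheap to consult frequently. The stdlib-only sketch below mimics that call shape with hypothetical `Pod`, `podCache`, and `namespacedLister` types. It is an illustration under those assumptions, not the client-go implementation, and it assumes the revision label key is `serving.knative.dev/revision`.

```go
package main

import "fmt"

// Pod is a hypothetical, trimmed-down stand-in for corev1.Pod with only the
// fields needed to illustrate namespace- and label-scoped listing.
type Pod struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// podCache is a hypothetical in-memory index standing in for the informer
// cache behind a client-go PodLister; reads never touch the API server.
type podCache struct {
	byNamespace map[string][]*Pod
}

// namespacedLister mimics the value returned by lister.Pods(namespace).
type namespacedLister struct {
	pods []*Pod
}

// Add inserts a pod into the cache (an informer would do this on watch events).
func (c *podCache) Add(p *Pod) {
	if c.byNamespace == nil {
		c.byNamespace = map[string][]*Pod{}
	}
	c.byNamespace[p.Namespace] = append(c.byNamespace[p.Namespace], p)
}

// Pods scopes subsequent List calls to a single namespace.
func (c *podCache) Pods(namespace string) namespacedLister {
	return namespacedLister{pods: c.byNamespace[namespace]}
}

// List returns the pods whose labels contain every key/value pair in the
// selector, i.e. equality-based matching like labels.SelectorFromSet.
func (l namespacedLister) List(selector map[string]string) []*Pod {
	var out []*Pod
	for _, p := range l.pods {
		if matches(p.Labels, selector) {
			out = append(out, p)
		}
	}
	return out
}

// matches reports whether podLabels satisfies every requirement in selector.
func matches(podLabels, selector map[string]string) bool {
	for k, v := range selector {
		if got, ok := podLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	const revisionKey = "serving.knative.dev/revision" // assumed label key
	cache := &podCache{}
	cache.Add(&Pod{Namespace: "default", Name: "a", Labels: map[string]string{revisionKey: "hello-00001"}})
	cache.Add(&Pod{Namespace: "default", Name: "b", Labels: map[string]string{revisionKey: "hello-00002"}})
	cache.Add(&Pod{Namespace: "other", Name: "c", Labels: map[string]string{revisionKey: "hello-00001"}})

	pods := cache.Pods("default").List(map[string]string{revisionKey: "hello-00001"})
	fmt.Println(len(pods), pods[0].Name) // 1 a
}
```

Scoping by namespace first and then filtering by label mirrors the two-step `Pods(ns).List(sel)` call in the PR's `PendingTerminatingCount`.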
Force-pushed from 627e676 to 7b1d3b6.
Force-pushed from 7b1d3b6 to 87391ed.
/assign @yanweiguo
For metrics.
Force-pushed from 87391ed to ec8777c.
```go
&view.View{
	Description: "Number of pods that are not ready currently",
	Measure:     notReadyPodCountM,
	Aggregation: view.LastValue(),
```
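The view excerpted above aggregates with `view.LastValue()`. In OpenCensus that aggregation keeps only the most recently recorded value per tag combination, which suits pod counts because each sample is a snapshot rather than an increment. The following stdlib-only emulation uses a hypothetical `lastValueView` type (not the OpenCensus API) and hypothetical revision names to show the behavior; with a `view.Sum()` aggregation instead, the two samples for `hello-00001` would be reported as 4 rather than 1.

```go
package main

import (
	"fmt"
	"sort"
)

// lastValueView is a hypothetical, stdlib-only emulation of an OpenCensus
// view using view.LastValue(): for each tag value (here, a revision name)
// it keeps only the most recently recorded sample.
type lastValueView struct {
	last map[string]int64
}

func newLastValueView() *lastValueView {
	return &lastValueView{last: map[string]int64{}}
}

// Record overwrites, rather than adds to, the stored value for the revision.
func (v *lastValueView) Record(revision string, value int64) {
	v.last[revision] = value
}

// Rows returns one "revision=value" row per revision in sorted order,
// roughly what an exporter would read from the view.
func (v *lastValueView) Rows() []string {
	keys := make([]string, 0, len(v.last))
	for k := range v.last {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	rows := make([]string, 0, len(keys))
	for _, k := range keys {
		rows = append(rows, fmt.Sprintf("%s=%d", k, v.last[k]))
	}
	return rows
}

func main() {
	notReady := newLastValueView()
	// Each autoscaler tick records a snapshot of the not-ready pod count.
	notReady.Record("hello-00001", 3)
	notReady.Record("hello-00001", 1) // a later tick: only this value is kept
	notReady.Record("hello-00002", 0)
	fmt.Println(notReady.Rows()) // [hello-00001=1 hello-00002=0]
}
```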
Force-pushed from ec8777c to 61958d9.
Force-pushed from 61958d9 to 4143162.
Made all changes except for this one.
pkg/resources/pods.go (outdated)
```go
// Terminating state
func (pc *scopedPodCounter) PendingTerminatingCount() (int, int, error) {
	pods, err := pc.podsLister.Pods(pc.namespace).List(labels.SelectorFromSet(labels.Set{
		serving.RevisionLabelKey: pc.serviceName,
```
This should be `revisionKey` then, rather than `serviceName`.
We call it `serviceName` when we get it from the SKS in the KPA. Same thing for endpoints.
I agree, this definitely should not be `serviceName` but rather `revisionName`. This is fair game for the EndpointsCounter, as serviceName == endpointsName (this is always the case AFAIK).
In this case, however, we're specifically querying based on the revision and thus should make sure we pass the `revisionName` in here.
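The reviewers' point can be made concrete with a tiny stdlib-only sketch. It assumes the revision label key is `serving.knative.dev/revision`, uses plain label maps instead of pods, and uses a hypothetical service name that differs from the revision name. Under those assumptions, a selector built from the service name matches none of the revision's pods, while one built from the revision name matches them all.

```go
package main

import "fmt"

// revisionLabelKey is assumed to equal serving.RevisionLabelKey.
const revisionLabelKey = "serving.knative.dev/revision"

// revisionSelector builds the equality selector used to list a revision's
// pods; it only selects the right pods when given the revision name.
func revisionSelector(name string) map[string]string {
	return map[string]string{revisionLabelKey: name}
}

// countMatching counts label sets that satisfy every selector requirement.
func countMatching(pods []map[string]string, selector map[string]string) int {
	n := 0
	for _, labels := range pods {
		ok := true
		for k, v := range selector {
			if got, found := labels[k]; !found || got != v {
				ok = false
				break
			}
		}
		if ok {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical pods belonging to revision "hello-00001".
	pods := []map[string]string{
		{revisionLabelKey: "hello-00001"},
		{revisionLabelKey: "hello-00001"},
	}
	// Hypothetical service name reported by the SKS; it differs from the
	// revision name, so it never matches the revision label.
	serviceName := "hello-00001-private"
	fmt.Println(countMatching(pods, revisionSelector(serviceName)))   // 0
	fmt.Println(countMatching(pods, revisionSelector("hello-00001"))) // 2
}
```

If the two names happened to coincide, the service-name query would return the right pods only by accident, which is why the reviewers ask for the revision name explicitly.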
Force-pushed from 4143162 to 2860060.
The following jobs failed:
Failed non-flaky tests preventing automatic retry of pull-knative-serving-unit-tests.
Force-pushed from 2860060 to b8f1101.
/lgtm
A few nits.
/assign @markusthoemmes
I've spent quite a lot of time looking, so I might have missed something; please take a look and approve.
Force-pushed from b8f1101 to 78c54d3.
/lgtm
Generally LGTM; I left a few suggestions.
My biggest gripe is passing the ServiceName where we actually want to pass the name of the revision we're dealing with. If we really wanted to play nicely, we'd need to resolve the selector via the ScaleTargetRef. I'd be okay with not doing that, as long as we at least resolve the proper revisionName.
Force-pushed from 78c54d3 to 8aeadac.
One last nit: the error seems to be built "the wrong way". LGTM otherwise. Let's merge this after the release, though.
Mostly OK, modulo my and Markus's questions.
pkg/resources/pods_test.go (outdated)
```go
if terminating != test.wantTerminating {
	t.Errorf("TerminatingCount() = %d, want: %d", terminating, test.wantTerminating)
}
```
Want a running check?
Actually, we don't expose any functions to get the count of the ready pods, so maybe let's wait until we need them.
Force-pushed from 8aeadac to 5539216.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED.
This pull request has been approved by: nimakaviani, vagababov
/retest