-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scheduler: fix tracking of concurrent events #120413
scheduler: fix tracking of concurrent events #120413
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.Error(nil, "In flight Pod isn't found in the scheduling queue. If you see this error log, it's likely a bug in the scheduler.", "pod", klog.KObj(pInfo.Pod)) | ||
return framework.QueueAfterBackoff | ||
} | ||
if p.inFlightEvents.Len() > len(p.inFlightPods) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am replicating the previous if len(pInfo.UnschedulablePlugins) == 0
shortcut here, but only if absolutely necessary (= when pod is not in flight).
I am not exactly sure whether there are legitimate reasons for having it. One unit test depends on it, but perhaps that test is broken (see comment below).
// Construct a Pod, but don't associate its scheduler failure to any plugin | ||
hpp1 := highPriorityPodInfo.Pod.DeepCopy() | ||
hpp1.Name = "hpp1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having two pods with the same UID breaks the inFlightPod
tracking of pods. Somehow it worked before, but not anymore with the revised implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. We shouldn't be testing that.
q.activeQ.Add(q.newQueuedPodInfo(hpp1)) | ||
if p, err := q.Pop(); err != nil || p.Pod != hpp1 { | ||
t.Errorf("Expected: %v after Pop, but got: %v", hpp1, p.Pod.Name) | ||
} | ||
expectInFlightPods(t, q, hpp1.UID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used expectInFlightPods
to figure out where the test flow went astray when it still used two pods with the same UID. I kept it because it provides some additional test coverage for both inFlightPods
and inFlightEvents
and because it makes the test easier to read.
q.AddUnschedulableIfNotPresent(logger, q.newQueuedPodInfo(unschedulablePodInfo.Pod, "fooPlugin"), q.SchedulingCycle()) | ||
expectInFlightPods(t, q, medPriorityPodInfo.Pod.UID, highPriorityPodInfo.Pod.UID, hpp1.UID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is about the line above with q.newQueuedPodInfo
: creating a new PodInfo
instance feels wrong.
The test should more realistic and use the podInfo
instance that it got from Pop
, just like a real scheduling attempt.
I kept it because I don't want to rock the boat too much in one PR, but this might be worth a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg
// This can happen while updating a pod. In that case pInfo.UnschedulablePlugins should | ||
// be empty. If it is not, we may have a problem. | ||
if len(pInfo.UnschedulablePlugins) != 0 { | ||
logger.Error(nil, "In flight Pod isn't found in the scheduling queue. If you see this error log, it's likely a bug in the scheduler.", "pod", klog.KObj(pInfo.Pod)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code coverage and dlv show that this line is hit when running tests, for example:
2669: for _, op := range test.operations {
2670: for _, pInfo := range pInfos {
=>2671: op(logger, queue, pInfo)
2672: }
2673: }
This is another instance of "pInfo
is not the result of Pop
". Doesn't look easy to fix, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this line being hit consistently? Do you have the test case for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code hits it reliably:
op(logger, queue, pInfo) |
That's addUnschedulablePodBackToBackoffQ
called for a fabricated QueuedPodInfo
, without making sure that "inFlightPods" is set up.
Current master also hits the error log message. This seems to be the only test which does, so it might be easier to fix than I thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I see. Yeah, we should make sure to add a Pop before calling unschedulable. I remember Kensei added a lot of those in several tests. These probably slipped.
But it only gets removed once that event no longer matters. Or is the calculation for "does it still matter" wrong? |
The current definition of "does it still matter" is "the pods that where in flight when the event was received are still in flight". That's what the What happens is this:
|
// This can happen while updating a pod. In that case pInfo.UnschedulablePlugins should | ||
// be empty. If it is not, we may have a problem. | ||
if len(pInfo.UnschedulablePlugins) != 0 { | ||
logger.Error(nil, "In flight Pod isn't found in the scheduling queue. If you see this error log, it's likely a bug in the scheduler.", "pod", klog.KObj(pInfo.Pod)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this line being hit consistently? Do you have the test case for it?
} | ||
return clusterEvents | ||
return values | ||
} | ||
|
||
func Test_InFlightPods(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you add a test that would have failed before this change?
I think one scenario is
- pop Pod-A
- event X
- pop Pod-B
- event Y
- requeue pod-A
- requeue pod-B (should be hinted according to event Y)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added one now. It's a bit more complicated, see second commit message:
To make it work without assumptions about the implementation, the QueuedPodInfo
returned by Pop must be the one passed to AddUnschedulableIfNotPresent
after (potentially) populating UnschedulablePlugins. This is done via callback
functions which bind to the same shared variable.
t.Errorf("Unexpected diff in inFlightPods (-want, +got):\n%s", diff) | ||
} | ||
|
||
if diff := cmp.Diff(listToClusterEvents(test.wantReceivedEvents), listToClusterEvents(q.receivedEvents), cmp.AllowUnexported(clusterEvent{})); diff != "" { | ||
t.Errorf("Unexpected diff in receivedEvents (-want, +got):\n%s", diff) | ||
wantInFlightEvents := make([]interface{}, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wantInFlightEvents := make([]interface{}, 0) | |
var wantInFlightEvents []interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I need to make the same change in listToValues
and remove the pre-allocation, otherwise I end up with nil
!= []interface{}` differences in some cases - done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmpopts.EquateEmpty
} | ||
sortUIDs := cmpopts.SortSlices(func(a, b types.UID) bool { return a < b }) | ||
if diff := cmp.Diff(uids, actualUIDs, sortUIDs); diff != "" { | ||
t.Fatalf("Unexpected content of inFlightPods (--- want, +++ have):\n%s", diff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.Fatalf("Unexpected content of inFlightPods (--- want, +++ have):\n%s", diff) | |
t.Fatalf("Unexpected content of inFlightPods (-want, +got):\n%s", diff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
I think the bug was on the fact that we assumed that Next would still work (point to the previous next). In any case, the new approach looks good. However, let's try to get into why the pod UID is sometimes not found in |
The previous approach was based on the assumption that an in-flight pod can use the head of the received event list as marker for identifying all events that occur while the pod is in flight. That assumption is incorrect: when that existing element gets removed from the list because all pods that were in-flight when it was received are done, that marker's Next method returns nil and the code which should have seen several concurrent events (if there were any) missed all of those. As a result, a pod with concurrent events could incorrectly get moved to the unschedulable queue where it could got stuck until the next periodic purging after 5 minutes if there was no other event for it. The approach with maintaining a single list of concurrent events can be fixed by inserting each in-flight pod into the list and using that element to identify "more recent" events for the pod.
c97552d
to
14ba163
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
/lgtm
wantActiveQPodNames: []string{pod2.Name}, | ||
wantInFlightPods: nil, | ||
wantInFlightEvents: nil, | ||
queueingHintMap: QueueingHintMapPerProfile{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that we didn't do this in other tests, but we should be listing outputs (wantX
) last.
// Construct a Pod, but don't associate its scheduler failure to any plugin | ||
hpp1 := highPriorityPodInfo.Pod.DeepCopy() | ||
hpp1.Name = "hpp1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. We shouldn't be testing that.
q.AddUnschedulableIfNotPresent(logger, q.newQueuedPodInfo(unschedulablePodInfo.Pod, "fooPlugin"), q.SchedulingCycle()) | ||
expectInFlightPods(t, q, medPriorityPodInfo.Pod.UID, highPriorityPodInfo.Pod.UID, hpp1.UID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg
LGTM label has been added. Git tree hash: bd8b719532b631f6f7bda4f1314bbc7b0bdbfbf7
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, pohly The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The problematic scenario was having one pod in flight, one event in the list, and then detecting a concurrent event for a second pod after the first pod is done. The new test case covers that. To make it work without assumptions about the implementation, the QueuedPodInfo returned by Pop must be the one passed to AddUnschedulableIfNotPresent after (potentially) populating UnschedulablePlugins. This is done via callback functions which bind to the same shared variable.
14ba163
to
c131c92
Compare
/hold pull-kubernetes-linter-hints is right, my new code lacks errors checking:
|
/hold cancel @alculquicondor: needs another LGTM. |
/lgtm |
LGTM label has been added. Git tree hash: f222a7c29149f1a37c581888213d26e5850e1087
|
Cherry-pick is in #120445 |
…3-origin-release-1.28 Automated cherry pick of #120413: scheduler: fix tracking of concurrent events
What type of PR is this?
/kind bug
What this PR does / why we need it:
The previous approach was based on the assumption that an in-flight pod can use the head of the received event list as marker for identifying all events that occur while the pod is in flight. That assumption is incorrect: when that existing element gets removed from the list because all pods that were in-flight when it was received are done, that marker's Next method returns nil and the code which should have seen several concurrent events (if there were any) missed all of those.
Which issue(s) this PR fixes:
A pod with concurrent events could incorrectly get moved to the unschedulable queue where it could got stuck until the next periodic purging after 5 minutes if there was no other event for it.
Special notes for your reviewer:
The approach with maintaining a single list of concurrent events can be fixed by inserting each in-flight pod into the list and using that element to identify "more recent" events for the pod.
This affects Kubernetes >= 1.28. We need to backport the fix.
Does this PR introduce a user-facing change?