This repository has been archived by the owner on May 25, 2023. It is now read-only.

Detailed 'unschedulable' events #468

Merged

Conversation

adam-marek
Contributor

What this PR does / why we need it:
kube-batch currently provides very little information about the resource constraints that keep specific pods from being scheduled. The 'unschedulable' events only state that "there are insufficient resources" for a job without giving any details, which makes it hard to troubleshoot "stuck" jobs without resorting to kube-batch log file analysis.

This PR attempts to expose some of these details to the user and bring the "unschedulable" events more in line with what is available in the stock scheduler.
It does so by introducing the following changes:

  • 'unschedulable' events are reported both for the PodGroup/PDB and the individual member pods
  • 'unschedulable' events provide information about the number of nodes available for scheduling and which resource shortage caused scheduling to fail for the job
  • 'unschedulable' events include the number of gang members that could not be allocated from the existing resources (a rough sketch of such a message follows below)
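
To make the added detail concrete, here is a minimal, hypothetical Go sketch of how such an event message could be assembled. The helper name, parameters, and message wording are illustrative assumptions, not the exact format emitted by this PR.

package main

import "fmt"

// buildUnschedulableMessage is a hypothetical helper that assembles the kind
// of detail this PR surfaces: how many gang members remain unscheduled out of
// the total, and which resource was insufficient on how many candidate nodes.
func buildUnschedulableMessage(pendingTasks, totalTasks, candidateNodes int, scarceResource string) string {
    return fmt.Sprintf("%v/%v tasks in gang unschedulable: insufficient %s on %v node(s)",
        pendingTasks, totalTasks, scarceResource, candidateNodes)
}

func main() {
    // Example: 2 of 3 gang members could not be placed because all 4
    // candidate nodes lacked enough cpu.
    fmt.Println(buildUnschedulableMessage(2, 3, 4, "cpu"))
}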

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 7, 2018
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 7, 2018
@TravisBuddy

Hey @adam-marek,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: 2f8efb00-e2ce-11e8-a645-4b1061a8d41c

// FitDelta computes the delta between the receiver's available
// resources and an operand representing resources being requested. Any
// field that is less than 0 after the operation represents an
// insufficient resource.
func (r *Resource) FitDelta(rr *Resource) *Resource {
Contributor

This should be a helper function in the scheduler instead of a method on Resource; FitDelta is more about the scheduling part.

Contributor Author

makes sense

Contributor Author

On second thought though, it is generic in the sense that it computes the delta or deficit of resources regardless of the context. It also depends on a number of internal details of Resource (such as minMilliCPU, etc.) which would need to be exported if we moved this to a different package.
Thoughts?

Contributor

How is this different from Sub?

Contributor Author

Sub does not allow for negative results (the left operand must be greater than or equal to the right operand).
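
To illustrate the distinction, here is a minimal, runnable sketch using a pared-down stand-in for the Resource type. The real kube-batch Resource has more fields and rounding details (e.g. minMilliCPU), so this is an assumption-laden approximation, not the actual implementation.

package main

import "fmt"

// Resource is a simplified stand-in for kube-batch's api.Resource, used only
// to illustrate the Sub vs. FitDelta semantics discussed above.
type Resource struct {
    MilliCPU float64
    Memory   float64
}

// Sub assumes the receiver covers rr entirely and treats a negative result as
// a programming error.
func (r *Resource) Sub(rr *Resource) *Resource {
    if r.MilliCPU < rr.MilliCPU || r.Memory < rr.Memory {
        panic(fmt.Sprintf("resource is not sufficient to do operation: %v sub %v", r, rr))
    }
    r.MilliCPU -= rr.MilliCPU
    r.Memory -= rr.Memory
    return r
}

// FitDelta deliberately allows fields to go negative: any field below zero
// after the subtraction marks an insufficient resource.
func (r *Resource) FitDelta(rr *Resource) *Resource {
    r.MilliCPU -= rr.MilliCPU
    r.Memory -= rr.Memory
    return r
}

func main() {
    avail := &Resource{MilliCPU: 1000, Memory: 2048}
    req := &Resource{MilliCPU: 2000, Memory: 1024}
    delta := avail.FitDelta(req)
    // MilliCPU is negative here, signalling the CPU deficit; Sub would have panicked.
    fmt.Printf("delta: %+v\n", delta)
}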

@@ -144,8 +145,10 @@ func (gp *gangPlugin) OnSessionOpen(ssn *framework.Session) {

func (gp *gangPlugin) OnSessionClose(ssn *framework.Session) {
    for _, job := range ssn.Jobs {
-        if len(job.TaskStatusIndex[api.Allocated]) != 0 {
-            ssn.Backoff(job, arbcorev1.UnschedulableEvent, "not enough resource for job")
+        if len(job.TaskStatusIndex[api.Pending]) != 0 {
Contributor

When gang scheduling is not met, we need to record an event at the job level; if gang scheduling is not met, the task is allocated but not bound to the host.

Contributor

BTW, we need to support the totalTasksNum > minMemberNum case for elastic jobs, e.g. Spark, in the future.

Contributor Author

When gang scheduling is not met, we need to record an event at the job level; if gang scheduling is not met, the task is allocated but not bound to the host.

Not sure what you mean here. There may be gang jobs that failed scheduling with no Allocated tasks (all Pending), right?

Contributor Author

BTW, we need to support the totalTasksNum > minMemberNum case for elastic jobs, e.g. Spark, in the future.

When calculating the total number of tasks, the size of JobInfo->Tasks is used, which should cover the case you mention, correct?
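
As a rough, self-contained illustration of the counting being discussed (total tasks taken from the job's task list rather than from the gang minimum), consider the toy sketch below; the type and field names are simplified stand-ins, not kube-batch's real JobInfo API.

package main

import "fmt"

// taskStatus is a toy version of the Pending/Allocated statuses referenced in
// the diff above.
type taskStatus int

const (
    pending taskStatus = iota
    allocated
)

// job is a simplified stand-in for JobInfo: MinMembers models the gang
// minimum, Tasks models JobInfo->Tasks.
type job struct {
    Name       string
    MinMembers int
    Tasks      []taskStatus
}

// unschedulableReason reports whether the gang requirement is unmet and, if
// so, builds a message from the total task count, so elastic jobs with more
// tasks than MinMembers are covered as well.
func unschedulableReason(j job) (string, bool) {
    allocatedCount := 0
    for _, s := range j.Tasks {
        if s == allocated {
            allocatedCount++
        }
    }
    if allocatedCount >= j.MinMembers {
        return "", false // gang requirement met
    }
    return fmt.Sprintf("%v/%v tasks in gang unschedulable",
        len(j.Tasks)-allocatedCount, len(j.Tasks)), true
}

func main() {
    j := job{Name: "spark-job", MinMembers: 3, Tasks: []taskStatus{allocated, pending, pending, pending}}
    if msg, unschedulable := unschedulableReason(j); unschedulable {
        fmt.Println(j.Name + ": " + msg)
    }
}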

@@ -550,5 +550,11 @@ func (sc *SchedulerCache) Backoff(job *arbapi.JobInfo, event arbcorev1.Event, re
        return fmt.Errorf("no scheduling specification for job")
    }

+    for _, tasks := range job.TaskStatusIndex {
+        for _, t := range tasks {
+            sc.recorder.Eventf(t.Pod, v1.EventTypeWarning, string(event), reason)
Contributor

Why do we record an event for all tasks/pods?

Contributor Author

This is done to match the stock scheduler's behaviour, which sets the norm in the K8s world in terms of users' expectations; having no events recorded for Pods that "hang" in the Pending state for a long time might seem odd.
It also aids quicker troubleshooting, as one does not need to track down the correct higher-level entity to find out why a pod appears to be stuck.
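
For context, the sketch below shows how a per-pod warning event can be emitted through client-go's event recorder, mirroring the loop in the diff above; the broadcaster wiring, helper name, and message text here are illustrative assumptions, not the scheduler cache's actual setup.

package main

import (
    "fmt"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes/scheme"
    "k8s.io/client-go/tools/record"
)

// recordForAllPods emits a warning event for every pod of a job, so each
// Pending pod shows a reason in `kubectl describe pod`, matching the stock
// scheduler behaviour described above.
func recordForAllPods(recorder record.EventRecorder, pods []*v1.Pod, event, reason string) {
    for _, p := range pods {
        recorder.Eventf(p, v1.EventTypeWarning, event, reason)
    }
}

func main() {
    // A local broadcaster that just logs is enough for a demonstration.
    broadcaster := record.NewBroadcaster()
    broadcaster.StartLogging(func(format string, args ...interface{}) {
        fmt.Printf(format+"\n", args...)
    })
    recorder := broadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "kube-batch"})

    pod := &v1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "demo-pod", Namespace: "default"}}
    recordForAllPods(recorder, []*v1.Pod{pod}, "Unschedulable", "0/4 nodes available: insufficient cpu")
}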

@TravisBuddy

Hey @adam-marek,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: f19f4ea0-e348-11e8-8936-1328ec1a4b2b

@k82cn
Contributor

k82cn commented Nov 13, 2018

/approve

@adam-marek, thanks very much for your contribution, we definitely need this PR :)
I'm going to review it again when I'm back from KubeCon China; sorry for the delay.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adam-marek, k82cn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 13, 2018
@k82cn
Contributor

k82cn commented Nov 16, 2018

/lgtm

LGTM overall, let's get it merged :)

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 16, 2018
@k8s-ci-robot k8s-ci-robot merged commit 70177dc into kubernetes-retired:master Nov 16, 2018
kevin-wangzefeng pushed a commit to kevin-wangzefeng/scheduler that referenced this pull request Jun 28, 2019
…source-info

Detailed 'unschedulable' events