scheduler: warn when system jobs cannot place an alloc #11111

Merged on Sep 13, 2021 (6 commits)

@notnoop notnoop commented Aug 31, 2021

When a system or sysbatch job specifies constraints that none of the current nodes meet, report a warning to the user.

Also, for a sysbatch job, mark the job as dead if no allocation is placed at all. The behavior of system jobs isn't affected by this.

A sample run would look like:

$ nomad job run ./example.nomad
==> 2021-08-31T16:57:35-04:00: Monitoring evaluation "b48e8882"
    2021-08-31T16:57:35-04:00: Evaluation triggered by job "example"
==> 2021-08-31T16:57:36-04:00: Monitoring evaluation "b48e8882"
    2021-08-31T16:57:36-04:00: Evaluation status changed: "pending" -> "complete"
==> 2021-08-31T16:57:36-04:00: Evaluation "b48e8882" finished with status "complete" but failed to place all allocations:
    2021-08-31T16:57:36-04:00: Task Group "cache" (failed to place 1 allocation):
      * Constraint "${meta.tag} = bar": 2 nodes excluded by filter
      * Constraint "${attr.kernel.name} = linux": 1 nodes excluded by filter

$ nomad job status example
ID            = example
Name          = example
Submit Date   = 2021-08-31T16:57:35-04:00
Type          = sysbatch
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = dead
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         0        0       0         0

Allocations
No allocations placed
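
For reviewers who want the intended behavior in one place, here is a rough sketch of the rule described above. It is not the scheduler code from this PR: `Node`, `Constraint`, `Place`, `ShouldWarn`, and `ShouldMarkDead` are hypothetical names, constraints are simplified to plain equality checks, and each node is counted against the first constraint it fails.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a cluster node (not Nomad's real type).
type Node struct {
	ID    string
	Attrs map[string]string
}

// Constraint is a simplified equality constraint: Attrs[Attr] must equal Value.
type Constraint struct {
	Attr  string
	Value string
}

// String renders the constraint in the same style as the CLI output above.
func (c Constraint) String() string {
	return fmt.Sprintf("${%s} = %s", c.Attr, c.Value)
}

// PlacementResult records feasible nodes and per-constraint exclusions.
type PlacementResult struct {
	Placed   int
	Filtered map[string]int
}

// Place checks every node against the constraints in order. A node is
// counted against the first constraint it fails (an assumption of this sketch).
func Place(nodes []Node, constraints []Constraint) PlacementResult {
	res := PlacementResult{Filtered: map[string]int{}}
	for _, n := range nodes {
		feasible := true
		for _, c := range constraints {
			if n.Attrs[c.Attr] != c.Value {
				res.Filtered[c.String()]++
				feasible = false
				break
			}
		}
		if feasible {
			res.Placed++
		}
	}
	return res
}

// ShouldWarn reports whether constraints excluded every node.
func ShouldWarn(res PlacementResult) bool {
	return res.Placed == 0 && len(res.Filtered) > 0
}

// ShouldMarkDead applies only to sysbatch jobs with no placements;
// system jobs are left alone.
func ShouldMarkDead(jobType string, placed int) bool {
	return jobType == "sysbatch" && placed == 0
}

func main() {
	nodes := []Node{
		{ID: "n1", Attrs: map[string]string{"meta.tag": "foo", "attr.kernel.name": "linux"}},
		{ID: "n2", Attrs: map[string]string{"meta.tag": "foo", "attr.kernel.name": "linux"}},
		{ID: "n3", Attrs: map[string]string{"meta.tag": "bar", "attr.kernel.name": "darwin"}},
	}
	constraints := []Constraint{
		{Attr: "meta.tag", Value: "bar"},
		{Attr: "attr.kernel.name", Value: "linux"},
	}

	res := Place(nodes, constraints)
	if ShouldWarn(res) {
		// Iterate the constraint slice (not the map) for a stable order.
		for _, c := range constraints {
			fmt.Printf("* Constraint %q: %d nodes excluded by filter\n", c.String(), res.Filtered[c.String()])
		}
	}
	fmt.Println("mark sysbatch dead:", ShouldMarkDead("sysbatch", res.Placed))
	fmt.Println("mark system dead:", ShouldMarkDead("system", res.Placed))
}
```

The three nodes are chosen so the exclusion counts line up with the sample run (2 and 1); the real scheduler tracks these counts in its own metrics and may attribute exclusions differently.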

@notnoop notnoop requested review from schmichael and lgfa29 August 31, 2021 21:01
@notnoop notnoop self-assigned this Aug 31, 2021
lgfa29 commented Sep 2, 2021

@notnoop I pushed a changelog entry, but I'm not sure if it's good. Feel free to change it before merge.

@schmichael schmichael left a comment

@notnoop is out, so I'll try to move this forward! I'm not sure my comment is a blocker, but I'd like to check it out before merging.

@mikenomitch mikenomitch added this to the 1.1.5 milestone Sep 7, 2021
Defensively deep copy AllocMetric to avoid side effects from shared map
references.
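
To illustrate the concern behind this commit, here is a minimal sketch of how a struct copy that shares a map can leak mutations back to the original, and how a deep copy avoids it. `Metric` is a hypothetical, trimmed-down stand-in; it is not the actual `AllocMetric` definition, and its fields and `Copy` method are assumptions made for this example.

```go
package main

import "fmt"

// Metric is a hypothetical stand-in for a placement-metrics struct.
type Metric struct {
	NodesEvaluated     int
	ConstraintFiltered map[string]int
}

// Copy returns a deep copy: the map is re-allocated so the copy and the
// original no longer share the same underlying storage.
func (m *Metric) Copy() *Metric {
	if m == nil {
		return nil
	}
	c := *m // copies scalar fields; the map field still aliases the original here
	if m.ConstraintFiltered != nil {
		c.ConstraintFiltered = make(map[string]int, len(m.ConstraintFiltered))
		for k, v := range m.ConstraintFiltered {
			c.ConstraintFiltered[k] = v
		}
	}
	return &c
}

func main() {
	key := "${meta.tag} = bar"
	orig := &Metric{NodesEvaluated: 3, ConstraintFiltered: map[string]int{key: 2}}

	// Shallow copy: the struct is copied, but both values point at one map.
	shallow := *orig
	shallow.ConstraintFiltered[key]++ // side effect: orig now sees 3

	// Deep copy: mutations stay local to the copy.
	deep := orig.Copy()
	deep.ConstraintFiltered[key]++ // orig is unchanged by this

	fmt.Println(orig.ConstraintFiltered[key], deep.ConstraintFiltered[key]) // prints: 3 4
}
```

The shallow copy changes the original's count from 2 to 3, which is the kind of side effect the commit guards against; the deep copy then moves independently.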
While I don't think this fully encompasses the changes, other bits
like marking sysbatch as dead immediately are new, so they haven't
changed relative to a previous release.
@schmichael schmichael merged commit 24b2770 into main Sep 13, 2021
@schmichael schmichael deleted the b-system-no-match branch September 13, 2021 23:06
@schmichael schmichael modified the milestones: 1.1.5, 1.2.0 Sep 17, 2021
@github-actions
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2022