System jobs are not rescheduled when resources become available #4072

Closed
maihde opened this issue Mar 29, 2018 · 11 comments

Comments

@maihde
Contributor

maihde commented Mar 29, 2018

Nomad version

0.7.1

Operating system and Environment details

Linux

Issue

When a system job is evaluated and a node has no free resources, the task is queued. If resources on that node become available later, the system task should be allocated, but it stays queued instead.

Reproduction steps

  1. Start a Nomad cluster
  2. Run jobs to consume all of the CPU resources
  3. Run a system job
  4. Notice that the system jobs are Queued
  5. Stop the job started at step 2
  6. Notice that the system jobs remain Queued

Expected behaviour:
System jobs are removed from the Queued state and go into the Running state

Observed behaviour:
System jobs remain in the Queued state.
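
To make the expected behaviour concrete, here is a toy Go model of the reproduction steps. It is only a sketch and not Nomad code: the `Node` type, its fields, and the `Fits`/`Place` helpers are hypothetical names invented for illustration, and CPU is the only resource considered.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a client node; it is not a Nomad type.
type Node struct {
	TotalCPU int // CPU (MHz) the node offers
	UsedCPU  int // CPU (MHz) claimed by running allocations
}

// Fits reports whether a task asking for cpu MHz can be placed on the node.
func (n *Node) Fits(cpu int) bool {
	return n.UsedCPU+cpu <= n.TotalCPU
}

// Place tries to place a task and returns the state it ends up in.
func (n *Node) Place(cpu int) string {
	if !n.Fits(cpu) {
		return "queued"
	}
	n.UsedCPU += cpu
	return "running"
}

func main() {
	node := &Node{TotalCPU: 1000}

	fmt.Println(node.Place(1000)) // step 2: filler job takes all CPU, prints "running"
	fmt.Println(node.Place(100))  // steps 3-4: system job does not fit, prints "queued"

	node.UsedCPU -= 1000 // step 5: filler job is stopped

	// Expected: the system job is evaluated again and now fits, prints "running".
	fmt.Println(node.Place(100))
}
```

The bug described here corresponds to the last `Place` call never happening: nothing triggers a new placement attempt after the resources are freed.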

@maihde
Contributor Author

maihde commented Mar 29, 2018

By porting the blocked evaluation logic from generic_scheduler.go into system_scheduler.go I'm able to get things to work as expected.

https://github.com/maihde/nomad/tree/issue-4072
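
For readers unfamiliar with the idea, a blocked evaluation can be pictured roughly as below. This is a simplified illustration under my own assumptions, not the code in the branch above and not Nomad's actual data structures: `Eval`, `BlockedEvals`, `Block`, `Unblock`, and `Pending` are hypothetical names. The sketch parks evaluations that failed placement and hands them back for another attempt when capacity on their node changes.

```go
package main

import "fmt"

// Eval is a hypothetical, simplified evaluation record (not a Nomad struct).
type Eval struct {
	JobID  string
	NodeID string
	CPU    int // CPU (MHz) the job needs on that node
}

// BlockedEvals holds evaluations that failed placement, keyed by node ID.
type BlockedEvals struct {
	byNode map[string][]Eval
}

// NewBlockedEvals returns an empty tracker.
func NewBlockedEvals() *BlockedEvals {
	return &BlockedEvals{byNode: make(map[string][]Eval)}
}

// Block records an evaluation that could not be placed.
func (b *BlockedEvals) Block(e Eval) {
	b.byNode[e.NodeID] = append(b.byNode[e.NodeID], e)
}

// Unblock is called when capacity on a node changes. It returns the
// evaluations that fit into freeCPU and keeps the others blocked.
func (b *BlockedEvals) Unblock(nodeID string, freeCPU int) []Eval {
	var ready, still []Eval
	for _, e := range b.byNode[nodeID] {
		if e.CPU <= freeCPU {
			freeCPU -= e.CPU
			ready = append(ready, e)
		} else {
			still = append(still, e)
		}
	}
	if len(still) == 0 {
		delete(b.byNode, nodeID)
	} else {
		b.byNode[nodeID] = still
	}
	return ready
}

// Pending reports how many evaluations are still blocked on a node.
func (b *BlockedEvals) Pending(nodeID string) int {
	return len(b.byNode[nodeID])
}

func main() {
	blocked := NewBlockedEvals()
	blocked.Block(Eval{JobID: "sys", NodeID: "node-1", CPU: 100})

	// Resources were freed on node-1, so the parked evaluation is retried.
	ready := blocked.Unblock("node-1", 1000)
	fmt.Println(len(ready), blocked.Pending("node-1")) // prints "1 0"
}
```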

@dadgar
Contributor

dadgar commented Mar 29, 2018

@maihde I think you identified the problem correctly. We do want to bring a lot of the improvements from the generic scheduler to the system scheduler. Hopefully we can use some of your work!

@maihde
Contributor Author

maihde commented Mar 30, 2018

@dadgar thanks for the feedback. If you would like a pull-request for my patch, let me know.

@jippi
Contributor

jippi commented May 17, 2018

@maihde would be nice to get that PR up

@maihde
Contributor Author

maihde commented May 17, 2018

@jippi just opened up. Thanks.

@mwalters-workmarket

Is there a timeline for the system scheduler rework that will include the fix for this?

@maihde
Contributor Author

maihde commented Aug 1, 2019

@jippi @mwalters-workmarket my original pull request was closed because a major refactor of the schedulers was planned and it was deemed easier to start afresh. If this feature is still on your roadmap, I'd be happy to implement it against the current master and provide a new pull request.

@burdandrei
Contributor

would be cool to revive this

@notnoop
Contributor

notnoop commented Jul 9, 2020

Following up here late, sorry. Is this still a relevant issue? I believe Nomad 0.9.4 fixed this issue with #5900. Can someone confirm?

@notnoop
Contributor

notnoop commented Jul 16, 2020

Closing this ticket, as it seems fixed and I'm unable to reproduce it now. Please re-open or open a new one if you believe this to be an error. Thanks!

@notnoop notnoop closed this as completed Jul 16, 2020
@github-actions

github-actions bot commented Nov 4, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 4, 2022