Add the ability to disable rescheduling of lost tasks. Implements issue #10366 #16867
Hi @DominicLavery! Sorry about the delayed review on this. The issue #10366 was really still a little bit in the design stage, so there are some open questions about how this should interact with the existing (In the meantime, the GitHub Actions run didn't happen here, I think because this PR came in before we fixed the "on PR" bits. You might want to rebase on
Hey @tgross. Thanks for the response. Sorry to skip ahead in the process a bit, we had a bit of a pressing need for this kind of feature. I'd be happy to rework it based off of the results of your discussions. I agree that I've rebased (and re-enabled GH actions on the fork 😅) but I only get the one check, I'm not sure if that is expected.
Hey @DominicLavery! Thank you for all the work on this issue. On reviewing the code, it all seems to be working, but on a closer inspection, there are two problems:
We think this is a very interesting feature and there is an internal discussion about how to move on with it.
Hey @Juanadelacuesta! Thanks for the feedback! On point 2: I hadn't considered that! Happy to work on point 2, and @tgross's point about
This adds the ability to prevent tasks being rescheduled when their allocations are lost, as suggested in #10366.
Jobs with lost allocations are left in the running state and require intervention. Currently, jobs can be stopped manually via the CLI or API, e.g. `nomad job stop`. A future improvement would be to implement `nomad alloc reschedule` as suggested in the issue.
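For anyone automating the manual intervention described above, here is a minimal sketch of the API route. It is not part of this PR. It assumes Nomad's job stop endpoint, `DELETE /v1/job/:job_id`, and an agent on the default address `http://127.0.0.1:4646`. The helper name `newStopJobRequest` and the job ID `example` are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newStopJobRequest is a hypothetical helper (not part of Nomad or this PR).
// It builds the HTTP request that stops a job, assuming the
// DELETE /v1/job/:job_id endpoint of Nomad's HTTP API.
// addr is the agent address, e.g. "http://127.0.0.1:4646".
func newStopJobRequest(addr, jobID string) (*http.Request, error) {
	// PathEscape guards against job IDs containing reserved characters.
	endpoint := addr + "/v1/job/" + url.PathEscape(jobID)
	return http.NewRequest(http.MethodDelete, endpoint, nil)
}

func main() {
	// "example" is a placeholder job ID.
	req, err := newStopJobRequest("http://127.0.0.1:4646", "example")
	if err != nil {
		fmt.Println("building request:", err)
		return
	}
	// Only print the request here; sending it with
	// http.DefaultClient.Do(req) would actually stop the job.
	fmt.Println(req.Method, req.URL.String())
}
```

The sketch only builds and prints the request, so it can be run without a cluster. The CLI equivalent is `nomad job stop example`.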