WIP: Smart Node Drain #4005
This PR allows marking a node as eligible for scheduling while toggling drain. By default the `nomad node drain -disable` command will mark the node as eligible, but the drainer will maintain ineligibility.
drainv2: Job Watcher Testing
allow -detach like other commands
Also delay "node complete" after the node has been marked complete to capture a few more alloc events. There are other ways to implement this that could trade off correctness for responsiveness, since technically a node is considered drained when all of its allocs have been marked to stop, not when they have actually stopped (which may not happen for a long time).
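The distinction in that commit message, between allocs that are marked to stop and allocs that have actually stopped, can be illustrated with a minimal Go sketch. The `Alloc` type and both helper functions are hypothetical names made up for this illustration; they are not Nomad's own structs or drainer code.

```go
package main

import "fmt"

// Alloc is a hypothetical, simplified allocation record used only for
// illustration; it is not Nomad's real allocation struct.
type Alloc struct {
	ID            string
	DesiredStop   bool // the alloc has been marked to stop
	ClientStopped bool // the alloc has actually terminated on the client
}

// markedDrained reports whether every alloc has been marked to stop.
// This is the condition under which the node counts as drained.
func markedDrained(allocs []Alloc) bool {
	for _, a := range allocs {
		if !a.DesiredStop {
			return false
		}
	}
	return true
}

// fullyStopped reports whether every alloc has actually terminated,
// which may happen much later than being marked to stop.
func fullyStopped(allocs []Alloc) bool {
	for _, a := range allocs {
		if !a.ClientStopped {
			return false
		}
	}
	return true
}

func main() {
	allocs := []Alloc{
		{ID: "a1", DesiredStop: true, ClientStopped: true},
		{ID: "a2", DesiredStop: true, ClientStopped: false},
	}
	fmt.Println("marked drained:", markedDrained(allocs))
	fmt.Println("fully stopped:", fullyStopped(allocs))
}
```

In this sketch the node would already count as drained while `a2` is still shutting down, which is the window the delayed "node complete" message tries to cover.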
…nto f-drainv2-node-drainer
@@ -455,9 +455,9 @@ func mergeAutocompleteFlags(flags ...complete.Flags) complete.Flags {
return merged
}

// sanitizeUUIDPrefix is used to sanitize a UUID prefix. The returned result
// sanitizeUUIDPrefix is used to sanatize a UUID prefix. The returned result
bad merge
@@ -735,6 +741,82 @@ func TestParse(t *testing.T) {
},
false,
},
{
duplicated. bad merge?
I took another crack at rebasing to try to remove the ugly wip commits and fix up the couple of (minor) issues I found in this version: #4010. The other benefit of the alternate approach is that it's rebased on master and should be easy to continue to rebase if necessary (it shouldn't be 😬). I looked at
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
This PR introduces a new stanza called `migrate` that allows a job to specify its run-time requirements, which are now taken into account during cluster maintenance operations. The new node drainer will take the migrate strategy into account and attempt to avoid service downtime.
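As a rough illustration of how a drainer might honor a per-job migrate strategy, the Go sketch below limits how many allocations are migrated at once. Everything in it is an assumption made for the example: the `MigrateStrategy` type, its `MaxParallel` field, and the `nextBatch` helper are hypothetical and are not taken from this PR's code.

```go
package main

import "fmt"

// MigrateStrategy is a hypothetical stand-in for the settings a migrate
// stanza might carry; the MaxParallel field is an assumption of this sketch.
type MigrateStrategy struct {
	MaxParallel int // how many allocs may be migrating at the same time
}

// nextBatch returns the pending allocs that may start migrating now,
// given how many migrations are already in flight. It returns nil when
// the strategy leaves no free slots.
func nextBatch(pending []string, inFlight int, s MigrateStrategy) []string {
	slots := s.MaxParallel - inFlight
	if slots <= 0 {
		return nil
	}
	if slots > len(pending) {
		slots = len(pending)
	}
	return pending[:slots]
}

func main() {
	s := MigrateStrategy{MaxParallel: 2}
	pending := []string{"web-1", "web-2", "web-3"}

	// With one migration already running, only one more may start.
	fmt.Println(nextBatch(pending, 1, s))
}
```

Capping concurrent migrations this way is one simple means of keeping part of a service running while a node is drained; a real drainer would also need to wait for replacement allocations to become healthy before freeing a slot.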