-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drain v2: add controlled draining #4010
Conversation
cfc5e45
to
94fb0c0
Compare
f78f580
to
af3e8d2
Compare
api/nodes.go
Outdated
// will disable draining. | ||
DrainSpec *DrainSpec | ||
|
||
// MarkEligible marks the node as eligible if removing the drain strategy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// MarkEligible marks the node as eligible for scheduling if removing the drain strategy.
type DrainSpec struct { | ||
// Deadline is the duration after StartTime when the remaining | ||
// allocations on a draining Node should be told to stop. | ||
Deadline time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its possible I missed it because of the large feature branch, but if not where is the validation logic for this? e,g make sure that deadlne is >=0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is the validation logic for this? e,g make sure that deadlne is >=0
In the UpdateDrain RPC endpoint itself. A Deadline <= 0 is a force drain: https://github.com/hashicorp/nomad/pull/4010/files#diff-57d6a6426f4ff4ab5d9eecfbca4399daR446
Its possible I missed it because of the large feature branch
I feel like it would be impossible for you to catch it in such a huge PR so no worries.
if alloc.Job.Type == structs.JobTypeBatch && alloc.RanSuccessfully() { | ||
untainted[alloc.ID] = alloc | ||
// Non-terminal allocs that should migrate should always migrate | ||
if alloc.DesiredTransition.ShouldMigrate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also check the alloc currently is in state DesiredStatusRun, because what if the same alloc is revisited again by the reconciler and it has already been migrated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alloc.TerminalStatus()
above will be true if DesiredStatus != "run"
so the alloc will not be migrated.
fix build tag
filterByTainted node should always migrate non-terminal migrating allocs
Chan must be buffered to avoid skipping triggering altogether Also made timing in a test a bit more lenient
Test job watcher
Also drop the New func as it's easy to swap the order of arguments since they're both strings.
drainer was unsetting drain before fsm could read written value
Node drainer would throw off the index checks
f192947
to
4da38fb
Compare
Also switch to require and add t.Helper to appropriate funcs.
4da38fb
to
e10883c
Compare
HealthyDeadline *time.Duration `mapstructure:"healthy_deadline"` | ||
} | ||
|
||
func DefaultMigrateStrategy() *MigrateStrategy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer this being defined in structs like the rest of them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we prefer calling the structs version from here or hardcoding defaults in Canonicalize? I see a variety of approaches in this file.
|
||
// Neither deployments nor migrations care about system jobs so never | ||
// watch their health | ||
if alloc.Job.Type == structs.JobTypeSystem { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be != Service?
Can't have migration on batch or system and update isn't defined on batch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! I didn't realize batch didn't support update
. Our docs don't mention that: https://www.nomadproject.io/docs/job-specification/update.html
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Fixes #2736
Features in this PR:
migrate
stanza to jobs similar toupdate
to control the rate at which jobs are drained from draining nodes.testlog
improvement: setNOMAD_TEST_STDOUT=1
to output test log output to stdout instead oft.Logf
(handy for iterating on a slow test over and over)mock.BatchJob()
- does what it says on the tin. Easier than editingmock.Job()
Future PRs will extend drain testing and add documentation.