[experimental] Simple container rescheduling on node failure #1578
Conversation
if label, ok := c.Labels[SwarmLabelNamespace+".reschedule-policy"]; ok {
	reschedulePolicy = label
}

// parse affinities/constraints from env (ex. docker run -e affinity:container==redis -e affinity:image==nginx -e constraint:region==us-east -e constraint:storage==ssd)
nit: update comment.
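A minimal sketch of how the label lookup above might be factored into a helper with a default fallback. The `SwarmLabelNamespace` constant and the labels map mirror the diff; the `"off"` default value and the helper name are assumptions for illustration, not the PR's actual API.

```go
package main

import "fmt"

const SwarmLabelNamespace = "com.docker.swarm"

// reschedulePolicy returns the policy stored in the container's labels,
// or the provided default when no policy label is set.
func reschedulePolicy(labels map[string]string, def string) string {
	if label, ok := labels[SwarmLabelNamespace+".reschedule-policy"]; ok {
		return label
	}
	return def
}

func main() {
	labels := map[string]string{
		SwarmLabelNamespace + ".reschedule-policy": "on-node-failure",
	}
	fmt.Println(reschedulePolicy(labels, "off")) // on-node-failure
	fmt.Println(reschedulePolicy(nil, "off"))    // off
}
```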
case "engine_disconnect": | ||
go w.rescheduleContainers(e.Engine) | ||
case "die", "destroy", "kill", "oom", "start", "stop", "rename": | ||
go w.reschedulePendingContainers() |
Any reason this triggers a whole-cluster reschedule instead of just this engine? There could be a lot of events on a big cluster. Isn't checking just this container enough?
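A simplified sketch of the event dispatch under discussion. The `Watchdog`, `Engine`, and `Event` types here are stand-ins for the real swarm types; the per-container variant in the second branch illustrates the reviewer's suggestion of scoping the pending-reschedule check to the container named in the event instead of scanning the whole cluster.

```go
package main

type Engine struct{ Name string }

type Event struct {
	Status      string
	Engine      *Engine
	ContainerID string
}

type Watchdog struct{}

func (w *Watchdog) rescheduleContainers(e *Engine)       { /* move containers off the failed engine */ }
func (w *Watchdog) reschedulePendingContainers()         { /* scan every engine for pending containers */ }
func (w *Watchdog) reschedulePendingContainer(id string) { /* check only the container from the event */ }

// Handle dispatches cluster events to the rescheduling logic.
func (w *Watchdog) Handle(e Event) {
	switch e.Status {
	case "engine_disconnect":
		go w.rescheduleContainers(e.Engine)
	case "die", "destroy", "kill", "oom", "start", "stop", "rename":
		// The PR rescans the whole cluster here; a narrower alternative would be:
		go w.reschedulePendingContainer(e.ContainerID)
	}
}

func main() {
	w := &Watchdog{}
	w.Handle(Event{Status: "engine_disconnect", Engine: &Engine{Name: "node-1"}})
}
```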
@vieux we are working on the forced cleanup fix in moby/libnetwork#862.
Signed-off-by: Andrea Luzzardi <[email protected]>
Add rescheduling integration tests. Signed-off-by: Andrea Luzzardi <[email protected]>
fix tests and keep swarm id; remove duplicate on node reconnect; explicit failure. Signed-off-by: Victor Vieux <[email protected]>
Signed-off-by: Victor Vieux <[email protected]>
@@ -0,0 +1,81 @@
# Docker Experimental Features
@vieux This looks like it shouldn't be here; it is a backup file.
Signed-off-by: Victor Vieux <[email protected]>
LGTM
LGTM.
LGTM
[experimental] Simple container rescheduling on node failure
See #1488 for details
When a node goes down, swarm will try to reschedule its containers on another machine.
Depends on moby/moby#19001 for IP stability.
Depends on #1569 for proper cleanup of old containers.
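A minimal sketch of the rescheduling flow described above: when an engine is reported down, containers that opted in via an "on-node-failure" policy are re-created on another healthy engine. The `Cluster`, `Engine`, and `Container` types and the policy string are assumptions for illustration, not the PR's actual implementation.

```go
package main

import "fmt"

type Container struct {
	Name   string
	Policy string // e.g. "on-node-failure" or "off"
}

type Engine struct {
	Name       string
	Healthy    bool
	Containers []Container
}

type Cluster struct{ Engines []*Engine }

// rescheduleFrom moves opted-in containers from a failed engine to the
// first healthy engine found in the cluster.
func (c *Cluster) rescheduleFrom(failed *Engine) {
	var target *Engine
	for _, e := range c.Engines {
		if e != failed && e.Healthy {
			target = e
			break
		}
	}
	if target == nil {
		return // nowhere to reschedule
	}
	var remaining []Container
	for _, ctr := range failed.Containers {
		if ctr.Policy == "on-node-failure" {
			target.Containers = append(target.Containers, ctr)
			fmt.Printf("rescheduled %s from %s to %s\n", ctr.Name, failed.Name, target.Name)
		} else {
			remaining = append(remaining, ctr)
		}
	}
	failed.Containers = remaining
}

func main() {
	down := &Engine{Name: "node-1", Containers: []Container{{Name: "redis", Policy: "on-node-failure"}}}
	up := &Engine{Name: "node-2", Healthy: true}
	cluster := &Cluster{Engines: []*Engine{down, up}}
	cluster.rescheduleFrom(down)
}
```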