scheduler should take action when receiving TASK_LOST for REASON_SLAVE_REMOVED #789

The Mesos master has given up on the slave at this point, and if the slave process starts up again on the same node it will get a new slave ID. All prior tasks are recorded as LOST, so the scheduler should delete the related pods so that they can be rescheduled.
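A minimal sketch of the kind of reaction being asked for, assuming the scheduler can map the removed slave to its node name: on TASK_LOST with REASON_SLAVE_REMOVED, delete every pod still bound to that node so their controllers can recreate them elsewhere. `handleTaskLost` and the use of a current client-go clientset are illustrative, not the project's actual code.

```go
// Hypothetical sketch: react to TASK_LOST / REASON_SLAVE_REMOVED by
// deleting the pods that were bound to the removed slave's node so they
// can be rescheduled. handleTaskLost is an illustrative helper, not the
// project's real API; pod deletion uses a current client-go client.
package sketch

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// handleTaskLost would be called when the scheduler receives TASK_LOST
// with REASON_SLAVE_REMOVED for tasks on the given node.
func handleTaskLost(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	// Find every pod still bound to the node that backed the removed slave.
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		// Deleting the pod lets its replication controller (or the
		// scheduler) recreate it on a slave the master still knows about.
		if err := client.CoreV1().Pods(pod.Namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{}); err != nil {
			log.Printf("failed to delete pod %s/%s: %v", pod.Namespace, pod.Name, err)
		}
	}
	return nil
}
```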
Comments
Found while debugging #778.
Should also delete any mirror pods associated with the slave.
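Mirror pods have no owning controller, so they need explicit cleanup. A rough sketch of that, assuming the node name for the slave is known; the annotation key is the one kubelets set on mirror pods, while `deleteMirrorPods` and the client-go usage are illustrative rather than the project's actual code.

```go
// Hypothetical sketch: mirror pods are the apiserver-visible counterparts
// of static pods and are not owned by any controller, so they are easy to
// miss when cleaning up after a removed slave.
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// mirrorPodAnnotation is the annotation kubelets set on mirror pods.
const mirrorPodAnnotation = "kubernetes.io/config.mirror"

func deleteMirrorPods(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		if _, isMirror := pod.Annotations[mirrorPodAnnotation]; !isMirror {
			continue
		}
		if err := client.CoreV1().Pods(pod.Namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```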
I've tried deleting the mirror pods, and they do go away but if...
The kubelet-executor will stay running until the suicide timeout. This argues in favor of a lower value for the default suicide timeout threshold. #465
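For context, the suicide timeout amounts to the executor arming a timer and exiting if the scheduler stays silent past the deadline. A hypothetical sketch of that mechanism; the type name, helper, and default value below are made up for illustration and are not the project's actual implementation.

```go
// Hypothetical sketch of the suicide-timeout idea: the executor shuts
// itself down if the scheduler has not been heard from before the
// deadline; every scheduler message resets the timer.
package sketch

import (
	"log"
	"os"
	"time"
)

const defaultSuicideTimeout = 20 * time.Minute // illustrative default only

type suicideWatch struct {
	timer *time.Timer
}

// newSuicideWatch arms the watch; if it is never reset, the executor exits.
func newSuicideWatch(timeout time.Duration) *suicideWatch {
	return &suicideWatch{
		timer: time.AfterFunc(timeout, func() {
			log.Print("suicide timeout reached without scheduler contact; exiting")
			os.Exit(1)
		}),
	}
}

// reset is called whenever the executor hears from the scheduler.
func (w *suicideWatch) reset(timeout time.Duration) {
	w.timer.Reset(timeout)
}
```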
FWIW, the apiserver still reports status for the node (I'm wondering if the replication controller's pod GC is supposed to kick in here?).
We need to track additional Mesos state as a node condition. Upstream doesn't provide a mechanism with which we can inject such a condition, so I filed a PR to add one: kubernetes/kubernetes#21521
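A hypothetical sketch of what injecting such a condition could look like once upstream allows it, written against a current client-go API; the "MesosSlaveRemoved" condition type, reason, and message are made up for illustration.

```go
// Hypothetical sketch: record Mesos-specific state as an extra node
// condition. A real implementation would update an existing condition in
// place rather than appending duplicates.
package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func markSlaveRemoved(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	node.Status.Conditions = append(node.Status.Conditions, v1.NodeCondition{
		Type:               v1.NodeConditionType("MesosSlaveRemoved"), // illustrative custom condition
		Status:             v1.ConditionTrue,
		LastTransitionTime: metav1.Now(),
		Reason:             "SlaveRemoved",
		Message:            "mesos master reported REASON_SLAVE_REMOVED for this node's slave",
	})
	// Conditions live in the node's status subresource, so use UpdateStatus.
	_, err = client.CoreV1().Nodes().UpdateStatus(ctx, node, metav1.UpdateOptions{})
	return err
}
```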