master reboot test is failing #262
There are a few different issues at play here, but the core issue seems to be that the checkpointer relies on the kubelet to determine local pod state. Unfortunately, that kubelet API endpoint just reports state from the last time the kubelet was able to successfully contact an api-server; essentially, it is a cache of the last state reported to the API. We didn't have this same issue with the old checkpointer, because it would determine "api is running" by reaching out to an api-server directly, but that isn't reliable in multi-master setups, or for generic checkpointing.

This is a rather disappointing discovery, as it seems there is no easy way to determine local pod state if an api-server is not available. Some options moving forward:
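For context, here is a minimal sketch of the kind of local-state query described above, assuming the kubelet's read-only port (commonly 10255 at the time) and its `/pods` endpoint; the URL and the shape of the handling are assumptions, not the checkpointer's actual code:

```go
// Hypothetical sketch: ask the kubelet's read-only API for its view of
// local pods. Port and endpoint are assumptions; the response mirrors a
// v1.PodList. Note the caveat from the discussion above: this is only the
// kubelet's cached view, which may be stale if the api-server has been
// unreachable.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

func localPods() (map[string]interface{}, error) {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get("http://127.0.0.1:10255/pods")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var podList map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&podList); err != nil {
		return nil, err
	}
	return podList, nil
}

func main() {
	pods, err := localPods()
	if err != nil {
		fmt.Println("kubelet unreachable:", err)
		return
	}
	fmt.Printf("kubelet reported a pod list with %d top-level fields\n", len(pods))
}
```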
I am not too familiar with the checkpointer itself and the constraints that surround it, but I believe that relying on liveness probes could be a decent solution. It is simple, users are familiar with the concept, and they would expect the checkpointer to rely on them. It is also user-customizable in a way. Additionally, the project already vendors Kubernetes, so the functions that achieve this could be re-used directly (less code to maintain, better integration overall). The downside is that if we do not want to add the requirement of defining probes, this is only part of the solution.
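As a rough illustration of what "relying on liveness probes" could look like from the checkpointer's side, here is a hedged sketch that issues an HTTP GET against a probe endpoint. The URL shown (`/healthz` on 8080) is an assumption; a real implementation would build it from the pod's livenessProbe definition, and, as noted below, a successful response only proves that *something* is answering on that port:

```go
// Hypothetical sketch: hit a container's HTTP liveness endpoint directly.
// The URL is an assumption. Note the limitation discussed in this thread:
// a 2xx/3xx response cannot tell us whether the responder is the active
// checkpoint or the real (parent) api-server, only that one of them is up.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func probeHealthy(url string) bool {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 400
}

func main() {
	fmt.Println("healthy:", probeHealthy("http://127.0.0.1:8080/healthz"))
}
```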
The other part of this issue is that api-server behavior changed in v1.5. In earlier versions, the api-server would continually retry binding to particular addresses (:443 & :8080); if they were already in use, it would just try again in 15 seconds. In v1.5, the api-server exits immediately if it is not able to listen on those addresses (and relies on external mechanisms such as systemd/kubelet/etc. to restart it).

In terms of the checkpointer, we need a reliable way to determine "the real api-server is running, or it is trying to run". The "trying to run" part is important, because in that situation we need to remove an active checkpoint so the real server can actually start (and bind on 443/8080). Even if we check the liveness probe, we don't know *what* we are checking (only that someone happens to be listening on 8080). It could be an active checkpoint, or it could be the active parent. All we can determine is that one of them happens to be running, but we can't make very reliable or actionable decisions based on just that information (I think adding the liveness check will work, just not in a particularly clean way). For example, the issue I was seeing was:
One other option which comes to mind after typing this: ensuring that the local docker state (of failed pods) exists for a longer window. I think this might help because I believe the kubelet determines what pods it needs to restart by inspecting information serialized into the docker containers (otherwise a reboot of a kubelet would mean all local state is lost until an api-server is available). So my hunch is that the api-server pod is being garbage collected in the window before the kubelet knows to restart it. If we leave that state around longer, we could have a better recovery window.
Will experiment with waiting for a file lock on API server start.
For posterity: there is only minimal state actually stored with the docker containers: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockertools/labels.go#L70 So it's not actually possible to recover local state from this info alone (it is mapped to the internal kubelet pod state).

Another option @Quentin-M and I came up with is to have the api-server use file locks to coordinate between the parent and the checkpoint. This way we don't end up in failure loops when both are running but only one can successfully listen on the host address.
Should be closed by: #264
This commit represents a workaround for kubernetes-retired#262. By maintaining a file lock while the API server is running (either temporary or self-hosted), we prevent the self-hosted API server from starting and trying to bind ports, until the temporary one is stopped. Therefore, we avoid the loop where the self-hosted API server would crash as soon as it is brought up due to the ports already being bound by the stopping temporary server.
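A minimal sketch of the flock-based coordination described above, assuming a Linux host and an agreed-upon lock path (the path shown is hypothetical): whichever api-server instance holds the lock runs, and the other blocks until the lock is released.

```go
// Hypothetical sketch of the file-lock coordination: both the temporary and
// the self-hosted api-server wrappers try to take an exclusive lock on the
// same file before binding ports. The lock path is an assumption.
package main

import (
	"fmt"
	"os"
	"syscall"
)

const lockPath = "/var/lock/api-server.lock" // hypothetical path

func acquireLock() (*os.File, error) {
	f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_RDWR, 0600)
	if err != nil {
		return nil, err
	}
	// LOCK_EX blocks until the other instance releases the lock,
	// i.e. until the temporary api-server has fully stopped.
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		f.Close()
		return nil, err
	}
	return f, nil
}

func main() {
	f, err := acquireLock()
	if err != nil {
		fmt.Println("could not take api-server lock:", err)
		os.Exit(1)
	}
	defer f.Close() // closing the fd releases the flock
	fmt.Println("lock held; safe to bind :443/:8080 and start the api-server")
}
```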
This seemed to slip into the master branch while the tests were temporarily broken over some of the self-hosted-flannel changes. If I reboot a master node, the api-server doesn't come back up. @aaronlevy is already on it and suspects what the problem is.