Podman restart support #2272
Comments
podman doesn't handle restarts by design; it needs to use systemd files for managing containers across restarts. Bear in mind that kind wraps these container technologies: if docker supports something out of the box and podman doesn't, it is not likely that kind is going to work around it. That is by far out of scope for the project. However, we work closely and have a good relationship with both projects, collaborating and opening bugs if necessary. Are you running podman as rootless?
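As a hedged illustration of what "use systemd files" means in practice (not something kind does for you; the container name `kind-control-plane` and a rootless user session are assumptions here):

```shell
# Sketch only: podman has no restart-supervising daemon like dockerd,
# so restart-on-boot is delegated to systemd via a generated unit file.
podman generate systemd --name kind-control-plane --files --new

# Install the generated unit for the current (rootless) user and enable it.
mkdir -p ~/.config/systemd/user
mv container-kind-control-plane.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now container-kind-control-plane.service
```

Note that newer podman versions recommend Quadlet files over `podman generate systemd`, and, as discussed below, restarting the container this way does not by itself guarantee a working cluster.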
Podman also lacks a stable container network identifier, which makes managing Kubernetes nodes across restarts problematic. I don't think anyone is planning to work on this feature or has a plan for how it might be possible.
No, I am running podman as root.
Yes, minikube supports podman and docker using a fork of the kind image.
We don't work on that project, and I don't work on podman support either, so I can't tell you. But I can tell you that podman lacks automatic restart for containers and lacks sufficient networking features to design robust restart: node addresses will be random, and restart support will be a roll of the dice. Stop and start is not what we mean when we say docker has restart support; that has a different tracking issue that nobody has contributed to investigating thus far: #1867
You're welcome to try but we have no such script.
Kubeadm doesn't support this, AIUI. You can't just persist all data and then start a new cluster with it. When stopping and starting (or, in docker, restarting), the data is already persisted on an anonymous volume, but not across clusters. We are focused on making starting clusters cheap and quick so tests can be run from a clean state; we don't recommend keeping clusters permanently.
Thank you @BenTheElder for explaining things in the earlier post! Do you think it will be (or maybe it already is) possible to declare required parameters in the config YAML file? Say, if I want to restart a multi-node cluster running on podman, then in addition to the number of nodes I could declare static IP addresses per node, and so on. In other words, if podman doesn't provide this functionality, is there any way to allow users to make further configuration changes in order to compensate?
You have to use
Hi @BenTheElder, could you explain "Node addresses will be random and restart support will be a roll of the dice"? I created an issue in the Podman repository to be able to handle kind's requirements, but it's not clear what kind is expecting from the Podman side.
Podman networking has changed a lot over the past few years, but historically container IPs are random on startup, and podman lacked an equivalent of docker's embedded DNS resolver with resolvable container names. I don't think it's appropriate to file a bug against podman for kind unless there's a specific bug. As you saw in #2998, the other reason we haven't had a restart policy for podman is that podman didn't support them meaningfully. That has changed a bit.
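For context on the DNS point above, a rough sketch of the current podman behavior (the network and container names here are made up for illustration; this assumes a netavark/aardvark-dns setup, where user-created networks have name resolution enabled by default, unlike the default `podman` network):

```shell
# Create a user-defined network; on netavark setups this enables
# aardvark-dns, so container names resolve inside the network,
# roughly matching docker's embedded DNS behavior.
podman network create kindnet

# Start a container on that network and resolve it by name from another.
podman run -d --name web --network kindnet docker.io/library/nginx
podman run --rm --network kindnet docker.io/library/alpine ping -c 1 web
```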
Hi @alvinsw, I'm also facing the same error: after creating a kind cluster using podman, when we restart podman (stop and start), the kind cluster is not able to reach the target endpoint. We migrated around 1000 developer machines from docker to podman, so this is high priority. Please let me know if you find any workaround. This is my support ticket: #3473
Unfortunately podman and docker are NOT direct substitutes, and we don't have the bandwidth to spend on this ourselves currently. In your issue, the containers are failing to start outright, at which point no kind code is even running, only podman/crun. We'll continue to review suggested approaches to improving the podman implementation in kind, and the subsequent PRs. Related: I think podman has had optional support for resolving container names for a while now; we could consider making this a prerequisite and matching the docker behavior more closely.
I noticed that on a current podman setup the stop command falls back to a SIGKILL of the container. The systemd inside the control-plane container itself waits up to 1m30s for a process (in my case containerd) that does not stop, but the podman stop command sends SIGKILL after 10s. It's obvious what that means. The args when creating the cluster/container could change the default of 10s, for instance to 120s with the argument --stop-timeout=120. This would allow shutting down the cluster gracefully. Better still would be to check why containerd does not return immediately when stopped.
That's not obvious to me, SIGKILL is not even the right signal to tell systemd to exit. https://systemd.io/CONTAINER_INTERFACE/
We could do that, it seems like a behavioral gap versus docker and we should investigate what the actual behavior difference is and try to align them. Help would be welcome identifying what is happening with docker nodes that isn't happening with podman nodes (or perhaps you're running a workload that inhibits shutdown?) |
Just to clarify, podman stop sends the signal that the container has configured (StopSignal), or SIGTERM by default. After the default timeout of 10s it sends SIGKILL. You are right, systemd/init containers should receive a different signal (37/SIGRTMIN+3), so the container creation (e.g. control-plane) should have a --stop-signal= argument. Looking at my control-plane container, it looks like kind (v0.23.0) does not set the right signal (--stop-signal=37) to stop systemd. But the systemd process performs the shutdown with the SIGTERM signal as well, so far; I'm not sure it would make a difference. A quick test with podman kill --signal=37 control-plane does not show one. My current problem is that the shutdown hangs here, and continues after the systemd internal timeout (1min 30s):
And this is just a kind test cluster (single node) with a deployment of httpd:latest (replicas: 2), that's all. To sum up: --stop-timeout wouldn't hurt and would provide a better experience from the user's point of view, and --stop-signal=37 would help comply with systemd. The missing part is the cause of the shutdown delay.
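The flags discussed above can be sketched as follows (illustrative only; the container name `kind-test` is made up, and, as noted below, kind already sets the stop signal in the node image):

```shell
# Lengthen the stop timeout and use systemd's halt signal
# (SIGRTMIN+3, i.e. signal 37) when creating a systemd-based container.
podman run -d --name kind-test \
  --stop-timeout=120 --stop-signal=SIGRTMIN+3 \
  docker.io/kindest/node:v1.21.1

# For an existing container, podman stop also accepts a per-invocation
# timeout before it escalates to SIGKILL.
podman stop --time 120 kind-test
```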
We set this in the image. |
It might not, but we should not set different flags in podman versus docker without understanding whether we're working around a difference in functionality. On the surface they're supposed to be compatible, and kind is less useful as a test tool when the behavior isn't consistent. So before doing that, we want to understand whether this is an expected difference in behavior, whether we're only working around a podman bug, or whether it affects both and we'd only be mitigating podman but not docker. So far I have not seen clusters fail to terminate, which suggests a difference in behavior that is possibly a bug, OR it's because of something you're running in the cluster (or something different with your host). Ideally we'd reproduce and isolate which aspect (your config, your host, your workload, docker vs podman) is causing the nodes not to exit, and deal with the root issue instead of changing the behavior of kind podman nodes to work around an issue we don't understand and haven't seen before.
Ooh, I don't know where I looked, definitely not at the right place... it's set :-)
What happened: Cluster does not work anymore after the podman container is restarted (e.g. after host OS boot). The issue is fixed for docker (#148). Is there a plan to support restart for podman in the near future?
What you expected to happen: Cluster should run again after restarting the podman container
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- kind version (use `kind version`): 0.11.0
- Node image (use `kubectl version`): kindest/node:v1.21.1
- Runtime (use `docker info`): podman version 3.1.2
- OS (e.g. from `/etc/os-release`): Latest ArchLinux