
Podman restart support #2272

Open

alvinsw opened this issue May 24, 2021 · 16 comments
Labels
area/provider/podman: Issues or PRs related to podman
kind/external: upstream bugs
kind/feature: Categorizes issue or PR as related to a new feature.

Comments

@alvinsw

alvinsw commented May 24, 2021

What happened: The cluster does not work anymore after the podman container is restarted (e.g. after a host OS reboot). The issue is fixed for docker (#148). Is there a plan to support restart for podman in the near future?

What you expected to happen: The cluster should run again after restarting the podman container.

How to reproduce it (as minimally and precisely as possible):

kind create cluster
podman stop kind-control-plane
podman start kind-control-plane

Anything else we need to know?:

Environment:

  • kind version: (use kind version): 0.11.0
  • Kubernetes version: (use kubectl version): kindest/node:v1.21.1
  • Docker version: (use docker info): podman version 3.1.2
  • OS (e.g. from /etc/os-release): Latest ArchLinux
@alvinsw alvinsw added the kind/bug Categorizes issue or PR as related to a bug. label May 24, 2021
@aojea
Contributor

aojea commented May 24, 2021

podman doesn't handle restarts by design; it relies on systemd unit files to manage containers across restarts.

https://github.com/containers/podman/blob/master/docs/source/markdown/podman-generate-systemd.1.md
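
For reference, a rough sketch of that approach for a rootful single-node cluster (the flags and the generated unit name below are the documented podman-generate-systemd defaults, not something kind sets up for you, and whether the cluster actually comes back healthy is a separate question per the caveats in this thread):

# generate a unit that stops/starts the existing node container on boot
podman generate systemd --name kind-control-plane --files
mv container-kind-control-plane.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now container-kind-control-plane.service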

Bear in mind that KIND wraps these container technologies. If docker supports something out of the box and podman doesn't, KIND is not likely to work around it; that is far outside the scope of the project. However, we work closely with and have a good relationship with both projects, collaborating and opening bugs when necessary.

Are you running podman as rootless?
If podman support is "experimental", rootless is even "more experimental", so all the "advanced" features may have bugs or simply not be supported at all ...

@aojea aojea added area/provider/podman Issues or PRs related to podman kind/feature Categorizes issue or PR as related to a new feature. kind/external upstream bugs and removed kind/bug Categorizes issue or PR as related to a bug. labels May 24, 2021
@BenTheElder
Member

Podman also lacks a stable container network identifier, which makes managing Kubernetes nodes across restarts problematic.

I don't think anyone is planning to work on this feature or has a plan for how it might be possible.

@alvinsw
Author

alvinsw commented May 25, 2021

No, I am running podman as root; that is, kind create cluster is run by the root user.
Minikube supports podman, and it can still start and stop clusters using podman.
What makes kind different in this case?
After executing podman start kind-control-plane, can we just manually run a script on the running kind-control-plane container to start everything all over again?
Or would it be easier to add a feature where all user data on the kind-control-plane container is persisted on the host machine? That would mean that if you delete and create the cluster again, the new cluster will still have all the k8s objects from the previous cluster.

@BenTheElder
Member

Minikube supports podman, and it can still start and stop clusters using podman.

Minikube supports podman and docker using a fork of the kind image, yes.

What makes kind different in this case?

We don't work on that project. I don't work on podman support either. I can't tell you.

But I can tell you that podman lacks automatic restart for containers and lacks sufficient networking features to design robust restart support. Node addresses will be random, so restart support would be a roll of the dice. Also, stop-and-start is not what we mean when we say docker has restart support; that has a different tracking issue that nobody has contributed to investigating so far: #1867

After executing podman start kind-control-plane, can we just manually run a script on the running kind-control-plane container to start everything all over again?

You're welcome to try but we have no such script.

Or would it be easier to add feature where all user data on kind-control-plane container is persisted in the host machine? This means if you delete and create cluster again, the new cluster will still have all the k8s objects from the previous cluster

Kubeadm doesn't support this, AIUI. You can't just persist all the data and then start a new cluster with it.

When stopping and starting (or, in docker, restarting), the data is already persisted on an anonymous volume (a quick way to see it is sketched below). But not across clusters.

We are focused on making starting clusters cheap and quick so tests can be run from a clean state. We don't recommend keeping clusters permanently.
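
(A quick way to see where that node state lives, using standard podman inspect templating; nothing here is kind-specific:)

podman inspect --format '{{range .Mounts}}{{.Type}} {{.Destination}}{{"\n"}}{{end}}' kind-control-plane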

@vugardzhamalov

Thank you @BenTheElder for explaining things in the earlier post!

Do you think it will be (or maybe already is) possible to declare the required parameters in the config YAML file? Say I want to restart a multi-node cluster running on podman: in addition to the number of nodes, I could declare static IP addresses per node... and so on. In other words, if podman doesn't provide this functionality, is there any way to allow users to make further configuration changes to compensate?

@secustor

You have to use podman restart kind-control-plane.

podman start does not reattach the port forwarding.
Interestingly, after an implicit stop, like a reboot, you have to start the container and then restart it to make it work, as in the sketch below.
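
That is, after a host reboot (a minimal sketch of the workaround described above, not an officially supported flow):

podman start kind-control-plane     # comes up, but without the port forwarding
podman restart kind-control-plane   # reattaches the port forwarding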

@benoitf

benoitf commented Dec 9, 2022

Hi @BenTheElder, could you explain "Node addresses will be random and restart support will be a roll of the dice"?

I created an issue in the Podman repository to be able to handle kind's requirements, but it's not clear what kind expects from the Podman side.
containers/podman#16797

@BenTheElder
Member

Podman networking has changed a lot over the past few years, but historically container IPs have been random on startup, and podman lacked an equivalent to docker's embedded DNS resolver with resolvable container names.

I don't think it's appropriate to file a bug against podman for kind unless there's a specific bug.

As you saw in #2998, the other reason we haven't had a restart policy for podman is that podman didn't support them meaningfully. That has changed a bit.

@tppalani

tppalani commented Jan 9, 2024

Hi @alvinsw

I'm also facing the same error: after creating a kind cluster using podman, once podman is stopped and started the kind cluster is not able to reach the target endpoint. We have migrated from docker to podman on around 1000 developer machines, so this is high priority. Please let me know if you find any workaround for this.

This is my support ticket: #3473

@BenTheElder
Member

We have migrated from docker to podman on around 1000 developer machines, so this is high priority.

Unfortunately podman and docker are NOT direct substitutes, and we don't currently have the bandwidth to spend on this ourselves.

In your issue, the containers are failing to start outright, at which point no kind code is even running, only podman/crun.


We'll continue to review suggested approaches to improving the podman implementation in kind, and the subsequent PRs.

Related: I think podman has had optional support for resolving container names for a while now; we could consider making this a prerequisite and matching the docker behavior more closely. A quick check is sketched below.
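
A hedged sketch of such a check, assuming a netavark-backed podman and the default "kind" network that kind creates (the DNSEnabled field is standard podman network inspect output, not a kind interface):

podman network inspect kind --format '{{.DNSEnabled}}'
podman exec kind-control-plane getent hosts kind-control-plane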

@ehdis

ehdis commented May 31, 2024

I noticed that on a current podman setup the stop command escalates to a SIGKILL of the container. The systemd inside the control-plane container waits by itself for a process (in my case containerd) that does not stop for 1m30s, but the podman stop command above sends SIGKILL after 10s. It's obvious what that means.

The args when creating the cluster/container could change the default of 10s to, for instance, 120s with the argument --stop-timeout=120. This would allow the cluster to shut down gracefully ...

Better would be to check why containerd does not return immediately when stopped.
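
In the meantime, a longer grace period can be requested at stop time without changing how kind creates the node, since podman stop's --time flag overrides the container's configured timeout:

podman stop --time=120 kind-control-plane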

@BenTheElder
Member

... the podman stop command above sends SIGKILL after 10s. It's obvious what that means.

That's not obvious to me; SIGKILL is not even the right signal to tell systemd to exit. https://systemd.io/CONTAINER_INTERFACE/

The args when creating the cluster/container could change the default of 10s to, for instance, 120s with the argument --stop-timeout=120. This would allow the cluster to shut down gracefully ...

We could do that; it seems like a behavioral gap versus docker, and we should investigate what the actual behavior difference is and try to align them.

Help would be welcome in identifying what is happening with docker nodes that isn't happening with podman nodes (or perhaps you're running a workload that inhibits shutdown?).

@ehdis

ehdis commented May 31, 2024

Just to clarify: podman stop sends the signal that the container has configured (StopSignal), or SIGTERM by default. After the default timeout of 10s it sends SIGKILL.

You are right, systemd/init containers should receive a different signal (37/SIGRTMIN+3). Therefore the container creation (e.g. for the control-plane) should have a --stop-signal= argument. Looking into my control-plane container, it looks like kind (v0.23.0) does not set the right signal (--stop-signal=37) to stop systemd. But so far the systemd process does perform the shutdown with the SIGTERM signal as well; I'm not sure it would make a difference. A quick test with podman kill --signal=37 control-plane does not show one.

My current problem is that the shutdown hangs here, and continues after the systemd internal timeout (1min 30s):

...
[  OK  ] Removed slice kubelet-kubepods-burstable-pod2954d591_64df_47ec_ac40_236a…ntainer kubelet-kubepods-burstable-pod2954d591_64df_47ec_ac40_236a244177b6.slice.
[  OK  ] Removed slice kubelet-kubepods-burstable-pod50dc3cdf_24ed_44a0_9d5d_9881…ntainer kubelet-kubepods-burstable-pod50dc3cdf_24ed_44a0_9d5d_988129d2591e.slice.
[ ***  ] (2 of 2) Job cri-containerd-3f1ea75a93823c1ffaece11518a124ec8950fcbc7cf9cdaac6fd00c2a415e8dd.scope/stop running (47s / 1min 30s)

And this is just a kind test cluster (single node) with a deployment of httpd:latest (replicas: 2), that's all.

To sum up: --stop-timeout wouldn't hurt and would provide a better experience from the user's point of view. --stop-signal=37 would help comply with systemd. The missing piece is the cause of the shutdown delay ...

@BenTheElder
Member

Therefore the container creation (e.g. for the control-plane) should have a --stop-signal= argument.

We set this in the image.
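
(This can be verified locally with standard podman inspect templating; a hedged one-liner, nothing kind-specific:)

podman inspect --format '{{.Config.StopSignal}}' kind-control-plane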

@BenTheElder
Member

To sum up: --stop-timeout wouldn't hurt and would provide a better experience from the user's point of view. --stop-signal=37 would help comply with systemd. The missing piece is the cause of the shutdown delay ...

It might not, but we should not set different flags for podman versus docker without understanding whether we're working around a difference in functionality. On the surface they're supposed to be compatible, and kind is less useful as a test tool when the behavior isn't consistent.

... so before doing that, we want to understand whether this is an expected difference in behavior, whether we're only working around a podman bug, or whether it affects both and we're only mitigating it for podman but not docker.

So far I have not seen clusters fail to terminate, which suggests a difference in behavior that is possibly a bug, OR it's because of something you're running in the cluster (or something different about your host).

Ideally we'd reproduce and isolate which aspect (your config, your host, your workload, docker vs podman) is causing the nodes not to exit, and deal with the root issue instead of changing the behavior of kind podman nodes to work around an issue we don't understand and haven't seen before.

@ehdis

ehdis commented Jun 1, 2024

Therefore the container creation (e.g. for the control-plane) should have a --stop-signal= argument.

We set this in the image.

Ooh, I don't know where I looked, definitely not at the right place ... it's set :-)
