New command: kube stop cluster and kube start cluster #1867

Closed
felipecrs opened this issue Sep 24, 2020 · 16 comments
Labels
kind/design Categorizes issue or PR as related to design.
kind/feature Categorizes issue or PR as related to a new feature.
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@felipecrs
Contributor

felipecrs commented Sep 24, 2020

This command would simply run docker stop and docker start against all the nodes. Although I can do that myself, the nodes seem to restart automatically when stopped. Perhaps kind is creating the containers with --restart always instead of --restart unless-stopped?

It would be better for this to live in kind, since multiple clusters can co-exist and kind knows exactly which containers belong to a given cluster.

I use kind in my development environment, which has limited resources. I have a testing cluster set up that I would not like to lose, but I don't always use it, so I could simply start the cluster as needed and keep my laptop cool otherwise. :)
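
Roughly, the proposed commands could map to something like the following (a sketch only; the --name flag and the cluster name here are illustrative, though kind get nodes itself already exists today):

$ kind get nodes --name my-cluster | xargs docker stop    # stop all node containers
$ kind get nodes --name my-cluster | xargs docker start   # start them again later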

@felipecrs felipecrs added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 24, 2020
@BenTheElder BenTheElder added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Sep 24, 2020
@BenTheElder
Member

kind is not setting restart = always.

// https://docs.docker.com/engine/reference/commandline/run/#restart-policies---restart
//
// What we desire is:
// - restart on host / dockerd reboot
// - don't restart for any other reason
//
// This means:
// - no is out of the question ... it never restarts
// - always is a poor choice, we'll keep trying to restart nodes that were
// never going to work
// - unless-stopped will also retry failures indefinitely, similar to always
// except that it won't restart when the container is `docker stop`ed
// - on-failure is not great, we're only interested in restarting on
// reboots, not failures. *however* we can limit the number of retries
// *and* it forgets all state on dockerd restart and retries anyhow.
// - on-failure:0 is what we want .. restart on failures, except max
// retries is 0, so only restart on reboots.
// however this _actually_ means the same thing as always
// so the closest thing is on-failure:1, which will retry *once*
"--restart=on-failure:1",

You can trivially run start/stop against the node containers yourself (kind get nodes | xargs docker stop), but I think you'll find that doesn't behave as you'd expect because of the nested containers.

The podman backend also currently does not support restart.
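
As a quick check of the policy actually in effect, you can inspect a node container (the expected output, given the flag above, is on-failure with a maximum retry count of 1):

$ docker inspect -f '{{.HostConfig.RestartPolicy.Name}}:{{.HostConfig.RestartPolicy.MaximumRetryCount}}' kind-control-plane
on-failure:1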

@BenTheElder
Member

I use kind in my development environment, which has limited resources. I have a testing cluster set up that I would not like to lose, but I don't always use it, so I could simply start the cluster as needed and keep my laptop cool otherwise. :)

I would strongly prefer to improve the experience of creating new clusters instead, though. We do not want users becoming highly attached to their kind clusters. Testing should start from a clean state, and critical data should not be stored permanently in these clusters. They should start quickly and be disposable.

@felipecrs
Contributor Author

I wonder if it's possible to change the restart policy dynamically.

@BenTheElder
Member

BenTheElder commented Sep 25, 2020 via email

@felipecrs
Contributor Author

Actually, it's doable:

docker update --restart=unless-stopped kind-control-plane

https://docs.docker.com/engine/reference/commandline/update/#update-a-containers-restart-policy

But for some reason it fails:

$ docker update --restart=unless-stopped kind-control-plane
Error response from daemon: Cannot update container 2912eb63c333ce9395d428452474e7d7109167aeffb47196a6484c741705cfcd: runc did not terminate sucessfully: failed to write "a *:* rwm" to "/sys/fs/cgroup/devices/docker/2912eb63c333ce9395d428452474e7d7109167aeffb47196a6484c741705cfcd/devices.allow": write /sys/fs/cgroup/devices/docker/2912eb63c333ce9395d428452474e7d7109167aeffb47196a6484c741705cfcd/devices.allow: invalid argument
: unknown

@BenTheElder BenTheElder added the kind/design Categorizes issue or PR as related to design. label Sep 25, 2020
@BenTheElder
Member

In any case, on-failure should only restart the container when it exits uncleanly, and with a retry limit of 1 it should only restart once. unless-stopped is not a desirable policy; see the code comment above.

It also starts on bootup. If it stops and you reboot, it will start again, but that's the case with all restart policies except no.

@felipecrs
Contributor Author

You're right. unless-stopped does not help anyway (the container is indeed restarted when the system starts).

So, for kind stop cluster, we would need to first call docker update --restart=no kind-control-plane and then docker stop kind-control-plane.

For kind start cluster: docker update --restart=on-failure:1 kind-control-plane and then docker start kind-control-plane.
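
Applied to every node of a cluster, that flow would look roughly like this (a sketch only; kind get nodes --name is used here just to enumerate the node containers, and as the next comment shows, the update step currently fails on kind nodes):

$ # stop the cluster
$ kind get nodes --name my-cluster | xargs docker update --restart=no
$ kind get nodes --name my-cluster | xargs docker stop
$ # start it again later
$ kind get nodes --name my-cluster | xargs docker update --restart=on-failure:1
$ kind get nodes --name my-cluster | xargs docker start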

@felipecrs
Contributor Author

However, for some reason docker update does not work with the kind node containers (it works on other containers, though):

$ docker update --restart=no test
test
$ docker update --restart=no kind-control-plane
Error response from daemon: Cannot update container 2912eb63c333ce9395d428452474e7d7109167aeffb47196a6484c741705cfcd: runc did not terminate sucessfully: failed to write "a *:* rwm" to "/sys/fs/cgroup/devices/docker/2912eb63c333ce9395d428452474e7d7109167aeffb47196a6484c741705cfcd/devices.allow": write /sys/fs/cgroup/devices/docker/2912eb63c333ce9395d428452474e7d7109167aeffb47196a6484c741705cfcd/devices.allow: invalid argument
: unknown

@BenTheElder
Member

BenTheElder commented Sep 25, 2020 via email

@BenTheElder
Member

xref: #1913

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 1, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 3, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@DylanBowden

DylanBowden commented Jul 1, 2022

FYI, running things the other way around works for me:

  • docker stop kind-control-plane, then
  • docker update --restart=no kind-control-plane
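
For a multi-node cluster, that same order could be applied to every node container (a sketch; kind get nodes --name is assumed here only to list the cluster's nodes):

$ kind get nodes --name my-cluster | xargs docker stop
$ kind get nodes --name my-cluster | xargs docker update --restart=no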

@BenTheElder
Member

xref: #2715

There has been more work on restarts for docker (podman is still lacking some functionality); there is more recent discussion in #2715.
