
Can't restart kind-control-plane container on Windows #20254

Closed
rquinio1A opened this issue Oct 4, 2023 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. remote Problem is in podman-remote stale-issue windows issue/bug on Windows

Comments

rquinio1A commented Oct 4, 2023

Issue Description

When restarting the podman machine, I can't restart the kind-control-plane container.

There is an existing issue on kind: kubernetes-sigs/kind#2272

$ podman start kind-control-plane
Error: unable to start container "4056709e878b34ab7e6bae19c00a4527f9e81907fcb699a02a11435dbf4cf3d8": crun: writing file devices.allow: Invalid argument: OCI runtime error

The podman logs of the container are:

$ podman logs 4056709e878b34ab7e6bae19c00a4527f9e81907fcb699a02a11435dbf4cf3d8
INFO: ensuring we can execute mount/umount even with userns-remap
INFO: remounting /sys read-only
INFO: making mounts shared
INFO: detected cgroup v1
INFO: detected cgroupns
INFO: removing misc controller
INFO: clearing and regenerating /etc/machine-id
Initializing machine ID from random generator.
INFO: setting iptables to detected mode: legacy
INFO: detected IPv4 address: 10.89.0.2
INFO: detected IPv6 address: fc00:f853:ccd:e793::2
INFO: starting init
systemd 247.3-7+deb11u2 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization wsl.
Detected architecture x86-64.
Failed to create symlink /sys/fs/cgroup/net_prio: File exists
Failed to create symlink /sys/fs/cgroup/net_cls: File exists
Failed to create symlink /sys/fs/cgroup/cpuacct: File exists
Failed to create symlink /sys/fs/cgroup/cpu: File exists

Welcome to Debian GNU/Linux 11 (bullseye)!

Set hostname to <kind-control-plane>.
Queued start job for default target Graphical Interface.
[  OK  ] Created slice slice used to run Kubernetes / Kubelet.
[  OK  ] Created slice system-modprobe.slice.
[  OK  ] Started Dispatch Password …ts to Console Directory Watch.
[  OK  ] Set up automount Arbitrary…s File System Automount Point.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Slices.
[  OK  ] Reached target Swap.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
[  OK  ] Reached target Sockets.
         Mounting Huge Pages File System...
         Mounting Kernel Debug File System...
         Mounting Kernel Trace File System...
         Starting Load Kernel Module configfs...
         Starting Load Kernel Module fuse...
         Starting Journal Service...
         Starting Load Kernel Modules...
         Starting Remount Root and Kernel File Systems...
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Mounted Kernel Debug File System.
[  OK  ] Mounted Kernel Trace File System.
[email protected]: Succeeded.
[  OK  ] Finished Load Kernel Module configfs.
[email protected]: Succeeded.
[  OK  ] Finished Load Kernel Module fuse.
systemd-modules-load.service: Main process exited, code=exited, status=1/FAILURE
systemd-modules-load.service: Failed with result 'exit-code'.
[FAILED] Failed to start Load Kernel Modules.
See 'systemctl status systemd-modules-load.service' for details.
[  OK  ] Finished Remount Root and Kernel File Systems.
         Mounting FUSE Control File System...
         Starting Apply Kernel Variables...
         Starting Create System Users...
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Mounted FUSE Control File System.
[  OK  ] Finished Apply Kernel Variables.
[  OK  ] Finished Update UTMP about System Boot/Shutdown.
[  OK  ] Started Journal Service.
         Starting Flush Journal to Persistent Storage...
[  OK  ] Finished Create System Users.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Finished Flush Journal to Persistent Storage.
[  OK  ] Finished Create Static Device Nodes in /dev.
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
[  OK  ] Reached target System Initialization.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Basic System.
[  OK  ] Reached target Timers.
         Starting Undo KIND mount hacks...
[  OK  ] Finished Undo KIND mount hacks.
         Starting containerd container runtime...
[  OK  ] Started containerd container runtime.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Finished Update UTMP about System Runlevel Changes.

Steps to reproduce the issue

  1. podman machine init podman-machine-default --rootful --user-mode-networking --memory 4048 --cpus 4 --disk-size 100
  2. podman machine start podman-machine-default
  3. export KIND_EXPERIMENTAL_PROVIDER=podman
  4. kind create cluster --config=scripts/kind-config.yaml

kind-config.yaml:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
    - |
      kind: InitConfiguration
      nodeRegistration:
        kubeletExtraArgs:
          node-labels: "ingress-ready=true"
  extraPortMappings:
    - containerPort: 80
      hostPort: 80
      protocol: TCP
    - containerPort: 443
      hostPort: 443
      protocol: TCP
  5. podman machine stop podman-machine-default
  6. podman machine start podman-machine-default
  7. podman start kind-control-plane

Describe the results you received

Error: unable to start container "4056709e878b34ab7e6bae19c00a4527f9e81907fcb699a02a11435dbf4cf3d8": crun: writing file devices.allow: Invalid argument: OCI runtime error

Describe the results you expected

kind-control-plane container starts successfully.

podman info output

$ podman info
host:
  arch: amd64
  buildahVersion: 1.32.0
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.7-2.fc38.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 93.48
    systemPercent: 2.16
    userPercent: 4.36
  cpus: 8
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: container
    version: "38"
  eventLogger: journald
  freeLocks: 2046
  hostname: NCEL110973
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.90.1-microsoft-standard-WSL2
  linkmode: dynamic
  logDriver: journald
  memFree: 15900639232
  memTotal: 16630345728
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.8.0-1.fc38.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.8.0
    package: netavark-1.8.0-2.fc38.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.8.0
  ociRuntime:
    name: crun
    package: crun-1.9.2-1.fc38.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.9.2
      commit: 35274d346d2e9ffeacb22cc11590b0266a23d634
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20230908.g05627dc-1.fc38.x86_64
    version: |
      pasta 0^20230908.g05627dc-1.fc38.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.1-1.fc38.x86_64
    version: |-
      slirp4netns version 1.2.1
      commit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 4294967296
  swapTotal: 4294967296
  uptime: 0h 45m 21.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 0
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 1081101176832
  graphRootUsed: 6054662144
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.7.0
  Built: 1695839078
  BuiltTime: Wed Sep 27 20:24:38 2023
  GitCommit: ""
  GoVersion: go1.20.8
  Os: linux
  OsArch: linux/amd64
  Version: 4.7.0

$ podman version
Client:       Podman Engine
Version:      4.6.2
$ wsl --version
WSL version: 1.2.5.0
$ kind version
kind v0.20.0 go1.20.4 windows/amd64

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

The issue is specific to Windows (my whole team is impacted); a colleague on macOS doesn't have the issue.

Additional information

No response

@rquinio1A rquinio1A added the kind/bug Categorizes issue or PR as related to a bug. label Oct 4, 2023
@github-actions github-actions bot added the remote Problem is in podman-remote label Oct 4, 2023
rquinio1A commented Oct 4, 2023

Actually, the issue happens only when the kind-control-plane container is not stopped before shutting down the podman machine.

If I do:

1. podman stop kind-control-plane
2. podman machine stop
3. podman machine start
4. podman restart kind-control-plane

It works! After step 3 the status of the container is 'Exited'.

Whereas with:

1. podman machine stop
2. podman machine start
3. podman restart kind-control-plane

After step 2 the status of the container is 'Created'.

So I suppose podman machine stop is not stopping the containers in a "graceful" way?
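The graceful sequence described above can be wrapped in a small helper that stops all running containers before shutting the machine down. This is only a sketch; graceful_machine_stop is a hypothetical name, not a podman command:

```shell
#!/bin/sh
# Sketch of a graceful shutdown wrapper (assumes podman is installed and
# a machine is running). Stopping containers first leaves them 'Exited'
# rather than 'Created' after the next machine start.
graceful_machine_stop() {
    # Stop every running container before stopping the machine itself.
    for c in $(podman ps -q); do
        podman stop "$c"
    done
    podman machine stop "$@"
}
```

Usage would be e.g. graceful_machine_stop podman-machine-default, followed later by podman machine start and podman start kind-control-plane.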

@ashley-cui ashley-cui added the windows issue/bug on Windows label Oct 4, 2023
@ashley-cui

Hmm yeah, podman machine stop doesn't touch containers in any meaningful way, only the machine. @baude Do you think we should gracefully stop containers on machine stop, or do you think that should be left to the user?


baude commented Oct 5, 2023

This smells more like a bug than a question of whether it stopped nicely or not. @mheon @Luap99 what say you?


Luap99 commented Oct 5, 2023

Stopping ungracefully should never be an issue; starting could be. Podman depends on the alive file to know when to reset all its state on a fresh boot.

After a machine stop/start, please run podman machine ssh stat /run/libpod/alive; do not run any other podman commands before that, as they would create the alive file. We depend on tmpfs to destroy this file, but that might not be the case here as WSL is special.
If you run podman --log-level debug ps inside the VM as the first podman command, you should see the
DEBU[0000] Podman detected system restart - performing state refresh log line, which means it works correctly.

But I am guessing the alive file stays around, so podman does not detect a restart and thus we end up in an undefined state.
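Collected into one place, the check described above could look like this sketch; diagnose_alive_file is a hypothetical helper name wrapping the two commands from the comment:

```shell
#!/bin/sh
# Sketch of the diagnostic described above. Run it right after
# 'podman machine stop' / 'podman machine start', before any other podman
# command (any other command would recreate the alive file).
diagnose_alive_file() {
    # On a correctly detected fresh boot, stat should fail with
    # "No such file or directory".
    podman machine ssh stat /run/libpod/alive
    # First podman command inside the VM; a correct state refresh logs:
    #   DEBU[0000] Podman detected system restart - performing state refresh
    podman machine ssh podman --log-level debug ps
}
```

If stat succeeds here, that would support the suspicion that the alive file survives the reboot under WSL.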


github-actions bot commented Nov 5, 2023

A friendly reminder that this issue had no activity for 30 days.


rhatdan commented Nov 5, 2023

Since we never heard back, closing.

@rhatdan rhatdan closed this as completed Nov 5, 2023
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Feb 4, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 4, 2024