child process gets killed by SIGKILL after using cgroup v1 API #40

Closed
Mossaka opened this issue Dec 14, 2022 · 3 comments · Fixed by #142
Labels
bug Something isn't working

Comments

Mossaka commented Dec 14, 2022

Containerd Log

time="2022-12-14T09:30:34.243727563Z" level=info msg="CreateContainer within sandbox \"f450952a49060dbe6756fb3638705b7c66404b38b0877741ac319e4edcb825f9\" for container &ContainerMetadata{Name:traefik,Attempt:0,}"
time="2022-12-14T09:30:34.280628605Z" level=info msg="CreateContainer within sandbox \"f450952a49060dbe6756fb3638705b7c66404b38b0877741ac319e4edcb825f9\" for &ContainerMetadata{Name:traefik,Attempt:0,} returns container id \"2a3087458a40f98ef65bbe454da5d84a379f03c1a1e1b19b9b57fd1e3e9885dc\""
time="2022-12-14T09:30:34.281158713Z" level=info msg="StartContainer for \"2a3087458a40f98ef65bbe454da5d84a379f03c1a1e1b19b9b57fd1e3e9885dc\""
time="2022-12-14T09:30:34.350862636Z" level=info msg="StartContainer for \"2a3087458a40f98ef65bbe454da5d84a379f03c1a1e1b19b9b57fd1e3e9885dc\" returns successfully"
time="2022-12-14T09:30:41.365630407Z" level=info msg="CreateContainer within sandbox \"cb2719f323623808ff663e1d0e409530a160cb62e702d0a1c3bc8670046e57fd\" for container &ContainerMetadata{Name:testwasm,Attempt:2,}"
time="2022-12-14T09:30:41.417023160Z" level=info msg="CreateContainer within sandbox \"cb2719f323623808ff663e1d0e409530a160cb62e702d0a1c3bc8670046e57fd\" for &ContainerMetadata{Name:testwasm,Attempt:2,} returns container id \"6ebb8cc29b333a124661983fba2dec5c82e4fc32c9a484212de41e4a3fa1e06e\""
time="2022-12-14T09:30:41.417626869Z" level=info msg="StartContainer for \"6ebb8cc29b333a124661983fba2dec5c82e4fc32c9a484212de41e4a3fa1e06e\""
[INFO] starting instance
[INFO] preparing module
[INFO] opening rootfs
[INFO] setting up wasi
[INFO] opening stdin
[INFO] opening stdout
[INFO] opening stderr
[INFO] building wasi context
[INFO] wasi context ready
[INFO] loading module from file
[INFO] instantiating instnace
[INFO] getting start function
[INFO] starting wasi instance
[INFO] started wasi instance with tid 1794
time="2022-12-14T09:30:41.559211243Z" level=info msg="StartContainer for \"6ebb8cc29b333a124661983fba2dec5c82e4fc32c9a484212de41e4a3fa1e06e\" returns successfully"
[INFO] child 1794 killed by signal SIGKILL, dumped: false
[INFO] wasi instance exited with status 137
time="2022-12-14T09:30:43.108591141Z" level=info msg="shim disconnected" id=6ebb8cc29b333a124661983fba2dec5c82e4fc32c9a484212de41e4a3fa1e06e
time="2022-12-14T09:30:43.108722243Z" level=warning msg="cleaning up after shim disconnected" id=6ebb8cc29b333a124661983fba2dec5c82e4fc32c9a484212de41e4a3fa1e06e namespace=k8s.io
time="2022-12-14T09:30:43.108732343Z" level=info msg="cleaning up dead shim"
time="2022-12-14T09:30:44.500146327Z" level=info msg="RemoveContainer for \"82de028e9dba19dfe45615e0efaa1e73cf35d05734b09aade8489485c5f48a84\""
time="2022-12-14T09:30:44.517400480Z" level=info msg="RemoveContainer for \"82de028e9dba19dfe45615e0efaa1e73cf35d05734b09aade8489485c5f48a84\" returns successfully"
time="2022-12-14T09:31:12.364643823Z" level=info msg="CreateContainer within sandbox \"cb2719f323623808ff663e1d0e409530a160cb62e702d0a1c3bc8670046e57fd\" for container &ContainerMetadata{Name:testwasm,Attempt:3,}"
time="2022-12-14T09:31:12.398472900Z" level=info msg="CreateContainer within sandbox \"cb2719f323623808ff663e1d0e409530a160cb62e702d0a1c3bc8670046e57fd\" for &ContainerMetadata{Name:testwasm,Attempt:3,} returns container id \"0101352d7327f58fc458166c0df7ce439528db33bd5006da002e69bb33d218d0\""
time="2022-12-14T09:31:12.398916606Z" level=info msg="StartContainer for \"0101352d7327f58fc458166c0df7ce439528db33bd5006da002e69bb33d218d0\""
[INFO] starting instance
[INFO] preparing module
[INFO] opening rootfs
[INFO] setting up wasi
[INFO] opening stdin
[INFO] opening stdout
[INFO] opening stderr
[INFO] building wasi context
[INFO] wasi context ready
[INFO] loading module from file
[INFO] instantiating instnace
[INFO] getting start function
[INFO] starting wasi instance
[INFO] started wasi instance with tid 1862
time="2022-12-14T09:31:12.528460632Z" level=info msg="StartContainer for \"0101352d7327f58fc458166c0df7ce439528db33bd5006da002e69bb33d218d0\" returns successfully"
[ERROR] error waiting for pid 1862: ECHILD: No child processes

Notice the log message "[INFO] child 1794 killed by signal SIGKILL, dumped: false".
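
The exit status 137 in the log is consistent with that SIGKILL message: by shell convention, a process killed by signal N is reported as 128 + N, and SIGKILL is signal 9 on Linux. A minimal sketch of that arithmetic follows. The function name `signal_from_exit_status` is hypothetical and is not part of the shim.

```rust
/// Hypothetical helper: recover the signal number from a shell-style
/// exit status, where "killed by signal N" is reported as 128 + N.
fn signal_from_exit_status(status: i32) -> Option<i32> {
    // Statuses 129..=192 map to Linux signals 1..=64; anything else is
    // treated as an ordinary exit code, not a signal.
    if (129..=192).contains(&status) {
        Some(status - 128)
    } else {
        None
    }
}
```

Under this convention, `signal_from_exit_status(137)` yields `Some(9)`, which is SIGKILL.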

How to reproduce?

Set up a k3d cluster image by following the steps in https://github.com/deislabs/containerd-wasm-shims/tree/main/deployments/k3d. Replace the spin and slight shims with the wasmtime shim in "config.toml.tmpl":

[plugins.cri.containerd.runtimes.wasmtime]
  runtime_type = "io.containerd.wasmtime.v1"

Once the k3d cluster image is built, create a k3d cluster by running:
k3d cluster create k3s-default --image k3swithshim --api-port 6550 -p "8081:80@loadbalancer" --agents 1

Then apply the following workloads:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wasmtime
handler: wasmtime
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wasm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wasm
  template:
    metadata:
      labels:
        app: wasm
    spec:
      runtimeClassName: wasmtime
      containers:
        - name: testwasm
          image: docker.io/mossaka/wasmtest:2
@Mossaka Mossaka added the bug Something isn't working label Dec 14, 2022

Mossaka commented Dec 14, 2022

@cpuguy83 and I briefly investigated this issue, and we think it is highly likely that it is caused by the cgroup setup. The environment in which I encountered this issue uses cgroup v1:

/ # mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755,inode64)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/misc type cgroup (rw,nosuid,nodev,noexec,relatime,misc)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
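
The mount output above lists only `cgroup` (v1) filesystems, one per controller, and no `cgroup2` mount. A rough sketch of classifying such output programmatically is shown here. `CgroupSetup` and `detect_cgroup_setup` are hypothetical names, and the parsing assumes the `<source> on <target> type <fstype> (<options>)` layout printed by `mount` above.

```rust
/// Hypothetical classification of the cgroup mounts found on a host.
#[derive(Debug, PartialEq)]
enum CgroupSetup {
    V1,
    V2,
    Hybrid,
    Absent,
}

/// Scan `mount`-style output and report which cgroup versions are mounted.
fn detect_cgroup_setup(mount_output: &str) -> CgroupSetup {
    let mut v1 = false;
    let mut v2 = false;
    for line in mount_output.lines() {
        let mut words = line.split_whitespace();
        // The filesystem type is the word right after the literal "type".
        while let Some(word) = words.next() {
            if word == "type" {
                match words.next() {
                    Some("cgroup") => v1 = true,
                    Some("cgroup2") => v2 = true,
                    _ => {}
                }
                break;
            }
        }
    }
    match (v1, v2) {
        (true, true) => CgroupSetup::Hybrid,
        (true, false) => CgroupSetup::V1,
        (false, true) => CgroupSetup::V2,
        (false, false) => CgroupSetup::Absent,
    }
}
```

Fed the output above, this sketch would report `CgroupSetup::V1`, since the `tmpfs` line is ignored and every other line has type `cgroup`.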

@ipuustin
Contributor

On cgroup v1, I couldn't get the cgroup test to pass. After adding a bit of logging, it seems that setting the memory.kmem.limit_in_bytes value fails for some reason. I don't know whether this is related to the original issue, though.

---- sandbox::cgroups::tests::test_cgroup stdout ----
Opened file: /sys/fs/cgroup/memory/containerd-wasm-shim-test_cgroup392053437/memory.limit_in_bytes
Opened file: /sys/fs/cgroup/memory/containerd-wasm-shim-test_cgroup392053437/memory.memsw.limit_in_bytes
Opened file: /sys/fs/cgroup/memory/containerd-wasm-shim-test_cgroup392053437/memory.soft_limit_in_bytes
Opened file: /sys/fs/cgroup/memory/containerd-wasm-shim-test_cgroup392053437/memory.swappiness
Opened file: /sys/fs/cgroup/memory/containerd-wasm-shim-test_cgroup392053437/memory.kmem.limit_in_bytes
thread 'sandbox::cgroups::tests::test_cgroup' panicked at 'called `Result::unwrap()` on an `Err` value: Others("failed to apply cgroup: error writing cgroup values: Operation not supported (os error 95)")', crates/containerd-shimwasm/src/sandbox/cgroups/mod.rs:374:36
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I tried creating a cgroup directory by hand, but writing to memory.kmem.limit_in_bytes also fails from the command line, so this could very well just be an issue on my system (kernel 6.1.6-200.fc37.x86_64).
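
One way a caller could cope with the failure described above is to treat "Operation not supported" as a skippable condition for optional knobs instead of unwrapping the error. The sketch below is illustrative only and is not the shim's actual code. `is_unsupported` and `write_optional_value` are hypothetical names, and the constant 95 is taken from the `os error 95` in the test output (EOPNOTSUPP on Linux).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Linux errno for "Operation not supported", matching `os error 95` above.
const EOPNOTSUPP: i32 = 95;

/// True if the I/O error carries the "Operation not supported" errno.
fn is_unsupported(err: &io::Error) -> bool {
    err.raw_os_error() == Some(EOPNOTSUPP)
}

/// Hypothetical helper: write an optional cgroup value.
/// Returns Ok(true) if the value was written, Ok(false) if the kernel
/// rejected the write as unsupported, and Err for any other failure.
fn write_optional_value(path: &Path, value: &str) -> io::Result<bool> {
    match fs::write(path, value) {
        Ok(()) => Ok(true),
        Err(e) if is_unsupported(&e) => Ok(false),
        Err(e) => Err(e),
    }
}
```

Whether skipping the value is acceptable depends on the workload: silently ignoring a memory limit changes what the container is allowed to do, so a real implementation would likely log the skipped file.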

@Mossaka Mossaka linked a pull request Jun 20, 2023 that will close this issue

Mossaka commented Jun 20, 2023

This might no longer be an issue, as we are moving to youki's APIs. I am okay with closing this one once #142 merges.
