Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quadlet kube support has issues with cgroup setup #16589

Open
alexlarsson opened this issue Nov 23, 2022 · 20 comments
Open

Quadlet kube support has issues with cgroup setup #16589

alexlarsson opened this issue Nov 23, 2022 · 20 comments

Comments

@alexlarsson
Copy link
Contributor

When handling a regular *.container file, quadlet generates a run command with --cgroups=split. This generates a cgroup hierarchy like this:

├─system.slice (#60)
│ ├─quadlet-minimal.service … (#6284586)
│ │ ├─libpod-payload-91c57c2ae87a8d57123bbb3083b3e158b911c4fd485da131f636f83f979b092c (#6284854)
│ │ │ ├─821732 /run/podman-init -- sleep 60
│ │ │ └─821755 sleep 60
│ │ └─runtime (#6284787)
│ │   └─821728 /usr/bin/conmon --api-version 1 -c 91c57c2ae87a8d57123bbb3083b3e158b911c4fd485da131f636f83f979b092c -u 91c57c2ae87a8d57123bbb3083b3e158b911c4fd485da131f636f83f979b092c -r /usr/bin/crun -b /var/lib/containers/storage/overla>

In other words, all the processes in the service is part of the toplevel quadlet-minimal.service cgroup, which means systemd is aware of them being part of this service.

However, podman kube play doesn't have this, and currently if you run a .kube service the resulting cgroup tree is something like this:

├─system.slice (#60)
│ ├─kube-example.service (#6286295)
│ │ ├─822322 /usr/bin/conmon --api-version 1 -c 407c6dbfe5cec023d18d0f239f7da1de8055ac0ec56a5ed6176fafd0697b7938 -u 407c6dbfe5cec023d18d0f239f7da1de8055ac0ec56a5ed6176fafd0697b7938 -r /usr/bin/crun -b /var/lib/containers/storage/overlay->
│ │ ├─822463 /usr/bin/conmon --api-version 1 -c 241631bf7841ef3194c3b93614409fae56ed4caeab8f43e781202b2af8d6f3e7 -u 241631bf7841ef3194c3b93614409fae56ed4caeab8f43e781202b2af8d6f3e7 -r /usr/bin/crun -b /var/lib/containers/storage/overlay->
│ │ └─822508 /usr/bin/conmon --api-version 1 -c c73f49226551922c7b259c5e3274a7a827959468da466f3f453c55a226563217 -u c73f49226551922c7b259c5e3274a7a827959468da466f3f453c55a226563217 -r /usr/bin/crun -b /var/lib/containers/storage/overlay->
└─machine.slice (#1150)
  ├─machine-libpod_pod_6843607a91b26302e3d72458f74ffebc3302d36e84e700df955218dd9f6726ea.slice (#6286362)
  │ ├─libpod-conmon-aa3b31f0f3c96797726ae69a823f48e448dd044242daacbec4e5a9142662dea9.scope … (#6287139)
  │ │ └─827396 /usr/bin/conmon --api-version 1 -c aa3b31f0f3c96797726ae69a823f48e448dd044242daacbec4e5a9142662dea9 -u aa3b31f0f3c96797726ae69a823f48e448dd044242daacbec4e5a9142662dea9 -r /usr/bin/crun -b /var/lib/containers/storage/overla>
  │ ├─libpod-241631bf7841ef3194c3b93614409fae56ed4caeab8f43e781202b2af8d6f3e7.scope … (#6286737)
  │ │ └─container (#6286790)
  │ │   └─822466 /catatonit -P
  │ ├─libpod-aa3b31f0f3c96797726ae69a823f48e448dd044242daacbec4e5a9142662dea9.scope … (#6316150)
  │ │ └─container (#6316217)
  │ │   └─827398 mysqld
  │ └─libpod-c73f49226551922c7b259c5e3274a7a827959468da466f3f453c55a226563217.scope … (#6287005)
  │   └─container (#6287072)
  │     ├─822510 apache2 -DFOREGROUND
  │     ├─822535 apache2 -DFOREGROUND
  │     ├─822536 apache2 -DFOREGROUND
  │     ├─822537 apache2 -DFOREGROUND
  │     ├─822538 apache2 -DFOREGROUND
  │     └─822539 apache2 -DFOREGROUND
  └─libpod-407c6dbfe5cec023d18d0f239f7da1de8055ac0ec56a5ed6176fafd0697b7938.scope … (#6286469)
    └─container (#6286536)
      └─822324 /catatonit -P

This means that system isn't aware of any processes except the toplevel conmon processes. We really should implement --cgroup=split also for play kube, and have all the cgroup hierarchies that it currently puts in machine.slice be a subdirectory of the kube-example.service cgroup.

This also affects the KillMode= option that quadlet handles for .kube files. It currently limits things to control-group or mixed. But this is not really correct, as control-group means it will just kill the conmon instances (as they are the only thing in the cgroup), which will leak all the other processes. However, once the above is fixed this should be ok.

@rhatdan
Copy link
Member

rhatdan commented Nov 23, 2022

Should we implement

--cgroup-parent and or --cgroupns for podman kube play?

@giuseppe @mheon WDYT?

@vrothberg
Copy link
Member

As mentioned in the other issue (#16592 (comment)), we can set these things along with --service-container.

@vrothberg
Copy link
Member

Reasoning: --service-container is only used and meant when running inside a systemd unit. So we can very well set reasonable defaults for this use case.

@rhatdan
Copy link
Member

rhatdan commented Nov 23, 2022

So basically quadlet kubesupport defaults to running with cgroups=split? And --log-level=passthrough.

@rhatdan
Copy link
Member

rhatdan commented Nov 23, 2022

No options required for podman kube play

@vrothberg
Copy link
Member

vrothberg commented Nov 23, 2022

No options required for podman kube play

Yes. Setting these defaults make sense to me. If other users desire different defaults, we can make it configurable in the future.

@rhatdan
Copy link
Member

rhatdan commented Nov 23, 2022

But bottom line we are going to need
podman kube play --cgroups=split ...

@vrothberg
Copy link
Member

vrothberg commented Nov 23, 2022

No. That's exactly my point.

We are free to set different defaults when running in systemd. And we know that we're running in systemd when the --service-container flag (and when PODMAN_SYSTEMD_UNIT) is set.

@vrothberg
Copy link
Member

if Options.ServiceContainer {
    // Create pod/container with cgroups=split
}

@alexlarsson
Copy link
Contributor Author

What we do need is the backing code for --cgroups=split though.

@vrothberg
Copy link
Member

What we do need is the backing code for --cgroups=split though.

Yes. We need to add it but it doesn't need to be configurable on the CLI. My hope is that this will keep things simple. It's probably best to make all these choices for users.

@rhatdan
Copy link
Member

rhatdan commented Nov 23, 2022

Ok I will change my patch to pay attention to the --service-container flag, and eliminate the options. Most of the patch is not the CLI stuff anyways.

@rhatdan
Copy link
Member

rhatdan commented Dec 6, 2022

Since we are relying on the containers.conf, the read-only PR that is merged into containers.conf, just needs to be worked into Podman at this point, so this can move forward without waiting for me.

@github-actions
Copy link

github-actions bot commented Jan 6, 2023

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Jan 6, 2023

@alexlarsson @vrothberg any update on this?

@github-actions
Copy link

github-actions bot commented Feb 6, 2023

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Jul 29, 2023

@alexlarsson @ygalblum is this still an issue?

@xofyarg
Copy link

xofyarg commented Dec 29, 2023

Just want to follow up on this issue, as I don't see any RP linked to it, neither mentioned in release notes.

Ok I will change my patch to pay attention to the --service-container flag, and eliminate the options. Most of the patch is not the CLI stuff anyways.

@rhatdan do you know has the patch been merged already?

@rhatdan
Copy link
Member

rhatdan commented Dec 30, 2023

Nope, No idea. Could you check if the latest podman works with --service-container.

@xofyarg
Copy link

xofyarg commented Dec 30, 2023

I tried with 4.8.1 with no luck.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants