
fix pod cgroup lifecycle #19888

Merged
merged 10 commits into from
Sep 11, 2023
Conversation

giuseppe
Member

@giuseppe giuseppe commented Sep 7, 2023

While investigating #19175, I found a number of issues with the way we handle the pod cgroup:

  • The pod cgroup was not honoring the cgroup limits after a reboot
  • The pod cgroup, once created, was leaked even after the pod was stopped
  • --infra=false was not creating any pod cgroup, so all the limits were ignored

More details are in the individual commits. There is still room for improvement, e.g. not trying to guess the systemd path, but that needs a fix in c/common first; we can do it incrementally later.
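To make the first bullet concrete: verifying a pod's memory limit means reading the cgroup v2 `memory.max` file under the pod's CgroupPath. The following is an illustrative sketch only; the helper name is mine, and it demonstrates the read against a fake cgroup tree so it does not need podman or root (the `machine.slice/machine-libpod_pod_<id>.slice` layout matches the inspect output shown later in this thread):

```python
import os
import tempfile

def read_pod_memory_limit(cgroup_root, cgroup_path):
    """Read the cgroup v2 memory limit for a pod, given the CgroupPath
    reported by `podman pod inspect` (e.g.
    "machine.slice/machine-libpod_pod_<id>.slice")."""
    limit_file = os.path.join(cgroup_root, cgroup_path, "memory.max")
    with open(limit_file) as f:
        value = f.read().strip()
    # "max" means no limit is set on this cgroup
    return None if value == "max" else int(value)

# Demonstrate against a fake cgroup tree instead of /sys/fs/cgroup.
root = tempfile.mkdtemp()
path = "machine.slice/machine-libpod_pod_deadbeef.slice"
os.makedirs(os.path.join(root, path))
with open(os.path.join(root, path, "memory.max"), "w") as f:
    f.write("536870912\n")

print(read_pod_memory_limit(root, path))  # 536870912
```

On a real system the root would be `/sys/fs/cgroup`; the bug described above was that after a reboot this file no longer reflected the limit the pod was created with.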

Closes: #19175

Does this PR introduce a user-facing change?

Now pod cgroup limits are honored after a reboot

@openshift-ci
Contributor

openshift-ci bot commented Sep 7, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 7, 2023
@giuseppe giuseppe force-pushed the fix-pod-lifecycle branch 2 times, most recently from 9cf763b to 48e6b6b Compare September 7, 2023 13:22
Member

@vrothberg vrothberg left a comment


LGTM assuming happy tests.

Member

@edsantiago edsantiago left a comment


Two suggestions that might make it easier to diagnose test failures.

Speaking of test failures, the ones in this CI run look real but they're not in test code that I understand.

@mheon
Member

mheon commented Sep 7, 2023

Changes LGTM

@TomSweeneyRedHat
Member

Code changes LGTM, however, the tests are very red.

@giuseppe giuseppe force-pushed the fix-pod-lifecycle branch 5 times, most recently from 1085f2a to de31714 Compare September 8, 2023 11:03
@martinpitt
Contributor

martinpitt commented Sep 8, 2023

This regresses cockpit-podman: the pod does not show mounted volumes any more. This is the first time these tests have failed in podman CI, so we now need to learn how to work together to debug them efficiently.

But let's first see what the other tests say (many are still running) -- if they fail as well, then these are probably easier to debug for you.

@giuseppe
Member Author

giuseppe commented Sep 8, 2023

This regresses cockpit-podman: the pod does not show mounted volumes any more. This is the first time these tests have failed in podman CI, so we now need to learn how to work together to debug them efficiently.

I can take a look right now. Is there a way to replicate the issue from the podman CLI?

@giuseppe
Member Author

giuseppe commented Sep 8, 2023

I am trying locally with cockpit now. Do you have a screenshot with the expected output? Or how was the pod created so I can test with a working podman version?

@martinpitt
Contributor

A CLI reproducer does not automatically pop out, as cockpit-podman uses the REST API. I (and the whole cockpit team) am travelling back from a sprint right now and don't currently have enough internet to investigate with the COPR build here, but I'll do that first thing on Monday.

But that shouldn't actually be that hard. At a high level, it creates a pod and queries the Mounts property from the infra container.

systemctl --user start podman.socket
podman pod create -v /tmp:/hosttmp p1
curl --unix $XDG_RUNTIME_DIR/podman/podman.sock http://none/v1.12/libpod/pods/p1/json

from that, look at the mounts property, it should have one entry like

"mounts":[{"Type":"bind","Source":"/tmp","Destination":"/hosttmp","Driver":"","Mode":"","Options":["nosuid","nodev","rbind"],"RW":true,"Propagation":"rprivate"}]
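That spot-check can also be scripted. A minimal sketch (the helper name is mine; the sample payload is the entry shown above, and the lowercase "mounts" vs. uppercase "Mounts" handling reflects the pod vs. container inspect output quoted in this thread):

```python
import json

def has_bind_mount(inspect_json, source, destination):
    """Return True if an inspect payload contains a bind mount
    source -> destination. Pod inspect uses a lowercase "mounts" key,
    container inspect an uppercase "Mounts" key."""
    data = json.loads(inspect_json)
    mounts = data.get("mounts") or data.get("Mounts") or []
    return any(
        m.get("Type") == "bind"
        and m.get("Source") == source
        and m.get("Destination") == destination
        for m in mounts
    )

# The sample mounts entry from above, wrapped in an object for parsing.
sample = (
    '{"mounts":[{"Type":"bind","Source":"/tmp","Destination":"/hosttmp",'
    '"Driver":"","Mode":"","Options":["nosuid","nodev","rbind"],"RW":true,'
    '"Propagation":"rprivate"}]}'
)
print(has_bind_mount(sample, "/tmp", "/hosttmp"))  # True
```

In practice the payload would come from the curl call above against the podman socket.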

That's not actually what cockpit-podman looks at for this test, though, but it's a good first spot-check. For the "real" thing, get the infra container name from podman ps -a and introspect that:

curl --unix $XDG_RUNTIME_DIR/podman/podman.sock http://none/v1.12/libpod/containers/2892d5ae6ccf-infra/json

replace 2892d5ae6ccf-infra with the actual name, of course. On current podman, this looks like

"Mounts":[{"Type":"bind","Source":"/tmp","Destination":"/hosttmp","Driver":"","Mode":"","Options":["nosuid","nodev","rbind"],"RW":true,"Propagation":"rprivate"}]

and it looks like with this PR it becomes empty.

Are you able to quickly test that with your version? If that succeeds, I'll need to be at a workplace that's better than a moving train 😁

Thanks!

@martinpitt
Contributor

With the simple "p1" example that I put above, it's supposed to look like this (you need to click on the disk icon to get the popup):

[screenshot: expected volume mount popup for pod p1]

With the integration test case, it should look like this, that pod also has a port mapping (which does work with this PR):

[screenshot: integration test pod showing the volume mount and port mapping]

@martinpitt
Contributor

I tried the above in an F38 VM with the build here, running commands as root, i.e. system podman, not user (that's also what the test uses):

dnf install https://download.copr.fedorainfracloud.org/results/packit/containers-podman-19888/fedora-38-x86_64/06386739-podman/podman-4.7.0~dev-1.20230908110915905267.pr19888.1707.de31714b6.fc38.x86_64.rpm
podman pod create -v /tmp:/hosttmp p1
systemctl start podman.socket
curl --unix /run/podman/podman.sock http://none/v1.12/libpod/pods/p1/json

And that has no mounts property any more:

"Id": "af07236b92f6fcc9427fd39ac1420fbbfc9b8f5d0f7ae20d1a19a7717587c1c8",
"Name": "p1",
"Created": "2023-09-08T12:50:19.293219092Z",
"CreateCommand": ["podman","pod","create","-v","/tmp:/hosttmp","p1"],
"ExitPolicy": "continue",
"State": "Created",
"Hostname": "",
"CreateCgroup": true,
"CgroupParent": "machine.slice",
"CgroupPath": "machine.slice/machine-libpod_pod_af07236b92f6fcc9427fd39ac1420fbbfc9b8f5d0f7ae20d1a19a7717587c1c8.slice",
"CreateInfra": true,
"InfraContainerID": "9b365ca8c8ee0b7a0620c4d30f7095e2156b10925ba10625640cf3c420a0b4e1",
"InfraConfig": {"PortBindings":{},"HostNetwork":false,"StaticIP":"","StaticMAC":"","NoManageResolvConf":false,"DNSServer":null,"DNSSearch":null,"DNSOption":null,"NoManageHosts":false,"HostAdd":null,"Networks":["podman"],"NetworkOptions":null,"pid_ns":"private","userns":"host","uts_ns":"private"},
"SharedNamespaces": ["ipc","net","uts"],
"NumContainers": 1,
"Containers": [{"Id":"9b365ca8c8ee0b7a0620c4d30f7095e2156b10925ba10625640cf3c420a0b4e1","Name":"af07236b92f6-infra","State":"created"}],
"LockNumber": 0
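As an aside, the CgroupPath in that output follows a predictable naming scheme for systemd-managed pods. A sketch of building it; the pattern is inferred from the sample output above, not taken from podman's source, and the helper name is mine:

```python
def pod_cgroup_path(pod_id, parent="machine.slice"):
    """Build the systemd slice path podman reports as CgroupPath.
    Pattern inferred from the inspect output above: the slice name embeds
    the parent slice (with the ".slice" suffix stripped) and the full pod ID."""
    base = parent.removesuffix(".slice")
    return f"{parent}/{base}-libpod_pod_{pod_id}.slice"

pod_id = "af07236b92f6fcc9427fd39ac1420fbbfc9b8f5d0f7ae20d1a19a7717587c1c8"
print(pod_cgroup_path(pod_id))
# machine.slice/machine-libpod_pod_af07236b92f6fcc9427fd39ac1420fbbfc9b8f5d0f7ae20d1a19a7717587c1c8.slice
```

Rootless pods would live under a user slice rather than machine.slice, so the parent is a parameter here.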

and introspecting the infra container:

curl --unix /run/podman/podman.sock http://none/v1.12/libpod/containers/af07236b92f6-infra/json

has "Mounts":[]

So that's a nice CLI reproducer, and it shouldn't be too hard to add to your own tests?

Cheers! (I'm really happy that this works as intended!)

move the code that removes the pod cgroup into a separate function.  This
is preparation for the next patch.

Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
When the pod is stopped, we need to destroy the pod cgroup, otherwise
it is leaked.

Signed-off-by: Giuseppe Scrivano <[email protected]>
accept only the resources to be used by the pod, so that the function
can more easily be used by a subsequent patch.

Signed-off-by: Giuseppe Scrivano <[email protected]>
do not create the pod cgroup if it already exists.

Signed-off-by: Giuseppe Scrivano <[email protected]>
a pod can use cgroups without an infra container.

Signed-off-by: Giuseppe Scrivano <[email protected]>
This allows using --share-parent with --infra=false, so that the
containers in the pod can share the parent cgroup.

Signed-off-by: Giuseppe Scrivano <[email protected]>
When the infra container is not created, we can still set limits on
the pod cgroup.

Signed-off-by: Giuseppe Scrivano <[email protected]>
When a container is created and it is part of a pod, we ensure the pod
cgroup exists so limits can be applied on the pod cgroup.

Closes: containers#19175

Signed-off-by: Giuseppe Scrivano <[email protected]>
This test checks that the pod cgroups are created and that the limits
set for a pod cgroup are still enforced after a reboot.

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe
Member Author

giuseppe commented Sep 8, 2023

I think the culprit is the following patch:

commit 3aae72966549338cc0fc9658f0c65cd087df50f9
Author: Giuseppe Scrivano <[email protected]>
Date:   Thu Sep 7 23:21:51 2023 +0200

    pod: do not inherit mounts for infra
    
    do not inherit the mounts from containers.conf for the infra
    container.
    
    Signed-off-by: Giuseppe Scrivano <[email protected]>

diff --git a/pkg/specgen/generate/pod_create.go b/pkg/specgen/generate/pod_create.go
index d8759a931..d387d6b0f 100644
--- a/pkg/specgen/generate/pod_create.go
+++ b/pkg/specgen/generate/pod_create.go
@@ -42,6 +42,7 @@ func MakePod(p *entities.PodSpec, rt *libpod.Runtime) (_ *libpod.Pod, finalErr e
                }
                p.PodSpecGen.InfraImage = imageName
                p.PodSpecGen.InfraContainerSpec.RawImageName = imageName
+               p.PodSpecGen.InfraContainerSpec.Mounts = []specs.Mount{}
        }
 
        if !p.PodSpecGen.NoInfra && p.PodSpecGen.InfraContainerSpec != nil {

I'll drop it from the current series.

@giuseppe giuseppe marked this pull request as draft September 8, 2023 13:19
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 8, 2023
@giuseppe giuseppe marked this pull request as ready for review September 8, 2023 13:32
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 8, 2023
@giuseppe
Member Author

giuseppe commented Sep 8, 2023

I've opened a different PR for that issue: #19902

@giuseppe
Member Author

giuseppe commented Sep 8, 2023

cockpit is happy again.

@rhatdan
Member

rhatdan commented Sep 8, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 8, 2023
@martinpitt
Contributor

This was very smooth, thanks @giuseppe! That's exactly how I wanted these tests to work 🎉

@mheon
Member

mheon commented Sep 10, 2023

/hold
CI is stuck (total success is red)

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 10, 2023
@rhatdan
Member

rhatdan commented Sep 10, 2023

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 10, 2023
@rhatdan rhatdan merged commit 8acd66c into containers:main Sep 11, 2023
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Dec 11, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 11, 2023

Successfully merging this pull request may close these issues.

cgroup is not set: internal libpod error after os reboot.